Demonstration and Mitigation of Spatial Sampling Bias for Machine-Learning Predictions
- Wendi Liu (The University of Texas at Austin) | Svetlana Ikonnikova (The University of Texas at Austin and Technical University of Munich) | H. Scott Hamlin (The University of Texas at Austin) | Livia Sivila (The University of Texas at Austin; now with EnerVest Ltd.) | Michael J. Pyrcz (The University of Texas at Austin)
- Publisher: Society of Petroleum Engineers
- Source: SPE Reservoir Evaluation & Engineering
- Publication Date: October 2020
- Document Type: Journal Paper
- Copyright: 2020, Society of Petroleum Engineers
- Keywords: spatial declustering, spatial sampling bias, machine learning, unconventional reservoir, decision tree
Machine learning provides powerful methods for inferential and predictive modeling of complicated multivariate relationships to support decision-making for spatial problems such as optimization of unconventional reservoir development. Machine-learning methods have been widely applied to exhaustive spatial data sets such as satellite images. However, geological subsurface characterization is significantly different because it relies on sparse spatial data sets that are generally not sampled in a representative manner and are therefore biased. Two critical questions follow. First, does spatial bias in training data result in biased machine-learning-based predictive models? Second, if so, how can we mitigate the bias in these spatial machine-learning-based predictions?
The presence of prediction bias caused by spatial sampling bias, and its mitigation, are demonstrated with tree-based machine learning, chosen for its high degree of interpretability. In expectation, training data bias imposes bias in machine-learning predictions over a wide variety of spatial data configurations and degrees of bias, even when the model is applied to make predictions with unbiased testing and real-world data. We reduce the prediction bias with a novel spatial weighted tree method, again over a variety of spatial data configurations and degrees of spatial sampling bias. The proposed method improves the accuracy of reservoir evaluation. We recommend model checking and bias mitigation for all machine-learning prediction models built on sparse spatial data sets, because bias in, bias out.
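To make "bias in, bias out" concrete, the following minimal sketch shows how clustered sampling distorts even a simple sample mean, and how spatial declustering weights can correct it. It is an illustration under stated assumptions, not the paper's spatial weighted tree method, whose details the abstract does not give. The synthetic field, the sample layout, the cell size, and the helper names `cell_declustering_weights` and `weighted_mean` are all hypothetical; the weighting scheme is plain cell declustering, in which each sample is weighted by the inverse of the number of samples sharing its grid cell.

```python
from collections import Counter
from math import floor


def cell_declustering_weights(points, cell_size):
    """Return one weight per (x, y) point, inversely proportional to the number
    of points sharing its grid cell, scaled so the weights sum to len(points)."""
    cells = [(floor(x / cell_size), floor(y / cell_size)) for x, y in points]
    counts = Counter(cells)
    # Every occupied cell receives the same total weight.
    scale = len(points) / len(counts)
    return [scale / counts[c] for c in cells]


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Assumed synthetic truth: the property equals the x coordinate on a
# 100 x 100 area, so the representative (area-wide) mean is 50.
# Sparse regular coverage: 5 x 5 samples across the whole area.
sparse = [(x, y) for x in range(10, 100, 20) for y in range(10, 100, 20)]
# Preferential sampling: 10 x 10 samples packed into the high-value corner.
cluster = [(x, y) for x in range(81, 100, 2) for y in range(81, 100, 2)]

points = sparse + cluster
values = [float(x) for x, _ in points]

naive_mean = sum(values) / len(values)
weights = cell_declustering_weights(points, cell_size=20)
declustered_mean = weighted_mean(values, weights)

print(f"naive mean:       {naive_mean:.2f}")
print(f"declustered mean: {declustered_mean:.2f}")
```

In this constructed case the naive mean is 82 against a representative mean of 50, and the declustered mean recovers 50 because the 101 samples in the crowded cell together carry the same weight as any single isolated sample. Weights of this kind could in principle be supplied as per-sample weights when fitting a tree (scikit-learn's `DecisionTreeRegressor.fit`, for example, accepts a `sample_weight` argument), though whether that corresponds to the method proposed in the paper is an assumption.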