A Data-Analytics Tutorial: Building Predictive Models for Oil Production in an Unconventional Shale Reservoir
- Jared Schuetter (Battelle Memorial Institute) | Srikanta Mishra (Battelle Memorial Institute) | Ming Zhong (Baker Hughes) | Randy LaFollette (Baker Hughes (retired))
- Publisher: Society of Petroleum Engineers
- Journal: SPE Journal
- Publication Date: August 2018
- Document Type: Journal Paper
- Pages: 1,075–1,089
- Copyright: 2018, Society of Petroleum Engineers
- Keywords: analytics, production data, machine learning, unconventional reservoirs, data mining
Considerable amounts of data are generated during the development and operation of unconventional reservoirs, and statistical methods that can provide data-driven insights into production performance are gaining popularity. Unfortunately, the application of advanced statistical algorithms remains something of a mystery to petroleum engineers and geoscientists. The objective of this paper is to provide some clarity on this issue, focusing on how to build robust predictive models and how to develop decision rules that help identify the factors separating good wells from poor performers. The data for this study come from wells completed in the Wolfcamp Shale Formation in the Permian Basin. The data categories used include well location and assorted metrics capturing various aspects of well architecture, well completion, stimulation, and production.
Predictive models for the production metric of interest are built using simple regression and more advanced methods such as random forests (RFs), support-vector regression (SVR), gradient-boosting machine (GBM), and multidimensional Kriging. The data-fitting process involves splitting the data into a training set and a test set, building a regression model on the training set, and validating it with the test set. Repeated application of this “cross-validation” procedure yields valuable information about the robustness of each regression-modeling approach. Furthermore, decision rules that can identify extreme behavior in production wells (i.e., the top x% of the wells vs. the bottom x%, as ranked by the production metric) are generated using the classification-and-regression-tree (CART) algorithm. The resulting decision tree (DT) provides useful insights into which variables (or combinations of variables) can drive production performance into such extreme categories.
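The split-fit-validate loop described above can be sketched in a few lines. The following is a minimal illustration only, not the authors' workflow: it assumes repeated random train/test splits as the resampling scheme, uses a one-variable least-squares regression in place of the advanced methods, and runs on synthetic data. The names `lateral_length` and `production` and all numeric values are hypothetical.

```python
import random
import statistics

def fit_simple_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)) ** 0.5

def repeated_holdout(xs, ys, n_repeats=50, test_fraction=0.25, seed=0):
    """Per-repeat test RMSE for the regression and for a mean-only baseline."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    n_test = max(1, int(len(xs) * test_fraction))
    model_scores, baseline_scores = [], []
    for _ in range(n_repeats):
        rng.shuffle(indices)  # a fresh random split on every repeat
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        test_x = [xs[i] for i in test_idx]
        test_y = [ys[i] for i in test_idx]
        # Fit on the training set only, then score on the held-out test set.
        intercept, slope = fit_simple_regression(train_x, train_y)
        model_scores.append(rmse(test_y, [intercept + slope * x for x in test_x]))
        baseline = statistics.fmean(train_y)
        baseline_scores.append(rmse(test_y, [baseline] * len(test_y)))
    return model_scores, baseline_scores

# Synthetic stand-in data (hypothetical): production rises with lateral length plus noise.
data_rng = random.Random(42)
lateral_length = [data_rng.uniform(4000, 10000) for _ in range(200)]
production = [20 + 0.01 * x + data_rng.gauss(0, 10) for x in lateral_length]

model_rmse, baseline_rmse = repeated_holdout(lateral_length, production)
print(f"regression: mean RMSE {statistics.fmean(model_rmse):.1f}, "
      f"spread {statistics.stdev(model_rmse):.1f}")
print(f"baseline:   mean RMSE {statistics.fmean(baseline_rmse):.1f}, "
      f"spread {statistics.stdev(baseline_rmse):.1f}")
```

The spread of the test error across repeats is what speaks to robustness: a method whose held-out error varies widely from split to split is less trustworthy than one with a similar mean and a tighter spread.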
The main contributions of this paper are guidelines on how to build robust predictive models and a demonstration of the utility of DTs for identifying the factors responsible for good vs. poor wells.
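To make the good-vs.-poor-well idea concrete, here is a rough sketch of a small classification tree grown on the top and bottom quartiles of wells ranked by a production metric. It is a simplified, CART-style illustration under stated assumptions rather than the paper's implementation: splits minimize Gini impurity, depth is capped at two, there is no pruning, and the data are synthetic. The feature names `proppant_lb_per_ft` and `stage_count` and the 25% cutoff are hypothetical choices.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(rows, labels):
    """Try every feature and midpoint threshold; keep the lowest weighted Gini."""
    best = None  # (score, feature, threshold)
    for feature in rows[0]:
        values = sorted({row[feature] for row in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best

def grow_tree(rows, labels, depth=0, max_depth=2):
    """Return nested dict nodes; leaves hold the fraction of good wells."""
    can_split = depth < max_depth and len(set(labels)) > 1
    split = best_split(rows, labels) if can_split else None
    if split is None:
        return {"good_fraction": sum(labels) / len(labels), "n": len(labels)}
    _, feature, threshold = split
    left = [i for i, r in enumerate(rows) if r[feature] <= threshold]
    right = [i for i, r in enumerate(rows) if r[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": grow_tree([rows[i] for i in left], [labels[i] for i in left],
                          depth + 1, max_depth),
        "right": grow_tree([rows[i] for i in right], [labels[i] for i in right],
                           depth + 1, max_depth),
    }

def print_rules(node, indent=""):
    """Print the tree as nested if/else decision rules."""
    if "feature" not in node:
        print(f"{indent}-> {node['good_fraction']:.0%} good wells (n={node['n']})")
        return
    print(f"{indent}if {node['feature']} <= {node['threshold']:.1f}:")
    print_rules(node["left"], indent + "    ")
    print(f"{indent}else:")
    print_rules(node["right"], indent + "    ")

# Synthetic wells (hypothetical): production depends on proppant loading only.
rng = random.Random(7)
wells = [{"proppant_lb_per_ft": rng.uniform(500, 2500),
          "stage_count": rng.randint(10, 40)} for _ in range(200)]
production = [0.05 * w["proppant_lb_per_ft"] + rng.gauss(0, 10) for w in wells]

# Keep only the extremes: bottom 25% labeled 0 (poor), top 25% labeled 1 (good).
order = sorted(range(len(wells)), key=lambda i: production[i])
k = len(wells) // 4
rows = [wells[i] for i in order[:k] + order[-k:]]
labels = [0] * k + [1] * k

tree = grow_tree(rows, labels)
print_rules(tree)
```

Discarding the middle of the ranking before fitting sharpens the contrast between classes, so the first few splits tend to surface the variables most associated with extreme performance. On real data, tree depth and leaf size would need tuning to avoid rules that only fit noise.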