The oil and gas industry is awash in disparate data collated across multiple siloed geoscientific disciplines. Moreover, the volume of data is growing exponentially as digital oilfields are implemented, in one form or another, to manage conventional and unconventional assets. Performing exploratory data analysis and generating data marts tailored to specific advanced analytical workflows are the cornerstones for developing and deploying data-driven predictive models, both in real time and across historical data sets.
Building data-driven models that predict under uncertainty is essential to rapidly identify influential parameters in a multivariate environment and thus surface hidden patterns and relationships in the data, which in turn reduce the time and resources consumed in critical decision-making cycles. With improved workflows and advances in High Performance Computing, it is now possible to ascertain risk and quantify uncertainty across very large populations of data without resorting to sampling, and without losing the knowledge garnered by predictive models driven by the data rather than by empirical petroleum engineering algorithms or deterministic methodologies. By marrying the stochastic with the interpretive school of thought, the upstream community can maintain robust data-driven models that are kept current as new data are introduced.
This paper draws upon two case studies that illustrate the path from raw data to actionable knowledge. We examine a suite of predictive models, driven by real-time data, that were built upon patterns surfaced in historical data. These models have been implemented to identify optimized drilling and production strategies in the North American tight gas plays and acid stimulation strategies in the Gulf of Mexico.
Geoscientists today are commonly reactive rather than proactive when deliberating optimized remediation strategies to preclude wellbore impairment in the unconventional reservoirs of the United States. This ineffective practice is driven primarily by an explosion in drilling and production data that are not modeled sufficiently well, owing to a lack of structured analytical methodologies for surfacing knowledge from the raw data. As a result of poor decision-making and inept exploitation plans and tactics, valuable production is lost while optimized remediation processes await enactment. Viable remediations must be ranked on the basis of historical post-treatment production rates, with a statistical process determining the optimal remediation, be it a hydraulic fracture package or an acid stimulation; a minimal sketch of such a ranking follows. It is thus plausible to optimize well performance by modeling the drivers and leading indicators of production.
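The sketch below illustrates one way such a statistical ranking could be performed; it is not the authors' implementation. The well identifiers, treatment labels, column names and rates are hypothetical, and the choice of a Welch t-test on post-treatment rate uplift is an assumption standing in for the unspecified statistical process.

```python
import pandas as pd
from scipy import stats

# Hypothetical well treatment history; all values and column names are illustrative.
history = pd.DataFrame({
    "well_id":        ["W1", "W2", "W3", "W4", "W5", "W6"],
    "remediation":    ["acid", "acid", "hydraulic", "hydraulic", "acid", "hydraulic"],
    "rate_pre_mcfd":  [410.0, 380.0, 450.0, 395.0, 420.0, 405.0],
    "rate_post_mcfd": [520.0, 505.0, 640.0, 575.0, 530.0, 610.0],
})

# Production uplift attributable to each treatment.
history["uplift"] = history["rate_post_mcfd"] - history["rate_pre_mcfd"]

# Rank remediation types by mean post-treatment uplift.
ranking = (history.groupby("remediation")["uplift"]
                  .agg(["mean", "std", "count"])
                  .sort_values("mean", ascending=False))
print(ranking)

# Simple two-sample test of whether the uplift differs between treatment types.
acid = history.loc[history["remediation"] == "acid", "uplift"]
frac = history.loc[history["remediation"] == "hydraulic", "uplift"]
t_stat, p_value = stats.ttest_ind(acid, frac, equal_var=False)
print(f"Welch t-test: t = {t_stat:.2f}, p = {p_value:.3f}")
```

In practice the comparison would be conditioned on reservoir and completion attributes so that like wells are compared with like, rather than pooling all wells as this toy example does.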
The first step of any analytical workflow is data cleansing, as seen in Figure 1, which depicts the SEMMA process detailing the progression of analytical steps: Sample, Explore, Modify, Model and Assess. The core steps follow a robust and logical path in which data are accessed and controlled (Sample), then explored with statistical methods and exploratory data visualization techniques to determine hypotheses worth modeling (Explore). Where missing or unreliable data are present, techniques for normalization, filtering and imputation are applied (Modify) before proceeding to a suite of candidate model configurations (Model). Finally, the model deployment and usage phases are controlled to deliver optimum results (Assess).
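To make the progression concrete, the following is a minimal sketch of a SEMMA-like pipeline using open-source Python tooling rather than the SAS software the SEMMA acronym originates from. The synthetic well attributes (porosity, net pay, fracture stages, initial rate), the median imputation, the standardization and the random forest model are all illustrative assumptions, not the workflow used in the case studies.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# --- Sample: a hypothetical well data set; feature names are illustrative only.
rng = np.random.default_rng(0)
n = 500
data = pd.DataFrame({
    "porosity":    rng.normal(0.12, 0.03, n),
    "net_pay_ft":  rng.normal(80, 20, n),
    "frac_stages": rng.integers(5, 30, n).astype(float),
})
data["initial_rate"] = (2000 * data["porosity"]
                        + 3 * data["net_pay_ft"]
                        + 15 * data["frac_stages"]
                        + rng.normal(0, 25, n))
# Inject missing values so the Modify step has something to repair.
data.loc[rng.choice(n, 40, replace=False), "net_pay_ft"] = np.nan

# --- Explore: summary statistics and correlations to surface hypotheses worth modeling.
print(data.describe())
print(data.corr(numeric_only=True)["initial_rate"])

# --- Modify + Model: impute, normalize, and fit a candidate model.
X = data.drop(columns="initial_rate")
y = data["initial_rate"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),        # Modify
    ("scale",  StandardScaler()),                        # Modify
    ("model",  RandomForestRegressor(random_state=0)),   # Model
])
pipeline.fit(X_train, y_train)

# --- Assess: hold-out error checked before the model is deployed.
mae = mean_absolute_error(y_test, pipeline.predict(X_test))
print(f"Hold-out MAE: {mae:.1f}")
```

Keeping the Modify and Model steps inside a single pipeline means the same imputation and scaling are re-applied automatically as new real-time data arrive, which is what keeps a deployed model current without manual intervention.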