Uncertainty
Jasper Slingsby
Uncertainty
Uncertainty determines the utility of a forecast:
If the uncertainty in a forecast is too high, then it is of no utility to a decision maker.
If the uncertainty is not properly quantified and presented, it can lead to poor decision outcomes.
By extension:
- over-reporting uncertainty can exclude useful information
- under-reporting uncertainty creates false confidence in outcomes
Uncertainty
This leaves forecasters with four overarching questions:
- What determines the limits to the utility of predictions?
- What determines prediction uncertainty?
- How can we propagate uncertainty through our models and into our predictions?
- How can we reduce prediction uncertainty?
The utility of predictions
- What determines the limits to the utility of predictions?
The utility of a model / forecast depends on:
- the rate at which uncertainty grows into the future (i.e. loss of proficiency), and
- the limit at which the forecast performs no better than a historical baseline (i.e. the model adds no value).
Some forecasts may gain uncertainty and lose proficiency very quickly, crossing the forecast limit sooner.
Sources and types of uncertainty
- What determines prediction uncertainty?
Dietze (2017a, 2017b) expresses the sources of prediction uncertainty (see “Growth rate” in the previous figure) as an equation:
\[
\underbrace{Var[Y_{t+1}]}_\text{predictive variance} \approx
\underbrace{stability*uncertainty}_\text{initial conditions} +
\underbrace{sensitivity*uncertainty}_\text{drivers} +
\underbrace{sensitivity*(uncertainty+variability)}_\text{(parameters + random effects)} +
\underbrace{Var[\epsilon]}_\text{process error}
\]
Sources and types of uncertainty
If we break the terms down into (something near) English, we get:
The dependent variable:
\[Var[Y_{t+1}] \approx\]
“The uncertainty in the prediction for the variable of interest (\(Y\)) in the next time step (\(t+1\)) is approximately equal to…”
And now the independent variables (or terms in the model):
Sources of uncertainty: Initial cond.
\[\underbrace{stability*uncertainty}_\text{initial conditions} \; +\]
“The stability multiplied by the uncertainty in the initial conditions, plus”
- Initial conditions = the state of \(Y\) and associated parameters at time \(t_0\) (time of start).
- Stability = whether the variable has stabilizing feedbacks (think of alternate stable states), versus one that changes very quickly (or even tends rapidly towards chaos, as atmospheric conditions often do).
- Another example is populations of \(r\)- versus \(K\)-selected species (i.e. unstable: high growth rates and short-lived individuals that tend to boom and bust, versus stable: low growth rates and long-lived individuals with high survival)
- Uncertainty = the uncertainty in the state of \(Y\) and the parameters due to observation error.
- e.g. Since atmospheric properties are highly unstable, weather forecasters try to minimize uncertainty in the initial conditions by minimizing observation error.
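To make the stability idea concrete, here is a minimal Python sketch (all numbers invented) of how initial-condition uncertainty propagates under a simple linear model \(Y_{t+1} = aY_t\): a stabilizing system (\(|a| < 1\)) shrinks the initial observation error over time, while an unstable one (\(|a| > 1\)) amplifies it.

```python
# Hypothetical sketch: how initial-condition uncertainty grows or shrinks
# with stability, assuming a simple linear model Y[t+1] = a * Y[t].
import numpy as np

def ic_variance(a, var0, steps):
    """Variance of Y at each step, given initial-condition variance var0.

    For a linear model the variance propagates analytically:
    Var[Y_t] = a^(2t) * Var[Y_0]. |a| < 1 is stabilizing, |a| > 1 amplifying.
    """
    t = np.arange(steps + 1)
    return a ** (2 * t) * var0

var0 = 0.5  # made-up observation-error variance in the initial state
print("stable   (a=0.8):", np.round(ic_variance(0.8, var0, 5), 3))
print("unstable (a=1.3):", np.round(ic_variance(1.3, var0, 5), 3))
```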
Sources of uncertainty: Drivers
\[\underbrace{sensitivity * uncertainty}_\text{drivers} \; + \]
“The sensitivity to, multiplied by the uncertainty in, external drivers, plus”
- External drivers are just the independent variables (covariates) in the model.
- Predictability of \(Y\) depends on its sensitivity to each covariate (i.e. how much would \(Y\) change for a given change in the covariate), and uncertainty in those covariates.
- Worst scenario is if \(Y\) is highly sensitive and the covariates are highly uncertain.
- Note: since we’re forecasting, some covariates may not be observed, and their uncertainty often reflects how well we can forecast the covariates themselves (e.g. future climate). If we can’t predict \(X\), we can’t use it to predict \(Y\)… (e.g. the occurrence of fire in the postfire recovery state space model). That said, if \(Y\) is not very sensitive to \(X\), there’s less of a problem; if it is, this can sometimes be addressed by running forecasts for different scenarios.
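As a rough illustration of the sensitivity-times-uncertainty idea, this hypothetical sketch applies first-order (delta-method) error propagation to an invented two-driver model: each driver’s contribution to predictive variance is approximately its squared sensitivity times its variance.

```python
# Hypothetical sketch of the sensitivity * uncertainty term for drivers,
# using first-order (delta-method) error propagation on a toy model.
import numpy as np

def model(rainfall, temperature):
    # Toy response surface; both the form and the numbers are invented.
    return 2.0 * rainfall - 0.5 * temperature ** 2

def driver_contribution(f, x, var_x, i, eps=1e-4):
    """Approximate (dY/dX_i)^2 * Var[X_i] via a finite-difference sensitivity."""
    x_hi, x_lo = x.copy(), x.copy()
    x_hi[i] += eps
    x_lo[i] -= eps
    sens = (f(*x_hi) - f(*x_lo)) / (2 * eps)
    return sens ** 2 * var_x[i]

x = np.array([100.0, 20.0])    # mean rainfall and temperature (made up)
var_x = np.array([25.0, 1.0])  # driver uncertainties (made up)
for i, name in enumerate(["rainfall", "temperature"]):
    print(name, "contribution:", round(driver_contribution(model, x, var_x, i), 2))
```

Note how the temperature term dominates here despite its smaller uncertainty, because \(Y\) is more sensitive to it at these values.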
Sources of uncertainty: Parameters
\[\underbrace{sensitivity*(uncertainty+variability)}_\text{(parameters + random effects)} + \]
“The sensitivity to, multiplied by uncertainty and variability in, the parameters, plus”
- Parameter sensitivity is similar to driver sensitivity - “How much change do we expect in \(Y\) for a given change in the parameter?”
- Parameter uncertainty pertains to how good our estimates of the parameters are.
- This is usually a question of sample size - “Do we have enough data to obtain a good estimate (i.e. accurate mean, low uncertainty) of the parameters?”
- It is also linked to the number of parameters in the model. The more parameters, the more data you need to obtain good parameter estimates. This is another reason to avoid overly complex models.
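A small simulation makes the sample-size point: the spread of repeated parameter estimates (here just a mean, with made-up population values) shrinks roughly as \(1/\sqrt{n}\).

```python
# Hypothetical sketch: parameter uncertainty usually shrinks with sample size.
# The "parameter" here is just a mean estimated from simulated data.
import numpy as np

rng = np.random.default_rng(42)
true_mean, true_sd = 5.0, 2.0  # invented population values

for n in [10, 100, 1000]:
    draws = rng.normal(true_mean, true_sd, size=(2000, n))
    estimates = draws.mean(axis=1)  # one estimate per simulated dataset
    print(f"n = {n:>4}: sd of estimates = {estimates.std():.3f} "
          f"(theory: {true_sd / np.sqrt(n):.3f})")
```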
Sources of uncertainty: Parameters
\[\underbrace{sensitivity*(uncertainty+variability)}_\text{(parameters + random effects)} + \]
“The sensitivity to, multiplied by uncertainty and variability in, the parameters, plus”
- Parameter variability reflects factors that cause deviations (or offsets) from the mean of the parameter; these factors may be known, but are either poorly estimated or not included in the rest of the model.
- These are random effects that can be caused by factors that create autocorrelation like space, time, phylogeny, etc.
- The overall contribution of a single parameter to the predictive variance (i.e. uncertainty in the forecast) depends on its sensitivity multiplied by the sum of its uncertainty and variability.
- Targeting effort (fieldwork etc) to better constrain poorly estimated parameters is one of the best ways to reduce prediction uncertainty.
Sources of uncertainty: Process error
\[\underbrace{Var[\epsilon]}_\text{process error}\]
“The process error.”
- This refers to errors in the model due to structural uncertainty and stochasticity.
- Stochasticity refers to ecological phenomena of relevance that are very difficult to predict (at least within the context of the focal model).
- e.g. fire, dispersal or mortality - chance events like a coin toss.
- Model structural uncertainty simply reflects that all models are simplifications of reality and none are perfect. We’ll always be missing something.
- Includes “user error” such as specifying the wrong process model or probability distribution in the data model, etc.
- Using multiple models and employing model selection or averaging can help reduce structural uncertainty (or just specifying a better model of course…).
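As a toy illustration of the averaging idea (the models, skill scores, and weights below are all invented, not a recipe from any particular source), predictions from competing model structures can be combined using weights based on predictive skill:

```python
# Toy sketch of multi-model averaging: two invented model structures are
# weighted by made-up skill scores and combined into one ensemble prediction.
import numpy as np

x = np.linspace(0, 10, 6)
model_a = 2.0 * x                  # invented linear model
model_b = 1.5 * x + 0.05 * x ** 2  # invented alternative structure

# Hypothetical skill scores (e.g. from cross-validation), turned into weights.
scores = np.array([0.6, 0.4])
weights = scores / scores.sum()

ensemble = weights[0] * model_a + weights[1] * model_b
print(np.round(ensemble, 2))
```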
Propagating uncertainty
- “How can we propagate uncertainty through our models and into our predictions?”
There are many methods, but it’s worth recognizing that these are actually two steps:
- Propagating uncertainty through the model
- i.e. in fitting the model, so we can include uncertainty in our parameter estimates
- This is typically focused on “How does the uncertainty in X affect the uncertainty in Y?”
- Propagating uncertainty into our forecasts
- i.e. exploring the implications of uncertainty in our model (parameters etc) for our confidence in the forecast when making predictions with our fitted model
- Here we focus on “How do we forecast Y with uncertainty?”
- This second step is actually the first step in data assimilation, which is the subject of the next section/lecture
Propagating uncertainty
This could be a lecture series of its own. In short, there are 5 main methods for propagating uncertainty through the model, and most have related methods for propagation into the forecast (see Table on next slide).
The methods differ in whether they:
- Return distributions (e.g. a Gaussian curve) or moments (means, standard deviations, etc.)
- Have analytical solutions, or need to be approximated numerically
They also trade off efficiency against flexibility.
- The most efficient have the most rigid requirements and assumptions (analytical), while the most flexible (numeric approximations) can be computationally taxing (or impossible given a complex enough model).
Propagating uncertainty
Methods for propagating uncertainty through models (and into forecasts)
|            | Through the model  | Into the forecast      |
|------------|--------------------|------------------------|
| Analytical | Variable Transform |                        |
|            | Analytical Moments | Kalman Filter          |
|            | Taylor Series      | Extended Kalman Filter |
| Numerical  | Monte Carlo        | Particle Filter        |
|            | Ensemble           | Ensemble Kalman Filter |
(A little) more on these in tomorrow’s lecture.
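To give a feel for the Monte Carlo row of the table, here is a minimal sketch assuming an invented exponential-growth model: parameter uncertainty enters as samples (standing in for a posterior), and each sample is run forward to build a forecast distribution. This covers both steps at once: through the model (parameter samples) and into the forecast (forward simulation with process error).

```python
# Minimal Monte Carlo propagation sketch; model and numbers are invented.
import numpy as np

rng = np.random.default_rng(1)
n_samples, horizon = 5000, 10

# Stand-in "posterior" samples for growth rate r and process-error sd.
r = rng.normal(0.1, 0.02, n_samples)
sigma = np.abs(rng.normal(0.05, 0.01, n_samples))

y = np.full(n_samples, 1.0)  # known initial state, for simplicity
trajectories = np.empty((n_samples, horizon))
for t in range(horizon):
    # Exponential growth plus process error, one step per parameter sample.
    y = y * np.exp(r) + rng.normal(0.0, sigma)
    trajectories[:, t] = y

lo, med, hi = np.percentile(trajectories[:, -1], [2.5, 50, 97.5])
print(f"forecast at t={horizon}: median {med:.2f}, 95% interval [{lo:.2f}, {hi:.2f}]")
```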
Analyzing and reducing uncertainty
- “How can we reduce prediction uncertainty?”
Firstly, by working out where it’s coming from
- by analyzing and partitioning the sources of uncertainty
Secondly, by targeting and reducing sources of uncertainty
- ideally those that provide the best return on investment (important to note that these may not be the biggest sources of uncertainty, just the cheapest and easiest to resolve)
Analyzing and reducing uncertainty
Identifying the sources of uncertainty requires looking at the two ways in which an input can matter for the uncertainty in predictions (largely covered in the equation earlier):
- because they’re highly uncertain, which requires you to:
- propagate uncertainty through the model as above
- partition uncertainty among your different drivers (covariates) and parameters
- because they’re highly sensitive, requiring you to perform:
- sensitivity analysis
- You’ll probably cover these in more detail in Res’ module, so I’m not going to go into them. The focus is on how a change in X translates into a change in Y. The bigger the relative change in Y, the more sensitive.
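As a minimal illustration (not the full treatment you’ll get in Res’ module), here is a one-at-a-time sensitivity sketch: each parameter of an invented model is perturbed by 10% and the relative change in \(Y\) is reported.

```python
# Hypothetical one-at-a-time sensitivity analysis; model and values invented.
import numpy as np

def model(params):
    r, k, m = params["r"], params["k"], params["m"]
    return k * (1 - np.exp(-r * 5)) - m  # toy recovery-style curve

baseline = {"r": 0.3, "k": 100.0, "m": 5.0}
y0 = model(baseline)

for name in baseline:
    bumped = dict(baseline)
    bumped[name] *= 1.10  # +10% perturbation of one parameter at a time
    rel_change = (model(bumped) - y0) / y0
    print(f"+10% in {name}: {100 * rel_change:+.1f}% change in Y")
```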
Analyzing and reducing uncertainty
Targeting and reducing sources of uncertainty is not always straightforward.
Parameters that are highly uncertain, and to which our state variable (\(Y\)) is highly sensitive, cause the most uncertainty in predictions.
But, given limited resources, they may not be the best targets, for a number of reasons:
- they may be inherently uncertain and remain uncertain even with vast sampling effort
- power analysis can help (exploring how uncertainty changes with sample size)
- they may be hugely costly or time-consuming, trading off against resources you could focus on reducing other sources of uncertainty
In fact, you can build a model to predict where your effort is best invested by exploring the relationship between sample size and contribution to overall model uncertainty! You can even include economic principles to estimate monetary or person-hour implications. This is called observational design.
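A toy sketch of the observational-design idea, with entirely invented variances, sample sizes, and costs: assume each input’s variance contribution shrinks roughly as \(1/n\) (as for a simple mean), then rank inputs by the variance reduction bought per unit of budget.

```python
# Hypothetical observational-design sketch; all numbers are invented.
import numpy as np

# (current variance contribution, current n, cost per extra sample)
inputs = {
    "invasion density": (400.0, 50, 10.0),
    "rainfall":         (100.0, 200, 2.0),
    "flow curves":      (250.0, 8, 500.0),
}

budget_per_input = 1000.0  # spend the same hypothetical budget on each

for name, (var_now, n, cost) in inputs.items():
    n_new = n + budget_per_input / cost
    var_new = var_now * n / n_new  # assumes variance scales ~ 1/n
    print(f"{name:>16}: variance {var_now:.0f} -> {var_new:.0f} "
          f"(gain {var_now - var_new:.0f} per {budget_per_input:.0f} spent)")
```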
Invasive alien plants and streamflow
During the “Day Zero” drought the City of Cape Town scrambled for “alternative sources” of bulk water.
The options they explored (beyond demand management) included:
- Desalination
- Reclamation (i.e. purifying waste water)
- Groundwater (from the Cape Flats Sand Aquifer and the Table Mountain Group Aquifer)
Peer-reviewed research by Le Maitre et al. (2016) indicated that, as of 2008, invasive alien plants were estimated to be using around 5% of runoff:
- almost as much as Wemmershoek Dam, or ~80 days’ worth of water under restrictions
- other research showed that alien infestations had become much worse since 2008
When asked why they were not considering clearing invasive alien plants from the major mountain catchments, one of the excuses was [paraphrased] “because we don’t trust the estimates, they don’t provide any estimates of uncertainty”.
Invasive alien plants and streamflow
While we knew this was a load of crock (they didn’t have uncertainty estimates for any of the other options either), Moncrieff, Slingsby, and Le Maitre (2021) decided it’d be a good idea to explore this issue by:
- using a Bayesian framework to update and include uncertainty in the estimates of the volume and percent of streamflow lost to IAPs
- exploring the relative contribution of sources of uncertainty to overall uncertainty in streamflow losses, to guide efforts to improve future estimates
- providing a repeatable workflow to make it easy for anyone to query our methods or recalculate estimates as and when updated data become available
Invasive alien plants and streamflow
The impact of IAPs on streamflow is predominantly determined from streamflow reduction curves derived from long-term catchment experiments, whereby the proportional reduction in streamflow is expressed as a function of plantation age and/or density.
These data can take 40 years to collect, and the Jonkershoek catchment study has been running for >80 years (see Slingsby et al. 2021)!!!
Invasive alien plants and streamflow
[Figure: Streamflow reduction curves (relative to natural vegetation) for pine and eucalypt plantations under normal (suboptimal) or wet (optimal) conditions (from Moncrieff, Slingsby, and Le Maitre (2021)).]
These curves are then used to extrapolate spatially to MODIS pixels (250m) nested within catchments, informed by the naturalized runoff and IAP density.
Invasive alien plants and streamflow
Propagating uncertainty
We used only inputs that could be sampled with uncertainty (i.e. from probability distributions), including rainfall, streamflow reduction curves, fire history, soil moisture and invasion density.
We then propagated that uncertainty through to our streamflow reduction estimates using a Monte Carlo (MC) approach.
Invasive alien plants and streamflow
- For each model run we:
- Assigned species to a curve (optimal or sub-optimal, Eucalypt or Pine)
- Sampled vegetation age from the distribution of fire return time
- Estimated streamflow reduction for every species by sampling from the posterior of the curves and age
- Determined additional water usage in riparian or groundwater zones
- Within each run, for each catchment we sampled the density of each IAP species
Invasive alien plants and streamflow
- Within each catchment, for each pixel we:
- Estimated pixel-level naturalized runoff\(^1\) by sampling precipitation and converting it to runoff
- Summed naturalized runoff across all pixels within each quaternary catchment and rescaled it to match estimates from Bailey and Pitman (2015)
- Determined whether IAPs were in riparian or groundwater zones
- Calculated runoff lost by multiplying potential runoff by the proportional streamflow reduction for each IAP, and summing across species
\(^1\)i.e. the expected runoff without IAPs
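The nested structure above can be sketched as a toy Monte Carlo. Every distribution, constant, and simplification here is invented for illustration only; the real model is in Moncrieff, Slingsby, and Le Maitre (2021).

```python
# Highly simplified sketch of the nested Monte Carlo structure
# (runs -> catchments -> pixels). All distributions and constants invented.
import numpy as np

rng = np.random.default_rng(7)
n_runs, n_catchments, n_pixels = 200, 3, 50

losses = np.zeros((n_runs, n_catchments))
for run in range(n_runs):
    # Per-run draws: curve parameters and vegetation age.
    curve_max = rng.uniform(0.3, 0.6)  # asymptotic flow reduction
    age = rng.exponential(12.0)        # years since fire
    reduction = curve_max * (1 - np.exp(-0.1 * age))
    for c in range(n_catchments):
        density = rng.beta(2, 5)       # per-catchment IAP density
        # Per-pixel runoff from sampled precipitation.
        precip = rng.gamma(4.0, 150.0, n_pixels)
        runoff = 0.3 * precip          # toy rainfall-to-runoff conversion
        losses[run, c] = (runoff * density * reduction).sum()

print("posterior-style loss summaries per catchment:")
print(np.percentile(losses, [2.5, 50, 97.5], axis=0).round(0))
```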
Invasive alien plants and streamflow
[Figure: Posterior probability distributions showing uncertainty in the impacts of IAPs on streamflow in the catchments feeding Cape Town’s major dams (from Moncrieff, Slingsby, and Le Maitre (2021)).]
Invasive alien plants and streamflow
This provided estimates of the impacts of IAPs on streamflow for all catchments in the Cape Floristic Region as posterior probability distributions - i.e. with uncertainty.
The posterior mean estimated streamflow loss in catchments surrounding Cape Town’s major dams was 25.5 million m\(^3\) per annum (range: 20.3 to 43.4).
Given target water use of 0.45 million m\(^3\) per day at the height of the drought, this is between 45 and 97 days of water supply!!!
This was still using the 2008 estimates of IAP invasions…
Invasive alien plants and streamflow
We did additional analyses to partition the relative uncertainty among the various potential sources, by running the model with uncertainty for only the focal variable and setting the uncertainty in all other inputs to zero.
From this it’s clear that the data we need most are better estimates of the extent and density of invasions!!!
- Fortunately, this is easier than 40-year catchment experiments!
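A minimal sketch of that partitioning trick, assuming a toy streamflow-loss model with invented means and spreads: uncertainty is “switched on” for one input at a time while all others are held at their means, and the resulting variance is compared.

```python
# Sketch of uncertainty partitioning: rerun the Monte Carlo with uncertainty
# in only one focal input at a time. Toy model; all values invented.
import numpy as np

rng = np.random.default_rng(3)
means = {"density": 0.3, "rainfall": 600.0, "curve": 0.4}
sds = {"density": 0.15, "rainfall": 60.0, "curve": 0.05}

def streamflow_loss(density, rainfall, curve):
    return rainfall * 0.3 * density * curve  # toy model

n = 10000
for focal in means:
    draws = {k: (rng.normal(means[k], sds[k], n) if k == focal
                 else np.full(n, means[k])) for k in means}
    var = streamflow_loss(**draws).var()
    print(f"uncertainty from {focal:>8} alone: variance = {var:.1f}")
```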
Invasive alien plants and streamflow
Our estimates are very similar to Le Maitre et al. (2016), albeit slightly higher for low density invasions and lower for high density invasions. Either way, the losses are huge and likely to have been much worse during the “Day Zero” drought!
The difference may be a result of Jensen’s Inequality. This takes many forms, but here indicates that the mean of a nonlinear function is not equal to the function evaluated at the mean of its inputs…
- i.e. Running a non-linear model under the mean parameter set will not produce the mean outcome…
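A quick numerical check of Jensen’s Inequality, using an arbitrary convex function and invented values, shows exactly this gap between the mean outcome and the outcome at the mean:

```python
# Jensen's Inequality demo: mean of f(X) differs from f(mean of X)
# for a nonlinear f. Function and distribution are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(10.0, 3.0, 100_000)  # uncertain input

f = lambda v: v ** 2                # any nonlinear function will do
print("f(mean(x)) =", round(f(x.mean()), 1))  # ~100
print("mean(f(x)) =", round(f(x).mean(), 1))  # ~109 (= 100 + Var[x])
```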