Uncertainty
Jasper Slingsby
Uncertainty
Uncertainty determines the utility of a forecast:
If the uncertainty in a forecast is too high, then it is of no utility to a decision maker.
If the uncertainty is not properly quantified and presented, it can lead to poor decision outcomes.
By extension:
- over-reporting uncertainty can exclude useful information
- under-reporting uncertainty creates false confidence in outcomes
Uncertainty
This leaves forecasters with four overarching questions:
- What determines the limits to the utility of predictions?
- What determines prediction uncertainty?
- How can we propagate uncertainty through our models and into our predictions?
- How can we reduce prediction uncertainty?
The utility of predictions
- What determines the limits to the utility of predictions?
The utility of a model / forecast depends on:
- the rate at which uncertainty grows into the future (i.e. loss of proficiency), and
- the limit at which the forecast performs no better than a historical baseline (i.e. the model adds no value).
Some forecasts may gain uncertainty and lose proficiency very quickly, crossing the forecast limit sooner.
Sources and types of uncertainty
- What determines prediction uncertainty?
Dietze (2017a, 2017b) expresses the sources of prediction uncertainty (see “Growth rate” in the previous figure) as an equation:
\[
\underbrace{Var[Y_{t+1}]}_\text{predictive variance} \approx
\underbrace{stability*uncertainty}_\text{initial conditions} +
\underbrace{sensitivity*uncertainty}_\text{drivers} +
\underbrace{sensitivity*(uncertainty+variability)}_\text{(parameters + random effects)} +
\underbrace{Var[\epsilon]}_\text{process error}
\]
Sources and types of uncertainty
If we break the terms down into (something near) English, we get:
The dependent variable:
\[Var[Y_{t+1}] \approx\]
“The uncertainty in the prediction for the variable of interest (\(Y\)) in the next time step (\(t+1\)) is approximately equal to…”
And now the independent variables (or terms in the model):
Sources of uncertainty: Initial cond.
\[\underbrace{stability*uncertainty}_\text{initial conditions} \; +\]
“The stability multiplied by the uncertainty in the initial conditions, plus”
- Initial conditions = the state of \(Y\) and associated parameters at time \(t_0\) (time of start).
- Stability = whether the variable has stabilizing feedbacks (think of alternate stable states), versus one that changes very quickly (or even tends rapidly towards chaos, as atmospheric conditions often do).
- Another example is populations of \(r\)- versus \(K\)-selected species (i.e. unstable: high growth rates and short-lived individuals that tend to boom and bust, versus stable: low growth rates and long-lived individuals with high survival)
- Uncertainty = the uncertainty in the state of \(Y\) and the parameters due to observation error.
- e.g. Since atmospheric properties are highly unstable, weather forecasters try to minimize uncertainty in the initial conditions by minimizing observation error.
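To make the stability idea concrete, here is a minimal Python sketch (all numbers invented) of how initial-condition uncertainty propagates under a simple linear model \(Y_{t+1} = aY_t\): a stabilizing system (\(|a| < 1\)) shrinks the initial observation error over time, while an unstable one (\(|a| > 1\)) amplifies it.

```python
# Hypothetical sketch: how initial-condition uncertainty grows or shrinks
# with stability, assuming a simple linear model Y[t+1] = a * Y[t].
import numpy as np

def ic_variance(a, var0, steps):
    """Variance of Y at each step, given initial-condition variance var0.

    For a linear model the variance propagates analytically:
    Var[Y_t] = a^(2t) * Var[Y_0]. |a| < 1 is stabilizing, |a| > 1 amplifying.
    """
    t = np.arange(steps + 1)
    return a ** (2 * t) * var0

var0 = 0.5  # made-up observation-error variance in the initial state
print("stable   (a=0.8):", np.round(ic_variance(0.8, var0, 5), 3))
print("unstable (a=1.3):", np.round(ic_variance(1.3, var0, 5), 3))
```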
Sources of uncertainty: Drivers
\[\underbrace{sensitivity * uncertainty}_\text{drivers} \; + \]
“The sensitivity to, multiplied by the uncertainty in, external drivers, plus”
- External drivers are just the independent variables (covariates) in the model.
- Predictability of \(Y\) depends on its sensitivity to each covariate (i.e. how much would \(Y\) change for a given change in the covariate), and uncertainty in those covariates.
- Worst scenario is if \(Y\) is highly sensitive and the covariates are highly uncertain.
- Note: since we’re forecasting, some covariates may not be observed, and their uncertainty often reflects how well we can forecast the covariates themselves (e.g. future climate). If we can’t predict \(X\), we can’t use it to predict \(Y\)… (e.g. the occurrence of fire in the postfire recovery state space model). That said, if \(Y\) is not very sensitive to \(X\), there’s less of a problem; if it is, this can sometimes be addressed by running forecasts for different scenarios.
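As a rough illustration of the sensitivity-times-uncertainty idea, this hypothetical sketch applies first-order (delta-method) error propagation to an invented two-driver model: each driver’s contribution to predictive variance is approximately its squared sensitivity times its variance.

```python
# Hypothetical sketch of the sensitivity * uncertainty term for drivers,
# using first-order (delta-method) error propagation on a toy model.
import numpy as np

def model(rainfall, temperature):
    # Toy response surface; both the form and the numbers are invented.
    return 2.0 * rainfall - 0.5 * temperature ** 2

def driver_contribution(f, x, var_x, i, eps=1e-4):
    """Approximate (dY/dX_i)^2 * Var[X_i] via a finite-difference sensitivity."""
    x_hi, x_lo = x.copy(), x.copy()
    x_hi[i] += eps
    x_lo[i] -= eps
    sens = (f(*x_hi) - f(*x_lo)) / (2 * eps)
    return sens ** 2 * var_x[i]

x = np.array([100.0, 20.0])    # mean rainfall and temperature (made up)
var_x = np.array([25.0, 1.0])  # driver uncertainties (made up)
for i, name in enumerate(["rainfall", "temperature"]):
    print(name, "contribution:", round(driver_contribution(model, x, var_x, i), 2))
```

Note how the temperature term dominates here despite its smaller uncertainty, because \(Y\) is more sensitive to it at these values.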
Sources of uncertainty: Parameters
\[\underbrace{sensitivity*(uncertainty+variability)}_\text{(parameters + random effects)} + \]
“The sensitivity to, multiplied by uncertainty and variability in, the parameters, plus”
- Parameter sensitivity is similar to driver sensitivity - “How much change do we expect in \(Y\) for a given change in the parameter?”
- Parameter uncertainty pertains to how good our estimates of the parameters are.
- This is usually a question of sample size - “Do we have enough data to obtain a good estimate (i.e. accurate mean, low uncertainty) of the parameters?”
- It is also linked to the number of parameters in the model. The more parameters, the more data you need to obtain good parameter estimates. This is another reason to avoid overly complex models.
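A small simulation makes the sample-size point: the spread of repeated parameter estimates (here just a mean, with made-up population values) shrinks roughly as \(1/\sqrt{n}\).

```python
# Hypothetical sketch: parameter uncertainty usually shrinks with sample size.
# The "parameter" here is just a mean estimated from simulated data.
import numpy as np

rng = np.random.default_rng(42)
true_mean, true_sd = 5.0, 2.0  # invented population values

for n in [10, 100, 1000]:
    draws = rng.normal(true_mean, true_sd, size=(2000, n))
    estimates = draws.mean(axis=1)  # one estimate per simulated dataset
    print(f"n = {n:>4}: sd of estimates = {estimates.std():.3f} "
          f"(theory: {true_sd / np.sqrt(n):.3f})")
```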
Sources of uncertainty: Parameters
\[\underbrace{sensitivity*(uncertainty+variability)}_\text{(parameters + random effects)} + \]
“The sensitivity to, multiplied by uncertainty and variability in, the parameters, plus”
- Parameter variability reflects factors that cause deviations (or offsets) from the mean of the parameter; these factors may be known, but are either poorly estimated or not included in the rest of the model.
- These are random effects that can be caused by factors that create autocorrelation like space, time, phylogeny, etc.
- The overall contribution of a single parameter to the predictive variance (i.e. uncertainty in the forecast) depends on its sensitivity multiplied by the sum of its uncertainty and variability.
- Targeting effort (fieldwork etc) to better constrain poorly estimated parameters is one of the best ways to reduce prediction uncertainty.
Sources of uncertainty: Process error
\[\underbrace{Var[\epsilon]}_\text{process error}\]
“The process error.”
- This refers to errors in the model due to structural uncertainty and stochasticity.
- Stochasticity refers to ecological phenomena of relevance that are very difficult to predict (at least within the context of the focal model).
- e.g. fire, dispersal or mortality - chance events like a coin toss.
- Model structural uncertainty simply reflects that all models are simplifications of reality and none are perfect. We’ll always be missing something.
- Includes “user error” such as specifying the wrong process model or probability distribution in the data model, etc.
- Using multiple models and employing model selection or averaging can help reduce structural uncertainty (or just specifying a better model of course…).
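As a toy illustration of the averaging idea (the models, skill scores, and weights below are all invented, not a recipe from any particular source), predictions from competing model structures can be combined using weights based on predictive skill:

```python
# Toy sketch of multi-model averaging: two invented model structures are
# weighted by made-up skill scores and combined into one ensemble prediction.
import numpy as np

x = np.linspace(0, 10, 6)
model_a = 2.0 * x                  # invented linear model
model_b = 1.5 * x + 0.05 * x ** 2  # invented alternative structure

# Hypothetical skill scores (e.g. from cross-validation), turned into weights.
scores = np.array([0.6, 0.4])
weights = scores / scores.sum()

ensemble = weights[0] * model_a + weights[1] * model_b
print(np.round(ensemble, 2))
```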
Propagating uncertainty
- “How can we propagate uncertainty through our models and into our predictions?”
There are many methods, but it’s worth recognizing that these are actually two steps:
- Propagating uncertainty through the model
- i.e. in fitting the model, so we can include uncertainty in our parameter estimates
- This is typically focused on “How does the uncertainty in X affect the uncertainty in Y?”
- Propagating uncertainty into our forecasts
- i.e. exploring the implications of uncertainty in our model (parameters etc) for our confidence in the forecast when making predictions with our fitted model
- Here we focus on “How do we forecast Y with uncertainty?”
- This second step is actually the first step in data assimilation, which is the subject of the next section/lecture
Propagating uncertainty
This could be a lecture series of its own. In short, there are 5 main methods for propagating uncertainty through the model, and most have related methods for propagation into the forecast (see Table on next slide).
The methods differ in whether they:
- Return distributions (e.g. a Gaussian curve) or moments (means, standard deviations, etc.)
- Have analytical solutions, or need to be approximated numerically
They also trade off efficiency against flexibility.
- The most efficient have the most rigid requirements and assumptions (analytical), while the most flexible (numeric approximations) can be computationally taxing (or impossible given a complex enough model).
Propagating uncertainty
Methods for propagating uncertainty through models (and into forecasts)
|            | Through the model  | Into the forecast      |
|------------|--------------------|------------------------|
| Analytical | Variable Transform |                        |
|            | Analytical Moments | Kalman Filter          |
|            | Taylor Series      | Extended Kalman Filter |
| Numerical  | Monte Carlo        | Particle Filter        |
|            | Ensemble           | Ensemble Kalman Filter |
(A little) more on these in tomorrow’s lecture.
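To give a feel for the Monte Carlo row of the table, here is a minimal sketch assuming an invented exponential-growth model: parameter uncertainty enters as samples (standing in for a posterior), and each sample is run forward to build a forecast distribution. This covers both steps at once: through the model (parameter samples) and into the forecast (forward simulation with process error).

```python
# Minimal Monte Carlo propagation sketch; model and numbers are invented.
import numpy as np

rng = np.random.default_rng(1)
n_samples, horizon = 5000, 10

# Stand-in "posterior" samples for growth rate r and process-error sd.
r = rng.normal(0.1, 0.02, n_samples)
sigma = np.abs(rng.normal(0.05, 0.01, n_samples))

y = np.full(n_samples, 1.0)  # known initial state, for simplicity
trajectories = np.empty((n_samples, horizon))
for t in range(horizon):
    # Exponential growth plus process error, one step per parameter sample.
    y = y * np.exp(r) + rng.normal(0.0, sigma)
    trajectories[:, t] = y

lo, med, hi = np.percentile(trajectories[:, -1], [2.5, 50, 97.5])
print(f"forecast at t={horizon}: median {med:.2f}, 95% interval [{lo:.2f}, {hi:.2f}]")
```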
Analyzing and reducing uncertainty
- “How can we reduce prediction uncertainty?”
Firstly, by working out where it’s coming from
- by analyzing and partitioning the sources of uncertainty
Secondly, by targeting and reducing sources of uncertainty
- ideally those that provide the best return on investment (important to note that these may not be the biggest sources of uncertainty, just the cheapest and easiest to resolve)
Analyzing and reducing uncertainty
Identifying the sources of uncertainty requires looking at the two ways in which an input can matter for the uncertainty in predictions (largely covered in the equation earlier):
- because they’re highly uncertain, which requires you to:
- propagate uncertainty through the model as above
- partition uncertainty among your different drivers (covariates) and parameters
- because they’re highly sensitive, requiring you to perform:
- sensitivity analysis
- You’ll probably cover these in more detail in Res’ module, so I’m not going to go into them. The focus is on how a change in X translates into a change in Y. The bigger the relative change in Y, the more sensitive.
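As a minimal illustration (not the full treatment you’ll get in Res’ module), here is a one-at-a-time sensitivity sketch: each parameter of an invented model is perturbed by 10% and the relative change in \(Y\) is reported.

```python
# Hypothetical one-at-a-time sensitivity analysis; model and values invented.
import numpy as np

def model(params):
    r, k, m = params["r"], params["k"], params["m"]
    return k * (1 - np.exp(-r * 5)) - m  # toy recovery-style curve

baseline = {"r": 0.3, "k": 100.0, "m": 5.0}
y0 = model(baseline)

for name in baseline:
    bumped = dict(baseline)
    bumped[name] *= 1.10  # +10% perturbation of one parameter at a time
    rel_change = (model(bumped) - y0) / y0
    print(f"+10% in {name}: {100 * rel_change:+.1f}% change in Y")
```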
Analyzing and reducing uncertainty
Targeting and reducing sources of uncertainty is not always straightforward.
Parameters that are highly uncertain, and to which our state variable (\(Y\)) is highly sensitive, cause the most uncertainty in predictions.
But, given limited resources, they may not be the best targets, for a number of reasons:
- they may be inherently uncertain and remain uncertain even with vast sampling effort
- power analysis can help (exploring how uncertainty changes with sample size)
- they may be hugely costly or time-consuming, trading off against resources you could focus on reducing other sources of uncertainty
In fact, you can build a model to predict where your effort is best invested by exploring the relationship between sample size and contribution to overall model uncertainty! You can even include economic principles to estimate monetary or person-hour implications. This is called observational design.
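A toy sketch of the observational-design idea, with entirely invented variances, sample sizes, and costs: assume each input’s variance contribution shrinks roughly as \(1/n\) (as for a simple mean), then rank inputs by the variance reduction bought per unit of budget.

```python
# Hypothetical observational-design sketch; all numbers are invented.
import numpy as np

# (current variance contribution, current n, cost per extra sample)
inputs = {
    "invasion density": (400.0, 50, 10.0),
    "rainfall":         (100.0, 200, 2.0),
    "flow curves":      (250.0, 8, 500.0),
}

budget_per_input = 1000.0  # spend the same hypothetical budget on each

for name, (var_now, n, cost) in inputs.items():
    n_new = n + budget_per_input / cost
    var_new = var_now * n / n_new  # assumes variance scales ~ 1/n
    print(f"{name:>16}: variance {var_now:.0f} -> {var_new:.0f} "
          f"(gain {var_now - var_new:.0f} per {budget_per_input:.0f} spent)")
```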
Invasive alien plants and streamflow
During the “Day Zero” drought the City of Cape Town scrambled for “alternative sources” of bulk water.
The options they explored (beyond demand management) included:
- Desalination
- Reclamation (i.e. purifying waste water)
- Groundwater (from the Cape Flats Sand Aquifer and the Table Mountain Group Aquifer)
Peer-reviewed research by Le Maitre et al. (2016) indicated that, as of 2008, invasive alien plants were estimated to be using around 5% of runoff:
- almost as much as Wemmershoek Dam, or ~80 days’ worth of water under restrictions
- other research showed that alien infestations had become much worse since 2008
When asked why they were not considering clearing invasive alien plants from the major mountain catchments, one of the excuses was [paraphrased] “because we don’t trust the estimates, they don’t provide any estimates of uncertainty”.
Invasive alien plants and streamflow
While we knew this was a load of crock (they didn’t have uncertainty estimates for any of the other options either), Moncrieff, Slingsby, and Le Maitre (2021) decided it’d be a good idea to explore this issue by:
- using a Bayesian framework to update and include uncertainty in the estimates of the volume and percent of streamflow lost to IAPs
- exploring the relative contribution of sources of uncertainty to overall uncertainty in streamflow losses, to guide efforts to improve future estimates
- providing a repeatable workflow to make it easy for anyone to query our methods or recalculate estimates as and when updated data become available
Invasive alien plants and streamflow
The impact of IAPs on streamflow is predominantly determined from streamflow reduction curves derived from long-term catchment experiments, whereby the proportional reduction in streamflow is expressed as a function of plantation age and/or density.
These data can take 40 years to collect, and the Jonkershoek catchment study has been running for >80 years (see Slingsby et al. 2021)!!!
Invasive alien plants and streamflow
[Figure: Streamflow reduction curves (relative to natural vegetation) for pine and eucalypt plantations under normal (suboptimal) or wet (optimal) conditions (from Moncrieff, Slingsby, and Le Maitre (2021)).]
These curves are then used to extrapolate spatially to MODIS pixels (250m) nested within catchments, informed by the naturalized runoff and IAP density.
Invasive alien plants and streamflow
Propagating uncertainty
We used only inputs that could be sampled with uncertainty (i.e. from probability distributions), including rainfall, streamflow reduction curves, fire history, soil moisture and invasion density.
We then propagated that uncertainty through to our streamflow reduction estimates using a Monte Carlo (MC) approach.
Invasive alien plants and streamflow
- For each model run we:
- Assigned species to a curve (optimal or sub-optimal, Eucalypt or Pine)
- Sampled vegetation age from the distribution of fire return time
- Estimated streamflow reduction for every species by sampling from the posterior of the curves and age
- Determined additional water usage in riparian or groundwater zones
- Within each run, for each catchment we sampled the density of each IAP species
Invasive alien plants and streamflow
- Within each catchment, for each pixel we:
- Estimated pixel-level naturalized runoff\(^1\) by sampling precipitation and converting it to runoff
- Summed naturalized runoff across all pixels within each quaternary catchment and rescaled it to match estimates from Bailey and Pitman (2015)
- Determined whether IAPs were in riparian or groundwater zones
- Calculated runoff lost by multiplying potential runoff by the proportional streamflow reduction for each IAP, and summing across species
\(^1\)i.e. the expected runoff without IAPs
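The nested structure above can be sketched as a toy Monte Carlo. Every distribution, constant, and simplification here is invented for illustration only; the real model is in Moncrieff, Slingsby, and Le Maitre (2021).

```python
# Highly simplified sketch of the nested Monte Carlo structure
# (runs -> catchments -> pixels). All distributions and constants invented.
import numpy as np

rng = np.random.default_rng(7)
n_runs, n_catchments, n_pixels = 200, 3, 50

losses = np.zeros((n_runs, n_catchments))
for run in range(n_runs):
    # Per-run draws: curve parameters and vegetation age.
    curve_max = rng.uniform(0.3, 0.6)  # asymptotic flow reduction
    age = rng.exponential(12.0)        # years since fire
    reduction = curve_max * (1 - np.exp(-0.1 * age))
    for c in range(n_catchments):
        density = rng.beta(2, 5)       # per-catchment IAP density
        # Per-pixel runoff from sampled precipitation.
        precip = rng.gamma(4.0, 150.0, n_pixels)
        runoff = 0.3 * precip          # toy rainfall-to-runoff conversion
        losses[run, c] = (runoff * density * reduction).sum()

print("posterior-style loss summaries per catchment:")
print(np.percentile(losses, [2.5, 50, 97.5], axis=0).round(0))
```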
Invasive alien plants and streamflow
[Figure: Posterior probability distributions showing uncertainty in the impacts of IAPs on streamflow in the catchments feeding Cape Town’s major dams (from Moncrieff, Slingsby, and Le Maitre (2021)).]
Invasive alien plants and streamflow
This provided estimates of the impacts of IAPs on streamflow for all catchments in the Cape Floristic Region as posterior probability distributions - i.e. with uncertainty.
The posterior mean estimated streamflow loss in catchments surrounding Cape Town’s major dams was 25.5 million m\(^3\) per annum (range: 20.3 to 43.4).
Given target water use of 0.45 million m\(^3\) per day at the height of the drought, this is between 45 and 97 days of water supply!!!
This was still using the 2008 estimates of IAP invasions…
Invasive alien plants and streamflow
We did additional analyses to partition the relative uncertainty among the various potential sources, by running the model with uncertainty for only the focal variable and setting the uncertainty in all other inputs to zero.
From this it’s clear that the data we need most are better estimates of the extent and density of invasions!!!
- Fortunately, this is easier than 40-year catchment experiments!
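A minimal sketch of that partitioning trick, assuming a toy streamflow-loss model with invented means and spreads: uncertainty is “switched on” for one input at a time while all others are held at their means, and the resulting variance is compared.

```python
# Sketch of uncertainty partitioning: rerun the Monte Carlo with uncertainty
# in only one focal input at a time. Toy model; all values invented.
import numpy as np

rng = np.random.default_rng(3)
means = {"density": 0.3, "rainfall": 600.0, "curve": 0.4}
sds = {"density": 0.15, "rainfall": 60.0, "curve": 0.05}

def streamflow_loss(density, rainfall, curve):
    return rainfall * 0.3 * density * curve  # toy model

n = 10000
for focal in means:
    draws = {k: (rng.normal(means[k], sds[k], n) if k == focal
                 else np.full(n, means[k])) for k in means}
    var = streamflow_loss(**draws).var()
    print(f"uncertainty from {focal:>8} alone: variance = {var:.1f}")
```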
Invasive alien plants and streamflow
Our estimates are very similar to Le Maitre et al. (2016), albeit slightly higher for low density invasions and lower for high density invasions. Either way, the losses are huge and likely to have been much worse during the “Day Zero” drought!
The difference may be a result of Jensen’s Inequality. This takes many forms, but here indicates that the mean of a nonlinear function is not equal to the function evaluated at the mean of its inputs…
- i.e. Running a non-linear model under the mean parameter set will not produce the mean outcome…
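A quick numerical check of Jensen’s Inequality, using an arbitrary convex function and invented values, shows exactly this gap between the mean outcome and the outcome at the mean:

```python
# Jensen's Inequality demo: mean of f(X) differs from f(mean of X)
# for a nonlinear f. Function and distribution are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(10.0, 3.0, 100_000)  # uncertain input

f = lambda v: v ** 2                # any nonlinear function will do
print("f(mean(x)) =", round(f(x.mean()), 1))  # ~100
print("mean(f(x)) =", round(f(x).mean(), 1))  # ~109 (= 100 + Var[x])
```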