Jasper Slingsby
Uncertainty determines the utility of a forecast:

- If the uncertainty in a forecast is too high, then it is of no utility to a decision maker.
- If the uncertainty is not properly quantified and presented, it can lead to poor decision outcomes.
This leaves forecasters with four overarching questions: where does forecast uncertainty come from, how do we quantify it, how do we propagate it into our forecasts, and can we reduce it?
The utility of a model/forecast depends on how rapidly its forecast proficiency declines as we project further into the future, combined with the forecast proficiency threshold required for the decision at hand. Together these determine the “ecological forecast horizon” (Petchey et al. 2015).
The ecological forecast horizon (from Petchey et al. 2015).
Some forecasts may lose proficiency very quickly, crossing (or even starting below) the forecast proficiency threshold. If the forecast loses proficiency more slowly, or the required proficiency threshold is lower, the forecast horizon extends further into the future.
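A minimal numerical sketch of the idea (all numbers hypothetical): proficiency decays with lead time, and the horizon is the first lead time at which it drops below the threshold.

```python
import numpy as np

lead_times = np.arange(0, 51)                  # days ahead
proficiency = 0.95 * np.exp(-lead_times / 15)  # assumed decay in proficiency
threshold = 0.5                                # decision-maker's requirement

# The forecast horizon is the first lead time below the threshold.
below = np.where(proficiency < threshold)[0]
horizon = lead_times[below[0]] if below.size else None
print(f"Forecast horizon: {horizon} days")     # ~10 days for these numbers
```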
Dietze classifies prediction uncertainty in his book (Dietze 2017a) and subsequent paper (Dietze 2017b) in the form of an equation (note that I’ve spread it over multiple lines):
\[
\begin{aligned}
\underbrace{Var[Y_{t+1}]}_\text{predictive variance} \approx \; &\underbrace{stability*uncertainty}_\text{initial conditions} \; + \\
&\underbrace{sensitivity*uncertainty}_\text{drivers} \; + \\
&\underbrace{sensitivity*(uncertainty+variability)}_\text{(parameters + random effects)} \; + \\
&\underbrace{Var[\epsilon]}_\text{process error}
\end{aligned}
\]
If we break the terms down into (something near) English, we get:
The dependent variable:
\[Var[Y_{t+1}] \approx\]
“The uncertainty in the prediction for the variable of interest (\(Y\)) in the next time step (\(t+1\)) is approximately equal to…”
And now the independent variables (or terms in the model):
\[\underbrace{stability*uncertainty}_\text{initial conditions} \; +\]
“The stability multiplied by the uncertainty in the initial conditions, plus”
\[\underbrace{sensitivity * uncertainty}_\text{drivers} \; + \]
“The sensitivity to, multiplied by the uncertainty in, external drivers, plus”
\[\underbrace{sensitivity*(uncertainty+variability)}_\text{(parameters + random effects)} + \]
“The sensitivity to, multiplied by uncertainty and variability in, the parameters, plus”
\[\underbrace{sensitivity*(uncertainty+variability)}_\text{(parameters + random effects)} + \]
“The sensitivity to, multiplied by uncertainty and variability in, the random effects, plus”
\[\underbrace{Var[\epsilon]}_\text{process error}\]
“The process error.”
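To make the equation concrete, here is a minimal first-order (Taylor series) sketch for a hypothetical one-step model \(Y_{t+1} = rY_t + \beta X_t + \epsilon\). Note that for variances the sensitivities enter squared; the shorthand equation above glosses over this. All numbers are invented for illustration.

```python
import numpy as np

# Hypothetical one-step model: Y[t+1] = r*Y[t] + beta*X[t] + eps.
# First-order (Taylor series) variance partition; all numbers invented.
r, beta = 0.9, 0.5     # parameter means
Y0, X = 10.0, 2.0      # initial condition and driver means

var_Y0  = 1.0          # initial condition uncertainty
var_X   = 0.5          # driver uncertainty
var_r   = 0.01         # parameter uncertainty
var_eps = 0.2          # process error

# Sensitivities are partial derivatives at the means:
# dY1/dY0 = r (the "stability"), dY1/dX = beta, dY1/dr = Y0.
pred_var = (r**2 * var_Y0        # initial conditions
            + beta**2 * var_X    # drivers
            + Y0**2 * var_r      # parameters
            + var_eps)           # process error
print(pred_var)  # 0.81 + 0.125 + 1.0 + 0.2 = 2.135
```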
There are many methods, but it’s worth recognizing that these are actually two steps: propagating uncertainty through the model, and then propagating it into the forecast.
This could be a lecture series of its own. In short, there are five main methods for propagating uncertainty through the model, and most have related methods for propagation into the forecast (see the table below).
The methods differ in whether they are analytical or numerical, and whether they propagate the full probability distribution or only its moments (e.g. the mean and variance).
They also involve a trade-off between efficiency and flexibility.
| Approach   | Distribution                  | Moments                                 |
|------------|-------------------------------|-----------------------------------------|
| Analytical | Variable Transform            | Analytical Moments (Kalman Filter)      |
|            |                               | Taylor Series (Extended Kalman Filter)  |
| Numerical  | Monte Carlo (Particle Filter) | Ensemble (Ensemble Kalman Filter)       |
Note: It is possible to propagate uncertainty through the model and into your forecast in one step with Bayesian methods, by treating the forecast states as “missing data” values and estimating posterior distributions for them. This would essentially fit with Monte Carlo methods in the table. This approach may not suit all forecasting circumstances though.
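As a minimal sketch of the Monte Carlo approach (reusing the hypothetical model and made-up distributions from the sketch above): sample every uncertain input from its distribution, run the model once per draw, and summarize the resulting forecast distribution.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000  # number of Monte Carlo draws

# Sample every uncertain input (all distributions assumed for illustration).
Y0  = rng.normal(10.0, 1.0, n)            # initial conditions
X   = rng.normal(2.0, np.sqrt(0.5), n)    # driver
r   = rng.normal(0.9, 0.1, n)             # parameter
eps = rng.normal(0.0, np.sqrt(0.2), n)    # process error

Y1 = r * Y0 + 0.5 * X + eps               # one forecast per draw

print(Y1.mean(), Y1.var())                # forecast mean and predictive variance
print(np.percentile(Y1, [2.5, 97.5]))     # 95% predictive interval
```

Ensemble methods (e.g. the Ensemble Kalman Filter) follow the same logic with far fewer draws, tracking only the moments (mean and covariance) rather than the full distribution.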
So how do we reduce uncertainty?

- Firstly, by working out where it’s coming from
- Secondly, by targeting and reducing those sources of uncertainty
Identifying the sources of uncertainty requires looking at the two ways in which they can be important for the uncertainty in predictions (largely covered in the equation earlier): how uncertain each input is, and how sensitive the prediction is to that input.
Targeting and reducing sources of uncertainty is not always straightforward.
Parameters that are highly uncertain, and to which our state variable (\(Y\)) is highly sensitive, cause the most uncertainty in predictions.
But, given limited resources, they may not be the best targets, for a number of reasons:
In fact, you can build a model to predict where your effort is best invested by exploring the relationship between sample size and contribution to overall model uncertainty! You can even include economic principles to estimate monetary or person-hour implications. This is called observational design.
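A toy sketch of that logic, assuming (purely for illustration) that a parameter’s standard error shrinks as \(1/\sqrt{n}\) with sample size and that each new observation has a fixed cost:

```python
sensitivity2 = 100.0   # squared sensitivity of Y to the focal parameter
sigma2       = 1.0     # per-observation variance of the parameter estimate
other_var    = 1.2     # variance from all other sources (held fixed here)
cost_per_obs = 50      # invented cost per additional sample

for n in [10, 50, 100, 500]:
    param_var = sigma2 / n                             # SE^2 ~ sigma^2 / n
    forecast_var = sensitivity2 * param_var + other_var
    print(n, round(forecast_var, 3), n * cost_per_obs)
# Diminishing returns: beyond some n the other sources dominate, and the
# money/person-hours are better spent reducing a different source.
```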
During the “Day Zero” drought the City scrambled for “alternative sources” of bulk water.
The options they explored (beyond demand management) included:
Peer-reviewed research by Le Maitre et al. (2016) indicated that, as of 2008, invasive alien plants were estimated to be using around 5% of runoff.
When asked why they were not considering clearing invasive alien plants from the major mountain catchments, one of the excuses was [paraphrased] “because we don’t trust the estimates, they don’t provide any estimates of uncertainty”.
While we knew this was a load of crock (they didn’t have uncertainty estimates for any of the other options either), Moncrieff, Slingsby, and Le Maitre (2021) decided it’d be a good idea to explore this issue by:
- using a Bayesian framework to update the estimates of the volume and percent of streamflow lost to IAPs, and to include uncertainty in those estimates
- exploring the relative contribution of different sources of uncertainty to the overall uncertainty in streamflow losses, to guide efforts to improve future estimates
- providing a fully repeatable workflow, making it easy for anyone to query our methods or recalculate estimates as and when updated data become available
The impacts of IAPs on streamflow are predominantly determined from streamflow reduction curves derived from long-term catchment experiments, whereby the proportional reduction in streamflow is expressed as a function of plantation age and/or density.
The data for these curves can take 40 years to collect, and the Jonkershoek catchment study has been running for >80 years (see Slingsby et al. 2021)!!!
Streamflow reduction curves (relative to natural vegetation) for pine and eucalypt plantations under normal (suboptimal) or wet (optimal) conditions (from Moncrieff, Slingsby, and Le Maitre 2021).
These curves are then used to extrapolate spatially to MODIS pixels (250m) nested within catchments, informed by the naturalized runoff and IAP density.
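A heavily simplified sketch of that extrapolation step, with an invented reduction curve and invented pixel inputs (the fitted curves and real spatial data are in Moncrieff, Slingsby, and Le Maitre (2021)):

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented pixel data for one catchment (NOT the paper's inputs).
n_pixels = 1000
runoff  = rng.gamma(2.0, 50.0, n_pixels)  # naturalized runoff, mm/yr
density = rng.beta(0.5, 5.0, n_pixels)    # fraction of each pixel invaded

def reduction_curve(age):
    """Invented proportional streamflow reduction vs stand age."""
    return 0.6 * (1 - np.exp(-age / 10))

age = 20                                            # single stand age, for simplicity
loss_mm = runoff * density * reduction_curve(age)   # per-pixel loss, mm/yr
loss_m3 = loss_mm / 1000 * (250 * 250)              # mm/yr -> m^3/yr per 250 m pixel
print(f"Catchment loss: {loss_m3.sum():,.0f} m^3/yr")
```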
Propagating uncertainty
We used only inputs that could be sampled with uncertainty (i.e. from probability distributions), including rainfall, streamflow reduction curves, fire history, soil moisture and invasion density.
We then propagated that uncertainty through to our streamflow reduction estimates using a Monte Carlo (MC) approach.
Posterior probability distributions showing uncertainty in the impacts of IAPs on streamflow in the catchments feeding Cape Town’s major dams (from Moncrieff, Slingsby, and Le Maitre 2021).
This provided estimates of the impacts of IAPs on streamflow for all catchments in the Cape Floristic Region as posterior probability distributions - i.e. with uncertainty.
The posterior mean estimated streamflow loss in catchments surrounding Cape Town’s major dams was 25.5 million m\(^3\) per annum (range: 20.3 to 43.4).
Given target water use of 0.45 million m\(^3\) per day at the height of the drought, this is between 45 and 97 days of water supply!!!
This was still using the 2008 estimates of IAP invasions…
We did additional analyses to partition the relative uncertainty among the various potential sources, by running the model with uncertainty for only the focal variable and setting the uncertainty in all other inputs to zero.
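In code, that one-at-a-time partitioning looks something like the sketch below (again using the hypothetical toy model from the earlier sketches, not the actual streamflow model): give only the focal input its uncertainty, hold everything else at its mean, and compare the resulting forecast variances.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000

# Toy model and invented means/sds, as in the earlier sketches.
means = {"Y0": 10.0, "X": 2.0, "r": 0.9}
sds   = {"Y0": 1.0,  "X": 0.7, "r": 0.1}

def forecast(Y0, X, r):
    return r * Y0 + 0.5 * X

for focal in means:
    # Uncertainty only in the focal input; all others fixed at their means.
    draws = {k: rng.normal(means[k], sds[k], n) if k == focal
                else np.full(n, means[k])
             for k in means}
    print(focal, round(forecast(**draws).var(), 3))
# The input producing the largest forecast variance is the best target
# for new data collection.
```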
From this it’s clear that the data we need most are better estimates of the extent and density of invasions!!!
Our estimates are very similar to Le Maitre et al. (2016), albeit slightly higher for low density invasions and lower for high density invasions. Either way, the losses are huge and likely to have been much worse during the “Day Zero” drought!
The difference may be a result of Jensen’s Inequality. This takes many forms, but here it indicates that the mean of a nonlinear function is not equal to the function evaluated at the mean of its inputs…
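A quick numerical illustration (toy function and made-up inputs):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.0, 100_000)  # made-up inputs with mean 2, sd 1

f = lambda v: v**2                 # any nonlinear function shows the effect

print(f(x.mean()))  # f(E[X])  ~= 4
print(f(x).mean())  # E[f(X)]  ~= 5, i.e. f(E[X]) + Var[X] for this f
# Evaluating a nonlinear model at mean inputs therefore gives a different
# answer to averaging the model over the full input distribution.
```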