Jasper Slingsby
No one ever does them…
…but they could save so much pain and suffering if they did!!!
Statistical power is the probability of a hypothesis test finding an effect if there is an effect to be found.
Power analysis is a calculation typically used to estimate the smallest sample size needed for an experiment, given a required significance level, statistical power, and effect size.
Firstly, it helps you plan your analyses before you’ve done your data collection, which is always useful, because it will inform how you collect your data.
Secondly, not knowing the statistical power of your analysis can result in:
Type II Error: failing to detect an effect that is really there (a "missing" finding, or false negative).
Type I Error: detecting an effect that is not really there (a false finding, or false positive).
Type I and Type II Errors and how they result in false or missing findings, respectively. Image from Norton and Strube 2001, JOSPT.
Statistical power is determined by the combination of the significance level (\(\alpha\)), the effect size (difference between group means), the variability among subjects (standard deviation), and the sample size:
We usually use an \(\alpha\) of 0.05 to indicate significant difference.
This is a subjective cut-off, but is generally accepted in the literature…
You have greater statistical power when you have greater differences in means (effect size). P1 vs P3 has greater power than either P1 vs P2 or P2 vs P3.
Greater variability among subjects results in larger standard deviations, reducing our ability to distinguish among groups (i.e. reducing statistical power).
Increasing sample size increases statistical power by improving the estimate of the mean and constricting the distribution of the test statistic (i.e. reducing the standard error (SE)). Dashed lines indicate the true population means.
Simulate the data you would expect to collect, varying the effect size (difference between means), the variability among subjects (standard deviation), and the sample size, and test for significant difference using the appropriate statistical test (possibly varying the \(\alpha\) ("significance") level (e.g. P < 0.05) if justified).
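A minimal sketch of that approach, assuming normally distributed data and a one-sample t-test (the sample size, mean, standard deviation and comparison value mu below are illustrative assumptions, not values from the original):

set.seed(42) # for reproducibility of this sketch
pvals <- replicate(1000, # repeat the simulation 1000 times
                   t.test(rnorm(n = 50, mean = 1, sd = 2), # simulate one dataset
                          mu = 0.5)$p.value) # test it and keep the p-value
mean(pvals < 0.05) # proportion of significant tests = estimated statistical power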
First, we need to simulate some data.
If we believe our data are normally distributed, we can use the handy rnorm() function, like so:
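The original code chunk is not reproduced here; a sketch consistent with the output further below (a sample of 50 values with mean 1 and standard deviation 1) would be:

# Simulate 50 normally distributed values and store them in a data frame
# (the exact arguments are assumptions chosen to match the later t-test output;
# because rnorm() is random, your numbers will differ slightly)
df <- data.frame(Data = rnorm(n = 50, # set the sample size
                              mean = 1, # set the mean
                              sd = 1), # set the standard deviation
                 Treatment = 1)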
Type ?Distributions in the R console to see the other distributions available in base R.
Now let's look at our new data. We can plot it like so:
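The plotting code is not shown here; one simple option (a base R histogram, as an assumed stand-in for the original figure) is:

# Histogram of the simulated data with a dashed line at the sample mean
hist(df$Data, main = "Simulated data", xlab = "Data")
abline(v = mean(df$Data), lty = 2) # dashed line at the sample mean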
The one-sample t-test tests the hypothesis that the mean of our population is a specific value (e.g. 0).
t.test(x = df$Data, # set our vector of data values
       alternative = "two.sided", # specify the alternative hypothesis (which in this case is "not zero", so it is two-sided (versus "greater" or "less"))
       mu = 0) # set the "true value" of the mean
One Sample t-test
data: df$Data
t = 6.4457, df = 49, p-value = 4.793e-08
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
0.6609861 1.2598437
sample estimates:
mean of x
0.9604149
In this case, the difference is highly significant! P < 0.00000005!!!
What if we fiddle with the \(\alpha\) ("significance") level? Even a much stricter \(\alpha\) (e.g. 0.001) would still be exceeded by this result, but bear in mind that lowering \(\alpha\) reduces the risk of a Type I error at the cost of statistical power (i.e. a greater risk of a Type II error).
Now let’s fiddle with the difference between group means (effect size).
In this case, this is most easily done by shifting mu closer to the mean of our randomly generated data, like so:
t.test(x = df$Data, # set our vector of data values
       alternative = "two.sided", # specify the alternative hypothesis
       mu = 0.5) # set the "true value" of the mean
One Sample t-test
data: df$Data
t = 3.09, df = 49, p-value = 0.003295
alternative hypothesis: true mean is not equal to 0.5
95 percent confidence interval:
0.6609861 1.2598437
sample estimates:
mean of x
0.9604149
Here we've reduced the effect size from 1 to 0.5, but the result is still significantly different.
Now let’s fiddle with variability among subjects.
# Make new data with greater variability (standard deviation = 2)
df <- data.frame(Data = rnorm(n = 50, # set the sample size
                              mean = 1, # set the mean
                              sd = 2), # set a bigger standard deviation
                 Treatment = 1)
# Run t-test
t.test(x = df$Data,
       alternative = "two.sided",
       mu = 0.5)
One Sample t-test
data: df$Data
t = 0.92312, df = 49, p-value = 0.3605
alternative hypothesis: true mean is not equal to 0.5
95 percent confidence interval:
0.1965978 1.3189782
sample estimates:
mean of x
0.757788
With double the variability (standard deviation), and an effect size of 0.5, the result is no longer significantly different…
Now let’s increase the sample size.
# Make new data with greater sample size (n = 100)
df <- data.frame(Data = rnorm(n = 100, # set the bigger sample size
                              mean = 1, # set the mean
                              sd = 2), # keep the larger standard deviation
                 Treatment = 1)
# Run t-test
t.test(x = df$Data,
       alternative = "two.sided",
       mu = 0.5)
One Sample t-test
data: df$Data
t = 3.8761, df = 99, p-value = 0.000191
alternative hypothesis: true mean is not equal to 0.5
95 percent confidence interval:
0.8571468 1.6063205
sample estimates:
mean of x
1.231734
Aha! By doubling our sample size, our result is significantly different once again…
Repeatedly rerunning our simulation with different sample size (n) would rapidly become tedious…
Fortunately, there’s a better way (for common tests…)!
library(pwr) allows us to input the effect size and power required and returns the required n.
library(pwr)
pwr.t.test(d = 0.8,
           n = NULL,
           sig.level = 0.05,
           power = 0.8,
           type = "one.sample",
           alternative = "two.sided")
One-sample t test power calculation
n = 14.30276
d = 0.8
sig.level = 0.05
power = 0.8
alternative = two.sided
This suggests we need 15 samples (rounding up from 14.3) to achieve our desired statistical power.
Here we have set n = NULL because that’s the property we want to estimate.
power = the power of the test (1 minus the Type II error probability), which in this case we have set to 80%
d = Cohen's d, a measure of effect size: the difference between the means divided by the pooled standard deviation (for a one-sample test, the sample mean minus the comparison value, divided by the sample standard deviation).
So, for our simulated data, assuming we’re comparing our data to 0:
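The calculation isn't shown here; a sketch of how the observed effect size could be computed (assuming we use the most recently simulated df and compare it to 0; d_obs is just an illustrative name) is:

# Observed Cohen's d: difference between the sample mean and the comparison
# value (0), divided by the sample standard deviation
d_obs <- (mean(df$Data) - 0) / sd(df$Data)
d_obs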
Rerun for our observed d:
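Again as a sketch, reusing the hypothetical d_obs from above:

# Rerun the power analysis with the observed effect size
pwr.t.test(d = d_obs, # observed Cohen's d from above
           n = NULL, # the property we want to estimate
           sig.level = 0.05,
           power = 0.8,
           type = "one.sample",
           alternative = "two.sided")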
pwr.p.test: one-sample proportion test
pwr.2p.test: two-sample proportion test
pwr.2p2n.test: two-sample proportion test (unequal sample sizes)
pwr.t.test: two-sample, one-sample and paired t-tests
pwr.t2n.test: two-sample t-tests (unequal sample sizes)
pwr.anova.test: one-way balanced ANOVA
pwr.r.test: correlation test
pwr.chisq.test: chi-squared test (goodness of fit and association)
pwr.f2.test: test for the general linear model

Each test has a different metric of effect size, each calculated in a different way…
Fortunately, library(pwr) has a convenient function (cohen.ES) that can provide these for you for small, medium and large effect sizes.
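The call that produced the output below isn't reproduced here, but it was presumably along these lines:

# Look up Cohen's conventional "medium" effect size for a t-test
cohen.ES(test = "t", # the test family
         size = "medium") # "small", "medium" or "large"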
Conventional effect size from Cohen (1982)
test = t
size = medium
effect.size = 0.5
If you’re not sure of the effect size you’d expect, the conservative approach is to use “small”
You can pass the results of cohen.ES directly to the pwr function by calling the effect.size slot in the returned object (see the sketch after the output below):
List of 4
$ test : chr "t"
$ size : chr "medium"
$ effect.size: num 0.5
$ method : chr "Conventional effect size from Cohen (1982)"
- attr(*, "class")= chr "power.htest"
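A sketch of what that might look like (the "medium" t-test effect size and the other settings here are illustrative, not from the original):

# Pass the effect.size slot straight into the power calculation
pwr.t.test(d = cohen.ES(test = "t", size = "medium")$effect.size, # = 0.5
           n = NULL,
           sig.level = 0.05,
           power = 0.8,
           type = "one.sample",
           alternative = "two.sided")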
pwr.anova.test: for comparisons among 3 or more groups (k) for different effect sizes (f)
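For example (the number of groups, effect size and power below are illustrative assumptions):

# Per-group sample size for a one-way balanced ANOVA with 3 groups and a
# "medium" effect size (f = 0.25)
pwr.anova.test(k = 3, # number of groups
               n = NULL, # per-group sample size to be estimated
               f = 0.25, # effect size
               sig.level = 0.05,
               power = 0.8)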
pwr.chisq.test: tests whether 2 categorical variables (the dimensions of a contingency table) are independent
w = effect size, df = degrees of freedom
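For example (the effect size and table dimensions below are illustrative assumptions):

# Total sample size (N) for a chi-squared test of independence on a
# 2 x 3 contingency table, so df = (2 - 1) * (3 - 1) = 2
pwr.chisq.test(w = 0.3, # effect size
               N = NULL, # total sample size to be estimated
               df = 2, # degrees of freedom
               sig.level = 0.05,
               power = 0.8)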
pwr.r.test: r = the correlation coefficient (i.e. it is not \(R^2\))
Note that for regression analysis you'd need to set "alternative" to "greater" or "less", because it assumes that one variable is dependent on the other.
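For example (the expected correlation and direction below are illustrative assumptions):

# Sample size needed to detect a correlation of r = 0.3, assuming we
# expect a positive relationship (one-sided test)
pwr.r.test(n = NULL, # sample size to be estimated
           r = 0.3, # expected correlation coefficient
           sig.level = 0.05,
           power = 0.8,
           alternative = "greater")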
pwr.f2.test: for regression analysis with multiple covariates (explanatory variables)
f2 = effect size
u = degrees of freedom for the numerator (= number of groups - 1)
v = degrees of freedom for the denominator (= total number of individuals across groups - the number of groups)
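For example (the numerator degrees of freedom and effect size below are illustrative assumptions; the required total sample size is then roughly v + u + 1):

# Denominator degrees of freedom (v) for a general linear model with
# u = 3 (e.g. 4 groups, or 3 explanatory variables) and a "medium"
# effect size (f2 = 0.15)
pwr.f2.test(u = 3, # numerator degrees of freedom
            v = NULL, # denominator degrees of freedom to be estimated
            f2 = 0.15, # effect size
            sig.level = 0.05,
            power = 0.8)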
Think carefully about the data you plan to collect and how to analyze them
Decide on your statistical analyses/tests
Do power analyses for each of your analyses/tests to determine necessary sample sizes
Include a description of your intended analyses and (ideally) present a power analysis in your presentations on Thursday (not for marks, but very useful).
Include a power analysis in your write-ups!!! (for marks!)
library(simr) for generalised linear mixed effects models (GLMM), e.g. this demo.