Statistical Power (Analysis)

What is statistical power?

Statistical power is the likelihood of correctly rejecting the null hypothesis, when there is an effect to be found (i.e., when the alternative is true/null hypothesis is false).

Other Interpretations

The probability of finding an effect, when there is an effect to be found.
The likelihood of accepting an alternative hypothesis, when that hypothesis is true.
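
The definitions above can be checked empirically with a small simulation: generate many experiments in which the effect really exists and count how often the test rejects the null. The sketch below is only illustrative. It assumes two equal-sized groups drawn from normal distributions with a known standard deviation of 1 and uses a simple two-sided z-test; the function name `simulate_power` and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def simulate_power(effect_size, n_per_group, alpha=0.05, n_sims=4000, seed=0):
    """Estimate power as the share of simulated experiments that reject H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    se = (2 / n_per_group) ** 0.5  # std. error of the mean difference (sd = 1)
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        z = (fmean(treated) - fmean(control)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims


# Assumed scenario: true effect of 0.5 standard deviations, 50 observations per group
estimated_power = simulate_power(effect_size=0.5, n_per_group=50)
print(f"Estimated power: {estimated_power:.3f}")
```

Calling `simulate_power(0.0, 50)` instead (no true effect) gives a rejection rate close to the significance level, which is the false positive rate rather than power.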

Important to Note!

Power analysis is a general analysis that doesn't take into account the specific research question or hypothesis.

The purpose of power analysis is to determine the appropriate sample size for a study based on assumptions about the effect size, alpha level, and statistical power.

It is a useful tool for researchers to ensure that their study has enough power to detect the effect they are interested in, and to avoid underpowered studies that may lead to false negatives.

How can we Compute the Power?

The statistical power of a test is the complement of the probability of making a Type II error: the higher the statistical power of a test, the lower the probability of making a Type II (false negative) error.

$\text{Power} = 1 - β$

Therefore, we can compute the probability of a Type II error from the power: $β = 1 - \text{Power}$.
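
As a worked example of this relationship, the sketch below approximates the power of a two-sided, two-group comparison with a normal (z-test) approximation and then derives β from it. The helper `z_test_power` and the input values are assumptions made for illustration; an exact t-test calculation gives a slightly lower power.

```python
from statistics import NormalDist


def z_test_power(effect_size, n_per_group, alpha=0.05):
    """Approximate two-sided power for comparing two equal-sized groups."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * (n_per_group / 2) ** 0.5  # expected z statistic under H1
    # Probability that the z statistic lands in either rejection region
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


power = z_test_power(effect_size=0.5, n_per_group=50)
beta = 1 - power  # probability of a Type II error
print(f"power = {power:.3f}, beta = {beta:.3f}")
```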

Important Note: Power $\approx$ Rejecting the Null

The chance of obtaining $p \leq α$ when the alternative is true is referred to as the power of the experiment. It ranges from $α$ (when the true effect is close to zero) to nearly $100\%$ (when the effect or the sample is large).
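
The range of possible power values can be illustrated numerically. The sketch below uses a normal approximation for a two-sided, two-group test (an assumption; the helper `approx_power` and the chosen effect sizes are hypothetical) and shows power starting at α when the effect is zero and approaching 1 as the effect grows.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-group comparison."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * (n_per_group / 2) ** 0.5
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


effect_sizes = (0.0, 0.2, 0.5, 0.8, 1.5)
power_by_effect = {d: approx_power(d, n_per_group=50) for d in effect_sizes}

for d, p in power_by_effect.items():
    print(f"effect size {d:.1f} -> power {p:.3f}")
```

With an effect size of exactly 0, the "power" equals the significance level, since every rejection is then a false positive.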

We can compute the statistical power of a test given:

1. Effect Size: magnitude of the result (i.e., practical significance) you expect/want to observe.

2. Sample Size: number of observations, often constrained by practical considerations.

3. Significance: significance level ($α$) used in the statistical test (probability of rejecting the null hypothesis when it is actually true)

$β$: This is the probability of failing to reject a null hypothesis when it is indeed false.

Required Sample Size: How do we determine the effect size to use?

We should be conservative and start with a small effect size, so that we can see the sample size required to detect even a small effect. By Cohen's d conventions, 0.20 is the benchmark for a "small" effect.
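
To see why the conservative choice matters, the sketch below estimates the per-group sample size for Cohen's conventional small (0.2), medium (0.5), and large (0.8) effect sizes. It uses the common normal-approximation formula $n \approx 2\,(z_{1-α/2} + z_{\text{Power}})^2 / d^2$ for a two-sided, two-group comparison, which is an assumption here; the helper name `approx_n_per_group` is hypothetical, and an exact t-test calculation gives slightly larger numbers.

```python
import math
from statistics import NormalDist


def approx_n_per_group(effect_size, power=0.8, alpha=0.05):
    """Approximate per-group sample size for a two-sided, two-group test."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    # Round up: a fractional observation cannot be collected
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


required = {}
for label, d in (("small", 0.2), ("medium", 0.5), ("large", 0.8)):
    required[label] = approx_n_per_group(d)
    print(f"{label:<6} (d = {d}): about {required[label]} per group")
```

The required sample size grows with $1/d^2$, so halving the assumed effect size roughly quadruples the number of observations needed.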

What is a "good" power?

The higher the better.

However, rule of thumb:

$\text{Power} \geq 80\%$: a power of at least 0.8 is "good enough".

When finding the required sample size, how do we determine what power to use?

A commonly used rule of thumb is to aim for a power of 0.8, which corresponds to an 80% chance of detecting an effect if it is truly present.

Practical Implications

Before the Experiment

Statistical power analysis can be conducted to estimate the sample size required to detect a desired effect size at a given alpha significance level and statistical power (e.g., if you are looking for an effect size of 0.8, we can use the power analysis to find the $n$ needed to detect an effect of that size).

After the Experiment

Power analysis tells you how confident you should be in the results. Using the given sample size, effect size, and significance level, we compute the probability of a Type II error (the likelihood of failing to reject the null when it is false). This helps us conclude whether the probability of committing a Type II error is acceptable from a decision-making perspective.
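
A minimal sketch of this after-the-experiment check, assuming an independent two-sample t-test analysed with statsmodels' `TTestIndPower`. The sample size, effect size, and the 0.20 tolerance for β are hypothetical values chosen for illustration.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Hypothetical completed study: 50 observations per group, effect size of 0.5
achieved_power = analysis.power(
    effect_size=0.5, nobs1=50, alpha=0.05, ratio=1.0, alternative="two-sided"
)
type_ii_error = 1 - achieved_power  # beta

# Assumed decision rule: tolerate at most a 20% chance of a false negative
acceptable = type_ii_error <= 0.20
print(f"power = {achieved_power:.3f}, beta = {type_ii_error:.3f}, acceptable = {acceptable}")
```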

We can carry out a power analysis using the classes in the statsmodels.stats.power module.

Different Power Functions

NOTE: there are different statistical power functions available depending on the statistical test you are conducting (see: list of functions)

Power Analysis Using Python

See also: Power Curves to see how power varies by effect size and sample size using .plot_power().
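
A short sketch of such power curves, assuming an independent t-test and a non-interactive matplotlib backend. The ranges for the sample size and the three effect sizes are illustrative choices, not values from a real study.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display

import numpy as np
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# One curve per effect size, with the group-1 sample size on the x-axis
fig = analysis.plot_power(
    dep_var="nobs",
    nobs=np.arange(5, 200),
    effect_size=np.array([0.2, 0.5, 0.8]),
    alpha=0.05,
    title="Power of an independent t-test",
)
```

The returned matplotlib figure can then be saved or displayed in the usual way, for example with `fig.savefig(...)`.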

The stats.power module of the statsmodels package in Python contains the required functions for carrying out power analysis for the most commonly used statistical tests such as t-test, normal based test, F-tests, and Chi-square goodness of fit test.

Its solve_power function takes 3 of the 4 variables mentioned above as input parameters and calculates the remaining 4th variable (e.g., to find the required sample size before an experiment, you need to know your estimated effect size, the desired statistical power of your test, and the alpha significance level at which you intend to conduct the test).
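
The sketch below illustrates this "solve for the missing one" behaviour with `TTestIndPower`, chosen here as an assumed example test. Whichever of `effect_size`, `nobs1`, `alpha`, or `power` is left as `None` is the quantity that gets solved for; the numeric inputs are hypothetical.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Unknown: effect size. What is the smallest effect detectable with 50 per group?
min_detectable_effect = analysis.solve_power(
    effect_size=None, nobs1=50, alpha=0.05, power=0.8, ratio=1.0
)

# Unknown: sample size. How many observations in group 1 for an effect of 0.5?
required_n1 = analysis.solve_power(
    effect_size=0.5, nobs1=None, alpha=0.05, power=0.8, ratio=1.0
)

print(f"Minimum detectable effect: {float(min_detectable_effect):.3f}")
print(f"Required n in group 1:     {float(required_n1):.1f}")
```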

Python: Power Analysis

First, import the statsmodels.stats.power module, which provides a class for each type of test you may want to perform (e.g., TTestIndPower for independent t-tests).

import statsmodels.stats.power as smp

Instantiate the relevant power analysis class and specify the required parameters:

# Instantiate the TTestIndPower class 
power_analysis = smp.TTestIndPower() 

# Specify the parameters for the power analysis 
effect_size = 0.5   # expected effect size
alpha = 0.05        # significance level
n1 = 50             # sample size of group 1
n2 = 50             # sample size of group 2
ratio = n2 / n1     # ratio of sample sizes (n2 / n1); 1 = "equal" as in 1:1

Note about sample sizes of the group and ratio

It is generally recommended to try to obtain information about the likely sample sizes before conducting the power analysis to ensure more accurate and reliable results.

Otherwise, we would need to make assumptions about the likely sample sizes or use some other method to estimate the effect size and perform the power analysis.
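
To illustrate how the group sizes and their ratio enter the calculation, the sketch below compares two hypothetical designs with the same total of 150 observations. In `TTestIndPower`, `ratio` is the size of group 2 divided by the size of group 1, so `nobs1=50` with `ratio=2.0` means 100 observations in group 2. The effect size and alpha are assumed values.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
effect_size, alpha = 0.5, 0.05

# Balanced design: 75 + 75 observations
balanced = analysis.power(effect_size=effect_size, nobs1=75, alpha=alpha, ratio=1.0)

# Unbalanced design: 50 + 100 observations (ratio = n2 / n1 = 2.0)
unbalanced = analysis.power(effect_size=effect_size, nobs1=50, alpha=alpha, ratio=2.0)

print(f"balanced (75/75):    power = {balanced:.3f}")
print(f"unbalanced (50/100): power = {unbalanced:.3f}")
```

For a fixed total number of observations, the balanced split yields the higher power, which is one reason the expected group sizes are worth pinning down before running the analysis.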

Compute the power of the test using the power() method:

power = power_analysis.power(effect_size=effect_size, nobs1=n1, alpha=alpha, ratio=ratio, alternative='two-sided')

Python: Required Sample Size

First, import the necessary libraries:

import statsmodels.stats.power as smp

Instantiate the power function depending on the hypothesis test being conducted:

power_analysis = smp.<POWER FUNCTION>

To find the required sample size, we can call the .solve_power() method on the power object, specifying the desired effect size, power, and significance level at which the test is to be conducted:

sample_size = power_analysis.solve_power(effect_size=0.2, power=0.8, alpha=0.05)
# conservative effect size: 0.2
# power is generally set to 0.8

print(sample_size)
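
The value returned by .solve_power() is generally not a whole number, so in practice it is rounded up to the next integer. The sketch below shows this for an independent t-test; `TTestIndPower` is an assumed choice here, since the class to use depends on the hypothesis test being conducted, and the conservative effect size of 0.2 follows the comments above.

```python
import math

from statsmodels.stats.power import TTestIndPower

power_analysis = TTestIndPower()  # assumed test: independent two-sample t-test

raw_sample_size = power_analysis.solve_power(
    effect_size=0.2, power=0.8, alpha=0.05, ratio=1.0, alternative="two-sided"
)

# Round up so that the achieved power does not fall below the target
n_per_group = math.ceil(raw_sample_size)
print(f"Required sample size per group: {n_per_group}")
```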