Welch's ANOVA

tags: #statistics/inferential/anova/one_way

When to use Welch's ANOVA

Welch's ANOVA is robust to unequal group variances (heteroscedasticity), which the regular one-way ANOVA is not. It still assumes that the data within each group are approximately normally distributed, so it is not a remedy for strongly non-normal data.

Welch's one-way ANOVA is an alternative to the standard one-way ANOVA that does not assume equal variances across groups.

The standard one-way ANOVA assumes that the variances of the populations from which the samples were drawn are equal. When this assumption is violated, the results of the standard ANOVA can be misleading.

Welch's ANOVA uses a modified test statistic that takes into account the differences in sample variances across groups. The degrees of freedom used in the test statistic are also adjusted to account for the unequal variances.
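The modified statistic and adjusted degrees of freedom can be written out by hand. The following is a minimal sketch of the textbook Welch formulas, not the code of any particular library; the function name welch_anova_manual and the three small samples are made up for illustration.

```python
import numpy as np
from scipy import stats


def welch_anova_manual(*groups):
    """Return (F, df1, df2, p) for Welch's one-way ANOVA (illustrative sketch)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n = np.array([g.size for g in groups])
    means = np.array([g.mean() for g in groups])
    variances = np.array([g.var(ddof=1) for g in groups])

    # Each group is weighted by n_i / s_i^2, so noisy groups count less
    w = n / variances
    w_sum = w.sum()
    weighted_grand_mean = (w * means).sum() / w_sum

    # Correction term that drives both the F denominator and df2
    lam = (((1 - w / w_sum) ** 2) / (n - 1)).sum()

    numerator = (w * (means - weighted_grand_mean) ** 2).sum() / (k - 1)
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam

    f_stat = numerator / denominator
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)  # usually not an integer
    p_val = stats.f.sf(f_stat, df1, df2)
    return f_stat, df1, df2, p_val


# Made-up samples with visibly different spreads
a = [4.1, 5.0, 4.8, 5.3, 4.6]
b = [6.2, 7.9, 5.1, 8.4, 6.8, 7.3]
c = [5.5, 9.8, 3.2, 11.0, 7.1, 2.9, 8.6]

f_stat, df1, df2, p_val = welch_anova_manual(a, b, c)
print("F:", f_stat, "df1:", df1, "df2:", df2, "p:", p_val)
```

With only two groups the statistic reduces to the square of Welch's t statistic, which is a convenient sanity check.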

Methods

Option 1: Pingouin Welch ANOVA

To perform Welch's ANOVA in Python, you can use the welch_anova function from the pingouin package.

pingouin provides a user-friendly and comprehensive statistical package that includes Welch's one-way ANOVA, as well as a range of other statistical tests and functionalities.

Sample code:

import pandas as pd
import pingouin as pg

# Perform Welch's ANOVA (df is a long-format DataFrame: one row per observation)
aov = pg.welch_anova(dv='target', between='categorical_var', data=df)

# Print the ANOVA results
print(aov)

   Source  ddof1     ddof2       F     p-unc      np2
0   Group      2  13.84638  3.0082  0.080017  0.31005

Here:

ddof1 is the numerator degrees of freedom (between-group variance), equal to the number of groups minus 1
ddof2 is the denominator degrees of freedom of the F-test (within-group variance); Welch's adjustment is why it is usually not an integer
np2 (partial eta-squared) is a measure of effect size in ANOVA. In a one-way design it equals eta-squared, the proportion of the total variance in the dependent variable that is accounted for by the independent variable:

$$\eta_p^2 = \frac{SS_{between}}{SS_{total}}$$

p-unc is the p-value of the F-test, uncorrected for multiple comparisons
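To make the effect-size definition concrete, here is a small sketch that computes the ratio of the between-group to the total sum of squares with NumPy. The helper name eta_squared and the toy numbers are assumptions for illustration only.

```python
import numpy as np


def eta_squared(*groups):
    """Proportion of total variance explained by group membership (one-way design)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()

    # Between-group sum of squares: group size times squared mean offset
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Total sum of squares: every observation against the grand mean
    ss_total = ((all_values - grand_mean) ** 2).sum()
    return ss_between / ss_total


# Toy data: SS_between = 26, SS_total = 32
effect = eta_squared([1, 2, 3], [2, 3, 4], [5, 6, 7])
print(round(effect, 4))  # 0.8125
```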

Option 2: f_oneway ANOVA for Unequal Variances

Alternatively, Welch's one-way ANOVA can be run with f_oneway from the scipy.stats module, the same function used for the standard one-way ANOVA with equal variances, by setting equal_var=False. This parameter is only available in recent SciPy releases; older versions do not accept it.

from scipy.stats import f_oneway

# Split the dependent variable into one Series per group
groups = list(df["categorical_var"].unique())
subsets = [df[df["categorical_var"]==group]["dv"] for group in groups]

# Perform Welch's ANOVA
f_stat, p_val = f_oneway(*subsets, equal_var=False)

# Print results
print("F-statistic:", f_stat)
print("p-value:", p_val)
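The snippet above assumes an existing DataFrame. As a self-contained sketch, the version below builds made-up data with unequal group sizes and spreads, runs the standard test, and runs the Welch variant only if the installed SciPy exposes equal_var. The column names mirror the ones used above; the data and the version guard are illustrative assumptions.

```python
import inspect

import numpy as np
import pandas as pd
from scipy.stats import f_oneway

# Made-up data: three groups with different sizes and standard deviations
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "categorical_var": np.repeat(["a", "b", "c"], [8, 12, 20]),
    "dv": np.concatenate([
        rng.normal(10, 1, 8),
        rng.normal(11, 3, 12),
        rng.normal(12, 6, 20),
    ]),
})

# One array of dv values per group
subsets = [values.to_numpy() for _, values in df.groupby("categorical_var")["dv"]]

# Standard one-way ANOVA (assumes equal variances)
classic = f_oneway(*subsets)
print("classic F:", classic.statistic, "p:", classic.pvalue)

# Welch's ANOVA, only when this SciPy version has the equal_var parameter
welch = None
if "equal_var" in inspect.signature(f_oneway).parameters:
    welch = f_oneway(*subsets, equal_var=False)
    print("Welch F:", welch.statistic, "p:", welch.pvalue)
else:
    print("This SciPy version has no equal_var parameter in f_oneway")
```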

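Option 3: statsmodels

Another approach is statsmodels: passing robust='hc3' to the anova_lm function runs a one-way ANOVA with a heteroscedasticity-consistent (HC3) covariance estimate. This handles unequal variances, but it is a robust F-test rather than Welch's ANOVA in the strict sense, so its F statistic and p-value will generally differ from those of the other two options: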

import statsmodels.api as sm
from statsmodels.formula.api import ols

# Fit the one-way model with OLS, using HC3 robust standard errors
model = ols('dependent_variable ~ categorical_variable', data=df).fit(cov_type="HC3")

# Build the ANOVA table with a heteroscedasticity-robust (HC3) F-test
anova_results = sm.stats.anova_lm(model, typ=2, robust='hc3')

# view the results
print(anova_results)

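This code addresses the same problem as the pingouin.welch_anova function (unequal group variances) but through a different method, so the resulting F statistic and p-value will generally not match pingouin's output exactly.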
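statsmodels also provides a dedicated one-way routine, anova_oneway in statsmodels.stats.oneway, whose use_var="unequal" option with welch_correction=True corresponds to Welch's ANOVA. A minimal sketch, using made-up samples rather than the DataFrame from the examples above:

```python
import numpy as np
from statsmodels.stats.oneway import anova_oneway

# Made-up samples with unequal sizes and spreads
a = np.array([4.1, 5.0, 4.8, 5.3, 4.6])
b = np.array([6.2, 7.9, 5.1, 8.4, 6.8, 7.3])
c = np.array([5.5, 9.8, 3.2, 11.0, 7.1, 2.9, 8.6])

# use_var="unequal" drops the equal-variance assumption (Welch's ANOVA)
res = anova_oneway([a, b, c], use_var="unequal", welch_correction=True)

print("F-statistic:", res.statistic)
print("p-value:", res.pvalue)
print("degrees of freedom:", res.df_num, res.df_denom)
```

The same function also accepts long-format input through its data and groups arguments, which may be more convenient when the observations already sit in a DataFrame.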

Uncorrected p-value

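The p-value in the ANOVA table (p-unc in pingouin, PR(>F) in statsmodels) is uncorrected in the sense that no multiple-comparison adjustment has been applied. This matters only when several tests are run, for example post-hoc pairwise comparisons. Setting the cov_type parameter to one of "HC0" through "HC3" in statsmodels is a different kind of correction: it uses a heteroscedasticity-robust covariance matrix to estimate standard errors, which changes the p-value to account for unequal variances but does not adjust it for multiple comparisons.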