2. One-way ANOVA

tags: #statistics/inferential/anova/one_way

What is a One-way ANOVA?

A one-way ANOVA is a bivariate inferential statistical technique that involves:

  1. A continuous dependent variable
  2. A categorical independent variable with three or more groups

It is used to compare the means of a continuous dependent variable across three or more independent groups defined by a SINGLE categorical variable.

It is also known as a one-factor ANOVA or a between-subjects ANOVA.
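The setup above can be sketched with a small, hypothetical long-format dataset: one row per observation, a categorical IV with three levels, and a continuous DV. The column names `group` and `score` and the values are invented for illustration; the group means printed here are the quantities the ANOVA compares.

```python
import pandas as pd

# Hypothetical long-format data: one row per observation.
# "group" is the categorical IV (3 levels); "score" is the continuous DV.
df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "score": [4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1.0, 2.0, 3.0],
})

# A one-way ANOVA asks whether these group means differ by more
# than the variability within each group would explain.
group_means = df.groupby("group")["score"].mean()
print(group_means)
```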



Theoretical Framework

Hypothesis Model

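$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$$

$$H_A: \mu_i \neq \mu_j \text{ for at least one pair } (i, j)$$

where k is the number of groups. The alternative does not claim that every mean differs from every other, only that the means are not all equal.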

Sampling Distribution and Statistical Test: F-test

The F-test is used in ANOVAs to test the statistical significance of the differences among the means of three or more groups.

The F-test uses the F-distribution to determine the probability (the p-value) of obtaining a test statistic at least as extreme as the one observed, assuming that the null hypothesis is true.
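As a minimal sketch of that tail probability, SciPy's `scipy.stats.f` distribution can turn an observed F into a p-value. The observed F of 4.5 and the sizes k = 3, N = 30 are hypothetical values chosen for illustration. `f.sf` (the survival function, 1 − CDF) is used because it stays accurate for very small p-values.

```python
from scipy.stats import f

# Hypothetical example: observed F = 4.5 with k = 3 groups and N = 30 observations
F_obs = 4.5
k, N = 3, 30
df_between, df_within = k - 1, N - k

# p-value = P(F >= F_obs | H0), the right-tail area of F(df_between, df_within)
p_value = f.sf(F_obs, df_between, df_within)
print(f"p = {p_value:.4f}")
```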

Computing the test statistic

The F-statistic is the ratio of the variability between groups to the variability within groups.

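$$F = \frac{MS_{between}}{MS_{within}}$$

| Source of Variation | Degrees of Freedom | Sum of Squares | Mean Square | F-Value |
|---|---|---|---|---|
| Between Groups | $k-1$ | $SS_{between}$ | $MS_{between} = \frac{SS_{between}}{k-1}$ | $F = \frac{MS_{between}}{MS_{within}}$ |
| Within Groups | $N-k$ | $SS_{within}$ | $MS_{within} = \frac{SS_{within}}{N-k}$ | |
| Total | $N-1$ | $SS_{total}$ | | |

| Source of Variation | Sum of Squares (SS) |
|---|---|
| Between Groups | $SS_{between} = \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2$ |
| Within Groups | $SS_{within} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$ |

| Source of Variation | Degrees of Freedom (df) |
|---|---|
| Between Groups | $df_{between} = k-1$ |
| Within Groups | $df_{within} = N-k$ |
| Total | $df_{total} = N-1$ |

where $N$ is the total number of observations, $k$ the number of groups, $n_i$ the size of group $i$, $y_{ij}$ the $j$-th observation in group $i$, $\bar{y}_i$ the mean of group $i$, and $\bar{y}$ the grand mean.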
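The sums of squares, mean squares and F-ratio described above can be computed by hand with NumPy. This is a minimal sketch on three hypothetical groups; the result is cross-checked against SciPy's `f_oneway`, and the code also checks the identity SS_total = SS_between + SS_within.

```python
import numpy as np
from scipy.stats import f_oneway

# Hypothetical data: k = 3 groups of DV values
groups = [
    np.array([4.0, 5.0, 6.0]),
    np.array([7.0, 8.0, 9.0]),
    np.array([1.0, 2.0, 3.0]),
]

k = len(groups)
N = sum(len(g) for g in groups)
all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# SS_between: size-weighted squared deviations of group means from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# SS_within: squared deviations of each observation from its own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
# Sanity check: the two components add up to SS_total
assert np.isclose(ss_between + ss_within, ((all_values - grand_mean) ** 2).sum())

ms_between = ss_between / (k - 1)
ms_within = ss_within / (N - k)
F = ms_between / ms_within

print(f"SS_between={ss_between}, SS_within={ss_within}, F={F}")
# → SS_between=54.0, SS_within=6.0, F=27.0
```

With these toy values every group has the same spread (SS = 2 per group), so the large F comes entirely from the separation of the group means.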
Interpreting the Results

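To decide whether to reject or fail to reject the null, compare the observed F-statistic with the critical value $F_{crit}$ from the F-distribution table for $df_{between} = k-1$ and $df_{within} = N-k$ at the chosen significance level $\alpha$ (the critical value marks the boundary of the rejection region):

$$F > F_{crit} \implies \text{reject } H_0$$

$$F \le F_{crit} \implies \text{fail to reject } H_0$$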
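Instead of a printed F-table, the critical value can be sketched with `scipy.stats.f.ppf`, the inverse CDF. The significance level α = 0.05, the sizes k = 3, N = 9 and the observed F = 27.0 are hypothetical inputs.

```python
from scipy.stats import f

alpha = 0.05   # chosen significance level (assumption)
k, N = 3, 9    # hypothetical: 3 groups, 9 observations in total
F_obs = 27.0   # hypothetical observed F-statistic

# F_crit is the (1 - alpha) quantile of F(k - 1, N - k);
# observed values above it fall in the rejection region
F_crit = f.ppf(1 - alpha, k - 1, N - k)

decision = "Reject H0" if F_obs > F_crit else "Fail to reject H0"
print(f"F = {F_obs}, F_crit = {F_crit:.3f} -> {decision}")
```

Comparing F with F_crit gives the same decision as comparing the p-value with α.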




Running One-way ANOVA in Python

Method 1: Using SciPy

We can conduct a one-way ANOVA test in Python using the f_oneway function from scipy.stats:

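# import function
from scipy.stats import f_oneway

# in the context of a dataset after subsetting for relevant features,
# collect the raw DV values for each group of the categorical IV
# ("Categorical_Var" and "DV" are placeholder column names)
sample_1 = df.loc[df["Categorical_Var"] == "group_1", "DV"]  # a single group
samples = [grp["DV"] for _, grp in df.groupby("Categorical_Var")]  # all groups at once

# Conduct one-way ANOVA
f_oneway(*samples)  # there must be at least two array-like arguments (i.e., samples)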

This returns an F_onewayResult containing the F-statistic and its p-value. Reject the null if the p-value is below the chosen significance level (e.g., α = 0.05):

F_onewayResult(statistic, pvalue)

To run and display the test statistic and p-value using tuple unpacking:

statistic, pvalue = f_oneway(*samples)

print(f'One-way ANOVA: F = {statistic}, p = {pvalue}')
# example output:
One-way ANOVA: F = 93.73300962036718, p = 2.1376700154385954e-28
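Putting Method 1 together, here is a self-contained sketch on a small hypothetical DataFrame that reuses the placeholder column names `Categorical_Var` and `DV`. It builds one sample per group with `groupby`, runs `f_oneway`, and reads the result both by tuple unpacking and by attribute.

```python
import pandas as pd
from scipy.stats import f_oneway

# Hypothetical dataset using the placeholder column names from above
df = pd.DataFrame({
    "Categorical_Var": ["group_1"] * 3 + ["group_2"] * 3 + ["group_3"] * 3,
    "DV": [4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1.0, 2.0, 3.0],
})

# One array of DV values per level of the categorical IV
samples = [grp["DV"].to_numpy() for _, grp in df.groupby("Categorical_Var")]

result = f_oneway(*samples)
statistic, pvalue = result  # tuple unpacking
print(f"One-way ANOVA: F = {result.statistic:.3f}, p = {result.pvalue:.4g}")
# → One-way ANOVA: F = 27.000, p = 0.001
```

Building `samples` with `groupby` avoids writing one filter per group and works for any number of levels.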

Method 2: Using statsmodels

Alternatively, we can use statsmodels, which produces a full ANOVA summary table:

import statsmodels.api as sm
from statsmodels.formula.api import ols

# assume your data is stored in a pandas DataFrame called 'df'
# with the dependent variable 'y' and the independent variable 'x'

# C(x) makes sure x is treated as categorical, even if it is numerically coded
model = ols('y ~ C(x)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)  # the keyword is typ, not type
print(anova_table)
Using OLS

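Note that when the IV is categorical (a string column, or a numeric code wrapped in C()), the formula interface dummy-codes it automatically, so the model's fitted value for each group is that group's mean of the DV. You don't need to split the data into separate samples beforehand.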

If the results are significant, follow up with Post-Hoc Tests to identify which specific group means differ.
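One common post-hoc option is Tukey's HSD. Here is a minimal sketch using `scipy.stats.tukey_hsd`, which assumes SciPy 1.8 or later. The three groups are hypothetical values. Each entry of `res.pvalue` compares group i with group j, and the p-values come from the studentized range distribution, which accounts for making several pairwise comparisons.

```python
from scipy.stats import tukey_hsd

# Hypothetical groups (requires SciPy >= 1.8)
group_1 = [4.0, 5.0, 6.0]
group_2 = [7.0, 8.0, 9.0]
group_3 = [1.0, 2.0, 3.0]

res = tukey_hsd(group_1, group_2, group_3)
print(res)         # pairwise mean differences, p-values and confidence intervals
print(res.pvalue)  # res.pvalue[i, j]: p-value for group i vs. group j
```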
