Shapiro-Wilk Test
tags: #statistics/inferential/assumption_check
We can test the assumption of normality of the DV within each independent group using the scipy.stats.shapiro() function:
from scipy.stats import shapiro
To run:
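# note: before running shapiro, create a subset of the DV for each IV group
# get list of unique classes in the categorical variable
groups = df['categorical_var'].unique()
for group in groups:
    # subset the DV for this group
    subgroup = df[df['categorical_var'] == group]['dv']
    # run shapiro-wilk test
    print(group, shapiro(subgroup))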
For each group, this returns a result that unpacks as a tuple of (W statistic, p-value).
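The null hypothesis is that the sample comes from a normal distribution. If the p-value > 0.05, we fail to reject it, so the normality assumption is reasonable (this is not proof that the data are normal); if the p-value < 0.05, the data deviate significantly from the normal distribution.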
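A minimal sketch of running the test for every group at once and collecting the results in a table. The helper name `shapiro_by_group`, the `alpha` parameter, and the synthetic data are assumptions for illustration only; `categorical_var` and `dv` are the placeholder column names used above.

```python
import numpy as np
import pandas as pd
from scipy.stats import shapiro


def shapiro_by_group(df, group_col, dv_col, alpha=0.05):
    """Run Shapiro-Wilk on the DV within each group (hypothetical helper)."""
    rows = []
    for name, values in df.groupby(group_col)[dv_col]:
        values = values.dropna()  # drop missing values before testing
        stat, p = shapiro(values)  # needs at least 3 observations per group
        rows.append({
            "group": name,
            "n": len(values),
            "W": stat,
            "p_value": p,
            # True means normality is rejected at the chosen alpha
            "reject_normality": p < alpha,
        })
    return pd.DataFrame(rows)


# synthetic example data: group "a" drawn from a normal distribution,
# group "b" drawn from an exponential (skewed) distribution
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "categorical_var": ["a"] * 50 + ["b"] * 50,
    "dv": np.concatenate([rng.normal(size=50), rng.exponential(size=50)]),
})

results = shapiro_by_group(df, "categorical_var", "dv")
print(results)
```

Each row of `results` holds one group's W statistic and p-value, so the 0.05 rule can be read off per group rather than from separate calls.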
- If violated, consider: Transformation for Normality
- Alternative Test: Kolmogorov-Smirnov Test for Normality