Post-Hoc Tests
tags: #statistics/inferential/anova/post_hoc
What is Tukey HSD?
Tukey's Honestly Significant Difference (HSD) test is a post-hoc test used after a significant ANOVA result to determine which specific groups differ significantly from each other.
The test compares every pair of group means against a single critical value; a pair is declared significantly different only if the absolute difference between its means exceeds that value.
Tukey HSD is both conservative and robust:
- 'Conservative', i.e., the test errs on the side of caution when declaring a difference significant[1]
- 'Robust', such that the results remain interpretable under moderate violations of its assumptions
- It also builds a correction for the Multiple Comparison Problem into its calculation. Rather than a Bonferroni Correction, which divides the p-value threshold by the number of pairwise comparisons, it takes its critical value from the studentized range distribution, which depends on the number of groups, so that the overall Type I error rate is controlled.
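To see why a correction matters, the sketch below computes how quickly the family-wise Type I error rate grows when every pair of groups is tested separately at alpha = 0.05. It assumes the tests are independent, which pairwise comparisons sharing group means are not, so the figures are only a rough guide. The function names are illustrative.

```python
from math import comb

def n_pairwise(k: int) -> int:
    """Number of pairwise comparisons among k groups: k * (k - 1) / 2."""
    return comb(k, 2)

def familywise_error(alpha: float, m: int) -> float:
    """Chance of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m

for k in (3, 4, 5):
    m = n_pairwise(k)
    print(f"{k} groups -> {m} comparisons, FWER ~ {familywise_error(0.05, m):.3f}")
```

With three groups the uncorrected rate is already about 14%, and with five groups about 40%, which is the inflation Tukey HSD is designed to prevent.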
We can conduct a post-hoc Tukey HSD whenever there is a significant F statistic.
- For Two-way ANOVA, we can test where the differences lie for any main effect that the test found to be significant
What makes Tukey different from T-tests?
- Unlike running a separate t-test for each pair of groups, the Tukey HSD (honest significant difference) takes the multiple comparisons into account itself[2].
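To compute the statistic for one-way ANOVAs:
$$HSD = q_{\alpha, k, N-k} \sqrt{\frac{MS_w}{n}}$$
where,
- $q_{\alpha, k, N-k}$ is the critical value from the Tukey HSD table for a given significance level $\alpha$, number of groups $k$, and within-group degrees of freedom $N-k$.
- $MS_w$ is the mean square within-groups.
- $N$ is the total sample size across all groups.
- $n$ is the number of observations in each group (equal group sizes assumed).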
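As a worked illustration of the calculation, the sketch below assumes the standard equal-group-size form, HSD = q * sqrt(MS_within / n), where n is the number of observations per group. All inputs are hypothetical (3 groups of 5 observations, a within-groups mean square of 17.0, made-up group means), and q = 3.77 is the tabulated studentized range critical value for alpha = 0.05, 3 groups, and 12 within-group degrees of freedom.

```python
from itertools import combinations
from math import sqrt

def tukey_hsd_threshold(q_crit: float, ms_within: float, n_per_group: int) -> float:
    """Minimum absolute mean difference that counts as significant."""
    return q_crit * sqrt(ms_within / n_per_group)

def significant_pairs(means: dict, hsd: float) -> list:
    """Return the pairs of groups whose means differ by more than hsd."""
    return [
        (g1, g2)
        for g1, g2 in combinations(means, 2)
        if abs(means[g1] - means[g2]) > hsd
    ]

# Hypothetical inputs, not taken from any real dataset
Q_CRIT = 3.77       # table value: alpha = 0.05, k = 3, df = 15 - 3 = 12
MS_WITHIN = 17.0    # mean square within-groups from the ANOVA table
N_PER_GROUP = 5     # observations in each group

hsd = tukey_hsd_threshold(Q_CRIT, MS_WITHIN, N_PER_GROUP)
flagged = significant_pairs({"a": 10.0, "b": 18.0, "c": 12.0}, hsd)
print(f"HSD = {hsd:.3f}; significant pairs: {flagged}")
```

If SciPy 1.7 or later is available, `scipy.stats.studentized_range.ppf(1 - alpha, k, df)` should give the critical value without a table lookup.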
Import libraries
import pandas as pd
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd
Run Tukey
# perform Tukey's test (df is a pandas DataFrame with one row per observation)
tukey = pairwise_tukeyhsd(endog=df['CONTINUOUS DV'],
                          groups=df['CATEGORICAL IDV'],
                          alpha=0.05)

# display results
print(tukey)
Multiple Comparison of Means - Tukey HSD, FWER=0.05
=====================================================
group1 group2 meandiff  p-adj    lower   upper reject
-----------------------------------------------------
     a      b      8.4 0.0158   1.4272 15.3728   True
     a      c      1.3 0.8864  -5.6728  8.2728  False
     b      c     -7.1 0.0453 -14.0728 -0.1272   True
-----------------------------------------------------
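One way to read the table: a pair is rejected exactly when its confidence interval for the mean difference excludes zero. The sketch below re-checks that rule against the three rows printed above. The tuple layout and function name are illustrative and not part of statsmodels.

```python
# (group1, group2, meandiff, lower, upper), copied from the printed table
rows = [
    ("a", "b", 8.4, 1.4272, 15.3728),
    ("a", "c", 1.3, -5.6728, 8.2728),
    ("b", "c", -7.1, -14.0728, -0.1272),
]

def excludes_zero(lower: float, upper: float) -> bool:
    """True when the interval lies entirely above or entirely below zero."""
    return lower > 0 or upper < 0

decisions = {(g1, g2): excludes_zero(lo, hi) for g1, g2, _, lo, hi in rows}

for pair, reject in decisions.items():
    print(pair, "reject" if reject else "fail to reject")
```

Each interval is the mean difference plus or minus the same half-width (6.9728 here), so groups a and b, and b and c, differ significantly, while a and c do not.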
[1] A conservative approach is generally preferred in situations where false positives (Type I errors) are considered more problematic than false negatives (Type II errors).
[2] https://medium.com/mlearning-ai/two-way-anova-post-hoc-analysis-what-is-the-difference-between-tukeyhsd-and-t-test-121ce557797f