Post-Hoc Tests

tags: #statistics/inferential/anova/post_hoc

What is Tukey HSD?

Tukey's Honestly Significant Difference (HSD) test is a post-hoc test used after a significant ANOVA result to determine which specific groups differ significantly from each other.

The test is based on pairwise comparisons of group means: it calculates a critical value that the absolute difference between two means must exceed for that difference to be declared statistically significant.

Tukey HSD is both conservative and robust:

'Conservative', i.e., the test errs on the side of caution: it sets a stricter threshold for declaring a difference significant than unadjusted pairwise tests do^[1]
'Robust', i.e., the results remain interpretable even if its assumptions are mildly violated
It also accounts for the Multiple Comparison Problem. Rather than applying a Bonferroni Correction to each p-value, it takes its critical value from the studentized range distribution, which depends on the number of groups being compared, so that the overall (family-wise) Type I error rate is controlled.

We can conduct a post-hoc Tukey HSD whenever there is a significant F statistic.
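
As a minimal sketch of that workflow, the omnibus one-way ANOVA can be run first with `scipy.stats.f_oneway`, and the post-hoc test only followed up when the F statistic is significant. The three groups and their values below are hypothetical and chosen purely for illustration.

```python
from scipy.stats import f_oneway

# Hypothetical measurements for three groups (illustrative values only)
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 3.0, 4.0, 5.0, 6.0]
group_c = [10.0, 11.0, 12.0, 13.0, 14.0]

# Omnibus test: is at least one group mean different from the others?
f_stat, p_value = f_oneway(group_a, group_b, group_c)

alpha = 0.05
run_post_hoc = p_value < alpha  # only look for pairwise differences if F is significant

print(f"F = {f_stat:.2f}, p = {p_value:.4g}")
if run_post_hoc:
    print("Significant F statistic: follow up with Tukey HSD")
else:
    print("No significant F statistic: a post-hoc test is not warranted")
```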

For a two-way ANOVA, we can use it to locate where the differences lie within any main effect that the ANOVA found to be significant.

What makes Tukey different from T-tests?

Unlike a series of separate t-tests, the Tukey HSD (honestly significant difference) test itself takes multiple comparisons into account^[2].
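
To see why this matters, the sketch below estimates how quickly the family-wise Type I error rate grows when every pair of groups is compared with an unadjusted test at the same significance level. It assumes the comparisons are independent, which pairwise comparisons sharing groups are not, so the numbers are a rough approximation; the helper name `familywise_error_rate` is made up for this note.

```python
from math import comb

def familywise_error_rate(k_groups, alpha=0.05):
    """Approximate chance of at least one false positive across all pairwise tests.

    Assumes independent tests, which is only an approximation for pairwise comparisons.
    """
    n_comparisons = comb(k_groups, 2)        # number of distinct pairs of groups
    fwer = 1 - (1 - alpha) ** n_comparisons  # P(at least one Type I error)
    return n_comparisons, fwer

for k in (3, 4, 5, 6):
    m, fwer = familywise_error_rate(k)
    print(f"{k} groups -> {m} pairwise tests, approximate FWER = {fwer:.3f}")
```

Tukey HSD avoids this inflation by holding the family-wise rate at the chosen significance level across all pairs at once.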

Formula

To compute the statistic for one-way ANOVAs:

$$HSD = q_{\alpha, k, df_{within}} \times \sqrt{\frac{MS_{within}}{n}}$$

where,

$q_{\alpha, k, df_{within}}$ is the critical value of the studentized range distribution (as listed in a Tukey HSD table) for a given significance level $\alpha$, number of groups $k$, and within-group degrees of freedom $df_{within}$.
$MS_{within}$ is the mean square within-groups.
$n$ is the number of observations in each group (the formula assumes equal group sizes).
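
The formula can be sketched by hand as follows. This is an illustration under stated assumptions rather than a replacement for a library routine: it assumes equal group sizes, relies on `scipy.stats.studentized_range` (available in SciPy 1.7 and later) for the critical value, and uses a made-up helper name (`tukey_hsd_threshold`) and made-up data.

```python
import numpy as np
from scipy.stats import studentized_range

def tukey_hsd_threshold(groups, alpha=0.05):
    """Return the HSD: the smallest absolute mean difference declared significant.

    Assumes every group has the same number of observations.
    """
    k = len(groups)              # number of groups
    n = len(groups[0])           # observations per group
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")

    df_within = k * (n - 1)      # within-group degrees of freedom
    ss_within = sum(((np.asarray(g) - np.mean(g)) ** 2).sum() for g in groups)
    ms_within = ss_within / df_within

    # Critical value q for the chosen alpha, k groups and df_within
    q_crit = studentized_range.ppf(1 - alpha, k, df_within)
    return q_crit * np.sqrt(ms_within / n)

# Hypothetical data: three groups of five observations each
groups = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 3.0, 4.0, 5.0, 6.0],
    [10.0, 11.0, 12.0, 13.0, 14.0],
]
hsd = tukey_hsd_threshold(groups)
print(f"HSD = {hsd:.3f}")
```

Any pair of group means whose absolute difference exceeds the returned value would be declared significantly different at the chosen $\alpha$.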

Import libraries

import pandas as pd
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

Run Tukey

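# perform Tukey's test; `df` is a DataFrame with one row per observation
tukey = pairwise_tukeyhsd(endog=df['CONTINUOUS DV'],     # dependent variable (continuous)
                          groups=df['CATEGORICAL IDV'],  # group labels (categorical independent variable)
                          alpha=0.05)                    # family-wise error rate

# display results
print(tukey)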

 Multiple Comparison of Means - Tukey HSD, FWER=0.05 
=====================================================
group1 group2 meandiff p-adj   lower    upper  reject
-----------------------------------------------------
     a      b      8.4 0.0158   1.4272 15.3728   True
     a      c      1.3 0.8864  -5.6728  8.2728  False
     b      c     -7.1 0.0453 -14.0728 -0.1272   True
-----------------------------------------------------
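
In the table above, `reject` is `True` for the pairs whose adjusted p-value is below the family-wise error rate, and their confidence intervals (`lower`, `upper`) exclude zero. For further filtering it can help to collect the results in a DataFrame. The sketch below does this on hypothetical data (the column names `score` and `group` are invented for the example) and assumes the result attributes `groupsunique`, `meandiffs`, `pvalues`, `confint` and `reject` as exposed by recent statsmodels versions.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical long-format data: one row per observation (illustrative values only)
df = pd.DataFrame({
    "score": [1.0, 2.0, 3.0, 4.0, 5.0,
              2.0, 3.0, 4.0, 5.0, 6.0,
              10.0, 11.0, 12.0, 13.0, 14.0],
    "group": ["a"] * 5 + ["b"] * 5 + ["c"] * 5,
})

tukey = pairwise_tukeyhsd(endog=df["score"], groups=df["group"], alpha=0.05)

# Assumption: pairs are ordered (0, 1), (0, 2), (1, 2), ... over the sorted group labels
i, j = np.triu_indices(len(tukey.groupsunique), k=1)
results = pd.DataFrame({
    "group1": tukey.groupsunique[i],
    "group2": tukey.groupsunique[j],
    "meandiff": tukey.meandiffs,   # mean(group2) - mean(group1)
    "p_adj": tukey.pvalues,
    "lower": tukey.confint[:, 0],
    "upper": tukey.confint[:, 1],
    "reject": tukey.reject,
})

# Keep only the pairs that differ significantly
significant = results[results["reject"]]
print(significant)
```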

[1] A conservative approach is generally preferred in situations where false positives (Type I errors) are considered more problematic than false negatives (Type II errors).
[2] https://medium.com/mlearning-ai/two-way-anova-post-hoc-analysis-what-is-the-difference-between-tukeyhsd-and-t-test-121ce557797f