Two-Sample (Independent) T-tests

tags: #statistics/inferential/ttest/two_sample #assumptions

1. Assumptions and Conditions for Two-Sample (Independent) T-tests

Related: INF 1344 W8L017 - Bivariate Analysis: Independent T-tests

Assumptions and Conditions

1. You have ONE dependent variable (DV) measured on a continuous scale (i.e., interval/ratio).
2. The independent variable (IDV) is binary, i.e., it has two categories or independent groups; if there are more than 2 groups $\to$ One-way ANOVA.
3. Observations are INDEPENDENT, i.e., there is no relationship between the observations within EACH group of the IDV, OR between the two groups; otherwise the groups are contaminated, which threatens internal validity.

Note: Assumptions 1-3 are about the study design and measurements; the rest are about the nature of the data.

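4. No significant outliers in either of the two IDV groups in terms of the DV.
5. The DV should be approximately normally distributed within EACH IDV GROUP (note: with larger samples, e.g., n > 50 per group as a rule of thumb, the t-test is robust to non-normality, so this assumption can be relaxed).
6. Homogeneity of variance, i.e., the population variances of the DV should be equal for each IDV group (if this is violated, Welch's t-test is the usual alternative).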
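The data-related assumptions (outliers, normality, equal variances) can be screened roughly in Python. This is a minimal sketch, not a prescribed procedure: the helper name `check_assumptions`, the 1.5 × IQR outlier rule (Tukey's fences), and the 0.05 cut-off are assumptions chosen for illustration. It uses `scipy.stats.shapiro` (Shapiro-Wilk, H0: the data are normal) and `scipy.stats.levene` (Levene, H0: equal population variances).

```python
import numpy as np
from scipy import stats


def check_assumptions(group1, group2, alpha=0.05):
    """Hypothetical helper: rough checks for outliers, normality, and equal variances."""
    results = {}
    for name, g in (("group1", group1), ("group2", group2)):
        g = np.asarray(g, dtype=float)
        # Outliers: points beyond 1.5 * IQR from the quartiles (Tukey's fences, an assumed convention)
        q1, q3 = np.percentile(g, [25, 75])
        iqr = q3 - q1
        outliers = g[(g < q1 - 1.5 * iqr) | (g > q3 + 1.5 * iqr)]
        # Normality: Shapiro-Wilk test (H0: sample comes from a normal distribution)
        _, p_norm = stats.shapiro(g)
        results[name] = {
            "n_outliers": int(outliers.size),
            "normal_p": float(p_norm),
            "looks_normal": bool(p_norm > alpha),
        }
    # Homogeneity of variance: Levene's test (H0: equal population variances)
    _, p_var = stats.levene(group1, group2)
    results["levene_p"] = float(p_var)
    results["equal_var"] = bool(p_var > alpha)
    return results


print(check_assumptions([1, 2, 3, 4, 5, 100], [2, 3, 4, 5, 6, 7]))
```

If `equal_var` comes back `False`, one common approach is to pass `equal_var=False` to `ttest_ind` so that Welch's t-test is used instead.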

2. Hypothesis Model

The hypothesis model in a two-sample t-test for comparing the means of two groups is:

$H_0: \mu_1 - \mu_2 = 0$, i.e., there is no difference between the two population means

$H_A: \mu_1 - \mu_2 \neq 0$, i.e., there is a difference between the two population means

Hypothesized Difference

The hypothesized value for the true difference between the two means, $\Delta_0$, is almost always 0:

$\Delta_0 = 0$, i.e., the null hypothesis assumes no difference between the population means
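For reference, the standard pooled-variance (Student's) test statistic is shown below. With $\Delta_0 = 0$, this is the statistic `ttest_ind` computes when `equal_var=True`. The notation $n_1, n_2$ (group sizes), $s_1^2, s_2^2$ (sample variances), and $s_p$ (pooled standard deviation) is introduced here and does not appear elsewhere in these notes:

$$
t = \frac{(\bar{y}_1 - \bar{y}_2) - \Delta_0}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},
\qquad
df = n_1 + n_2 - 2
$$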

3. Running Independent T-test in Python

The scipy.stats module contains a ttest_ind function that runs Student's t-test on the means of TWO INDEPENDENT samples, hence the suffix "ind".

Importing function

To import the function, we use the following command:

from scipy.stats import ttest_ind

Running the t-test

To run the function, we pass two array-like objects containing the two independent samples whose means we want to compare:

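ttest_ind(a, b, axis=0, equal_var=True, alternative="two-sided")

# a two-tailed test is conducted by default; use alternative="less" or "greater" for a one-tailed test
# equal_var=False runs Welch's t-test, which does not assume equal population variances

Returns the t statistic and the p-value: (statistic, pvalue)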
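Here is a small worked run. It is a sketch on made-up numbers: the scores in `group_a` and `group_b` are hypothetical illustration data, not results from any study. It shows the default two-sided Student's test, the Welch variant, and a one-sided alternative. It also recomputes the pooled t statistic by hand to confirm that it matches SciPy's value.

```python
import numpy as np
from scipy.stats import ttest_ind

# Hypothetical scores for two independent groups (illustration data only)
group_a = np.array([82.0, 75.0, 90.0, 68.0, 77.0, 85.0, 79.0, 88.0])
group_b = np.array([70.0, 65.0, 80.0, 72.0, 60.0, 74.0, 69.0, 71.0])

# Student's t-test (pooled variance), two-sided by default
statistic, p_value = ttest_ind(group_a, group_b)

# Welch's t-test: drops the equal-variance assumption
welch_stat, welch_p = ttest_ind(group_a, group_b, equal_var=False)

# One-sided alternative, H_A: mu_a - mu_b > 0
greater_stat, greater_p = ttest_ind(group_a, group_b, alternative="greater")

# Manual pooled-variance t statistic (Delta_0 = 0), for comparison
n1, n2 = len(group_a), len(group_b)
s1_sq, s2_sq = group_a.var(ddof=1), group_b.var(ddof=1)
sp = np.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
t_manual = (group_a.mean() - group_b.mean()) / (sp * np.sqrt(1 / n1 + 1 / n2))

print(f"Student: t = {statistic:.3f}, p = {p_value:.4f}")
print(f"Welch:   t = {welch_stat:.3f}, p = {welch_p:.4f}")
print(f"One-sided (greater): p = {greater_p:.4f}")
print(f"Manual pooled t = {t_manual:.3f}")
```

When the observed difference lies in the hypothesized direction (t > 0 for `alternative="greater"`), the one-sided p-value is half the two-sided one.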

Alpha Significance Level

Note: ttest_ind() from scipy.stats does NOT take an alpha parameter; it only returns the test statistic and the p-value. You choose the significance level yourself (conventionally $\alpha = 0.05$) and compare the p-value against it:

statistic, p_value = ttest_ind(a, b)
alpha = 0.05  # chosen by the analyst, not a ttest_ind argument
reject_null = p_value < alpha

4. Interpreting the Results

Given the chosen significance level ($\alpha$), we either:

Reject the null hypothesis if $p < \alpha$: there is sufficient evidence that the difference between the two group means is unlikely to be due to chance alone, i.e., the population means are DIFFERENT.
Fail to reject the null hypothesis if $p \geq \alpha$: there is insufficient evidence of a difference between the two group means; the observed difference is consistent with chance (which does not prove that the means are equal).
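The decision rule above can be written as a small helper. This is a minimal sketch: the function name `interpret_ttest` and the default `alpha=0.05` are assumptions for illustration (alpha is the analyst's choice, not a SciPy setting).

```python
from scipy.stats import ttest_ind


def interpret_ttest(a, b, alpha=0.05, equal_var=True):
    """Hypothetical helper: run an independent t-test and apply the p-value decision rule."""
    statistic, p_value = ttest_ind(a, b, equal_var=equal_var)
    if p_value < alpha:
        decision = "Reject H0: the group means differ significantly."
    else:
        decision = "Fail to reject H0: insufficient evidence of a difference in means."
    return statistic, p_value, decision


print(interpret_ttest([10, 11, 12, 13, 14], [1, 2, 3, 4, 5]))
```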

Practical Significance

Note that p-values only tell us whether the results are statistically significant. They do not tell us the MAGNITUDE of the result, i.e., its practical significance: whether the effect is large or strong enough to have any value in the real world.

We can measure the magnitude of the result with an effect size such as Cohen's d, i.e., how big the difference between the two group means is, expressed in standard-deviation units, in the context of the two-sample t-test.
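Cohen's d for two independent samples is commonly computed as the difference in means divided by the pooled standard deviation (the same pooling that Student's t-test uses). SciPy's `ttest_ind` does not report it, so this is a minimal hand-rolled sketch; the function name `cohens_d` is hypothetical.

```python
import numpy as np


def cohens_d(a, b):
    """Hypothetical helper: Cohen's d using the pooled standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    # Pooled variance with (n - 1) sample variances, as in the equal-variance t-test
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


print(cohens_d([10, 11, 12, 13, 14], [1, 2, 3, 4, 5]))
```

Cohen's widely cited rule-of-thumb benchmarks are roughly 0.2 (small), 0.5 (medium) and 0.8 (large). These are conventions, not strict cut-offs.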

Therefore, even if the p-value is statistically significant (SS), we still need to consider the magnitude of the difference.