Computing the Confidence Interval (CI)

tags: #statistics/inferential/ttest/two_sample

Formula: CI of Independent T-tests

The formula for the confidence interval of the difference between two means in a two-sample independent t-test with unequal variances (Welch's t-test) is:

$$ \mathrm{CI}(\bar{x_{1}} - \bar{x_{2}}) = (\bar{x_{1}} - \bar{x_{2}}) \pm t^{*} \cdot \mathrm{SE}(\bar{x_{1}} - \bar{x_{2}}) $$

where:

CI is the confidence interval
$\bar{x_{1}}$ and $\bar{x_{2}}$ are the means of the two samples
$t^{*}$ is the critical t-value for the desired confidence level and degrees of freedom
SE is the standard error of the difference between the means, computed as follows:

$$ \mathrm{SE}(\bar{x_{1}} - \bar{x_{2}}) = \sqrt{\frac{s_{1}^{2}}{n_{1}} + \frac{s_{2}^{2}}{n_{2}}} $$

A pooled variance can be used instead if the assumption of equal population variances is reasonable (this can be checked with Levene's test). When the assumption holds, pooling gives a more precise variance estimate and typically a narrower confidence interval; the unpooled formula above can still be used either way.
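As a minimal sketch of the pooled alternative, assuming the standard pooled-variance estimator (each sample variance weighted by its own degrees of freedom). The helper name `pooled_se` is hypothetical and only used for illustration:

```python
import numpy as np

def pooled_se(a, b):
    """Standard error of mean(a) - mean(b), assuming equal population variances."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    # Pooled variance: sample variances weighted by (n - 1)
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return np.sqrt(sp2 * (1 / n1 + 1 / n2))

se_pooled = pooled_se([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print("Pooled SE:", se_pooled)
```

When both samples have the same size, this value coincides with the unpooled standard error; the two differ only for unequal sample sizes.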

Degrees of Freedom

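For a two-sample t-test with pooled (equal) variances, the degrees of freedom (DOF) are:

$$ dof = n_{1} + n_{2} - 2 $$

For the unequal-variances formula above, the DOF are usually approximated with the Welch-Satterthwaite equation instead:

$$ dof \approx \frac{\left(\frac{s_{1}^{2}}{n_{1}} + \frac{s_{2}^{2}}{n_{2}}\right)^{2}}{\frac{(s_{1}^{2}/n_{1})^{2}}{n_{1} - 1} + \frac{(s_{2}^{2}/n_{2})^{2}}{n_{2} - 1}} $$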
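As a minimal sketch, the Welch-Satterthwaite approximation (the usual DOF choice when the variances are not pooled) can be computed directly from the sample variances. The helper name `welch_dof` is hypothetical:

```python
import numpy as np

def welch_dof(a, b):
    """Welch-Satterthwaite approximate degrees of freedom for two samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Per-sample variance of the mean: s^2 / n
    v1 = a.var(ddof=1) / len(a)
    v2 = b.var(ddof=1) / len(b)
    return (v1 + v2) ** 2 / (v1 ** 2 / (len(a) - 1) + v2 ** 2 / (len(b) - 1))

dof_welch = welch_dof([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print("Welch DOF:", dof_welch)
```

The result is generally not an integer and never exceeds n1 + n2 - 2, so the Welch interval is at least as wide as the pooled one for the same standard error.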

Computing the CI in Python

Important Modules

from scipy.stats import t
import numpy as np

The t object in the scipy.stats module represents Student's t-distribution. It exposes methods such as cdf (cumulative distribution function), pdf (probability density function), and ppf (percent point function, the inverse of the CDF).
You can use it to compute properties of the t-distribution, such as the probability of observing a given t-value or the critical t-value for a given significance level.
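A short illustration of these methods. The degrees of freedom (8) and the t-value (2.0) are arbitrary values chosen for the example:

```python
from scipy.stats import t

dof_example = 8  # arbitrary degrees of freedom for illustration

# Critical value for a two-sided 95% interval: 2.5% in each tail
t_crit_example = t.ppf(0.975, dof_example)

# cdf is the inverse of ppf, so this recovers 0.975
p_below = t.cdf(t_crit_example, dof_example)

# Two-sided p-value for an observed t-value of 2.0 (sf = 1 - cdf)
p_two_sided = 2 * t.sf(2.0, dof_example)

# Density of the t-distribution at 0
density_at_zero = t.pdf(0.0, dof_example)

print(t_crit_example, p_below, p_two_sided, density_at_zero)
```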

Example:

# Generate two independent samples
sample1 = np.array([1, 2, 3, 4, 5])
sample2 = np.array([2, 4, 6, 8, 10])

# Compute the means and standard deviations of the samples
mean1 = np.mean(sample1)
mean2 = np.mean(sample2)
std1 = np.std(sample1, ddof=1) 
std2 = np.std(sample2, ddof=1)

The ddof ("delta degrees of freedom") parameter of np.std() sets the divisor used in the calculation to (n - ddof).
By default, ddof=0, which gives the population standard deviation (divisor n).
If you set ddof=1, the function uses the sample standard deviation formula with (n - 1) in the denominator.
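The effect of ddof can be seen on a small array; the variable names here are only for illustration:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# Sum of squared deviations from the mean is 10
pop_std = np.std(x)           # sqrt(10 / 5), divisor n
samp_std = np.std(x, ddof=1)  # sqrt(10 / 4), divisor n - 1

print("Population std:", pop_std)
print("Sample std:", samp_std)
```

The sample version is always the larger of the two, and the gap shrinks as n grows.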

# Compute the standard error of the difference between the means
se = np.sqrt((std1**2 / len(sample1)) + (std2**2 / len(sample2)))

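# Set the significance level (alpha) and the degrees of freedom
alpha = 0.05
# n1 + n2 - 2 is the pooled (equal-variance) DOF; because both samples
# have the same size here, the SE above equals the pooled SE
df = len(sample1) + len(sample2) - 2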

# Compute the two-sided critical t-value (alpha is split across both tails)
t_crit = t.ppf(1 - alpha / 2, df)

# Compute the confidence interval
ci_lower = (mean1 - mean2) - t_crit * se
ci_upper = (mean1 - mean2) + t_crit * se

# Print the results
print("Mean difference:", mean1 - mean2)
print("Confidence interval:", (ci_lower, ci_upper))
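As a sanity check, a two-sided confidence interval for the mean difference should contain 0 exactly when the two-sided p-value of the corresponding t-test exceeds alpha. The sketch below assumes the pooled-variance setting (DOF = n1 + n2 - 2) and a two-sided critical value taken at 1 - alpha / 2, and compares against scipy.stats.ttest_ind with equal_var=True:

```python
import numpy as np
from scipy.stats import t, ttest_ind

sample1 = np.array([1, 2, 3, 4, 5])
sample2 = np.array([2, 4, 6, 8, 10])
alpha = 0.05

n1, n2 = len(sample1), len(sample2)
dof = n1 + n2 - 2

# Pooled variance and standard error of the mean difference
sp2 = ((n1 - 1) * sample1.var(ddof=1) + (n2 - 1) * sample2.var(ddof=1)) / dof
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))

# Two-sided confidence interval for mean1 - mean2
diff = sample1.mean() - sample2.mean()
t_crit = t.ppf(1 - alpha / 2, dof)
ci = (diff - t_crit * se, diff + t_crit * se)

# Pooled-variance t-test on the same data
res = ttest_ind(sample1, sample2, equal_var=True)
contains_zero = ci[0] < 0 < ci[1]

print("CI:", ci)
print("p-value:", res.pvalue)
print("Consistent:", contains_zero == (res.pvalue > alpha))
```

For the unequal-variances case, the same check would use equal_var=False together with the Welch-Satterthwaite degrees of freedom.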