Computing the Confidence Interval (CI)
tags: #statistics/inferential/ttest/two_sample
Formula: CI of Independent T-tests
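The formula for computing the confidence interval of a two-sample independent t-test with unequal variances is:
$$CI = (\bar{x}_1 - \bar{x}_2) \pm t \cdot SE$$
where:
- $CI$ is the confidence interval
- $\bar{x}_1$ and $\bar{x}_2$ are the means of the two samples
- $t$ is the critical t-value for the desired confidence level and degrees of freedom
- $SE$ is the standard error of the difference between the means, computed as follows:
$$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
with $s_1^2, s_2^2$ the sample variances and $n_1, n_2$ the sample sizes.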
A pooled variance is used if the assumption of equal population variances is reasonable (this can be checked with Levene's test). Pooling increases the precision of the estimate and therefore typically narrows the confidence interval. However, the unpooled formula above remains valid whether or not the variances are equal, so it can be used by default.
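Below is a minimal sketch comparing the two approaches on the example data used later in this note. It uses `scipy.stats.levene` for the equal-variance check. The pooled formula is the standard one: $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$ with $SE = s_p\sqrt{1/n_1 + 1/n_2}$. The variable names are illustrative.

```python
import numpy as np
from scipy.stats import levene

sample1 = np.array([1, 2, 3, 4, 5])
sample2 = np.array([2, 4, 6, 8, 10])
n1, n2 = len(sample1), len(sample2)
var1 = np.var(sample1, ddof=1)  # sample variance, divisor n1 - 1
var2 = np.var(sample2, ddof=1)  # sample variance, divisor n2 - 1

# Levene's test: H0 = the population variances are equal
lev_stat, lev_p = levene(sample1, sample2)

# Pooled variance: weighted average of the sample variances (equal-variance assumption)
sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
se_pooled = np.sqrt(sp2 * (1 / n1 + 1 / n2))

# Unpooled (Welch) standard error: valid whether or not the variances are equal
se_welch = np.sqrt(var1 / n1 + var2 / n2)

print(f"Levene p-value: {lev_p:.3f}")
print(f"Pooled SE: {se_pooled:.4f}, unpooled SE: {se_welch:.4f}")
```

With equal sample sizes, as here, the two SEs coincide: both are $\sqrt{2.5} \approx 1.581$. Any narrowing of the pooled interval therefore comes from its larger degrees of freedom. Note that `levene` centres on the median by default, which is the Brown–Forsythe variant.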
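For the pooled (equal-variance) two-sample t-test, the degrees of freedom (DOF) are $df = n_1 + n_2 - 2$. For the unequal-variance interval above, the Welch–Satterthwaite approximation is used instead:
$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$
This value never exceeds $n_1 + n_2 - 2$. Using $n_1 + n_2 - 2$ together with the unpooled SE therefore gives an interval that is slightly too narrow.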
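As a worked check, take the example arrays used later in this note. Their sample variances are $s_1^2 = 2.5$ and $s_2^2 = 10$, with $n_1 = n_2 = 5$. Substituting these into the unpooled SE and the Welch–Satterthwaite formula gives:

$$
\begin{aligned}
\frac{s_1^2}{n_1} &= \frac{2.5}{5} = 0.5, \qquad \frac{s_2^2}{n_2} = \frac{10}{5} = 2 \\
SE &= \sqrt{0.5 + 2} = \sqrt{2.5} \approx 1.581 \\
df_{\text{Welch}} &= \frac{(0.5 + 2)^2}{\frac{0.5^2}{4} + \frac{2^2}{4}} = \frac{6.25}{1.0625} \approx 5.88 \\
df_{\text{pooled}} &= 5 + 5 - 2 = 8
\end{aligned}
$$

The Welch value is smaller than $n_1 + n_2 - 2$. Its critical t-value is therefore somewhat larger, and the interval somewhat wider.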
Computing the CI in Python
Important Modules
from scipy.stats import t
import numpy as np
- The `t` object in the `scipy.stats` module represents Student's t-distribution. Its methods include `cdf` (cumulative distribution function), `pdf` (probability density function), and `ppf` (percent point function, the inverse of the CDF).
- You can use it to calculate properties of the t-distribution. Examples are the probability of observing a given t-value and, via `ppf`, the critical t-value for a given significance level.
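Here is a short sketch of the `t` object's main methods. The values `df = 8` and `t_obs = 2.5` are illustrative only and are not taken from the example below. A two-sided critical value uses the `1 - alpha/2` quantile; `1 - alpha` gives the one-sided value.

```python
from scipy.stats import t

df = 8        # illustrative degrees of freedom (n1 + n2 - 2 for two samples of size 5)
alpha = 0.05  # significance level

# Two-sided critical value: alpha/2 in each tail, so use the 0.975 quantile
t_crit = t.ppf(1 - alpha / 2, df)

# One-sided critical value (all of alpha in one tail) is smaller
t_one_sided = t.ppf(1 - alpha, df)

# ppf is the inverse of cdf, so this recovers 0.975
print(t.cdf(t_crit, df))

# The distribution is symmetric about 0, so the lower quantile is -t_crit
print(t.ppf(alpha / 2, df))

# Density at the peak of the distribution (t = 0)
print(t.pdf(0, df))

# Two-sided p-value for a hypothetical observed statistic, via the survival function sf = 1 - cdf
t_obs = 2.5
p_two_sided = 2 * t.sf(abs(t_obs), df)
print(t_crit, t_one_sided, p_two_sided)
```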
Example:
# Generate two independent samples
sample1 = np.array([1, 2, 3, 4, 5])
sample2 = np.array([2, 4, 6, 8, 10])
# Compute the means and standard deviations of the samples
mean1 = np.mean(sample1)
mean2 = np.mean(sample2)
std1 = np.std(sample1, ddof=1)
std2 = np.std(sample2, ddof=1)
- The `ddof` ("delta degrees of freedom") parameter of `np.std()` sets the divisor used in the calculation to `N - ddof`, where `N` is the number of elements.
- By default, `ddof=0`, which corresponds to the population standard deviation.
- If you set `ddof=1`, the function uses the sample standard deviation formula with `(n - 1)` in the denominator.
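The first example sample shows the difference concretely. Its squared deviations from the mean sum to 10, so the population SD is $\sqrt{10/5}$ and the sample SD is $\sqrt{10/4}$. This sketch checks `np.std` against a hand computation.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
n = len(x)

pop_sd = np.std(x)             # ddof=0 (default): divides by n
sample_sd = np.std(x, ddof=1)  # ddof=1: divides by n - 1

# The same sample standard deviation computed by hand
manual_sd = np.sqrt(np.sum((x - x.mean()) ** 2) / (n - 1))

print(pop_sd, sample_sd, manual_sd)

# The two estimates differ by a factor of sqrt(n / (n - 1))
print(np.isclose(sample_sd, pop_sd * np.sqrt(n / (n - 1))))
```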
# Compute the standard error of the difference between the means
se = np.sqrt((std1**2 / len(sample1)) + (std2**2 / len(sample2)))
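# Set the significance level (alpha) for a 95% confidence interval
alpha = 0.05
# Welch-Satterthwaite degrees of freedom (matches the unpooled standard error above)
v1 = std1**2 / len(sample1)
v2 = std2**2 / len(sample2)
df = (v1 + v2)**2 / (v1**2 / (len(sample1) - 1) + v2**2 / (len(sample2) - 1))
# Compute the two-sided critical t-value: alpha is split between both tails
t_crit = t.ppf(1 - alpha / 2, df)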
# Compute the confidence interval
ci_lower = (mean1 - mean2) - t_crit * se
ci_upper = (mean1 - mean2) + t_crit * se
# Print the results
print("Mean difference:", mean1 - mean2)
print("Confidence interval:", (ci_lower, ci_upper))
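The hand computation can be sanity-checked against SciPy's `scipy.stats.ttest_ind` with `equal_var=False`, which runs Welch's test. This sketch assumes the Welch–Satterthwaite degrees of freedom. It checks three things:
- the t-statistic equals the mean difference divided by SE;
- the p-value matches a two-sided tail probability;
- the 95% CI excludes 0 exactly when p < 0.05.

```python
import numpy as np
from scipy.stats import t, ttest_ind

# Same example data as above
sample1 = np.array([1, 2, 3, 4, 5])
sample2 = np.array([2, 4, 6, 8, 10])
n1, n2 = len(sample1), len(sample2)

# Variance of each sample mean, unpooled SE, and Welch-Satterthwaite df
v1 = np.var(sample1, ddof=1) / n1
v2 = np.var(sample2, ddof=1) / n2
se = np.sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
diff = sample1.mean() - sample2.mean()

# 95% CI computed by hand
t_crit = t.ppf(0.975, df)
ci = (diff - t_crit * se, diff + t_crit * se)

# SciPy's t-test; equal_var=False selects the unequal-variance (Welch) version
res = ttest_ind(sample1, sample2, equal_var=False)

# 1) The t-statistic should equal diff / se
print(res.statistic, diff / se)
# 2) The p-value should match a two-sided tail probability with the Welch df
print(res.pvalue, 2 * t.sf(abs(diff / se), df))
# 3) Duality: the 95% CI excludes 0 exactly when p < 0.05
excludes_zero = ci[0] > 0 or ci[1] < 0
print(ci, excludes_zero == (res.pvalue < 0.05))
```

Recent SciPy versions also attach a `confidence_interval()` method to the object returned by `ttest_ind`, which can serve as a further cross-check. Check the documentation for your installed version before relying on it.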