V. Statistics
1) Statistical Modules in Python
| File | Comments | Tags |
|---|---|---|
| Introduction to SciPy | How to install and import | |
| Introduction to statsmodel.api | How to install and import | |
2) Overview: How to Conduct Hypothesis Tests
| File | Comments |
|---|---|
| _Introduction to Hypothesis Testing | Overview of how to conduct Hypothesis Testing |
| Direction of Alternative Hypothesis Testing | One-tailed (Upper, Lower) and Two-Tailed testing. |
| Interpreting P-values Against Alpha | Misconceptions about p-values and how to interpret them with respect to alpha. |
| Null Sampling Distribution | About the Null Sampling Distribution |
| Pitfalls to Statistical Analysis | Threats and biases to statistical analysis (e.g., p-hacking, data-dependent analysis). |
| Type I and Type II Errors | Trade-offs between Type I and Type II errors, with a note on which error should take precedence. |
| Understanding Confidence Intervals | What confidence intervals (CIs) are and how to interpret them in hypothesis testing. |
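
The direction of the alternative hypothesis and the comparison of the p-value against alpha can be sketched with a one-sample t-test. The data, the null value `mu0`, and alpha = 0.05 are illustrative assumptions, and the `alternative` argument assumes a reasonably recent SciPy release.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=52.0, scale=5.0, size=40)  # simulated data (assumption)
mu0 = 50.0     # hypothesised population mean under H0 (assumption)
alpha = 0.05   # significance level (assumption)

results = {}
for alternative in ("two-sided", "greater", "less"):
    # "greater" is the upper-tailed test, "less" the lower-tailed test.
    t_stat, p_value = stats.ttest_1samp(sample, popmean=mu0, alternative=alternative)
    results[alternative] = p_value
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"{alternative:>9}: t = {t_stat:.3f}, p = {p_value:.4f} -> {decision}")
```

The two one-tailed p-values sum to 1, and the two-tailed p-value is twice the smaller of them, which is why the direction has to be chosen before looking at the data.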
Types of Sampling Distributions
| File | Comments |
|---|---|
| The F-Distribution | Continuous probability distribution that serves as the null distribution of the F-statistic (F-ratio) used in ANOVA. |
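
As a small illustration of how the F-distribution is used as a null distribution, the sketch below computes a critical value and an upper-tail p-value with `scipy.stats.f`. The degrees of freedom and the observed F value are hypothetical.

```python
from scipy import stats

alpha = 0.05
dfn, dfd = 2, 27   # hypothetical: 3 groups and 30 observations (k - 1 and N - k)

# Critical value: the F value that cuts off the upper alpha tail of the null distribution.
f_crit = stats.f.ppf(1 - alpha, dfn, dfd)

# Upper-tail p-value for a hypothetical observed F-ratio.
f_obs = 4.2
p_value = stats.f.sf(f_obs, dfn, dfd)

print(f"F critical = {f_crit:.3f}, p-value for F = {f_obs}: {p_value:.4f}")
```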
Assumptions Check & Data Transformations
| File | Comments | Assumption Check |
|---|---|---|
| Levene's Test | Test for homogeneity of variance. | Homogeneity of Variance |
| Shapiro-Wilk Test | Test for normality of distribution for small sample sizes (n<5000). | Normality |
| Kolmogorov-Smirnov Test for Normality | Test for normality for large sample sizes. | Normality Test |
| Winsorizing Outliers | Replacing extreme values in the dataset with values at specified percentiles. | Outliers |
| Transformation for Normality | Techniques for transforming skewed data to satisfy the Normality assumption. | Transformation, Normality |
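
The checks in the table above might be run roughly as follows. This is a sketch on simulated data: the group sizes, the 5% winsorizing limits, and the log transform are illustrative choices, not recommendations from the notes.

```python
import numpy as np
from scipy import stats
from scipy.stats.mstats import winsorize

rng = np.random.default_rng(0)
group_a = rng.normal(10, 2, size=50)                    # simulated groups (assumption)
group_b = rng.normal(11, 2, size=50)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=200)   # right-skewed sample

# Levene's test for homogeneity of variance (SciPy centres on the median by default).
lev_stat, lev_p = stats.levene(group_a, group_b)

# Shapiro-Wilk test for normality.
sw_stat, sw_p = stats.shapiro(group_a)

# Kolmogorov-Smirnov test against a normal distribution. The parameters are
# estimated from the same sample, so the p-value is only approximate.
ks_stat, ks_p = stats.kstest(skewed, "norm", args=(skewed.mean(), skewed.std(ddof=1)))

# Winsorizing: replace the lowest and highest 5% with the nearest retained values.
wins = winsorize(skewed, limits=(0.05, 0.05))

# One possible normality transformation for right-skewed, positive data.
logged = np.log(skewed)

print(f"Levene p = {lev_p:.3f}, Shapiro-Wilk p = {sw_p:.3f}, K-S p = {ks_p:.3g}")
print(f"skew before = {stats.skew(skewed):.2f}, after log = {stats.skew(logged):.2f}")
```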
3) Running different statistical tests in Python
Statistical Power & Power Analysis
| File | Comments |
|---|---|
| Introduction to Statistical Power (Analysis) | How to conduct a power analysis and how to use power to find the required sample size prior to an experiment. |
| Power Curves | Using plot_power() to show how the statistical power varies as a function of effect size and sample size, at a given alpha. |
| Reporting Effect Sizes | Effect size should be reported alongside a statistically significant result as an indicator of the magnitude of the effect; i.e., it measures the STRENGTH of the relationship and shows whether the effect is trivial or substantial. |
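
A power analysis for an independent two-sample t-test could look like the sketch below, using `TTestIndPower` from statsmodels. The effect size (Cohen's d = 0.5), alpha, and target power are conventional placeholder values, not values taken from the notes.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Required sample size per group before the experiment
# (assumed inputs: d = 0.5, alpha = 0.05, power = 0.80, equal group sizes).
n_per_group = float(
    analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8,
                         ratio=1.0, alternative="two-sided")
)

# Power actually achieved with a hypothetical 30 observations per group.
achieved = float(
    analysis.power(effect_size=0.5, nobs1=30, alpha=0.05,
                   ratio=1.0, alternative="two-sided")
)

print(f"required n per group: {n_per_group:.1f}")
print(f"power with n = 30 per group: {achieved:.3f}")
```

The same object also exposes `plot_power()`, which draws power curves over a range of sample sizes and effect sizes when matplotlib is available.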
Python: T-tests
| File | Comments | Type of Analysis |
|---|---|---|
| _Introduction to t-tests | - | - |
| {Test} One-Sample T-tests | - | Bivariate |
| {Test} Paired (Dependent) T-test | Univariate inferential statistical technique used for within-group experimental designs (to examine the effect of an IV on a DV before and after an intervention). | Univariate |
| {Test} Two-Sample (Independent) T-tests | For EQUAL VARIANCES. | Bivariate |
| {Test} Welch's T-tests for Unequal Variances | T-tests for unequal variances. | Bivariate |
| Confidence Interval for Population Mean (One-Sample t-test) | Computing the CI of a population mean using the t.interval() function. | - |
| Confidence Interval of Two-Sample T-tests | Formula and code for computing the CI of two-sample t-tests. | - |
| Effect Size for T-tests | Covers only Cohen's d for comparing two group means and Pearson's r for quantifying the magnitude of the difference in t-tests. | - |
| How to Compute the t-critical Values | Computing t-critical values with respect to the alpha significance level and dof. | - |
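
Several of the notes in this table map onto a handful of `scipy.stats` calls. The following is a sketch on simulated data; the group means, sizes, 95% confidence level, and alpha = 0.05 are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(100, 10, size=30)   # simulated independent groups (assumption)
b = rng.normal(106, 10, size=35)

# Two-sample t-test assuming equal variances, and Welch's t-test without that assumption.
student = stats.ttest_ind(a, b, equal_var=True)
welch = stats.ttest_ind(a, b, equal_var=False)

# Paired (dependent) t-test on simulated before/after measurements.
before = rng.normal(70, 8, size=25)
after = before + rng.normal(2, 3, size=25)
paired = stats.ttest_rel(before, after)

# 95% CI for the population mean of `a` via t.interval().
dof = len(a) - 1
ci_low, ci_high = stats.t.interval(0.95, dof, loc=a.mean(), scale=stats.sem(a))

# Two-tailed t-critical value at alpha = 0.05 for the same degrees of freedom.
t_crit = stats.t.ppf(1 - 0.05 / 2, dof)

# Cohen's d using the pooled standard deviation.
pooled_sd = np.sqrt(
    ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
    / (len(a) + len(b) - 2)
)
cohens_d = (a.mean() - b.mean()) / pooled_sd

print(f"Student: t = {student.statistic:.3f}, p = {student.pvalue:.4f}")
print(f"Welch:   t = {welch.statistic:.3f}, p = {welch.pvalue:.4f}")
print(f"Paired:  t = {paired.statistic:.3f}, p = {paired.pvalue:.4f}")
print(f"95% CI for mean of a: ({ci_low:.2f}, {ci_high:.2f}), t-critical = {t_crit:.3f}")
print(f"Cohen's d = {cohens_d:.3f}")
```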
Python: ANOVAs
| File | Comments | Type of Analysis |
|---|---|---|
| _Introduction to ANCOVA | Comparison of adjusted means across 3 or more groups, controlling for a covariate (i.e., an independent variable that can influence the DV but is not itself of interest). | - |
| _Introduction to ANOVAs | ANOVAs for comparing the means of three or more groups. Includes assumptions and conditions for conducting ANOVA and an explanation of the Multiple Hypothesis Problem. | - |
| {Test} Post-Hoc Tukey HSD | A statistically significant ANOVA result only tells you that a difference exists. The post-hoc test finds which groups are significantly different from each other. | Pairwise Comparison |
| {Test} One-way ANCOVA | Conducting one-way ANCOVA. | Bivariate |
| {Test} One-way ANOVA | One-way ANOVA is one type of ANOVA used to compare the difference between the means of a continuous dependent variable across three or more independent groups of a SINGLE categorical variable. | Bivariate |
| {Test} Two-way ANOVA | With REPLICATION. Used to compare the effect of two categorical IVs on a continuous DV, and whether there is an interaction between the two IVs on the outcome. | Multivariate |
| {Test} Welch's ANOVA | Welch's ANOVA is a modification of the traditional one-way ANOVA that can be used when the homogeneity of variance assumption is violated. | Bivariate |
| Bonferroni Correction | Alpha adjustment method for the Multiple Hypothesis Problem to account for the inflation of the Type I error in ANOVAs. | - |
| Interaction Plot | Interaction/Profile Plots to visualize the interaction effect in a two-way ANOVA if the interaction term is significant. | Interaction Effect |
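
A one-way ANOVA followed by a Bonferroni-adjusted set of pairwise comparisons can be sketched as below. The group names and data are hypothetical, and the follow-up uses plain pairwise t-tests with an adjusted alpha, which is a simpler stand-in for Tukey's HSD rather than the same procedure.

```python
from itertools import combinations

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
groups = {                                   # hypothetical groups and data
    "control": rng.normal(50, 5, size=20),
    "dose_low": rng.normal(53, 5, size=20),
    "dose_high": rng.normal(58, 5, size=20),
}

# Omnibus test: is there any difference among the group means?
f_stat, p_anova = stats.f_oneway(*groups.values())
print(f"One-way ANOVA: F = {f_stat:.3f}, p = {p_anova:.4f}")

# Bonferroni correction: divide alpha by the number of pairwise comparisons.
alpha = 0.05
pairs = list(combinations(groups, 2))
alpha_adj = alpha / len(pairs)

for g1, g2 in pairs:
    t_stat, p_pair = stats.ttest_ind(groups[g1], groups[g2])
    verdict = "significant" if p_pair < alpha_adj else "not significant"
    print(f"{g1} vs {g2}: p = {p_pair:.4f} ({verdict} at adjusted alpha = {alpha_adj:.4f})")
```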
4) Miscellaneous Code Snippets
| File | Comments | Tags |
|---|---|---|
| Unpacking Lists for Statistical Tests | Application of the * symbol in unpacking subgroups into individual arguments of a statistical function. | |
| Formatting Test Statistics and P-values in Python | User-defined functions to return significance of p-values as a string of stars and to return a rounded p-value. | |
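
Both snippets can be illustrated together. In this sketch the helper names `p_to_stars` and `format_p` are hypothetical, the star cut-offs (0.05, 0.01, 0.001) are a common convention assumed here, and the subgroup values are made up.

```python
from scipy import stats


def p_to_stars(p):
    """Return a significance label for a p-value (assumed cut-offs: .05, .01, .001)."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"


def format_p(p, digits=3):
    """Return a rounded p-value string, using 'p < ...' for very small values."""
    threshold = 10 ** -digits
    if p < threshold:
        return f"p < {threshold:.{digits}f}"
    return f"p = {p:.{digits}f}"


# Made-up subgroups stored in a list of lists.
subgroups = [
    [4.1, 5.0, 4.8, 5.3, 4.6],
    [5.9, 6.2, 5.5, 6.8, 6.1],
    [7.0, 6.7, 7.4, 7.9, 7.2],
]

# The * operator unpacks the list so that each subgroup becomes its own argument.
f_stat, p_value = stats.f_oneway(*subgroups)
print(f"F = {f_stat:.2f}, {format_p(p_value)} {p_to_stars(p_value)}")
```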