Introduction to Hypothesis Testing

tags: #statistical_application #hypothesis_testing

What is a hypothesis?

A hypothesis is an assumption about a population parameter which we will either support or reject based on empirical evidence.

How do we develop a hypothesis?

We develop a hypothesis from existing theory or from empirical evidence/research identified through a literature review.

What is the difference between a research question and a hypothesis?

The research question is the question that the study sets out to answer, whereas the hypothesis is the tentative answer to the research question.

Note: we say tentative (i.e., unconfirmed) because empirical evidence is required to either reject or support the hypothesis.

Hypothesis Testing

Hypothesis testing is a form of inferential statistics^[1] where we make generalizations (inferences) about a population parameter based on a sample statistic by quantifying evidence against the null.

Used to validate a hypothesis (assumption) about a population parameter based on a sample

Standard Method: Null-Hypothesis Significance Testing (NHST)

Null-Hypothesis Significance Testing

NHST combines Fisher's p-value, which quantifies evidence against the null, with the Neyman-Pearson decision framework, which prescribes how to use that measure to decide whether to reject or fail to reject the null.

Neyman-Pearson

Derived from the Neyman-Pearson approach to hypothesis testing, NHST considers two competing hypotheses:

(1) Null Hypothesis ( $H_{0}$ )
(2) Alternative Hypothesis ( $H_{A}$ )

Significance testing is conducted under the assumption that the null is true $\to$ hence, it quantifies evidence against the null.

Fisher's P-Value

Fisher's p-value is used to determine the statistical significance of a result.

The p-value is the probability of observing a sample statistic that is at least as extreme as the observed statistic, assuming the null is true.

If observations are sufficiently unlikely from the POV of the null hypothesis, this should be treated as evidence against the null.
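As a minimal sketch of this definition, consider a hypothetical coin-flip example (the numbers are illustrative and not from the lecture): under the null that the coin is fair, the p-value for seeing 9 heads in 10 flips is the probability of 9 or more heads.

```python
from math import comb


def binomial_upper_p_value(n_flips: int, observed_heads: int, p_null: float = 0.5) -> float:
    """Return P(X >= observed_heads) for X ~ Binomial(n_flips, p_null)."""
    return sum(
        comb(n_flips, k) * p_null**k * (1 - p_null) ** (n_flips - k)
        for k in range(observed_heads, n_flips + 1)
    )


# Hypothetical data: 9 heads in 10 flips, null hypothesis "the coin is fair".
p_value = binomial_upper_p_value(n_flips=10, observed_heads=9)
print(f"p-value = {p_value:.4f}")  # 11/1024, about 0.0107
```

A result this unlikely under the null would usually be read as evidence against the coin being fair.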

Overview: Conducting a Hypothesis Test

Pre-check: Assumptions and Conditions

Before conducting any statistical test, make sure all assumptions and conditions are satisfied. Otherwise, results will not be interpretable.

Step 1: Generate a hypothesis model

Guidelines for Generating a Hypothesis Model

Hypotheses are always expressed in terms of population parameters^[2]
Null hypothesis is expressed as a statement of equality
Direction of the alternative hypothesis depends on the context of the question - i.e., whether you want to test whether the population parameter is different, greater, or smaller than your claim

Null Hypothesis ( $H_{0}$ )

The Null Hypothesis is a point-valued assumption about the true value of the population parameter.

Posits that the observed phenomenon is due to chance.

This is the assumption we want to "disprove", but is assumed correct unless there is evidence to oppose it.

Point-Valued Assumption

The point-valued assumption is used as the basis for testing whether the observed data provide enough evidence to reject the null in favour of an alternative hypothesis that suggests a different value for the parameter.

Caveat: About "disproving" the null

We cannot directly prove that the null is true or false; we can only assess how likely the observed data are under it. Hypothesis testing is conducted under the condition that the null hypothesis is true - i.e., it measures the probability of observing a statistic at least as extreme as the one observed, given that the null is true.

Null hypothesis assumes no difference or change. Expressed as a statement of equality, such that:

$H_{0}: \text{Parameter} = \text{Claim}$

Alternative Hypothesis ( $H_{A}$ )

This is what you are testing for - that the true parameter does NOT equal the value claimed under the null.

Posits that the null is not true and that the observed phenomenon is NOT due to chance.

Represents what is logically implied when the null is False.

Expressed in one of the 3 statements about the population parameter depending on the directionality of the test:

$\text{Parameter} < \text{Claim}$, One-tail Lower

$\text{Parameter} > \text{Claim}$, One-tail Upper

$\text{Parameter} \neq \text{Claim}$, Two-tailed
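As a worked illustration, assume the parameter of interest is a population mean $\mu$ and the claimed value is $\mu_{0}$ (both symbols are introduced here for the example and are not from the lecture). The hypothesis model could then be written with one of the three alternatives:

$$
\begin{aligned}
H_{0} &: \mu = \mu_{0} \\
H_{A} &: \mu < \mu_{0} \quad \text{(one-tail lower)} \\
H_{A} &: \mu > \mu_{0} \quad \text{(one-tail upper)} \\
H_{A} &: \mu \neq \mu_{0} \quad \text{(two-tailed)}
\end{aligned}
$$

Note that the alternative uses a strict inequality, since the case of equality belongs to the null.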

Step 2: Alpha Significance Level

The alpha significance level ($\alpha$) is the threshold for determining whether the observed effect is statistically significant - i.e., unlikely to have occurred by chance alone under the null. It does so by defining the Rejection (Critical) Regions.

This represents the maximum allowable probability of rejecting the null hypothesis when it is True, and specifies how strongly the sample evidence must contradict the null hypothesis before you can reject the null for the entire population.

The lower the significance level, the stronger the evidence required before you will reject the null.
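To make the link between alpha and the rejection regions concrete, here is a minimal sketch that assumes the test statistic follows a standard normal distribution under the null (a z-test, which is an assumption for illustration). The function name and tail labels are hypothetical.

```python
from statistics import NormalDist


def critical_values(alpha: float, tail: str) -> tuple:
    """Return the z critical value(s) bounding the rejection region."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "lower":
        # Reject when z falls below this value.
        return (std_normal.inv_cdf(alpha),)
    if tail == "upper":
        # Reject when z falls above this value.
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "two-tailed":
        # Alpha is split evenly between the two tails.
        upper = std_normal.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    raise ValueError("tail must be 'lower', 'upper', or 'two-tailed'")


for alpha in (0.05, 0.01, 0.001):
    low, high = critical_values(alpha, "two-tailed")
    print(f"alpha={alpha}: reject if z < {low:.3f} or z > {high:.3f}")
```

As alpha shrinks, the critical values move further into the tails, which is the sense in which a lower alpha demands stronger evidence.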

Alpha should be set before the study - otherwise, it opens the door to p-value hacking.

Common Alpha Levels

In practice, 0.01, 0.05, and 0.001 are the most commonly used values for alpha, representing a 1%, 5%, and 0.1% chance of a Type I error occurring (i.e. rejecting the null hypothesis when it is in fact correct).
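The reading of alpha as a Type I error rate can be checked with a small simulation. The sketch below assumes a two-tailed z-test with known standard deviation, and draws every sample from a population where the null is true; the sample size, number of trials, and seed are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulate_type_i_error_rate(alpha, n_trials=10_000, sample_size=20, seed=0):
    """Fraction of true-null experiments in which a two-tailed z-test rejects."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_trials):
        # The null is true by construction: population mean 0, sigma 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        z = (fmean(sample) - 0.0) / (1.0 / sqrt(sample_size))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_trials


for alpha in (0.05, 0.01, 0.001):
    rate = simulate_type_i_error_rate(alpha)
    print(f"alpha={alpha}: simulated Type I error rate = {rate:.4f}")
```

The simulated rejection rates should land close to the chosen alpha, up to simulation noise.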

Which alpha significance level should you use?

Depends on the consequences of making a Type I or Type II error
i.e., a lower alpha would reduce the chance of making a Type I error
However, a higher alpha would increase the chance of detecting a significant effect (a lower alpha requires stronger evidence to contradict the null)
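This trade-off can be sketched numerically. The example below assumes a one-tail upper z-test with known standard deviation and computes its power (the probability of rejecting the null when a real effect exists, i.e. one minus the Type II error rate). The effect size, sigma, and sample size are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_z_test_power(alpha, true_shift, sigma, n):
    """Power of a one-tail upper z-test when the true mean exceeds the claim by true_shift."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Under the alternative, the z statistic is centred here instead of at 0.
    shift_in_se_units = true_shift / (sigma / sqrt(n))
    return 1 - std_normal.cdf(z_crit - shift_in_se_units)


# Hypothetical scenario: true mean is 0.5 above the claim, sigma = 1, n = 25.
for alpha in (0.001, 0.01, 0.05):
    power = upper_tail_z_test_power(alpha, true_shift=0.5, sigma=1.0, n=25)
    print(f"alpha={alpha}: power = {power:.3f}, Type II error rate = {1 - power:.3f}")
```

Lowering alpha reduces Type I errors but also lowers power, so the Type II error rate rises for the same sample size.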

P-value Hacking

This is an exploitation of data analysis in order to discover patterns which would be presented as statistically significant, when in reality, there is no underlying effect.
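One common mechanism behind this, offered here as an illustration and not as the lecture's own example, is running many tests and reporting only the significant ones. Assuming the tests are independent and every null is true, the chance of at least one false positive grows quickly with the number of tests:

```python
def prob_at_least_one_false_positive(alpha: float, n_tests: int) -> float:
    """P(at least one rejection) across n_tests independent tests of true nulls."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 20):
    prob = prob_at_least_one_false_positive(alpha=0.05, n_tests=n_tests)
    print(f"{n_tests} tests at alpha=0.05: P(any false positive) = {prob:.3f}")
```

With 20 such tests at alpha = 0.05, a "significant" result is more likely than not even when no underlying effect exists.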

Step 3: Testing The Hypothesis

Compute the test statistic

Hypothesis tests are conducted by computing a test statistic from the sample.

What is a Test Statistic?

The test statistic is a numerical summary of the data used to assess evidence against the null hypothesis as a measure of how far the observed data deviate from what would be expected under the null hypothesis.
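As one concrete instance, the sketch below computes a one-sample z statistic, which measures how many standard errors the sample mean lies from the value claimed under the null. It assumes the population standard deviation is known; the data and parameter names are hypothetical.

```python
from math import sqrt
from statistics import fmean


def one_sample_z_statistic(sample, claimed_mean, population_sd):
    """z = (sample mean - claimed mean) / (population sd / sqrt(n))."""
    standard_error = population_sd / sqrt(len(sample))
    return (fmean(sample) - claimed_mean) / standard_error


# Hypothetical data: four measurements, null claim that the mean is 50, sigma = 2.
z = one_sample_z_statistic([51, 53, 52, 52], claimed_mean=50, population_sd=2)
print(f"z = {z:.2f}")  # sample mean 52 is 2 standard errors above the claim
```

Other settings (unknown standard deviation, proportions, two groups) call for different statistics, such as a t statistic.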

The appropriate test statistic, and the corresponding Null Sampling Distribution, depend on:

The type of data being dealt with; and
Scope of the research question (i.e., context in which the study is being conducted in)

This will dictate the type of experimental design to be conducted to answer the research question.

Determine the p-value

The p-value is a measure of statistical significance (i.e., whether the test statistic is statistically significant).

This represents the probability (the fraction of times you would see) of observing a test statistic at least as extreme as your statistic, using the null sampling distribution.

When computing the p-value of the test statistic, we compute a tail probability of the null sampling distribution. For a one-tail lower test, this is the cumulative probability of observing a test statistic smaller than or equal to your observed test statistic; for a one-tail upper test, it is the probability of observing a value greater than or equal to it; for a two-tailed test, both tails are counted.

Note: The exact method of computing the p-value depends on the type of hypothesis test and the distribution of the test statistic under the null hypothesis.
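A minimal sketch of how the tail direction changes the computation, assuming the test statistic is standard normal under the null (a z-test). The function name and tail labels are hypothetical.

```python
from statistics import NormalDist


def z_test_p_value(z: float, tail: str) -> float:
    """p-value of an observed z statistic under a standard normal null distribution."""
    cdf = NormalDist().cdf
    if tail == "lower":
        return cdf(z)  # P(Z <= z)
    if tail == "upper":
        return 1 - cdf(z)  # P(Z >= z)
    if tail == "two-tailed":
        return 2 * (1 - cdf(abs(z)))  # both tails beyond |z|
    raise ValueError("tail must be 'lower', 'upper', or 'two-tailed'")


for tail in ("lower", "upper", "two-tailed"):
    print(f"z = 2.0, {tail}: p-value = {z_test_p_value(2.0, tail):.4f}")
```

The same observed statistic can therefore give very different p-values depending on the direction of the alternative hypothesis.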

[Figure: null sampling distribution P(x) with the observed test statistic marked. P-value of test statistic = cumulative probability of observing the test statistic.]

Step 4: Interpreting P-values Against the Alpha

It is important to remember that, regardless of the p-value obtained in the hypothesis test, the result does not by itself establish the truth or falsity of the alternative hypothesis, because the p-value is computed under the assumption that the null is true.
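The comparison itself can be sketched as a simple decision rule. This assumes the common convention of rejecting when the p-value is less than or equal to alpha (some texts use a strict inequality); the function name is hypothetical.

```python
def decide(p_value: float, alpha: float) -> str:
    """Compare a p-value against a pre-specified alpha."""
    if not 0 <= p_value <= 1:
        raise ValueError("p_value must be between 0 and 1")
    if p_value <= alpha:
        return "reject the null"
    # Not rejecting is not the same as showing that the null is true.
    return "fail to reject the null"


print(decide(p_value=0.03, alpha=0.05))
print(decide(p_value=0.03, alpha=0.01))
```

The same p-value leads to different decisions under different alpha levels, which is why alpha is fixed before looking at the data.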

^[1] INF 1344 Statistics - Lecture 1
^[2] This is because we are interested in making inferences about the population parameter.