Interpreting the LR Model

tags: #ML/supervised/regression

Statistical Significance

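A statistically significant coefficient indicates that there is evidence of an association between the predictor (x) and the outcome (y) variable, i.e., its p-value falls below the chosen significance level. Many regression tables flag this with asterisks to the right of each predictor's estimate, with the number of asterisks corresponding to a specific level of significance. The sample output below reports the p-values directly in the P>|t| column rather than printing asterisks.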

* - p < 0.05

** - p < 0.01

*** - p < 0.001

Sample Output:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:               response   R-squared:                       0.850
Model:                            OLS   Adj. R-squared:                  0.840
Method:                 Least Squares   F-statistic:                     84.54
Date:                Sun, 24 Apr 2023   Prob (F-statistic):           4.69e-22
Time:                        12:00:00   Log-Likelihood:                -247.43
No. Observations:                  50   AIC:                             502.9
Df Residuals:                      46   BIC:                             510.0
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef.    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         14.5802      6.452      2.258      0.029       1.538      27.622
predictor1     0.6568      0.103      6.392      0.000       0.449       0.864
predictor2     0.4647      0.118      3.933      0.000       0.226       0.703
predictor3    -0.0006      0.001     -0.452      0.654      -0.003       0.002
==============================================================================
Omnibus:                        0.044   Durbin-Watson:                   1.811
Prob(Omnibus):                  0.978   Jarque-Bera (JB):                0.201
Skew:                          -0.031   Prob(JB):                        0.905
Kurtosis:                       2.648   Cond. No.                     8.36e+03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 8.36e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
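
As a minimal sketch of how the asterisk legend maps onto the P>|t| column of the sample output, the snippet below assigns a flag to each p-value. The function name `significance_stars` is hypothetical, and the p-values are the rounded figures printed in the table (a printed 0.000 means the p-value is below 0.0005).

```python
def significance_stars(p_value: float) -> str:
    """Return the asterisk flag for a p-value, using the legend in these notes."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""  # not significant at the 0.05 level


# P>|t| values copied from the sample output (printed to 3 decimals).
p_values = {
    "const": 0.029,
    "predictor1": 0.000,
    "predictor2": 0.000,
    "predictor3": 0.654,
}

stars = {name: significance_stars(p) for name, p in p_values.items()}

for name, flag in stars.items():
    print(f"{name:<12} p={p_values[name]:.3f} {flag}")
```

Under this legend, predictor1 and predictor2 are flagged at the 0.001 level, the constant at the 0.05 level, and predictor3 is not flagged at all.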

1. Estimated Regression Coefficient b

The regression coefficient of the IDV is represented by the parameter b, where:

$$y = bx + b_0$$

This means that for every one-unit increase in the IDV, the predicted DV increases or decreases by the value of the regression coefficient, depending on the sign of the coefficient (holding any other IDVs in the model constant).

Example:

             Equation 1          Equation 2
             ----------          ----------
             b        beta       b        beta
             -------------------------------------
Education    0.076*   0.220      0.041*   0.

Dependent Variable: Self-reported health
Independent Variable: Education
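
To make the "one-unit increase" reading concrete, here is a small sketch using the Equation 1 coefficient for Education (b = 0.076). The intercept value and the function name `predict_health` are assumptions made up for illustration, since the notes do not report an intercept for this example.

```python
B_EDUCATION = 0.076  # b for Education, Equation 1 in the example above
B0 = 2.0             # hypothetical intercept, chosen only for illustration


def predict_health(education: float) -> float:
    """Predicted self-reported health from y = b*x + b0."""
    return B_EDUCATION * education + B0


# Difference in predicted health between two people one unit of Education apart.
change = predict_health(13) - predict_health(12)
print(round(change, 3))
```

Whatever intercept is assumed, the difference between the two predictions equals b, which is what "for every unit increase" means.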

2. Estimated Regression Coefficient b (Dummy)

When reporting the regression coefficient of a dummy variable, you are making a DIRECT comparison to the reference group (the group that is omitted).

Example:

[Image: Screen Shot 2023-04-24 at 6.49.37 PM.png]

Categorical Variable:

  1. Sex - Males have a self-rated health that is on average 0.114 HIGHER than females

  2. Race - Whites have a self-rated health that is about 0.239 points HIGHER than non-whites

  3. Marital Status - Married people have a self-rated health that is 0.105 points HIGHER than that of single people.
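
The comparison to the reference group can be sketched as follows for the Sex example. This assumes the dummy is coded 1 for males and 0 for females (the omitted group), which matches the interpretation above; the intercept and the function name `predict_health` are hypothetical, and any other predictors are treated as held constant.

```python
B_MALE = 0.114  # coefficient on the Sex dummy, from the example above
B0 = 3.0        # hypothetical intercept (not given in these notes)


def predict_health(male: int) -> float:
    """male = 1 for males, 0 for females (the omitted reference group)."""
    return B0 + B_MALE * male


reference_prediction = predict_health(0)  # females: intercept only
gap = predict_health(1) - predict_health(0)  # males relative to females
print(round(gap, 3))
```

The dummy coefficient is the gap between the two groups, so the reference group never gets a coefficient of its own.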

3. Y-intercept (constants)

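The y-intercept (or constant), b0, represents the predicted value of the DV when every IDV in the model equals 0.

When the model includes dummy variables, it also represents the predicted value of the DV for the reference group (with any other IDVs equal to 0).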
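
One way to see what the constant does, using the coefficients from the sample output earlier in these notes: when every predictor is set to 0, the prediction collapses to the `const` row. The function name `predict_response` is an assumption for illustration.

```python
# Coefficients copied from the "coef" column of the sample output.
COEFS = {
    "const": 14.5802,
    "predictor1": 0.6568,
    "predictor2": 0.4647,
    "predictor3": -0.0006,
}


def predict_response(predictor1: float, predictor2: float, predictor3: float) -> float:
    """Predicted response from the fitted equation in the sample output."""
    return (
        COEFS["const"]
        + COEFS["predictor1"] * predictor1
        + COEFS["predictor2"] * predictor2
        + COEFS["predictor3"] * predictor3
    )


at_zero = predict_response(0, 0, 0)
print(at_zero)
```

Whether a prediction at all-zero predictors is meaningful depends on whether 0 is a realistic value for each predictor in the data.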

4. t-value

t-tests are used to test the statistical significance of the coefficients in a linear regression model:

H0: The coefficient is equal to 0.

HA: The coefficient is different from 0.

For each regression output, there is a corresponding t-value for each variable (the coefficient divided by its standard error), which is used to assess whether the regression coefficient is significantly different from zero.
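
As a rough check against the sample output, each t-value can be recomputed as the coefficient divided by its standard error. The printed figures are rounded, so the recomputed values only approximately match the t column (predictor3 in particular, whose standard error is printed with a single significant digit). The critical value of about 2.013 for a two-sided 5% test with 46 residual degrees of freedom is an approximation brought in for this sketch.

```python
# (coef, std err) pairs copied from the sample output.
rows = {
    "const": (14.5802, 6.452),
    "predictor1": (0.6568, 0.103),
    "predictor2": (0.4647, 0.118),
    "predictor3": (-0.0006, 0.001),
}

T_CRITICAL = 2.013  # approximate two-sided 5% critical value, 46 residual df


def t_value(coef: float, std_err: float) -> float:
    """t statistic for H0: coefficient = 0."""
    return coef / std_err


recomputed = {name: t_value(coef, se) for name, (coef, se) in rows.items()}
significant = {name: abs(t) > T_CRITICAL for name, t in recomputed.items()}

for name, t in recomputed.items():
    print(f"{name:<12} t={t:7.3f} significant at 0.05: {significant[name]}")
```

The conclusions line up with the P>|t| column: the constant, predictor1 and predictor2 exceed the critical value, while predictor3 does not.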

5. Coefficient of Determination ($R^2$)

The most common measure of how accurate the model's predictions are is the Coefficient of Determination ($R^2$) (i.e., how good is the model at making predictions?).

This is measured as:

$$R^2 = 1 - \frac{SSE_{\text{regression}}}{SSE_{\text{mean only}}}$$

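where $SSE_{\text{regression}}$ is the sum of squared errors around the regression line and $SSE_{\text{mean only}}$ is the sum of squared errors around the mean of the DV (i.e., when the mean alone is used as the prediction).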

This reflects the proportion of the total variation in the DV that is explained by the IDV(s) in the MLR model.
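
The ratio of the two sums of squared errors can be worked through by hand on a tiny dataset. The sketch below fits a one-predictor least-squares line in plain Python and compares its errors with those of a mean-only prediction; the data and the function names `fit_line` and `r_squared` are made up for illustration.

```python
def fit_line(x, y):
    """Least-squares slope b and intercept b0 for y = b*x + b0."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    b0 = mean_y - b * mean_x
    return b, b0


def r_squared(x, y):
    """1 - SSE(regression) / SSE(mean only)."""
    b, b0 = fit_line(x, y)
    mean_y = sum(y) / len(y)
    sse_regression = sum((yi - (b * xi + b0)) ** 2 for xi, yi in zip(x, y))
    sse_mean_only = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse_regression / sse_mean_only


# Hypothetical data, chosen so the arithmetic is easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r2 = r_squared(x, y)
print(round(r2, 3))
```

For this data the regression line leaves a squared error of 2.4 against 6.0 for the mean alone, so the line explains 60% of the variation in y.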

Interpretation

$R^2$ will never decrease (and in practice almost ALWAYS increases) as variables are added to the model, whether or not they are useful predictors
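
Because of this, the sample output earlier in these notes also prints an Adj. R-squared, which penalizes the number of predictors. A minimal sketch of the standard adjustment, checked against that output (R-squared 0.850, 50 observations, 3 predictors); the function name `adjusted_r_squared` is an assumption for illustration.

```python
def adjusted_r_squared(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R-squared: 1 - (1 - R2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Values read from the sample output: R-squared, No. Observations, Df Model.
adj = adjusted_r_squared(0.850, 50, 3)
print(round(adj, 3))
```

The result agrees with the Adj. R-squared of 0.840 shown in the sample output, up to rounding of the printed R-squared.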

$R^2$ ranges from 0.0 to 1.0 such that:

1. $R^2 = 1.0$ indicates that by using the linear regression model, we have:

  • Reduced the uncertainty by 100%: the model is a perfect fit for the data
  • Independent variable(s) in the model account for 100% of the variation in the DV
  • All observations fall on the regression line and prediction error = 0.0

2. $R^2 = 0$ indicates that using the linear regression model with the IDV, X, to predict the DV, Y, does NOT improve the prediction of Y over simply using the mean of Y

  • This indicates a POOR FIT, or a well-fitting line with $b = 0$ (i.e., a slope of 0).
  • A line that counts as well-fitting by $R^2$ must have a non-zero slope b.
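
The two extremes can be reproduced with toy data. In the sketch below, the first dataset lies exactly on a line and the second has a least-squares slope of exactly 0, so the line is no better than the mean. The data and the function name `r_squared` are hypothetical.

```python
def r_squared(x, y):
    """R-squared of a one-predictor least-squares line, in plain Python."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx               # slope
    b0 = mean_y - b * mean_x    # intercept
    sse_regression = sum((yi - (b * xi + b0)) ** 2 for xi, yi in zip(x, y))
    sse_mean_only = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse_regression / sse_mean_only


x = [1, 2, 3, 4]
y_on_line = [3, 5, 7, 9]     # exactly y = 2x + 1, so prediction error is 0
y_zero_slope = [1, 2, 2, 1]  # fitted slope is 0, so the line equals the mean

r2_perfect = r_squared(x, y_on_line)
r2_none = r_squared(x, y_zero_slope)
print(r2_perfect, r2_none)
```

In the second case the fitted line is flat at the mean of y, so its squared error is identical to the mean-only error and the ratio in the formula equals 1.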