Introduction to statsmodels.api
tags: #python/statistic_modules
statsmodels is a Python module that provides classes and functions for estimating many different statistical models, conducting statistical tests, and exploring data statistically.
It supports specifying models using R-style formulas and pandas DataFrames.
Installing statsmodels
# installing the package
pip install statsmodels
This downloads and installs the latest version of statsmodels. Once the installation is complete, import the statsmodels.api module in your Python code with the following statement.
Importing the module
import statsmodels.api as sm
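A quick way to confirm that the installation worked is to import the top-level package and print its version. This is only a sanity check; the version printed depends on your environment.

```python
import statsmodels

# The top-level package exposes its version string.
print(statsmodels.__version__)
```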
Popular Modules
statsmodels.formula.api: This module allows you to use R-style formulas to specify the models.
from statsmodels.formula.api import ols
formula = "dv~iv"
model = ols(formula, data=df)
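As a minimal sketch of the formula interface, the example below fits the `dv ~ iv` formula on a small DataFrame. The column names `dv` and `iv` follow the snippet above; the data values are made up for illustration only.

```python
import pandas as pd
from statsmodels.formula.api import ols

# Hypothetical data: dv is exactly 2 * iv + 1, so the fit is easy to check.
df = pd.DataFrame({"iv": [1.0, 2.0, 3.0, 4.0, 5.0]})
df["dv"] = 2.0 * df["iv"] + 1.0

# ols() only builds the model; fit() estimates the coefficients.
model = ols("dv ~ iv", data=df)
results = model.fit()

# The formula interface adds the intercept term automatically.
print(results.params)
```

Note that `ols(...)` returns an unfitted model, so estimates are only available on the object returned by `fit()`.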
statsmodels.api: This module provides a wide range of statistical models and methods, including regression analysis, ANOVA, time series analysis, and others.
# import
import statsmodels.api as sm
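# ANOVA: specify the model with a formula, fit it, then build the ANOVA table
model = sm.formula.ols("continuous_var ~ independent_var", data=df)
anova_table = sm.stats.anova_lm(model.fit())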
# Add a constant to the independent variable (for regression and logistic)
X = sm.add_constant(x)
# Fit the model
model = sm.OLS(y, X)
results = model.fit()  # linear regression
# Or, for a binary y, fit a logistic regression instead
model = sm.Logit(y, X)
results = model.fit()  # logistic regression
# Print the summary of the results
print(results.summary())
- statsmodels.graphics: This module provides functions for generating various statistical graphics, including residual plots, influence plots, and others.
- statsmodels.datasets: This module provides a set of datasets for testing and practicing statistical analysis.
- statsmodels.stats: This module provides statistical tests, confidence intervals, and other statistical methods.
- statsmodels.tsa: This module provides time series analysis models and methods.
- statsmodels.regression: This module provides linear regression models and methods, including ordinary, weighted, and generalized least squares (OLS, WLS, GLS). Logistic regression lives in statsmodels.discrete and is available as sm.Logit.
- statsmodels.tools: This module provides utility functions used alongside the models, such as add_constant for adding an intercept column and eval_measures for model evaluation metrics.