IV. ML Models
Refer to General Data Science for more specific topics.
Everything we do in ML/AI is a permutation of the following principles:
- For each ML algorithm, we have an underlying computational model with a set of learnable parameters (e.g., kNN, where k is a learnable parameter).
- To find the optimal set of learnable parameters for that computational model, we have a learning algorithm that searches for the set of parameters that most accurately maps the input to its output.
- We find this optimal set of learnable parameters by minimizing some measure of error (e.g., gradient descent, or the Perceptron Learning Rule (PLR) for binary logistic regression).
When learning a new model, identify its learnable parameters and understand how the optimal set of parameters is found (i.e., through which loss function).
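The three principles above can be sketched end-to-end for the simplest case. This is a minimal illustration (not from the notes): the model is simple linear regression with learnable parameters `w` and `b`, the error measure is mean squared error, and the learning algorithm is gradient descent. The data here is synthetic, generated with assumed true values w=3, b=2.

```python
import numpy as np

# Synthetic data with an assumed true relationship y = 3x + 2 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 3.0 * x + 2.0 + rng.normal(0, 0.5, 100)

# Computational model: y_hat = w*x + b (learnable parameters w, b).
w, b = 0.0, 0.0
lr = 0.01

# Learning algorithm: gradient descent on the mean squared error.
for _ in range(2000):
    err = (w * x + b) - y
    w -= lr * 2 * np.mean(err * x)  # dMSE/dw
    b -= lr * 2 * np.mean(err)      # dMSE/db

print(round(w, 1), round(b, 1))  # should land close to the true w and b
```

The same loop structure underlies most of the models below; only the model, the loss, and the update rule change.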
Supervised ML Algorithms: Regression
A regression task predicts continuous, real-valued numerical outputs from a set of input features.
Linear Regression
One of the assumptions of the linear regression model is that the underlying relationship is linear. Therefore, before fitting a linear regression, ensure that the relationship between the independent variables (IVs) and the dependent variable (DV) is LINEAR. Otherwise, use polynomial (non-linear) regression.
| File | Comments | Type |
|---|---|---|
| _About Linear Regression | - | Regression, Multivariate Analysis |
| Interpreting the LR Model | How to interpret the output of a LR model (regression coefficients, intercepts, p-values, t-statistics) | Output |
| Performance Metrics for Regression Models | R2 Score, MAE, MSE etc. | Model Evaluation |
| Regression Summary | Generating a regression summary based on the performance metrics. | Model Evaluation |
| Running Linear Regression with sklearn | Running LR with sklearn. | Code |
| Running Linear Regression with statsmodel.api | Running LR using statsmodel.api and ols as the regression model. | Code |
| Simple vs Multiple Linear Regression | Understanding the difference between simple and multiple linear regression (partial slopes). | - |
| Visualizing Predicted vs Actual | Visualizing predicted vs actual values in a scatterplot. | Scatterplot |
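As a quick companion to the sklearn entry in the table above, here is a minimal sketch (on synthetic data with an assumed true coefficient of 2.5) of fitting a linear regression and computing the performance metrics the table mentions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic data: assumed true relationship y = 2.5x + 1 plus noise.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, (200, 1))
y = 2.5 * X[:, 0] + 1.0 + rng.normal(0, 1.0, 200)

# Fit the model; coef_ and intercept_ are the learned parameters.
model = LinearRegression().fit(X, y)
pred = model.predict(X)

print("coef:", model.coef_[0], "intercept:", model.intercept_)
print("R2:", r2_score(y, pred), "MSE:", mean_squared_error(y, pred))
```

The fitted `coef_` and `intercept_` correspond to the regression coefficients and intercept discussed in "Interpreting the LR Model".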
Supervised ML Algorithms: Classification
Classification is a form of supervised ML used for predicting the class of a categorical target variable from a predefined, discrete set of labelled classes.
There are two main types of classification tasks:
- Binary - the goal is to classify instances into one of two classes. The output variable is a binary variable, which means that there are only two possible outcomes.
- Multi-class - the goal is to classify instances into one of three or more classes. The output variable is a categorical variable with three or more possible outcomes.
Logistic Regression
Binary (linear) logistic regression is a classification algorithm. It also assumes that the classes can be separated by a linear decision boundary; otherwise, use a non-linear classification algorithm.
| File | Comments | Type |
|---|---|---|
| _About Binary Classification | Binary classification using the Perceptron Learning Rule, and using an activation function to convert the binary classifier into a probabilistic Logistic Regression model. | Binary and Multi-class |
| Classification Reports | Computing the precision, recall and F1-score for each class. | Model Evaluation |
| Confusion Matrix | Evaluating a classification model using a confusion matrix based on 4 outcomes: TP, TN, FP, FN. Covers the 4 performance metrics, including the weighted F1-score for multi-class tasks, and how to generate a confusion matrix in Python. | Model Evaluation |
| Running Logit with statsmodel | Running Logit with statsmodel.api | Multivariate Analysis, Code |
Naive Bayes
| File | Comments | Type |
|---|---|---|
| _About Naive Bayes | Bayes Theorem in NB algorithm for both binary and multi-classification tasks. | Binary and Multiclass |
| Running Gaussian Naive Bayes with sklearn | Running Naive Bayes classification with sklearn. | Code |
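As a quick sketch of the sklearn entry above, here is Gaussian Naive Bayes on the built-in iris dataset (the dataset and split choices are assumptions for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Multi-class example using the iris dataset.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# GaussianNB applies Bayes' theorem with a Gaussian likelihood per feature.
nb = GaussianNB().fit(X_train, y_train)
acc = accuracy_score(y_test, nb.predict(X_test))
print(acc)
```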
k-Nearest Neighbours (kNN)*
Can be used for regression tasks
| File | Comments | Type |
|---|---|---|
| _About kNN | How the algorithm works using Euclidean distance. | Binary and Multi-class classification, Regression |
| {Code} Running k-NN with sklearn | Building a kNN classifier in Python using sklearn. | - |
| Finding the Optimal k | How to find the optimal k for knn. | - |
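The two code-oriented entries above can be combined in one short sketch: fitting a kNN classifier with sklearn and finding the optimal k via cross-validated accuracy (the iris dataset and the candidate range of k are assumptions for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try odd values of k and keep the mean 5-fold cross-validated accuracy.
scores = {}
for k in range(1, 20, 2):
    knn = KNeighborsClassifier(n_neighbors=k)  # Euclidean distance by default
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

Odd values of k avoid ties in binary classification; cross-validation guards against picking a k that only works for one particular train/test split.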
Decision Trees*
Can be used for regression tasks
| File | Comments | Type |
|---|---|---|
| _About Decision Trees | What decision trees are, how the algorithm determines which features to split the data on, and the importance of limiting tree depth. | Binary or multi-classification, Regression |
| {Code} Running Decision Trees in sklearn | Building a decision tree classifier in Python using sklearn. | Code |
| Graphical Representation & Interpreting a Decision Tree Output | How to interpret the output of a decision tree, and code for building a graphical representation of the tree in Python. | Output |
| Understanding Entropy and Information Gain | How entropy and information gain (IG) are used to select features of importance to split the data on. | Theory |
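The entries above come together in one short sketch: a decision tree trained with `criterion="entropy"` (so splits are chosen by information gain), a limited `max_depth` to control overfitting, and a text rendering of the fitted tree. The iris dataset and the depth of 3 are assumptions for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# criterion="entropy" picks splits by information gain;
# max_depth limits the depth of the tree to reduce overfitting.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X_train, y_train)

print(export_text(tree))  # text representation of the split rules
print(tree.score(X_test, y_test))
```

For a graphical (rather than text) rendering, `sklearn.tree.plot_tree` draws the same structure with matplotlib.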