
Evaluation measure: MSE versus MAE, RMSE

This comic explains MSE and MAE, two commonly used evaluation metrics for regression. MSE emphasizes large deviations because errors are squared, while MAE is more robust when outliers should not dominate the score. MSE is often preferred as a loss function because it penalizes larger errors more heavily and is smooth and differentiable, which suits mathematical optimization, stability, and statistical interpretation. RMSE is the square root of MSE; it also penalizes large errors, but is expressed in the same units as the target.
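As a quick illustration, here is a minimal pure-Python sketch of the three metrics; the function names and sample values are just for demonstration:

```python
import math

def mse(y_true, y_pred):
    # Mean Squared Error: average of squared residuals
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    # Mean Absolute Error: average of absolute residuals
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Root Mean Squared Error: same units as the target
    return math.sqrt(mse(y_true, y_pred))

y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
```

Note how the single large error (predicting 4.0 against a true value of 2.0) inflates MSE relative to MAE.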

Parameters and Loss function

Machine learning parameters are values learned from training data to minimize prediction errors. For example, in a uniform distribution for bus arrival times, the parameters a and b define the range. They are the model’s knobs for accurate predictions.
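For the uniform-distribution example, the maximum-likelihood estimates of a and b are simply the smallest and largest observed values. A minimal sketch with made-up waiting times:

```python
def fit_uniform(samples):
    # MLE for Uniform(a, b): the tightest interval covering the data
    return min(samples), max(samples)

arrivals = [7.2, 3.5, 9.9, 4.1, 6.0]  # hypothetical bus waiting times (minutes)
a, b = fit_uniform(arrivals)
```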

Unsupervised learning helps detect shady people

Unsupervised learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labeled responses. In unsupervised learning, the goal is to infer the natural structure present within… 

The model that’s not a girl & time machine

Comments: I already asked my student, and he confirmed that the reason he took the ML class was that there was a model in that class. So, Mr. Fox left the class after he… 

Supervised learning: who’s supervising the forest?

Supervised learning involves training an algorithm on labeled data and pairing input with correct output. Unsupervised learning uses unlabeled data to find patterns. For example, predicting pizza delivery tips involves features like time, pizza type, distance, and tip history, with the goal of predicting tip outcomes.

A comic guide to Train – test split + Python & R codes

After collecting and preprocessing the dataset, it is essential to divide it into two distinct sets: training set and testing set. The training set is used to train the model while the testing set is used to evaluate its performance. This allows assessment of the model’s generalization to new data. Two code examples in Python and R demonstrate how to create synthetic data and split it into training and testing sets using popular libraries.
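The split itself can be sketched in a few lines of plain Python (the blog’s code examples use popular libraries; this hand-rolled `train_test_split` and the synthetic data are only for illustration):

```python
import random

def train_test_split(X, y, test_size=0.25, seed=42):
    # Shuffle indices, then carve off a `test_size` fraction for testing
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(X) * test_size)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Synthetic data: targets are exactly twice the inputs
X = [[i] for i in range(20)]
y = [2 * i for i in range(20)]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
```

Shuffling before splitting matters: without it, any ordering in the data (e.g. by date) leaks into the split.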

Simple Linear Regression Review: Sunlight & Selfie

Simple linear regression is a statistical method used to model and analyze the relationship between two continuous variables. Specifically, it aims to predict the value of one variable (the dependent or response variable) based on… 

Residual plot for model diagnostic

Assessing assumptions such as linearity, constant variance, error independence, and normality of residuals is essential in linear regression. Residual plots visually assess the model’s goodness of fit, revealing patterns and influential data points. This post provides the Python & R code for the residual plot.

Simple Linear Regression & Least square method

Simple linear regression is a statistical method to model the relationship between two continuous variables, aiming to predict the dependent variable based on the independent variable. The regression equation is Y = a + bX, where Y is the dependent variable, X is the independent variable, a is the intercept, and b is the slope. The method of least squares minimizes the sum of squared residuals to find the best-fitting line coefficients.
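The least-squares formulas for the slope and intercept translate directly into code. This minimal sketch uses points generated exactly from Y = 1 + 2X, so the fit should recover a = 1 and b = 2:

```python
def least_squares(x, y):
    # Closed-form OLS estimates for Y = a + bX:
    #   b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    #   a = y_bar - b * x_bar
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    return a, b

x = [0, 1, 2, 3]
y = [1, 3, 5, 7]  # exactly Y = 1 + 2X
a, b = least_squares(x, y)
```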

Model ensembling

Model ensembling combines multiple models to improve overall performance by leveraging diverse data patterns. Bagging trains model instances on different data bootstraps, while Boosting corrects errors sequentially. Stacking combines models using a meta-model, and Voting uses majority/average predictions. Ensembles reduce variance without significantly increasing bias, but may complicate interpretation and computational cost.
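As a minimal illustration of the voting strategy, here is a hard-voting sketch where three toy “models” are just lists of class predictions (all names are illustrative):

```python
from collections import Counter

def majority_vote(predictions):
    # Hard voting: each model casts one vote per sample;
    # the most common label per column wins
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

# Three toy models that disagree on some samples
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]
ensemble = majority_vote([model_a, model_b, model_c])
```

With three binary voters there are no ties; for even numbers of models a tie-breaking rule is needed.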

AI, Stat, Math, coding cheat sheets & learning songs

Free cheatsheets & learning tricks pandas-numpy-sklearn mnemonic cheat sheet Machine Learning & Deep Learning formulas & properties Basic probability and statistics formula sheet classical missing data strategies clustering Set Identities with Intuitive Explanations Tips &… 

Encoding categorical data in python

Handling categorical data involves several steps to convert it into a format that machine learning algorithms can process effectively. Here are common methods used to handle categorical data: 1. Label Encoding Label encoding converts categorical… 
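Label encoding can be sketched without any library; this toy `label_encode` (an illustrative name, not a library API) maps each distinct category to an integer:

```python
def label_encode(values):
    # Map each distinct category (sorted for determinism) to an integer code
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

colors = ["red", "green", "blue", "green"]
codes, mapping = label_encode(colors)
```

A caveat worth remembering: the integer codes impose an artificial order, so for nominal categories one-hot encoding is usually safer.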

Kernel tricks, SVM properties & kernel choice

Some popular types of kernels in SVM: 1. Linear Kernel 2. Polynomial Kernel 3. Radial Basis Function (RBF) Kernel (Gaussian Kernel) 4. Sigmoid Kernel Visualizing the decision boundaries To visualize the decision boundaries, we’ll use… 
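As one concrete example, the RBF (Gaussian) kernel is short enough to write out directly; the `gamma` default here is an arbitrary choice for illustration:

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    # K(x, y) = exp(-gamma * ||x - y||^2)
    # Equals 1 when x == y and decays toward 0 as the points move apart
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)
```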

Logistic Regression: method + Python & R codes

Logistic regression & Bernoulli distribution Logistic regression is a statistical method used for analyzing datasets in which there are one or more independent variables that determine an outcome. The outcome is typically a binary variable,… 
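The link between logistic regression and the Bernoulli distribution runs through the sigmoid function, which maps a linear score to a probability. A minimal sketch (the weights and bias are assumed to be already fitted):

```python
import math

def sigmoid(z):
    # Maps any real score to a Bernoulli probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, weights, bias):
    # P(y = 1 | x) under a logistic regression model
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)
```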

AIC and BIC for Feature Selection

Akaike Information Criterion (AIC) Bayesian Information Criterion (BIC) Comparison and Use in Feature Selection By applying AIC and BIC in feature selection, we can make informed decisions about which features to include in a model,… 
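Both criteria are one-line formulas once the model’s log-likelihood is known; a minimal sketch (lower values are better for both):

```python
import math

def aic(log_likelihood, k):
    # AIC = 2k - 2 ln(L), where k is the number of parameters
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    # BIC = k ln(n) - 2 ln(L); the penalty grows with sample size n
    return k * math.log(n) - 2 * log_likelihood
```

Because ln(n) exceeds 2 once n is above about 7, BIC penalizes extra parameters more harshly than AIC on all but tiny datasets.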

K-Nearest Neighbors (KNN): an introduction

K-Nearest Neighbors (KNN) is a popular algorithm used for both classification and regression tasks. In KNN, the output is a class membership, which is assigned based on the majority of the k nearest data points.… 
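The majority-vote rule can be sketched in a few lines; `knn_predict` and the toy points below are illustrative, not a library API:

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # Classify x by majority vote among its k nearest training points
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(X_train, y_train))
    top_k = [label for _, label in dists[:k]]
    return Counter(top_k).most_common(1)[0][0]

# Two well-separated toy clusters
X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = [0, 0, 0, 1, 1, 1]
pred = knn_predict(X_train, y_train, [0.5, 0.5], k=3)
```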

Linear Discriminant Analysis Implementation in Python & R

Linear Discriminant Analysis (LDA) is a classifier that creates a linear decision boundary by fitting class-conditional densities to the data and applying Bayes’ rule. The model assumes that each class follows a Gaussian distribution with… 

Stepwise Feature Selection +example

Stepwise feature selection is a systematic approach to identifying the most relevant features for a predictive model by combining both forward and backward selection techniques. The process begins with either an empty model or the full model. Then, we… 

Backward feature selection + example

Backward feature selection iteratively removes the least significant feature from a model, guided here by adjusted R-squared. In this example, which predicts the number of nuts collected by squirrels, features like temperature and rainfall emerge as significant predictors. The process aims to finalize a model containing only the most influential features.

Forward feature selection: a step by step example

Forward feature selection starts with an empty model and adds features one by one. At each step, the feature that improves the model performance the most is added to the model. The process continues until… 
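A minimal sketch of this greedy loop, assuming NumPy is available; `rss`, `forward_select`, and the synthetic data (where only features 0 and 2 actually drive y) are illustrative names, not the post’s code:

```python
import numpy as np

def rss(X, y, cols):
    # Residual sum of squares of an OLS fit (with intercept) on `cols`
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def forward_select(X, y, n_features):
    selected = []
    while len(selected) < n_features:
        remaining = [c for c in range(X.shape[1]) if c not in selected]
        # Greedily add the candidate that lowers the RSS the most
        best = min(remaining, key=lambda c: rss(X, y, selected + [c]))
        selected.append(best)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = 3 * X[:, 0] + X[:, 2] + rng.normal(scale=0.1, size=50)
order = forward_select(X, y, 2)  # should pick features 0 and 2
```

In practice the stopping rule is usually a criterion such as adjusted R-squared, AIC, or cross-validated error rather than a fixed feature count.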

ElasticNet Regression: Method & Codes

ElasticNet regression is a regularized regression method that linearly combines both L1 and L2 penalties of the Lasso and Ridge methods. This allows it to perform both feature selection (like Lasso) and maintain some of… 
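The combined penalty itself is a one-liner. This sketch follows the common convention where `l1_ratio` blends the two terms (the parameter names mirror scikit-learn’s, but the function is illustrative):

```python
def elastic_net_penalty(weights, alpha=1.0, l1_ratio=0.5):
    # Blend of Lasso (L1) and Ridge (L2) penalties:
    #   alpha * (l1_ratio * sum|w| + 0.5 * (1 - l1_ratio) * sum w^2)
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w ** 2 for w in weights)
    return alpha * (l1_ratio * l1 + 0.5 * (1 - l1_ratio) * l2)
```

Setting `l1_ratio=1.0` recovers the pure Lasso penalty, and `l1_ratio=0.0` the pure Ridge penalty.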

Ridge regression: method & R codes

Motivation: recall the LASSO penalty. Ridge regression adds a penalty, the sum of the squares of the coefficients, to the loss function in linear regression. Ridge regression shrinks the… 
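One way to see the shrinkage concretely: in the special case of an orthonormal design matrix, ridge simply rescales each OLS coefficient by 1/(1 + λ). A minimal sketch of that special case (the function name is illustrative):

```python
def ridge_shrink(beta_ols, lam):
    # For an orthonormal design matrix, the ridge solution is
    # beta_ridge = beta_ols / (1 + lambda): uniform shrinkage toward 0
    return [b / (1.0 + lam) for b in beta_ols]
```

Unlike Lasso, this shrinkage never sets a coefficient exactly to zero.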

Lasso Regression and LassoCV: methods & Python codes

The Lasso (Least Absolute Shrinkage and Selection Operator) is a regression technique that enhances prediction accuracy and interpretability by applying L1 regularization to shrink coefficients. Unlike traditional regression methods, Lasso forces some coefficients to become… 
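The mechanism that forces coefficients exactly to zero is the soft-thresholding operator at the heart of Lasso’s coordinate-descent updates; a minimal sketch:

```python
def soft_threshold(z, alpha):
    # Proximal operator of the L1 penalty: shrinks z toward zero and
    # sets it exactly to zero whenever |z| <= alpha
    if z > alpha:
        return z - alpha
    if z < -alpha:
        return z + alpha
    return 0.0
```

This is why Lasso performs feature selection: any coefficient whose signal is weaker than the penalty is zeroed out entirely.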

Expectation Maximization (EM) & implementation

Expectation Maximization (EM) is an iterative algorithm used for finding maximum likelihood estimates of parameters in statistical models, particularly when the model involves latent variables (variables that are not directly observed). The algorithm is commonly… 

A comic guide to denoising noisy data

Handling noisy data is a crucial step in data preprocessing and analysis. In general, here are some common approaches to manage noisy data: 1. Data Cleaning 2. Data Transformation 3. Statistical Techniques 4. Machine Learning… 
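As one concrete smoothing technique, a centered moving average is easy to sketch in plain Python (the window size and sample series are arbitrary):

```python
def moving_average(values, window=3):
    # Smooth a noisy series: each point becomes the mean of its
    # centered window, truncated at the series boundaries
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

noisy = [1.0, 9.0, 2.0, 8.0, 3.0]
smooth = moving_average(noisy, window=3)
```

Larger windows suppress more noise but also blur genuine features of the signal.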

A comical guide to Missing Not At Random (MNAR)

Recall that Missing Not At Random (MNAR) is a type of missing data mechanism where the probability of missingness is related to the unobserved data itself. Here are some more examples of MNAR: In each… 

What’s Missing at Random (MAR)?

Missing at Random (MAR) is a statistical term indicating that the likelihood of data being missing is related to some of the observed data but not to the missing data itself. This means that the… 
