Supervised learning: who’s supervising the forest?

Supervised learning trains an algorithm on labeled data, pairing each input with its correct output; unsupervised learning instead finds patterns in unlabeled data. For example, predicting pizza delivery tips uses features like time, pizza type, distance, and tip history, with the tip itself as the label the model learns to predict.
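
As a minimal sketch of that setup, one could fit a random forest on a handful of labeled deliveries and ask it to predict the tip for a new one. The numbers and feature choices below are invented for illustration (and the categorical "pizza type" feature is omitted for simplicity); this assumes scikit-learn, not the post's actual data:

```python
# A toy labeled dataset: each delivery (input) is paired with the tip
# it actually received (correct output). All numbers are invented.
from sklearn.ensemble import RandomForestRegressor

# Features per delivery: [time of day (h), distance (km), past avg tip ($)]
X = [[18, 3.0, 2.5],
     [21, 7.5, 1.0],
     [12, 1.2, 3.0],
     [19, 5.0, 1.5]]
y = [2.5, 0.5, 3.0, 1.5]  # labels: the tips actually received ($)

model = RandomForestRegressor(random_state=0).fit(X, y)
print(model.predict([[20, 4.0, 2.0]]))  # predicted tip for an unseen delivery
```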

A comic guide to train-test split + Python & R codes

After collecting and preprocessing the dataset, it is essential to divide it into two distinct sets: a training set and a testing set. The training set is used to fit the model, while the testing set is used to evaluate its performance; this reveals how well the model generalizes to new data. Two code examples in Python and R demonstrate how to create synthetic data and split it into training and testing sets using popular libraries.
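
A sketch of the Python side (the post's exact code may differ; this version assumes NumPy and scikit-learn):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 100 rows, 3 features, and a noisy linear target.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=100)

# Hold out 20% of the rows for testing; the rest is used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)  # (80, 3) (20, 3)
```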

Residual plots for model diagnostics

Assessing assumptions like linearity, constant variance, independent errors, and normally distributed residuals is essential for linear regression. Residual plots visually assess the model's goodness of fit, revealing patterns and influential data points. This post provides Python and R code for the residual plot.
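
A minimal residual plot on synthetic data (a sketch assuming NumPy, scikit-learn, and matplotlib; the post's own code may differ):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Synthetic data with a genuinely linear relationship plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2.0 * X.ravel() + 1.0 + rng.normal(scale=1.5, size=100)

model = LinearRegression().fit(X, y)
fitted = model.predict(X)
residuals = y - fitted

# Residuals vs. fitted values: a patternless band around zero supports
# the linearity and constant-variance assumptions.
plt.scatter(fitted, residuals)
plt.axhline(0, linestyle="--", color="gray")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```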

Simple Linear Regression & Least square method

Simple linear regression is a statistical method for modeling the relationship between two continuous variables, predicting the dependent variable from the independent variable. The regression equation is Y = a + bX, where Y is the dependent variable, X is the independent variable, a is the intercept, and b is the slope. The method of least squares finds the best-fitting coefficients by minimizing the sum of squared residuals.
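
The least-squares coefficients have a closed form and can be computed directly. A small from-scratch sketch with made-up points, assuming NumPy:

```python
import numpy as np

# Made-up sample points roughly following y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

# Least squares: b = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
#                a = mean(y) - b * mean(x)
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
print(f"Y = {a:.2f} + {b:.2f}X")  # the best-fitting line
```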

Model ensembling

Model ensembling combines multiple models to improve overall performance by leveraging diverse data patterns. Bagging trains model instances on different bootstrap samples of the data, Boosting corrects errors sequentially, Stacking combines models through a meta-model, and Voting takes the majority or average of the individual predictions. Ensembles reduce variance without significantly increasing bias, but can complicate interpretation and increase computational cost.
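
As one concrete flavour, here is a sketch of hard voting over three diverse base models, assuming scikit-learn (the post may use different models or ensembling methods):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Hard voting: each base model casts one vote per sample and the
# majority class wins.
ensemble = VotingClassifier([
    ("lr", LogisticRegression(max_iter=1000)),
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("rf", RandomForestClassifier(random_state=0)),
])
print(cross_val_score(ensemble, X, y).mean())  # cross-validated accuracy
```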

Backward feature selection + example

Backward feature selection iteratively removes the least significant feature from a model, judged here by adjusted R-squared. In this example, which predicts the number of nuts collected by squirrels, features like temperature and rainfall survive the procedure as significant predictors. The process aims to end with a model containing only the most influential features.
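
A minimal sketch of the elimination loop: the squirrel data below is fabricated for illustration, and statsmodels is an assumption rather than the post's confirmed tooling:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical squirrel data: nuts collected as a function of weather.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "temperature": rng.normal(15, 5, 100),
    "rainfall": rng.normal(50, 10, 100),
    "noise": rng.normal(0, 1, 100),        # deliberately irrelevant
})
df["nuts"] = 3 * df["temperature"] + 0.5 * df["rainfall"] + rng.normal(0, 5, 100)

features = ["temperature", "rainfall", "noise"]
best = sm.OLS(df["nuts"], sm.add_constant(df[features])).fit().rsquared_adj

while len(features) > 1:
    # Score every one-feature removal by the adjusted R² of the refit model.
    scores = {}
    for f in features:
        remaining = [g for g in features if g != f]
        fit = sm.OLS(df["nuts"], sm.add_constant(df[remaining])).fit()
        scores[f] = fit.rsquared_adj
    drop, score = max(scores.items(), key=lambda kv: kv[1])
    if score <= best:
        break  # no removal improves adjusted R²; stop
    features.remove(drop)
    best = score

print(features)  # "noise" should be dropped; temperature and rainfall remain
```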

Estimating the sparse inverse covariance matrix (precision matrix) by Graphical Lasso (with Python implementation)

Graphical Lasso, also known as GLasso, is a statistical technique for estimating the sparse inverse covariance matrix (precision matrix) of a multivariate Gaussian distribution. Here, sparsity means that many elements of the matrix are zero; a zero entry indicates that the corresponding pair of variables is conditionally independent given the rest.
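
A minimal sketch using scikit-learn's GraphicalLasso estimator, as a stand-in for the post's full implementation; the toy precision matrix below is made up:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# A hand-built sparse precision matrix: the tridiagonal structure means
# variables 0 and 2 are conditionally independent given variable 1.
precision = np.array([[2.0, 0.6, 0.0],
                      [0.6, 2.0, 0.6],
                      [0.0, 0.6, 2.0]])
cov = np.linalg.inv(precision)

rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(3), cov, size=500)

# The L1 penalty (alpha) pushes weak partial correlations to exact zeros.
model = GraphicalLasso(alpha=0.05).fit(X)
print(np.round(model.precision_, 2))  # zeros mark conditional independence
```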
