K-Nearest Neighbors (KNN) imputation in sklearn

K-Nearest Neighbors (KNN) imputation is another method to handle missing data. For each instance (row) that contains missing values, it fills in those values using the ‘k’ closest instances. In sklearn, you can… 
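
To make the idea in this excerpt concrete, here is a minimal sketch using scikit-learn's KNNImputer. The toy array and the choice of n_neighbors=2 are assumptions made for illustration and are not taken from the post.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy data (assumed for illustration); np.nan marks the missing entries.
X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0, 8.0, 7.0],
])

# Each missing entry is replaced by the mean of that feature over the
# 2 nearest rows that have the feature observed. Distances are computed
# with a NaN-aware Euclidean metric.
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)

print(X_filled)
```

Observed values pass through unchanged; only the NaN cells are filled.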

A comic guide to mean/median/mode imputation & Python codes

Handling missing data is a common preprocessing task in machine learning. In scikit-learn, you can handle missing data by using imputation techniques provided by the SimpleImputer class or by employing other strategies like dropping rows/columns with missing… 
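
As a rough sketch of the two strategies this excerpt names, the example below fills missing values with SimpleImputer and, alternatively, drops incomplete rows. The toy array and the "mean" strategy are assumptions chosen for illustration; "median" and "most_frequent" are the other strategies the post title alludes to.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data (assumed for illustration); np.nan marks the missing entries.
X = np.array([
    [1.0, 2.0],
    [np.nan, 3.0],
    [7.0, np.nan],
])

# Strategy 1: replace each missing value with its column mean.
mean_imputer = SimpleImputer(strategy="mean")
X_mean = mean_imputer.fit_transform(X)

# Strategy 2: drop every row that contains at least one missing value.
X_dropped = X[~np.isnan(X).any(axis=1)]

print(X_mean)
print(X_dropped)
```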

SVD for dimension reduction

Singular Value Decomposition (SVD) is a powerful matrix decomposition technique that generalizes the concept of eigenvalue decomposition to non-square matrices. Eigenvalue decomposition specifically decomposes a square matrix into its constituent eigenvalues and eigenvectors. This decomposition… 
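
The sketch below illustrates the decomposition on a non-square matrix with NumPy and then keeps only the top k singular directions as a reduced representation. The random matrix, its shape, and k=2 are assumptions made for illustration.

```python
import numpy as np

# A non-square 6x4 matrix (random data, assumed for illustration).
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))

# Thin SVD: A = U @ diag(s) @ Vt, with singular values s in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Link to eigenvalue decomposition: the squared singular values of A
# are the eigenvalues of the square matrix A.T @ A.
eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]

# Dimension reduction: project the rows of A onto the top-k right
# singular vectors, giving a 6xk representation.
k = 2
Z = A @ Vt[:k].T

# Rank-k approximation of A rebuilt from the reduced representation.
A_k = Z @ Vt[:k]

print(Z.shape, np.linalg.norm(A - A_k))
```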

Test for outliers in multivariate data in Python

To test for outliers in multivariate data in Python, you can use several libraries, such as numpy, scipy, pandas, and sklearn. Here’s how you can do it: Mahalanobis distance using the SciPy library. The Mahalanobis distance is a statistical measure used… 
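
A minimal sketch of the Mahalanobis approach mentioned in this excerpt, using scipy.spatial.distance.mahalanobis. The synthetic data, the planted outlier, and the chi-square cutoff at the 0.999 quantile are all assumptions for illustration; the post's own threshold is not shown here.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis
from scipy.stats import chi2

# Synthetic 3-feature data (assumed), with one planted outlier in row 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[0] = [8.0, -8.0, 8.0]

# Mahalanobis distance needs the mean and the inverse covariance matrix.
mean = X.mean(axis=0)
VI = np.linalg.inv(np.cov(X, rowvar=False))
d = np.array([mahalanobis(row, mean, VI) for row in X])

# Assumed cutoff: for roughly Gaussian data the squared distance is
# approximately chi-square with df = number of features.
threshold = np.sqrt(chi2.ppf(0.999, df=X.shape[1]))
outliers = np.where(d > threshold)[0]

print(outliers)
```

Because the mean and covariance are estimated from data that includes the outliers, extreme points can mask themselves; a robust covariance estimate is a common alternative.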
