Multiple Imputation by Chained Equations (MICE) method & Python code

MICE (Multiple Imputation by Chained Equations) is a statistical method for handling missing data by creating multiple imputations, or “guesses”, for the missing values. It works by using a set of regression models to estimate each missing value from the other variables, cycling through every variable with missing data in turn. The results from all imputed datasets are then pooled into final estimates, making it a robust tool for datasets where data are Missing At Random (MAR).
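
To make the cyclical idea concrete, here is a minimal conceptual sketch (not the full MICE procedure, which draws from predictive distributions and repeats the whole process to obtain several imputed datasets for pooling): each column with missing entries is regressed on the other columns, and its missing values are replaced by the model's predictions, cycling over the columns for a fixed number of rounds.

import numpy as np
from sklearn.linear_model import LinearRegression

def mice_cycle(X, n_cycles=10):
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from a simple fill (column means), then refine column by column
    X_filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_cycles):
        for j in range(X.shape[1]):
            if not missing[:, j].any():
                continue  # nothing to impute in this column
            others = np.delete(X_filled, j, axis=1)
            obs = ~missing[:, j]
            # Regress column j on the other columns using the observed rows
            model = LinearRegression().fit(others[obs], X_filled[obs, j])
            # Replace the missing entries of column j with the predictions
            X_filled[missing[:, j], j] = model.predict(others[missing[:, j]])
    return X_filled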

IterativeImputer is a strategy for imputing missing values by modeling each feature with missing values as a function of other features in a round-robin fashion. Here is a simple example of how you can use IterativeImputer in Python:

from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
import numpy as np

# Example dataset with missing values
X = [[7, 2, 3], 
     [4, np.nan, 6], 
     [10, 5, 9],
     [np.nan, 2, 9], 
     [8, 4, np.nan]]

# Instantiate the IterativeImputer
imp = IterativeImputer(max_iter=10, random_state=0)

# Fit the imputer and transform the dataset
X_imputed = imp.fit_transform(X)

In the code above, max_iter=10 represents the maximum number of imputation rounds to perform before returning the imputed dataset, and random_state=0 is the seed for the random number generator.

After running this code, the X_imputed variable will contain the original dataset but with all the missing values imputed.
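
As a quick, optional sanity check (purely illustrative), you can confirm that no NaNs remain in the result:

print(np.isnan(X_imputed).any())  # False: every missing entry has been filled
print(np.round(X_imputed, 2))     # the completed matrix; exact values depend on the estimator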

Note that IterativeImputer is still experimental in scikit-learn, so to use it you need to explicitly enable it via from sklearn.experimental import enable_iterative_imputer before importing it from sklearn.impute.

Remember to always validate that the imputation is improving your models and that the assumptions of IterativeImputer are valid for your data.
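
One hedged sketch of such a validation: compare the cross-validated score of a downstream model when it is preceded by IterativeImputer versus a simple mean imputation. Here X_full and y are placeholders for your own feature matrix (containing NaNs) and prediction target, and Ridge stands in for whatever model you actually use.

from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# X_full, y: your own data (placeholders in this sketch)
for imputer in (SimpleImputer(strategy="mean"),
                IterativeImputer(max_iter=10, random_state=0)):
    pipe = make_pipeline(imputer, Ridge())
    scores = cross_val_score(pipe, X_full, y, cv=5)
    print(type(imputer).__name__, scores.mean())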

Using a custom estimator with IterativeImputer

To use a custom estimator with IterativeImputer in scikit-learn, you can pass your estimator to the estimator parameter when creating the IterativeImputer object.

Here is an example using a Random Forest Regressor:

from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
import numpy as np

# Example dataset with missing values
X = [[7, 2, 3], 
     [4, np.nan, 6], 
     [10, 5, 9],
     [np.nan, 2, 9], 
     [8, 4, np.nan]]

# Create a Random Forest Regressor
estimator = RandomForestRegressor(n_estimators=10, random_state=0)

# Instantiate the IterativeImputer with the custom estimator
imp = IterativeImputer(estimator=estimator, max_iter=10, random_state=0)

# Fit the imputer and transform the dataset
X_imputed = imp.fit_transform(X)

In this example, the IterativeImputer will use the Random Forest Regressor instead of the default BayesianRidge estimator to impute the missing values.

Remember that not every estimator is suitable for IterativeImputer: the estimator must be able to handle multi-feature datasets and must provide fit and predict methods.
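
To illustrate that interface requirement, here is a toy sketch (not something you would use in practice): any object following the scikit-learn regressor API (a fit(X, y) method, a predict(X) method, and clonability via BaseEstimator) can be passed as the estimator. MedianRegressor below is a made-up example that always predicts the median of the training targets.

import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer

class MedianRegressor(BaseEstimator, RegressorMixin):
    def fit(self, X, y):
        self.median_ = np.median(y)           # remember the median of the training targets
        return self

    def predict(self, X):
        return np.full(len(X), self.median_)  # predict that median for every row

imp = IterativeImputer(estimator=MedianRegressor(), max_iter=10, random_state=0)
X_imputed = imp.fit_transform([[7, 2, 3], [4, np.nan, 6], [10, 5, 9]])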

