

MICE (Multiple Imputation by Chained Equations) is a statistical method for handling missing data by creating multiple imputations, or "guesses", for the missing values. It works by using a set of regression models to estimate missing values based on the other variables, iterating this process through each variable with missing data in a cyclical manner. The results from all imputations are then pooled to produce a single set of estimates, making it a robust tool for datasets where data are Missing At Random (MAR).
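As an illustration of the pooling idea, here is a minimal sketch using scikit-learn's IterativeImputer (introduced below) with sample_posterior=True, so each run draws a different plausible imputation; averaging the draws is a simple stand-in for MICE-style pooling. The dataset here is a toy example for demonstration only.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer

# Toy dataset with missing values
X = np.array([[7, 2, 3],
              [4, np.nan, 6],
              [10, 5, 9],
              [np.nan, 2, 9],
              [8, 4, np.nan]])

# Draw several imputations from the posterior and pool them
imputations = []
for seed in range(5):
    imp = IterativeImputer(sample_posterior=True, max_iter=10, random_state=seed)
    imputations.append(imp.fit_transform(X))

# Pooled (averaged) imputation; observed values stay unchanged
X_pooled = np.mean(imputations, axis=0)
```

Note that full MICE would also pool downstream model estimates across the imputed datasets (e.g. via Rubin's rules), not just average the imputed values.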
IterativeImputer is a strategy for imputing missing values by modeling each feature with missing values as a function of the other features in a round-robin fashion. Here is a simple example of how you can use IterativeImputer in Python:
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
import numpy as np

# Example dataset with missing values
X = [[7, 2, 3],
     [4, np.nan, 6],
     [10, 5, 9],
     [np.nan, 2, 9],
     [8, 4, np.nan]]

# Instantiate the IterativeImputer
imp = IterativeImputer(max_iter=10, random_state=0)

# Fit the imputer and transform the dataset
X_imputed = imp.fit_transform(X)
In the code above, max_iter=10 sets the maximum number of imputation rounds to perform before returning the imputed dataset, and random_state=0 seeds the random number generator so the results are reproducible. After running this code, the X_imputed variable contains the original dataset with all missing values filled in.
Note that IterativeImputer is still experimental as of scikit-learn 0.23; to use it, you must explicitly enable it via from sklearn.experimental import enable_iterative_imputer.
Remember to always validate that the imputation is actually improving your models and that the assumptions of IterativeImputer hold for your data.
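One way to perform that validation is to compare imputation strategies inside a cross-validated pipeline. The sketch below uses synthetic data invented for illustration; the specific model (Ridge) and missingness rate are arbitrary choices, not recommendations.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic regression data (hypothetical example)
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.1, size=200)

# Knock out ~20% of the entries at random
X[rng.rand(200, 3) < 0.2] = np.nan

# Compare a simple mean imputer against IterativeImputer
results = {}
for imputer in (SimpleImputer(strategy="mean"), IterativeImputer(random_state=0)):
    pipe = make_pipeline(imputer, Ridge())
    results[type(imputer).__name__] = cross_val_score(pipe, X, y, cv=5).mean()

print(results)
```

Putting the imputer inside the pipeline ensures it is fit only on each training fold, avoiding leakage from the validation fold.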
Using a custom estimator with IterativeImputer
To use a custom estimator with IterativeImputer in scikit-learn, pass your estimator to the estimator parameter when creating the IterativeImputer object.
Here is an example using a Random Forest Regressor:
from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
import numpy as np

# Example dataset with missing values
X = [[7, 2, 3],
     [4, np.nan, 6],
     [10, 5, 9],
     [np.nan, 2, 9],
     [8, 4, np.nan]]

# Create a Random Forest Regressor
estimator = RandomForestRegressor(n_estimators=10, random_state=0)

# Instantiate the IterativeImputer with the custom estimator
imp = IterativeImputer(estimator=estimator, max_iter=10, random_state=0)

# Fit the imputer and transform the dataset
X_imputed = imp.fit_transform(X)
In this example, IterativeImputer uses the Random Forest Regressor instead of the default BayesianRidge estimator to impute the missing values.
Remember that not all estimators are suitable for IterativeImputer: the estimator must be able to handle multi-feature datasets and must provide fit and predict methods.
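To make the fit/predict requirement concrete, here is a deliberately trivial custom regressor (a hypothetical MeanRegressor invented for this sketch) that satisfies the interface and can therefore be plugged into IterativeImputer. Inheriting from BaseEstimator gives it the get_params/set_params machinery scikit-learn needs to clone it.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer

class MeanRegressor(BaseEstimator, RegressorMixin):
    """Toy estimator: always predicts the mean of the training targets."""
    def fit(self, X, y):
        self.mean_ = np.mean(y)
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)

X = [[7, 2, 3],
     [4, np.nan, 6],
     [10, 5, 9],
     [np.nan, 2, 9],
     [8, 4, np.nan]]

# IterativeImputer only needs fit and predict (unless sample_posterior=True)
imp = IterativeImputer(estimator=MeanRegressor(), max_iter=10, random_state=0)
X_imputed = imp.fit_transform(X)
```

A real use case would substitute a more expressive regressor, but the same two-method contract applies.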