Imputation using SoftImpute in Python

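SoftImpute is a matrix completion algorithm that fills in missing values in a dataset. In Python it is available through the fancyimpute package. The method repeatedly computes a Singular Value Decomposition (SVD) of the current filled-in matrix and soft-thresholds the singular values, which yields a low-rank estimate of the missing entries.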
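
To make the idea concrete, here is a minimal NumPy sketch of iterative soft-thresholded SVD. It is an illustration only, not fancyimpute's implementation: the function name soft_impute_sketch, the shrinkage value, and the stopping rule are assumptions chosen for this example.

```python
import numpy as np

def soft_impute_sketch(X, shrinkage, n_iters=100, tol=1e-5):
    """Fill NaNs in X by iterative soft-thresholded SVD (illustrative only)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start with zeros in the missing positions.
    filled = np.where(missing, 0.0, X)
    for _ in range(n_iters):
        # Shrink every singular value toward zero (soft thresholding).
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s_shrunk = np.maximum(s - shrinkage, 0.0)
        low_rank = (U * s_shrunk) @ Vt
        # Keep observed entries; use the low-rank estimate elsewhere.
        new_filled = np.where(missing, low_rank, X)
        change = np.linalg.norm(new_filled - filled)
        scale = max(np.linalg.norm(filled), 1e-12)
        filled = new_filled
        if change <= tol * scale:
            break
    return filled

# Small example matrix (made up for this sketch) with three missing entries.
X = np.array([[1.0, 2.0, np.nan, 4.0],
              [2.0, 4.0, 6.0, np.nan],
              [np.nan, 6.0, 9.0, 12.0]])
X_filled = soft_impute_sketch(X, shrinkage=0.1)
print(X_filled)
```

A larger shrinkage value pushes the estimate toward lower rank, so the filled-in values depend on that choice.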

Here’s a basic example of how to use SoftImpute.

First, you need to install the appropriate package. You can install it via pip:

pip install fancyimpute

Then you can use it in your Python code. Here is a simple example:

# import the necessary libraries
from fancyimpute import SoftImpute
# let's assume you have a dataset with missing values 'X'
# X_complete will be the dataset after filling the missing values
X_complete = SoftImpute().fit_transform(X)

Replace ‘X’ with your actual dataset, a 2-D NumPy array in which missing entries are np.nan.

Remember that in the default setting, normalizer=None, so SoftImpute does not normalize or scale the data. If you want normalization, pass a normalizer object (not a plain function) that provides fit_transform and inverse_transform methods. BiScaler is a convenient choice for this.

BiScaler

BiScaler is a preprocessing utility in the fancyimpute package that scales your data (both rows and columns) to have zero mean and unit variance. It is quite useful when working with algorithms that are sensitive to the scale of the data.

The BiScaler works by iteratively standardizing the feature values (columns) and sample values (rows) until convergence.

Here are the main steps of the BiScaler:

  1. Initialize the column and row means to 0 and variances to 1.
  2. Iteratively do the following until convergence:
    1. Standardize the columns to have zero mean and unit variance.
    2. Standardize the rows to have zero mean and unit variance.
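
The steps above can be sketched in NumPy as follows. This is a simplified illustration, not fancyimpute's actual BiScaler code: the function name biscale_sketch, the tolerance, and the random test matrix are assumptions, and statistics are computed over observed (non-NaN) entries only.

```python
import numpy as np

def biscale_sketch(X, max_iters=100, tol=1e-8):
    """Alternately standardize columns and rows, ignoring NaN entries."""
    Z = np.array(X, dtype=float)  # copy; NaN marks a missing entry
    for _ in range(max_iters):
        previous = Z.copy()
        # Standardize the columns to zero mean and unit variance.
        col_mean = np.nanmean(Z, axis=0, keepdims=True)
        col_std = np.nanstd(Z, axis=0, keepdims=True)
        Z = (Z - col_mean) / np.where(col_std > 0, col_std, 1.0)
        # Standardize the rows to zero mean and unit variance.
        row_mean = np.nanmean(Z, axis=1, keepdims=True)
        row_std = np.nanstd(Z, axis=1, keepdims=True)
        Z = (Z - row_mean) / np.where(row_std > 0, row_std, 1.0)
        # Stop once a full pass barely changes the matrix.
        if np.nanmax(np.abs(Z - previous)) < tol:
            break
    return Z

# Random example data (made up for this sketch) with a few missing entries.
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=10.0, scale=3.0, size=(6, 5))
X_demo[0, 1] = X_demo[2, 3] = X_demo[4, 0] = np.nan
Z_demo = biscale_sketch(X_demo)
print(np.nanmean(Z_demo, axis=1), np.nanstd(Z_demo, axis=1))
```

Because the last step of each pass standardizes the rows, the row statistics are exact up to rounding, while the column statistics are only as close as the iteration has converged.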

Here’s an example of how to use it:

from fancyimpute import BiScaler
# Initialize a BiScaler
biscaler = BiScaler()
# Scale the dataset
X_scaled = biscaler.fit_transform(X)
# Now X_scaled has approximately zero mean and unit variance for both rows and columns.

In the context of imputation, BiScaler is often used before applying SoftImpute because SoftImpute tends to work better on standardized data. After imputation, you can use the inverse_transform method of BiScaler to rescale the data back to the original scale.
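
To show why the inverse step returns values in the original units, here is a small NumPy sketch of the scale, fill, and inverse-transform round trip. ColumnScalerSketch is a hypothetical stand-in written for this example (it is not part of fancyimpute and only scales columns), and filling with 0 in scaled units is a placeholder for a real imputer such as SoftImpute.

```python
import numpy as np

class ColumnScalerSketch:
    """Hypothetical NaN-aware column scaler; not part of fancyimpute."""

    def fit_transform(self, X):
        X = np.asarray(X, dtype=float)
        # Remember the statistics so the scaling can be undone later.
        self.mean_ = np.nanmean(X, axis=0)
        std = np.nanstd(X, axis=0)
        self.std_ = np.where(std > 0, std, 1.0)
        return (X - self.mean_) / self.std_

    def inverse_transform(self, X_scaled):
        return np.asarray(X_scaled, dtype=float) * self.std_ + self.mean_

# Two columns on very different scales (values made up for this sketch).
X = np.array([[1.0, 200.0],
              [2.0, np.nan],
              [np.nan, 400.0],
              [4.0, 500.0]])

scaler = ColumnScalerSketch()
X_scaled = scaler.fit_transform(X)

# Placeholder imputer: in scaled units, 0 corresponds to the column mean.
X_scaled_filled = np.where(np.isnan(X_scaled), 0.0, X_scaled)

# Undo the scaling so the filled values are in the original units.
X_filled = scaler.inverse_transform(X_scaled_filled)
print(X_filled)
```

The observed entries come back unchanged, and each filled entry lands on its column's observed mean in the original units.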

SoftImpute with BiScaler

Let’s modify the previous example to include BiScaler, which scales the dataset before SoftImpute is applied.

# import necessary libraries
import numpy as np
from fancyimpute import SoftImpute, BiScaler

# create a dataset with some missing values
X = np.array([[1, 2, np.nan, 4, 5],
              [6, 7, 8, 9, np.nan],
              [11, 12, 13, np.nan, 15],
              [16, np.nan, 18, 19, 20],
              [np.nan, 22, 23, 24, 25]])

# print the original dataset
print("Original:\n", X)

# initialize a BiScaler
biscaler = BiScaler()

# use BiScaler to scale the dataset
X_scaled = biscaler.fit_transform(X)

# print the scaled dataset
print("\nAfter BiScaler:\n", X_scaled)

# use SoftImpute to fill the missing values
X_complete = SoftImpute().fit_transform(X_scaled)

# inverse transform the completed dataset to its original scale
X_complete_orig_scale = biscaler.inverse_transform(X_complete)

# print the dataset after imputation
print("\nAfter SoftImpute:\n", X_complete_orig_scale)

In this code:

  1. We first import the necessary libraries.
  2. We create a sample 5×5 numpy array with some missing values represented by np.nan.
  3. We print the original dataset.
  4. We initialize a BiScaler and use it to scale the dataset.
  5. We print the scaled dataset.
  6. We use SoftImpute to fill in the missing values in the scaled dataset.
  7. We transform the completed dataset back to its original scale using the inverse_transform method of BiScaler.
  8. Finally, we print the dataset after imputation and inverse scaling.

After running this code, you should see the original matrix, the scaled matrix, and the completed matrix transformed back to the original scale, in which all the missing values have been filled in.

