Customizing KNN classifier with sklearn

Custom distance function in KNN

To use KNeighborsClassifier with a custom distance function, you can set metric='pyfunc' and supply the function through the metric_params argument. Here is an example:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
import numpy as np

# Define custom distance function
def custom_distance(x, y):
    return np.sqrt(np.sum((x - y) ** 2))  # Example: Euclidean distance

# Load dataset
data = load_iris()
X, y = data.data, data.target

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create KNN classifier with custom distance metric
knn = KNeighborsClassifier(n_neighbors=5, weights='distance', metric='pyfunc', metric_params={"func": custom_distance})

# Fit the model
knn.fit(X_train, y_train)

# Make predictions
y_pred = knn.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

Explanation:

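  1. The metric parameter is set to 'pyfunc', which tells scikit-learn to compute distances with a user-defined Python function.
  2. The metric_params={"func": custom_distance} argument supplies that function. It must take two one-dimensional arrays and return a single distance value. Because it is called from Python for every pair of points, it runs noticeably slower than the built-in metrics.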
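scikit-learn's documentation also describes a second route: passing the callable itself as metric, with no metric_params. The sketch below assumes that route. The function manhattan_distance is an illustrative name introduced here, chosen because it can be checked against the built-in "manhattan" metric.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score


def manhattan_distance(a, b):
    """Illustrative custom metric: L1 distance between two 1-D vectors."""
    return np.sum(np.abs(a - b))


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Pass the callable itself as `metric`.
knn_callable = KNeighborsClassifier(n_neighbors=5, metric=manhattan_distance)
knn_callable.fit(X_train, y_train)

# Reference model that uses the equivalent built-in metric.
knn_builtin = KNeighborsClassifier(n_neighbors=5, metric="manhattan")
knn_builtin.fit(X_train, y_train)

# Compare the sorted neighbor distances rather than the predictions,
# since tied distances may be broken differently by each search algorithm.
dist_callable, _ = knn_callable.kneighbors(X_test)
dist_builtin, _ = knn_builtin.kneighbors(X_test)
distances_match = np.allclose(dist_callable, dist_builtin)

callable_accuracy = accuracy_score(y_test, knn_callable.predict(X_test))
print(f"Neighbor distances match built-in metric: {distances_match}")
print(f"Accuracy with callable metric: {callable_accuracy:.2f}")
```

Checking a custom function against a built-in equivalent like this is a cheap way to catch mistakes before switching to a metric that has no built-in counterpart.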

Using weights in KNN (Weighted KNN)

Weighted K-Nearest Neighbors (Weighted KNN) is a variation of the classic K-Nearest Neighbors (KNN) algorithm used in classification and regression tasks. Standard KNN gives every one of the k neighbors an equal vote. Weighted KNN instead assigns each neighbor a weight based on its distance from the query point, typically the inverse of that distance, so that closer neighbors have a stronger influence on the outcome than those further away.
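To make the difference concrete, here is a small sketch of the two voting schemes on made-up numbers. The distances and labels are hypothetical toy values, and the 1 / distance rule is the inverse-distance weighting described above.

```python
import numpy as np

# Hypothetical toy data: distances from a query point to its 3 nearest
# neighbors, and the class label of each neighbor.
distances = np.array([0.5, 2.0, 2.5])
labels = np.array([0, 1, 1])

# Uniform voting: each neighbor counts once.
uniform_votes = np.bincount(labels, minlength=2)

# Distance weighting: each neighbor votes with weight 1 / distance.
weights = 1.0 / distances
weighted_votes = np.bincount(labels, weights=weights, minlength=2)

uniform_winner = int(uniform_votes.argmax())
weighted_winner = int(weighted_votes.argmax())

print("Uniform votes per class: ", uniform_votes)
print("Weighted votes per class:", weighted_votes)
print("Uniform winner: ", uniform_winner)
print("Weighted winner:", weighted_winner)
```

With uniform voting the two farther neighbors outvote the single close one, so class 1 wins. With inverse-distance weights the close neighbor contributes 2.0 against 0.5 + 0.4, so class 0 wins.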

To implement Weighted KNN using scikit-learn, you can use the KNeighborsClassifier class and set the weights parameter to "distance", which weights each neighbor by the inverse of its distance. The default, "uniform", gives all neighbors equal weight. Here’s a basic example:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load dataset
data = load_iris()
X, y = data.data, data.target

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create Weighted KNN classifier
knn = KNeighborsClassifier(n_neighbors=5, weights='distance')

# Fit the model
knn.fit(X_train, y_train)

# Make predictions
y_pred = knn.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

In this example:

  • weights='distance' weights each neighbor by the inverse of its distance, so closer neighbors have more influence.
  • You can customize the number of neighbors with n_neighbors.
  • accuracy_score reports the fraction of test samples that were classified correctly.
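The weights parameter is also documented to accept a callable that receives an array of distances and returns an array of weights with the same shape. The sketch below assumes a Gaussian kernel as the weighting rule. The function gaussian_weights and its bandwidth value are illustrative choices, not something scikit-learn provides.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score


def gaussian_weights(distances, bandwidth=1.0):
    """Illustrative weighting: Gaussian kernel over neighbor distances.

    Returns an array with the same shape as `distances`.
    """
    return np.exp(-(distances ** 2) / (2 * bandwidth ** 2))


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Pass the function itself as `weights`.
knn_gauss = KNeighborsClassifier(n_neighbors=5, weights=gaussian_weights)
knn_gauss.fit(X_train, y_train)

gauss_accuracy = accuracy_score(y_test, knn_gauss.predict(X_test))
print(f"Accuracy with Gaussian weights: {gauss_accuracy:.2f}")
```

Unlike 1 / distance, a Gaussian kernel stays finite when a neighbor sits exactly on the query point, which is one reason to consider a custom weighting function.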

Customizing the KNN method

In this example, I will demonstrate how to create a simple custom K-Nearest Neighbors (KNN) classifier class that extends the KNeighborsClassifier from sklearn.neighbors. The goal here is to illustrate how you can leverage object-oriented programming (OOP) principles like inheritance to customize and extend existing classes.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

class CustomKNN(KNeighborsClassifier):
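    # No __init__ override is needed: the constructor is inherited as-is.
    # scikit-learn estimators should list every hyperparameter explicitly;
    # with **kwargs, get_params() and clone() silently drop the extra arguments.
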
    def fit(self, X, y):
        super().fit(X, y)
        # After fitting, print a summary
        print(f"CustomKNN fitted with {self.n_neighbors} neighbors.")
        # Calculate and print training accuracy
        train_predictions = self.predict(X)
        train_accuracy = accuracy_score(y, train_predictions)
        print(f"Training Accuracy: {train_accuracy:.2f}")
        return self

# Example usage
if __name__ == "__main__":
    # Load a dataset
    iris = load_iris()
    X = iris.data
    y = iris.target

    # Split the dataset into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Create and train the custom KNN model
    custom_knn = CustomKNN(n_neighbors=3)
    custom_knn.fit(X_train, y_train)

    # Make predictions on the test set
    y_pred = custom_knn.predict(X_test)

    # Calculate and print test accuracy
    test_accuracy = accuracy_score(y_test, y_pred)
    print(f"Test Accuracy: {test_accuracy:.2f}")

This example demonstrates several OOP concepts:

  1. Inheritance: The CustomKNN class inherits from KNeighborsClassifier, allowing it to reuse existing functionality.
  2. Method Overriding: The fit method is overridden to extend its functionality by adding a summary printout after training.
  3. Encapsulation: The internal workings of KNeighborsClassifier are hidden, and we only interact with the public methods and properties (fit, predict, n_neighbors).
  4. Abstraction: By creating a custom class, we abstract away the details of how the KNN works and provide a simpler interface for users.

This approach allows you to easily extend and customize existing machine learning models in scikit-learn while maintaining the flexibility and structure provided by object-oriented programming.
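One practical benefit of subclassing is that the new class can still be used with scikit-learn's own tools. The sketch below assumes a hypothetical subclass, ScoredKNN, that stores the training accuracy as an attribute instead of printing it, and then passes it through clone and cross_val_score.

```python
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


class ScoredKNN(KNeighborsClassifier):
    """Hypothetical subclass that records training accuracy during fit."""

    def fit(self, X, y):
        super().fit(X, y)
        # A trailing underscore follows scikit-learn's naming convention
        # for attributes that are set while fitting.
        self.train_accuracy_ = self.score(X, y)
        return self


X, y = load_iris(return_X_y=True)
model = ScoredKNN(n_neighbors=3, weights="distance")

# clone() rebuilds the estimator from get_params(), so the constructor
# arguments carry over because the inherited __init__ lists them explicitly.
model_copy = clone(model)
print("Cloned weights setting:", model_copy.get_params()["weights"])

# cross_val_score clones and refits the model on each of the 5 folds.
scores = cross_val_score(model, X, y, cv=5)
print(f"Mean cross-validated accuracy: {scores.mean():.2f}")
```

Keeping the inherited constructor is what makes this work: the cloned copies used inside cross-validation receive the same n_neighbors and weights settings as the original.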


