KNN classification: practical notes & implementation in Python & R

We should normalize or standardize the data before applying KNN because the algorithm is distance-based: features with larger numeric ranges dominate the distance calculation and drown out the contribution of the others, which skews the neighbor rankings.
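
To see the effect, consider two features on very different scales: the larger-scaled feature dominates the Euclidean distance almost entirely. A minimal sketch (the numbers below are made up purely for illustration):

import numpy as np

# Two samples: feature 1 on a small scale (~5), feature 2 on a large scale (~5000)
a = np.array([5.1, 5000.0])
b = np.array([4.9, 4300.0])

# Unscaled: the distance is driven almost entirely by the second feature
print(np.linalg.norm(a - b))        # ~700.0

# After standardizing each feature (mean 0, std 1), both contribute comparably
X = np.vstack([a, b])
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.linalg.norm(X_scaled[0] - X_scaled[1]))  # ~2.83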

In this example, we’ll use the Iris dataset, which is commonly used for classification tasks. The Iris dataset consists of 150 samples from three species of Iris flowers (Iris setosa, Iris versicolor, and Iris virginica), each with four features (sepal length, sepal width, petal length, and petal width).

Dataset Overview

| Sample | Sepal Length (cm) | Sepal Width (cm) | Petal Length (cm) | Petal Width (cm) | Class |
| 1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| … | … | … | … | … | … |
| 150 | 5.9 | 3.0 | 5.1 | 1.8 | virginica |

Python Code

We’ll use K = 5 for this example.

# Step 1: Import necessary libraries
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

# Step 2: Load the Iris dataset (or any dataset)
data = load_iris()
X, y = data.data, data.target

# Step 3: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 4: Scale the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Step 5: Perform KNN classification
knn = KNeighborsClassifier(n_neighbors=5)  # You can change the number of neighbors
knn.fit(X_train_scaled, y_train)

# Step 6: Make predictions and evaluate the model
y_pred = knn.predict(X_test_scaled)

# Accuracy and classification report
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
print(classification_report(y_test, y_pred, target_names=data.target_names))

R Code Example

# Step 1: Load necessary libraries
library(caret)    # For train/test split and scaling
library(class)    # For KNN classification

# Step 2: Load the Iris dataset
data(iris)
set.seed(42)  # Set seed for reproducibility

# Step 3: Split the dataset into training and testing sets
index <- createDataPartition(iris$Species, p=0.7, list=FALSE)
train_data <- iris[index,]
test_data <- iris[-index,]

# Step 4: Scale the features (standardize the training and testing data separately)
# We don't scale the Species column since it's the target variable
scaler <- preProcess(train_data[, -5], method=c("center", "scale"))
train_data_scaled <- train_data
train_data_scaled[, -5] <- predict(scaler, train_data[, -5])
test_data_scaled <- test_data
test_data_scaled[, -5] <- predict(scaler, test_data[, -5])

# Step 5: Perform KNN classification (using k=5)
knn_pred <- knn(train = train_data_scaled[, -5],
                test = test_data_scaled[, -5],
                cl = train_data_scaled$Species,
                k = 5)
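
To mirror Step 6 of the Python example, a simple accuracy check on the test set, using only base R and the objects created above:

# Step 6: Evaluate the predictions against the true test labels
accuracy <- mean(knn_pred == test_data_scaled$Species)
cat(sprintf("Accuracy: %.2f%%\n", accuracy * 100))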

