KNN classification: practical notes & implementation in Python & R

We should normalize or standardize the data before applying KNN because the algorithm is distance-based: features with larger numeric ranges dominate the distance calculation and drown out the contribution of the others, which skews the neighbor rankings.
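
To see the effect, consider two features on very different scales: the larger-scaled feature dominates the Euclidean distance almost entirely. A minimal sketch (the numbers below are made up purely for illustration):

import numpy as np

# Two samples: feature 1 on a small scale (~5), feature 2 on a large scale (~5000)
a = np.array([5.1, 5000.0])
b = np.array([4.9, 4300.0])

# Unscaled: the distance is driven almost entirely by the second feature
print(np.linalg.norm(a - b))        # ~700.0

# After standardizing each feature (mean 0, std 1), both contribute comparably
X = np.vstack([a, b])
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.linalg.norm(X_scaled[0] - X_scaled[1]))  # ~2.83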

In this example, we’ll use the Iris dataset, which is commonly used for classification tasks. The Iris dataset consists of 150 samples from three species of Iris flowers (Iris setosa, Iris versicolor, and Iris virginica), each with four features (sepal length, sepal width, petal length, and petal width).

Dataset Overview

| Sample | Sepal Length (cm) | Sepal Width (cm) | Petal Length (cm) | Petal Width (cm) | Class |
| 1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| … | … | … | … | … | … |
| 150 | 5.9 | 3.0 | 5.1 | 1.8 | virginica |

Python Code

We’ll use K = 5 for this example.

# Step 1: Import necessary libraries
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

# Step 2: Load the Iris dataset (or any dataset)
data = load_iris()
X, y = data.data, data.target

# Step 3: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 4: Scale the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Step 5: Perform KNN classification
knn = KNeighborsClassifier(n_neighbors=5)  # You can change the number of neighbors
knn.fit(X_train_scaled, y_train)

# Step 6: Make predictions and evaluate the model
y_pred = knn.predict(X_test_scaled)

# Accuracy and classification report
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
print(classification_report(y_test, y_pred, target_names=data.target_names))

R Code Example

# Step 1: Load necessary libraries
library(caret)    # For train/test split and scaling
library(class)    # For KNN classification

# Step 2: Load the Iris dataset
data(iris)
set.seed(42)  # Set seed for reproducibility

# Step 3: Split the dataset into training and testing sets
index <- createDataPartition(iris$Species, p=0.7, list=FALSE)
train_data <- iris[index,]
test_data <- iris[-index,]

# Step 4: Scale the features (standardize the training and testing data separately)
# We don't scale the Species column since it's the target variable
scaler <- preProcess(train_data[, -5], method=c("center", "scale"))
train_data_scaled <- train_data
train_data_scaled[, -5] <- predict(scaler, train_data[, -5])
test_data_scaled <- test_data
test_data_scaled[, -5] <- predict(scaler, test_data[, -5])

# Step 5: Perform KNN classification (using k=5)
knn_pred <- knn(train = train_data_scaled[, -5],
                test = test_data_scaled[, -5],
                cl = train_data_scaled$Species,
                k = 5)
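
To mirror Step 6 of the Python example, a simple accuracy check on the test set, using only base R and the objects created above:

# Step 6: Evaluate the predictions against the true test labels
accuracy <- mean(knn_pred == test_data_scaled$Species)
cat(sprintf("Accuracy: %.2f%%\n", accuracy * 100))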

