K-Nearest Neighbors (KNN): an introduction

June 22, 2024September 9, 2024by Kurious Fox

Subscribe to get access

??Subscribe to read the rest of the comics, the fun you can’t miss ??

K-Nearest Neighbors (KNN) is a popular algorithm used for both classification and regression tasks. In KNN, the output is a class membership, which is assigned based on the majority of the k nearest data points. One of the advantages of KNN is its ability to adapt to new training data. Additionally, it doesn’t make any assumptions about the underlying data distribution, which can be beneficial in certain scenarios.

Example

Consider an example of using the K-Nearest Neighbors (KNN) algorithm with a small dataset with 5 samples and 2 input features. Let’s assume we are classifying the samples into two classes.

Sample	Feature 1	Feature 2	Class
A	1.0	1.0	0
B	2.0	2.0	0
C	3.0	3.0	1
D	4.0	4.0	1
E	5.0	5.0	1

Steps to Implement KNN:

Choose the value of K: Let’s choose $K = 3$ .
Calculate the distance between the point to be classified and all other points in the dataset. We’ll use Euclidean distance.
Sort the distances and determine the nearest neighbors based on the chosen K value.
Vote for the classes of the nearest neighbors.
Assign the class with the highest vote to the point to be classified.

Example: Let’s classify a new point $(x, y) = (3.5, 3.5)$ .

Step 1: Calculate Distances

$\text{Distance from (3.5, 3.5) to (1.0, 1.0)} = \sqrt{(3.5-1.0)^2 + (3.5-1.0)^2} = \sqrt{2.5^2 + 2.5^2} = \sqrt{6.25 + 6.25} = \sqrt{12.5} \approx 3.54$

$\text{Distance from (3.5, 3.5) to (2.0, 2.0)} = \sqrt{(3.5-2.0)^2 + (3.5-2.0)^2} = \sqrt{1.5^2 + 1.5^2} = \sqrt{2.25 + 2.25} = \sqrt{4.5} \approx 2.12$

$\text{Distance from (3.5, 3.5) to (3.0, 3.0)} = \sqrt{(3.5-3.0)^2 + (3.5-3.0)^2} = \sqrt{0.5^2 + 0.5^2} = \sqrt{0.25 + 0.25} = \sqrt{0.5} \approx 0.71$

$\text{Distance from (3.5, 3.5) to (4.0, 4.0)} = \sqrt{(3.5-4.0)^2 + (3.5-4.0)^2} = \sqrt{(-0.5)^2 + (-0.5)^2} = \sqrt{0.25 + 0.25} = \sqrt{0.5} \approx 0.71$

$\text{Distance from (3.5, 3.5) to (5.0, 5.0)} = \sqrt{(3.5-5.0)^2 + (3.5-5.0)^2} = \sqrt{(-1.5)^2 + (-1.5)^2} = \sqrt{2.25 + 2.25} = \sqrt{4.5} \approx 2.12$

Step 2: Sort Distances:
C : 0.71
D : 0.71
B : 2.12
E : 2.12
A : 3.54

Step 3: Determine Nearest Neighbors: The 3 nearest neighbors to (3.5, 3.5) are:

C (0.71)
D (0.71)
B (2.12)

Step 4: Vote for Classes: The classes of the 3 nearest neighbors are:

C: Class 1
D: Class 1
B: Class 0

The votes are:

Class 0: 1 vote
Class 1: 2 votes

Step 5: Assign Class: The new point $(3.5, 3.5)$ is assigned to Class 1 since it received the highest number of votes.

Discover more from Science Comics

Subscribe to get the latest posts sent to your email.

example of stack generalization using K-Nearest Neighbors (KNN) and Random Forest + Python codes

This example demonstrates the basic steps of stack generalization with two classifiers (KNN and Random Forest) and a Logistic Regression…

Logistic Regression: method + Python & R codes

Logistic regression is a statistical method used for analyzing datasets in which there are one or more independent variables that…

Support Vector Machine + Python & R Codes

Support Vector Classifier (SVC) is a powerful algorithm for classification tasks, capable of handling linear and non-linear data using different kernel functions. It efficiently handles high-dimensional data for applications like image recognition and bioinformatics. Python and R codes demonstrate SVM usage for binary classification with breast cancer and mtcars datasets, respectively.

K-Nearest Neighbors (KNN): an introduction

Subscribe to get access

Example

Like this:

Related

Discover more from Science Comics

Like this:

Like this:

Like this:

Leave a ReplyCancel reply

Subscribe to get access

Example

Share this:

Like this:

Related

Discover more from Science Comics

Related Posts

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Leave a ReplyCancel reply