K-Nearest Neighbors for Time Series Classification with Python

A K-Nearest Neighbors (KNN) classifier can be adapted for time series classification by using distance metrics designed for time series data. Time series KNN often relies on dynamic time warping (DTW) or other specialized distance measures that account for the sequential nature of the data. Here’s how to implement a KNN classifier for time series:


Steps to Build a KNN Classifier for Time Series Classification

  1. Preprocess the Data:
    • Normalize or standardize the time series so that differences in scale or offset do not dominate the distance computation. KNN compares raw values, so without scaling the nearest neighbors may reflect units rather than shape.
    • Ensure all time series are of the same length, or use distance measures that can handle varying lengths.
  2. Choose a Distance Metric:
    • Euclidean Distance: Simple and widely used, but it requires series of the same length and is sensitive to misalignment in time: two series with the same shape, offset by a few time steps, can end up far apart. The series should therefore be properly aligned before the distance is computed.
    • Dynamic Time Warping (DTW): Aligns two series by allowing non-linear stretching and compression of the time axis, so sequences that follow the same pattern at different speeds are still judged similar. It can also compare series of different lengths and is used in fields such as speech recognition, data mining, and bioinformatics.
    • Other metrics like correlation-based distances or shape-based distances.
  3. KNN Algorithm:
    • Select k, the number of neighbors.
    • For a new time series X, compute its distance to all training samples.
    • Sort the distances and select the k-nearest neighbors.
    • Use majority voting or weighted voting (based on distances) to determine the class label.
  4. Evaluate the Classifier:
    • Use techniques like cross-validation or train-test split.
    • Metrics include accuracy, precision, recall, and F1-score.
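The four steps above can be sketched from scratch in NumPy to make the mechanics concrete. This is a minimal illustration, not sktime's implementation: the helper names (`euclidean`, `dtw`, `knn_predict`) are hypothetical, the DTW variant is the classic unconstrained dynamic-programming recurrence with squared point-wise cost (no warping window), and the vote is a plain majority vote.

```python
import numpy as np
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("Euclidean distance needs series of equal length")
    return float(np.sqrt(np.sum((a - b) ** 2)))


def dtw(a, b):
    """Unconstrained DTW: square root of the minimal accumulated squared cost.

    Accepts series of different lengths.
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three predecessor cells
            # (vertical, horizontal, diagonal).
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))


def knn_predict(X_train, y_train, x, k=3, distance=euclidean):
    """Label one series x by majority vote among its k nearest training series."""
    dists = np.array([distance(x, xt) for xt in X_train])
    nearest = np.argsort(dists, kind="stable")[:k]
    # Labels are counted nearest first, so tied labels are resolved in favour
    # of the label that was counted first (the closer neighbour).
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy training set: two "low" series and two "high" series of length 4
X_train = np.array([[0, 0, 0, 0], [0, 0, 0, 1], [5, 5, 5, 5], [5, 5, 5, 4]], dtype=float)
y_train = np.array(["low", "low", "high", "high"])
print(knn_predict(X_train, y_train, np.array([0, 0, 1, 0]), k=3))  # low

# DTW aligns a stretched copy perfectly, even though the lengths differ
print(dtw([0, 1, 2], [0, 1, 1, 2]))  # 0.0
```

Passing `distance=dtw` to `knn_predict` turns the same loop into a DTW-based classifier that also accepts training series of unequal length, at the cost of an O(n·m) computation per pair.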

Step-by-step Python implementation:

Let’s run some experiments with the sktime library and the Italy Power Demand dataset, loaded via load_italy_power_demand!


1. Importing Libraries

import numpy as np
from sktime.classification.distance_based import KNeighborsTimeSeriesClassifier
from sktime.datasets import load_italy_power_demand
from sktime.dists_kernels import FlatDist, ScipyDist
from sklearn.preprocessing import StandardScaler
  • numpy: For numerical operations and array manipulation.
  • KNeighborsTimeSeriesClassifier: A KNN-based classifier for time series data from the sktime library.
  • load_italy_power_demand: Loads the Italy Power Demand dataset for time series classification.
  • FlatDist and ScipyDist: Together define the distance measure used by the KNN classifier.
  • StandardScaler: Used to normalize the data for consistent scaling.

2. Loading the Dataset

X_train, y_train = load_italy_power_demand(split="train", return_type="numpy3D")
X_test, y_test = load_italy_power_demand(split="test", return_type="numpy3D")
print(X_train[:2])
  • load_italy_power_demand: Loads the time series dataset as a 3D NumPy array.
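    • The result is a 3D array of shape (num_samples, num_dimensions, time_steps). This dataset is univariate, so num_dimensions is 1, and each series has 24 time steps, as the printed samples show.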
    • The Italy Power Demand dataset is a standard dataset for evaluating time series classification algorithms.
  • split="train" and split="test": Loads the training and testing splits.
  • return_type="numpy3D": Returns data in a 3D array format.
  • print(X_train[:2]): Displays the first two training samples for inspection. The output looks like this:
[[[-0.71051757 -1.1833204  -1.3724416  -1.5930829  -1.4670021
   -1.3724416  -1.0887599   0.04596695  0.92853223  1.0861332
    1.2752543   0.96005242  0.61333034  0.01444676 -0.6474772
   -0.26923494 -0.20619456  0.61333034  1.3698149   1.4643754
    1.054613    0.58181015  0.1720477  -0.26923494]]

 [[-0.99300935 -1.4267865  -1.5798843  -1.6054006  -1.6309169
   -1.3757539  -1.0185257  -0.35510183  0.71658276  1.2013925
    1.1248436   1.0482947   0.79313166  0.46141977  0.48693607
    0.56348497  0.61451757  0.30832197  0.25728936  1.0993273
    1.0482947   0.69106647 -0.04890624 -0.38061813]]]

3. Flattening the Data for Normalization

X_train_flat = X_train.reshape(X_train.shape[0], -1)
X_test_flat = X_test.reshape(X_test.shape[0], -1)
  • Time series data is originally 3D (samples x dimensions x time_steps).
  • To normalize the data with StandardScaler, it is flattened to 2D (samples x (dimensions * time_steps)).

4. Standardizing the Data

scaler = StandardScaler()
X_train_flat = scaler.fit_transform(X_train_flat)
X_test_flat = scaler.transform(X_test_flat)
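  • StandardScaler: Standardizes each column to zero mean and unit variance. Because the data were flattened, each column is one (dimension, time step) position, which for this univariate dataset means one time step; the scaling is therefore computed per time step across samples, not per individual series.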
    • fit_transform: Fits the scaler to the training data and transforms it.
    • transform: Applies the same transformation to the test data.
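Conceptually, StandardScaler learns one mean and one population standard deviation per column of the flattened array from the training set only, then reuses those statistics for the test set. The NumPy sketch below reproduces that arithmetic on synthetic arrays (made up for illustration) and, for contrast, shows per-series z-normalization, a common alternative for time series that is not what this tutorial's code does.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for X_train_flat / X_test_flat: (samples, time_steps)
X_train_flat = rng.normal(loc=3.0, scale=2.0, size=(6, 24))
X_test_flat = rng.normal(loc=3.0, scale=2.0, size=(4, 24))

# Column-wise (per time step) statistics, learned from the training data only
mean = X_train_flat.mean(axis=0)
std = X_train_flat.std(axis=0)  # population std (ddof=0)
std[std == 0] = 1.0             # leave constant columns unscaled instead of dividing by 0

X_train_scaled = (X_train_flat - mean) / std
X_test_scaled = (X_test_flat - mean) / std  # training statistics reused: no test-set leakage

# Contrast: per-series z-normalization (each row gets its own mean and std)
row_mean = X_train_flat.mean(axis=1, keepdims=True)
row_std = X_train_flat.std(axis=1, keepdims=True)
X_train_znorm = (X_train_flat - row_mean) / row_std

print(np.allclose(X_train_scaled.mean(axis=0), 0.0))  # True: every column centred
print(np.allclose(X_train_znorm.mean(axis=1), 0.0))   # True: every series centred
```

Which variant suits a task depends on whether absolute levels at each time of day carry class information (per-column scaling keeps them) or only the shape of each series matters (per-series z-normalization removes level and amplitude).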

5. Reshaping Back to 3D

X_train = X_train_flat.reshape(X_train.shape)
X_test = X_test_flat.reshape(X_test.shape)
  • After normalization, the data is reshaped back to its original 3D format to be compatible with the KNN classifier.
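Flattening and reshaping are lossless as long as the same shape is used in both directions. For a univariate array of shape (n, 1, 24), reshape(n, -1) simply drops the singleton dimension, and reshape(original_shape) puts every value back in place; for multivariate data, each flattened row holds the channels one after another. A quick check on a synthetic array (the variable names here are illustrative):

```python
import numpy as np

# Synthetic array with the same layout as X_train: (samples, dimensions, time_steps)
X = np.arange(3 * 1 * 24, dtype=float).reshape(3, 1, 24)

X_flat = X.reshape(X.shape[0], -1)  # (3, 24): one row per series
X_back = X_flat.reshape(X.shape)    # back to (3, 1, 24)

print(X_flat.shape, X_back.shape)           # (3, 24) (3, 1, 24)
print(np.array_equal(X, X_back))            # True
# Row i of X_flat is exactly series i, channel 0
print(np.array_equal(X_flat[1], X[1, 0]))   # True
```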

6. Defining the Distance Function

eucl_dist = FlatDist(ScipyDist())
  • FlatDist: A wrapper that flattens each time series into a vector and applies a tabular (non-time-series) distance to those vectors.
  • ScipyDist: Wraps SciPy's distance functions (scipy.spatial.distance); by default it uses the Euclidean metric.
  • This combination ensures the classifier uses Euclidean distance to compute similarities.
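In effect, the flatten-then-Euclidean combination compares every test series with every training series as plain vectors. The NumPy sketch below computes such a pairwise distance matrix directly; it illustrates the arithmetic rather than sktime's internals, and the `pairwise_euclidean` helper is hypothetical. With SciPy installed, `scipy.spatial.distance.cdist(A, B)` on the flattened arrays gives the same matrix.

```python
import numpy as np


def pairwise_euclidean(X, Y):
    """Euclidean distance between every series in X and every series in Y.

    X: (n, d, T), Y: (m, d, T)  ->  returns an (n, m) matrix
    """
    A = X.reshape(X.shape[0], -1)  # flatten each series to a vector
    B = Y.reshape(Y.shape[0], -1)
    # Broadcast (n, 1, D) against (1, m, D) to get all pairwise differences
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


X = np.array([[[0.0, 0.0, 0.0]], [[3.0, 4.0, 0.0]]])  # two series, shape (2, 1, 3)
Y = np.array([[[0.0, 0.0, 0.0]], [[0.0, 0.0, 1.0]]])  # two series, shape (2, 1, 3)
D = pairwise_euclidean(X, Y)
print(D)  # D[1, 0] is the classic 3-4-5 distance
```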

7. Initializing the KNN Classifier

clf = KNeighborsTimeSeriesClassifier(n_neighbors=3, distance=eucl_dist)
  • KNeighborsTimeSeriesClassifier: Implements KNN for time series.
    • n_neighbors=3: Uses the 3 nearest neighbors for classification.
    • distance=eucl_dist: Specifies Euclidean distance as the metric instead of the classifier's default, DTW.
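n_neighbors=3 is a tuning choice rather than a fixed rule. One common way to pick k, sketched here in NumPy under the assumption that Euclidean distance on flattened series is used, is leave-one-out cross-validation on the training set: classify each training series using all the others and keep the k with the highest accuracy. The function `loocv_accuracy` and the toy data are hypothetical.

```python
import numpy as np
from collections import Counter


def loocv_accuracy(X, y, k):
    """Leave-one-out accuracy of a k-NN classifier with Euclidean distance.

    X: (n, D) flattened series, y: (n,) labels.
    """
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt(np.sum(diff ** 2, axis=-1))
    np.fill_diagonal(D, np.inf)  # a series may not be its own neighbour
    correct = 0
    for i in range(len(X)):
        nearest = np.argsort(D[i], kind="stable")[:k]
        label = Counter(y[j] for j in nearest).most_common(1)[0][0]
        correct += int(label == y[i])
    return correct / len(X)


rng = np.random.default_rng(1)
# Two synthetic classes of length-24 series: noisy levels around 0 and around 1
X = np.vstack([rng.normal(0.0, 0.3, (10, 24)), rng.normal(1.0, 0.3, (10, 24))])
y = np.array(["a"] * 10 + ["b"] * 10)

scores = {k: loocv_accuracy(X, y, k) for k in (1, 3, 5, 7)}
best_k = max(scores, key=scores.get)  # ties go to the smallest k listed
print(scores, best_k)
```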

8. Training the Classifier

clf.fit(X_train, y_train)
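  • The KNN model is fitted on the normalized training data. KNN is a lazy learner: fitting essentially stores the training series, and the neighbor search happens at prediction time.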

9. Making Predictions

y_pred = clf.predict(X_test)
  • Predicts the class labels for the test data based on the 3 nearest neighbors.
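Step 3 above lists both majority voting and distance-weighted voting. The sketch below contrasts the two for a single query whose three neighbour labels and distances are given; the helper `vote` and the numbers are hypothetical. sktime's classifier exposes a similar choice through its `weights` parameter; check the documentation for your installed version.

```python
from collections import defaultdict


def vote(labels, distances, weighted=False, eps=1e-12):
    """Combine the labels of the k nearest neighbours into a single prediction."""
    scores = defaultdict(float)
    for label, d in zip(labels, distances):
        # Uniform voting: each neighbour adds 1.
        # Inverse-distance voting: closer neighbours add more; eps avoids
        # division by zero when a neighbour is identical to the query.
        scores[label] += 1.0 / (d + eps) if weighted else 1.0
    return max(scores, key=scores.get)


# Hypothetical 3 nearest neighbours of one query: one very close "A", two farther "B"
labels = ["A", "B", "B"]
distances = [0.1, 1.0, 1.2]
print(vote(labels, distances))                 # B (2 votes to 1)
print(vote(labels, distances, weighted=True))  # A (10.0 vs about 1.83)
```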

10. Evaluating Accuracy

accuracy = np.mean(y_test == y_pred)
print(f"Accuracy: {accuracy}")
  • np.mean(y_test == y_pred): Calculates the proportion of correctly predicted labels.
  • print(f"Accuracy: {accuracy}"): Displays the accuracy of the classifier.
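Accuracy alone can hide class-specific errors. Step 4 also lists precision, recall, and F1-score; the NumPy sketch below computes them per class from label arrays shaped like y_test and y_pred (the toy arrays are illustrative, not results from this tutorial). With scikit-learn installed, `sklearn.metrics.classification_report(y_test, y_pred)` reports the same quantities.

```python
import numpy as np


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen in either array."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    results = {}
    for c in np.unique(np.concatenate([y_true, y_pred])):
        tp = np.sum((y_pred == c) & (y_true == c))  # predicted c, really c
        fp = np.sum((y_pred == c) & (y_true != c))  # predicted c, really other
        fn = np.sum((y_pred != c) & (y_true == c))  # missed a real c
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[str(c)] = (float(precision), float(recall), float(f1))
    return results


y_true = np.array(["1", "1", "1", "2", "2", "2"])
y_pred = np.array(["1", "1", "2", "2", "2", "2"])
for label, (p, r, f) in per_class_metrics(y_true, y_pred).items():
    print(f"class {label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```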

Summary

This code demonstrates how to preprocess time series data, normalize it, and classify it using a KNN classifier with Euclidean distance. It evaluates the model’s performance using the Italy Power Demand dataset, showcasing the integration of sktime’s time series classification tools.

Combined code:

# Import necessary libraries
import numpy as np
from sktime.classification.distance_based import KNeighborsTimeSeriesClassifier
from sktime.datasets import load_italy_power_demand
from sktime.dists_kernels import FlatDist, ScipyDist
from sklearn.preprocessing import StandardScaler

# Load the Italy Power Demand dataset
X_train, y_train = load_italy_power_demand(split="train", return_type="numpy3D")
X_test, y_test = load_italy_power_demand(split="test", return_type="numpy3D")
print(X_train[:2])

# Flatten the 3D array to 2D for normalization
X_train_flat = X_train.reshape(X_train.shape[0], -1)
X_test_flat = X_test.reshape(X_test.shape[0], -1)

# Initialize the StandardScaler and fit it to the training data
scaler = StandardScaler()
X_train_flat = scaler.fit_transform(X_train_flat)
X_test_flat = scaler.transform(X_test_flat)

# Reshape the data back to 3D after normalization
X_train = X_train_flat.reshape(X_train.shape)
X_test = X_test_flat.reshape(X_test.shape)

# Create an instance of the Euclidean distance function
eucl_dist = FlatDist(ScipyDist())

# Initialize the KNeighborsTimeSeriesClassifier with 3 neighbors and the Euclidean distance
clf = KNeighborsTimeSeriesClassifier(n_neighbors=3, distance=eucl_dist)

# Fit the KNeighborsTimeSeriesClassifier model on the training data
clf.fit(X_train, y_train)

# Predict the target values for the test data
y_pred = clf.predict(X_test)

# Calculate the accuracy of the predictions
accuracy = np.mean(y_test == y_pred)

# Print the accuracy
print(f"Accuracy: {accuracy}")

Discover more from Science Comics
