What is Hierarchical Classification + Python Code

Hierarchical classification is a method of assigning items to a category that is part of a larger, structured hierarchy. Unlike traditional “flat” classification where categories are independent, hierarchical classification considers the relationships between categories, organizing them in a parent-child structure. This approach is particularly useful when dealing with complex domains where a simple, single-level categorization is insufficient.

A classic example is the biological classification of living organisms, where species are grouped into genera, then families, orders, classes, phyla, and finally kingdoms. Each level represents a broader and more inclusive category. For instance, a lion is classified under the genus Panthera, which is part of the family Felidae (cats), which in turn belongs to the order Carnivora (carnivores), and so on up the hierarchy.

In the realm of machine learning, hierarchical classification is employed to tackle complex classification problems by breaking them down into a series of simpler decisions. Instead of a single model predicting from a large number of classes, a hierarchy of classifiers can be used. This can improve accuracy and provide more interpretable results. For example, in document classification, a document might first be classified as “sports” or “politics,” and then a subsequent classifier could further categorize a “sports” document into “basketball,” “soccer,” or “tennis.”

Key Approaches in Machine Learning:

There are two main strategies for implementing hierarchical classification in machine learning:

  • Local Classifiers: This approach involves training a separate classifier for each node or level in the hierarchy.
    • Local Classifier per Node: A binary classifier is trained for each class to decide if an instance belongs to that class or not.
    • Local Classifier per Parent Node: For each parent node, a multi-class classifier is trained to distinguish between its direct children (sketched in code after this list).
    • Local Classifier per Level: A single multi-class classifier is trained for each level of the hierarchy.
  • Global Classifier (or Big-Bang Approach): This method involves training a single, more complex model that considers the entire class hierarchy simultaneously. This model is designed to predict the most specific class in the hierarchy, and the parent classes are implicitly assigned.
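
To make the local-classifier-per-parent-node idea concrete, here is a minimal, hand-wired sketch on a tiny made-up corpus. Everything below (documents, labels, helper names) is hypothetical and for illustration only; the hiclass library used later in this post automates exactly this wiring:

Python

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny hypothetical corpus with two-level labels: top -> sub
docs = [
    "she scored a three pointer at the buzzer",   # sports / basketball
    "the striker scored a goal in extra time",    # sports / soccer
    "parliament passed the controversial bill",   # politics / legislation
    "voters went to the polls on tuesday",        # politics / elections
]
top = ["sports", "sports", "politics", "politics"]
sub = ["basketball", "soccer", "legislation", "elections"]

# One classifier at the root decides the top-level class...
root_clf = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(docs, top)

# ...and one classifier per parent node distinguishes that parent's children.
child_clfs = {}
for parent in set(top):
    idx = [i for i, t in enumerate(top) if t == parent]
    child_clfs[parent] = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(
        [docs[i] for i in idx], [sub[i] for i in idx]
    )

def predict_path(text):
    parent = root_clf.predict([text])[0]            # level 1 decision
    child = child_clfs[parent].predict([text])[0]   # level 2 decision, given the parent
    return [parent, child]

print(predict_path("a late goal decided the match"))  # likely ['sports', 'soccer'] on this toy data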

Advantages and Disadvantages:

Advantages:

  • Improved Accuracy: By breaking down a complex problem into smaller, more manageable sub-problems, hierarchical classification can often lead to more accurate predictions.
  • Interpretability: The hierarchical structure of the output can provide more insight into the classification decision.
  • Scalability: It can be more efficient for problems with a very large number of categories.

Disadvantages:

  • Error Propagation: In the local classifier approach, an error at a higher level of the hierarchy can propagate down to lower levels, leading to incorrect final classifications.
  • Complexity: Implementing and training hierarchical models can be more complex than traditional flat classification.
  • Data Sparsity: Some of the more specific, lower-level classes may have very few training examples, making it difficult to train accurate classifiers.

In essence, hierarchical classification offers a powerful framework for organizing and categorizing information in a structured and meaningful way, with applications ranging from the natural sciences to advanced machine learning tasks.


An Example in Game Recommendations

Traditional recommendation systems might use a “flat” structure of genres. A game is tagged as “Action,” “RPG,” “Strategy,” etc. If a user plays a lot of “Action” games, the system recommends more “Action” games.

This is limiting because “Action” is an incredibly broad category. A player who loves fast-paced, competitive shooters like Valorant (a Tactical FPS) might not enjoy a story-driven, third-person action game like The Last of Us (an Action-Adventure), even though both fall under the “Action” umbrella.

Building a Game Genre Hierarchy

Hierarchical classification solves this by organizing game genres into a tree-like structure, from broad to very specific. This allows the system to understand the user’s taste at a much more granular level.

Here’s a simplified example of a game genre hierarchy:

└── All Games
    ├── Action
    │   ├── Shooter
    │   │   ├── First-Person Shooter (FPS)
    │   │   │   ├── Tactical Shooter (e.g., Valorant, CS:GO)
    │   │   │   └── Looter Shooter (e.g., Borderlands, Destiny 2)
    │   │   └── Third-Person Shooter (TPS) (e.g., Gears of War)
    │   ├── Action-Adventure (e.g., The Last of Us, Tomb Raider)
    │   └── Hack and Slash / Beat 'em up (e.g., Devil May Cry)
    │
    ├── Role-Playing Game (RPG)
    │   ├── Action RPG (ARPG) (e.g., The Witcher 3, Elden Ring)
    │   ├── Japanese RPG (JRPG)
    │   │   ├── Turn-Based (e.g., Persona 5)
    │   │   └── Action-Based (e.g., Final Fantasy XVI)
    │   └── Strategy RPG (SRPG) (e.g., Fire Emblem)
    │
    └── Strategy
        ├── Real-Time Strategy (RTS) (e.g., StarCraft II)
        └── Turn-Based Strategy (TBS) (e.g., Civilization VI)

How Hierarchical Classification Powers Recommendations

A machine learning model uses this hierarchy to both classify games and profile users.

1. Creating Nuanced User Profiles

Instead of just knowing a user likes “RPGs,” the system can learn their specific preferences down the hierarchy.

  • User A plays The Witcher 3, Elden Ring, and Diablo IV. The system identifies a strong preference for the Action RPG (ARPG) sub-genre. It won’t recommend a Turn-Based JRPG like Persona 5, even though it’s also an RPG.
  • User B plays Valorant and Counter-Strike 2. The system profiles them as a fan of Tactical Shooters. The next recommendation is more likely to be Rainbow Six Siege (another Tactical Shooter) rather than a Looter Shooter like Borderlands.

The system builds a user profile not as a flat list of genres, but as a weighted tree, with stronger preferences indicated at specific nodes in the hierarchy.
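
As a minimal illustration, such a profile can be kept as weights on hierarchy paths. This is a hypothetical sketch (the data, weighting scheme, and function names are made up), not a production recommender:

Python

from collections import defaultdict

# Hypothetical sketch: a user profile as weights on genre-hierarchy nodes.
# Each play session adds weight along the game's entire genre path,
# so preferences accumulate at every level of the tree.
profile = defaultdict(float)

def record_play(profile, genre_path, hours):
    for depth in range(1, len(genre_path) + 1):
        profile[tuple(genre_path[:depth])] += hours

record_play(profile, ["Action", "Shooter", "FPS", "Tactical Shooter"], hours=40)
record_play(profile, ["RPG", "ARPG"], hours=5)

# The strongest preferences surface at the nodes the user actually plays.
for node, weight in sorted(profile.items(), key=lambda kv: -kv[1]):
    print(" > ".join(node), weight)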

2. Solving the “Cold Start” Problem for New Games

When a new game is released, it has no user ratings or play history. This is the “cold start” problem. With a hierarchy, the new game can be immediately classified.

For example, a new indie game, “Cosmic Raiders,” is classified as a Looter Shooter. Even with zero player data, the system can start recommending it to users who have a known preference for this specific sub-genre (like players of Destiny 2 or Borderlands). It can even recommend it to players who like the parent category, First-Person Shooter, to test the waters.

3. Enabling Smarter Discovery and Serendipity

Hierarchical classification allows the system to make “adjacent” recommendations. If a user has exhausted all the games in a very specific sub-genre, the system can move up one level in the hierarchy and recommend something from a “sibling” category.

  • A player who loves Turn-Based JRPGs might be willing to try a Strategy RPG (SRPG) like Fire Emblem, as both share strategic, turn-based elements.
  • A fan of Survival Horror (a sub-genre of Action-Adventure) might be recommended a Psychological Horror game.

This helps users discover new types of games they are likely to enjoy, preventing the “recommendation bubble” where they only see more of the exact same thing.
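
Here is one way that "move up one level" logic might look, over a hypothetical slice of the genre tree from earlier (tree contents and function name are illustrative assumptions):

Python

# Hypothetical sketch: find "sibling" sub-genres by stepping up one level.
GENRE_TREE = {
    "RPG": {
        "ARPG": {},
        "JRPG": {"Turn-Based": {}, "Action-Based": {}},
        "SRPG": {},
    },
    "Strategy": {"RTS": {}, "TBS": {}},
}

def siblings(tree, path):
    """Return the other children of the node one level above `path`."""
    parent = tree
    for step in path[:-1]:
        parent = parent[step]
    return [child for child in parent if child != path[-1]]

# A player who has exhausted Turn-Based JRPGs can be nudged sideways...
print(siblings(GENRE_TREE, ["RPG", "JRPG", "Turn-Based"]))  # ['Action-Based']
# ...or, one level higher, toward sibling RPG sub-genres like SRPGs.
print(siblings(GENRE_TREE, ["RPG", "JRPG"]))                # ['ARPG', 'SRPG']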

4. Improving Recommendation Accuracy

By breaking down the classification task, the system can be more accurate. A model trained to distinguish between “Tactical Shooters” and “Looter Shooters” (a fine-grained task) can perform much better than a single, massive model trying to predict one of a thousand possible flat genre tags. This is the “divide and conquer” principle. A local classifier at each node only has to solve one simple problem, leading to a more robust overall system.

In summary, applying hierarchical classification to gaming recommendations transforms a generic system into a highly personalized and intelligent discovery engine. It understands that a gamer’s taste is complex and multi-layered, leading to more satisfying and relevant suggestions that keep players engaged.


Example in Python

Now, let’s build and evaluate a hierarchical text classification model. The goal is to classify a short document (here, a product description) not just into a single category, but into a path of categories, like baby products -> gear -> backpacks carriers. It uses the hiclass library, which is specifically designed for this type of problem.


1. Setup and Data Loading ⚙️

This first block handles installing the necessary libraries and loading the dataset.

Python

# pip install pandas scikit-learn hiclass

import numpy as np
from hiclass.datasets import load_hierarchical_text_classification

# Load the data with a 70/30 train-test split
# This function handles downloading, parsing, and splitting.
X_train_text, X_test_text, y_train, y_test = load_hierarchical_text_classification(
    test_size=0.3,
    random_state=42
)

print("--- Data Loaded Successfully ---")
print(f"Number of training samples: {X_train_text.shape[0]}")
print(f"Number of testing samples: {X_test_text.shape[0]}")
print(f"Shape of training labels: {y_train.shape}")
print("\nExample of a hierarchical label (category path):")
print(y_train[0])

  • # pip install pandas scikit-learn hiclass: The commented-out command at the top installs the three Python libraries required:
    • pandas: Used for data manipulation (not called directly in this script, but hiclass uses it internally).
    • scikit-learn: A fundamental machine learning library in Python. We use it for the text vectorizer, the base classifier, and the pipeline.
    • hiclass: A specialized library for hierarchical classification that works seamlessly with scikit-learn.
  • load_hierarchical_text_classification(...): This is a convenience function from hiclass. It automatically downloads a dataset of consumer product texts, where each item is categorized into a three-level hierarchy (e.g., health personal care -> nutrition wellness -> vitamins supplements, as the output below shows). A quick sanity check of the label levels appears after the output.
    • X_train_text, X_test_text: These contain the raw text of the documents for training and testing.
    • y_train, y_test: These contain the corresponding hierarchical labels. Each label is an array of strings representing the path from the broadest category down to the most specific one. Shallower paths are padded with the empty string '' so that all label arrays have the same length.
    • test_size=0.3: This splits the data so that 30% is reserved for testing the model’s performance, and 70% is used for training.
    • random_state=42: This ensures that the data is split in the exact same way every time you run the code, making your results reproducible.

Output

--- Data Loaded Successfully ---
Number of training samples: 28000
Number of testing samples: 12000
Shape of training labels: (28000, 3)

Example of a hierarchical label (category path):
['health personal care' 'nutrition wellness' 'vitamins supplements']
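
As a quick sanity check (assuming the y_train array loaded above), you can count the distinct labels at each level of the hierarchy:

Python

import numpy as np

# Count the distinct labels at each level of the hierarchy
# (the empty string '' is padding, not a real class).
for level in range(y_train.shape[1]):
    labels = [l for l in np.unique(y_train[:, level]) if l != ""]
    print(f"Level {level}: {len(labels)} distinct labels")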

2. Building the Machine Learning Pipeline 🏗️

A Pipeline is a powerful tool from scikit-learn that chains multiple steps together. This ensures that the same operations (like text processing and classification) are applied consistently to both training and testing data. Our pipeline has two main stages:

  1. Vectorizer: Converts raw text into numerical data.
  2. Classifier: Learns to predict the category path from the numerical data.

Python

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from hiclass import LocalClassifierPerParentNode

# 1. Define the base classifier for each node
base_classifier = LogisticRegression(solver='liblinear', random_state=42)

# 2. Create the hierarchical classifier
hierarchical_classifier = LocalClassifierPerParentNode(
    local_classifier=base_classifier,
    n_jobs=-1 
)

# 3. Build the full pipeline
pipeline = Pipeline([
    ('vectorizer', TfidfVectorizer(
        stop_words='english',
        max_features=10000,
        ngram_range=(1, 2)
    )),
    ('classifier', hierarchical_classifier)
])

  • TfidfVectorizer: This is the first step. It converts the text abstracts into a matrix of TF-IDF features.
    • TF-IDF stands for Term Frequency-Inverse Document Frequency. It’s a numerical statistic that reflects how important a word is to a document in a collection. It gives higher weight to words that are frequent in one document but rare across all other documents.
    • stop_words='english': Removes common English words like “the”, “a”, and “in”, which don’t carry much meaning for classification.
    • max_features=10000: Considers only the top 10,000 most frequent words to keep the model manageable.
    • ngram_range=(1, 2): Looks at both single words (like “machine”) and pairs of adjacent words (like “machine learning”). This helps capture more context (a toy demo after this list makes these settings concrete).
  • LocalClassifierPerParentNode (LCPN): This is the core of our hierarchical model from the hiclass library.
    • Its strategy is to train one standard classifier for each “parent” category in the hierarchy. For example, it trains one classifier at the root level to decide between the top-level categories (in this dataset, departments such as baby products and health personal care). Then, if it predicts baby products, it uses a different classifier specifically trained to distinguish between the children of baby products (such as gear).
    • local_classifier=base_classifier: We tell the LCPN to use a LogisticRegression model for each of these decision points. LogisticRegression is a simple, fast, and effective baseline for text classification.
    • n_jobs=-1: This is a performance trick that tells the model to use all available CPU cores to speed up training.
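
To see what those vectorizer settings do, here is a self-contained toy run, separate from the pipeline above and using made-up sentences:

Python

from sklearn.feature_extraction.text import TfidfVectorizer

# Toy demo: unigrams + bigrams, with English stop words removed.
docs = [
    "machine learning for text classification",
    "deep learning for image classification",
]
vec = TfidfVectorizer(stop_words='english', ngram_range=(1, 2))
X = vec.fit_transform(docs)

# Features include single words ('learning') and adjacent pairs
# ('machine learning'); the stop word 'for' is dropped before the
# bigrams are built, so it appears in none of them.
print(vec.get_feature_names_out())
print(X.shape)  # (2 documents, number_of_features)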

3. Training the Model 🧠

This is the simplest-looking but most computationally intensive step.

Python

pipeline.fit(X_train_text, y_train)

  • pipeline.fit(...): This single command executes the entire training process.
    1. The TfidfVectorizer first learns the vocabulary from X_train_text and transforms the text into a numerical matrix.
    2. The LocalClassifierPerParentNode then takes this matrix and the hierarchical labels y_train to train all its internal LogisticRegression models. (A sketch for saving the fitted model follows.)
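
Because this step is computationally expensive, it may be worth persisting the fitted pipeline so you don’t retrain on every run. A minimal sketch using joblib (the filename is arbitrary, and this is not part of the original script):

Python

import joblib

# Save the fitted pipeline (vectorizer + hierarchical classifier) to disk...
joblib.dump(pipeline, "hierarchical_text_clf.joblib")

# ...and reload it later without retraining.
pipeline = joblib.load("hierarchical_text_clf.joblib")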

4. Making a Prediction 🔮

Now that the model is trained, we can use it to predict the category path for a new, unseen document from our test set.

Python

# Select a sample document from the test set
sample_idx = 42
sample_text = [X_test_text[sample_idx]]
true_path = y_test[sample_idx]

# Predict the hierarchy for the sample document
predicted_path = pipeline.predict(sample_text)[0]

# Clean up empty padding strings for a cleaner display
true_path_clean = list(filter(None, true_path))
predicted_path_clean = list(filter(None, predicted_path))

print("\n--- Prediction Example ---")
# ... (print statements, shown in the full code below)

  • pipeline.predict(sample_text): We feed a sample document into the trained pipeline. Note that the input must be an iterable (like a list), which is why [X_test_text[sample_idx]] is used.
  • The pipeline automatically applies the same steps: it vectorizes the text using the already learned vocabulary and then uses the trained hierarchical classifier to predict the category path.
  • The filter(None, ...) call removes the '' padding strings from the label arrays for a cleaner printout.

5. Evaluating Model Performance 📊

Finally, we evaluate the model’s performance not just on one example, but on the entire test set to get a reliable measure of how well it works.

Python

from hiclass.metrics import f1

# Evaluate the model's performance on the entire test set
y_pred = pipeline.predict(X_test_text)
h_f1_score = f1(y_test, y_pred)

print(f"\nOverall Hierarchical F1-Score on Test Set: {h_f1_score:.4f}")

  • pipeline.predict(X_test_text): We get predictions for all the documents in our test set.
  • f1(y_test, y_pred): We use the f1 metric from hiclass. A standard F1-score balances precision and recall; this hierarchical F1-score is specially designed to measure performance in a hierarchy. It rewards predictions that are partially correct (e.g., getting the top-level category right but the sub-category wrong) and penalizes predictions that are completely wrong (a toy check of this behavior follows the output below).
  • The final score gives you a single number summarizing how accurate your model’s hierarchical predictions are across the entire test dataset.

Output:

--- Prediction Example ---
Document:
'Crown Crafts The Original NoJo BabySling by Dr. Sears in Black Chambray...'

=========================
✅ True Category Path:	['baby products', 'gear', 'backpacks carriers']
🤖 Predicted Category Path:	[np.str_('baby products'), np.str_('gear'), np.str_('backpacks carriers')]

Evaluating on the full test set...

Overall Hierarchical F1-Score on Test Set: 0.8207
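
To see the partial-credit behavior in isolation, here is a toy check on a single made-up label path (the expected value assumes the standard hierarchical F1 definition that hiclass implements):

Python

import numpy as np
from hiclass.metrics import f1

# Toy check: the prediction matches the first two levels of the true
# path and misses the last one, so it earns partial credit.
y_true = np.array([["Action", "Shooter", "FPS"]])
y_pred = np.array([["Action", "Shooter", "TPS"]])

# 2 of the 3 path nodes match, so the score should be about 0.667.
print(f1(y_true, y_pred))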

Full code:

# pip install pandas scikit-learn hiclass

import numpy as np
from hiclass.datasets import load_hierarchical_text_classification

# Load the data with a 70/30 train-test split
# This function handles downloading, parsing, and splitting.
X_train_text, X_test_text, y_train, y_test = load_hierarchical_text_classification(
    test_size=0.3,
    random_state=42
)

print("--- Data Loaded Successfully ---")
print(f"Number of training samples: {X_train_text.shape[0]}")
print(f"Number of testing samples: {X_test_text.shape[0]}")
print(f"Shape of training labels: {y_train.shape}")
print("\nExample of a hierarchical label (category path):")
print(y_train[0])


from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from hiclass import LocalClassifierPerParentNode

# 1. Define the base classifier for each node in the hierarchy
base_classifier = LogisticRegression(solver='liblinear', random_state=42)

# 2. Create the hierarchical classifier
hierarchical_classifier = LocalClassifierPerParentNode(
    local_classifier=base_classifier,
    n_jobs=-1
)

# 3. Build the full pipeline
pipeline = Pipeline([
    ('vectorizer', TfidfVectorizer(
        stop_words='english',
        max_features=10000,
        ngram_range=(1, 2)
    )),
    ('classifier', hierarchical_classifier)
])

pipeline.fit(X_train_text, y_train)

from hiclass.metrics import f1

# Select a sample document from the test set to inspect
sample_idx = 42

# Select the string at the index and wrap it in a list for the pipeline
sample_text = [X_test_text[sample_idx]]
true_path = y_test[sample_idx]

# Predict the hierarchy for the sample document
predicted_path = pipeline.predict(sample_text)[0]

# Clean up empty strings from padding for a cleaner display
true_path_clean = list(filter(None, true_path))
predicted_path_clean = list(filter(None, predicted_path))

print("\n--- Prediction Example ---")
# Since sample_text is a list with one item, access it with [0]
print(f"Document:\n'{sample_text[0][:400]}...'")
print("\n" + "="*25)
print(f"✅ True Category Path:\t{true_path_clean}")
print(f"🤖 Predicted Category Path:\t{predicted_path_clean}")

# Evaluate the model's performance on the entire test set
print("\nEvaluating on the full test set...")
y_pred = pipeline.predict(X_test_text)
h_f1_score = f1(y_test, y_pred)

print(f"\nOverall Hierarchical F1-Score on Test Set: {h_f1_score:.4f}")
