Word Embeddings in PyTorch: A Complete Guide

This guide covers:

  • Implementing word embeddings in PyTorch
  • Training word embeddings
  • Saving the trained embeddings
  • Loading the saved embeddings for reuse

1. Implementing Word Embeddings in PyTorch

PyTorch provides the nn.Embedding layer for creating word embeddings: a lookup table that maps integer word indices to dense, trainable vectors.

import torch
import torch.nn as nn

# Define the vocabulary size and embedding dimension
vocab_size = 10  # Example vocabulary size
embedding_dim = 5  # Dimension of word vectors

# Create an embedding layer
embedding_layer = nn.Embedding(num_embeddings=vocab_size, embedding_dim=embedding_dim)

# Example input (word indices)
word_indices = torch.tensor([1, 3, 5, 7])  # Example words

# Get embeddings for input words
word_embeddings = embedding_layer(word_indices)
print(word_embeddings)

  • nn.Embedding(num_embeddings, embedding_dim): creates a trainable embedding matrix of shape [num_embeddings, embedding_dim] (here [10, 5]), initialized randomly.
  • word_indices: a tensor of integer indices, one per word to look up.
  • The output is a tensor of shape [num_words, embedding_dim] (here [4, 5]), with one embedding vector per input index. A sketch of building the layer from existing pretrained vectors follows below.
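
If pretrained vectors are already available (from GloVe, word2vec, or an earlier run), nn.Embedding.from_pretrained can build the layer directly from them. A minimal sketch; the random tensor below is only a placeholder for real pretrained vectors:

import torch
import torch.nn as nn

# Placeholder "pretrained" vectors: 10 words, 5 dimensions each.
# In practice these would come from GloVe, word2vec, or a previous training run.
pretrained_vectors = torch.randn(10, 5)

# Build an embedding layer directly from existing vectors.
# freeze=True keeps them fixed during later training; use freeze=False to fine-tune.
frozen_embedding = nn.Embedding.from_pretrained(pretrained_vectors, freeze=True)

# Lookup works exactly as with a freshly initialized nn.Embedding
print(frozen_embedding(torch.tensor([1, 3, 5, 7])).shape)  # torch.Size([4, 5])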

2. Training Word Embeddings in PyTorch

Training embeddings typically uses a shallow neural network with a Skip-gram or CBOW (continuous bag-of-words) objective, as in word2vec. The example below follows the Skip-gram idea: predict a context word from a given center word.

Dataset Preparation and Training Loop
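
The code below uses hand-written (center, context) index pairs for brevity. As a rough sketch of how such pairs could be generated from a tokenized sentence (the toy sentence and window size of 1 are illustrative assumptions):

# Toy tokenized sentence (illustrative only; real corpora are much larger)
tokens = ["the", "cat", "sat", "on", "mat"]

# Map each unique word to an integer index
word_to_idx = {word: idx for idx, word in enumerate(tokens)}

# Collect (center, context) index pairs within a window of 1 word on each side
window = 1
pairs = []
for i, center in enumerate(tokens):
    for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
        if j != i:
            pairs.append((word_to_idx[center], word_to_idx[tokens[j]]))

print(pairs)  # [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2), (3, 4), (4, 3)]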

import torch
import torch.nn as nn
import torch.optim as optim

# Sample dataset (word pairs)
data = [(0, 1), (1, 2), (2, 3), (3, 4)]  # (center word, context word)
vocab_size = 5  # Number of unique words
embedding_dim = 3

# Create model
class WordEmbeddingModel(nn.Module):
    def __init__(self, vocab_size, embedding_dim):
        super(WordEmbeddingModel, self).__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.linear = nn.Linear(embedding_dim, vocab_size)
    
    def forward(self, center_word):
        embed = self.embeddings(center_word)
        output = self.linear(embed)
        return output

# Initialize model
model = WordEmbeddingModel(vocab_size, embedding_dim)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Training loop
for epoch in range(100):
    total_loss = 0
    for center, context in data:
        center_tensor = torch.tensor([center], dtype=torch.long)
        target_tensor = torch.tensor([context], dtype=torch.long)

        optimizer.zero_grad()
        output = model(center_tensor)
        loss = criterion(output, target_tensor)
        loss.backward()
        optimizer.step()

        total_loss += loss.item()
    
    if epoch % 20 == 0:
        print(f"Epoch {epoch}, Loss: {total_loss}")

  • The model learns embeddings by predicting the context word given a center word, which is the Skip-gram objective.
  • nn.CrossEntropyLoss() treats the prediction as classification over the vocabulary, so the linear layer outputs one score per word.
  • The SGD optimizer updates both the embedding matrix and the linear layer from the loss gradients. A small sketch of inspecting the learned vectors follows below.
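
Once the loop finishes, the learned vectors can be inspected directly, for example by comparing two words with cosine similarity. A small sketch that reuses the model variable trained above:

import torch
import torch.nn.functional as F

# Pull out the learned vectors for two word indices (1 and 2 here)
vec_a = model.embeddings(torch.tensor([1]))
vec_b = model.embeddings(torch.tensor([2]))

# Cosine similarity lies in [-1, 1]; higher means the two words ended up
# closer together in the embedding space
similarity = F.cosine_similarity(vec_a, vec_b)
print(similarity.item())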

3. Saving the Trained Embedding Model

Once training is done, we can save either the embedding layer alone or the whole model.

# Save only the embedding layer weights
torch.save(model.embeddings.state_dict(), "word_embeddings.pth")

# Save the entire model (optional)
torch.save(model.state_dict(), "embedding_model.pth")

  • torch.save(model.embeddings.state_dict(), "word_embeddings.pth") saves just the embedding layer's weights.
  • torch.save(model.state_dict(), "embedding_model.pth") saves the full model (embedding layer plus the linear output layer). A sketch of saving the raw weight matrix together with a vocabulary mapping follows below.
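
Beyond state dicts, it can be handy to save the raw weight matrix together with the vocabulary mapping, so the vectors can be reused without this model class. A minimal sketch; the word_to_idx dictionary here is a hypothetical mapping you would have built during preprocessing:

import torch

# Detach the [vocab_size, embedding_dim] weight matrix from the computation graph
embedding_matrix = model.embeddings.weight.detach().clone()

# Hypothetical word-to-index mapping built during preprocessing
word_to_idx = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}

# Save both objects in one file so the vectors stay aligned with the vocabulary
torch.save(
    {"embedding_matrix": embedding_matrix, "word_to_idx": word_to_idx},
    "embeddings_with_vocab.pth",
)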

4. Loading the Saved Embeddings

To reuse the trained embeddings:

# Load model structure
loaded_model = WordEmbeddingModel(vocab_size, embedding_dim)

# Load saved embeddings
loaded_model.embeddings.load_state_dict(torch.load("word_embeddings.pth"))

# Example usage
word_idx = torch.tensor([2])  # Example word index
print(loaded_model.embeddings(word_idx))

  • WordEmbeddingModel(vocab_size, embedding_dim): recreate the model with the same structure and sizes used during training.
  • load_state_dict(torch.load("word_embeddings.pth")): load the saved weights into the embedding layer.
  • The embeddings can now be used for further training or inference, for example inside a downstream model as sketched below.
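
A common way to reuse the loaded vectors is to plug them into a new downstream model via nn.Embedding.from_pretrained. The sketch below assumes a hypothetical classifier that averages the word vectors of its input; the class name and number of classes are illustrative:

import torch
import torch.nn as nn

class SimpleClassifier(nn.Module):
    """Hypothetical downstream model that averages word embeddings."""
    def __init__(self, pretrained_weights, num_classes=2):
        super().__init__()
        # Reuse the trained vectors; freeze=False allows further fine-tuning
        self.embeddings = nn.Embedding.from_pretrained(pretrained_weights, freeze=False)
        self.classifier = nn.Linear(pretrained_weights.shape[1], num_classes)

    def forward(self, word_indices):
        embedded = self.embeddings(word_indices)  # [num_words, embedding_dim]
        pooled = embedded.mean(dim=0)             # average over the words
        return self.classifier(pooled)            # [num_classes]

# Build the classifier from the embeddings loaded above
clf = SimpleClassifier(loaded_model.embeddings.weight.detach(), num_classes=2)
print(clf(torch.tensor([0, 2, 4])))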


