Speeding Up PyTorch Graph Learning Models with `torch.compile` and PyG

Tools & Engineering

The Engineer

29 Apr 2025 · 3 min read

Learn how `torch.compile` and PyTorch Geometric sped up our Relational Deep Learning models by up to 35% in our experiments, without sacrificing accuracy.

If you're building graph machine learning (ML) models, especially for Relational Deep Learning (RDL), performance optimization is crucial. This article shows how to speed up PyTorch graph ML models using `torch.compile` and PyTorch Geometric (PyG), covering practical techniques that yielded speedups of up to 35% in our experiments while maintaining accuracy.

Behind the Scenes of Relational Deep Learning

Relational Deep Learning (RDL) applies deep learning directly to interconnected, structured data such as relational databases. It represents the data as a graph so that neural networks can capture complex dependencies and interactions between entities across tables. In our setup at Kumo, we construct a large-scale heterogeneous graph from relational tables: each table corresponds to a node type, each row becomes a node of that type, and primary key-foreign key pairs define the edges.

For example, consider a graph with three node types: Users, Interactions, and Items. Each row in the Interactions table references a user and an item through foreign keys, so every interaction node is connected by edges to the user who made it and the item it involves. RDL uses Graph Transformers and Graph Neural Networks (GNNs) to achieve state-of-the-art performance by incorporating these connectivity patterns together with diverse multi-modal features.
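The primary key-foreign key rule described above can be sketched in plain Python. The tables, column names, and the relation names `"by"` and `"on"` are hypothetical; the `[sources, destinations]` layout and the `(source type, relation, destination type)` keys only mirror the conventions PyG uses for `edge_index` in heterogeneous graphs.

```python
# Hypothetical toy tables: each row becomes one node of its table's type.
users = [{"user_id": 10}, {"user_id": 11}]
items = [{"item_id": "a"}, {"item_id": "b"}, {"item_id": "c"}]
interactions = [
    {"interaction_id": 0, "user_id": 10, "item_id": "b"},
    {"interaction_id": 1, "user_id": 11, "item_id": "a"},
    {"interaction_id": 2, "user_id": 10, "item_id": "c"},
]


def node_index(rows, pk):
    """Map each primary-key value to a contiguous node index within its table."""
    return {row[pk]: i for i, row in enumerate(rows)}


def fk_edges(child_rows, fk, parent_index):
    """Emit one edge per child row: (child node index, parent node index via the foreign key)."""
    src, dst = [], []
    for i, row in enumerate(child_rows):
        src.append(i)
        dst.append(parent_index[row[fk]])
    return [src, dst]


user_idx = node_index(users, "user_id")
item_idx = node_index(items, "item_id")

# Keys follow the (source type, relation, destination type) pattern.
edge_index_dict = {
    ("interaction", "by", "user"): fk_edges(interactions, "user_id", user_idx),
    ("interaction", "on", "item"): fk_edges(interactions, "item_id", item_idx),
}
print(edge_index_dict)
```

In PyG, each edge list could be converted with `torch.tensor(...)` and assigned to, for example, `data["interaction", "by", "user"].edge_index` on a `torch_geometric.data.HeteroData` object.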

Implementation Details

To implement RDL at Kumo, we rely on several open-source ML libraries:

- PyTorch Geometric (PyG): Handles graph data and message passing in GNNs and Graph Transformers.
- PyTorch Frame: Encodes multi-modal data (numerical, categorical, text, images, timestamps, etc.) into a shared embedding space.
- PyTorch Lightning: Structures and scales training while reducing boilerplate code across graph learning tasks such as item recommendation, customer churn prediction, and fraud detection.

Here’s a step-by-step breakdown of our implementation:

Data Preprocessing:
- Use PyTorch Frame to encode multi-modal features into a shared embedding space.
- Convert relational tables into a graph structure using primary key-foreign key pairs.
Graph Construction:
- Define node types and edges based on the relational data.
- Attach the encoded features to their corresponding nodes as node attributes.
Model Training:
- Use PyG to perform message passing across the graph.
- Respect timestamps when sampling neighbors, so that each training example only sees data available before its prediction time; this prevents data leakage.
- Use PyTorch Lightning for efficient, scalable training.
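The leakage rule in the training step can be illustrated with a minimal, hypothetical filter: when building a training example for a user at a given seed time, only rows timestamped at or before that time are visible to the model. The table, the `ts` column, and `neighbors_before` are illustrative assumptions, not Kumo's implementation.

```python
from datetime import datetime

# Hypothetical interaction rows with timestamps.
interactions = [
    {"user_id": 10, "item_id": "a", "ts": datetime(2024, 1, 5)},
    {"user_id": 10, "item_id": "b", "ts": datetime(2024, 2, 1)},
    {"user_id": 10, "item_id": "c", "ts": datetime(2024, 3, 9)},
    {"user_id": 11, "item_id": "a", "ts": datetime(2024, 2, 20)},
]


def neighbors_before(rows, key, value, seed_time):
    """Return rows linked to `value` whose timestamp is not later than the seed time."""
    return [r for r in rows if r[key] == value and r["ts"] <= seed_time]


# Predicting for user 10 as of Feb 15: the March interaction lies in the future and is hidden.
seed_time = datetime(2024, 2, 15)
history = neighbors_before(interactions, "user_id", 10, seed_time)
print([r["item_id"] for r in history])  # ['a', 'b']
```

PyG's `NeighborLoader` accepts a `time_attr` argument for temporal neighbor sampling along these lines; check the documentation for your PyG version for the exact semantics.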

Compiling PyTorch Models with `torch.compile`

PyTorch’s eager mode offers flexibility but can be slower than compiled code. To address this, we use `torch.compile`, which captures the model's computation and just-in-time compiles it into optimized kernels the first time the model runs. In our experiments, this led to significant speedups without sacrificing accuracy.

How `torch.compile` Works

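- Graph capture: TorchDynamo hooks into Python bytecode execution and captures the model's PyTorch operations as FX graphs, falling back to regular Python (a "graph break") for code it cannot trace.
- Optimization: AOTAutograd also traces the backward pass, and the default backend, TorchInductor, fuses operations (for example, chains of pointwise ops) to reduce memory traffic and kernel-launch overhead.
- Code generation: TorchInductor generates Triton kernels for GPUs and C++ code for CPUs, which are compiled just in time and cached for subsequent calls.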
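The graph-capture stage can be observed on its own by passing a custom backend to `torch.compile`: a backend is a callable that receives the captured `torch.fx.GraphModule` plus example inputs and returns something callable. In this sketch, `inspect_backend` and `mlp_step` are hypothetical names; the backend simply records the graph and runs it unchanged, standing in for TorchInductor.

```python
import torch

captured_graphs = []


def inspect_backend(gm: torch.fx.GraphModule, example_inputs):
    # Record the FX graph that TorchDynamo captured, then execute it without optimization.
    captured_graphs.append(gm)
    return gm.forward


def mlp_step(x, w):
    return torch.relu(x @ w).sum(dim=-1)


compiled_step = torch.compile(mlp_step, backend=inspect_backend)

x = torch.randn(4, 8)
w = torch.randn(8, 3)
out = compiled_step(x, w)

print(len(captured_graphs))       # graphs captured on the first call
print(captured_graphs[0].code)    # Python source generated for the captured graph
```

Dropping the `backend` argument switches back to the default TorchInductor backend, which performs the fusion and code generation described above.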

Implementation Steps

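Install PyTorch 2.0 or later and PyG (`torch.compile` ships with PyTorch itself, so no separate package is needed):
```
pip install torch torch_geometric
```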
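A quick sanity check can confirm that the installed PyTorch exposes `torch.compile`. This is a minimal sketch; it only inspects version strings and reports PyG as optional.

```python
import torch

# torch.compile was introduced in PyTorch 2.0; older builds lack the attribute.
major = int(torch.__version__.split(".")[0])
assert major >= 2 and hasattr(torch, "compile"), "upgrade to PyTorch 2.0 or later"
print("PyTorch", torch.__version__)

try:
    import torch_geometric
    print("PyG", torch_geometric.__version__)
except ImportError:
    print("PyG is not installed")
```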

Enable torch.compile in Your Model:

```python
import torch
from torch_geometric.nn import GATConv


class GraphModel(torch.nn.Module):
    def __init__(self, num_features, hidden_channels, num_classes):
        super().__init__()
        # Two graph attention layers: input features -> hidden -> class logits.
        self.conv1 = GATConv(num_features, hidden_channels)
        self.conv2 = GATConv(hidden_channels, num_classes)

    def forward(self, x, edge_index):
        # x: [num_nodes, num_features]; edge_index: [2, num_edges]
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index)
        return x


model = GraphModel(num_features=10, hidden_channels=64, num_classes=3)
# Compilation is lazy: the model is traced and compiled on its first forward call.
compiled_model = torch.compile(model)
```

Train the Model:

optimizer = torch.optim.Adam(compiled_model.parameters(), lr=