Getting Started with openDLX: Installation, Basics, and First Model
openDLX is an open-source deep learning framework designed to be lightweight, modular, and easy to extend. It targets researchers and engineers who want a minimal but powerful toolkit to prototype models, experiment with custom layers and optimizers, and deploy trained networks without the heavy abstractions of some larger libraries. This guide walks you through installing openDLX, understanding its basic components, and building your first working model (a simple image classifier), including training and evaluation.
Why choose openDLX?
- Lightweight and modular: openDLX provides core deep learning building blocks without imposing heavy design patterns. You can pick only the parts you need.
- Readable codebase: Designed for learning and research, the code emphasizes clarity and simplicity.
- Extensible: Adding custom layers, optimizers, and datasets is straightforward.
- Interoperable: Provides utilities to convert and import models/weights from other frameworks where feasible.
System requirements and prerequisites
Before installing openDLX, ensure you have:
- Python 3.8+ (3.10 recommended)
- pip or a virtual environment manager (venv, conda)
- C compiler (for optional GPU extensions)
- CUDA toolkit and cuDNN for GPU support (if you plan to use GPU acceleration)
- Basic familiarity with Python and linear algebra
Recommended packages (will be installed as dependencies where applicable):
- numpy
- scipy
- matplotlib
- pillow
- tqdm
Installation
There are two main ways to install openDLX: via pip (official release) or from source (for the latest features).
- Install from PyPI (recommended for most users)
    python -m pip install opendlx
- Install the latest from GitHub
    git clone https://github.com/opendlx/opendlx.git
    cd opendlx
    python -m pip install -e .
- Optional: install GPU extensions (if available for your platform)
    # Example, may vary by platform and release
    python -m pip install opendlx-gpu
After installation, verify with:
python -c "import opendlx; print(opendlx.__version__)"
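If the one-liner above fails, a slightly more defensive check can tell "not installed" apart from "installed and importable". The sketch below uses only the standard library. The helper name `check_package` is made up for illustration, and it assumes the package exposes `__version__`, as the command above implies.

```python
import importlib
import importlib.util


def check_package(name):
    """Return the version string, 'unknown' if there is no __version__,
    or None if the package cannot be found."""
    # find_spec returns None when a top-level module cannot be located
    if importlib.util.find_spec(name) is None:
        return None
    module = importlib.import_module(name)
    return getattr(module, "__version__", "unknown")


version = check_package("opendlx")
if version is None:
    print("opendlx is not installed in this environment")
else:
    print(f"opendlx {version} is importable")
```

Running this inside the virtual environment you installed into helps catch the common case where `pip` and `python` point at different interpreters.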
openDLX core concepts
openDLX centers around a few straightforward abstractions:
- Tensors: The primary data structure, built on top of numpy for CPU and optionally on CUDA arrays for GPU. Tensors support basic ops, broadcasting, and automatic differentiation.
- Layers / Modules: Reusable building blocks (Linear, Conv2d, BatchNorm, Activation, etc.). Layers expose forward and backward methods.
- Models: Compositions of layers; models are callables that define forward passes.
- Loss functions: Common losses (CrossEntropy, MSE, etc.) with gradient implementations.
- Optimizers: SGD, Adam, RMSProp — lightweight implementations that update model parameters.
- DataLoaders: Utilities to create iterable batches, with shuffling and simple augmentation.
- Training loop: Minimal trainer utility that handles epochs, logging, checkpointing, and evaluation hooks.
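To make the "forward and backward methods" idea concrete, here is a framework-free sketch of a tiny linear layer that caches its input on the forward pass and computes gradients on the backward pass. It does not use openDLX: the class `TinyLinear` and its attribute names are hypothetical and only illustrate the contract such a layer typically fulfils.

```python
class TinyLinear:
    """Computes y[j] = sum_i W[j][i] * x[i] + b[j] on plain Python lists."""

    def __init__(self, weight, bias):
        self.weight = weight            # list of rows, shape (out, in)
        self.bias = bias                # list, shape (out,)
        self.grad_weight = None
        self.grad_bias = None
        self._x = None                  # input cached for the backward pass

    def forward(self, x):
        self._x = x
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]

    def backward(self, grad_out):
        # dL/dW[j][i] = grad_out[j] * x[i];  dL/db[j] = grad_out[j]
        self.grad_weight = [[g * xi for xi in self._x] for g in grad_out]
        self.grad_bias = list(grad_out)
        # dL/dx[i] = sum_j W[j][i] * grad_out[j], handed to the previous layer
        return [sum(self.weight[j][i] * grad_out[j] for j in range(len(grad_out)))
                for i in range(len(self._x))]


layer = TinyLinear(weight=[[1.0, 2.0], [0.0, -1.0]], bias=[0.5, 0.0])
y = layer.forward([3.0, 4.0])          # [1*3 + 2*4 + 0.5, 0*3 - 1*4 + 0.0]
grad_x = layer.backward([1.0, 1.0])    # [1 + 0, 2 - 1]
print(y, grad_x, layer.grad_weight)
```

An optimizer would then read `grad_weight` and `grad_bias` to update the parameters, which is the division of labour the list above describes.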
Quick tour: a minimal example
Here’s a concise example creating a simple feedforward classifier on a toy dataset.
    import numpy as np
    from opendlx import Tensor, nn, optim, data, losses

    # Synthetic dataset
    X = np.random.randn(1000, 20).astype(np.float32)
    y = (np.sum(X[:, :5], axis=1) > 0).astype(np.int64)
    dataset = data.ArrayDataset(X, y)
    loader = data.DataLoader(dataset, batch_size=32, shuffle=True)

    # Model
    class SimpleMLP(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(20, 64),
                nn.ReLU(),
                nn.Linear(64, 2)
            )

        def forward(self, x):
            return self.net(x)

    model = SimpleMLP()
    criterion = losses.CrossEntropy()
    optimizer = optim.Adam(model.parameters(), lr=1e-3)

    # Training loop
    for epoch in range(10):
        for xb, yb in loader:
            xb = Tensor(xb)
            yb = Tensor(yb)
            preds = model(xb)
            loss = criterion(preds, yb)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print(f"Epoch {epoch+1}: loss={loss.item():.4f}")
Building your first real model: CIFAR-10 classifier
Below is a step-by-step guide to build, train, and evaluate a small convolutional neural network on the CIFAR-10 dataset using openDLX.
1) Prepare dataset
Use the built-in dataset utilities to download and preprocess CIFAR-10.
    from opendlx.data import CIFAR10, DataLoader, transforms

    train_ds = CIFAR10(
        root='./data', train=True, download=True,
        transform=transforms.Compose([
            transforms.RandomCrop(32, padding=4),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize((0.4914, 0.4822, 0.4465), (0.247, 0.243, 0.261))
        ]))
    test_ds = CIFAR10(
        root='./data', train=False, download=True,
        transform=transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.4914, 0.4822, 0.4465), (0.247, 0.243, 0.261))
        ]))

    train_loader = DataLoader(train_ds, batch_size=128, shuffle=True, num_workers=4)
    test_loader = DataLoader(test_ds, batch_size=256, shuffle=False, num_workers=2)
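The `Normalize` transform above is given per-channel means and standard deviations. Assuming it follows the usual convention of computing `(x - mean) / std` per channel on values already scaled to [0, 1], its effect on a single pixel can be sketched in plain Python. The helper `normalize_pixel` is hypothetical and not part of openDLX.

```python
CIFAR10_MEAN = (0.4914, 0.4822, 0.4465)
CIFAR10_STD = (0.247, 0.243, 0.261)


def normalize_pixel(rgb, mean=CIFAR10_MEAN, std=CIFAR10_STD):
    """Standardize one RGB pixel whose channels are already scaled to [0, 1]."""
    return tuple((value - m) / s for value, m, s in zip(rgb, mean, std))


# A pixel equal to the dataset mean maps to zero in every channel.
print(normalize_pixel(CIFAR10_MEAN))
# A pure white pixel lands a little over two standard deviations above zero.
print(normalize_pixel((1.0, 1.0, 1.0)))
```

Using the same statistics for the training and test sets, as the snippet above does, keeps both splits on the same scale.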
2) Define the model
    import opendlx.nn as nn

    class ConvNet(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 64, kernel_size=3, padding=1),
                nn.BatchNorm2d(64),
                nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(64, 128, kernel_size=3, padding=1),
                nn.BatchNorm2d(128),
                nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(128, 256, kernel_size=3, padding=1),
                nn.BatchNorm2d(256),
                nn.ReLU(),
                nn.AdaptiveAvgPool2d((1, 1))
            )
            self.classifier = nn.Linear(256, num_classes)

        def forward(self, x):
            x = self.features(x)
            x = x.view(x.shape[0], -1)
            return self.classifier(x)
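It can help to trace the spatial size of the feature maps through this network by hand. The sketch below assumes that `Conv2d` and `MaxPool2d` follow the standard output-size formula floor((n + 2p - k) / s) + 1, with convolution stride 1 and `MaxPool2d(2)` using stride 2. The helper `conv_out` is illustrative and not an openDLX function.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Standard output-size formula for a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32                                    # CIFAR-10 images are 32x32
size = conv_out(size, kernel=3, padding=1)   # conv 3 -> 64 channels, stays 32
size = conv_out(size, kernel=2, stride=2)    # max pool, halves to 16
size = conv_out(size, kernel=3, padding=1)   # conv 64 -> 128 channels, stays 16
size = conv_out(size, kernel=2, stride=2)    # max pool, halves to 8
size = conv_out(size, kernel=3, padding=1)   # conv 128 -> 256 channels, stays 8
print(f"feature map before global pooling: 256 x {size} x {size}")
# Adaptive average pooling to (1, 1) leaves 256 values per image,
# which is why the classifier is Linear(256, num_classes).
```

Because the final pooling is adaptive, the classifier input size depends only on the channel count, not on the input resolution.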
3) Training setup
    from opendlx import losses, optim

    model = ConvNet().to('cuda')  # or 'cpu'
    criterion = losses.CrossEntropy()
    optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
    # Multiply the learning rate by 0.1 every 30 epochs
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
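To see what this schedule does over a 100-epoch run, here is a small stand-alone sketch. It assumes `StepLR` follows the common step-decay semantics (multiply the learning rate by `gamma` once every `step_size` epochs); the function `step_lr` is illustrative only.

```python
def step_lr(base_lr, epoch, step_size=30, gamma=0.1):
    """Learning rate in effect during `epoch` (0-based) under step decay."""
    return base_lr * gamma ** (epoch // step_size)


for epoch in (0, 29, 30, 60, 90):
    print(f"epoch {epoch:3d}: lr = {step_lr(0.1, epoch):.6f}")
```

With a base rate of 0.1, the rate drops by a factor of ten at epochs 30, 60, and 90, so the last ten epochs run at one thousandth of the starting value.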
4) Train & evaluate
    import opendlx

    for epoch in range(100):
        model.train()
        running_loss = 0.0
        for xb, yb in train_loader:
            xb, yb = xb.to('cuda'), yb.to('cuda')
            preds = model(xb)
            loss = criterion(preds, yb)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            # Weight the batch-mean loss by the batch size
            running_loss += loss.item() * xb.shape[0]
        scheduler.step()
        train_loss = running_loss / len(train_loader.dataset)

        # validation
        model.eval()
        correct = 0
        total = 0
        with opendlx.no_grad():
            for xb, yb in test_loader:
                xb, yb = xb.to('cuda'), yb.to('cuda')
                preds = model(xb)
                _, predicted = preds.max(1)  # index of the highest logit per sample
                correct += (predicted == yb).sum().item()
                total += yb.shape[0]
        acc = correct / total
        print(f"Epoch {epoch+1}: train_loss={train_loss:.4f}, test_acc={acc:.4f}")
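The training loop above multiplies each batch loss by the batch size before accumulating, then divides by the dataset size. This matters because the 50,000 CIFAR-10 training images do not divide evenly into batches of 128, so the last batch holds only 80 samples. The sketch below, with made-up loss values and a hypothetical helper `sample_weighted_mean`, shows how a plain average of batch losses would differ, assuming the criterion returns the mean loss over each batch.

```python
def sample_weighted_mean(batch_losses, batch_sizes):
    """Combine per-batch mean losses into a per-sample mean."""
    total = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)


# Two full batches of 128 and a final partial batch of 80 (made-up loss values).
example_losses = [2.0, 1.0, 4.0]
example_sizes = [128, 128, 80]

naive = sum(example_losses) / len(example_losses)
weighted = sample_weighted_mean(example_losses, example_sizes)
print(f"naive mean of batch losses: {naive:.4f}")
print(f"per-sample mean:            {weighted:.4f}")
```

The naive average gives the small final batch the same weight as a full one, which overstates its influence on the reported loss.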
Tips, debugging, and performance tuning
- Use smaller batch sizes when GPU memory is limited.
- Profile data loading; if training stalls while waiting for batches, set num_workers > 0 in the DataLoader.
- Start with a relatively high learning rate and decay it during training with a scheduler, such as step decay or cosine annealing.
- Use mixed precision (AMP) if supported to speed up training and reduce memory.
- Save checkpoints frequently and include optimizer state to resume training.
- If gradients vanish/explode, check weight initialization and activation functions.
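The learning-rate tip above mentions cosine annealing. As a minimal sketch of the usual formula, lr = min_lr + 0.5 * (base_lr - min_lr) * (1 + cos(pi * epoch / total_epochs)), the function below computes the rate for a given epoch. The name `cosine_lr` and its parameters are illustrative; whether openDLX ships a scheduler with these semantics is not stated in this guide.

```python
import math


def cosine_lr(base_lr, epoch, total_epochs, min_lr=0.0):
    """Cosine-annealed learning rate for a 0-based epoch index."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for epoch in (0, 25, 50, 75, 100):
    print(f"epoch {epoch:3d}: lr = {cosine_lr(0.1, epoch, total_epochs=100):.5f}")
```

Compared with step decay, the rate falls smoothly: it starts at the base value, reaches half of it at the midpoint, and approaches `min_lr` at the end of training.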
Extending openDLX
- Custom layer: subclass nn.Module, register the layer's parameters, and implement forward.
- Custom optimizer: create a class inheriting from optim.Optimizer and implement step().
- Converters: import weights from other frameworks by matching parameter names and shapes.
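As an illustration of the custom-optimizer pattern, the sketch below implements SGD with momentum in plain Python. Since the exact interface of `optim.Optimizer` is not shown in this guide, `Param` and `Optimizer` here are stand-in classes written for this example. Only the overall shape (subclass, keep per-parameter state, implement `step()`) is meant to carry over.

```python
class Param:
    """Stand-in for a framework parameter: a value plus its gradient."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0


class Optimizer:
    """Stand-in base class; the real optim.Optimizer may differ."""

    def __init__(self, params):
        self.params = list(params)

    def zero_grad(self):
        for p in self.params:
            p.grad = 0.0


class MomentumSGD(Optimizer):
    def __init__(self, params, lr=0.1, momentum=0.9):
        super().__init__(params)
        self.lr = lr
        self.momentum = momentum
        self.velocity = [0.0 for _ in self.params]   # one state slot per parameter

    def step(self):
        for i, p in enumerate(self.params):
            # v <- momentum * v + grad;  value <- value - lr * v
            self.velocity[i] = self.momentum * self.velocity[i] + p.grad
            p.value -= self.lr * self.velocity[i]


# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = Param(0.0)
opt = MomentumSGD([w], lr=0.1, momentum=0.9)
for _ in range(200):
    opt.zero_grad()
    w.grad = 2.0 * (w.value - 3.0)   # in a real framework, backward() fills this in
    opt.step()
print(f"w after 200 steps: {w.value:.4f}")
```

Keeping the velocity inside the optimizer rather than on the parameter is one design choice among several; it is also the state a checkpoint needs to include for training to resume cleanly.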
Common pitfalls
- Mismatched tensor devices (CPU vs GPU) — move both model and data to the same device.
- Incorrect loss inputs (e.g., passing softmax probabilities instead of raw logits to CrossEntropy, or targets whose shape does not match the predictions).
- Forgetting model.train() / model.eval() mode for layers like BatchNorm and Dropout.
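The logits-versus-probabilities pitfall is easy to demonstrate without any framework. The sketch below assumes a cross-entropy loss that applies softmax internally, as the models in this guide suggest by feeding raw `Linear` outputs to the criterion. The functions `softmax` and `cross_entropy_from_logits` are written here for illustration.

```python
import math


def softmax(logits):
    peak = max(logits)                            # subtract max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_logits(logits, target):
    """Negative log-probability of the target class; softmax is applied internally."""
    return -math.log(softmax(logits)[target])


logits = [4.0, 0.0, 0.0]          # a confident, correct prediction for class 0
correct = cross_entropy_from_logits(logits, target=0)
# Pitfall: passing probabilities means softmax is effectively applied twice.
double = cross_entropy_from_logits(softmax(logits), target=0)
print(f"loss from logits:        {correct:.4f}")
print(f"loss from probabilities: {double:.4f}")
```

The second loss is much larger even though the prediction is the same, because the second softmax squashes the already-normalized values toward a uniform distribution. Training still runs in that case, which is why the mistake often goes unnoticed.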
Resources and next steps
- Explore the opendlx docs for detailed API references and advanced examples.
- Try building larger architectures (ResNet, Transformer) using openDLX primitives.
- Contribute to the project: bug reports, feature requests, or pull requests to extend functionality.