DeepL: Efficient Deep Learning Framework

A high-performance deep learning framework specifically designed to solve complex graph optimization challenges in machine learning, enabling researchers and developers to build more efficient and precise neural network models.

Problem Solved

Scientific and industrial machine learning projects often struggle with computational inefficiency and limited flexibility when handling complex neural network architectures. This framework addresses these challenges by providing a unified solution that bridges performance and adaptability.

Key Features

Deep Learning Framework with Automatic Differentiation using Reverse AutoGrad
High-Performance Tensor Operations on both CPU and GPU
Dynamic Computational Graph
Python and C++ APIs
CUDA-accelerated computations
Built-in Components:
- Optimizers: SGD
- Layers: Sequential, Linear, ReLU
- Loss Functions: Cross Entropy Loss
- Automatic Differentiation Engine

Implementation Highlights

Designed and implemented a custom deep learning framework with reverse-mode auto-differentiation, supporting efficient computation graphs for backpropagation in neural networks.
Optimized tensor operations on CPU and GPU by implementing parallelized matrix operations, fused kernels, and CUDA-based execution, improving computational efficiency.
Developed a Python-C++ bridge using Pybind11, enabling seamless interaction between high-level Python API and low-level C++ core for efficient model execution.
Implemented dynamic memory management and optimized data structures for tensor storage and operations, reduc ing memory fragmentation and improving runtime performance.

Quick Start

# Clone the repository
git clone https://github.com/ashwin1596/deepL.git
cd deepL

# Setup environment
conda env create -f environment.yml
conda activate deepl

# Build the library
bash scripts/build.sh

# Set Python path
export PYTHONPATH="${PWD}/build:${PYTHONPATH}"

# Run example
python examples/python/mnist_classifier.py

Prerequisites

Python 3.8 or higher
CUDA Toolkit 11.0 or higher
CMake 3.15 or higher
C++17 compatible compiler
GPU with compute capability 6.0 or higher(for faster computation), if not CPU mode can be used
Required Python packages:
- NumPy
- PyBind11
- PyTorch (for examples)
- torchvision (for MNIST example)

Installation

Building from Source

Clone the Repository:

git clone https://github.com/ashwin1596/deepL.git
cd deepL

Create Conda Environment:

conda env create -f environment.yml
conda activate deepl

Build the Library:

bash scripts/build.sh

Set Python Path:

export PYTHONPATH="${PWD}/build:${PYTHONPATH}"

Common Installation Issues

CUDA Not Found: Verify CUDA installation and ensure CUDA_HOME is set correctly
Build Fails: Check compiler compatibility and CMake version
Import Errors: Verify PYTHONPATH is set correctly
Missing Dependencies: Ensure all required packages are installed via conda/pip

Usage Examples

Simple Neural Network

import deepl as dl
import numpy as np

# Create input and target data
input_data = np.random.randn(10, 2).astype(np.float32)
target_data = np.random.randn(10, 1).astype(np.float32)

# Create input and target nodes
input_node = dl.Tensor(input_data)
target_node = dl.Tensor(target_data)

# Create model
builder = dl.GraphBuilder()
model = dl.Sequential(builder)
model.add_layer(dl.Linear(2, 4, builder))
model.add_layer(dl.ReLU(builder))
model.add_layer(dl.Linear(4, 1, builder))

# Create loss function
loss_fn = dl.CrossEntropyLoss(builder)

# Create optimizer
parameters = model.parameters()
optimizer = dl.SGD(parameters, learning_rate=0.01)

# Training loop
for epoch in range(10):
    optimizer.zero_grad()
    
    # Forward pass
    outputs = model.forward(input_node)
    loss = loss_fn.forward(outputs, target_node)
    
    # Backward pass
    builder.backward(loss)
    optimizer.step()

Using C++ API

#include <deepl/core/tensor.cuh>
#include <deepl/layers/sequential.h>
#include <deepl/builder/graph_builder.h>

int main() {
    // Create model
    auto builder = GraphBuilder();
    auto model = Sequential(builder);
    model.add_layer(Linear(784, 128, builder));
    model.add_layer(ReLU(builder));
    model.add_layer(Linear(128, 10, builder));
    
    return 0;
}

Library Structure

The project is organized into several components:

bindings/python: Python bindings for easy integration with Python-based workflows.
docs: Documentation for the library and examples.
examples: Python and C++ based examples demonstrating usage of the library.
include/deepl: Header files defining core components, layers, loss functions, optimizers, and utilities.
src: Source files implementing the core functionality.
scripts: Scripts for building and maintaining the project.

Build Instructions

Navigate to scripts and run build.sh

bash build.sh

Python

For python set the PYTHONPATH to build directory

export PYTHONPATH=$(pwd)/build:$PYTHONPATH

To use the library you can import it as import deeplearning, see mnist_classifier.py.

C++(CUDA)

To use the library for C++ code, link the necessary libraries as below.

nvcc examples/cpp/mnist_classifier.cpp -Iinclude/deepl -Lbuild/ -ldeeplearning_cpp -lcudart -lcublas -o example/cpp

Implementation Details

Tensors

Tensors are used as storage objects.
Tensors can store data on both CPU and GPU devices, depending on the selected device.
Support for low-level operations:
- reshape: Change the shape of the tensor without altering its data.
- add: Perform element-wise addition of two tensors.
- elementwise_multiply: Perform element-wise multiplication of two tensors.
- matmul: Perform matrix multiplication between tensors.
- transpose: Transpose the dimensions of a tensor.
- divide: Perform element-wise division of two tensors.
- exp: Compute the exponential of each element in the tensor.
- binaralize: Convert tensor elements to binary values (e.g., thresholding).
- neg: Negate the elements of a tensor.
- log: Compute the natural logarithm of each element in the tensor.
Support for advanced operations:
- sumAlongAxis: Sum tensor elements along a specified axis.
- softmax: Apply the softmax function to normalize tensor values.
- batchMatmul: Perform batch matrix multiplication for tensors with batch dimensions.
All operations are supported on both GPU and CPU devices.
Operations such as matrix multiplication (matmul), addition, subtraction, and negation leverage CuBLAS for GPU-accelerated computation, enhancing the framework's speed and efficiency by utilizing GPU architecture effectively.

Computational Graph

Consists of graph nodes for storing data during the forward pass and adjoints during the backward pass.
Provides wrappers for all tensor operations.
Adjoint nodes store adjoint values and dependent nodes along with their partial derivatives.
During the backward pass, each adjoint node processes the gradient in topological order, propagating the gradients backward.

How It Works

DeepL employs a computation graph to represent neural network operations. The process includes:

Graph Construction: Nodes represent operations, and edges define dependencies.
Forward Pass: Compute outputs layer-by-layer.
Backward Pass: Compute gradients using reverse-mode autodifferentiation.

Example Neural Network

Here is a simple example of building and training a neural network using DeepL:

Python:

import deepl as dl
import numpy as np
from torchvision import datasets, transforms
from torch.utils.data import DataLoader


transform = transforms.Compose([
    transforms.ToTensor()
])  

config = dl.Config.get_instance()

config.set_device_type('GPU')
config.set_cuda_devices('0')
config.set_batch_size(32)
config.set_num_epochs(50)

batch_size = config.get_batch_size()
num_epochs = config.get_num_epochs()
num_classes = 10

# Download and transform the MNIST dataset
train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)

# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

builder = dl.GraphBuilder()
model = dl.Sequential(builder)
model.add_layer(dl.Linear(28 * 28, 256, builder, layer_num=1))  # Input: 784, Output: 128
model.add_layer(dl.ReLU(builder, layer_num=2))
model.add_layer(dl.Linear(256, 128, builder, layer_num=3))
model.add_layer(dl.ReLU(builder, layer_num=4))
model.add_layer(dl.Linear(128, 10, builder, layer_num=5))

loss_fn = dl.CrossEntropyLoss(builder)  # Loss function
parameters = model.parameters()  # Get model parameters
optimizer = dl.SGD(parameters, learning_rate=0.001)  # Optimizer

for epoch in range(num_epochs):
    total_loss = 0.0
    for batch_idx, (images, labels) in enumerate(train_loader):
        optimizer.zero_grad()

        # Flatten the images to match the input dimension (batch_size, 28*28)
        images = images.view(images.size(0), -1).numpy().astype(np.float32)  # Convert to numpy array
        labels = labels.numpy().astype(int)  # Convert labels to numpy array

        # One-hot encode the labels
        labels = np.eye(num_classes)[labels].astype(np.float32)

        # Convert to Tensor
        input_tensor = dl.Tensor(images, False)
        target_tensor = dl.Tensor(labels, False)

        input_node = builder.createVariable("input", input_tensor.transpose())
        target_node = builder.createVariable("target", target_tensor.transpose())

        # Forward pass
        outputs = model.forward(input_node)

        # Compute loss
        loss = loss_fn.forward(outputs, target_node)

        # Backward pass
        builder.backward(loss)

        optimizer.step()

        # Accumulate loss
        total_loss += loss.value().get_data()[0]

        print(f"Epoch: {epoch+1}/{num_epochs}, Batch: {batch_idx+1}, Loss: {loss.value().get_data()[0]}")
        	
    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {total_loss/len(train_loader):.4f}")

Project Structure

deepL\
|   .gitignore
|   CMakeLists.txt
|   dependencies.yml
|   readme.md
|
+---bindings
|   \---python
|           deeplearning.cpp
|
+---examples
|   \---python
|           gradmtest.py
|           mnist_classifier.py
|           trans_test.py
|   \---cpp
|           mnist_classifier.cpp_
|
+---include
|   \---deepl
|       +---core
|       +---layers
|       +---loss
|       +---optimisers
|       \---utils
+---scripts
|       build.sh
+---src
    +---core
    +---layers
    +---loss
    +---optimisers
    \---utils

Task List

Implement Tensor class and operations.
Add Python bindings.
Create example neural networks.
Expand documentation.
Optimize GPU implementations.
Add loss functions and optimizers.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

DeepL: Efficient Deep Learning Framework

Problem Solved

Key Features

Implementation Highlights

Quick Start

Prerequisites

Installation

Building from Source

Common Installation Issues

Usage Examples

Simple Neural Network

Using C++ API

Library Structure

Build Instructions

Python

C++(CUDA)

Implementation Details

Tensors

Computational Graph

How It Works

Example Neural Network

Python:

Project Structure

Task List

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
bindings/python		bindings/python
examples		examples
include/deepl		include/deepl
scripts		scripts
src		src
.gitignore		.gitignore
CMakeLists.txt		CMakeLists.txt
environment.yml		environment.yml
readme.md		readme.md

ashwin1596/deepL

Folders and files

Latest commit

History

Repository files navigation

DeepL: Efficient Deep Learning Framework

Problem Solved

Key Features

Implementation Highlights

Quick Start

Prerequisites

Installation

Building from Source

Common Installation Issues

Usage Examples

Simple Neural Network

Using C++ API

Library Structure

Build Instructions

Python

C++(CUDA)

Implementation Details

Tensors

Computational Graph

How It Works

Example Neural Network

Python:

Project Structure

Task List

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages