Create your first deep learning neural network

Introduction

This is the first part of our beginner tutorial series that will take you through creating, training, and running inference on a neural network. In this part, you will learn how to use the built-in Block to create your first neural network - a Multilayer Perceptron.

Step 1: Setup development environment

Installation

This tutorial requires the installation of the Java Jupyter Kernel. To install the kernel, see the Jupyter README.

// Add the snapshot repository to get the DJL snapshot artifacts
// %mavenRepo snapshots https://oss.sonatype.org/content/repositories/snapshots/

// Add the maven dependencies
%maven ai.djl:api:0.27.0
%maven org.slf4j:slf4j-simple:1.7.36
import ai.djl.*;
import ai.djl.nn.*;
import ai.djl.nn.core.*;
import ai.djl.training.*;

Neural Network

A neural network is a black-box function. Instead of coding the function yourself, you provide many sample input/output pairs and train the network to approximate the behavior those pairs describe. A better model trained with more data can approximate the function more accurately.

Application

The first thing to figure out when building a neural network, as with most functions, is your function signature. What are your input types and output types? Because most models use relatively consistent signatures, we refer to them as Applications. Within the Application class, you can find a list of some of the more common model applications used in deep learning.

In this tutorial, we will focus on the image classification application. It is one of the most common first applications and has a significant history with deep learning. In image classification, the input is a single image, which is classified into one of a number of possible classes based on the main subject of the image. The classes depend on the specific data you are training with.

Application application = Application.CV.IMAGE_CLASSIFICATION;

Dataset

Once you have decided which application you want to work on, you next need to collect the data you will train with and form it into a dataset. Often, collecting and cleaning the data is the most troublesome task in the deep learning process.

Building a dataset can involve either collecting custom data from various sources or using one of the many datasets freely available online. Custom data may better suit your use case, but a free dataset is often faster and easier to use. You can read our dataset guide to learn more about datasets.

MNIST

The dataset we will be using is MNIST, a database of handwritten digits. Each 28x28 image contains a single black-and-white digit from 0 to 9. MNIST is commonly used when getting started with deep learning because it is small and fast to train on.

[Image: sample MNIST handwritten digits]

Once you understand your dataset, you should create an implementation of the Dataset class. In this case, DJL provides MNIST as a built-in dataset to make it easy to use.
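As a minimal sketch of what using it looks like (this part is not needed for the rest of this tutorial, and it assumes the ai.djl:basicdataset artifact is added alongside the api dependency above; the batch size here is an arbitrary choice):

// Assumed extra dependency for this sketch:
// %maven ai.djl:basicdataset:0.27.0
import ai.djl.basicdataset.cv.classification.Mnist;
import ai.djl.training.dataset.RandomAccessDataset;

int batchSize = 32; // arbitrary choice for this sketch
RandomAccessDataset mnist = Mnist.builder()
        .setSampling(batchSize, true) // batch size and shuffling
        .build();
mnist.prepare(); // downloads the data on first use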

Multilayer Perceptron

Now that we have our dataset, we can choose a model to train with it. For this tutorial, we will build one of the simplest and oldest deep learning networks: a Multilayer Perceptron (MLP).

The MLP is organized into layers. The first layer is the input layer which contains your input data and the last layer is the output layer which produces the final result of the network. Between them are layers referred to as hidden layers. Having more hidden layers and larger hidden layers allows the MLP to represent more complex functions.

The example below contains an input of size 3, a single hidden layer of size 3, and an output of size 2. The number and sizes of the hidden layers are usually determined through experimentation. Between each pair of layers is a linear operation (sometimes called a FullyConnected operation because each number in the input is connected to each number in the output by a matrix multiplication). Not pictured, there is also a non-linear activation function after each linear operation. For more information, see the Multilayer Perceptron chapter of the D2L DJL book.

[Image: an MLP with an input layer of size 3, one hidden layer of size 3, and an output layer of size 2]

Step 2: Determine your input and output size

The MLP model uses a one-dimensional vector as both the input and the output. You should determine the appropriate size of this vector based on your input data and what you will use the model's output for.

Our input vector will have size 28x28 = 784 because the MNIST input images have a height and width of 28 and a single number suffices to represent each pixel. For a color image, you would further multiply this by 3 for the RGB channels.

Our output vector has size 10 because there are 10 possible classes (0 to 9) for each image.

long inputSize = 28*28; // 784 values per flattened grayscale image
long outputSize = 10;   // one class per digit, 0 to 9

Step 3: Create a SequentialBlock

NDArray

The core data type used for working with deep learning is the NDArray. An NDArray represents a multidimensional, fixed-size homogeneous array. It behaves much like Python's NumPy package, with the addition of efficient native computation. We also have a helper class, the NDList, which is a list of NDArrays whose elements can have different sizes and data types.
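As a quick, minimal sketch (the variable names here are illustrative, not part of the tutorial), NDArrays are created through an NDManager and can be grouped into an NDList:

import ai.djl.ndarray.*;
import ai.djl.ndarray.types.*;

try (NDManager manager = NDManager.newBaseManager()) {
    NDArray ones = manager.ones(new Shape(2, 3)); // 2x3 array filled with 1s
    NDArray doubled = ones.add(ones);             // element-wise addition
    NDList list = new NDList(ones, doubled);      // members may differ in shape and type
    System.out.println(list);
}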

Block API

In DJL, Blocks serve a purpose similar to functions that convert an input NDList to an output NDList. They can represent single operations, parts of a neural network, or even the whole neural network. What makes blocks special is that they contain parameters that are used in their function and are updated during training. As these parameters are trained, the function represented by the block gets more and more accurate.

When building these block functions, the easiest way is to use composition. Similar to how functions are built by calling other functions, blocks can be built by combining other blocks. We refer to the containing block as the parent and the sub-blocks as the children.

We provide several helpers to make it easy to build common block composition structures. For the MLP we will use the SequentialBlock, a container block whose children form a chain of blocks where each child block feeds its output to the next child block in a sequence.

SequentialBlock block = new SequentialBlock();

Step 4: Add blocks to SequentialBlock

An MLP is organized into several layers. Each layer is composed of a Linear block and a non-linear activation function. If we just had two linear blocks in a row, it would be equivalent to a single combined linear block ($f(x) = W_2(W_1x) = (W_2W_1)x = W_{combined}x$). A non-linear activation function is interspersed between the linear blocks so that the network can represent non-linear functions. We will use the popular ReLU as our activation function.

The first and last layers have fixed sizes determined by your desired input and output size. However, you are free to choose the number and sizes of the middle layers in the network. We will create a smaller MLP with two middle layers that gradually decrease in size. Typically, you would experiment with different values to see what works best on your dataset.

block.add(Blocks.batchFlattenBlock(inputSize));           // flatten each image into a 784 vector
block.add(Linear.builder().setUnits(128).build());        // first hidden layer
block.add(Activation::relu);                               // non-linear activation
block.add(Linear.builder().setUnits(64).build());         // second hidden layer
block.add(Activation::relu);                               // non-linear activation
block.add(Linear.builder().setUnits(outputSize).build()); // output layer: 10 classes

block
SequentialBlock {
    batchFlatten
    Linear
    LambdaBlock
    Linear
    LambdaBlock
    Linear
}
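Although training comes in the next part, you can already sanity-check the block's wiring if you like. The following sketch (our own addition, not part of the original tutorial) initializes the parameters and pushes a random batch through the network using standard DJL calls; the variable names are illustrative:

import ai.djl.ndarray.*;
import ai.djl.ndarray.types.*;
import ai.djl.training.ParameterStore;

try (NDManager manager = NDManager.newBaseManager()) {
    // Allocate and initialize the parameters for a batch of one flattened image
    block.initialize(manager, DataType.FLOAT32, new Shape(1, inputSize));

    // Run an untrained forward pass on random input
    NDList input = new NDList(manager.randomUniform(0f, 1f, new Shape(1, inputSize)));
    NDList output = block.forward(new ParameterStore(manager, false), input, false);
    System.out.println(output.singletonOrThrow().getShape()); // expect (1, 10)
}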

Summary

Now that you've successfully created your first neural network, you can use this network to train your model.

Next chapter: Train your first model

You can find the complete source code for this tutorial in the model zoo.