
Create your first deep learning neural network

Introduction

This is the first part of our beginner tutorial series that will take you through creating, training, and running inference on a neural network. In this part, you will learn how to use the built-in Block to create your first neural network - a Multilayer Perceptron.

Neural Network

A neural network acts as a black box function. Instead of coding this function yourself, you provide many sample input/output pairs and train the network to learn how best to approximate the observed behavior of the function given only those pairs. A better model trained on more data can approximate the function more accurately.

Multilayer Perceptron

A Multilayer Perceptron (MLP) is one of the simplest and oldest deep learning networks. The MLP has an input layer which contains your input data, an output layer which is produced by the network, and some number of intermediate hidden layers.

The example below contains an input of size 3, a single hidden layer of size 3, and an output of size 2. The number and sizes of the hidden layers are determined through experimentation, but more layers enable the network to represent more complicated functions. Between each pair of layers is a linear operation (sometimes called a FullyConnected operation because each number in the input is connected to each number in the output by a matrix multiplication). Not pictured here, there is also a non-linear activation function after each linear operation. For more information, see the Multilayer Perceptron chapter of the D2L DJL book.
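To make the linear operation concrete, here is a plain-Java sketch (no DJL, with arbitrary illustrative weights) of a forward pass through the pictured network: a 3-to-3 hidden layer followed by ReLU, then a 3-to-2 output layer. Each output element is a dot product of one weight-matrix row with the input.

```java
import java.util.Arrays;

public class MlpForward {
    // y = W x: each output element is the dot product of a weight row with the input
    static double[] linear(double[][] w, double[] x) {
        double[] y = new double[w.length];
        for (int i = 0; i < w.length; i++)
            for (int j = 0; j < x.length; j++)
                y[i] += w[i][j] * x[j];
        return y;
    }

    // ReLU activation: max(0, v), applied element-wise
    static double[] relu(double[] v) {
        return Arrays.stream(v).map(d -> Math.max(0, d)).toArray();
    }

    public static void main(String[] args) {
        double[] input = {1.0, -2.0, 0.5};  // input layer, size 3
        // illustrative weights; in a real network these are learned during training
        double[][] w1 = {{0.2, -0.1, 0.4}, {0.5, 0.3, -0.2}, {-0.3, 0.8, 0.1}};
        double[][] w2 = {{1.0, -0.5, 0.2}, {0.3, 0.7, -0.4}};

        double[] hidden = relu(linear(w1, input));  // hidden layer, size 3
        double[] output = linear(w2, hidden);       // output layer, size 2
        System.out.println(Arrays.toString(output));
    }
}
```

In DJL this whole computation is expressed declaratively with Linear blocks and activation functions, as shown later in this tutorial.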

MLP Image

Step 1: Setup development environment

Installation

This tutorial requires the installation of the Java Jupyter Kernel. To install the kernel, see the Jupyter README.

// Add the snapshot repository to get the DJL snapshot artifacts
// %mavenRepo snapshots https://oss.sonatype.org/content/repositories/snapshots/

// Add the maven dependencies
%maven ai.djl:api:0.8.0
%maven org.slf4j:slf4j-api:1.7.26
%maven org.slf4j:slf4j-simple:1.7.26

// See https://github.com/awslabs/djl/blob/master/mxnet/mxnet-engine/README.md
// for more MXNet library selection options
%maven ai.djl.mxnet:mxnet-native-auto:1.7.0-backport
import ai.djl.*;
import ai.djl.nn.*;
import ai.djl.nn.core.*;
import ai.djl.training.*;

Step 2: Determine your input and output size

The MLP model uses a one-dimensional vector as the input and the output. You should determine the appropriate size of this vector based on your input data and what you will use the output of the model for. In a later tutorial, we will use this model for MNIST image classification.

Our input vector will have size 28x28 because the input images have a height and width of 28 pixels and each grayscale pixel is represented by a single number. For a color image, you would further multiply this by 3 for the RGB channels.

Our output vector has size 10 because there are 10 possible classes for each image.

long inputSize = 28*28;
long outputSize = 10;

Step 3: Create a SequentialBlock

NDArray

The core data type used for working with deep learning is the NDArray. An NDArray represents a multidimensional, fixed-size homogeneous array. It behaves very similarly to the NumPy Python package, with the addition of efficient computing. We also have a helper class, the NDList, which is a list of NDArrays that can have different sizes and data types.

Block API

In DJL, Blocks serve a purpose similar to functions that convert an input NDList to an output NDList. They can represent single operations, parts of a neural network, and even the whole neural network. What makes blocks special is that they contain a number of parameters that are used in their function and are trained during deep learning. As these parameters are trained, the function represented by the blocks gets more and more accurate.

When building these block functions, the easiest way is to use composition. Similar to how functions are built by calling other functions, blocks can be built by combining other blocks. We refer to the containing block as the parent and the sub-blocks as the children.
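The parent/child idea can be sketched in plain Java (no DJL) with function composition. Here a hypothetical `chain` helper plays the role of the parent block, applying each child in sequence and feeding each child's output to the next, much like the SequentialBlock introduced below.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

public class BlockComposition {
    // A toy "parent block": applies each child in order,
    // feeding each child's output to the next child
    static UnaryOperator<double[]> chain(List<UnaryOperator<double[]>> children) {
        return x -> {
            double[] current = x;
            for (UnaryOperator<double[]> child : children) {
                current = child.apply(current);
            }
            return current;
        };
    }

    public static void main(String[] args) {
        // two toy child blocks: a scaling step and a ReLU step
        UnaryOperator<double[]> scale = v -> Arrays.stream(v).map(d -> d * 2).toArray();
        UnaryOperator<double[]> relu  = v -> Arrays.stream(v).map(d -> Math.max(0, d)).toArray();

        UnaryOperator<double[]> parent = chain(List.of(scale, relu));
        System.out.println(Arrays.toString(parent.apply(new double[]{1.5, -0.5})));  // → [3.0, 0.0]
    }
}
```

Real DJL blocks work on NDLists and additionally carry trainable parameters, but the composition pattern is the same.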

We provide several helpers to make it easy to build common block composition structures. For the MLP we will use the SequentialBlock, a container block whose children form a chain of blocks where each child block feeds its output to the next child block in a sequence.

SequentialBlock block = new SequentialBlock();

Step 4: Add blocks to SequentialBlock

An MLP is organized into several layers. Each layer is composed of a Linear Block and a non-linear activation function. If we just had two linear blocks in a row, it would be the same as a single combined linear block ($f(x) = W_2(W_1x) = (W_2W_1)x = W_{combined}x$). A non-linear activation function is interspersed between the linear blocks to allow them to represent non-linear functions. We will use the popular ReLU as our activation function.
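The collapse of two stacked linear layers into one can be checked numerically. This plain-Java sketch (no DJL, with illustrative weights) computes $W_2(W_1x)$ and $(W_2W_1)x$ and shows they are identical, which is why an activation must sit between the linear blocks.

```java
import java.util.Arrays;

public class LinearCollapse {
    // matrix-vector product: y = W x
    static double[] matVec(double[][] w, double[] x) {
        double[] y = new double[w.length];
        for (int i = 0; i < w.length; i++)
            for (int j = 0; j < x.length; j++)
                y[i] += w[i][j] * x[j];
        return y;
    }

    // matrix-matrix product: C = A B
    static double[][] matMat(double[][] a, double[][] b) {
        double[][] c = new double[a.length][b[0].length];
        for (int i = 0; i < a.length; i++)
            for (int k = 0; k < b.length; k++)
                for (int j = 0; j < b[0].length; j++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    public static void main(String[] args) {
        double[][] w1 = {{0.5, -1.0}, {2.0, 0.25}};
        double[][] w2 = {{1.0, 3.0}, {-0.5, 0.0}};
        double[] x = {4.0, 2.0};

        double[] stacked  = matVec(w2, matVec(w1, x));  // W2 (W1 x)
        double[] combined = matVec(matMat(w2, w1), x);  // (W2 W1) x
        System.out.println(Arrays.toString(stacked));   // → [25.5, 0.0]
        System.out.println(Arrays.toString(combined));  // → [25.5, 0.0]: same linear map
    }
}
```

With a ReLU between the two layers, the composition is no longer a single matrix multiplication, so the network can represent non-linear functions.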

The first and last layers have fixed sizes determined by your desired input and output size. However, you are free to choose the number and sizes of the middle layers in the network. We will create a smaller MLP with two middle layers that gradually decrease in size. Typically, you would experiment with different values to see what works best on your data set.

block.add(Blocks.batchFlattenBlock(inputSize));
block.add(Linear.builder().setUnits(128).build());
block.add(Activation::relu);
block.add(Linear.builder().setUnits(64).build());
block.add(Activation::relu);
block.add(Linear.builder().setUnits(outputSize).build());

block
Sequential(
    Lambda()
    Linear(Uninitialized)
    Lambda()
    Linear(Uninitialized)
    Lambda()
    Linear(Uninitialized)
)

Summary

Now that you've successfully created your first neural network, you can use this network to train your model.

Next chapter: Train your first model

You can find the complete source code for this tutorial in the model zoo.