PaddleOCR DJL example

In this tutorial, we use pretrained PaddlePaddle models from PaddleOCR to perform optical character recognition (OCR) on a given image. Three models are involved:

  • Word detection model: detects the word blocks in the image
  • Word direction model: determines whether the text needs to be rotated
  • Word recognition model: recognizes the text in each word block
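
To make the division of labor concrete, here is a minimal sketch of how the three stages could be chained. It is plain Java with stand-in functions instead of real models; the class name OcrPipelineSketch, the run method, and its parameters are hypothetical and are not part of DJL or PaddleOCR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

/** Illustrative three-stage OCR pipeline; every name here is hypothetical. */
class OcrPipelineSketch {

    /** Runs detect -> (rotate if needed) -> recognize for each word block. */
    static <I, B> List<String> run(
            I image,
            Function<I, List<B>> detect,   // word detection: image -> word blocks
            Predicate<B> needsRotation,    // word direction: must this block be rotated?
            Function<B, B> rotate,         // rotation applied when the direction stage says so
            Function<B, String> recognize) { // word recognition: block -> text
        List<String> words = new ArrayList<>();
        for (B block : detect.apply(image)) {
            B upright = needsRotation.test(block) ? rotate.apply(block) : block;
            words.add(recognize.apply(upright));
        }
        return words;
    }

    public static void main(String[] args) {
        // Stand-in stages: the "image" is a string, blocks are space-separated tokens,
        // and a block counts as rotated when it starts with '~'.
        List<String> result = OcrPipelineSketch.<String, String>run(
                "GATE ~21B SEAT",
                img -> Arrays.asList(img.split(" ")),
                block -> block.startsWith("~"),
                block -> block.substring(1),
                block -> block.toLowerCase());
        System.out.println(result); // prints [gate, 21b, seat]
    }
}
```

In the rest of the tutorial, each stand-in function corresponds to a DJL Predictor built from one of the three models.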

Import dependencies and classes

PaddlePaddle is one of the deep learning engines that requires DJL hybrid mode to run inference. It does not itself provide NDArray operations and needs a supplemental DL framework for them, so we also import the PyTorch engine here to handle the preprocessing and postprocessing work.

// %mavenRepo snapshots

%maven ai.djl:api:0.11.0
%maven ai.djl.paddlepaddle:paddlepaddle-model-zoo:0.11.0
%maven ai.djl.paddlepaddle:paddlepaddle-native-auto:2.0.2
%maven org.slf4j:slf4j-api:1.7.26
%maven org.slf4j:slf4j-simple:1.7.26

// second engine to do preprocessing and postprocessing
%maven ai.djl.pytorch:pytorch-engine:0.11.0
%maven ai.djl.pytorch:pytorch-native-auto:1.8.1
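import ai.djl.*;
import ai.djl.inference.Predictor;
import ai.djl.modality.Classifications;
import ai.djl.modality.cv.Image;
import ai.djl.modality.cv.ImageFactory;
import ai.djl.modality.cv.output.*;
import ai.djl.ndarray.*;
import ai.djl.ndarray.types.DataType;
import ai.djl.ndarray.types.Shape;
import ai.djl.repository.zoo.*;
import ai.djl.paddlepaddle.zoo.cv.objectdetection.PpWordDetectionTranslator;
import ai.djl.translate.*;
import java.util.concurrent.ConcurrentHashMap;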

The image

First, let's take a look at our sample image, a flight ticket:

String url = "";
Image img = ImageFactory.getInstance().fromUrl(url);

Word detection model

For word detection, we load the model exported from PaddleOCR and then create a DJL Predictor from it, called detector.

var criteria1 = Criteria.builder()
                .optEngine("PaddlePaddle")
                .setTypes(Image.class, DetectedObjects.class)
                // the model location must also be supplied here, e.g. with optModelUrls
                .optTranslator(new PpWordDetectionTranslator(new ConcurrentHashMap<String, String>()))
                .build();
var detectionModel = ModelZoo.loadModel(criteria1);
var detector = detectionModel.newPredictor();
[IJava-executor-0] INFO ai.djl.pytorch.engine.PtEngine - Number of inter-op threads is 1
[IJava-executor-0] INFO ai.djl.pytorch.engine.PtEngine - Number of intra-op threads is 2

Next, we detect the word blocks in the image. The raw model output is a bitmap that marks all word regions. PpWordDetectionTranslator converts this bitmap into rectangular bounding boxes that we can use to crop the image.

var detectedObj = detector.predict(img);
Image newImage = img.duplicate(Image.Type.TYPE_INT_ARGB);
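
The bitmap-to-box conversion described above can be illustrated with a small stand-alone sketch. This is not the code inside PpWordDetectionTranslator, whose post-processing is not shown here. It is a simplified stand-in that assumes the bitmap has already been thresholded into a rectangular boolean mask, and it groups 4-connected pixels with a flood fill. The class name MaskToBoxesSketch and the {x, y, width, height} pixel layout are hypothetical choices for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Illustrative only: turns a binary word-region mask into bounding rectangles. */
class MaskToBoxesSketch {

    /** Returns one {x, y, width, height} box (in pixels) per 4-connected region of true cells. */
    static List<int[]> boundingBoxes(boolean[][] mask) {
        int rows = mask.length;
        int cols = rows == 0 ? 0 : mask[0].length;
        boolean[][] seen = new boolean[rows][cols];
        int[][] steps = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        List<int[]> boxes = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (!mask[r][c] || seen[r][c]) {
                    continue;
                }
                // Flood-fill the region that starts here and track its extent.
                int minR = r, maxR = r, minC = c, maxC = c;
                Deque<int[]> stack = new ArrayDeque<>();
                stack.push(new int[] {r, c});
                seen[r][c] = true;
                while (!stack.isEmpty()) {
                    int[] p = stack.pop();
                    minR = Math.min(minR, p[0]);
                    maxR = Math.max(maxR, p[0]);
                    minC = Math.min(minC, p[1]);
                    maxC = Math.max(maxC, p[1]);
                    for (int[] s : steps) {
                        int nr = p[0] + s[0];
                        int nc = p[1] + s[1];
                        if (nr >= 0 && nr < rows && nc >= 0 && nc < cols
                                && mask[nr][nc] && !seen[nr][nc]) {
                            seen[nr][nc] = true;
                            stack.push(new int[] {nr, nc});
                        }
                    }
                }
                boxes.add(new int[] {minC, minR, maxC - minC + 1, maxR - minR + 1});
            }
        }
        return boxes;
    }

    public static void main(String[] args) {
        // '#' marks pixels that the (hypothetical) detector flagged as text.
        String[] picture = {
            ".##.....",
            ".##..###",
            ".....###",
            "........"
        };
        boolean[][] mask = new boolean[picture.length][picture[0].length()];
        for (int r = 0; r < picture.length; r++) {
            for (int c = 0; c < picture[r].length(); c++) {
                mask[r][c] = picture[r].charAt(c) == '#';
            }
        }
        for (int[] box : boundingBoxes(mask)) {
            System.out.println(Arrays.toString(box));
        }
        // prints [1, 0, 2, 2] and then [5, 1, 3, 2]
    }
}
```

Each resulting box is the smallest axis-aligned rectangle that covers one word region, which is the kind of rectangle a later cropping step needs.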