# Engine Configuration
This covers the available configuration options for DJL and its engines.

## DJL settings
DJLServing is built on top of the Deep Java Library (DJL). Here is a list of settings for DJL:
| Key | Type | Description |
|---|---|---|
| DJL_DEFAULT_ENGINE | env var/system prop | The preferred engine for DJL if there are multiple engines, default: PyTorch |
| ai.djl.default_engine | system prop | The preferred engine for DJL if there are multiple engines, default: PyTorch |
| DJL_CACHE_DIR | env var/system prop | The cache directory for DJL, default: $HOME/.djl.ai/ |
| ENGINE_CACHE_DIR | env var/system prop | The cache directory for engine native libraries, default: $DJL_CACHE_DIR |
| ai.djl.dataiterator.autoclose | system prop | Automatically close data set iterator, default: true |
| ai.djl.repository.zoo.location | system prop | Global model zoo search locations, not recommended |
| DJL_OFFLINE | env var | Don't access the network to download the engine's native libraries and model zoo metadata |
| ai.djl.offline | system prop | Don't access the network to download the engine's native libraries and model zoo metadata |
| collect-memory | system prop | Enable memory metric collection, default: false |
| disableProgressBar | system prop | Disable progress bar, default: false |
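
As a minimal sketch, the system properties above can also be set programmatically, as long as this happens before the first `Engine` access (DJL reads most of them during static initialization). The property values and cache path here are illustrative:

```java
import ai.djl.engine.Engine;

public class DjlSettingsExample {

    public static void main(String[] args) {
        // Set system properties before any Engine access; DJL reads
        // most of them during static initialization.
        System.setProperty("ai.djl.default_engine", "PyTorch");
        System.setProperty("DJL_CACHE_DIR", "/tmp/djl_cache"); // illustrative cache location
        System.setProperty("ai.djl.offline", "true");          // skip network downloads

        // Verify which engine DJL picked as the default.
        Engine engine = Engine.getInstance();
        System.out.println("Default engine: " + engine.getEngineName());
    }
}
```

Equivalently, system properties can be passed as JVM flags, e.g. `-Dai.djl.default_engine=PyTorch`.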

## PyTorch
| Key | Type | Description |
|---|---|---|
| PYTORCH_LIBRARY_PATH | env var/system prop | User provided custom PyTorch native library |
| PYTORCH_VERSION | env var/system prop | PyTorch version to load |
| PYTORCH_EXTRA_LIBRARY_PATH | env var/system prop | Custom PyTorch extra libraries to load (e.g. torchneuron/torchvision/torchtext) |
| PYTORCH_PRECXX11 | env var/system prop | Load precxx11 libtorch |
| PYTORCH_FLAVOR | env var/system prop | Force override of flavor auto-detection (e.g. cpu/cpu-precxx11/cu102/cu116-precxx11) |
| PYTORCH_JIT_LOG_LEVEL | env var | Enable JIT logging |
| ai.djl.pytorch.native_helper | system prop | A user provided custom loader class to help locate PyTorch native resources |
| ai.djl.pytorch.num_threads | system prop | Override the OMP_NUM_THREADS environment variable |
| ai.djl.pytorch.num_interop_threads | system prop | Set PyTorch interop threads |
| ai.djl.pytorch.graph_optimizer | system prop | Enable/disable JIT graph optimization, default: true. See: https://github.com/deepjavalibrary/djl/blob/master/docs/development/inference_performance_optimization.md#graph-optimizer |
| ai.djl.pytorch.cudnn_benchmark | system prop | Speed up loading of ConvNN-related models, default: false |
| ai.djl.pytorch.use_mkldnn | system prop | Enable MKLDNN, default: false. Not recommended; use at your own risk |
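
For example, a sketch of applying the thread and optimizer settings; the values are illustrative, and they must be set before the PyTorch engine initializes:

```java
public class PyTorchSettingsExample {

    public static void main(String[] args) {
        // Apply before the PyTorch engine initializes; once the native
        // library is loaded, these properties are no longer consulted.
        System.setProperty("ai.djl.pytorch.num_threads", "4");         // intra-op threads (illustrative)
        System.setProperty("ai.djl.pytorch.num_interop_threads", "1"); // inter-op threads
        System.setProperty("ai.djl.pytorch.graph_optimizer", "false"); // turn off JIT graph optimization
        System.setProperty("ai.djl.pytorch.cudnn_benchmark", "true");  // tune cuDNN convolution algorithms
    }
}
```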

## TensorFlow
| Key | Type | Description |
|---|---|---|
| TENSORFLOW_LIBRARY_PATH | env var/system prop | User provided custom TensorFlow native library |
| TENSORRT_EXTRA_LIBRARY_PATH | env var/system prop | Extra TensorFlow custom operator libraries to load |
| TF_CPP_MIN_LOG_LEVEL | env var | TensorFlow log level |
| ai.djl.tensorflow.debug | env var | Enable devicePlacement logging, default: false |
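
A similar sketch for pointing DJL at a custom TensorFlow native build; the paths are illustrative:

```java
public class TensorFlowSettingsExample {

    public static void main(String[] args) {
        // Both keys also work as environment variables; set them before
        // the TensorFlow engine loads its native library.
        System.setProperty("TENSORFLOW_LIBRARY_PATH", "/opt/tensorflow/lib");     // illustrative path
        System.setProperty("TENSORRT_EXTRA_LIBRARY_PATH", "/opt/custom_ops/lib"); // illustrative path
    }
}
```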

## Huggingface tokenizers
| Key | Type | Description |
|---|---|---|
| TOKENIZERS_CACHE | env var | User provided custom Huggingface tokenizer native library |

## Python
| Key | Type | Description |
|---|---|---|
| PYTHON_EXECUTABLE | env var | The location of the Python executable, default: python |
| DJL_ENTRY_POINT | env var | The entry point Python file or module, default: model.py |
| MODEL_LOADING_TIMEOUT | env var | Python worker model loading timeout, default: 240 seconds |
| PREDICT_TIMEOUT | env var | Python predict call timeout, default: 120 seconds |
| MAX_NETTY_BUFFER_SIZE | env var/system prop | Max response size in bytes, default: 20 * 1024 * 1024 (20 MB) |
| DJL_VENV_DIR | env var/system prop | The venv directory, default: $DJL_CACHE_DIR/venv |
| ai.djl.python.disable_alternative | system prop | Disable alternative engine |
| TENSOR_PARALLEL_DEGREE | env var | Set the tensor parallel degree. In MPI mode, the default is the number of accelerators. Use "max" in non-MPI mode to use all GPUs for tensor parallelism. |
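
Several of these keys accept either an environment variable or a system property. A minimal sketch of such a lookup, with the environment variable taking precedence; the helper and the precedence order are illustrative, not DJLServing's actual implementation:

```java
public class ConfigLookupExample {

    /** Illustrative helper: environment variable first, then system property, then the default. */
    static String resolve(String key, String defaultValue) {
        String value = System.getenv(key);
        if (value == null) {
            value = System.getProperty(key, defaultValue);
        }
        return value;
    }

    public static void main(String[] args) {
        // Defaults below come from the tables above.
        int maxBufferSize = Integer.parseInt(resolve("MAX_NETTY_BUFFER_SIZE", String.valueOf(20 * 1024 * 1024)));
        String cacheDir = resolve("DJL_CACHE_DIR", System.getProperty("user.home") + "/.djl.ai");
        String venvDir = resolve("DJL_VENV_DIR", cacheDir + "/venv");
        System.out.println("max buffer: " + maxBufferSize + ", venv dir: " + venvDir);
    }
}
```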

## Engine specific settings
DJL supports 12 deep learning frameworks, and each framework has its own settings. Please refer to each framework's documentation for details.

A common setting for most engines is OMP_NUM_THREADS. For the best throughput, DJLServing sets it to 1 by default. Since this is a global environment variable, changing it affects all other engines as well.

The following table shows some engine-specific environment variables that DJLServing overrides by default:
| Key | Engine | Description |
|---|---|---|
| TF_NUM_INTEROP_THREADS | TensorFlow | default: 1; OMP_NUM_THREADS will override this value |
| TF_NUM_INTRAOP_THREADS | TensorFlow | default: 1 |
| TF_CPP_MIN_LOG_LEVEL | TensorFlow | default: 1 |
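
Because these are environment variables, they have to be set on the serving process itself, for example when starting it from a wrapper. A sketch, assuming the `djl-serving` launcher script is on the PATH:

```java
import java.io.IOException;
import java.util.Map;

public class LaunchWithThreadSettings {

    public static void main(String[] args) throws IOException {
        // Launch the serving process with explicit thread-related overrides.
        ProcessBuilder pb = new ProcessBuilder("djl-serving");
        Map<String, String> env = pb.environment();
        env.put("OMP_NUM_THREADS", "1");        // global: affects all engines
        env.put("TF_NUM_INTEROP_THREADS", "1");
        env.put("TF_NUM_INTRAOP_THREADS", "1");
        pb.inheritIO().start();
    }
}
```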