# Engine Configuration

This covers the available configurations for DJL and its engines.

## DJL settings

DJLServing is built on top of the Deep Java Library (DJL). Here is a list of settings for DJL:
Key | Type | Description |
---|---|---|
DJL_DEFAULT_ENGINE | env var/system prop | The preferred engine for DJL if there are multiple engines, default: MXNet |
ai.djl.default_engine | system prop | The preferred engine for DJL if there are multiple engines, default: MXNet |
DJL_CACHE_DIR | env var/system prop | The cache directory for DJL, default: $HOME/.djl.ai/ |
ENGINE_CACHE_DIR | env var/system prop | The cache directory for engine native libraries, default: $DJL_CACHE_DIR |
ai.djl.dataiterator.autoclose | system prop | Automatically close data set iterator, default: true |
ai.djl.repository.zoo.location | system prop | Global model zoo search locations, not recommended |
DJL_OFFLINE | env var | Don't access network for downloading engine's native library and model zoo metadata |
ai.djl.offline | system prop | Don't access network for downloading engine's native library and model zoo metadata |
collect-memory | system prop | Enable memory metric collection, default: false |
disableProgressBar | system prop | Disable progress bar, default: false |
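
Keys listed as system properties are ordinary Java system properties, so they can be passed as `-D` JVM flags or set programmatically before DJL initializes an engine. A minimal sketch (the chosen engine and paths are illustrative, not defaults):

```java
import ai.djl.engine.Engine;

public class DjlConfigExample {

    public static void main(String[] args) {
        // These must be set before the first Engine call; equivalently,
        // pass them on the command line, e.g. -Dai.djl.default_engine=PyTorch
        System.setProperty("ai.djl.default_engine", "PyTorch");
        System.setProperty("ai.djl.offline", "true");
        System.setProperty("DJL_CACHE_DIR", "/opt/djl/cache"); // illustrative path

        Engine engine = Engine.getInstance();
        System.out.println("Default engine: " + engine.getEngineName());
    }
}
```
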

### PyTorch

Key | Type | Description |
---|---|---|
PYTORCH_LIBRARY_PATH | env var/system prop | User provided custom PyTorch native library |
PYTORCH_VERSION | env var/system prop | PyTorch version to load |
PYTORCH_EXTRA_LIBRARY_PATH | env var/system prop | Custom PyTorch extra libraries to load (e.g. torchneuron/torchvision/torchtext) |
PYTORCH_PRECXX11 | env var/system prop | Load precxx11 libtorch |
PYTORCH_FLAVOR | env var/system prop | To force override auto detection (e.g. cpu/cpu-precxx11/cu102/cu116-precxx11) |
PYTORCH_JIT_LOG_LEVEL | env var | Enable JIT logging |
ai.djl.pytorch.native_helper | system prop | A user provided custom loader class to help locate PyTorch native resources |
ai.djl.pytorch.num_threads | system prop | Override the OMP_NUM_THREADS environment variable |
ai.djl.pytorch.num_interop_threads | system prop | Set PyTorch interop threads |
ai.djl.pytorch.graph_optimizer | system prop | Enable/disable JIT execution optimization, default: true. See: https://github.com/deepjavalibrary/djl/blob/master/docs/development/inference_performance_optimization.md#graph-optimizer |
ai.djl.pytorch.cudnn_benchmark | system prop | To speed up ConvNN related model loading, default: false |
ai.djl.pytorch.use_mkldnn | system prop | Enable MKLDNN, default: false, not recommended, use at your own risk |
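
The thread-related properties above are a common tuning knob. A hedged sketch (the values are illustrative, and they must be set before the PyTorch native library loads):

```java
import ai.djl.engine.Engine;

public class PyTorchThreadTuning {

    public static void main(String[] args) {
        // One intra-op and one interop thread per worker keeps workers from
        // oversubscribing CPU cores when many run in the same JVM.
        System.setProperty("ai.djl.pytorch.num_threads", "1");
        System.setProperty("ai.djl.pytorch.num_interop_threads", "1");
        // Optionally turn the JIT graph optimizer off to avoid warm-up spikes.
        System.setProperty("ai.djl.pytorch.graph_optimizer", "false");

        Engine.getEngine("PyTorch"); // triggers native library loading
    }
}
```
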

### TensorFlow

Key | Type | Description |
---|---|---|
TENSORFLOW_LIBRARY_PATH | env var/system prop | User provided custom TensorFlow native library |
TENSORRT_EXTRA_LIBRARY_PATH | env var/system prop | Extra TensorFlow custom operators library to load |
TF_CPP_MIN_LOG_LEVEL | env var | TensorFlow log level |
ai.djl.tensorflow.debug | env var | Enable devicePlacement logging, default: false |

### MXNet

Key | Type | Description |
---|---|---|
MXNET_LIBRARY_PATH | env var/system prop | User provided custom MXNet native library |
MXNET_VERSION | env var/system prop | The version of custom MXNet build |
MXNET_EXTRA_LIBRARY_PATH | env var/system prop | Load extra MXNet custom libraries, e.g. Elastic Inference |
MXNET_EXTRA_LIBRARY_VERBOSE | env var/system prop | Set verbosity for MXNet custom library |
ai.djl.mxnet.static_alloc | system prop | CachedOp options, default: true |
ai.djl.mxnet.static_shape | system prop | CachedOp options, default: true |
ai.djl.use_local_parameter_server | system prop | Use the Java parameter server instead of the MXNet native implementation, default: false |

### Huggingface tokenizers

Key | Type | Description |
---|---|---|
TOKENIZERS_CACHE | env var | User provided custom Huggingface tokenizer native library |

### Python

Key | Type | Description |
---|---|---|
PYTHON_EXECUTABLE | env var | The location of the Python executable, default: python |
DJL_ENTRY_POINT | env var | The entrypoint python file or module, default: model.py |
MODEL_LOADING_TIMEOUT | env var | Python worker model loading timeout, default: 240 seconds |
PREDICT_TIMEOUT | env var | Python predict call timeout, default: 120 seconds |
MAX_NETTY_BUFFER_SIZE | env var/system prop | Max response size in bytes, default 20 * 1024 * 1024 (20M) |
DJL_VENV_DIR | env var/system prop | The venv directory, default: $DJL_CACHE_DIR/venv |
ai.djl.python.disable_alternative | system prop | Disable alternative engine |
TENSOR_PARALLEL_DEGREE | env var | Set the tensor parallel degree. In MPI mode, the default is the number of accelerators. Use "max" in non-MPI mode to use all GPUs for tensor parallelism. |
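
These keys are plain environment variables read by the serving process, so they are normally set in the shell, a Dockerfile, or whatever launches DJLServing. A hypothetical launcher sketch, assuming a `djl-serving` binary is on the PATH and with placeholder values:

```java
import java.io.IOException;
import java.util.Map;

public class PythonEngineLauncher {

    public static void main(String[] args) throws IOException {
        ProcessBuilder pb = new ProcessBuilder("djl-serving");
        Map<String, String> env = pb.environment();
        // Point the Python engine at a specific interpreter and give large
        // models more time to load.
        env.put("PYTHON_EXECUTABLE", "/usr/bin/python3");
        env.put("MODEL_LOADING_TIMEOUT", "600");
        env.put("TENSOR_PARALLEL_DEGREE", "max");
        pb.inheritIO().start();
    }
}
```
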

## Engine specific settings

DJL supports 12 deep learning frameworks, and each framework has its own settings. Please refer to each framework's documentation for details.

A common setting for most engines is OMP_NUM_THREADS. For the best throughput, DJLServing sets this to 1 by default (for some engines, e.g. MXNet, this value must be 1). Since this is a global environment variable, setting it impacts all other engines as well.

The following table shows some engine specific environment variables that are overridden by default by DJLServing:
Key | Engine | Description |
---|---|---|
TF_NUM_INTEROP_THREADS | TensorFlow | default 1, OMP_NUM_THREADS will override this value |
TF_NUM_INTRAOP_THREADS | TensorFlow | default 1 |
TF_CPP_MIN_LOG_LEVEL | TensorFlow | default 1 |
MXNET_ENGINE_TYPE | MXNet | this value must be NaiveEngine |
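
Since these are process-level environment variables, overriding the DJLServing defaults means exporting different values before the server starts. A small sketch in the same spirit as above (the thread counts are illustrative):

```java
import java.io.IOException;

public class ThreadOverrideLauncher {

    public static void main(String[] args) throws IOException {
        ProcessBuilder pb = new ProcessBuilder("djl-serving");
        // Allow TensorFlow more intra-op parallelism than the default of 1.
        pb.environment().put("TF_NUM_INTRAOP_THREADS", "4");
        // Keep OMP_NUM_THREADS at 1; it is global and affects every engine.
        pb.environment().put("OMP_NUM_THREADS", "1");
        pb.inheritIO().start();
    }
}
```
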