BYOC instructions for using the LMI container on SageMaker

In this tutorial, you will bring your own container from Docker Hub to SageMaker and run inference with it. Please make sure the following permissions are granted before running the notebook:

  • ECR Push/Pull access
  • S3 bucket push access
  • SageMaker access

Step 1: Let's upgrade SageMaker and import the required packages

%pip install sagemaker boto3 awscli --upgrade --quiet
import boto3
import sagemaker
from sagemaker import Model, serializers, deserializers

role = sagemaker.get_execution_role()  # execution role for the endpoint
sess = sagemaker.session.Session()  # sagemaker session for interacting with different AWS APIs
region = sess.boto_region_name  # region name of the current SageMaker Studio environment
account_id = sess.account_id()  # account_id of the current SageMaker Studio environment

Step 2: Pull the container from Docker Hub and push it to an ECR repository

Note: Please make sure your AWS credentials have permission to push to the ECR repository.

This process may take a while, depending on the container size and your network bandwidth.

%%sh

# The name of our container
repo_name=djlserving-byoc
# Target container
target_container="deepjavalibrary/djl-serving:0.27.0-deepspeed"

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-west-2 if none defined)
region=$(aws configure get region)
region=${region:-us-west-2}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${repo_name}:latest"
echo "Creating ECR repository ${fullname}"

# If the repository doesn't exist in ECR, create it.

aws ecr describe-repositories --repository-names "${repo_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${repo_name}" > /dev/null
fi

# Get the login command from ECR and execute it directly
aws ecr get-login-password --region ${region} | docker login --username AWS --password-stdin "${account}.dkr.ecr.${region}.amazonaws.com"

# Build the docker image locally with the image name and then push it to ECR
# with the full name.
echo "Start pulling container: ${target_container}"

docker pull "${target_container}"
docker tag "${target_container}" "${fullname}"
docker push "${fullname}"
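
Optionally, you can sanity-check that the image made it to ECR. A minimal check, assuming the same repo_name as above:

%%sh
# List the image tags currently in the repository
aws ecr describe-images --repository-name djlserving-byoc \
    --query 'imageDetails[].imageTags' --output text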

Step 3: Start preparing model artifacts

In the LMI container, we expect a few artifacts to help set up the model:

  • serving.properties (required): Defines the model server settings
  • model.py (optional): A Python file that defines the core inference logic
  • requirements.txt (optional): Any additional pip packages to install
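
For reference, here is a minimal sketch of what serving.properties could contain for a Hugging Face model. The model_id below is only an illustrative placeholder, and the exact options depend on your model:

engine=Python
option.model_id=EleutherAI/gpt-neo-125m
option.tensor_parallel_degree=1
option.dtype=fp16

If you provide a custom model.py, the DJL Python engine looks for a handle(inputs) entry point. A minimal sketch, where the echo logic stands in for your real inference code:

from djl_python import Input, Output

def handle(inputs: Input) -> Output:
    # The server sends an empty request once at startup for model loading/warmup.
    if inputs.is_empty():
        return None
    data = inputs.get_as_json()
    # Placeholder: echo the prompt back; replace with real inference logic.
    return Output().add_as_json({"generated_text": data.get("inputs", "")})

Fill in the cells below with your own content: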

%%writefile serving.properties
# Start writing content here
%%writefile model.py
# Start writing content here (remove this cell if not used)
%%writefile requirements.txt
# Start writing content here (remove this file if not needed)
%%sh
mkdir mymodel
mv serving.properties mymodel/
# remove the following lines if not needed
mv model.py mymodel/
mv requirements.txt mymodel/
tar czvf mymodel.tar.gz mymodel/
rm -rf mymodel

Step 4: Start building the SageMaker endpoint

In this step, we will build a SageMaker endpoint from scratch.

4.1 Upload artifact on S3 and create SageMaker model

s3_code_prefix = "large-model-lmi/code"
bucket = sess.default_bucket()  # bucket to house artifacts
code_artifact = sess.upload_data("mymodel.tar.gz", bucket, s3_code_prefix)
print(f"S3 Code or Model tar ball uploaded to --- > {code_artifact}")

repo_name="djlserving-byoc"
image_uri = f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo_name}:latest"
env = {"HUGGINGFACE_HUB_CACHE": "/tmp", "TRANSFORMERS_CACHE": "/tmp"}

model = Model(image_uri=image_uri, model_data=code_artifact, env=env, role=role)

4.2 Create SageMaker endpoint

You need to specify the instance type to use and the endpoint name.

instance_type = "ml.g4dn.4xlarge"
endpoint_name = sagemaker.utils.name_from_base("lmi-model")

model.deploy(initial_instance_count=1,
             instance_type=instance_type,
             endpoint_name=endpoint_name,
             # Uncomment to allow more time for large models to load:
             # container_startup_health_check_timeout=3600,
            )

# Our requests and responses will be in JSON format, so we specify the serializer and deserializer
predictor = sagemaker.Predictor(
    endpoint_name=endpoint_name,
    sagemaker_session=sess,
    serializer=serializers.JSONSerializer(),
    deserializer=deserializers.JSONDeserializer(),
)

Step 5: Test and benchmark inference

%%timeit -n3 -r1
predictor.predict(
    {"inputs": "Large model inference is", "parameters": {}}
)
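
The parameters field is forwarded to the handler; with the built-in Hugging Face handlers it maps to generate() keyword arguments. A hedged example, since the supported names depend on your model and handler:

predictor.predict(
    {
        "inputs": "Large model inference is",
        # Common generation kwargs; adjust to what your handler supports.
        "parameters": {"max_new_tokens": 64, "do_sample": True, "temperature": 0.7},
    }
)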

Clean up the environment

sess.delete_endpoint(endpoint_name)
sess.delete_endpoint_config(endpoint_name)
model.delete_model()