# Chat Completions API Schema
This document describes the API schema for the chat completions endpoint (`v1/chat/completions`) when using the built-in inference handlers in LMI containers.
This schema applies to our latest release, v0.29.0, and is compatible with the OpenAI Chat Completions API.
Documentation for previous releases is available on our GitHub repository on the relevant version branch (e.g., 0.28.0-dlc).
On SageMaker, the Chat Completions API schema is supported through the `/invocations` endpoint without additional configuration.
If a request contains the `messages` field, LMI treats it as a chat-completions-style request and responds in the chat completions response style.
When using the Chat Completions schema, make sure the model you are serving has a chat template. The chat template ensures that the `messages` object is tokenized appropriately for your model. See the HuggingFace documentation on chat templates for more information.
This processing happens per request, meaning that you can serve both our standard schema and the chat completions schema from the same endpoint.
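Because the chat template is what makes the `messages` object usable by the model, it can help to verify that one exists before deploying. A minimal sketch using HuggingFace transformers; the model id here is only an illustration, substitute the model you are serving:

```python
from transformers import AutoTokenizer

# Illustrative model id; substitute the model you are serving.
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is deep learning?"},
]

# On recent transformers versions this raises an error if the tokenizer
# has no chat template configured; otherwise it returns the rendered prompt.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```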
Note: This is an experimental feature. The complete spec has not been implemented.
## Request Schema
Request Body Fields:
| Field Name | Field Type | Required | Possible Values |
|---|---|---|---|
| `messages` | array of Message | yes | See the Message documentation |
| `model` | string | no | example: "gpt2" |
| `frequency_penalty` | float | no | Float number between -2.0 and 2.0. |
| `logit_bias` | dict | no | example: {"2435": -100.0, "640": -100.0} |
| `logprobs` | boolean | no | `true`, `false` (default) |
| `top_logprobs` | int | no | Integer between 0 and 20. |
| `max_tokens` | int | no | Positive integer. |
| `n` | int | no | Positive integer. |
| `presence_penalty` | float | no | Float number between -2.0 and 2.0. |
| `seed` | int | no | Integer. |
| `stop` | string or array | no | example: ["test"] |
| `stream` | boolean | no | `true`, `false` (default) |
| `temperature` | float | no | Float number between 0.0 and 2.0. |
| `top_p` | float | no | Float number. |
| `user` | string | no | example: "test" |
| `ignore_eos` | boolean | no | `true`, `false` (default) |
Example request using curl:

```bash
curl -X POST https://my.sample.endpoint.com/v1/chat/completions \
    -H "Content-type: application/json" \
    -d '{
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "What is deep learning?"
            }
        ],
        "logprobs": true,
        "top_logprobs": 1,
        "max_tokens": 256
    }'
```
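On SageMaker, the same payload can be sent to the `/invocations` endpoint. A minimal sketch using boto3; the endpoint name is a placeholder:

```python
import json

import boto3

# Hypothetical endpoint name; substitute your own SageMaker endpoint.
ENDPOINT_NAME = "my-lmi-endpoint"

client = boto3.client("sagemaker-runtime")

payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is deep learning?"},
    ],
    "logprobs": True,
    "top_logprobs": 1,
    "max_tokens": 256,
}

# The presence of the "messages" field makes LMI treat this as a
# chat completions style request.
response = client.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType="application/json",
    Body=json.dumps(payload),
)
result = json.loads(response["Body"].read())
print(result["choices"][0]["message"]["content"])
```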
## Response Schema
By default, streaming is not enabled. The response is returned with the `application/json` content type:
| Field Name | Field Type | Always Returned | Possible Values |
|---|---|---|---|
| `id` | string | yes | Example: "chatcmpl-0" |
| `object` | string | yes | "chat.completion", "chat.completion.chunk" |
| `created` | int | yes | The timestamp of when the chat completion was created. |
| `choices` | array of Choice | yes | See the Choice documentation |
| `usage` | Usage | yes | See the Usage documentation |
Example response:

```json
{
  "id": "chatcmpl-0",
  "object": "chat.completion",
  "created": 1711997860,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm here to help you with any questions you may have. Deep learning is a subfield of machine learning that involves the use"
      },
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 33,
    "completion_tokens": 30,
    "total_tokens": 63
  }
}
```
The Chat Completions API supports streaming, and the response format for streaming differs from the response format for non-streaming.
To use streaming, set `"stream": true` in the request, or set `option.output_formatter=jsonlines` in the serving properties.
The response is then returned token by token with the `application/jsonlines` content type:
| Field Name | Field Type | Always Returned | Possible Values |
|---|---|---|---|
| `id` | string | yes | Example: "chatcmpl-0" |
| `object` | string | yes | "chat.completion", "chat.completion.chunk" |
| `created` | int | yes | The timestamp of when the chat completion was created. |
| `choices` | array of StreamChoice | yes | See the StreamChoice documentation |
Example response:

```
{"id": "chatcmpl-0", "object": "chat.completion.chunk", "created": 1712792433, "choices": [{"index": 0, "delta": {"content": " Oh", "role": "assistant"}, "logprobs": [{"content": [{"token": " Oh", "logprob": -4.499478340148926, "bytes": [32, 79, 104], "top_logprobs": [{"token": " Oh", "logprob": -4.499478340148926, "bytes": [32, 79, 104]}]}]}], "finish_reason": null}]}
...
{"id": "chatcmpl-0", "object": "chat.completion.chunk", "created": 1712792436, "choices": [{"index": 0, "delta": {"content": " assist"}, "logprobs": [{"content": [{"token": " assist", "logprob": -1.019672155380249, "bytes": [32, 97, 115, 115, 105, 115, 116], "top_logprobs": [{"token": " assist", "logprob": -1.019672155380249, "bytes": [32, 97, 115, 115, 105, 115, 116]}]}]}], "finish_reason": "length"}]}
```
## API Object Schemas
The following sections describe each of the request or response objects in more detail.
### Message
The message object represents a single message of the conversation. It contains the following fields:
| Field Name | Type | Description | Example |
|---|---|---|---|
| `role` | string enum | The role of the message | "system", "user", "assistant" |
| `content` | string or array of objects | The content of the message | "You are a helpful assistant." |
Example:

```json
{
  "role": "system",
  "content": "You are a helpful assistant."
}
```
### Vision/Image Support

Starting in v0.29.0, we have added experimental support for vision language models. When using a vision language model, you can specify an image as part of the message content. Image data can be specified either as a URL or as a base64 encoding of the image data.
Example:

```json
{
  "role": "user",
  "content": [
    {
      "type": "text",
      "text": "What is this an image of?"
    },
    {
      "type": "image_url",
      "image_url": {
        "url": "<base64 encoded image data | image url>"
      }
    }
  ]
}
```
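A minimal sketch of constructing such a message from a local image file in Python. The data-URL form shown is an assumption borrowed from the OpenAI convention; depending on the handler, the raw base64 string or a plain image URL may be expected instead:

```python
import base64
import json

# Hypothetical local file; substitute your own image.
with open("cat.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is this an image of?"},
        {
            "type": "image_url",
            # Assumption: OpenAI-style data URL. Some handlers may accept
            # the raw base64 string or an http(s) image URL instead.
            "image_url": {"url": f"data:image/jpeg;base64,{encoded}"},
        },
    ],
}
print(json.dumps(message)[:120])
```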
### Choice
The choice object represents a chat completion choice. It contains the following fields:
| Field Name | Type | Description | Example |
|---|---|---|---|
| `index` | int | The index of the choice | 0 |
| `message` | Message | A chat completion message generated by the model. | See the Message documentation |
| `logprobs` | Logprobs | Log probability information for the choice | See the Logprobs documentation |
| `finish_reason` | string enum | The reason the model stopped generating tokens | "length", "eos_token", "stop_sequence" |
Example:

```json
{
  "index": 0,
  "message": {
    "role": "assistant",
    "content": "Hello! I'm here to help you with any questions you may have. Deep learning is a subfield of machine learning that involves the use"
  },
  "logprobs": null,
  "finish_reason": "length"
}
```
### StreamChoice

The StreamChoice object represents a chat completion choice in a streamed response. It contains the following fields:
| Field Name | Type | Description | Example |
|---|---|---|---|
| `index` | int | The index of the choice | 0 |
| `delta` | Message | A chat completion delta generated by streamed model responses. | See the Message documentation |
| `logprobs` | Logprobs | Log probability information for the choice | See the Logprobs documentation |
| `finish_reason` | string enum | The reason the model stopped generating tokens | "length", "eos_token", "stop_sequence" |
Example:

```json
{"index": 0, "delta": {"content": " Oh", "role": "assistant"}, "logprobs": [{"content": [{"token": " Oh", "logprob": -4.499478340148926, "bytes": [32, 79, 104], "top_logprobs": [{"token": " Oh", "logprob": -4.499478340148926, "bytes": [32, 79, 104]}]}]}], "finish_reason": null}
```
### Logprobs

Log probability information for the choice. This is only returned when logprobs are enabled: you must specify `"logprobs": true` and set `top_logprobs` to a positive integer in the request parameters.
The object contains a single field, `content`. Each element of `content` contains the following fields:
| Field Name | Type | Description | Example |
|---|---|---|---|
| `token` | string | The token generated. | " Oh" |
| `logprob` | float | The log probability of the generated token. | -4.499478340148926 |
| `bytes` | array of int | The UTF-8 byte representation of the token. | [32, 79, 104] |
| `top_logprobs` | array of TopLogprob | Array of the most likely tokens and their log probabilities. | See the TopLogprob documentation |
Example:

```json
{
  "content": [
    {
      "token": " ",
      "logprob": -4.768370445162873e-07,
      "bytes": [32],
      "top_logprobs": [
        {
          "token": " ",
          "logprob": -4.768370445162873e-07,
          "bytes": [32]
        }
      ]
    },
    {
      "token": " Deep",
      "logprob": -1.1869784593582153,
      "bytes": [32, 68, 101, 101, 112],
      "top_logprobs": [
        {
          "token": " Deep",
          "logprob": -1.1869784593582153,
          "bytes": [32, 68, 101, 101, 112]
        }
      ]
    }
  ]
}
```
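The `logprob` values are natural logarithms of token probabilities, and `bytes` holds the UTF-8 encoding of the token text. A short sketch of recovering both:

```python
import math

# One entry from the "content" array above.
entry = {
    "token": " Deep",
    "logprob": -1.1869784593582153,
    "bytes": [32, 68, 101, 101, 112],
}

# exp(logprob) recovers the model's probability for this token.
probability = math.exp(entry["logprob"])

# The bytes field is the UTF-8 encoding of the token text.
decoded = bytes(entry["bytes"]).decode("utf-8")

print(f"token={decoded!r} probability={probability:.3f}")
# token=' Deep' probability=0.305
```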
### TopLogprob
Top log probability information for the choice. It contains the following fields:
| Field Name | Type | Description | Example |
|---|---|---|---|
| `token` | string | The token generated. | " Oh" |
| `logprob` | float | The log probability of the generated token. | -4.499478340148926 |
| `bytes` | array of int | The UTF-8 byte representation of the token. | [32, 79, 104] |
Example:

```json
{
  "token": " Deep",
  "logprob": -1.1869784593582153,
  "bytes": [32, 68, 101, 101, 112]
}
```
### Usage
Usage statistics for the completion request. It contains the following fields:
| Field Name | Type | Description | Example |
|---|---|---|---|
| `completion_tokens` | int | Number of tokens in the generated completion. | 100 |
| `prompt_tokens` | int | Number of tokens in the prompt. | 33 |
| `total_tokens` | int | Total number of tokens used in the request (prompt + completion). | 133 |
Example:

```json
"usage": {
  "prompt_tokens": 33,
  "completion_tokens": 100,
  "total_tokens": 133
}
```