POST /v1/chat/completions
curl -X POST https://api-llm.sunra.ai/v1/chat/completions \
  -H "Authorization: Bearer <SUNRA_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "openai/gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris.",
        "refusal": null
      },
      "finish_reason": "stop",
      "logprobs": null
    }
  ],
  "system_fingerprint": "fp_44709d6fcb",
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33,
    "prompt_tokens_details": null,
    "completion_tokens_details": null
  }
}
Sends a request for a model response for the given chat conversation. Supports both streaming and non-streaming modes, text, images, audio, video, files, function calling, reasoning, and structured outputs. Compatible with the OpenAI Chat Completions API format.

Authentication

Authorization
string
required
Bearer token. Use your API key as the bearer token in the Authorization header. Format: Bearer <SUNRA_KEY>

Request

This endpoint expects an object.
messages
object[]
required
List of messages for the conversation. Each message has a role and content.
model
string
required
The model to use for the completion. Browse available models at sunra.ai/models.
stream
boolean
default:false
If set to true, partial message deltas will be sent as server-sent events (SSE).
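When stream is true, the response arrives as server-sent events, each a `data: {...}` line carrying a delta, terminated by `data: [DONE]`. The sketch below shows how a client might accumulate those deltas into the full reply. The sample lines stand in for a live stream; the exact chunk field names are an assumption based on the OpenAI-compatible format this endpoint follows.

```python
import json

# Sample SSE lines as they would arrive over the wire (illustrative;
# field names assume OpenAI-compatible streaming chunks).
sample_sse_lines = [
    'data: {"choices": [{"delta": {"role": "assistant"}, "index": 0}]}',
    'data: {"choices": [{"delta": {"content": "Paris"}, "index": 0}]}',
    'data: {"choices": [{"delta": {"content": "."}, "index": 0}]}',
    "data: [DONE]",
]

def collect_stream(lines):
    """Accumulate content deltas from SSE data lines into the full reply."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # ignore comments / keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # stream finished
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

print(collect_stream(sample_sse_lines))  # -> Paris.
```

In a real client the same loop would run over the lines of the HTTP response body rather than a fixed list.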
max_completion_tokens
number | null
Maximum tokens in completion. Replaces max_tokens as the preferred parameter.
max_tokens
number | null
Maximum tokens in completion. Deprecated — use max_completion_tokens instead. Note: some providers enforce a minimum of 16.
temperature
number | null
default:1
Sampling temperature between 0 and 2. Higher values like 0.8 make output more random, lower values like 0.2 make it more focused and deterministic.
top_p
number | null
default:1
Nucleus sampling parameter (0-1). An alternative to temperature sampling where the model considers the tokens with top_p probability mass.
frequency_penalty
number | null
default:0
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.
presence_penalty
number | null
default:0
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.
stop
string | string[]
Up to 4 sequences where the API will stop generating further tokens.
n
integer
default:1
How many chat completion choices to generate for each input message.
logprobs
boolean | null
default:false
Whether to return log probabilities of the output tokens.
top_logprobs
number | null
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position. logprobs must be set to true if this parameter is used.
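A minimal sketch of a request body that asks for token log probabilities. Note the coupling: top_logprobs is only valid when logprobs is true, and must be between 0 and 20.

```python
# Illustrative request body; field names follow the schema described above.
payload = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "logprobs": True,   # must be true whenever top_logprobs is set
    "top_logprobs": 5,  # return the 5 most likely tokens per position (0-20)
}

# Client-side sanity check mirroring the constraint above.
assert payload["logprobs"] is True
assert 0 <= payload["top_logprobs"] <= 20
```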
logit_bias
object | null
Token logit bias adjustments. Modify the likelihood of specified tokens appearing in the completion. Maps token IDs to bias values from -100 to 100.
reasoning
object
Configuration options for reasoning models.
response_format
object
An object specifying the format that the model must output.
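A hedged example of requesting structured output. The `{"type": "json_schema", ...}` shape follows the OpenAI Chat Completions format this endpoint is compatible with; the CapitalAnswer schema itself is illustrative.

```python
# Illustrative request body asking the model to emit JSON matching a schema.
payload = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "CapitalAnswer",  # hypothetical schema name
            "schema": {
                "type": "object",
                "properties": {
                    "country": {"type": "string"},
                    "capital": {"type": "string"},
                },
                "required": ["country", "capital"],
            },
        },
    },
}
```

With this set, the assistant message content should parse as JSON conforming to the schema.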
seed
integer | null
If specified, the system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
tools
object[]
A list of tools the model may call.
tool_choice
string | object
Controls which (if any) tool is called by the model. none means no tool calls. auto means the model decides. required means the model must call a tool. Can also specify a particular function.
parallel_tool_calls
boolean | null
default:true
Whether to enable parallel function calling during tool use.
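The tool parameters above fit together as sketched below: a request declares the tools, tool_choice controls whether the model may use them, and a tool call comes back with JSON-encoded arguments. The get_weather tool and the sample tool call are illustrative; the shapes follow the OpenAI-compatible format.

```python
import json

# Illustrative function-calling request body.
payload = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Get current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",        # let the model decide whether to call it
    "parallel_tool_calls": True,  # the default; allow multiple calls at once
}

# When the model decides to call the tool, the choice carries tool_calls
# instead of content; arguments arrive as a JSON string.
sample_tool_call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
}
args = json.loads(sample_tool_call["function"]["arguments"])
print(args["city"])  # -> Paris
```

Your code would then run the named function with these arguments and send the result back as a `tool` role message.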
modalities
string[]
Output modalities for the response. Supported values: text, image, audio.
metadata
object
Key-value pairs for additional object information. Maximum 16 pairs, 64 character keys, 512 character values.
cache_control
object
Enable automatic prompt caching. When set, the system automatically applies cache breakpoints to the last cacheable block in the request. Currently supported for Anthropic Claude models.
user
string
A unique identifier representing your end-user, which can help monitor and detect abuse.

Response

Successful chat completion response.
id
string
A unique identifier for the chat completion.
object
string
The object type. Always chat.completion.
created
number
The Unix timestamp (in seconds) of when the chat completion was created.
model
string
The model used for the chat completion.
choices
object[]
A list of chat completion choices. Can be more than one if n is greater than 1.
usage
object
Usage statistics for the completion request.
system_fingerprint
string | null
This fingerprint represents the backend configuration that the model runs with. Can be used with the seed parameter to understand when backend changes have been made.
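Putting the response fields together, the snippet below parses the example response shown on this page and pulls out the reply text and token counts, showing where each field lives.

```python
import json

# The example response from this page, as a JSON string.
response_json = '''{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "openai/gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris.",
        "refusal": null
      },
      "finish_reason": "stop",
      "logprobs": null
    }
  ],
  "system_fingerprint": "fp_44709d6fcb",
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}'''

resp = json.loads(response_json)
answer = resp["choices"][0]["message"]["content"]  # the assistant's reply
usage = resp["usage"]

# total_tokens is the sum of prompt and completion tokens.
assert usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]
print(answer)  # -> The capital of France is Paris.
```

With `n` greater than 1, iterate over `resp["choices"]` instead of indexing the first element.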