POST /v1/messages
curl -X POST https://api-llm.sunra.ai/v1/messages \
  -H "Authorization: Bearer <SUNRA_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "messages": [
      {
        "role": "user",
        "content": "Hello, how are you?"
      }
    ]
  }'
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "container": null,
  "content": [
    {
      "type": "text",
      "text": "Hello! I'm doing well, thank you for asking. How can I help you today?",
      "citations": null
    }
  ],
  "model": "anthropic/claude-sonnet-4-20250514",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 19,
    "cache_creation_input_tokens": null,
    "cache_read_input_tokens": null,
    "cache_creation": null,
    "inference_geo": null,
    "server_tool_use": null,
    "service_tier": null
  }
}
Creates a message using the Anthropic Messages API format. Supports text, images, PDFs, tools, and extended thinking.

Authentication

Authorization
string
required
Bearer token. Use your API key as the bearer token in the Authorization header. Format: Bearer <SUNRA_KEY>

Request

This endpoint expects an object.
model
string
required
The model that will complete your prompt. Browse available models at sunra.ai/models.
messages
object[] | null
required
Input messages. Each input message must be an object with a role and content. You can specify a single user-role message, or include multiple user and assistant messages for multi-turn conversations.
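For a multi-turn conversation, alternate user and assistant messages in order. A minimal sketch of such a request body (the model id comes from the example above; the questions are illustrative only):

```python
import json

# Sketch of a multi-turn request body. Roles alternate between "user" and
# "assistant"; the final message is typically from the user.
payload = {
    "model": "anthropic/claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
        {"role": "user", "content": "Roughly how many people live there?"},
    ],
}
body = json.dumps(payload)
```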
max_tokens
number
The maximum number of tokens to generate before stopping. Note that the model may stop before reaching this maximum. Different models have different maximum values for this parameter.
system
string | object[]
System prompt. A system prompt is a way of providing context and instructions to the model. Can be a string or an array of TextBlockParam objects, each containing text, type ("text"), optional cache_control, and optional citations.
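A sketch of the block-array form of system, per the field description above; the cache_control marker on the block is illustrative:

```python
import json

# Sketch: system prompt as an array of TextBlockParam objects rather than
# a plain string. The cache_control value here is illustrative.
payload = {
    "model": "anthropic/claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "You are a concise assistant. Answer in one sentence.",
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "Explain HTTP caching."}],
}
body = json.dumps(payload)
```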
stream
boolean
default:false
Whether to incrementally stream the response using server-sent events (SSE).
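When stream is true, the response arrives as server-sent events. A minimal parser sketch for the SSE wire format (the sample event names match Anthropic's streaming events; a production client should use a real SSE library and handle multi-line data fields):

```python
def parse_sse_events(raw: str):
    """Split a raw SSE stream into (event, data) pairs.

    Minimal parser for illustration only: assumes one event: line and one
    data: line per event, separated by blank lines."""
    events = []
    for block in raw.strip().split("\n\n"):
        event, data = None, None
        for line in block.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data = line[len("data:"):].strip()
        if event is not None:
            events.append((event, data))
    return events

sample = (
    "event: message_start\n"
    'data: {"type": "message_start"}\n'
    "\n"
    "event: content_block_delta\n"
    'data: {"type": "content_block_delta", "delta": {"text": "Hello"}}\n'
)
events = parse_sse_events(sample)
print(events[0][0])  # → message_start
```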
temperature
number
default:1
Amount of randomness injected into the response. Ranges from 0.0 to 1.0. Use temperature closer to 0.0 for analytical/multiple choice tasks, and closer to 1.0 for creative and generative tasks. Note that even with temperature of 0.0, the results will not be fully deterministic.
top_p
number
Use nucleus sampling. Computes the cumulative distribution over all options for each subsequent token in decreasing probability order and cuts it off once it reaches the probability specified by top_p. Recommended for advanced use cases only. You usually only need to use temperature.
top_k
number
Only sample from the top K options for each subsequent token. Used to remove “long tail” low probability responses. Recommended for advanced use cases only.
stop_sequences
string[]
Custom text sequences that will cause the model to stop generating. If the model encounters one of the custom sequences, the response stop_reason value will be "stop_sequence" and the response stop_sequence value will contain the matched stop sequence.
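A sketch of a request using stop_sequences, plus a helper that checks the resulting response fields (the sequences and prompt are illustrative):

```python
import json

# Sketch: stop generation as soon as the model emits "###". On a match,
# the response's stop_reason is "stop_sequence" and stop_sequence holds
# the matched text.
payload = {
    "model": "anthropic/claude-sonnet-4-20250514",
    "max_tokens": 256,
    "stop_sequences": ["###", "END_OF_LIST"],
    "messages": [{"role": "user", "content": "List three colors, then ###"}],
}
body = json.dumps(payload)

def hit_stop_sequence(response: dict):
    """Return the matched stop sequence, or None if the model stopped
    for another reason."""
    if response.get("stop_reason") == "stop_sequence":
        return response.get("stop_sequence")
    return None

print(hit_stop_sequence({"stop_reason": "stop_sequence", "stop_sequence": "###"}))  # → ###
```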
tools
object[]
Definitions of tools that the model may use. Supports custom tools, Anthropic built-in tools, and server tools.
tool_choice
object
How the model should use the provided tools. The model can use a specific tool, any available tool, decide by itself, or not use tools at all.
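A sketch combining a custom tool definition with a tool_choice that forces that tool. The shape follows the Anthropic custom-tool format (name, description, input_schema as JSON Schema); the weather tool itself is hypothetical:

```python
import json

# Sketch: one hypothetical custom tool plus a forced tool_choice.
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]
payload = {
    "model": "anthropic/claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "tools": tools,
    # Other tool_choice options: {"type": "auto"} (model decides),
    # {"type": "any"} (must use some tool), {"type": "none"} (no tools).
    "tool_choice": {"type": "tool", "name": "get_weather"},
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
}
body = json.dumps(payload)
```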
thinking
object
Configuration for enabling Claude’s extended thinking. When enabled, responses include thinking content blocks showing Claude’s thinking process before the final answer. Requires a minimum budget of 1,024 tokens.
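A sketch enabling extended thinking with the minimum 1,024-token budget noted above; budget_tokens must be smaller than max_tokens:

```python
import json

# Sketch: enable extended thinking. budget_tokens is at least 1024 and
# must be less than max_tokens.
payload = {
    "model": "anthropic/claude-sonnet-4-20250514",
    "max_tokens": 4096,
    "thinking": {"type": "enabled", "budget_tokens": 1024},
    "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
}
body = json.dumps(payload)
```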
output_config
object
Configuration for controlling output behavior. Supports the effort parameter and structured output format.
cache_control
object
Top-level cache control. Automatically applies a cache_control marker to the last cacheable block in the request.
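A sketch of the top-level cache_control field, per the description above. "ephemeral" is the cache type used by Anthropic prompt caching; whether this endpoint accepts other types is not specified here:

```python
import json

# Sketch: a top-level cache_control marker, which the endpoint is
# described as applying to the last cacheable block automatically.
payload = {
    "model": "anthropic/claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "cache_control": {"type": "ephemeral"},
    "system": "A long, reusable system prompt that benefits from caching.",
    "messages": [{"role": "user", "content": "Summarize our policy."}],
}
body = json.dumps(payload)
```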
service_tier
string
Determines whether to use priority capacity or standard capacity for this request. Supported values: auto, standard_only.
metadata
object
An object describing metadata about the request.

Response

Successful message response.
id
string
Unique message identifier, e.g. msg_01XFDUDYJgAACzvnptvVoYEL.
type
string
Object type. Always message.
role
string
Conversational role of the generated message. Always assistant.
container
object | null
Information about the container used in the request (for the code execution tool).
content
object[]
Content generated by the model. This is an array of content blocks, each of which has a type that determines its shape.
model
string
The model that handled the request.
stop_reason
string | null
The reason that the model stopped generating. Possible values:
  • end_turn — the model reached a natural stopping point
  • max_tokens — exceeded max_tokens or the model’s maximum
  • stop_sequence — one of your custom stop sequences was generated
  • tool_use — the model invoked one or more tools
  • pause_turn — a long-running turn was paused
  • refusal — streaming classifiers intervened for potential policy violations
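The stop_reason values above can be handled with a simple dispatch; the suggested follow-up actions here are illustrative, not prescribed by the API:

```python
# Sketch: map each documented stop_reason to a follow-up action.
def describe_stop(response: dict) -> str:
    reason = response.get("stop_reason")
    if reason == "end_turn":
        return "complete"
    if reason == "max_tokens":
        return "truncated: raise max_tokens or continue the turn"
    if reason == "stop_sequence":
        return f"stopped at {response.get('stop_sequence')!r}"
    if reason == "tool_use":
        return "run the requested tools, then send tool results back"
    if reason == "pause_turn":
        return "resend the conversation to resume the paused turn"
    if reason == "refusal":
        return "refused: revise the request"
    return "unknown stop_reason"

print(describe_stop({"stop_reason": "end_turn"}))  # → complete
```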
stop_sequence
string | null
Which custom stop sequence was generated, if any.
usage
object
Billing and rate-limit usage.