LLM
Create a message
POST
Creates a message using the Anthropic Messages API format. Supports text, images, PDFs, tools, and extended thinking.
Authentication
Bearer token. Use your API key as the bearer token in the Authorization header.Format:
Bearer <SUNRA_KEY>Request
This endpoint expects an object.The model that will complete your prompt. Browse available models at sunra.ai/models.
Input messages. Each input message must be an object with a
role and content. You can specify a single user-role message, or include multiple user and assistant messages for multi-turn conversations.The maximum number of tokens to generate before stopping. Note that the model may stop before reaching this maximum. Different models have different maximum values for this parameter.
System prompt. A system prompt is a way of providing context and instructions to the model.Can be a string or an array of
TextBlockParam objects, each containing text, type ("text"), optional cache_control, and optional citations.Whether to incrementally stream the response using server-sent events (SSE).
Amount of randomness injected into the response. Ranges from 0.0 to 1.0. Use
temperature closer to 0.0 for analytical/multiple choice tasks, and closer to 1.0 for creative and generative tasks. Note that even with temperature of 0.0, the results will not be fully deterministic.Use nucleus sampling. Computes the cumulative distribution over all options for each subsequent token in decreasing probability order and cuts it off once it reaches the probability specified by
top_p. Recommended for advanced use cases only. You usually only need to use temperature.Only sample from the top K options for each subsequent token. Used to remove “long tail” low probability responses. Recommended for advanced use cases only.
Custom text sequences that will cause the model to stop generating. If the model encounters one of the custom sequences, the response
stop_reason value will be "stop_sequence" and the response stop_sequence value will contain the matched stop sequence.Definitions of tools that the model may use. Supports custom tools, Anthropic built-in tools, and server tools.
How the model should use the provided tools. The model can use a specific tool, any available tool, decide by itself, or not use tools at all.
Configuration for enabling Claude’s extended thinking. When enabled, responses include
thinking content blocks showing Claude’s thinking process before the final answer. Requires a minimum budget of 1,024 tokens.Configuration for controlling output behavior. Supports the effort parameter and structured output format.
Top-level cache control. Automatically applies a cache_control marker to the last cacheable block in the request.
Determines whether to use priority capacity or standard capacity for this request. Supported values:
auto, standard_only.An object describing metadata about the request.
Response
Successful message response.Unique message identifier, e.g.
msg_01XFDUDYJgAACzvnptvVoYEL.Object type. Always
message.Conversational role of the generated message. Always
assistant.Information about the container used in the request (for the code execution tool).
Content generated by the model. This is an array of content blocks, each of which has a
type that determines its shape.The model that handled the request.
The reason that the model stopped generating. Possible values:
end_turn— the model reached a natural stopping pointmax_tokens— exceededmax_tokensor the model’s maximumstop_sequence— one of your custom stop sequences was generatedtool_use— the model invoked one or more toolspause_turn— a long-running turn was pausedrefusal— streaming classifiers intervened for potential policy violations
Which custom stop sequence was generated, if any.
Billing and rate-limit usage.