Creates a streaming or non-streaming response using the OpenAI Responses API format. Supports text, images, files, audio, video, function calling, web search, file search, code interpreter, reasoning, and more.
Authentication
Bearer token. Use your API key as the bearer token in the Authorization header. Format: Bearer <SUNRA_KEY>
Request
This endpoint expects an object.
Model ID used to generate the response. Browse available models at sunra.ai/models.
Input for the response request. Can be a string or an array of input items including messages, function calls, function call outputs, reasoning items, and output messages.
Inserts a system (or developer) message as the first item in the model’s context. When used with input, the instructions are inserted at the start of the input.
If set to true, the response will be streamed using server-sent events (SSE).
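When streaming is enabled, the response body arrives as server-sent events, each carrying a `data:` line of JSON. A minimal decoding sketch, under the assumption that delta events follow the `response.output_text.delta` shape and the stream ends with a `[DONE]` sentinel (neither is guaranteed by this section):

```python
import json

def parse_sse(lines):
    """Yield decoded JSON payloads from the 'data:' lines of an SSE stream."""
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload == "[DONE]":  # assumed end-of-stream sentinel
                break
            yield json.loads(payload)

# Illustrative stream fragments, not captured output from this endpoint.
sample = [
    'data: {"type": "response.output_text.delta", "delta": "Hel"}',
    'data: {"type": "response.output_text.delta", "delta": "lo"}',
    "data: [DONE]",
]
chunks = [e["delta"] for e in parse_sse(sample) if "delta" in e]
```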
An upper bound for the number of output tokens, including visible output tokens and reasoning tokens.
Sampling temperature between 0 and 2. Higher values increase randomness.
Nucleus sampling parameter. An alternative to sampling with temperature.
Sample only from the top K options for each subsequent token. Used to remove “long tail” low-probability responses.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text.
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they already appear in the text.
An integer specifying the number of most likely tokens to return at each token position.
Maximum number of tool calls the model can make in a single response.
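The sampling and limit parameters above combine into a request body like the following sketch (values are illustrative, not recommendations):

```python
# Example request payload exercising the sampling controls.
payload = {
    "model": "openai/gpt-4o",   # example model ID
    "input": "Summarize the Responses API in one sentence.",
    "max_output_tokens": 256,
    "temperature": 0.7,         # 0 to 2
    "top_p": 0.9,               # nucleus sampling
    "top_k": 40,                # sample from top K options
    "frequency_penalty": 0.0,   # -2.0 to 2.0
    "presence_penalty": 0.0,    # -2.0 to 2.0
    "top_logprobs": 5,
    "max_tool_calls": 3,
}
```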
An array of tools the model may call. Supported tool types:
Function
Web Search Preview
Web Search
File Search
Computer Use Preview
Code Interpreter
MCP
Image Generation
The name of the function.
A description of the function.
A JSON Schema object defining the function parameters.
Whether strict schema adherence is enabled.
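A function tool entry built from these fields might look like this sketch; the `get_weather` function and its schema are hypothetical:

```python
# Hypothetical function tool definition using the fields above.
function_tool = {
    "type": "function",
    "name": "get_weather",                 # hypothetical function name
    "description": "Get current weather for a city.",
    "parameters": {                        # JSON Schema for the arguments
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
        "additionalProperties": False,
    },
    "strict": True,                        # enforce schema adherence
}
```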
Value: web_search_preview or web_search_preview_2025_03_11.
Size of the search context. Supported values: low, medium, high.
User location information for search personalization.
Value: web_search or web_search_2025_08_26.
Size of the search context. Supported values: low, medium, high.
Domain filters for search results. List of allowed domains to restrict search results to.
User location information for search personalization.
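Putting the web search fields together, an illustrative configuration (the domain and location values are examples, and the approximate-location shape is an assumption not spelled out in this section):

```python
# Example web_search tool configuration.
web_search_tool = {
    "type": "web_search",
    "search_context_size": "medium",   # low | medium | high
    "filters": {
        "allowed_domains": ["example.com"],  # restrict results to these domains
    },
    "user_location": {                 # assumed shape for personalization
        "type": "approximate",
        "country": "US",
        "city": "Seattle",
    },
}
```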
IDs of vector stores to search.
Filters for file search. Can be a comparison filter (eq, ne, gt, gte, lt, lte) or a compound filter (and, or).
Maximum number of results to return.
Ranking options for search results.
Ranker to use. Supported values: auto, default-2024-11-15.
Minimum score threshold for results.
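An illustrative file_search tool combining these fields, including a compound filter; the vector store ID is a placeholder:

```python
# Example file_search tool with a compound (and) filter over two
# comparison filters. IDs and filter keys are placeholders.
file_search_tool = {
    "type": "file_search",
    "vector_store_ids": ["vs_example123"],   # placeholder vector store ID
    "max_num_results": 5,
    "filters": {
        "type": "and",
        "filters": [
            {"type": "eq", "key": "category", "value": "manual"},
            {"type": "gte", "key": "year", "value": 2023},
        ],
    },
    "ranking_options": {
        "ranker": "auto",            # auto | default-2024-11-15
        "score_threshold": 0.5,      # minimum score for results
    },
}
```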
Value: computer_use_preview.
Display width in pixels.
Display height in pixels.
The environment. Supported values: windows, mac, linux, ubuntu, browser.
Container configuration. Can be a container ID string or an object with the following properties:
File IDs to make available in the container.
Memory limit. Supported values: 1g, 4g, 16g, 64g.
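Both accepted container shapes, sketched with placeholder IDs (the object field names follow the descriptions above and are assumptions, not confirmed wire names):

```python
# Code interpreter container config: a container ID string, or an object.
by_id = {"type": "code_interpreter", "container": "cntr_example123"}  # placeholder ID

by_object = {
    "type": "code_interpreter",
    "container": {
        "type": "auto",
        "file_ids": ["file_abc123"],  # placeholder file IDs for the container
        "memory_limit": "4g",         # 1g | 4g | 16g | 64g
    },
}
```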
A label for the MCP server.
The URL of the MCP server.
Tools the model is allowed to use from this server.
Approval requirements for tool calls. String values: always, never. Can also be an object with never and always lists.
Custom headers to include in requests to the MCP server.
Description of the MCP server.
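An example MCP tool entry; the server label, URL, tool name, and header are placeholders:

```python
# Hypothetical MCP server registration using the fields above.
mcp_tool = {
    "type": "mcp",
    "server_label": "docs",                        # placeholder label
    "server_url": "https://mcp.example.com/sse",   # placeholder URL
    "allowed_tools": ["search_docs"],              # placeholder tool name
    "require_approval": "never",                   # "always" | "never" | object form
    "headers": {"X-Api-Key": "<KEY>"},             # placeholder header
    "server_description": "Documentation search server.",
}
```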
Background type. Supported values: transparent, opaque, auto.
Model to use. Supported values: gpt-image-1, gpt-image-1-mini.
Image quality. Supported values: low, medium, high, auto.
Image size. Supported values: 1024x1024, 1024x1536, 1536x1024, auto.
Output format. Supported values: png, webp, jpeg.
Moderation level. Supported values: auto, low.
Compression level for output.
Number of partial images to return during generation.
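An illustrative image_generation tool using the fields above (every value is one of the listed supported options):

```python
# Example image generation tool configuration.
image_tool = {
    "type": "image_generation",
    "model": "gpt-image-1",        # gpt-image-1 | gpt-image-1-mini
    "background": "transparent",   # transparent | opaque | auto
    "quality": "high",             # low | medium | high | auto
    "size": "1024x1024",           # 1024x1024 | 1024x1536 | 1536x1024 | auto
    "output_format": "png",        # png | webp | jpeg
    "moderation": "auto",          # auto | low
    "output_compression": 100,     # compression level for the output
    "partial_images": 2,           # partial images streamed during generation
}
```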
Controls tool selection behavior. String values: none, auto, required. Can also be an object selecting a specific tool:
function
The name of the function to use.
web_search_preview
Value: web_search_preview or web_search_preview_2025_03_11.
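The shapes tool_choice can take, sketched in Python (the function name is hypothetical):

```python
# tool_choice accepts a plain string or an object naming a specific tool.
choice_string = "auto"                                         # "none" | "auto" | "required"
choice_function = {"type": "function", "name": "get_weather"}  # hypothetical function name
choice_hosted = {"type": "web_search_preview"}                 # a hosted tool type
```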
Whether to allow the model to run tool calls in parallel.
Configuration for text response format. Format types:
text
json_object
json_schema
The name of the response format.
The JSON schema definition.
Description of the schema.
Whether strict schema adherence is enabled.
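A structured-output configuration using the json_schema format; the schema itself is a hypothetical example:

```python
# Example text format configuration requesting JSON that matches a schema.
text_config = {
    "format": {
        "type": "json_schema",
        "name": "book_info",                     # hypothetical format name
        "schema": {                              # hypothetical JSON Schema
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "year": {"type": "integer"},
            },
            "required": ["title", "year"],
            "additionalProperties": False,
        },
        "description": "Bibliographic record.",
        "strict": True,                          # enforce schema adherence
    }
}
```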
Controls the verbosity of the text output. Supported values: high, medium, low.
Configuration for reasoning output.
Constrains effort on reasoning. Supported values: xhigh, high, medium, low, minimal, none.
Controls reasoning summary verbosity. Supported values: auto, concise, detailed.
Maximum number of tokens for reasoning.
Whether reasoning is enabled.
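A sketch of the reasoning object: the effort and summary values follow the lists above, while the max_tokens and enabled key names are assumptions, since this section does not show the wire names for those two fields:

```python
# Illustrative reasoning configuration; key names for the last two
# fields are assumed, not documented here.
reasoning_config = {
    "effort": "medium",   # xhigh | high | medium | low | minimal | none
    "summary": "auto",    # auto | concise | detailed
    "max_tokens": 2048,   # assumed key name: max reasoning tokens
    "enabled": True,      # assumed key name: whether reasoning is enabled
}
```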
Output modalities for the response. Supported values: text, image.
The ID of a previous response to use as context for this request.
Additional fields to include in the response. Supported values: file_search_call.results, message.input_image.image_url, computer_call_output.output.image_url, reasoning.encrypted_content, code_interpreter_call.outputs.
Whether to store the generated response for later retrieval.
The service tier to use for this request. Supported values: auto.
Truncation strategy. Supported values: auto, disabled.
Whether to run the request in the background.
Set of key-value pairs that can be attached to the response. Keys must be ≤64 characters. Values must be ≤512 characters. Maximum 16 pairs allowed.
A unique identifier representing your end-user. Maximum of 128 characters.
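The metadata limits above can be checked client-side before sending; a minimal sketch (the helper name is our own):

```python
# Validate the documented metadata constraints: at most 16 pairs,
# keys <= 64 characters, values <= 512 characters.
def validate_metadata(metadata):
    if len(metadata) > 16:
        raise ValueError("metadata allows at most 16 key-value pairs")
    for key, value in metadata.items():
        if len(key) > 64:
            raise ValueError(f"metadata key too long: {key!r}")
        if len(value) > 512:
            raise ValueError(f"metadata value too long for key {key!r}")
    return True

validate_metadata({"experiment": "ab-test-7", "user_cohort": "beta"})
```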
Response
Successful response object.
Unique response identifier.
The object type. Always response.
Unix timestamp (in seconds) of when the response was created.
Unix timestamp (in seconds) of when the response completed.
The status of the response. Possible values: completed, incomplete, in_progress, failed, cancelled, queued.
The model used for generating the response.
An array of output items generated by the model. Item types:
OutputMessage
Reasoning
FunctionCall
WebSearchCall
FileSearchCall
ImageGenerationCall
The unique ID of the output message.
Status of the message. Possible values: completed, incomplete, in_progress.
The content of the output message.
The generated text content.
Annotations for the content. Types include:
file_citation: {type, file_id, filename, index}
url_citation: {type, url, title, start_index, end_index}
file_path: {type, file_id, index}
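For example, pulling URL citations out of a content item (the sample data mirrors the annotation shapes listed above):

```python
# Sample output_text content item with one url_citation annotation.
content_item = {
    "type": "output_text",
    "text": "See the docs for details.",
    "annotations": [
        {
            "type": "url_citation",
            "url": "https://example.com/docs",  # placeholder URL
            "title": "Docs",
            "start_index": 8,
            "end_index": 16,
        },
    ],
}

# Collect the cited URLs from the annotations array.
urls = [a["url"] for a in content_item["annotations"] if a["type"] == "url_citation"]
```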
Log probability information for output tokens. Each item contains token, bytes, logprob, and top_logprobs.
The phase of the message. Possible values: commentary, final_answer.
The unique ID of the reasoning item.
Array of reasoning text items, each with type: "reasoning_text" and text.
Array of reasoning summary items, each with type: "summary_text" and text.
Encrypted reasoning content.
Status. Possible values: completed, incomplete, in_progress.
A signature for the reasoning content, used for verification.
The format of the reasoning content. Possible values: unknown, openai-responses-v1, azure-openai-responses-v1, xai-responses-v1, anthropic-claude-v1, google-gemini-v1.
The unique ID of the function call.
The name of the function called.
The arguments in JSON string format.
The call ID for matching with function call output.
Status. Possible values: completed, incomplete, in_progress.
The unique ID of the web search call.
The search action. Types include:
search: {type, query, queries?, sources?}
open_page: {type, url}
find_in_page: {type, pattern, url}
Status. Possible values: completed, searching, in_progress, failed.
The unique ID of the file search call.
Status. Possible values: completed, searching, in_progress, failed.
Value: image_generation_call.
The unique ID of the image generation call.
The generated image data (base64).
Status. Possible values: in_progress, completed, generating, failed.
Convenience field containing the concatenated text output from all output messages.
Details about why the response is incomplete, if applicable.
The reason. Possible values: max_output_tokens, content_filter.
An error object if the generation failed.
Error code. Possible values: server_error, rate_limit_exceeded, invalid_prompt, vector_store_timeout, invalid_image, invalid_image_format, invalid_base64_image, invalid_image_url, image_too_large, image_too_small, image_parse_error, image_content_policy_violation, invalid_image_mode, image_file_too_large, unsupported_image_media_type, empty_image_file, failed_to_download_image, image_file_not_found.
Human-readable error message.
Token usage statistics for the response.
The number of input tokens.
The number of output tokens.
The total number of tokens.
Breakdown of input tokens.
The number of cached tokens.
Breakdown of output tokens.
The number of reasoning tokens.
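The usage fields are additive: total_tokens is the sum of input and output tokens, and reasoning tokens are counted within output_tokens. Using the values from the sample response below:

```python
# Usage bookkeeping from the sample response.
usage = {
    "input_tokens": 15,
    "output_tokens": 18,
    "total_tokens": 33,
    "input_tokens_details": {"cached_tokens": 0},
    "output_tokens_details": {"reasoning_tokens": 0},
}

# total = input + output; reasoning tokens are a subset of output tokens.
assert usage["total_tokens"] == usage["input_tokens"] + usage["output_tokens"]
assert usage["output_tokens_details"]["reasoning_tokens"] <= usage["output_tokens"]
```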
The sampling temperature used.
The nucleus sampling value used.
The max output tokens setting used.
The top logprobs setting used.
The max tool calls setting used.
The presence penalty used.
The frequency penalty used.
The instructions/system message used.
The metadata attached to the response.
The tools configuration used.
The tool choice configuration used.
Whether parallel tool calls was enabled.
The reasoning configuration used.
The service tier used. Possible values: auto, default, flex, priority, scale.
Whether the response was stored.
The truncation strategy used. Possible values: auto, disabled.
The text format configuration used.
The ID of the previous response used as context.
Whether the request ran in the background.
curl -X POST https://api-llm.sunra.ai/v1/responses \
  -H "Authorization: Bearer <SUNRA_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "input": [
      {
        "type": "message",
        "role": "user",
        "content": "Hello, how are you?"
      }
    ]
  }'
{
  "id": "resp-abc123",
  "object": "response",
  "created_at": 1704067200,
  "completed_at": 1704067201,
  "status": "completed",
  "model": "openai/gpt-4o",
  "output": [
    {
      "type": "message",
      "id": "msg_abc123",
      "role": "assistant",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "text": "Hello! I'm doing well, thank you for asking. How can I help you today?",
          "annotations": []
        }
      ]
    }
  ],
  "output_text": "Hello! I'm doing well, thank you for asking. How can I help you today?",
  "incomplete_details": null,
  "error": null,
  "temperature": 1.0,
  "top_p": 1.0,
  "max_output_tokens": null,
  "top_logprobs": 0,
  "presence_penalty": null,
  "frequency_penalty": null,
  "instructions": null,
  "metadata": {},
  "tools": [],
  "tool_choice": "auto",
  "parallel_tool_calls": true,
  "reasoning": null,
  "service_tier": "auto",
  "store": true,
  "truncation": "disabled",
  "text": {
    "format": {
      "type": "text"
    }
  },
  "usage": {
    "input_tokens": 15,
    "output_tokens": 18,
    "total_tokens": 33,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens_details": {
      "reasoning_tokens": 0
    }
  }
}
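The curl request above translated to Python using only the standard library; nothing is sent here, the sketch only prepares the request:

```python
import json
import urllib.request

def build_request(api_key, payload):
    """Prepare (but do not send) a POST to the responses endpoint."""
    return urllib.request.Request(
        "https://api-llm.sunra.ai/v1/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("<SUNRA_KEY>", {
    "model": "openai/gpt-4o",
    "input": [
        {"type": "message", "role": "user", "content": "Hello, how are you?"}
    ],
})
# To actually send it: urllib.request.urlopen(req)
```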