POST /v1/responses
curl -X POST https://api-llm.sunra.ai/v1/responses \
  -H "Authorization: Bearer <SUNRA_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "input": [
      {
        "type": "message",
        "role": "user",
        "content": "Hello, how are you?"
      }
    ]
  }'
{
  "id": "resp-abc123",
  "object": "response",
  "created_at": 1704067200,
  "completed_at": 1704067201,
  "status": "completed",
  "model": "openai/gpt-4o",
  "output": [
    {
      "type": "message",
      "id": "msg_abc123",
      "role": "assistant",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "text": "Hello! I'm doing well, thank you for asking. How can I help you today?",
          "annotations": []
        }
      ]
    }
  ],
  "output_text": "Hello! I'm doing well, thank you for asking. How can I help you today?",
  "incomplete_details": null,
  "error": null,
  "temperature": 1.0,
  "top_p": 1.0,
  "max_output_tokens": null,
  "top_logprobs": 0,
  "presence_penalty": null,
  "frequency_penalty": null,
  "instructions": null,
  "metadata": {},
  "tools": [],
  "tool_choice": "auto",
  "parallel_tool_calls": true,
  "reasoning": null,
  "service_tier": "auto",
  "store": true,
  "truncation": "disabled",
  "text": {
    "format": {
      "type": "text"
    }
  },
  "usage": {
    "input_tokens": 15,
    "output_tokens": 18,
    "total_tokens": 33,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens_details": {
      "reasoning_tokens": 0
    }
  }
}
Creates a streaming or non-streaming response using the OpenAI Responses API format. Supports text, images, files, audio, video, function calling, web search, file search, code interpreter, reasoning, and more.

Authentication

Authorization
string
required
Bearer token. Use your API key as the bearer token in the Authorization header. Format: Bearer <SUNRA_KEY>
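
As a minimal sketch of the header format above, the following Python builds (but does not send) a request equivalent to the curl example on this page. The helper name build_request is a hypothetical illustration, not part of any Sunra SDK; only the URL, header names, and body fields are taken from this page.

```python
import json
import urllib.request

API_URL = "https://api-llm.sunra.ai/v1/responses"


def build_request(api_key: str, model: str, user_text: str) -> urllib.request.Request:
    """Build a POST request for /v1/responses without sending it (hypothetical helper)."""
    payload = {
        "model": model,
        "input": [{"type": "message", "role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The API key is passed as a bearer token, as described above.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "<SUNRA_KEY>" is a placeholder; substitute a real key before sending.
req = build_request("<SUNRA_KEY>", "openai/gpt-4o", "Hello, how are you?")
```

To send the request, pass req to urllib.request.urlopen and decode the JSON body; that step is omitted here so the sketch runs without network access.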

Request

This endpoint expects an object.
model
string
required
Model ID used to generate the response. Browse available models at sunra.ai/models.
input
string | object[]
Input for the response request. Can be a string or an array of input items including messages, function calls, function call outputs, reasoning items, and output messages.
instructions
string | null
Inserts a system (or developer) message as the first item in the model’s context. When used with input, the instructions are inserted at the start of the input.
stream
boolean
default: false
If set to true, the response will be streamed using server-sent events (SSE).
max_output_tokens
number | null
An upper bound for the number of output tokens, including visible output tokens and reasoning tokens.
temperature
number | null
Sampling temperature between 0 and 2. Higher values increase randomness.
top_p
number | null
Nucleus sampling parameter. An alternative to sampling with temperature.
top_k
number
Sample only from the top K options for each subsequent token. Used to remove “long tail” low-probability responses.
frequency_penalty
number | null
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text.
presence_penalty
number | null
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they already appear in the text.
top_logprobs
integer | null
An integer specifying the number of most likely tokens to return at each token position.
max_tool_calls
integer | null
Maximum number of tool calls the model can make in a single response.
tools
object[]
An array of tools the model may call.
tool_choice
string | object
Controls tool selection behavior. String values: none, auto, required. Can also specify a particular function or tool type.
parallel_tool_calls
boolean | null
Whether to allow the model to run tool calls in parallel.
text
object
Configuration for text response format.
reasoning
object
Configuration for reasoning output.
modalities
string[]
Output modalities for the response. Supported values: text, image.
previous_response_id
string | null
The ID of a previous response to use as context for this request.
include
string[]
Additional fields to include in the response. Supported values: file_search_call.results, message.input_image.image_url, computer_call_output.output.image_url, reasoning.encrypted_content, code_interpreter_call.outputs.
store
boolean
Whether to store the generated response for later retrieval.
service_tier
string
The service tier to use for this request. Supported values: auto.
truncation
string
Truncation strategy. Supported values: auto, disabled.
background
boolean | null
Whether to run the request in the background.
metadata
object
Set of key-value pairs that can be attached to the response. Keys must be ≤64 characters. Values must be ≤512 characters. Maximum 16 pairs allowed.
user
string
A unique identifier representing your end-user. Maximum of 128 characters.
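
Several of the request fields above carry documented limits (temperature, the two penalties, metadata, and user). The sketch below checks them client-side before a request is sent. The function name validate_params is hypothetical, and the checks assume the limits are counted in characters and that the ranges are inclusive; the server's actual validation may differ.

```python
def validate_params(params: dict) -> list:
    """Return violations of the limits documented above (hypothetical helper)."""
    problems = []

    # temperature: between 0 and 2 (assumed inclusive).
    temperature = params.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        problems.append("temperature must be between 0 and 2")

    # frequency_penalty / presence_penalty: between -2.0 and 2.0.
    for name in ("frequency_penalty", "presence_penalty"):
        value = params.get(name)
        if value is not None and not -2.0 <= value <= 2.0:
            problems.append(f"{name} must be between -2.0 and 2.0")

    # metadata: at most 16 pairs, keys <= 64 chars, values <= 512 chars.
    metadata = params.get("metadata") or {}
    if len(metadata) > 16:
        problems.append("metadata allows at most 16 pairs")
    for key, value in metadata.items():
        if len(key) > 64:
            problems.append("metadata key exceeds 64 characters")
        if len(str(value)) > 512:
            problems.append("metadata value exceeds 512 characters")

    # user: at most 128 characters.
    user = params.get("user")
    if user is not None and len(user) > 128:
        problems.append("user must be at most 128 characters")

    return problems
```

An empty list means none of these documented limits were exceeded; it does not guarantee the request is otherwise valid.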

Response

Successful response object.
id
string
Unique response identifier.
object
string
The object type. Always response.
created_at
number
Unix timestamp (in seconds) of when the response was created.
completed_at
number | null
Unix timestamp (in seconds) of when the response completed.
status
string
The status of the response. Possible values: completed, incomplete, in_progress, failed, cancelled, queued.
model
string
The model used for generating the response.
output
object[]
An array of output items generated by the model.
output_text
string
Convenience field containing the concatenated text output from all output messages.
incomplete_details
object | null
Details about why the response is incomplete, if applicable.
error
object | null
An error object if the generation failed.
usage
object
Token usage statistics for the response.
temperature
number | null
The sampling temperature used.
top_p
number | null
The nucleus sampling value used.
max_output_tokens
number | null
The max output tokens setting used.
top_logprobs
number
The top logprobs setting used.
max_tool_calls
number | null
The max tool calls setting used.
presence_penalty
number | null
The presence penalty used.
frequency_penalty
number | null
The frequency penalty used.
instructions
string | object[] | null
The instructions/system message used.
metadata
object
The metadata attached to the response.
tools
object[]
The tools configuration used.
tool_choice
string | object
The tool choice configuration used.
parallel_tool_calls
boolean
Whether parallel tool calls were enabled.
reasoning
object
The reasoning configuration used.
service_tier
string
The service tier used. Possible values: auto, default, flex, priority, scale.
store
boolean
Whether the response was stored.
truncation
string
The truncation strategy used. Possible values: auto, disabled.
text
object
The text format configuration used.
previous_response_id
string | null
The ID of the previous response used as context.
background
boolean | null
Whether the request ran in the background.
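
To illustrate how output, output_text, and usage relate, here is a small sketch that walks the output array of a trimmed copy of the example response on this page and rebuilds the text. The helper name collect_output_text is hypothetical, and joining the parts with no separator is an assumption; the page only says output_text is the concatenated text of all output messages.

```python
# Trimmed copy of the example response shown on this page.
EXAMPLE = {
    "id": "resp-abc123",
    "status": "completed",
    "output": [
        {
            "type": "message",
            "id": "msg_abc123",
            "role": "assistant",
            "status": "completed",
            "content": [
                {
                    "type": "output_text",
                    "text": "Hello! I'm doing well, thank you for asking. How can I help you today?",
                    "annotations": [],
                }
            ],
        }
    ],
    "output_text": "Hello! I'm doing well, thank you for asking. How can I help you today?",
    "usage": {"input_tokens": 15, "output_tokens": 18, "total_tokens": 33},
}


def collect_output_text(response: dict) -> str:
    """Concatenate every output_text part found in message items (hypothetical helper)."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip non-message items such as tool calls
        for block in item.get("content", []):
            if block.get("type") == "output_text":
                parts.append(block.get("text", ""))
    return "".join(parts)  # no separator: an assumption, see the note above


text = collect_output_text(EXAMPLE)
usage = EXAMPLE["usage"]
```

For this example the rebuilt text equals the output_text field, and input_tokens plus output_tokens equals total_tokens (15 + 18 = 33).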