LLM API¶
/api/v1/llm
LLM request routing with tool_use support. All endpoints require authentication.
Backends: NoopLLMProvider, BedrockConverseProvider, OpenAICompatibleProvider
Endpoints¶
| Method | Path | Description |
|---|---|---|
| POST | /completions | Route an LLM completion request. |
| POST | /completions/stream | Stream an LLM completion via SSE. |
| GET | /models | List available models. |
Completions¶
```bash
curl -X POST http://localhost:8000/api/v1/llm/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
    "messages": [{"role": "user", "content": "What is 2+2?"}]
  }'
```
Request body:
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | no | Model ID. Defaults to the provider's configured default_model if omitted. |
| messages | list | yes | Conversation messages (role + content). |
| system | string | no | System prompt. |
| tools | list | no | Tool definitions for tool_use. |
| tool_choice | object | no | Tool selection strategy. |
| max_tokens | int | no | Maximum tokens to generate. |
| temperature | float | no | Sampling temperature. |
Response:
```json
{
  "content": "2 + 2 = 4",
  "model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
  "stop_reason": "end_turn",
  "tool_calls": [],
  "usage": {"prompt_tokens": 15, "completion_tokens": 8}
}
```
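The same request can be issued from Python. A minimal sketch of building the request body, where the `build_completion_request` helper is hypothetical (not part of this API) and simply mirrors the field table above, omitting unset optional fields so the provider defaults apply:

```python
import json

def build_completion_request(messages, model=None, system=None, tools=None,
                             tool_choice=None, max_tokens=None, temperature=None):
    """Build the JSON body for POST /api/v1/llm/completions.

    Only "messages" is required; optional fields are left out when unset
    so the provider's configured defaults (e.g. default_model) apply.
    """
    body = {"messages": messages}
    optional = {"model": model, "system": system, "tools": tools,
                "tool_choice": tool_choice, "max_tokens": max_tokens,
                "temperature": temperature}
    body.update({k: v for k, v in optional.items() if v is not None})
    return json.dumps(body)

payload = build_completion_request(
    [{"role": "user", "content": "What is 2+2?"}],
    model="us.anthropic.claude-sonnet-4-20250514-v1:0",
)
```

This payload can then be POSTed with any HTTP client.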
Streaming Completions¶
```bash
curl -X POST http://localhost:8000/api/v1/llm/completions/stream \
  -H "Content-Type: application/json" \
  -d '{
    "model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
    "messages": [{"role": "user", "content": "What is 2+2?"}]
  }'
```
Returns an SSE stream (text/event-stream) with the following event types:
| Event type | Fields | Description |
|---|---|---|
| content_delta | delta | Text token fragment |
| tool_use_start | id, name | Start of a tool call |
| tool_use_delta | id, delta | Incremental tool call arguments |
| tool_use_complete | id, name, input | Completed tool call with parsed arguments |
| message_stop | stop_reason, model | End of response |
| metadata | usage | Token usage (input_tokens, output_tokens) |
Example SSE stream:
```
data: {"type": "content_delta", "delta": "2 + 2"}
data: {"type": "content_delta", "delta": " = 4"}
data: {"type": "message_stop", "stop_reason": "end_turn", "model": "us.anthropic.claude-sonnet-4-20250514-v1:0"}
data: {"type": "metadata", "usage": {"input_tokens": 15, "output_tokens": 8}}
```
The request body is the same as for the non-streaming /completions endpoint.
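A sketch of consuming the stream in Python. The `parse_sse_events` helper is hypothetical; it assumes each `data:` line carries one JSON-encoded event, as in the example stream above:

```python
import json

def parse_sse_events(stream_text):
    """Parse text/event-stream data lines into a list of event dicts."""
    events = []
    for line in stream_text.splitlines():
        line = line.strip()
        if line.startswith("data:"):
            events.append(json.loads(line[len("data:"):]))
    return events

# Sample stream mirroring the SSE example above.
sample = (
    'data: {"type": "content_delta", "delta": "2 + 2"}\n'
    'data: {"type": "content_delta", "delta": " = 4"}\n'
    'data: {"type": "message_stop", "stop_reason": "end_turn", '
    '"model": "us.anthropic.claude-sonnet-4-20250514-v1:0"}\n'
    'data: {"type": "metadata", "usage": {"input_tokens": 15, "output_tokens": 8}}\n'
)

events = parse_sse_events(sample)
# Reassemble the text by concatenating content_delta fragments in order.
text = "".join(e["delta"] for e in events if e["type"] == "content_delta")
```

Tool-call events (tool_use_start / tool_use_delta / tool_use_complete) can be accumulated the same way, keyed by their id field.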
Client Usage¶
```python
# Strands
model = client.get_model(format="strands")

# LangChain
model = client.get_model(format="langchain")
```
Both adapters route inference through this streaming endpoint. See the OpenAI Compatible and Bedrock provider docs for configuration.
List Models¶
Returns a list of available model IDs from the configured backend.
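A sketch of consuming the listing in Python. The exact response shape is an assumption here (the doc only states that a list of model IDs is returned), and the sample body is hypothetical:

```python
import json

# Hypothetical response body from GET /api/v1/llm/models; the endpoint
# returns the model IDs available from the configured backend.
raw = '["us.anthropic.claude-sonnet-4-20250514-v1:0"]'

model_ids = json.loads(raw)
# Pick the first available model, if any, e.g. to populate a request's
# "model" field explicitly instead of relying on default_model.
chosen = model_ids[0] if model_ids else None
```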