LLM API

/api/v1/llm

LLM request routing with tool_use support. All endpoints require authentication.

Backends: NoopLLMProvider, BedrockConverseProvider, OpenAICompatibleProvider

Endpoints

Method | Path                | Description
------ | ------------------- | ---------------------------------
POST   | /completions        | Route an LLM completion request.
POST   | /completions/stream | Stream an LLM completion via SSE.
GET    | /models             | List available models.

Completions

curl -X POST http://localhost:8000/api/v1/llm/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
    "messages": [{"role": "user", "content": "What is 2+2?"}]
  }'

Request body:

Field       | Type   | Required | Description
----------- | ------ | -------- | --------------------------------------------------------------------------
model       | string | no       | Model ID. Defaults to the provider's configured default_model if omitted.
messages    | list   | yes      | Conversation messages (role + content).
system      | string | no       | System prompt.
tools       | list   | no       | Tool definitions for tool_use.
tool_choice | object | no       | Tool selection strategy.
max_tokens  | int    | no       | Maximum tokens to generate.
temperature | float  | no       | Sampling temperature.

Response:

{
  "content": "2 + 2 = 4",
  "model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
  "stop_reason": "end_turn",
  "tool_calls": [],
  "usage": {"prompt_tokens": 15, "completion_tokens": 8}
}
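
For illustration, a minimal Python sketch of a non-streaming request that also supplies a tool definition. The bearer-token Authorization header, the Anthropic-style tool schema, and the shape of non-empty tool_calls entries (name + input, mirroring the streaming tool_use_complete event) are assumptions, not confirmed by this page.

import requests

BASE_URL = "http://localhost:8000/api/v1/llm"
# Assumption: a bearer token satisfies the authentication requirement.
HEADERS = {"Authorization": "Bearer YOUR_TOKEN"}

payload = {
    "model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    # Hypothetical tool definition; the exact schema expected by the backend may differ.
    "tools": [{
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }],
    "max_tokens": 512,
}

resp = requests.post(f"{BASE_URL}/completions", json=payload, headers=HEADERS, timeout=60)
resp.raise_for_status()
data = resp.json()

print(data["content"], data["stop_reason"])
# tool_calls is empty when the model answers directly (as in the response above).
for call in data["tool_calls"]:
    print(call["name"], call.get("input"))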

Streaming Completions

curl -X POST http://localhost:8000/api/v1/llm/completions/stream \
  -H "Content-Type: application/json" \
  -d '{
    "model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
    "messages": [{"role": "user", "content": "What is 2+2?"}]
  }'

Returns an SSE stream (text/event-stream) with the following event types:

Event type        | Fields             | Description
----------------- | ------------------ | -------------------------------------------
content_delta     | delta              | Text token fragment
tool_use_start    | id, name           | Start of a tool call
tool_use_delta    | id, delta          | Incremental tool call arguments
tool_use_complete | id, name, input    | Completed tool call with parsed arguments
message_stop      | stop_reason, model | End of response
metadata          | usage              | Token usage (input_tokens, output_tokens)

Example SSE stream:

data: {"type": "content_delta", "delta": "2 + 2"}
data: {"type": "content_delta", "delta": " = 4"}
data: {"type": "message_stop", "stop_reason": "end_turn", "model": "us.anthropic.claude-sonnet-4-20250514-v1:0"}
data: {"type": "metadata", "usage": {"input_tokens": 15, "output_tokens": 8}}

The request body is the same as the non-streaming /completions endpoint.
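
As a sketch of consuming the stream from Python (the bearer-token header is again an assumption), each data: line is parsed as JSON and dispatched on its type field:

import json
import requests

resp = requests.post(
    "http://localhost:8000/api/v1/llm/completions/stream",
    headers={"Authorization": "Bearer YOUR_TOKEN"},  # assumed auth scheme
    json={
        "model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
        "messages": [{"role": "user", "content": "What is 2+2?"}],
    },
    stream=True,
    timeout=300,
)
resp.raise_for_status()

for line in resp.iter_lines(decode_unicode=True):
    if not line or not line.startswith("data:"):
        continue  # skip blank separators between events
    event = json.loads(line[len("data:"):].strip())
    if event["type"] == "content_delta":
        print(event["delta"], end="", flush=True)
    elif event["type"] == "tool_use_complete":
        print(f"\n[tool call] {event['name']}({event['input']})")
    elif event["type"] == "message_stop":
        print(f"\n[stop: {event['stop_reason']}]")
    elif event["type"] == "metadata":
        print(f"[usage: {event['usage']}]")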

Client Usage

# Strands
model = client.get_model(format="strands")

# LangChain
model = client.get_model(format="langchain")

Both adapters route inference through this streaming endpoint. See the OpenAI Compatible and Bedrock provider docs for configuration.
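
Assuming the LangChain adapter returns a standard chat model object, it can then be used like any other LangChain model; this is a sketch under that assumption, not confirmed by this page.

from langchain_core.messages import HumanMessage

model = client.get_model(format="langchain")
reply = model.invoke([HumanMessage(content="What is 2+2?")])
print(reply.content)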

List Models

curl http://localhost:8000/api/v1/llm/models

Returns a list of available model IDs from the configured backend.
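
A minimal Python sketch of the same call (assuming bearer-token auth; the response shape beyond a list of model IDs is not specified here):

import requests

resp = requests.get(
    "http://localhost:8000/api/v1/llm/models",
    headers={"Authorization": "Bearer YOUR_TOKEN"},  # assumed auth scheme
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # e.g. model ID strings from the configured backend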