Skip to content

LlamaIndex Knowledge Provider

LlamaIndex-backed RAG / graph retrieval with pluggable storage — vector, property-graph, or hybrid GraphRAG — configured through YAML, no new provider class per backend.

Configuration

providers:
  knowledge:
    backend: "agentic_primitives_gateway.primitives.knowledge.llamaindex.LlamaIndexKnowledgeProvider"
    config:
      store_type: vector          # vector | graph | hybrid
      vector_store:                # optional — defaults to SimpleVectorStore (in-memory)
        provider: pinecone         # simple | pinecone | pgvector | milvus | weaviate
        config: {...}              # passed through to the LlamaIndex store
      graph_store:                 # optional — required when store_type is graph or hybrid
        provider: falkordb         # falkordb | neo4j
        config:
          url: "redis://localhost:6379"
          database: "apg_knowledge"
      embed_model:                 # embeddings still use an external model
        provider: bedrock          # bedrock | openai | huggingface
        config:
          model_name: amazon.titan-embed-text-v2:0
      llm:                         # optional — used ONLY by query()
        backend_name: bedrock      # pins a gateway LLM backend
        model: us.anthropic.claude-sonnet-4-20250514-v1:0
        max_tokens: 2048
Key Default Description
store_type vector vector (VectorStoreIndex), graph (PropertyGraphIndex), or hybrid (both)
vector_store.provider simple simple (in-memory), pinecone, pgvector, milvus, weaviate
graph_store.provider falkordb, neo4j
embed_model.provider bedrock, openai, huggingface
llm.backend_name providers.llm.default Gateway LLM backend name to pin for query() synthesis. Optional — falls back to the LLM primitive's operator-declared default.
llm.model Model string forwarded to the resolved LLM backend's route_request.

LLM routing through the gateway

query() (retrieve-and-generate) routes its synthesis call through the gateway's LLM primitive via the GatewayLlamaLLM adapter in primitives/knowledge/_llama_llm_bridge.py. Synthesis therefore inherits per-user OIDC-resolved credentials, LLM audit events (llm.generate), and token accounting (gateway_llm_tokens_total).

Synthesis LLM selection is operator-scope. The bridge resolves the synthesis backend in this order and explicitly bypasses the request-scoped X-Provider-Llm contextvar:

  1. llm.backend_name on this knowledge config, if set.
  2. providers.llm.default — the LLM primitive's operator-declared default.

That matches LlamaIndex's own idiom of llm or Settings.llm (see RetrieverQueryEngine, as_query_engine), where the gateway's providers.llm.default plays the role of Settings.llm. Callers cannot redirect RAG synthesis via X-Provider-Llm — that header routes caller-facing LLM calls (chat completions, tool calls). Routing synthesis per-request would let callers silently change which LLM handles an operator-configured RAG path.

Embeddings still use an external model — the gateway has no embeddings primitive yet, so embed_model points directly at Bedrock / OpenAI / HuggingFace.

Install

pip install 'agentic-primitives-gateway[knowledge-llamaindex]'
# For the FalkorDB graph store:
pip install 'agentic-primitives-gateway[knowledge-falkordb]'

Quick example

# Ingest three documents.
curl -X POST http://localhost:8000/api/v1/knowledge/demo/documents \
  -H 'Content-Type: application/json' \
  -d '{"documents":[
    {"text":"The Eiffel Tower is in Paris."},
    {"text":"The Colosseum is in Rome."},
    {"text":"Paris has excellent pastries."}
  ]}'

# Retrieve relevant chunks.
curl -X POST http://localhost:8000/api/v1/knowledge/demo/retrieve \
  -H 'Content-Type: application/json' \
  -d '{"query":"What is in Paris?", "top_k": 2}'

# Native retrieve-and-generate (routes synthesis through the gateway LLM).
curl -X POST http://localhost:8000/api/v1/knowledge/demo/query \
  -H 'Content-Type: application/json' \
  -d '{"question":"What is in Paris?"}'

Structured citations

When retrieve() is called with include_citations=True (REST: {"include_citations": true} in the body; agent tool: search_knowledge(..., include_sources=true)), each returned chunk carries a citations: list[Citation] populated from LlamaIndex node metadata:

Citation field Source
source _apg_source marker, or metadata.source, or metadata.file_path, or metadata.file_name
uri metadata.url or metadata.uri when present
page metadata.page_label or metadata.page_number (common for PDF readers)
span (node.start_char_idx, node.end_char_idx) when LlamaIndex populated them during node parsing
snippet First 200 chars of the chunk text
metadata Remaining node metadata, with the fields above and internal _apg_* markers stripped

Default behaviour (flag off) leaves chunk.citations = None — the common path stays compact.

Observability

Knowledge-specific metrics are emitted automatically (labels bounded by provider/store_type taxonomy):

  • gateway_knowledge_chunks_retrieved_total
  • gateway_knowledge_retrieval_score (histogram of top-1 scores)
  • gateway_knowledge_documents_ingested_total
  • gateway_knowledge_query_tokens_total (when synthesis tokens are surfaced)

Audit events: knowledge.ingest, knowledge.retrieve, knowledge.query, knowledge.delete — each carries chunk_count, top_score, document_count in metadata where relevant.