Hybrid Search Architecture¶
This document describes the hybrid search design for MCP servers and A2A agents in the registry.
Overview¶
The registry implements hybrid search that combines semantic (vector) search with lexical (keyword) matching. This approach provides both conceptual understanding of queries and precise matching when users reference entities by name.
Architecture Diagram¶
+-------------------+
| Search Query |
| "context7 docs" |
+--------+----------+
|
+-----------------+-----------------+
| |
v v
+------------------+ +-------------------+
| Query Embedding | | Query Tokenizer |
| (Vector Model) | | (Keyword Extract)|
+--------+---------+ +---------+---------+
| |
| [0.12, -0.34, ...] | ["context7", "docs"]
| |
v v
+------------------+ +-------------------+
| Vector Search | | Keyword Match |
| (Cosine Sim) | | (Regex on path, |
| | | name, desc, |
| | | tags, tools) |
+--------+---------+ +---------+---------+
| |
| semantic_score | text_boost
| |
+----------------+------------------+
|
v
+---------------------+
| Score Combination |
| relevance_score = |
| semantic + boost |
+----------+----------+
|
v
+---------------------+
| Result Grouping |
| - servers (top 3) |
| - agents (top 3) |
| - virtual_servers |
| (top 3) |
| - skills (top 3) |
+----------+----------+
|
v
+---------------------+
| Tool Extraction |
| Extract matching |
| tools from servers |
| -> tools[] (top 3) |
+---------------------+
Search Flow¶
1. Query Processing¶
When a search query arrives:
- Embedding Generation: The query is converted to a vector embedding using the configured model (Amazon Bedrock, OpenAI, or local sentence-transformers)
- Tokenization: The query is split into meaningful keywords:
  - Non-word characters are removed
  - Stopwords are filtered out (a, the, is, are, etc.)
  - Tokens shorter than 3 characters are removed
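A rough sketch of this tokenization step (the stopword list and function name here are illustrative, not the registry's actual code):

```python
import re

# Illustrative stopword list; the real registry list may differ
STOPWORDS = {"a", "an", "the", "is", "are", "of", "for", "to", "and", "or"}

def tokenize_query(query: str) -> list[str]:
    # Replace non-word characters with spaces, lowercase, split on whitespace
    words = re.sub(r"[^\w\s]", " ", query.lower()).split()
    # Drop stopwords and tokens shorter than 3 characters
    return [w for w in words if w not in STOPWORDS and len(w) >= 3]

tokens = tokenize_query("context7 docs")  # -> ["context7", "docs"]
```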
2. Dual Search Strategy¶
Vector Search (Semantic)
- Uses HNSW index on DocumentDB (production) or application-level cosine similarity on MongoDB CE
- Finds conceptually similar content even with different wording
- Returns results sorted by cosine similarity
- DocumentDB uses a configurable efSearch parameter (default 100) for HNSW recall quality
- Minimum k=50 ensures small collections are fully covered

Keyword Search (Lexical)
- Regex matching on path, name, description, tags, and tool names/descriptions
- Catches explicit references that semantic search might miss
- Runs as a separate query due to DocumentDB limitations (no $unionWith support)
- Each query keyword is matched independently using case-insensitive regex
- Keyword matches from both the vector results and the separate keyword query are merged, keeping the highest boost per document
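The merge rule (keep the highest boost per document) can be sketched as follows; the merge_boosts name and the path-keyed dict shape are illustrative assumptions:

```python
def merge_boosts(vector_hits: dict[str, float],
                 keyword_hits: dict[str, float]) -> dict[str, float]:
    # Keep the highest text boost seen for each document path,
    # whether it came from the vector pass or the keyword pass
    merged = dict(vector_hits)
    for path, boost in keyword_hits.items():
        merged[path] = max(merged.get(path, 0.0), boost)
    return merged
```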
3. Score Combination¶
The final relevance score combines both approaches:
normalized_vector_score = (cosine_similarity + 1.0) / 2.0 # Map [-1,1] to [0,1]
text_boost_contribution = text_boost * 0.1 # Scale boost down
relevance_score = normalized_vector_score + text_boost_contribution
relevance_score = clamp(relevance_score, 0.0, 1.0)
The multiplier 0.1 is consistent across both DocumentDB and MongoDB CE search paths.
Text boost values (cumulative per keyword match):

| Match Location | Boost Value |
|---|---|
| Path | +5.0 |
| Name | +3.0 |
| Description | +2.0 |
| Tags | +1.5 |
| Tool (each) | +1.0 |
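Putting the formula and boost weights together, a minimal sketch of the combined scorer (the hybrid_score name is illustrative):

```python
def hybrid_score(cosine_similarity: float, text_boost: float) -> float:
    # Map cosine similarity from [-1, 1] to [0, 1]
    normalized = (cosine_similarity + 1.0) / 2.0
    # Scale the cumulative keyword boost down by 0.1
    score = normalized + text_boost * 0.1
    # Clamp the final relevance score to [0, 1]
    return max(0.0, min(score, 1.0))

# Matches the diagnostic log example below: vector=0.3412,
# text_boost=8.0 -> final score clamped to 1.0
print(hybrid_score(0.3412, 8.0))
```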
4. Score-Before-Filter Pattern¶
All candidate results are scored before applying the top-3-per-entity-type filter. This ensures the highest-scoring documents are selected:
- Vector search returns candidates (up to k results)
- Keyword search returns additional matches (merged by highest boost per document)
- Every candidate receives a hybrid score (vector + text boost)
- All candidates are sorted by hybrid score descending
- Top 3 per entity type (servers, agents, virtual servers, skills) are selected from the sorted list
This prevents lower-scoring documents from consuming a top-3 slot before higher-scoring documents are evaluated.
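A minimal sketch of the score-before-filter pattern, assuming candidates are dicts with entity_type and relevance_score fields (names illustrative):

```python
from collections import defaultdict

def top_n_per_type(candidates: list[dict], n: int = 3) -> dict[str, list[dict]]:
    # Sort ALL candidates by hybrid score before any grouping, so a
    # high-scoring document never loses its slot to an earlier-seen,
    # lower-scoring one
    ranked = sorted(candidates, key=lambda c: c["relevance_score"], reverse=True)
    groups: dict[str, list[dict]] = defaultdict(list)
    for cand in ranked:
        bucket = groups[cand["entity_type"]]
        if len(bucket) < n:
            bucket.append(cand)
    return dict(groups)
```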
5. Diagnostic Logging¶
Both search paths emit a "Score for" log line for every candidate, enabling search-quality debugging:
Score for 'Context7' (type=mcp_server): vector=0.3412, normalized_vector=0.6706,
text_boost=8.0, boost_contrib=0.8000, final=1.0000
6. Result Structure¶
Search returns grouped results (top 3 per entity type):
{
"servers": [
{
"path": "/context7",
"server_name": "Context7 MCP Server",
"relevance_score": 1.0,
"matching_tools": [
{"tool_name": "query-docs", "description": "..."}
]
}
],
"tools": [
{
"server_path": "/context7",
"tool_name": "query-docs",
"inputSchema": {...}
}
],
"agents": [...],
"virtual_servers": [
{
"path": "/virtual/dev-tools",
"server_name": "Dev Tools",
"relevance_score": 0.85,
"backend_paths": ["/github", "/jira"],
"tool_count": 5
}
],
"skills": [...]
}
Entity Types¶
MCP Servers¶
What's included in the embedding:
- Server name
- Server description
- Tags (prefixed with "Tags: ")
- Tool names (each tool's name)
- Tool descriptions (each tool's description)

What's NOT included in the embedding:
- Tool inputSchema (JSON schema is stored but not embedded)
- Server path
- Server metadata

Stored document fields:
- path, name, description, tags, is_enabled
- tools[] array with name, description, inputSchema per tool
- embedding vector
- metadata (full server info for reference)
A2A Agents¶
What's included in the embedding:
- Agent name
- Agent description
- Tags (prefixed with "Tags: ")
- Capabilities (prefixed with "Capabilities: ")
- Skill names (each skill's name)
- Skill descriptions (each skill's description)

What's NOT included in the embedding:
- Agent path
- Agent card metadata (stored but not embedded)
- Skill IDs, tags, and examples

Stored document fields:
- path, name, description, tags, is_enabled
- capabilities[] array
- embedding vector
- metadata (full agent card for reference)
Tools¶
- Not indexed separately - extracted from parent server documents
- When a server matches, its tools are checked for keyword matches
- Top-level tools[] array contains the full schema (inputSchema)
- matching_tools in server results is a lightweight reference (no schema)
Virtual MCP Servers¶
Virtual MCP Servers are indexed in the unified mcp_embeddings_{dimensions} collection (e.g., mcp_embeddings_384 for 384-dimension models) alongside regular servers and agents, distinguished by entity_type: "virtual_server".
What's included in the embedding:
- Server name
- Server description
- Tags (prefixed with "Tags: ")
- Tool names (alias or original name from each tool mapping)
- Tool description overrides (if specified in mappings)

What's NOT included in the embedding:
- Virtual server path
- Backend server paths
- Required scopes
- Tool input schemas

Stored document fields:
- path, name, description, tags, is_enabled
- entity_type: "virtual_server"
- tools[] array with name (alias or original) per tool mapping
- embedding vector
- metadata object containing:
  - server_name, num_tools, backend_count
  - backend_paths[] (list of backend server paths)
  - required_scopes[], supported_transports[]
  - created_by
Search result structure:
{
"virtual_servers": [
{
"entity_type": "virtual_server",
"path": "/virtual/dev-tools",
"server_name": "Dev Tools",
"description": "Aggregated development tools",
"relevance_score": 0.85,
"tags": ["development", "tools"],
"backend_paths": ["/github", "/jira"],
"tool_count": 5,
"matching_tools": [
{"tool_name": "github_search"}
]
}
]
}
Backend Implementations¶
DocumentDB (Production)¶
- Native HNSW vector index with $search aggregation pipeline
- Keyword query runs separately and merges results (no $unionWith support)
- Text boost calculated in the aggregation pipeline using $regexMatch
MongoDB CE (Development/Local)¶
- No native vector search support ($vectorSearch not available)
- Falls back to application-level search (in the Python backend, not the calling agent):
- Fetch all documents with embeddings from collection
- Calculate cosine similarity in Python code
- Apply keyword matching and text boost in application
- Sort and limit results
- Same API contract as DocumentDB implementation
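The application-level cosine similarity used in the MongoDB CE fallback can be sketched as:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product of the two embeddings
    dot = sum(x * y for x, y in zip(a, b))
    # Product of the vector magnitudes
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Guard against zero-length (e.g., empty-embedding) vectors
    return dot / norm if norm else 0.0
```

In practice this runs over every document fetched from the collection, which is acceptable for development-scale data but is why production uses the native HNSW index instead.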
Lexical Fallback Mode¶
When the embedding model is unavailable (misconfigured, network issues, API key expired, model not found), the search system automatically degrades to lexical-only mode instead of failing entirely.
How It Works¶
- Detection: On the first search request, if the embedding model fails to generate a query vector, the _embedding_unavailable flag is set in DocumentDBSearchRepository
- Fallback: All subsequent searches skip embedding generation and use _lexical_only_search() instead
- Error Caching: The SentenceTransformersClient caches load errors in _load_error to avoid repeated download attempts (e.g., hitting HuggingFace on every call)
- Indexing: When the model is unavailable during startup, servers and agents are indexed without embeddings; documents are stored with empty embedding vectors
- Response: The API response includes a search_mode field set to "lexical-only" (instead of the normal "hybrid") so callers know the search quality is reduced
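A simplified sketch of this degradation logic (the class below is illustrative; the real DocumentDBSearchRepository has far more responsibilities):

```python
class SearchRepositorySketch:
    """Illustrative sketch of the embedding-unavailable fallback."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn
        self._embedding_unavailable = False

    def search(self, query: str) -> dict:
        if not self._embedding_unavailable:
            try:
                vector = self._embed_fn(query)
                return {"search_mode": "hybrid", "vector": vector}
            except Exception:
                # Cache the failure: later requests skip embedding entirely
                self._embedding_unavailable = True
        # Lexical-only path: no query vector, reduced search quality
        return {"search_mode": "lexical-only", "vector": None}
```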
Lexical-Only Search Flow¶
+-------------------+
| Search Query |
| "context7 docs" |
+--------+----------+
|
v
+-----------------------+
| Embedding Model Check |
| _embedding_unavailable|
| == True? |
+-----------+-----------+
|
Yes (fallback)
|
v
+-----------------------+
| Keyword Tokenization |
| ["context7", "docs"] |
+-----------+-----------+
|
v
+-----------------------+
| MongoDB Aggregation |
| $regexMatch on path, |
| name, description, |
| tags, tools |
+-----------+-----------+
|
v
+-----------------------+
| Text Boost Scoring |
| Normalized by |
| MAX_LEXICAL_BOOST |
| (12.5) |
+-----------+-----------+
|
v
+-----------------------+
| Result Grouping |
| search_mode: |
| "lexical-only" |
+-----------------------+
Scoring in Lexical-Only Mode¶
In lexical-only mode, the text boost score is normalized to a 0-1 range using a fixed denominator (MAX_LEXICAL_BOOST = 12.5). The same boost weights from hybrid mode apply:
| Match Location | Boost Value |
|---|---|
| Path | +5.0 |
| Name | +3.0 |
| Description | +2.0 |
| Tags | +1.5 |
| Tool (each) | +1.0 |
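Assuming the normalization is a simple divide-and-clamp (the exact formula is not spelled out above, so this is a hedged reconstruction):

```python
# Fixed normalization ceiling for lexical-only scoring
MAX_LEXICAL_BOOST = 12.5

def lexical_score(text_boost: float) -> float:
    # Assumed: cumulative boost divided by the fixed ceiling,
    # clamped so scores stay within [0, 1]
    return min(text_boost / MAX_LEXICAL_BOOST, 1.0)
```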
Recovery¶
When the embedding model becomes available again (e.g., after a restart with correct configuration), the system automatically returns to full hybrid search mode. The _embedding_unavailable flag and _load_error cache are per-process and reset on restart.
HNSW Tuning (DocumentDB)¶
The DocumentDB $search pipeline includes two tunable parameters:
| Parameter | Default | Description |
|---|---|---|
| k | max(max_results * 3, 50) | Number of nearest neighbors to retrieve. Minimum 50 ensures small collections are fully covered. |
| efSearch | 100 (configurable via VECTOR_SEARCH_EF_SEARCH) | Controls HNSW recall quality. Higher values improve recall at the cost of query latency. Default DocumentDB value is ~40, which can miss documents in small collections. |
The efSearch setting is configured in registry/core/config.py as vector_search_ef_search.
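The k heuristic from the table can be expressed directly (function name illustrative):

```python
def hnsw_k(max_results: int) -> int:
    # Retrieve 3x the requested results, but never fewer than 50,
    # so small collections are fully covered by the HNSW search
    return max(max_results * 3, 50)
```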
Performance Considerations¶
- Result Limiting: Top 3 per entity type to reduce payload size
- Score-Before-Filter: All candidates scored and sorted before applying top-3 filter
- Index Reuse: HNSW index parameters (m=16, efConstruction=128) optimized for recall
- efSearch Tuning: Set to 100 for near-exact recall in typical deployments
- Embedding Caching: Lazy-loaded model with singleton pattern
- Keyword Fallback: Separate query ensures explicit matches are not missed
- Error Caching: Failed model loads are cached to avoid repeated download/API attempts
Example: Why Hybrid Matters¶
Query: "context7"
- Vector-only: Might return documentation servers with similar semantic content
- Keyword-only: Finds exact match but misses related servers
- Hybrid: Ranks /context7 at top (keyword boost) while including semantically similar alternatives