Retrieval
The retrieval pipeline is responsible for finding the most relevant context for a user’s query. It goes through several stages to ensure both high recall (finding relevant documents) and high precision (filtering out noise).
Pipeline Stages
Stage 1: Query Normalization
The pipeline starts by processing the raw user query:
- Extract core intent — What is the user really asking?
- Generate search queries — Up to 3 optimized queries for different search aspects
- Identify taxonomy concepts — Categories for filtering and graph lookup
# Inputuser_query = "What books talk about machine learning for beginners?"
# After intent detectionretrieval_queries = [ "machine learning beginner books", "ML introductory textbooks", "learn machine learning basics"]taxonomy_concepts = ["AI", "Machine Learning", "Education"]Stage 2: Parallel Retrieval
ContextRouter fetches from multiple sources simultaneously for minimum latency:
User Query │ ▼ ┌───────────────┼───────────────┐ │ │ │ ▼ ▼ ▼ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ Providers│ │Connectors│ │ Graph │ │ │ │ │ │ Facts │ │• Postgres│ │• Web │ │ │ │• Vertex │ │• RSS │ │• Entities│ │ │ │ │ │• Relations│ └────┬─────┘ └────┬─────┘ └────┬─────┘ │ │ │ └──────────────┴──────────────┘ │ ▼ All ResultsEach source returns BisquitEnvelope objects with full provenance.
Stage 3: Deduplication
Results from multiple queries often overlap. Deduplication prevents showing the same content twice:
# SHA256-based deduplicationdef get_document_hash(doc): content = f"{doc.url}|{doc.snippet}|{doc.content}" return hashlib.sha256(content.encode()).hexdigest()
# Only keep unique documentsseen_hashes = set()unique_docs = []for doc in all_docs: doc_hash = get_document_hash(doc) if doc_hash not in seen_hashes: seen_hashes.add(doc_hash) unique_docs.append(doc)This is memory-efficient compared to storing full content strings.
Stage 4: Selection
After deduplication, select the top results based on configuration:
General Mode (default):
[rag]general_retrieval_enabled = truegeneral_retrieval_final_count = 10 # Total documentsPer-Source Mode (for diverse results):
[rag]general_retrieval_enabled = falsemax_books = 5max_videos = 3max_qa = 5max_web = 3Hybrid Search (PostgreSQL)
When using PostgreSQL, combine vector and keyword search for better results.
Why Hybrid?
| Search Type | Strengths | Weaknesses |
|---|---|---|
| Vector | Semantic similarity, synonyms | Misses exact terms, rare words |
| Keyword | Exact matches, names, codes | Misses paraphrases, concepts |
| Hybrid | Best of both | Requires tuning |
Fusion Methods
Reciprocal Rank Fusion (RRF)
Combines rankings without needing normalized scores:
score = Σ 1/(k + rank_i)Where k is a constant (default 60) and rank_i is the document’s rank in each list.
[rag]hybrid_fusion = "rrf"rrf_k = 60Pros: Robust, works well out of the box Cons: Less tunable
Weighted Fusion
Linear combination of normalized scores:
score = w_vec × vector_score + w_text × text_score[rag]hybrid_fusion = "weighted"hybrid_vector_weight = 0.7hybrid_text_weight = 0.3Pros: Fine-grained control Cons: Requires score normalization, more tuning
Keyword-Aware Vector Search
ContextRouter stores a separate keywords_vector containing NER entities and key phrases. This is searched alongside the main content vector:
-- Simplified querySELECT content, (embedding <=> query_vec) * 0.5 + (keywords_vector <=> query_vec) * 0.5 AS combined_scoreFROM documentsORDER BY combined_scoreLIMIT 10;Graph Facts
Alongside document retrieval, the pipeline fetches knowledge graph relationships:
# Query: "Who founded OpenAI?"
# Graph facts returned:facts = [ ("OpenAI", "founded_by", "Sam Altman"), ("OpenAI", "founded_by", "Greg Brockman"), ("OpenAI", "type", "AI Research Laboratory"), ("OpenAI", "founded", "2015"),]Graph facts:
- Provide structured context for reasoning
- Are separate from citations (don’t appear in UI)
- Come from the knowledge graph built during ingestion
Configuration
[rag]# Provider selectionprovider = "postgres" # or "vertex"
# Hybrid search (Postgres only)hybrid_fusion = "rrf"enable_fts = truerrf_k = 60
# Result limitsgeneral_retrieval_final_count = 10max_retrieval_queries = 3
# Graph factsgraph_facts_enabled = truemax_graph_facts = 50
# Cachingcache_enabled = truecache_ttl_seconds = 3600Performance Tips
Batch Queries
Limit the number of parallel queries:
max_retrieval_queries = 3Index Tuning
For PostgreSQL with many documents:
-- Adjust lists based on row count-- Rule of thumb: lists = sqrt(row_count)CREATE INDEX idx_embedding ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);Caching
Enable caching for repeated queries:
[rag]cache_enabled = truecache_ttl_seconds = 3600 # 1 hour