# Storage Providers
Providers manage indexed knowledge — persisted data that’s been preprocessed and optimized for fast retrieval. Unlike Connectors (which fetch live data), Providers query your own knowledge base.
## PostgreSQL Provider
The recommended provider for most deployments. Combines three powerful search capabilities:
- pgvector for semantic similarity search
- tsvector for full-text keyword search
- ltree for taxonomy-based filtering
### Setup
- Install PostgreSQL with the pgvector extension:

```bash
# Ubuntu/Debian
sudo apt install postgresql postgresql-contrib
# Install pgvector: https://github.com/pgvector/pgvector

# macOS with Homebrew
brew install postgresql pgvector
```

- Create the database:

```bash
createdb contextrouter
psql contextrouter -c "CREATE EXTENSION vector;"
psql contextrouter -c "CREATE EXTENSION ltree;"
```

- Configure:

```toml
[postgres]
host = "localhost"
port = 5432
database = "contextrouter"
user = "postgres"
password = "${POSTGRES_PASSWORD}"
pool_size = 10
```

### Schema
ContextRouter creates and manages these tables:
```sql
-- Main documents table
CREATE TABLE documents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    content TEXT NOT NULL,
    embedding vector(768),           -- Semantic search
    keywords_vector vector(768),     -- Keyword-aware search
    metadata JSONB,                  -- Flexible attributes
    source_type VARCHAR(50),         -- book, video, qa, etc.
    taxonomy ltree,                  -- Category path
    created_at TIMESTAMP DEFAULT NOW()
);

-- Indexes for fast retrieval
CREATE INDEX idx_documents_embedding ON documents USING ivfflat (embedding vector_cosine_ops);
CREATE INDEX idx_documents_fts ON documents USING GIN (to_tsvector('english', content));
CREATE INDEX idx_documents_taxonomy ON documents USING GIST (taxonomy);
```
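The `taxonomy` column is what backs the `taxonomy_filter` argument shown in the Usage section below. As an illustration only (this is not ContextRouter's internal query builder), a taxonomy-filtered vector search over this schema might look like the following sketch, assuming an asyncpg pool and a 768-dimension query embedding:

```python
import asyncpg

async def taxonomy_vector_search(pool: asyncpg.Pool,
                                 query_embedding: list[float],
                                 taxonomy: str,
                                 limit: int = 10):
    """Illustrative sketch: filtered semantic search against the documents table."""
    # pgvector accepts a textual vector literal such as '[0.1,0.2,...]'
    vec = "[" + ",".join(str(x) for x in query_embedding) + "]"
    sql = """
        SELECT id, content, metadata,
               1 - (embedding <=> $1::vector) AS score  -- cosine similarity
        FROM documents
        WHERE taxonomy <@ $2::ltree                     -- category and its descendants
        ORDER BY embedding <=> $1::vector
        LIMIT $3
    """
    async with pool.acquire() as conn:
        return await conn.fetch(sql, vec, taxonomy, limit)
```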
### Hybrid Search

The PostgreSQL provider supports combining vector and keyword search:
```toml
[rag]
provider = "postgres"
hybrid_fusion = "rrf"        # "rrf" or "weighted"
enable_fts = true            # Enable full-text search
rrf_k = 60                   # RRF constant
hybrid_vector_weight = 0.7   # Vector score weight
hybrid_text_weight = 0.3     # Text score weight
```

**Fusion Methods:**
| Method | Formula | Best For |
|---|---|---|
| RRF (Reciprocal Rank Fusion) | Σ 1/(k + rank_i) | General use |
| Weighted | w₁·score₁ + w₂·score₂ | Tuned systems |
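To make the RRF row concrete, here is a minimal sketch of reciprocal rank fusion over two ranked id lists (one from vector search, one from full-text search). It is illustrative only, not ContextRouter's internal implementation; `k` plays the role of the `rrf_k` setting above.

```python
def rrf_fuse(vector_hits: list[str], text_hits: list[str], k: int = 60) -> list[str]:
    """Fuse two ranked lists of document ids (best first) with RRF."""
    scores: dict[str, float] = {}
    for hits in (vector_hits, text_hits):
        for rank, doc_id in enumerate(hits, start=1):
            # Each list contributes 1 / (k + rank) for every document it returns
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```

Documents ranked near the top of both lists accumulate the largest fused scores, which is why RRF works well without calibrating the raw scores of the two search systems against each other.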
### Usage
```python
from contextrouter.modules.providers import PostgresProvider

provider = PostgresProvider(config)

# Read (search)
results = await provider.read(
    query="machine learning fundamentals",
    limit=10,
    source_types=["book", "qa"],   # Filter by type
    taxonomy_filter="AI.ML",       # Filter by category
)

for doc in results:
    print(f"Score: {doc.score}")
    print(f"Content: {doc.content[:100]}...")
    print(f"Source: {doc.metadata['source']}")

# Write (index)
await provider.write(envelope)
```

## Vertex AI Search Provider
Google’s enterprise search service for large-scale, managed deployments.
### Setup
- Create a Vertex AI Search datastore in Google Cloud Console
- Configure:
```toml
[vertex]
project_id = "your-project"
location = "us-central1"
datastore_id = "your-datastore-id"
```

### Features
- Automatic ranking — Google’s search quality
- Managed infrastructure — No database administration
- Grounding — Native integration with Gemini models
- Enterprise SLA — 99.9% uptime guarantee
### Usage
```python
from contextrouter.modules.providers import VertexProvider

provider = VertexProvider(config)

results = await provider.read(
    query="product documentation",
    limit=10,
)
```

## Google Cloud Storage Provider
For storing raw assets (PDFs, images, documents) alongside your indexed knowledge.
### Setup
```toml
[gcs]
bucket = "my-contextrouter-assets"
project_id = "your-project"
```

### Usage
```python
from contextrouter.modules.providers import GCSProvider

provider = GCSProvider(config)

# Write a file
url = await provider.write(
    envelope,
    path="documents/report.pdf",
)
print(f"Stored at: {url}")

# Read a file
envelope = await provider.read(path="documents/report.pdf")
```

## Switching Providers
### Via Configuration
```toml
[rag]
provider = "postgres"  # Default provider
```

### Via Environment Variable
```bash
export RAG_PROVIDER=vertex
# or
export RAG_BACKEND=postgres
```

### Via Runtime Settings
Override per-request:
```python
runtime_settings = {
    "provider": "vertex",  # Use Vertex for this query
}

async for event in runner.stream(query, runtime_settings=runtime_settings):
    process(event)
```

## Blue/Green Deployments
Support zero-downtime updates by maintaining two datastores:
```toml
[rag]
# Production datastore
data_store_id_blue = "prod-datastore-v1"

# Staging datastore (new data being ingested)
data_store_id_green = "prod-datastore-v2"

# Currently active
active_store = "blue"
```

### Deployment Workflow
- Ingest new data to green
- Test and validate green
- Switch `active_store` to `"green"`
- Green becomes the new blue
- Old blue becomes available for next update
### Runtime Override
```python
# Test against staging
runtime_settings = {"rag_dataset": "green"}
```
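As a sketch of how this override can back the "test and validate green" step above, the probe queries and success criterion below are placeholders; it reuses the `runner.stream()` call shown earlier.

```python
async def validate_green(runner, probe_queries: list[str]) -> bool:
    """Smoke-test the green datastore before flipping active_store."""
    for query in probe_queries:
        saw_events = False
        async for event in runner.stream(query, runtime_settings={"rag_dataset": "green"}):
            saw_events = True
        if not saw_events:
            return False  # a probe produced nothing against green
    return True
```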
## Creating Custom Providers

Build providers for other databases:
```python
import redis.asyncio as redis  # async Redis client from redis-py

from contextrouter.core.registry import register_provider
from contextrouter.core.interfaces import BaseProvider, IRead, IWrite


@register_provider("redis")
class RedisProvider(BaseProvider, IRead, IWrite):
    """Redis-based caching provider."""

    def __init__(self, config):
        self.client = redis.Redis(
            host=config.redis_host,
            port=config.redis_port,
        )

    async def read(self, query: str, limit: int = 10) -> list[BisquitEnvelope]:
        # Search implementation: naive key-pattern match on document keys
        keys = await self.client.keys(f"doc:*:{query}*")
        results = []
        for key in keys[:limit]:
            data = await self.client.get(key)
            results.append(BisquitEnvelope.model_validate_json(data))
        return results

    async def write(self, envelope: BisquitEnvelope) -> str:
        key = f"doc:{envelope.id}"
        await self.client.set(key, envelope.model_dump_json())
        return key
```
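Once registered under the name `"redis"`, the provider can be exercised directly through the same `read`/`write` interface as the built-ins; a minimal sketch, where `config` and `envelope` are stand-ins for your own settings object and a `BisquitEnvelope` built elsewhere:

```python
import asyncio

async def main(config, envelope):
    provider = RedisProvider(config)
    key = await provider.write(envelope)            # index one document
    hits = await provider.read("machine", limit=5)  # key-pattern search from read()
    print(f"Wrote {key}, read back {len(hits)} documents")

# Supply your own config and envelope, then:
# asyncio.run(main(config, envelope))
```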