# Embeddings
Embedding models convert text into dense vectors, enabling semantic search and similarity matching. Choosing the right embedding model is crucial for RAG quality.
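Retrieval then reduces to comparing vectors, most often with cosine similarity. A minimal, dependency-free sketch (the vectors would come from any model on this page):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors: near 1.0 means similar meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Rank documents by similarity to a query vector:
# ranked = sorted(doc_vectors, key=lambda d: cosine_similarity(query_vector, d), reverse=True)
```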
## Available Providers
| Provider | Model | Dimensions | Best For |
|---|---|---|---|
| Vertex AI | text-embedding-004 | 768 | Production, multilingual |
| OpenAI | text-embedding-3-small | 1536 | Quality, ecosystem |
| HuggingFace | all-mpnet-base-v2 | 768 | Local, privacy, cost |
| HuggingFace | all-MiniLM-L6-v2 | 384 | Speed, lower memory |
## Configuration
Set your default embedding model:
```toml
[models]
default_embeddings = "vertex/text-embedding-004"
```

Or via environment variable:

```bash
export RAG_EMBEDDINGS_MODEL=vertex/text-embedding-004
```

## Using Embeddings
### Basic Usage
```python
from contextrouter.modules.models import model_registry
from contextrouter.core import get_core_config

config = get_core_config()

# Get embedding model
embeddings = model_registry.create_embeddings(
    "vertex/text-embedding-004",
    config=config,
)

# Embed a single query
query_vector = await embeddings.embed_query("What is machine learning?")
print(f"Vector dimensions: {len(query_vector)}")
# Output: Vector dimensions: 768

# Embed multiple documents
doc_vectors = await embeddings.embed_documents([
    "Machine learning is a subset of AI...",
    "Deep learning uses neural networks...",
    "Natural language processing enables...",
])
print(f"Embedded {len(doc_vectors)} documents")
```

### Batch Processing
For large-scale ingestion, use batching:
```python
from contextrouter.modules.models.embeddings import batch_embed

documents = load_documents()  # Your documents

# Process in batches
vectors = await batch_embed(
    embeddings=embeddings,
    texts=documents,
    batch_size=100,  # Adjust based on API limits
    show_progress=True,
)
```

## Vertex AI Embeddings
Google’s production-grade embedding service.
### Setup
Requires a GCP project with Vertex AI enabled:
```toml
[vertex]
project_id = "your-project"
location = "us-central1"
```

### Features
- High quality for English and multilingual text
- 768 dimensions — good balance of quality and storage
- Batch processing support
- Enterprise SLA
```python
embeddings = model_registry.create_embeddings(
    "vertex/text-embedding-004",
    config=config,
)
```

## HuggingFace Embeddings (Local)
Run embeddings locally without API calls.
### Setup
```bash
pip install contextrouter[hf-transformers]
```

### Usage
```python
# Default: all-mpnet-base-v2 (768 dimensions, high quality)
embeddings = model_registry.create_embeddings(
    "hf/sentence-transformers",
    config=config,
)

# Faster, smaller model (384 dimensions)
embeddings = model_registry.create_embeddings(
    "hf/sentence-transformers",
    config=config,
    model_name="all-MiniLM-L6-v2",
)

# Custom model
embeddings = model_registry.create_embeddings(
    "hf/sentence-transformers",
    config=config,
    model_name="BAAI/bge-large-en-v1.5",  # Any sentence-transformers model
)
```

### When to Use Local Embeddings
- Air-gapped environments — No external API calls
- Cost-sensitive applications — No per-token charges
- Custom fine-tuned models — Load your own models
- Privacy requirements — Data never leaves your infrastructure
## OpenAI Embeddings
```python
embeddings = model_registry.create_embeddings(
    "openai/text-embedding-3-small",
    config=config,
)
```

### Features
- 1536 dimensions — a richer representation, at double the storage of 768-dimension models
- Good quality across diverse content
- Simple setup — just needs API key
## Dimension Considerations
Your vector database must match the embedding dimensions:
```sql
-- PostgreSQL with pgvector
-- For 768-dimensional embeddings (Vertex, mpnet)
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(768)
);

-- For 384-dimensional embeddings (MiniLM)
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(384)
);

-- For 1536-dimensional embeddings (OpenAI)
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(1536)
);
```

**Important:** Changing embedding models requires re-embedding all your documents. Plan your embedding strategy before large-scale ingestion.
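Since a dimension mismatch only surfaces at insert time, it can help to validate vector length during ingestion. A minimal sketch; `EXPECTED_DIM` and `embed_and_validate` are hypothetical names, and the constant must match the `vector(...)` size in your schema:

```python
EXPECTED_DIM = 768  # Must match vector(768) in the schema above

async def embed_and_validate(embeddings, texts: list[str]) -> list[list[float]]:
    """Embed documents and fail fast on a model/schema dimension mismatch."""
    vectors = await embeddings.embed_documents(texts)
    for vec in vectors:
        if len(vec) != EXPECTED_DIM:
            raise ValueError(
                f"Got {len(vec)}-dimensional embedding, expected {EXPECTED_DIM}; "
                "re-check the model/schema pairing before ingesting."
            )
    return vectors
```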
## Choosing an Embedding Model
### For Production RAG
| Requirement | Recommended Model |
|---|---|
| Best quality + GCP | vertex/text-embedding-004 |
| Best quality + OpenAI | openai/text-embedding-3-small |
| Privacy/local | hf/sentence-transformers (mpnet) |
| Speed/memory constrained | hf/sentence-transformers (MiniLM) |
### Quality vs Performance Trade-offs
- Quality: text-embedding-004 > mpnet > MiniLM
- Speed: MiniLM > mpnet > text-embedding-004
- Memory: MiniLM < mpnet < text-embedding-004
- Cost: Local (free) < Vertex < OpenAI
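To check the speed trade-off on your own hardware, a quick micro-benchmark of the two local models is enough. A minimal sketch using sentence-transformers directly; the corpus and batch size are illustrative:

```python
import time

from sentence_transformers import SentenceTransformer

texts = ["Machine learning is a subset of AI..."] * 256  # Illustrative corpus

for name in ("all-MiniLM-L6-v2", "all-mpnet-base-v2"):
    model = SentenceTransformer(name)
    start = time.perf_counter()
    model.encode(texts, batch_size=32)
    print(f"{name}: {time.perf_counter() - start:.2f}s for {len(texts)} texts")
```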
### Multilingual Support

- Vertex AI — Good multilingual support built-in
- HuggingFace — Use `sentence-transformers/paraphrase-multilingual-mpnet-base-v2`, as shown below
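Following the same `model_name` pattern shown in the HuggingFace usage section, a multilingual local model can be selected like this (a sketch, assuming the `hf/sentence-transformers` provider accepts any sentence-transformers checkpoint):

```python
# Multilingual local embeddings (768 dimensions)
embeddings = model_registry.create_embeddings(
    "hf/sentence-transformers",
    config=config,
    model_name="sentence-transformers/paraphrase-multilingual-mpnet-base-v2",
)
```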
## Best Practices
- Choose once, stick with it — Changing models requires full re-embedding
- Match dimensions — Ensure database schema matches model output
- Batch large workloads — Use batch processing for efficiency
- Test retrieval quality — Evaluate with your actual queries before committing (see the sketch after this list)
- Consider latency — Local models have no network round-trip
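For the retrieval-quality check, a lightweight approach is to embed a few real queries against a labeled sample and verify that the expected document ranks first. A minimal sketch; `eval_pairs` and `top1_accuracy` are illustrative names, and `cosine_similarity` is redefined here so the snippet stands alone:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Illustrative labeled sample: each query should retrieve its paired document.
eval_pairs = [
    ("What is machine learning?", "Machine learning is a subset of AI..."),
    ("How do neural networks learn?", "Deep learning uses neural networks..."),
]

async def top1_accuracy(embeddings) -> float:
    """Fraction of queries whose paired document is the top-ranked result."""
    queries, docs = zip(*eval_pairs)
    doc_vectors = await embeddings.embed_documents(list(docs))
    hits = 0
    for i, query in enumerate(queries):
        q_vec = await embeddings.embed_query(query)
        best = max(range(len(docs)), key=lambda j: cosine_similarity(q_vec, doc_vectors[j]))
        hits += best == i
    return hits / len(queries)
```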