# Quick Start
This guide walks you through building a simple RAG-powered agent. By the end, you’ll have a working chatbot that can answer questions using your knowledge base.
## Prerequisites
Before starting, make sure you have:
- ContextRouter installed (`pip install contextrouter[vertex]` or your preferred provider; a quick check is shown below)
- At least one LLM provider configured (see Installation)
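Not sure the install went through? A one-liner against the standard library's package metadata will confirm it; this assumes only that the pip distribution is named `contextrouter`:

```python
from importlib.metadata import version

# Prints the installed ContextRouter version, or raises
# PackageNotFoundError if the install didn't succeed.
print(version("contextrouter"))
```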
## Step 1: Create Configuration
Create a `settings.toml` file in your project directory:
```toml
[models]
default_llm = "vertex/gemini-2.0-flash"
default_embeddings = "vertex/text-embedding-004"

[vertex]
project_id = "your-gcp-project"
location = "us-central1"

[rag]
provider = "postgres"  # or "vertex" for Vertex AI Search
```

Or use environment variables:
```bash
export VERTEX_PROJECT_ID=your-gcp-project
export VERTEX_LOCATION=us-central1
```
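ContextRouter loads this file itself, but if you want to eyeball that your TOML parses before running anything, Python 3.11+ can read it directly. A quick sketch, independent of ContextRouter:

```python
import tomllib  # stdlib TOML parser, Python 3.11+
from pathlib import Path

# Parse settings.toml and print the resolved model choices.
settings = tomllib.loads(Path("settings.toml").read_text())
print(settings["models"]["default_llm"])         # vertex/gemini-2.0-flash
print(settings["models"]["default_embeddings"])  # vertex/text-embedding-004
```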
## Step 2: Simple Chat

Create a file called `chat.py`:
```python
import asyncio

from contextrouter.cortex import stream_agent


async def main():
    # Example messages
    messages = [{"role": "user", "content": "Explain what RAG is in simple terms"}]

    # Stream the response from the shared brain
    print("Assistant: ", end="")
    async for event in stream_agent(
        messages=messages,
        session_id="quickstart-session",
        platform="console",
    ):
        # Events follow the AG-UI protocol format
        if event.get("event") == "text_delta":
            print(event.get("data", {}).get("text", ""), end="", flush=True)
    print()  # New line at the end


if __name__ == "__main__":
    asyncio.run(main())
```

Run it:
```bash
python chat.py
```

You should see a streaming response explaining RAG!
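The stream carries more than text: each event is a dict with an `event` name and a `data` payload (the AG-UI protocol format mentioned in the comment above). If you're curious what else flows by, a small variation prints the raw non-text events; nothing beyond the call signature already used above is assumed:

```python
import asyncio

from contextrouter.cortex import stream_agent


async def inspect_events():
    # Print each non-text event so you can see the protocol
    # flow around the text deltas (search, citations, etc.).
    async for event in stream_agent(
        messages=[{"role": "user", "content": "Explain what RAG is in simple terms"}],
        session_id="inspect-session",
        platform="console",
    ):
        if event.get("event") != "text_delta":
            print(event)


if __name__ == "__main__":
    asyncio.run(inspect_events())
```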
## Step 3: Interactive Chat Loop
Let’s make it interactive:
```python
import asyncio

from contextrouter.cortex import stream_agent


async def chat_loop():
    print("ContextRouter Chat (type 'quit' to exit)")
    print("-" * 40)

    history = []

    while True:
        user_input = input("\nYou: ").strip()

        if user_input.lower() in ['quit', 'exit', 'q']:
            print("Goodbye!")
            break

        if not user_input:
            continue

        history.append({"role": "user", "content": user_input})

        print("\nAssistant: ", end="")
        full_response = ""
        async for event in stream_agent(
            messages=history,
            session_id="interactive-session",
            platform="console",
        ):
            if event.get("event") == "text_delta":
                delta = event.get("data", {}).get("text", "")
                print(delta, end="", flush=True)
                full_response += delta
        print()

        history.append({"role": "assistant", "content": full_response})


if __name__ == "__main__":
    asyncio.run(chat_loop())
```
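One caveat with this loop: `history` grows without bound, and every turn re-sends the whole list. If sessions run long, a simple sliding window keeps the payload bounded. This is plain Python, and the window size of 20 is an arbitrary choice for illustration:

```python
MAX_MESSAGES = 20  # arbitrary window size for illustration


def trim_history(history: list[dict]) -> list[dict]:
    # Keep only the most recent messages so the request
    # stays bounded as the conversation grows.
    return history[-MAX_MESSAGES:]
```

Pass `trim_history(history)` as `messages=` in the loop above instead of the full list.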
## Step 4: Enable Web Search

Add real-time web search capability:
```python
import asyncio

from contextrouter.cortex import stream_agent


async def main():
    print("Assistant: ", end="")
    async for event in stream_agent(
        messages=[{"role": "user", "content": "What are the latest developments in AI this week?"}],
        session_id="web-search-session",
        platform="console",
        enable_web_search=True,
    ):
        if event.get("event") == "text_delta":
            print(event.get("data", {}).get("text", ""), end="", flush=True)
    print()


if __name__ == "__main__":
    asyncio.run(main())
```
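Since `enable_web_search` is a per-call flag, you can turn it on only for questions that need fresh information. A minimal sketch; the routing decision here is manual, and nothing in it beyond the flag itself is ContextRouter-specific:

```python
import asyncio

from contextrouter.cortex import stream_agent


async def ask(question: str, use_web: bool) -> None:
    # Route one question, enabling web search only on demand.
    print("Assistant: ", end="")
    async for event in stream_agent(
        messages=[{"role": "user", "content": question}],
        session_id="mixed-session",
        platform="console",
        enable_web_search=use_web,
    ):
        if event.get("event") == "text_delta":
            print(event.get("data", {}).get("text", ""), end="", flush=True)
    print()


async def main():
    # Knowledge-base question: no web search needed.
    await ask("What is RAG?", use_web=False)
    # Time-sensitive question: opt into web search.
    await ask("What are the latest developments in AI this week?", use_web=True)


if __name__ == "__main__":
    asyncio.run(main())
```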
## Step 5: Access Citations

Get sources for the response:
```python
import asyncio

from contextrouter.cortex import stream_agent


async def main():
    citations = []

    print("Assistant: ", end="")
    async for event in stream_agent(
        messages=[{"role": "user", "content": "What is machine learning?"}],
        session_id="citations-session",
        platform="console",
    ):
        if event.get("event") == "text_delta":
            print(event.get("data", {}).get("text", ""), end="", flush=True)

        # Collect citations from the search_end event
        if event.get("event") == "search_end":
            citations.extend(event.get("data", {}).get("citations", []))

    print("\n")

    if citations:
        print("Sources:")
        for i, citation in enumerate(citations, 1):
            print(f"  {i}. {citation.get('title', 'Unknown')} - {citation.get('url', 'N/A')}")


if __name__ == "__main__":
    asyncio.run(main())
```
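If retrieval touches overlapping sources, the collected list can contain duplicates. A small helper that keeps the first citation per URL, assuming only the `title`/`url` keys used above:

```python
def dedupe_citations(citations: list[dict]) -> list[dict]:
    # Keep the first citation seen for each distinct URL.
    seen: set[str] = set()
    unique = []
    for citation in citations:
        url = citation.get("url", "N/A")
        if url not in seen:
            seen.add(url)
            unique.append(citation)
    return unique
```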
## Step 6: Using the CLI

ContextRouter includes a CLI for quick testing:
```bash
# Interactive chat
contextrouter rag chat

# Single query
contextrouter rag query "What is RAG?"

# Validate your configuration
contextrouter rag validate
```

## Using Different Providers
### With OpenAI
[models]default_llm = "openai/gpt-4o"export OPENAI_API_KEY=sk-...With Anthropic
[models]default_llm = "anthropic/claude-sonnet-4"export ANTHROPIC_API_KEY=sk-ant-...With Local Ollama
[models]default_llm = "local/llama3.2"# Make sure Ollama is runningollama serveollama pull llama3.2export LOCAL_OLLAMA_BASE_URL=http://localhost:11434/v1What’s Happening Under the Hood?
When you call `stream_agent()`, ContextRouter:
- Detects intent — Uses an LLM to understand what you’re asking
- Decides on retrieval — Should it search the knowledge base? The web?
- Fetches context — Queries providers and connectors in parallel
- Reranks results — Sorts by relevance using neural reranking
- Generates response — Creates an answer using the context
- Suggests follow-ups — Proposes related questions
All of this is orchestrated by the LangGraph-powered Cortex, while the actual implementation is handled by pluggable Modules.
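You can get a rough view of which stages ran on a given turn by tallying the event types on the stream. This sketch assumes nothing beyond the call signature and event shape used throughout this guide; the exact event names your pipeline emits may vary:

```python
import asyncio
from collections import Counter

from contextrouter.cortex import stream_agent


async def trace():
    # Count each event type emitted for one query to see
    # which pipeline stages were involved.
    counts = Counter()
    async for event in stream_agent(
        messages=[{"role": "user", "content": "What is RAG?"}],
        session_id="trace-session",
        platform="console",
    ):
        counts[event.get("event")] += 1
    for name, count in counts.most_common():
        print(f"{name}: {count}")


if __name__ == "__main__":
    asyncio.run(trace())
```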
## Next Steps
Now that you have a working agent, explore:
- Core Concepts — Understand the architecture in depth
- Models — Configure fallbacks and multiple providers
- RAG Pipeline — Fine-tune retrieval and reranking
- Ingestion — Load your own documents into the knowledge base