Quick Start¶
Choose your deployment method and get Beyond Retrieval v2 running. Each section is self-contained — pick the one that fits your setup.
| Method | Best For | Requirements |
|---|---|---|
| Docker on Local Machine | Development, testing, offline/air-gapped | Docker Desktop |
| Docker on VPS | Production, public-facing, auto-HTTPS | VPS + domain name |
| Without Docker (Bare-Metal) | Frontend/backend development, debugging | Python 3.12+, Node.js 22+ |
| Google Cloud Run | Serverless production, auto-scaling, pay-per-use | GCP account + Supabase cloud |
Docker on Local Machine¶
The fastest way to get everything running — backend, frontend, Caddy reverse proxy, Ollama, Docling, and a local Supabase database. Zero cloud dependencies.
Prerequisites¶
- Docker Desktop (Windows/Mac) or Docker Engine + Compose v2 (Linux)
- Python 3.12+ (for the start script)
- Git
Step 1: Clone and configure¶
git clone https://github.com/your-org/beyond-retrieval.git
cd beyond-retrieval/beyond-retrieval-pythonv
cp .env.example .env
The default .env.example runs everything locally with zero cloud dependencies — local Supabase, local Ollama, no API keys required at startup.
API Keys
LLM provider keys (OpenRouter, OpenAI, Mistral) are configured from the Global Settings page inside the app — not in .env.
Step 2: Generate Supabase JWT keys¶
You need valid JWT keys for the local Supabase instance. The .env.example includes demo keys that work out of the box. For production, generate your own:
- Go to supabase.com/docs/guides/self-hosting/docker#generate-api-keys
- Enter a strong `JWT_SECRET` (at least 32 characters)
- Copy the generated `anon` key → `ANON_KEY`
- Copy the generated `service_role` key → `SUPABASE_SERVICE_ROLE_KEY`
- Set `LOCAL_SUPABASE_KEY` to the same value as `SUPABASE_SERVICE_ROLE_KEY`
Step 3: Start all services¶
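The launch command itself appears to have been dropped from this section. Based on the `start_services.py` invocations shown in the GPU and management sections of this guide, it is presumably:

```shell
# Build images and start the full stack (CPU profile)
python start_services.py --profile cpu --build
```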
This starts the full container stack: backend, frontend, Caddy, PostgreSQL, PostgREST, GoTrue, Kong, Storage, Studio, Meta, Ollama, and Docling.
First run takes a few minutes to pull images (~8GB total). Subsequent starts are fast.
Step 4: Open the app¶
| Service | URL |
|---|---|
| App | http://localhost:3000 |
| Supabase Studio | http://localhost:54321 |
| FastAPI Docs | http://localhost:8000/docs |
GPU Support (optional)¶
If you have a GPU and want faster Ollama inference:
# NVIDIA GPU (CUDA):
python start_services.py --profile nvidia --build
# AMD GPU (ROCm):
python start_services.py --profile amd --build
Skipping Heavy Services (optional)¶
# Skip Docling sidecar (~3GB image):
python start_services.py --profile cpu --no-docling --build
# Skip Ollama (use cloud LLMs only):
python start_services.py --profile cpu --no-ollama --build
# Skip local Supabase (use cloud Supabase instead):
python start_services.py --profile cpu --no-supabase --build
# Minimal — backend + frontend only:
python start_services.py --no-ollama --no-docling --no-supabase --build
Using Cloud Supabase Instead¶
If you prefer a hosted Supabase project instead of the local Docker one:
Edit .env and fill in your Supabase URL and service role key, then:
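The start command was omitted here; following the `--no-supabase` flag shown in the "Skipping Heavy Services" section, it is likely:

```shell
# Start everything except the local Supabase containers
python start_services.py --profile cpu --no-supabase --build
```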
Management Commands¶
python start_services.py --stop # Stop all services
python start_services.py --logs backend # Tail logs for a service
python start_services.py --logs # Tail all logs
python start_services.py --status # Show service status
Verify it works¶
Expected response:
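Both the verification command and its response were stripped from this section. Assuming the backend health endpoint referenced in the Post-Deployment Checklist, a sketch:

```shell
# Query the backend health endpoint
curl http://localhost:8000/api/health
# A healthy instance returns a JSON status payload
```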
Docker on a VPS¶
Deploy to a VPS (Ubuntu, Debian, etc.) with Docker, Caddy auto-HTTPS, and a public domain. Runs the full stack including local Supabase.
Prerequisites¶
- A VPS with at least 4GB RAM and 20GB disk (8GB RAM recommended)
- A domain name pointed at your VPS IP (A record in DNS)
- Docker Engine + Docker Compose v2 installed on the VPS
- Python 3.12+ on the VPS
- Ports 80 and 443 open in your firewall
Step 1: SSH into your VPS and clone¶
ssh user@your-server-ip
git clone https://github.com/your-org/beyond-retrieval.git
cd beyond-retrieval/beyond-retrieval-pythonv
Step 2: Create your .env file¶
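The copy command was omitted here; as in the local setup, presumably:

```shell
# Copy the template and open it for editing
cp .env.example .env
nano .env   # or your editor of choice
```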
Set these critical values:
# ── Your domain (triggers auto-HTTPS via Caddy + Let's Encrypt) ──
APP_HOSTNAME=app.yourdomain.com
# ── CORS — must include your domain ──
CORS_ORIGINS=["https://app.yourdomain.com"]
# ── Auth — disable bypass for production ──
BYPASS_AUTH=false
# ── Performance ──
WEB_CONCURRENCY=4
# ── Local Supabase JWT keys ──
# Generate at: https://supabase.com/docs/guides/self-hosting/docker#generate-api-keys
POSTGRES_PASSWORD=your-very-strong-password-here
JWT_SECRET=your-jwt-secret-at-least-32-characters-long
ANON_KEY=your-generated-anon-jwt
SUPABASE_SERVICE_ROLE_KEY=your-generated-service-role-jwt
LOCAL_SUPABASE_KEY=your-generated-service-role-jwt # must match above
DASHBOARD_USERNAME=admin
DASHBOARD_PASSWORD=a-strong-studio-password
Generate real JWT keys
Do NOT use the demo keys from .env.example in production. Generate your own at the Supabase link above. The LOCAL_SUPABASE_KEY must equal SUPABASE_SERVICE_ROLE_KEY (same JWT_SECRET).
Step 3: (Optional) Add Supabase Studio subdomain¶
If you want to access Supabase Studio publicly, add a second DNS A record and set:
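The exact variable was omitted from this section. A hypothetical example (the `STUDIO_HOSTNAME` name is an assumption; check `.env.example` for the real key):

```shell
# Hypothetical variable name — verify against .env.example
STUDIO_HOSTNAME=studio.yourdomain.com
```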
Step 4: Start all services¶
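The command was omitted here; mirroring the local-machine section, presumably:

```shell
# Build images and start the full stack (CPU profile)
python start_services.py --profile cpu --build
```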
For a GPU server:
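Using the GPU profiles shown in the local-machine section:

```shell
# NVIDIA GPU (CUDA); use --profile amd for ROCm
python start_services.py --profile nvidia --build
```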
Caddy automatically provisions SSL certificates from Let's Encrypt. Your app is live at https://app.yourdomain.com within minutes.
Step 5: Verify¶
Expected:
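The command and response were stripped here; assuming the same health endpoint used in the Post-Deployment Checklist:

```shell
curl https://app.yourdomain.com/api/health
# A healthy deployment returns a JSON status payload
```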
VPS with Cloud Supabase¶
If you prefer using a managed Supabase project (supabase.co) instead of running the database locally:
Set your domain and Supabase credentials:
APP_HOSTNAME=app.yourdomain.com
CORS_ORIGINS=["https://app.yourdomain.com"]
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SECRET_KEY=your-service-role-key
BYPASS_AUTH=false
Then start without local Supabase:
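Matching the skip-flag shown in the "Skipping Heavy Services" section:

```shell
python start_services.py --profile cpu --no-supabase --build
```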
Database Schema
When using cloud Supabase, you must apply the schema manually:
- Open the SQL Editor in your Supabase Dashboard
- Copy the contents of `db/migrations/001_initial_schema.sql`
- Paste and click Run
Post-Deployment Checklist¶
- Health check passes: `curl https://app.yourdomain.com/api/health`
- Auth config correct: `curl https://app.yourdomain.com/api/auth/config`
- Open the app and create a test notebook
- Upload a document and verify ingestion completes
- Ask a question in chat — citations should appear
- Configure API keys in Global Settings (OpenRouter, etc.)
- Set up an authentication provider if `BYPASS_AUTH=false`
Without Docker (Bare-Metal)¶
Run the backend and frontend directly on your machine. Best for active development with hot-reload.
Prerequisites¶
| Tool | Version | Notes |
|---|---|---|
| Python | 3.12+ | Required for type \| None syntax |
| Node.js | 22+ | Includes npm; used for the React frontend |
| Git | 2.x+ | Standard version control |
You also need a Supabase database — either:
- A free project on supabase.com (easiest), or
- A local Supabase via Supabase CLI or Docker (advanced)
Step 1: Clone the repository¶
git clone https://github.com/your-org/beyond-retrieval.git
cd beyond-retrieval/beyond-retrieval-pythonv
Step 2: Set up the database¶
- Create a free project at supabase.com/dashboard
- Open SQL Editor
- Copy the contents of `db/migrations/001_initial_schema.sql`
- Paste and click Run
- Note your Project URL and Service Role Key from Settings > API
If you already have a Supabase project with the schema applied, grab your credentials from Settings > API.
Step 3: Configure environment¶
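The copy command was omitted here; as in the Docker setup, presumably:

```shell
cp .env.example .env
```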
Edit .env with your Supabase credentials:
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SECRET_KEY=your-service-role-key
BYPASS_AUTH=true
OLLAMA_BASE_URL=http://localhost:11434 # if running Ollama locally
Step 4: Start the backend¶
cd backend
python -m venv venv
# Activate the virtual environment
source venv/bin/activate # Linux / macOS
# venv\Scripts\activate # Windows
pip install -r requirements.txt
# Start the FastAPI server
uvicorn main:app --reload --port 8000
The backend serves at http://localhost:8000. API docs at http://localhost:8000/docs.
Windows: --reload limitation
On Windows, --reload does not detect new files created after the watcher started. When adding new router, service, or schema files, you must kill and restart uvicorn manually.
Step 5: Start the frontend¶
Open a new terminal:
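The frontend commands were omitted from this section. Assuming a standard Vite setup in a `frontend/` directory (the directory name is an assumption):

```shell
cd frontend
npm install     # first run only
npm run dev     # starts the Vite dev server on port 5173
```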
The frontend opens at http://localhost:5173. Vite automatically proxies /api requests to http://localhost:8000.
Step 6: Verify¶
Open http://localhost:5173 in your browser.
Optional: Run Ollama locally¶
If you want local LLM inference without Docker:
- Install Ollama from ollama.com
- Pull a model: `ollama pull qwen2.5:1.5b`
- Ollama runs on `http://localhost:11434` by default — the `.env` already points there
Bare-Metal Limitations¶
| Feature | Available? | Why |
|---|---|---|
| Auto-HTTPS | No | Caddy only runs in Docker; use nginx/certbot manually |
| Local Supabase | No | Requires Docker containers; use cloud Supabase instead |
| Docling sidecar | No | Requires Docker; Docling falls back to local import |
| Hot-reload | Yes | --reload on backend, npm run dev on frontend |
Google Cloud Run¶
Deploy as two serverless containers that auto-scale from zero. Pay only for actual usage. Requires a cloud Supabase project (no local database on Cloud Run).
Prerequisites¶
| Requirement | How to get it |
|---|---|
| Google Cloud account | cloud.google.com (free tier available) |
| gcloud CLI | Install guide |
| Docker | Docker Desktop |
| Supabase project | supabase.com (free tier works) |
| OpenRouter API key | openrouter.ai |
Architecture¶
Internet
|
+-----------+-----------+
| |
Cloud Run (frontend) Cloud Run (backend)
React + nginx FastAPI
*.run.app *.run.app
| |
| nginx proxies |
| /api/* ---------->|
| |
| +-------+-------+
| | Supabase.co |
| | (Database, |
| | Auth, |
| | Storage) |
+ +---------------+
Both services auto-scale from 0 to N instances based on traffic. You pay nothing when idle.
Step 1: Set up Supabase¶
- Create a project at supabase.com/dashboard
- Open SQL Editor
- Copy `db/migrations/001_initial_schema.sql` and run it
- Note your Project URL and Service Role Key from Settings > API
Step 2: Set environment variables¶
# Required
export GCP_PROJECT_ID=my-gcp-project-id
export SUPABASE_URL=https://your-project.supabase.co
export SUPABASE_SECRET_KEY=your-service-role-key
export OPENROUTER_API_KEY=sk-or-v1-your-key
Optional:
export GCP_REGION=us-central1 # default: europe-west3
export BYPASS_AUTH=true # default: false
export MISTRAL_API_KEY=your-mistral-key # for OCR
export VITE_SUPABASE_URL=https://your-project.supabase.co
export VITE_SUPABASE_ANON_KEY=your-anon-key
Step 3: One-time GCP setup¶
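The setup command was omitted here. A sketch, assuming the deploy script shown later in this guide offers a `setup` subcommand (the subcommand name is an assumption inferred from `deploy-backend` / `deploy-frontend`):

```shell
# Hypothetical subcommand — check the script's help output
./cloudrun-deploy.sh setup
```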
This enables Cloud Run, Artifact Registry, and Cloud Build APIs, creates a Docker repo, and configures Docker auth. Run once per GCP project.
Step 4: Deploy¶
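The deploy command was omitted here. A sketch, assuming a `deploy` subcommand alongside the `deploy-backend` / `deploy-frontend` subcommands shown below (the name is an assumption):

```shell
# Hypothetical subcommand — check the script's help output
./cloudrun-deploy.sh deploy
```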
This builds and deploys both services. When done:
[OK] Deployment complete!
Backend: https://beyond-retrieval-backend-xxxxx-ey.a.run.app
Frontend: https://beyond-retrieval-frontend-xxxxx-ey.a.run.app
Step 5: Verify¶
Open the frontend URL in your browser. Create a notebook, upload a document, and test the RAG chat.
Deploy individual services¶
./cloudrun-deploy.sh deploy-backend # Backend only
./cloudrun-deploy.sh deploy-frontend # Frontend only
CI/CD with Cloud Build¶
Generate a cloudbuild.yaml for automated deployments on push:
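The generation command was omitted here. A hypothetical example (the subcommand name is an assumption, not confirmed by this guide):

```shell
# Hypothetical subcommand — check the script's help output
./cloudrun-deploy.sh generate-cloudbuild
```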
Set up a Cloud Build trigger in the GCP Console to run on push to main.
Secret management
Store SUPABASE_SECRET_KEY and OPENROUTER_API_KEY in Secret Manager rather than hardcoding.
Custom Domain¶
Cloud Run services get a *.run.app URL by default. To use your own domain:
- Go to Cloud Run > your service > Domain Mappings
- Click Add Custom Domain
- Add the DNS records shown
- Wait for SSL (~15 minutes)
Cloud Run Service Specs¶
| Setting | Backend | Frontend |
|---|---|---|
| Memory | 2 GiB | 256 MiB |
| CPU | 2 | 1 |
| Min instances | 0 (scales to zero) | 0 |
| Max instances | 10 | 5 |
| Concurrency | 80 req/instance | 250 req/instance |
| Timeout | 300s | 300s |
Cloud Run Limitations¶
| Feature | Available? | Why |
|---|---|---|
| Ollama (local LLM) | No | No persistent GPU on Cloud Run |
| Docling (document parser) | No | Requires sidecar container |
| Local Supabase | No | No persistent storage |
| Cold starts | ~2-5s | First request after scale-to-zero |
| Auto-HTTPS | Yes | Built-in on *.run.app |
Comparison: All Deployment Methods¶
| | Docker Local | Docker VPS | Bare-Metal | Cloud Run |
|---|---|---|---|---|
| Difficulty | Easy | Medium | Medium | Medium |
| Cloud dependencies | None | None (or optional) | Supabase cloud | Supabase + GCP |
| Auto-HTTPS | No (localhost) | Yes (Caddy) | No | Yes (*.run.app) |
| Ollama (local LLM) | Yes | Yes | Yes (manual) | No |
| Docling parser | Yes | Yes | Fallback only | No |
| Local Supabase | Yes | Yes | No | No |
| Hot-reload | Via --dev flag | No | Yes | No |
| GPU support | NVIDIA / AMD | NVIDIA / AMD | Manual | No |
| Scaling | Single machine | Single machine | Single machine | Auto (0-N) |
| Cost | Free | VPS cost (~$5-20/mo) | Free | Pay-per-request |
| Best for | Dev & testing | Production self-hosted | Active development | SaaS / low traffic |
After Deployment: First Steps¶
Once your app is running, regardless of deployment method:
1. Open the Dashboard¶
Navigate to your Beyond Retrieval instance URL.
2. Configure API Keys¶
Go to Global Settings and add your LLM provider keys:
- OpenRouter (recommended) — openrouter.ai/keys
- OpenAI (optional) — platform.openai.com/api-keys
- Mistral (optional, for OCR) — console.mistral.ai
3. Create a Notebook¶
- Click Create Notebook
- Enter a title (e.g., "Product Docs")
- Select an embedding model (`text-embedding-3-small` recommended)
- Click Create
Embedding Model is Permanent
The embedding model is locked after creation. All documents must share the same vector space.
4. Upload and Ingest Documents¶
- Open your notebook → Documents page
- Drag and drop files (PDF, DOCX, TXT, MD, CSV, XLSX)
- Click Ingest to start the processing pipeline
- Watch status: Pending → Processing → Success
5. Ask a Question¶
- Navigate to Chat
- Click + New Chat
- Type a question, e.g.: "What is the cancellation policy?"
- Get a cited answer grounded in your documents
6. Explore Features¶
| Feature | What to Try |
|---|---|
| Search Playground | Compare Fusion vs Semantic search |
| AI Enhancer | Enrich chunks with AI context |
| Intelligence Settings | Switch between OpenRouter, OpenAI, or Ollama |
| System Monitor | Check knowledge base health score |
Using the API¶
You can also interact programmatically:
import httpx
BASE = "http://localhost:8000/api"
# 1. Create a notebook
nb = httpx.post(f"{BASE}/notebooks/", json={
"title": "My API Notebook",
"embedding_model": "openai/text-embedding-3-small"
}).json()["data"]
notebook_id = nb["notebook_id"]
# 2. Upload a file
with open("document.pdf", "rb") as f:
upload = httpx.post(
f"{BASE}/notebooks/{notebook_id}/documents/upload",
files={"files": ("document.pdf", f, "application/pdf")}
).json()["data"]
file_info = upload[0]
# 3. Start ingestion
httpx.post(f"{BASE}/notebooks/{notebook_id}/documents/ingest", json={
"files": [file_info],
"settings": {
"parser": "Docling Parser",
"chunking_strategy": "Recursive Chunking",
"chunk_size": 1000,
"chunk_overlap": 200
},
"notebook_name": "My API Notebook"
})
# 4. Wait for ingestion, then chat
import time
time.sleep(10)
conv = httpx.post(f"{BASE}/notebooks/{notebook_id}/conversations", json={
"title": "First Chat"
}).json()["data"]
response = httpx.post(
f"{BASE}/notebooks/{notebook_id}/conversations/{conv['conversation_id']}/messages",
json={"content": "What are the main topics in this document?"}
).json()["data"]
print(response["assistant_message"]["content"])
See the API Reference for the complete endpoint catalog.
Troubleshooting¶
| Problem | Fix |
|---|---|
| .env changes not applied in Docker | Use docker compose up -d (not restart) — recreates containers |
| PostgREST returns null for new columns | docker compose restart supabase-rest |
| Caddy SSL not provisioning | Ensure DNS A record points to your VPS IP, ports 80/443 open |
| Port 80/443 already in use | Stop nginx/Apache: sudo systemctl stop nginx |
| Ollama model not found | Wait for ollama-init container to finish pulling |
| Windows --reload misses new files | Kill and restart uvicorn manually |
| gcloud: command not found | Install gcloud CLI |
| Cloud Run cold start timeout | Set --min-instances 1 to keep one instance warm |
| Frontend shows "Failed to fetch" | Check backend is running and CORS_ORIGINS includes your URL |
Next Steps¶
- Configuration — All environment variables and deployment options
- API Reference — Full endpoint documentation
- User Guide — Complete walkthrough of every feature
- Deployment Guide — Advanced deployment patterns
- Cloud Run Guide — Deep-dive on serverless deployment