Quick Start¶

Choose your deployment method and get Beyond Retrieval v2 running. Each section is self-contained — pick the one that fits your setup.

Method	Best For	Requirements
Docker on Local Machine	Development, testing, offline/air-gapped	Docker Desktop
Docker on VPS	Production, public-facing, auto-HTTPS	VPS + domain name
Without Docker (Bare-Metal)	Frontend/backend development, debugging	Python 3.12+, Node.js 22+
Google Cloud Run	Serverless production, auto-scaling, pay-per-use	GCP account + Supabase cloud

Docker on Local Machine¶

The fastest way to get everything running — backend, frontend, Caddy reverse proxy, Ollama, Docling, and a local Supabase database. Zero cloud dependencies.

Prerequisites¶

Docker Desktop (Windows/Mac) or Docker Engine + Compose v2 (Linux)
Python 3.12+ (for the start script)
Git

Step 1: Clone and configure¶

git clone https://github.com/your-org/beyond-retrieval.git
cd beyond-retrieval/beyond-retrieval-pythonv
cp .env.example .env

The default .env.example runs everything locally with zero cloud dependencies — local Supabase, local Ollama, no API keys required at startup.

API Keys

LLM provider keys (OpenRouter, OpenAI, Mistral) are configured from the Global Settings page inside the app — not in .env.

Step 2: Generate Supabase JWT keys¶

You need valid JWT keys for the local Supabase instance. The .env.example includes demo keys that work out of the box. For production, generate your own:

Go to supabase.com/docs/guides/self-hosting/docker#generate-api-keys
Enter a strong JWT_SECRET (at least 32 characters)
Copy the generated anon key → ANON_KEY
Copy the generated service_role key → SUPABASE_SERVICE_ROLE_KEY
Set LOCAL_SUPABASE_KEY to the same value as SUPABASE_SERVICE_ROLE_KEY

Step 3: Start all services¶

python start_services.py --profile cpu --build

This starts 16 containers: backend, frontend, Caddy, PostgreSQL, PostgREST, GoTrue, Kong, Storage, Studio, Meta, Ollama, and Docling.

First run takes a few minutes to pull images (~8GB total). Subsequent starts are fast.

Step 4: Open the app¶

Service	URL
App	http://localhost:3000
Supabase Studio	http://localhost:54321
FastAPI Docs	http://localhost:8000/docs

GPU Support (optional)¶

If you have a GPU and want faster Ollama inference:

# NVIDIA GPU (CUDA):
python start_services.py --profile nvidia --build

# AMD GPU (ROCm):
python start_services.py --profile amd --build

Skipping Heavy Services (optional)¶

# Skip Docling sidecar (~3GB image):
python start_services.py --profile cpu --no-docling --build

# Skip Ollama (use cloud LLMs only):
python start_services.py --profile cpu --no-ollama --build

# Skip local Supabase (use cloud Supabase instead):
python start_services.py --profile cpu --no-supabase --build

# Minimal — backend + frontend only:
python start_services.py --no-ollama --no-docling --no-supabase --build

Using Cloud Supabase Instead¶

If you prefer a hosted Supabase project instead of the local Docker one:

cp .env.cloud.example .env

Edit .env and fill in your Supabase URL and service role key, then:

python start_services.py --profile cpu --no-supabase --build

Management Commands¶

python start_services.py --stop              # Stop all services
python start_services.py --logs backend      # Tail logs for a service
python start_services.py --logs              # Tail all logs
python start_services.py --status            # Show service status

Verify it works¶

curl http://localhost:8000/api/health

Expected response:

{"status": "ok", "service": "beyond-retrieval-v2"}

Docker on a VPS¶

Deploy to a VPS (Ubuntu, Debian, etc.) with Docker, Caddy auto-HTTPS, and a public domain. Runs the full stack including local Supabase.

Prerequisites¶

A VPS with at least 4GB RAM and 20GB disk (8GB RAM recommended)
A domain name pointed at your VPS IP (A record in DNS)
Docker Engine + Docker Compose v2 installed on the VPS
Python 3.12+ on the VPS
Ports 80 and 443 open in your firewall

Step 1: SSH into your VPS and clone¶

ssh user@your-server-ip
git clone https://github.com/your-org/beyond-retrieval.git
cd beyond-retrieval/beyond-retrieval-pythonv

Step 2: Create your `.env` file¶

cp .env.example .env
nano .env

Set these critical values:

# ── Your domain (triggers auto-HTTPS via Caddy + Let's Encrypt) ──
APP_HOSTNAME=app.yourdomain.com

# ── CORS — must include your domain ──
CORS_ORIGINS=["https://app.yourdomain.com"]

# ── Auth — disable bypass for production ──
BYPASS_AUTH=false

# ── Performance ──
WEB_CONCURRENCY=4

# ── Local Supabase JWT keys ──
# Generate at: https://supabase.com/docs/guides/self-hosting/docker#generate-api-keys
POSTGRES_PASSWORD=your-very-strong-password-here
JWT_SECRET=your-jwt-secret-at-least-32-characters-long
ANON_KEY=your-generated-anon-jwt
SUPABASE_SERVICE_ROLE_KEY=your-generated-service-role-jwt
LOCAL_SUPABASE_KEY=your-generated-service-role-jwt   # must match above
DASHBOARD_USERNAME=admin
DASHBOARD_PASSWORD=a-strong-studio-password

Generate real JWT keys

Do NOT use the demo keys from .env.example in production. Generate your own at the Supabase link above. The LOCAL_SUPABASE_KEY must equal SUPABASE_SERVICE_ROLE_KEY (same JWT_SECRET).

Step 3: (Optional) Add Supabase Studio subdomain¶

If you want to access Supabase Studio publicly, add a second DNS A record and set:

STUDIO_HOSTNAME=studio.yourdomain.com

Step 4: Start all services¶

python start_services.py --profile cpu --build

For a GPU server:

# NVIDIA GPU:
python start_services.py --profile nvidia --build

Caddy automatically provisions SSL certificates from Let's Encrypt. Your app is live at https://app.yourdomain.com within minutes.

Step 5: Verify¶

curl https://app.yourdomain.com/api/health

Expected:

{"status": "ok", "service": "beyond-retrieval-v2"}

VPS with Cloud Supabase¶

If you prefer using a managed Supabase project (supabase.co) instead of running the database locally:

cp .env.cloud.example .env
nano .env

Set your domain and Supabase credentials:

APP_HOSTNAME=app.yourdomain.com
CORS_ORIGINS=["https://app.yourdomain.com"]
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SECRET_KEY=your-service-role-key
BYPASS_AUTH=false

Then start without local Supabase:

python start_services.py --profile cpu --no-supabase --build

Database Schema

When using cloud Supabase, you must apply the schema manually:

Open the SQL Editor in your Supabase Dashboard
Copy the contents of db/migrations/001_initial_schema.sql
Paste and click Run

Post-Deployment Checklist¶

Health check passes: curl https://app.yourdomain.com/api/health
Auth config correct: curl https://app.yourdomain.com/api/auth/config
Open the app and create a test notebook
Upload a document and verify ingestion completes
Ask a question in chat — citations should appear
Configure API keys in Global Settings (OpenRouter, etc.)
Set up authentication provider if BYPASS_AUTH=false

Without Docker (Bare-Metal)¶

Run the backend and frontend directly on your machine. Best for active development with hot-reload.

Prerequisites¶

Tool	Version	Notes
Python	3.12+	Required for `type \\| None` syntax
Node.js	22+	Includes npm; used for the React frontend
Git	2.x+	Standard version control

You also need a Supabase database — either:

A free project on supabase.com (easiest), or
A local Supabase via Supabase CLI or Docker (advanced)

Step 1: Clone the repository¶

git clone https://github.com/your-org/beyond-retrieval.git
cd beyond-retrieval/beyond-retrieval-pythonv

Step 2: Set up the database¶

Cloud Supabase (Recommended)Existing Database

Create a free project at supabase.com/dashboard
Open SQL Editor
Copy the contents of db/migrations/001_initial_schema.sql
Paste and click Run
Note your Project URL and Service Role Key from Settings > API

If you already have a Supabase project with the schema applied, grab your credentials from Settings > API.

Step 3: Configure environment¶

cp .env.cloud.example .env

Edit .env with your Supabase credentials:

SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SECRET_KEY=your-service-role-key
BYPASS_AUTH=true
OLLAMA_BASE_URL=http://localhost:11434   # if running Ollama locally

Step 4: Start the backend¶

cd backend
python -m venv venv

# Activate the virtual environment
source venv/bin/activate          # Linux / macOS
# venv\Scripts\activate           # Windows

pip install -r requirements.txt

# Start the FastAPI server
uvicorn main:app --reload --port 8000

The backend serves at http://localhost:8000. API docs at http://localhost:8000/docs.

Windows: --reload limitation

On Windows, --reload does not detect new files created after the watcher started. When adding new router, service, or schema files, you must kill and restart uvicorn manually.

Step 5: Start the frontend¶

Open a new terminal:

cd beyond-retrieval/beyond-retrieval-pythonv/frontend
npm install
npm run dev

The frontend opens at http://localhost:5173. Vite automatically proxies /api requests to http://localhost:8000.

Step 6: Verify¶

curl http://localhost:8000/api/health

Open http://localhost:5173 in your browser.

Optional: Run Ollama locally¶

If you want local LLM inference without Docker:

Install Ollama from ollama.com
Pull a model: ollama pull qwen2.5:1.5b
Ollama runs on http://localhost:11434 by default — the .env already points there

Bare-Metal Limitations¶

Feature	Available?	Why
Auto-HTTPS	No	Caddy only runs in Docker; use nginx/certbot manually
Local Supabase	No	Requires Docker containers; use cloud Supabase instead
Docling sidecar	No	Requires Docker; Docling falls back to local import
Hot-reload	Yes	`--reload` on backend, `npm run dev` on frontend

Google Cloud Run¶

Deploy as two serverless containers that auto-scale from zero. Pay only for actual usage. Requires a cloud Supabase project (no local database on Cloud Run).

Prerequisites¶

Requirement	How to get it
Google Cloud account	cloud.google.com (free tier available)
gcloud CLI	Install guide
Docker	Docker Desktop
Supabase project	supabase.com (free tier works)
OpenRouter API key	openrouter.ai

Architecture¶

                 Internet
                    |
        +-----------+-----------+
        |                       |
Cloud Run (frontend)    Cloud Run (backend)
React + nginx            FastAPI
*.run.app                *.run.app
        |                       |
        |     nginx proxies     |
        |     /api/* ---------->|
        |                       |
        |               +-------+-------+
        |               |  Supabase.co  |
        |               |  (Database,   |
        |               |   Auth,       |
        |               |   Storage)    |
        +               +---------------+

Both services auto-scale from 0 to N instances based on traffic. You pay nothing when idle.

Step 1: Set up Supabase¶

Create a project at supabase.com/dashboard
Open SQL Editor
Copy db/migrations/001_initial_schema.sql and run it
Note your Project URL and Service Role Key from Settings > API

Step 2: Set environment variables¶

# Required
export GCP_PROJECT_ID=my-gcp-project-id
export SUPABASE_URL=https://your-project.supabase.co
export SUPABASE_SECRET_KEY=your-service-role-key
export OPENROUTER_API_KEY=sk-or-v1-your-key

Optional:

export GCP_REGION=us-central1              # default: europe-west3
export BYPASS_AUTH=true                     # default: false
export MISTRAL_API_KEY=your-mistral-key    # for OCR
export VITE_SUPABASE_URL=https://your-project.supabase.co
export VITE_SUPABASE_ANON_KEY=your-anon-key

Step 3: One-time GCP setup¶

cd beyond-retrieval/beyond-retrieval-pythonv
./cloudrun-deploy.sh setup

This enables Cloud Run, Artifact Registry, and Cloud Build APIs, creates a Docker repo, and configures Docker auth. Run once per GCP project.

Step 4: Deploy¶

./cloudrun-deploy.sh deploy

This builds and deploys both services. When done:

[OK]  Deployment complete!
  Backend:  https://beyond-retrieval-backend-xxxxx-ey.a.run.app
  Frontend: https://beyond-retrieval-frontend-xxxxx-ey.a.run.app

Step 5: Verify¶

curl $(cat .backend-url)/api/health

Open the frontend URL in your browser. Create a notebook, upload a document, and test the RAG chat.

Deploy individual services¶

./cloudrun-deploy.sh deploy-backend    # Backend only
./cloudrun-deploy.sh deploy-frontend   # Frontend only

CI/CD with Cloud Build¶

Generate a cloudbuild.yaml for automated deployments on push:

./cloudrun-deploy.sh cloudbuild

Set up a Cloud Build trigger in the GCP Console to run on push to main.

Secret management

Store SUPABASE_SECRET_KEY and OPENROUTER_API_KEY in Secret Manager rather than hardcoding.

Custom Domain¶

Cloud Run services get a *.run.app URL by default. To use your own domain:

Go to Cloud Run > your service > Domain Mappings
Click Add Custom Domain
Add the DNS records shown
Wait for SSL (~15 minutes)

Cloud Run Service Specs¶

Setting	Backend	Frontend
Memory	2 GiB	256 MiB
CPU	2	1
Min instances	0 (scales to zero)	0
Max instances	10	5
Concurrency	80 req/instance	250 req/instance
Timeout	300s	300s

Cloud Run Limitations¶

Feature	Available?	Why
Ollama (local LLM)	No	No persistent GPU on Cloud Run
Docling (document parser)	No	Requires sidecar container
Local Supabase	No	No persistent storage
Cold starts	~2-5s	First request after scale-to-zero
Auto-HTTPS	Yes	Built-in on `*.run.app`

Comparison: All Deployment Methods¶

	Docker Local	Docker VPS	Bare-Metal	Cloud Run
Difficulty	Easy	Medium	Medium	Medium
Cloud dependencies	None	None (or optional)	Supabase cloud	Supabase + GCP
Auto-HTTPS	No (localhost)	Yes (Caddy)	No	Yes (*.run.app)
Ollama (local LLM)	Yes	Yes	Yes (manual)	No
Docling parser	Yes	Yes	Fallback only	No
Local Supabase	Yes	Yes	No	No
Hot-reload	Via `--dev` flag	No	Yes	No
GPU support	NVIDIA / AMD	NVIDIA / AMD	Manual	No
Scaling	Single machine	Single machine	Single machine	Auto (0-N)
Cost	Free	VPS cost (~$5-20/mo)	Free	Pay-per-request
Best for	Dev & testing	Production self-hosted	Active development	SaaS / low traffic

After Deployment: First Steps¶

Once your app is running, regardless of deployment method:

1. Open the Dashboard¶

Navigate to your Beyond Retrieval instance URL.

2. Configure API Keys¶

Go to Global Settings and add your LLM provider keys:

OpenRouter (recommended) — openrouter.ai/keys
OpenAI (optional) — platform.openai.com/api-keys
Mistral (optional, for OCR) — console.mistral.ai

3. Create a Notebook¶

Click Create Notebook
Enter a title (e.g., "Product Docs")
Select an embedding model (text-embedding-3-small recommended)
Click Create

Embedding Model is Permanent

The embedding model is locked after creation. All documents must share the same vector space.

4. Upload and Ingest Documents¶

Open your notebook → Documents page
Drag and drop files (PDF, DOCX, TXT, MD, CSV, XLSX)
Click Ingest to start the processing pipeline
Watch status: Pending → Processing → Success

5. Ask a Question¶

Navigate to Chat
Click + New Chat
Type a question, e.g.: "What is the cancellation policy?"
Get a cited answer grounded in your documents

6. Explore Features¶

Feature	What to Try
Search Playground	Compare Fusion vs Semantic search
AI Enhancer	Enrich chunks with AI context
Intelligence Settings	Switch between OpenRouter, OpenAI, or Ollama
System Monitor	Check knowledge base health score

Using the API¶

You can also interact programmatically:

import httpx

BASE = "http://localhost:8000/api"

# 1. Create a notebook
nb = httpx.post(f"{BASE}/notebooks/", json={
    "title": "My API Notebook",
    "embedding_model": "openai/text-embedding-3-small"
}).json()["data"]

notebook_id = nb["notebook_id"]

# 2. Upload a file
with open("document.pdf", "rb") as f:
    upload = httpx.post(
        f"{BASE}/notebooks/{notebook_id}/documents/upload",
        files={"files": ("document.pdf", f, "application/pdf")}
    ).json()["data"]

file_info = upload[0]

# 3. Start ingestion
httpx.post(f"{BASE}/notebooks/{notebook_id}/documents/ingest", json={
    "files": [file_info],
    "settings": {
        "parser": "Docling Parser",
        "chunking_strategy": "Recursive Chunking",
        "chunk_size": 1000,
        "chunk_overlap": 200
    },
    "notebook_name": "My API Notebook"
})

# 4. Wait for ingestion, then chat
import time
time.sleep(10)

conv = httpx.post(f"{BASE}/notebooks/{notebook_id}/conversations", json={
    "title": "First Chat"
}).json()["data"]

response = httpx.post(
    f"{BASE}/notebooks/{notebook_id}/conversations/{conv['conversation_id']}/messages",
    json={"content": "What are the main topics in this document?"}
).json()["data"]

print(response["assistant_message"]["content"])

See the API Reference for the complete endpoint catalog.

Troubleshooting¶

Problem	Fix
`.env` changes not applied in Docker	Use `docker compose up -d` (not `restart`) — recreates containers
PostgREST returns null for new columns	`docker compose restart supabase-rest`
Caddy SSL not provisioning	Ensure DNS A record points to your VPS IP, ports 80/443 open
Port 80/443 already in use	Stop nginx/Apache: `sudo systemctl stop nginx`
Ollama model not found	Wait for `ollama-init` container to finish pulling
Windows `--reload` misses new files	Kill and restart uvicorn manually
`gcloud: command not found`	Install gcloud CLI
Cloud Run cold start timeout	Set `--min-instances 1` to keep one instance warm
Frontend shows "Failed to fetch"	Check backend is running and CORS_ORIGINS includes your URL

Next Steps¶

Configuration — All environment variables and deployment options
API Reference — Full endpoint documentation
User Guide — Complete walkthrough of every feature
Deployment Guide — Advanced deployment patterns
Cloud Run Guide — Deep-dive on serverless deployment

Quick Start¶

Docker on Local Machine¶

Prerequisites¶

Step 1: Clone and configure¶

Step 2: Generate Supabase JWT keys¶

Step 3: Start all services¶

Step 4: Open the app¶

GPU Support (optional)¶

Skipping Heavy Services (optional)¶

Using Cloud Supabase Instead¶

Management Commands¶

Verify it works¶

Docker on a VPS¶

Prerequisites¶

Step 1: SSH into your VPS and clone¶

Step 2: Create your .env file¶

Step 3: (Optional) Add Supabase Studio subdomain¶

Step 4: Start all services¶

Step 5: Verify¶

VPS with Cloud Supabase¶

Post-Deployment Checklist¶

Without Docker (Bare-Metal)¶

Prerequisites¶

Step 1: Clone the repository¶

Step 2: Set up the database¶

Step 3: Configure environment¶

Step 4: Start the backend¶

Step 5: Start the frontend¶

Step 6: Verify¶

Optional: Run Ollama locally¶

Bare-Metal Limitations¶

Google Cloud Run¶

Prerequisites¶

Architecture¶

Step 1: Set up Supabase¶

Step 2: Set environment variables¶

Step 3: One-time GCP setup¶

Step 4: Deploy¶

Step 5: Verify¶

Deploy individual services¶

CI/CD with Cloud Build¶

Custom Domain¶

Cloud Run Service Specs¶

Cloud Run Limitations¶

Comparison: All Deployment Methods¶

After Deployment: First Steps¶

1. Open the Dashboard¶

2. Configure API Keys¶

3. Create a Notebook¶

4. Upload and Ingest Documents¶

5. Ask a Question¶

6. Explore Features¶

Using the API¶

Troubleshooting¶

Next Steps¶

Step 2: Create your `.env` file¶