
Quick Start

Choose your deployment method and get Beyond Retrieval v2 running. Each section is self-contained — pick the one that fits your setup.

| Method | Best For | Requirements |
| --- | --- | --- |
| Docker on Local Machine | Development, testing, offline/air-gapped | Docker Desktop |
| Docker on VPS | Production, public-facing, auto-HTTPS | VPS + domain name |
| Without Docker (Bare-Metal) | Frontend/backend development, debugging | Python 3.12+, Node.js 22+ |
| Google Cloud Run | Serverless production, auto-scaling, pay-per-use | GCP account + Supabase cloud |

Docker on Local Machine

The fastest way to get everything running — backend, frontend, Caddy reverse proxy, Ollama, Docling, and a local Supabase database. Zero cloud dependencies.

Prerequisites

  • Docker Desktop (or Docker Engine + Docker Compose v2)
  • Python 3.12+ to run start_services.py

Step 1: Clone and configure

git clone https://github.com/your-org/beyond-retrieval.git
cd beyond-retrieval/beyond-retrieval-pythonv
cp .env.example .env

The default .env.example runs everything locally with zero cloud dependencies — local Supabase, local Ollama, no API keys required at startup.

API Keys

LLM provider keys (OpenRouter, OpenAI, Mistral) are configured from the Global Settings page inside the app — not in .env.

Step 2: Generate Supabase JWT keys

You need valid JWT keys for the local Supabase instance. The .env.example includes demo keys that work out of the box. For production, generate your own:

  1. Go to supabase.com/docs/guides/self-hosting/docker#generate-api-keys
  2. Enter a strong JWT_SECRET (at least 32 characters)
  3. Copy the generated anon key → ANON_KEY
  4. Copy the generated service_role key → SUPABASE_SERVICE_ROLE_KEY
  5. Set LOCAL_SUPABASE_KEY to the same value as SUPABASE_SERVICE_ROLE_KEY
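
If you prefer to script key generation instead of using the web generator, the keys are plain HS256 JWTs carrying a role claim. The following stdlib-only sketch mirrors the claim set described in the Supabase self-hosting docs; the secret shown is a placeholder:

```python
import base64
import hashlib
import hmac
import json
import time

def _b64url(data: bytes) -> str:
    # Base64url without padding, as the JWT spec requires
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_supabase_key(jwt_secret: str, role: str, years: int = 10) -> str:
    """Sign an HS256 JWT with the claims Supabase self-hosting expects.

    role is "anon" or "service_role"; the claim set mirrors what the
    official web generator produces.
    """
    now = int(time.time())
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"role": role, "iss": "supabase",
               "iat": now, "exp": now + years * 365 * 24 * 3600}
    signing_input = (
        _b64url(json.dumps(header, separators=(",", ":")).encode())
        + "."
        + _b64url(json.dumps(payload, separators=(",", ":")).encode())
    )
    sig = hmac.new(jwt_secret.encode(), signing_input.encode(),
                   hashlib.sha256).digest()
    return signing_input + "." + _b64url(sig)

# Placeholder secret -- use your own random 32+ character string
secret = "change-me-to-a-random-secret-of-32-plus-characters"
print("ANON_KEY =", make_supabase_key(secret, "anon"))
print("SUPABASE_SERVICE_ROLE_KEY =", make_supabase_key(secret, "service_role"))
```

Paste the two printed values into .env, and set LOCAL_SUPABASE_KEY to the service_role value.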

Step 3: Start all services

python start_services.py --profile cpu --build

This starts the full container stack, including: backend, frontend, Caddy, PostgreSQL, PostgREST, GoTrue, Kong, Storage, Studio, Meta, Ollama, and Docling.

First run takes a few minutes to pull images (~8GB total). Subsequent starts are fast.

Step 4: Open the app

| Service | URL |
| --- | --- |
| App | http://localhost:3000 |
| Supabase Studio | http://localhost:54321 |
| FastAPI Docs | http://localhost:8000/docs |

GPU Support (optional)

If you have a GPU and want faster Ollama inference:

# NVIDIA GPU (CUDA):
python start_services.py --profile nvidia --build

# AMD GPU (ROCm):
python start_services.py --profile amd --build

Skipping Heavy Services (optional)

# Skip Docling sidecar (~3GB image):
python start_services.py --profile cpu --no-docling --build

# Skip Ollama (use cloud LLMs only):
python start_services.py --profile cpu --no-ollama --build

# Skip local Supabase (use cloud Supabase instead):
python start_services.py --profile cpu --no-supabase --build

# Minimal — backend + frontend only:
python start_services.py --no-ollama --no-docling --no-supabase --build

Using Cloud Supabase Instead

If you prefer a hosted Supabase project instead of the local Docker one:

cp .env.cloud.example .env

Edit .env and fill in your Supabase URL and service role key, then:

python start_services.py --profile cpu --no-supabase --build

Management Commands

python start_services.py --stop              # Stop all services
python start_services.py --logs backend      # Tail logs for a service
python start_services.py --logs              # Tail all logs
python start_services.py --status            # Show service status

Verify it works

curl http://localhost:8000/api/health

Expected response:

{"status": "ok", "service": "beyond-retrieval-v2"}
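
The same check can be scripted, which is handy for CI or a startup wait loop. A minimal stdlib sketch; the is_healthy helper is ours, and the URL assumes the default local port:

```python
import json
from urllib.request import urlopen

def is_healthy(body: str) -> bool:
    """Return True if a health-endpoint payload reports status "ok"."""
    try:
        data = json.loads(body)
    except ValueError:
        return False
    return isinstance(data, dict) and data.get("status") == "ok"

# Against a running stack:
# with urlopen("http://localhost:8000/api/health", timeout=5) as resp:
#     print(is_healthy(resp.read().decode()))
```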

Docker on a VPS

Deploy to a VPS (Ubuntu, Debian, etc.) with Docker, Caddy auto-HTTPS, and a public domain. Runs the full stack including local Supabase.

Prerequisites

  • A VPS with at least 4GB RAM and 20GB disk (8GB RAM recommended)
  • A domain name pointed at your VPS IP (A record in DNS)
  • Docker Engine + Docker Compose v2 installed on the VPS
  • Python 3.12+ on the VPS
  • Ports 80 and 443 open in your firewall

Step 1: SSH into your VPS and clone

ssh user@your-server-ip
git clone https://github.com/your-org/beyond-retrieval.git
cd beyond-retrieval/beyond-retrieval-pythonv

Step 2: Create your .env file

cp .env.example .env
nano .env

Set these critical values:

# ── Your domain (triggers auto-HTTPS via Caddy + Let's Encrypt) ──
APP_HOSTNAME=app.yourdomain.com

# ── CORS — must include your domain ──
CORS_ORIGINS=["https://app.yourdomain.com"]

# ── Auth — disable bypass for production ──
BYPASS_AUTH=false

# ── Performance ──
WEB_CONCURRENCY=4

# ── Local Supabase JWT keys ──
# Generate at: https://supabase.com/docs/guides/self-hosting/docker#generate-api-keys
POSTGRES_PASSWORD=your-very-strong-password-here
JWT_SECRET=your-jwt-secret-at-least-32-characters-long
ANON_KEY=your-generated-anon-jwt
SUPABASE_SERVICE_ROLE_KEY=your-generated-service-role-jwt
LOCAL_SUPABASE_KEY=your-generated-service-role-jwt   # must match above
DASHBOARD_USERNAME=admin
DASHBOARD_PASSWORD=a-strong-studio-password

Generate real JWT keys

Do NOT use the demo keys from .env.example in production. Generate your own at the Supabase link above. The LOCAL_SUPABASE_KEY must equal SUPABASE_SERVICE_ROLE_KEY (same JWT_SECRET).
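
These rules lend themselves to a pre-flight check before starting the stack. A sketch with a hypothetical check_env helper; it validates consistency only and does not verify that the JWTs were actually signed with JWT_SECRET:

```python
def check_env(env: dict[str, str]) -> list[str]:
    """Return a list of problems with the Supabase-related settings."""
    problems = []
    if len(env.get("JWT_SECRET", "")) < 32:
        problems.append("JWT_SECRET must be at least 32 characters")
    if env.get("LOCAL_SUPABASE_KEY") != env.get("SUPABASE_SERVICE_ROLE_KEY"):
        problems.append("LOCAL_SUPABASE_KEY must equal SUPABASE_SERVICE_ROLE_KEY")
    if env.get("BYPASS_AUTH", "false").lower() != "false":
        problems.append("BYPASS_AUTH should be false in production")
    return problems

# Example: parse .env into a dict (naive KEY=VALUE split), then:
# problems = check_env(env)
# if problems: raise SystemExit("\n".join(problems))
```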

Step 3: (Optional) Add Supabase Studio subdomain

If you want to access Supabase Studio publicly, add a second DNS A record and set:

STUDIO_HOSTNAME=studio.yourdomain.com

Step 4: Start all services

python start_services.py --profile cpu --build

For a GPU server:

# NVIDIA GPU:
python start_services.py --profile nvidia --build

Caddy automatically provisions SSL certificates from Let's Encrypt. Your app is live at https://app.yourdomain.com within minutes.

Step 5: Verify

curl https://app.yourdomain.com/api/health

Expected:

{"status": "ok", "service": "beyond-retrieval-v2"}

VPS with Cloud Supabase

If you prefer using a managed Supabase project (supabase.co) instead of running the database locally:

cp .env.cloud.example .env
nano .env

Set your domain and Supabase credentials:

APP_HOSTNAME=app.yourdomain.com
CORS_ORIGINS=["https://app.yourdomain.com"]
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SECRET_KEY=your-service-role-key
BYPASS_AUTH=false

Then start without local Supabase:

python start_services.py --profile cpu --no-supabase --build

Database Schema

When using cloud Supabase, you must apply the schema manually:

  1. Open the SQL Editor in your Supabase Dashboard
  2. Copy the contents of db/migrations/001_initial_schema.sql
  3. Paste and click Run

Post-Deployment Checklist

  • Health check passes: curl https://app.yourdomain.com/api/health
  • Auth config correct: curl https://app.yourdomain.com/api/auth/config
  • Open the app and create a test notebook
  • Upload a document and verify ingestion completes
  • Ask a question in chat — citations should appear
  • Configure API keys in Global Settings (OpenRouter, etc.)
  • Set up authentication provider if BYPASS_AUTH=false
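
The first two checklist items can be automated as a smoke test. A stdlib-only sketch; smoke_test is a hypothetical helper, and the endpoint paths come from the checklist above:

```python
from urllib.error import URLError
from urllib.request import urlopen

def smoke_test(base_url: str) -> dict[str, bool]:
    """Hit the two read-only endpoints from the checklist, report pass/fail."""
    results = {}
    for path in ("/api/health", "/api/auth/config"):
        try:
            with urlopen(base_url + path, timeout=10) as resp:
                results[path] = resp.status == 200
        except URLError:
            results[path] = False
    return results

# smoke_test("https://app.yourdomain.com")
```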

Without Docker (Bare-Metal)

Run the backend and frontend directly on your machine. Best for active development with hot-reload.

Prerequisites

| Tool | Version | Notes |
| --- | --- | --- |
| Python | 3.12+ | Required for `type \| None` syntax |
| Node.js | 22+ | Includes npm; used for the React frontend |
| Git | 2.x+ | Standard version control |

You also need a Supabase database: either a free cloud project (created in Step 2 below) or an existing project with the schema already applied.

Step 1: Clone the repository

git clone https://github.com/your-org/beyond-retrieval.git
cd beyond-retrieval/beyond-retrieval-pythonv

Step 2: Set up the database

  1. Create a free project at supabase.com/dashboard
  2. Open SQL Editor
  3. Copy the contents of db/migrations/001_initial_schema.sql
  4. Paste and click Run
  5. Note your Project URL and Service Role Key from Settings > API

If you already have a Supabase project with the schema applied, grab your credentials from Settings > API.

Step 3: Configure environment

cp .env.cloud.example .env

Edit .env with your Supabase credentials:

SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SECRET_KEY=your-service-role-key
BYPASS_AUTH=true
OLLAMA_BASE_URL=http://localhost:11434   # if running Ollama locally

Step 4: Start the backend

cd backend
python -m venv venv

# Activate the virtual environment
source venv/bin/activate          # Linux / macOS
# venv\Scripts\activate           # Windows

pip install -r requirements.txt

# Start the FastAPI server
uvicorn main:app --reload --port 8000

The backend serves at http://localhost:8000. API docs at http://localhost:8000/docs.

Windows: --reload limitation

On Windows, --reload does not detect new files created after the watcher started. When adding new router, service, or schema files, you must kill and restart uvicorn manually.

Step 5: Start the frontend

Open a new terminal:

cd beyond-retrieval/beyond-retrieval-pythonv/frontend
npm install
npm run dev

The frontend opens at http://localhost:5173. Vite automatically proxies /api requests to http://localhost:8000.

Step 6: Verify

curl http://localhost:8000/api/health

Open http://localhost:5173 in your browser.

Optional: Run Ollama locally

If you want local LLM inference without Docker:

  1. Install Ollama from ollama.com
  2. Pull a model: ollama pull qwen2.5:1.5b
  3. Ollama runs on http://localhost:11434 by default — the .env already points there
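
To confirm the model pulled successfully, Ollama exposes a /api/tags endpoint that lists installed models. A sketch; installed_models is our helper for parsing the response:

```python
import json
from urllib.request import urlopen

def installed_models(tags_json: str) -> list[str]:
    """Extract model names from Ollama's /api/tags response body."""
    return [m["name"] for m in json.loads(tags_json).get("models", [])]

# Against a running Ollama (default port from the steps above):
# with urlopen("http://localhost:11434/api/tags", timeout=5) as resp:
#     print(installed_models(resp.read().decode()))
```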

Bare-Metal Limitations

| Feature | Available? | Why |
| --- | --- | --- |
| Auto-HTTPS | No | Caddy only runs in Docker; use nginx/certbot manually |
| Local Supabase | No | Requires Docker containers; use cloud Supabase instead |
| Docling sidecar | No | Requires Docker; Docling falls back to local import |
| Hot-reload | Yes | --reload on backend, npm run dev on frontend |

Google Cloud Run

Deploy as two serverless containers that auto-scale from zero. Pay only for actual usage. Requires a cloud Supabase project (no local database on Cloud Run).

Prerequisites

| Requirement | How to get it |
| --- | --- |
| Google Cloud account | cloud.google.com (free tier available) |
| gcloud CLI | Install guide |
| Docker | Docker Desktop |
| Supabase project | supabase.com (free tier works) |
| OpenRouter API key | openrouter.ai |

Architecture

                 Internet
                    |
        +-----------+-----------+
        |                       |
Cloud Run (frontend)    Cloud Run (backend)
React + nginx            FastAPI
*.run.app                *.run.app
        |                       |
        |     nginx proxies     |
        |     /api/* ---------->|
        |                       |
        |               +-------+-------+
        |               |  Supabase.co  |
        |               |  (Database,   |
        |               |   Auth,       |
        |               |   Storage)    |
        +               +---------------+

Both services auto-scale from 0 to N instances based on traffic. You pay nothing when idle.

Step 1: Set up Supabase

  1. Create a project at supabase.com/dashboard
  2. Open SQL Editor
  3. Copy db/migrations/001_initial_schema.sql and run it
  4. Note your Project URL and Service Role Key from Settings > API

Step 2: Set environment variables

# Required
export GCP_PROJECT_ID=my-gcp-project-id
export SUPABASE_URL=https://your-project.supabase.co
export SUPABASE_SECRET_KEY=your-service-role-key
export OPENROUTER_API_KEY=sk-or-v1-your-key

Optional:

export GCP_REGION=us-central1              # default: europe-west3
export BYPASS_AUTH=true                     # default: false
export MISTRAL_API_KEY=your-mistral-key    # for OCR
export VITE_SUPABASE_URL=https://your-project.supabase.co
export VITE_SUPABASE_ANON_KEY=your-anon-key

Step 3: One-time GCP setup

cd beyond-retrieval/beyond-retrieval-pythonv
./cloudrun-deploy.sh setup

This enables Cloud Run, Artifact Registry, and Cloud Build APIs, creates a Docker repo, and configures Docker auth. Run once per GCP project.

Step 4: Deploy

./cloudrun-deploy.sh deploy

This builds and deploys both services. When done:

[OK]  Deployment complete!
  Backend:  https://beyond-retrieval-backend-xxxxx-ey.a.run.app
  Frontend: https://beyond-retrieval-frontend-xxxxx-ey.a.run.app

Step 5: Verify

curl $(cat .backend-url)/api/health

Open the frontend URL in your browser. Create a notebook, upload a document, and test the RAG chat.

Deploy individual services

./cloudrun-deploy.sh deploy-backend    # Backend only
./cloudrun-deploy.sh deploy-frontend   # Frontend only

CI/CD with Cloud Build

Generate a cloudbuild.yaml for automated deployments on push:

./cloudrun-deploy.sh cloudbuild

Set up a Cloud Build trigger in the GCP Console to run on push to main.

Secret management

Store SUPABASE_SECRET_KEY and OPENROUTER_API_KEY in Secret Manager rather than hardcoding.

Custom Domain

Cloud Run services get a *.run.app URL by default. To use your own domain:

  1. Go to Cloud Run > your service > Domain Mappings
  2. Click Add Custom Domain
  3. Add the DNS records shown
  4. Wait for SSL (~15 minutes)

Cloud Run Service Specs

| Setting | Backend | Frontend |
| --- | --- | --- |
| Memory | 2 GiB | 256 MiB |
| CPU | 2 | 1 |
| Min instances | 0 (scales to zero) | 0 |
| Max instances | 10 | 5 |
| Concurrency | 80 req/instance | 250 req/instance |
| Timeout | 300s | 300s |

Cloud Run Limitations

| Feature | Available? | Why |
| --- | --- | --- |
| Ollama (local LLM) | No | No persistent GPU on Cloud Run |
| Docling (document parser) | No | Requires sidecar container |
| Local Supabase | No | No persistent storage |
| Cold starts | ~2-5s | First request after scale-to-zero |
| Auto-HTTPS | Yes | Built-in on *.run.app |

Comparison: All Deployment Methods

| | Docker Local | Docker VPS | Bare-Metal | Cloud Run |
| --- | --- | --- | --- | --- |
| Difficulty | Easy | Medium | Medium | Medium |
| Cloud dependencies | None | None (or optional) | Supabase cloud | Supabase + GCP |
| Auto-HTTPS | No (localhost) | Yes (Caddy) | No | Yes (*.run.app) |
| Ollama (local LLM) | Yes | Yes | Yes (manual) | No |
| Docling parser | Yes | Yes | Fallback only | No |
| Local Supabase | Yes | Yes | No | No |
| Hot-reload | Via --dev flag | No | Yes | No |
| GPU support | NVIDIA / AMD | NVIDIA / AMD | Manual | No |
| Scaling | Single machine | Single machine | Single machine | Auto (0-N) |
| Cost | Free | VPS cost (~$5-20/mo) | Free | Pay-per-request |
| Best for | Dev & testing | Production self-hosted | Active development | SaaS / low traffic |

After Deployment: First Steps

Once your app is running, regardless of deployment method:

1. Open the Dashboard

Navigate to your Beyond Retrieval instance URL.

2. Configure API Keys

Go to Global Settings and add your LLM provider keys (OpenRouter, OpenAI, or Mistral).

3. Create a Notebook

  1. Click Create Notebook
  2. Enter a title (e.g., "Product Docs")
  3. Select an embedding model (text-embedding-3-small recommended)
  4. Click Create

Embedding Model is Permanent

The embedding model is locked after creation. All documents must share the same vector space.
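
The reason for the lock: vectors produced by different embedding models live in different spaces, often with different dimensionality, so similarity scores across them are meaningless. A small illustration:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity -- only meaningful when both vectors come from
    the same embedding model (same dimensionality and vector space)."""
    if len(a) != len(b):
        raise ValueError("embeddings from different models are not comparable")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Even when two models happen to share a dimensionality, their spaces are unrelated, which is why a notebook must stick to one model for all documents.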

4. Upload and Ingest Documents

  1. Open your notebook → Documents page
  2. Drag and drop files (PDF, DOCX, TXT, MD, CSV, XLSX)
  3. Click Ingest to start the processing pipeline
  4. Watch status: Pending → Processing → Success

5. Ask a Question

  1. Navigate to Chat
  2. Click + New Chat
  3. Type a question, e.g.: "What is the cancellation policy?"
  4. Get a cited answer grounded in your documents

6. Explore Features

| Feature | What to Try |
| --- | --- |
| Search Playground | Compare Fusion vs Semantic search |
| AI Enhancer | Enrich chunks with AI context |
| Intelligence Settings | Switch between OpenRouter, OpenAI, or Ollama |
| System Monitor | Check knowledge base health score |

Using the API

You can also interact programmatically:

import httpx

BASE = "http://localhost:8000/api"

# 1. Create a notebook
nb = httpx.post(f"{BASE}/notebooks/", json={
    "title": "My API Notebook",
    "embedding_model": "openai/text-embedding-3-small"
}).json()["data"]

notebook_id = nb["notebook_id"]

# 2. Upload a file
with open("document.pdf", "rb") as f:
    upload = httpx.post(
        f"{BASE}/notebooks/{notebook_id}/documents/upload",
        files={"files": ("document.pdf", f, "application/pdf")}
    ).json()["data"]

file_info = upload[0]

# 3. Start ingestion
httpx.post(f"{BASE}/notebooks/{notebook_id}/documents/ingest", json={
    "files": [file_info],
    "settings": {
        "parser": "Docling Parser",
        "chunking_strategy": "Recursive Chunking",
        "chunk_size": 1000,
        "chunk_overlap": 200
    },
    "notebook_name": "My API Notebook"
})

# 4. Wait for ingestion, then chat
import time
time.sleep(10)

conv = httpx.post(f"{BASE}/notebooks/{notebook_id}/conversations", json={
    "title": "First Chat"
}).json()["data"]

response = httpx.post(
    f"{BASE}/notebooks/{notebook_id}/conversations/{conv['conversation_id']}/messages",
    json={"content": "What are the main topics in this document?"}
).json()["data"]

print(response["assistant_message"]["content"])
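
The fixed time.sleep(10) above is a simplification. A more robust pattern polls until a condition holds; this generic helper assumes nothing about the project's endpoints, and the check callable would fetch document status itself:

```python
import time
from typing import Callable

def wait_until(check: Callable[[], bool],
               timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll check() until it returns True or timeout seconds elapse.

    check might, for example, fetch the notebook's document list and
    test whether every status is "Success".
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False
```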

See the API Reference for the complete endpoint catalog.


Troubleshooting

| Problem | Fix |
| --- | --- |
| .env changes not applied in Docker | Use docker compose up -d (not restart) — recreates containers |
| PostgREST returns null for new columns | docker compose restart supabase-rest |
| Caddy SSL not provisioning | Ensure DNS A record points to your VPS IP, ports 80/443 open |
| Port 80/443 already in use | Stop nginx/Apache: sudo systemctl stop nginx |
| Ollama model not found | Wait for ollama-init container to finish pulling |
| Windows --reload misses new files | Kill and restart uvicorn manually |
| gcloud: command not found | Install gcloud CLI |
| Cloud Run cold start timeout | Set --min-instances 1 to keep one instance warm |
| Frontend shows "Failed to fetch" | Check backend is running and CORS_ORIGINS includes your URL |

Next Steps