Installation
This page walks through a full Peregrine installation from scratch.
Prerequisites
- Git — to clone the repository
- Internet connection — `setup.sh` downloads Docker and other dependencies
- Operating system: Ubuntu/Debian, Fedora/RHEL, Arch Linux, or macOS (with Docker Desktop)
Windows
Windows is not supported. Use WSL2 with Ubuntu instead.
Step 1 — Clone the repository
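The clone command block is missing from this copy of the page. A sketch with a placeholder URL — substitute the actual Peregrine repository address:

```shell
git clone https://github.com/<org>/peregrine.git   # placeholder URL — use the real repository
cd peregrine
```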
Step 2 — Run setup.sh
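The command block for this step is missing from this copy of the page; assuming you are in the repository root and the script is executable, it would be:

```shell
./setup.sh
```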
setup.sh performs the following automatically:
- Detects your platform (Ubuntu/Debian, Fedora/RHEL, Arch, macOS)
- Installs Git if not already present
- Installs Docker Engine and the Docker Compose v2 plugin via the official Docker repositories
- Adds your user to the `docker` group so you do not need `sudo` for docker commands (Linux only — log out and back in after this)
- Detects NVIDIA GPUs — if `nvidia-smi` is present and working, installs the NVIDIA Container Toolkit and configures Docker to use it
- On macOS: offers to install Ollama natively for Metal GPU-accelerated inference (see Apple Silicon GPU below)
- Creates `.env` from `.env.example` — edit `.env` to customise ports and model storage paths before starting
macOS
setup.sh installs Docker Desktop via Homebrew (brew install --cask docker) then exits. Open Docker Desktop, start it, then re-run the script. On a second pass it will proceed to the Ollama native install prompt.
GPU requirement (NVIDIA / Linux)
For GPU support, nvidia-smi must return output before you run setup.sh. Install your NVIDIA driver first. The Container Toolkit installation will fail silently if the driver is not present.
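A quick way to verify the driver before running the script (standard NVIDIA command; it should print a table with the driver and CUDA versions rather than an error):

```shell
nvidia-smi
```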
Step 3 — (Optional) Edit .env
The .env file controls ports and volume mount paths. The defaults work for most single-user installs:
```
# Default ports
STREAMLIT_PORT=8501
OLLAMA_PORT=11434
VLLM_PORT=8000
SEARXNG_PORT=8888
VISION_PORT=8002
```
Change STREAMLIT_PORT if 8501 is taken on your machine.
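One way to check whether the default port is already bound on Linux (`ss` from iproute2; a `LISTEN` line in the output means the port is taken):

```shell
ss -ltn 'sport = :8501'
```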
Step 4 — Start Peregrine
Choose a profile based on your hardware:
```shell
make start                      # remote — no GPU, use API-only LLMs
make start PROFILE=cpu          # cpu — local models on CPU (or Metal GPU on Apple Silicon, see below)
make start PROFILE=single-gpu   # single-gpu — one NVIDIA GPU
make start PROFILE=dual-gpu     # dual-gpu — GPU 0 = Ollama, GPU 1 = vLLM (NVIDIA only)
```
`make start` runs `preflight.py` first, which checks for port conflicts and writes GPU/RAM recommendations back to `.env`. It then calls `docker compose --profile <PROFILE> up -d`.
Step 5 — Open the UI
Navigate to http://localhost:8501 (or whatever STREAMLIT_PORT you set).
The first-run wizard launches automatically. See First-Run Wizard for a step-by-step guide through all seven steps.
Supported Platforms
| Platform | Tested | Notes |
|---|---|---|
| Ubuntu 22.04 / 24.04 | Yes | Primary target |
| Debian 12 | Yes | |
| Fedora 39/40 | Yes | |
| RHEL / Rocky / AlmaLinux | Yes | |
| Arch Linux / Manjaro | Yes | |
| macOS (Apple Silicon) | Yes | Docker Desktop required; Metal GPU via native Ollama (see below) |
| macOS (Intel) | Yes | Docker Desktop required; no GPU profiles (no NVIDIA) |
| Windows | No | Use WSL2 with Ubuntu |
GPU Support
NVIDIA (Linux)
Requirements:
- NVIDIA driver installed and nvidia-smi working before running setup.sh
- CUDA 12.x recommended (CUDA 11.x may work but is untested)
- Minimum 8 GB VRAM for single-gpu profile with default models
- For dual-gpu: GPU 0 is assigned to Ollama, GPU 1 to vLLM
If your GPU has less than 10 GB VRAM, preflight.py will calculate a CPU_OFFLOAD_GB value and write it to .env. The vLLM container picks this up via --cpu-offload-gb to overflow KV cache to system RAM.
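A sketch of the rule this describes. The variable names and the exact formula are assumptions; only `CPU_OFFLOAD_GB` and the 10 GB threshold come from the text above:

```shell
# Hypothetical sketch of preflight.py's offload calculation
VRAM_GB=8        # e.g. as reported by nvidia-smi
TARGET_GB=10     # threshold below which offload kicks in
CPU_OFFLOAD_GB=$(( VRAM_GB < TARGET_GB ? TARGET_GB - VRAM_GB : 0 ))
echo "CPU_OFFLOAD_GB=${CPU_OFFLOAD_GB}"   # value written to .env
```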
AMD ROCm is not currently supported.
Apple Silicon GPU (macOS)
Docker Desktop on macOS runs in a Linux VM and cannot access the Apple GPU directly. Metal-accelerated local inference is available through a different path: native Ollama on the host.
setup.sh prompts you to install Ollama natively via Homebrew. When Ollama is running on port 11434, preflight.py detects it and automatically stubs out the Docker Ollama container, routing all inference through the native process (which uses Metal automatically).
To set it up manually if you skipped the prompt:
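The command block is missing from this copy of the page. One way to do it with Homebrew — these are standard Homebrew and Ollama commands, but confirm against the project's own docs:

```shell
brew install ollama
brew services start ollama    # runs the Ollama server on port 11434
ollama --version              # confirm the install
```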
Then start Peregrine with the `cpu` profile (`make start PROFILE=cpu`). Despite the name, inference will run on the GPU. The `single-gpu` and `dual-gpu` profiles require NVIDIA hardware and are not applicable on macOS.
Stopping Peregrine
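The command block for this section is missing from this copy of the page. Assuming the Makefile mirrors `make start`, something like the following would apply — `make stop` is a hypothetical target name; check the project's Makefile:

```shell
make stop                           # hypothetical target
# or with Docker Compose directly (substitute the profile you started with):
docker compose --profile cpu down
```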
Reinstalling / Clean State
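The command itself is missing from this copy of the page. A destructive reset would typically remove containers and volumes; the target name below is hypothetical — check the project's Makefile for the real one:

```shell
make clean    # hypothetical target name
# or with Docker Compose directly (deletes downloaded models stored in volumes!):
docker compose down --volumes --remove-orphans
```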
You will be prompted to type yes to confirm.