LLM Backend Setup (Optional)

An LLM backend unlocks receipt OCR, recipe suggestions (L3–L4), and style auto-classification. Everything else works without one.

You can use any OpenAI-compatible inference server: Ollama, vLLM, LM Studio, a local llama.cpp server, or a commercial API.

BYOK — Bring Your Own Key

BYOK means you provide your own LLM backend. Paid AI features are unlocked at any tier when a valid backend is configured. You pay for your own inference; Kiwi just uses it.

Choosing a backend

| Backend | Best for | Notes |
|---|---|---|
| Ollama | Local, easy setup | Recommended for getting started |
| vLLM | Local, high throughput | Better suited to faster hardware |
| OpenAI API | No local GPU | Requires paid API key |
| Anthropic API | No local GPU | Requires paid API key |
To install and run Ollama:

```sh
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model — llama3.1 8B works well for recipe tasks
ollama pull llama3.1

# Verify it's running
ollama list
```

In your Kiwi .env:

```
LLM_BACKEND=ollama
LLM_BASE_URL=http://host.docker.internal:11434
LLM_MODEL=llama3.1
```
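Kiwi's actual config loader isn't shown in these docs, but reading these variables amounts to something like the following sketch (the function name and fallback defaults are illustrative, not Kiwi's real ones):

```python
import os

def load_llm_config(env=os.environ):
    """Read LLM_* settings, falling back to Ollama-style defaults.

    Hypothetical helper: the variable names match the .env above,
    but the defaults here are illustrative.
    """
    return {
        "backend": env.get("LLM_BACKEND", "ollama"),
        "base_url": env.get("LLM_BASE_URL", "http://localhost:11434"),
        "model": env.get("LLM_MODEL", "llama3.1"),
        "api_key": env.get("LLM_API_KEY"),  # unset for local backends
    }
```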

Docker networking

Use host.docker.internal instead of localhost when Ollama is running on your host and Kiwi is in Docker.
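Note that on Linux, host.docker.internal is not defined by default. With Docker 20.10 or later you can map it to the host gateway in your compose file (the service name here is illustrative):

```yaml
services:
  kiwi:
    # Make host.docker.internal resolve to the Docker host on Linux
    extra_hosts:
      - "host.docker.internal:host-gateway"
```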

OpenAI-compatible API

```
LLM_BACKEND=openai
LLM_BASE_URL=https://api.openai.com/v1
LLM_API_KEY=sk-your-key-here
LLM_MODEL=gpt-4o-mini
```
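All of these backends speak the same chat-completions shape, which is why a single base URL and key are enough. A minimal sketch of the request Kiwi would send against any OpenAI-compatible endpoint (Kiwi's internal client isn't shown in these docs, so this is illustrative only):

```python
import json
import urllib.request

def build_chat_request(base_url, api_key, model, prompt):
    """Build (but don't send) an OpenAI-compatible chat request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

# urllib.request.urlopen(req) would actually send it; omitted here.
```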

Verify the connection

In the Kiwi Settings page, the LLM status indicator shows whether the backend is reachable. A green checkmark means OCR and L3–L4 recipe suggestions are active.

What LLM is used for

| Feature | LLM required |
|---|---|
| Receipt OCR (line-item extraction) | Yes |
| Recipe suggestions L1 (pantry match) | No |
| Recipe suggestions L2 (substitution) | No |
| Recipe suggestions L3 (style templates) | Yes |
| Recipe suggestions L4 (full generation) | Yes |
| Style auto-classifier | Yes |

L1 and L2 suggestions use deterministic matching — they work without any LLM configured. See Recipe Engine for the full algorithm breakdown.
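The docs don't spell out the matching code, but L1 pantry matching reduces to set overlap. A sketch under that assumption (function and parameter names are illustrative, not Kiwi's API):

```python
def pantry_match(pantry, recipe_ingredients):
    """L1-style deterministic match: the fraction of a recipe's
    ingredients already in the pantry. No LLM involved."""
    have = {item.lower() for item in pantry}
    needed = {item.lower() for item in recipe_ingredients}
    if not needed:
        return 0.0
    return len(needed & have) / len(needed)
```

Ranking recipes by this score is enough to produce L1 suggestions without any model call, which is why those tiers stay available with no backend configured.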

Model recommendations

  • Receipt OCR: any model with vision capability (LLaVA, GPT-4o, etc.)
  • Recipe suggestions: 7B–13B instruction-tuned models work well; larger models produce more creative L4 output
  • Style classification: small models handle this fine (3B+)