LLM Backend Setup (Optional)

An LLM backend unlocks receipt OCR, recipe suggestions (L3–L4), and style auto-classification. Everything else works without one.

You can use any OpenAI-compatible inference server: Ollama, vLLM, LM Studio, a local llama.cpp server, or a commercial API.

BYOK — Bring Your Own Key

BYOK means you provide your own LLM backend. Paid AI features are unlocked at any tier when a valid backend is configured. You pay for your own inference; Kiwi just uses it.

Choosing a backend

| Backend | Best for | Notes |
|---|---|---|
| Ollama | Local, easy setup | Recommended for getting started |
| vLLM | Local, high throughput | Better suited to faster hardware |
| OpenAI API | No local GPU | Requires paid API key |
| Anthropic API | No local GPU | Requires paid API key |
To install and run Ollama:

```sh
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model — llama3.1 8B works well for recipe tasks
ollama pull llama3.1

# Verify it's running
ollama list
```

In your Kiwi .env:

```
LLM_BACKEND=ollama
LLM_BASE_URL=http://host.docker.internal:11434
LLM_MODEL=llama3.1
```
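Kiwi's actual config loader isn't shown in these docs, but reading these variables amounts to something like the following sketch (the function name and fallback defaults are illustrative, not Kiwi's real ones):

```python
import os

def load_llm_config(env=os.environ):
    """Read LLM_* settings, falling back to Ollama-style defaults.

    Hypothetical helper: the variable names match the .env above,
    but the defaults here are illustrative.
    """
    return {
        "backend": env.get("LLM_BACKEND", "ollama"),
        "base_url": env.get("LLM_BASE_URL", "http://localhost:11434"),
        "model": env.get("LLM_MODEL", "llama3.1"),
        "api_key": env.get("LLM_API_KEY"),  # unset for local backends
    }
```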

Docker networking

Use host.docker.internal instead of localhost when Ollama is running on your host and Kiwi is in Docker.
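Note that on Linux, host.docker.internal is not defined by default. With Docker 20.10 or later you can map it to the host gateway in your compose file (the service name here is illustrative):

```yaml
services:
  kiwi:
    # Make host.docker.internal resolve to the Docker host on Linux
    extra_hosts:
      - "host.docker.internal:host-gateway"
```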

OpenAI-compatible API

```
LLM_BACKEND=openai
LLM_BASE_URL=https://api.openai.com/v1
LLM_API_KEY=sk-your-key-here
LLM_MODEL=gpt-4o-mini
```
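All of these backends speak the same chat-completions shape, which is why a single base URL and key are enough. A minimal sketch of the request Kiwi would send against any OpenAI-compatible endpoint (Kiwi's internal client isn't shown in these docs, so this is illustrative only):

```python
import json
import urllib.request

def build_chat_request(base_url, api_key, model, prompt):
    """Build (but don't send) an OpenAI-compatible chat request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

# urllib.request.urlopen(req) would actually send it; omitted here.
```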

Verify the connection

In the Kiwi Settings page, the LLM status indicator shows whether the backend is reachable. A green checkmark means OCR and L3–L4 recipe suggestions are active.

What LLM is used for

| Feature | LLM required |
|---|---|
| Receipt OCR (line-item extraction) | Yes |
| Recipe suggestions L1 (pantry match) | No |
| Recipe suggestions L2 (substitution) | No |
| Recipe suggestions L3 (style templates) | Yes |
| Recipe suggestions L4 (full generation) | Yes |
| Style auto-classifier | Yes |

L1 and L2 suggestions use deterministic matching — they work without any LLM configured. See Recipe Engine for the full algorithm breakdown.
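The docs don't spell out the matching code, but L1 pantry matching reduces to set overlap. A sketch under that assumption (function and parameter names are illustrative, not Kiwi's API):

```python
def pantry_match(pantry, recipe_ingredients):
    """L1-style deterministic match: the fraction of a recipe's
    ingredients already in the pantry. No LLM involved."""
    have = {item.lower() for item in pantry}
    needed = {item.lower() for item in recipe_ingredients}
    if not needed:
        return 0.0
    return len(needed & have) / len(needed)
```

Ranking recipes by this score is enough to produce L1 suggestions without any model call, which is why those tiers stay available with no backend configured.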

Model recommendations

  • Receipt OCR: any model with vision capability (LLaVA, GPT-4o, etc.)
  • Recipe suggestions: 7B–13B instruction-tuned models work well; larger models produce more creative L4 output
  • Style classification: small models handle this fine (3B+)