LiteLLM Setup

LiteLLM is a unified proxy that provides OpenAI-compatible API access to 100+ LLM providers, including local models and self-hosted deployments.

Why LiteLLM?

  • Local Models: Run AI on your own hardware
  • Privacy: Keep data completely private
  • No API Costs: Free for local models
  • Multi-Provider: Connect to any LLM API
  • Custom Models: Use your fine-tuned models

Installation

pip install litellm

Or with all dependencies:

pip install 'litellm[proxy]'

Quick Start

Start LiteLLM Proxy

With OpenAI:

litellm --model gpt-5

With Local Ollama:

ollama serve
ollama pull llama3.1
litellm --model ollama/llama3.1

With Config File:

# litellm_config.yaml
model_list:
  - model_name: llama3
    litellm_params:
      model: ollama/llama3.1
      api_base: http://localhost:11434

Then start the proxy with:

litellm --config litellm_config.yaml

LiteLLM starts on http://localhost:4000 by default.
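
Because the proxy speaks the OpenAI chat-completions protocol, any OpenAI-style client can talk to it. As a quick sanity check, here is a stdlib-only Python sketch that builds a request against the proxy; the `llama3` alias is an assumption taken from the config example above, and port 4000 is the default noted here:

```python
import json
import urllib.request

# Build an OpenAI-style chat-completion request for the local proxy.
# "llama3" is the model_name alias from the example config above.
def chat_request(model, prompt, base_url="http://localhost:4000"):
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = chat_request("llama3", "Say hello in one word.")
# urllib.request.urlopen(req) would send it once the proxy is running;
# the response body follows the OpenAI chat-completion schema.
```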

Configure Browser Operator

  1. Start your LiteLLM proxy (see above)
  2. Open Browser Operator
  4. Open AI Chat and click Settings (⚙️)
  4. Select “LiteLLM Provider”
  5. Enter proxy URL: http://localhost:4000
  6. (Optional) Enter master key if configured
  7. Click “Fetch Models”
  8. Select your model
  9. Click “Save”

Popular Setups

Ollama (Local Models)

Install Ollama:

# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Windows: Download from ollama.ai

Pull and run models:

ollama serve
ollama pull llama3.1
ollama pull mistral
ollama pull qwen2.5

Start LiteLLM:

litellm --model ollama/llama3.1

Popular Models:

  • llama3.1 - Meta’s Llama (8B, 70B, 405B)
  • mistral - Mistral 7B
  • qwen2.5 - Alibaba Qwen
  • deepseek-coder - Code generation
  • gemma2 - Google Gemma
  • phi3 - Microsoft Phi-3

Browse all: ollama.ai/library

Multiple Providers

# litellm_config.yaml
model_list:
  # OpenAI
  - model_name: gpt-5
    litellm_params:
      model: openai/gpt-5
      api_key: sk-...

  # Anthropic
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-3.5-sonnet
      api_key: sk-ant-...

  # Local Ollama
  - model_name: llama3-local
    litellm_params:
      model: ollama/llama3.1
      api_base: http://localhost:11434
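
With a multi-provider config like this, each entry is exposed under its model_name alias, so client code switches providers by changing a single string. A minimal sketch, using the aliases from the config above:

```python
import json

# Switching providers is just a different "model" alias; the proxy
# routes to OpenAI, Anthropic, or Ollama based on its config.
def payload_for(alias, prompt):
    return json.dumps({
        "model": alias,  # "gpt-5", "claude-sonnet", or "llama3-local"
        "messages": [{"role": "user", "content": prompt}],
    })

cloud = json.loads(payload_for("claude-sonnet", "Summarize this page."))
local = json.loads(payload_for("llama3-local", "Summarize this page."))
# Only the routing alias differs; the request shape is identical.
```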

Troubleshooting

LiteLLM Won’t Start

  • Check Python version: python --version (need 3.8+)
  • Reinstall: pip uninstall litellm && pip install --upgrade litellm

Cannot Connect to Ollama

  • Verify Ollama running: curl http://localhost:11434/api/tags
  • Check logs: ollama serve (run in foreground)

Models Not Loading

  1. Check LiteLLM running: curl http://localhost:4000/health
  2. Verify models available: curl http://localhost:4000/models
  3. Review LiteLLM logs for errors
  4. Confirm correct proxy URL in Browser Operator
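
The /models endpoint from step 2 returns an OpenAI-style model list, so a few lines of stdlib Python can confirm your aliases are registered (the sample response below is illustrative):

```python
import json

# Parse the JSON returned by `curl http://localhost:4000/models`.
# The proxy uses the OpenAI list format: {"data": [{"id": ...}, ...]}.
def registered_models(body):
    return [m["id"] for m in json.loads(body)["data"]]

sample = '{"data": [{"id": "llama3", "object": "model"}]}'
print(registered_models(sample))  # the aliases from your config
```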

Out of Memory

  • Use smaller models (7B instead of 70B)
  • Enable quantization (4-bit, 8-bit)
  • Reduce context length
  • Close other applications

Slow Inference

  • Use GPU acceleration if available
  • Enable model quantization
  • Use smaller models
  • Try vLLM for production workloads

Security

Secure Your Proxy

Set a master key in your config:

general_settings:
  master_key: your-secure-random-key

Restrict network access and enable TLS:

# Run on localhost only
litellm --host 127.0.0.1 --port 4000

# Use HTTPS
litellm --config config.yaml \
  --ssl_keyfile key.pem \
  --ssl_certfile cert.pem
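
Once a master key is set, every request to the proxy must carry it as a Bearer token. A stdlib sketch, assuming the placeholder key from the config above:

```python
import json
import urllib.request

# Placeholder; use the value from general_settings.master_key.
MASTER_KEY = "your-secure-random-key"

req = urllib.request.Request(
    "http://localhost:4000/v1/chat/completions",
    data=json.dumps({
        "model": "llama3",
        "messages": [{"role": "user", "content": "ping"}],
    }).encode(),
    headers={
        "Content-Type": "application/json",
        # Without this header the proxy rejects the call.
        "Authorization": f"Bearer {MASTER_KEY}",
    },
    method="POST",
)
```

In Browser Operator, the same value goes in the optional master key field of the LiteLLM provider settings.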
