LiteLLM Setup
LiteLLM is a unified proxy that provides OpenAI-compatible API access to 100+ LLM providers, including local models and self-hosted deployments.
Why LiteLLM?
- Local Models: Run AI on your own hardware
- Privacy: Keep data completely private
- No API Costs: Free for local models
- Multi-Provider: Connect to any LLM API
- Custom Models: Use your fine-tuned models
Installation
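A minimal install via pip (assumes Python 3.8+):

```shell
pip install litellm
```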
Or with all dependencies:
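For example, the `[proxy]` extra installs everything needed to run the proxy server:

```shell
pip install 'litellm[proxy]'
```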
Quick Start
Start LiteLLM Proxy
With OpenAI:
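A sketch, assuming an OpenAI API key in the environment (the model name is illustrative):

```shell
export OPENAI_API_KEY=your-key-here
litellm --model gpt-4o
```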
With Local Ollama:
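Assuming Ollama is already running on its default port (11434), with a model name as an example:

```shell
litellm --model ollama/llama3.1
```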
With Config File:
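A minimal config sketch, assuming a local Ollama backend (model names and paths are placeholders):

```shell
cat > config.yaml <<'EOF'
model_list:
  - model_name: llama3.1
    litellm_params:
      model: ollama/llama3.1
      api_base: http://localhost:11434
EOF

litellm --config config.yaml
```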
LiteLLM starts on http://localhost:4000 by default.
Configure Browser Operator
- Start your LiteLLM proxy (see above)
- Open Browser Operator
- Click AI Chat → Settings (⚙️)
- Select “LiteLLM Provider”
- Enter proxy URL: http://localhost:4000
- (Optional) Enter master key if configured
- Click “Fetch Models”
- Select your model
- Click “Save”
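To sanity-check the proxy outside the browser, something like the following works (the model name is illustrative; add an `Authorization: Bearer <master key>` header if you configured one):

```shell
# list the models the proxy exposes
curl http://localhost:4000/v1/models

# send a test chat completion through the proxy
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.1", "messages": [{"role": "user", "content": "Hello"}]}'
```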
Popular Setups
Ollama (Local Models)
Install Ollama:
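On Linux, the official install script works; on macOS and Windows, download the installer from the Ollama site instead:

```shell
curl -fsSL https://ollama.com/install.sh | sh
```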
Pull and run models:
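For example, with Llama 3.1:

```shell
ollama pull llama3.1   # download the model weights
ollama run llama3.1    # interactive test in the terminal
```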
Start LiteLLM:
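Point LiteLLM at the local Ollama model (proxy listens on port 4000 by default):

```shell
litellm --model ollama/llama3.1
```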
Popular Models:
- llama3.1 - Meta’s Llama (8B, 70B, 405B)
- mistral - Mistral 7B
- qwen2.5 - Alibaba Qwen
- deepseek-coder - Code generation
- gemma2 - Google Gemma
- phi3 - Microsoft Phi-3
Browse all: ollama.ai/library
Multiple Providers
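A single config can route to several backends at once. A sketch, assuming keys are supplied via environment variables (model names are placeholders):

```shell
cat > config.yaml <<'EOF'
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20240620
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: llama3.1
    litellm_params:
      model: ollama/llama3.1
      api_base: http://localhost:11434
EOF

litellm --config config.yaml
```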
Troubleshooting
LiteLLM Won’t Start
- Check Python version: python --version (need 3.8+)
- Reinstall: pip uninstall litellm && pip install --upgrade litellm
Cannot Connect to Ollama
- Verify Ollama is running: curl http://localhost:11434/api/tags
- Check logs: ollama serve (run in foreground)
Models Not Loading
- Check LiteLLM is running: curl http://localhost:4000/health
- Verify models are available: curl http://localhost:4000/models
- Review LiteLLM logs for errors
- Confirm correct proxy URL in Browser Operator
Out of Memory
- Use smaller models (7B instead of 70B)
- Enable quantization (4-bit, 8-bit)
- Reduce context length
- Close other applications
Slow Inference
- Use GPU acceleration if available
- Enable model quantization
- Use smaller models
- Try vLLM for production workloads
Security
Secure Your Proxy
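At minimum, set a master key so clients must authenticate; a sketch (the key value is a placeholder, keep the proxy bound to localhost or behind TLS if it is reachable from other machines):

```shell
cat > config.yaml <<'EOF'
model_list:
  - model_name: llama3.1
    litellm_params:
      model: ollama/llama3.1
general_settings:
  master_key: sk-replace-with-a-strong-secret  # clients send this as a Bearer token
EOF

litellm --config config.yaml
```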
Next Steps
Support