## TL;DR

- Install with `pip install llm` or `brew install simonw/llm/llm` (macOS)
- Configure API keys via `llm keys set <provider>` (paste the key when prompted)
- Run prompts: `llm "Explain quantum computing"` or pipe data: `cat notes.txt | llm -s "Summarize"`
- Plugins extend support to Ollama, Claude, Gemini, and local models
- Conversations are stored in SQLite (`~/.llm/log.db`)
## 1. Installation
### Python (pip/pipx)

```bash
# Recommended: pipx (isolated environment)
pipx install llm

# Alternative: per-user pip install
pip install --user llm
```

Expected output:

```
Successfully installed llm-0.15.0
```
Gotchas:
- If the `llm` command isn't found after `pip install`, add `~/.local/bin` to your `PATH`:

  ```bash
  echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
  source ~/.bashrc
  ```

- Python 3.8+ is required. Verify with `python --version`.
### macOS (Homebrew)

```bash
brew install simonw/llm/llm
```
### Windows (PowerShell)

```powershell
pip install llm
```

- Ensure Microsoft Visual C++ Build Tools are installed if you encounter compilation errors.
### Docker

```bash
docker run -it --rm -v ~/.llm:/root/.llm simonw/llm:latest
```

- Persists data in `~/.llm` on the host.
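Whichever route you choose, a quick sanity check confirms the CLI is reachable on your `PATH` (the exact version output will vary):

```bash
llm --version   # e.g. "llm, version 0.15.0"
llm models      # lists the models llm can currently see
```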
## 2. Configuring API Keys
LLM supports multiple providers. Set keys via the CLI:
```bash
# OpenAI (built in)
llm keys set openai
# Paste your key when prompted

# Anthropic (Claude) -- requires the llm-anthropic plugin (see Section 4)
llm keys set anthropic

# Google Gemini -- requires the llm-gemini plugin (see Section 4)
llm keys set gemini
```
Verify keys:

```bash
llm keys list
```

Expected output (key names only; the stored values are not echoed back):

```
openai
anthropic
gemini
```
Gotchas:
- Keys are stored in `~/.llm/keys.json`. Secure this file with `chmod 600 ~/.llm/keys.json`.
- Rate limits apply. Check your provider's dashboard for usage.
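Keys can also come from environment variables, which is convenient in CI where you may not want a keys file on disk. `OPENAI_API_KEY` is the variable checked for OpenAI; other providers' plugins document their own variable names:

```bash
# Use an environment variable instead of keys.json
export OPENAI_API_KEY="sk-..."
llm "Hello"
```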
## 3. Running Prompts
### Basic Prompt

```bash
llm "Explain quantum computing in 3 bullet points"
```

Expected output:

```
- Quantum bits (qubits) can exist in multiple states simultaneously (superposition).
- Qubits can be entangled, meaning the state of one instantly influences another.
- Quantum computers solve certain problems exponentially faster than classical computers.
```
### Specify a Model

```bash
llm -m gpt-4o "Write a Python script to fetch stock prices"
```

Supported models include:
- `gpt-4o`, `gpt-4-turbo` (OpenAI)
- `claude-3-opus` (Anthropic)
- `gemini-1.5-pro` (Google)
- `llama3` (via the Ollama plugin)
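Most models also accept per-request options via `-o`; `temperature` is the most common, though the available option names vary by model (inspect them with `llm models --options`):

```bash
# Lower temperature for more deterministic output
llm -m gpt-4o -o temperature 0.2 "Write a Python script to fetch stock prices"
```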
### Local Models (GGUF)

```bash
# Install the Ollama plugin
llm install llm-ollama

# Pull a model (e.g., Llama 3)
ollama pull llama3

# Run locally
llm -m ollama/llama3 "Explain the Physical AI Stack"
```
Gotchas:
- Local models require significant RAM (e.g., 7B+ models need 8GB+).
- Quantized models (e.g., `llama3:8b-instruct-q4_K_M`) reduce memory usage but may impact quality.
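For multi-turn sessions with a local model, the interactive chat mode avoids paying the model-load cost on every prompt:

```bash
# Start an interactive session; type 'exit' or 'quit' to leave
llm chat -m ollama/llama3
```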
## 4. Plugin Ecosystem
### Install Plugins

```bash
# Ollama (local models)
llm install llm-ollama

# Mistral
llm install llm-mistral

# Anthropic (Claude)
llm install llm-anthropic

# Google Gemini
llm install llm-gemini
```
### List Available Models

```bash
llm models list
```

Expected output:

```
OpenAI:
  gpt-4o
  gpt-4-turbo
Anthropic:
  claude-3-opus
Ollama:
  ollama/llama3
  ollama/mistral
```
### Use a Plugin Model

```bash
llm -m ollama/llama3 "Generate a Dockerfile for a FastAPI app"
```
Gotchas:
- Plugins may require additional dependencies (e.g., `llm-ollama` needs Ollama itself installed and running).
- Check each plugin's README for setup details.
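Plugins are regular Python packages installed into llm's own environment, so they can be inspected and removed the same way:

```bash
# List installed plugins
llm plugins

# Remove a plugin you no longer need
llm uninstall llm-mistral -y
```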
## 5. Conversation History and Templates
### View Conversations

```bash
llm logs
```

Expected output:

```
ID  Model     Prompt                         Timestamp
1   gpt-4o    Explain quantum computing...   2026-05-12 10:00:00
2   claude-3  Write a Python script...       2026-05-12 10:05:00
```
### Resume a Conversation

```bash
# Continue the most recent conversation
llm -c "Follow up: How does superposition enable quantum parallelism?"

# Or target a specific conversation by ID (shown in llm logs)
llm --cid 1 "Follow up: How does superposition enable quantum parallelism?"
```
### Save a Template

Create `~/.llm/templates/summarize.yaml`:

```yaml
system: You are a concise summarizer. Extract key points in bullet form.
prompt: "Summarize the following text:\n\n$input"
```
Use the template:

```bash
cat notes.txt | llm -t summarize
```
Gotchas:
- Templates use Python `string.Template` syntax: piped input fills `$input`, and a literal dollar sign must be escaped as `$$`.
- Conversations are stored in `~/.llm/log.db` (SQLite). Back up this file for long-term storage.
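Because the log is plain SQLite, you can also query it directly. A minimal sketch, assuming the `~/.llm/log.db` path above and a `responses` table; schemas differ between llm versions, so inspect yours first with `.schema`:

```bash
# Last five prompts, newest first (table and column names are assumptions)
sqlite3 ~/.llm/log.db \
  "SELECT model, substr(prompt, 1, 60) FROM responses ORDER BY rowid DESC LIMIT 5;"
```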
## 6. Piping Data and Shell Integration
### Pipe Text to LLM

```bash
cat README.md | llm -s "Summarize this project in 1 paragraph"
```

### Pipe Command Output

```bash
git diff | llm -s "Explain these changes in simple terms"
```

### Save Output to File

```bash
llm "Generate a Kubernetes deployment YAML for a FastAPI app" > deployment.yaml
```

### Use with jq (JSON Processing)

```bash
curl https://api.github.com/repos/simonw/llm | jq '.description' | llm -s "Explain this project"
```
Gotchas:
- Piped input is treated as a single prompt. For multi-turn conversations, use `llm -c` (see Section 5).
- Large inputs may hit token limits. Truncate with `head -n 100` or similar.
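Recurring pipelines like the `git diff` example are worth wrapping in a shell function. A sketch for your `~/.bashrc` (the name `aicommit` is made up for this example):

```bash
# Draft a commit message from staged changes; review it before committing
aicommit() {
  git diff --staged | llm -s "Write a one-line conventional commit message for this diff"
}
```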
## 7. Building Custom Workflows
### Python API

```python
import llm

# Uses the key configured via `llm keys set openai`
model = llm.get_model("gpt-4o")
response = model.prompt("Explain the Hyperion Lifecycle in 3 steps")
print(response.text())
```
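The Python API supports multi-turn conversations too, mirroring `llm -c` on the CLI, via `model.conversation()`:

```python
import llm

model = llm.get_model("gpt-4o")

# A conversation object carries context across successive prompts
conversation = model.conversation()
print(conversation.prompt("Explain superposition in one sentence").text())
print(conversation.prompt("Now relate that to quantum parallelism").text())
```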
### Embeddings

```bash
# Generate an embedding (OpenAI embedding models are built in)
llm embed -m text-embedding-3-small -c "The Physical AI Stack" > embedding.json

# Store embeddings in a named collection, then search it
llm embed layers sense -m text-embedding-3-small -c "SENSE layer"
llm embed layers connect -m text-embedding-3-small -c "CONNECT layer"
llm similar layers -c "sensor input processing"
```
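The same embedding models are reachable from Python via `llm.get_embedding_model()`, which makes one-off similarity checks easy (cosine similarity computed by hand here to avoid extra dependencies):

```python
import math
import llm

model = llm.get_embedding_model("text-embedding-3-small")
a = model.embed("SENSE layer")    # returns a list of floats
b = model.embed("CONNECT layer")

# Cosine similarity between the two vectors
dot = sum(x * y for x, y in zip(a, b))
norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
print(f"cosine similarity: {dot / norm:.3f}")
```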
### Custom Model Aliases

```bash
llm aliases set fast gpt-4o-mini
llm aliases set local ollama/llama3
```
Usage:

```bash
llm -m fast "Write a Python one-liner to parse CSV"
```
### Scheduled Prompts (Cron)

```bash
# Edit crontab
crontab -e
```

Add:

```
0 9 * * * /usr/local/bin/llm "Summarize top Hacker News stories" >> ~/hn-summary.txt
```
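Note that a bare prompt like the one above has no web access, so the model will answer from training data alone. To summarize real data on a schedule, pipe it in; this sketch assumes a `fetch-hn.sh` script of your own that prints the stories, and uses absolute paths because cron runs with a minimal `PATH`:

```
0 9 * * * /home/me/bin/fetch-hn.sh | /usr/local/bin/llm -s "Summarize these stories" >> /home/me/hn-summary.txt
```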
Gotchas:
- For the Python API, ensure `llm` is installed in your virtual environment.
- Embedding support for OpenAI models is built in; local embedding models need a plugin such as `llm-sentence-transformers`, which pulls in extra dependencies.
## Alternatives at a Glance
| Tool | Best For | Limitations |
|---|---|---|
| LLM | Lightweight CLI, local/API models | No GUI, limited analytics |
| LangChain | Enterprise workflows, RAG | Steeper learning curve |
| Ollama | Local models (GGUF) | Fewer API integrations |
## What's Next?
- Explore plugins: run `llm install llm-gpt4all` for offline models or `llm install llm-mistral` for Mistral support.
- Automate workflows: pipe command output to LLM (e.g., `git log | llm -s "Summarize recent changes"`).
- Build a custom template for repetitive tasks such as code reviews or meeting notes (see the sketch below).
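As an example of that last point, here is a minimal code-review template; the file name and prompt wording are just illustrative:

```yaml
# ~/.llm/templates/code-review.yaml
system: You are a careful code reviewer. Point out bugs, risky patterns, and missing tests.
prompt: "Review this diff:\n\n$input"
```

Then run `git diff | llm -t code-review`.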
For advanced AI tooling and workflows, Hyperion Consulting offers enterprise-grade solutions to accelerate your AI adoption.
