Tools I use in production — not tools I have a partnership with
Every technology listed here has been deployed in a production system. I am vendor-agnostic by design — the right tool depends on your use case, your data, and your budget. No partnership deals influence my recommendations.
Industry-recognized credentials demonstrating expertise
Scrum.org
Product ownership and value maximization in Scrum
Issued 2019

Scrum Alliance
Agile facilitation and Scrum framework mastery
Issued 2018

Scaled Agile
Scaled Agile Framework for enterprise transformation
Issued 2021

Product School
Building and managing AI-powered products
Issued 2023

DeepLearning.AI
Neural networks, CNNs, RNNs, and transformers
Issued 2022
No vendor partnerships influence my recommendations. I choose the model that fits your latency, cost, and accuracy requirements.
Every tool here has been deployed in a system that handles real traffic. Lab-tested tools do not make this list.
Most AI projects overspend on infrastructure by 3-5x. I right-size from the start — smaller models, smarter caching, efficient inference.
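Much of that right-sizing comes from not paying for the same completion twice. A minimal sketch of prompt-level response caching in plain Python, where `call_model` is a hypothetical stand-in for any real provider client:

```python
import hashlib
import json

class PromptCache:
    """Cache completions keyed by a hash of (model, prompt, sampling params)."""

    def __init__(self):
        self._store = {}

    def _key(self, model, prompt, **params):
        # Deterministic key: identical requests hash to the same digest.
        payload = json.dumps({"model": model, "prompt": prompt, **params}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def get_or_call(self, fn, model, prompt, **params):
        key = self._key(model, prompt, **params)
        if key not in self._store:
            self._store[key] = fn(model, prompt, **params)
        return self._store[key]

# Usage with a stand-in model call (swap in a real provider client here):
calls = []
def call_model(model, prompt, **params):
    calls.append(prompt)
    return f"echo: {prompt}"

cache = PromptCache()
cache.get_or_call(call_model, "small-model", "hello", temperature=0.0)
cache.get_or_call(call_model, "small-model", "hello", temperature=0.0)
assert len(calls) == 1  # second identical request served from cache
```

Only makes sense for deterministic settings (temperature 0); production systems add eviction and a shared store, but the cost lever is the same.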
AI models change every quarter. My architectures abstract the model layer so you can swap providers without rewriting your application.
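Swapping providers without a rewrite comes down to a single seam in the code. A minimal sketch of that seam in plain Python, where `ChatProvider` and `EchoProvider` are hypothetical names, not any vendor's SDK:

```python
from typing import Protocol

class ChatProvider(Protocol):
    """The only interface application code is allowed to depend on."""
    def complete(self, prompt: str) -> str: ...

class EchoProvider:
    """Stand-in implementation; a real one would wrap a vendor SDK."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

def summarize(provider: ChatProvider, text: str) -> str:
    # Application logic never imports a vendor SDK directly,
    # so swapping providers means changing one constructor call.
    return provider.complete(f"Summarize: {text}")

print(summarize(EchoProvider(), "quarterly report"))
# prints "echo: Summarize: quarterly report"
```

The same idea is what gateway libraries in the list below productize: one request format in front, any model behind it.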
Book a 30-minute call. I will assess your requirements and recommend the right combination of models, infrastructure, and frameworks — with cost estimates.
Every tool I evaluate, deploy, or recommend — with honest assessments.
Anthropic
Most capable Claude model — complex reasoning, long-context analysis, agentic tasks.
Official docs →

Anthropic
Best balance of intelligence and speed for production workloads.
Official docs →

Anthropic
Fastest and lowest-cost Claude model for high-volume tasks.
Official docs →

Anthropic
AI-native CLI for agentic software engineering — reads, writes, and runs code autonomously.
Official docs →

Anthropic
Open protocol connecting AI assistants to external tools, data sources, and services.
Official docs →

Anthropic
Build, orchestrate, and deploy multi-agent systems powered by Claude.
Official docs →

Mistral AI
Top-tier reasoning model with 128K context — Mistral's flagship for enterprise tasks.
Official docs →

Mistral AI
Cost-efficient multimodal model — text and image understanding.
Official docs →

Mistral AI
Apache 2.0 multilingual model — EU-sovereign deployments, 128K context.
Official docs →

Mistral AI
Code generation specialist — 80+ languages, fill-in-the-middle, 32K context.
Official docs →

Mistral AI
Frontier vision-language model — document analysis, chart reading, 128K context.
Official docs →

Mistral AI
High-quality text embeddings for RAG and semantic search.
Official docs →

Mistral AI
Train and own frontier AI model weights outright — no API rental, full data sovereignty.
Official docs →

Mistral AI
Enterprise AI assistant — SSO, audit logs, EU data residency, web search, document upload.
Official docs →

Meta
Meta's flagship open-weight model — Apache 2.0, matches GPT-4 on many benchmarks at a fraction of the cost.
Official docs →

Meta
Lightweight Llama models for mobile, edge, and on-device inference.
Official docs →

Meta
Vision-language Llama models — image understanding, document analysis.
Official docs →

Google
Google's open-weight family — Apache 2.0, strong reasoning, multilingual, edge-to-server range.
Official docs →

Microsoft
MIT-licensed reasoning specialist — outperforms models 3× larger on math and coding.
Official docs →

Microsoft
Edge-optimised reasoning model — 3.8B parameters, strong instruction following on constrained hardware.
Official docs →

Alibaba
Alibaba's Apache 2.0 multilingual family — exceptional Chinese/English, strong math, full size range.
Official docs →

Alibaba
State-of-the-art open-source code generation — rivals GPT-4o on coding benchmarks.
Official docs →

DeepSeek
MIT-licensed reasoning specialist with chain-of-thought — matches o1 on math and science tasks.
Official docs →

DeepSeek
671B MoE open-weight general model — top open-source benchmark scores across all categories.
Official docs →

TII UAE
TII's Apache 2.0 family — strong multilingual performance, designed for EU/MENA sovereign deployments.
Official docs →

Hugging Face
Ultra-compact models for on-device and browser inference — Apache 2.0, efficiency benchmark.
Official docs →

Ollama
One-command local model serving — runs Llama, Mistral, Gemma and 100+ models on any hardware.
Official docs →

vLLM Project
High-throughput production LLM serving — PagedAttention, continuous batching, OpenAI-compatible.
Official docs →

Hugging Face
Hugging Face's production inference server — tensor parallelism, quantization, streaming.
Official docs →

ggerganov
CPU/GPU inference in C++ — GGUF format, runs on Apple Silicon, NVIDIA, AMD, CPU-only.
Official docs →

LM Studio
Desktop GUI for discovering, downloading, and running local LLMs — OpenAI-compatible server.
Official docs →

Hugging Face
Run Transformers in the browser and Node.js — ONNX-based, no server required.
Official docs →

Microsoft
Cross-platform optimised inference — CPU, GPU, mobile, browser, WASM support.
Official docs →

BerriAI
Universal LLM API proxy — call 100+ models with OpenAI format, load balancing, fallbacks.
Official docs →

Unsloth AI
2× faster fine-tuning, 70% less VRAM — LoRA and QLoRA for Llama, Mistral, Qwen, Gemma.
Official docs →

OpenAccess AI Collective
Production fine-tuning framework — YAML config, LoRA/QLoRA/full, multi-GPU, Flash Attention.
Official docs →

hiyouga
Fine-tune 100+ LLMs with a web UI or CLI — SFT, DPO, GRPO, LoRA, QLoRA.
Official docs →

PyTorch
PyTorch-native fine-tuning library — recipe-based, minimal dependencies, full control.
Official docs →

Hugging Face
Parameter-Efficient Fine-Tuning — LoRA, QLoRA, IA³, AdaLoRA, Prefix Tuning.
Official docs →

Hugging Face
Transformer Reinforcement Learning — SFT, DPO, GRPO, PPO, ORPO for alignment training.
Official docs →

Microsoft
ZeRO optimizer for large model training — 10× throughput, trillion-parameter scale.
Official docs →

Hugging Face
One-line multi-GPU and TPU training — no code changes, FSDP and DeepSpeed integration.
Official docs →

NVIDIA
NVIDIA's large-scale pre-training framework — tensor/pipeline/sequence parallelism.
Official docs →

Hugging Face
900K+ models, 100K+ datasets, and Spaces — the de facto standard for AI artifact sharing.
Official docs →

Hugging Face
Core model library — load, run, and fine-tune any model in PyTorch, TensorFlow, or JAX.
Official docs →

Hugging Face
100K+ datasets with streaming, arrow-based loading, and one-line preprocessing.
Official docs →

Hugging Face
Managed dedicated or serverless model deployment — auto-scaling, private endpoints.
Official docs →

Hugging Face
No-code fine-tuning for LLMs and other models — SFT, DPO, classification, NER.
Official docs →

Hugging Face
Host Gradio and Streamlit ML demos — free tier available, GPU-enabled options.
Official docs →

Hugging Face
Standardised metrics library — BLEU, ROUGE, accuracy, F1, and 100+ custom metrics.
Official docs →

Hugging Face
Programmatic Hub access — upload models, create repos, manage tokens, search.
Official docs →

LangChain
LLM application framework — chains, agents, RAG, tool use, memory.
Official docs →

LlamaIndex
Data framework for LLM apps — ingestion, indexing, querying over any data source.
Official docs →

deepset
Production NLP pipeline framework — RAG, document search, question answering.
Official docs →

Stanford NLP
Declarative LLM programming — optimise prompts and weights automatically.
Official docs →

Jason Liu
Structured output extraction — Pydantic schemas from any LLM, with validation and retries.
Official docs →

Microsoft
Enterprise LLM orchestration for .NET, Python, Java — plugins, planners, memory.
Official docs →

CrewAI
Role-based multi-agent orchestration — agents collaborate with defined roles and goals.
Official docs →

Microsoft
Microsoft's multi-agent conversation framework — async agents, human-in-the-loop.
Official docs →

Hugging Face
Minimal agentic framework — code-first agents that write and execute Python, 1000-line core.
Official docs →

Qdrant
Rust-based vector search — on-prem friendly, filterable, sparse+dense hybrid search.
Official docs →

Weaviate
GraphQL API vector database — multi-tenancy, hybrid search, generative search.
Official docs →

Chroma
Local-first open-source vector database — Python-native, zero infrastructure required.
Official docs →

Zilliz
Distributed vector search for billion-scale data — HNSW, IVF, GPU acceleration.
Official docs →

PostgreSQL
Vector similarity search extension for PostgreSQL — no separate infrastructure needed.
Official docs →

Pinecone
Managed cloud vector database — serverless tier, namespaces, metadata filtering.
Official docs →

Pollen Robotics
Open-source humanoid robot for research and industry — Apache 2.0, ROS2, Python SDK.
Official docs →

Open Robotics
Robot Operating System 2 — real-time communication, sensor fusion, navigation stack.
Official docs →

Hugging Face
Open-source robot learning — imitation learning, reinforcement learning, pre-trained policies.
Official docs →

NVIDIA
Robot simulation and deployment platform — synthetic data generation, physics simulation.
Official docs →

OpenCV
Computer vision library — 2500+ algorithms, real-time image processing, widely deployed.
Official docs →

Meta
Segment Anything Model 2 — real-time video and image segmentation, zero-shot.
Official docs →

Ultralytics
Real-time object detection — fastest production-grade detector, ONNX/CoreML export.
Official docs →

Amazon
Managed foundation model APIs on AWS — Claude, Llama, Mistral, Titan, Stable Diffusion.
Official docs →

Microsoft
Microsoft's enterprise AI platform — model catalog, fine-tuning, responsible AI tools.
Official docs →

Google Cloud
GCP's unified AI/ML platform — Gemini, model garden, AutoML, feature store.
Official docs →

Cloudflare
Run AI models at the edge globally — Workers AI, 100+ models, serverless inference.
Official docs →

LangChain
LLM observability and tracing — log runs, compare prompts, regression testing.
Official docs →

Weights & Biases
ML experiment tracking, visualisation, and hyperparameter sweeps — industry standard.
Official docs →

Databricks
ML lifecycle management — experiment tracking, model registry, deployment.
Official docs →

CNCF / Grafana Labs
Inference metrics collection and dashboards — latency, throughput, error rates.
Official docs →

Arize AI
LLM evaluation and monitoring — hallucination detection, embeddings visualisation, drift.
Official docs →

Exploding Gradients
RAG evaluation framework — faithfulness, answer relevancy, context precision metrics.
Official docs →