AI Engineering Resources
Resources for building AI products in production.
From prompting to agents, LLMOps to evaluation. Curated from the AI engineering community.
Prompting
Learn Prompting
docs page
The Prompt Report: A Systematic Survey of Prompting Techniqu…
paper on arXiv, summary in tweets
How to prompt o1
(o1 isn’t a chat model – and that’s the point), blog post, by Ben Hylak
Effective Context Engineering for AI Agents
blog post, by Anthropic
Agents
Building Effective Agents
blog post, by Anthropic
Hugging Face Agents Course
course, by Hugging Face
How We Built Ellipsis
(or: Lessons from 27 months building LLM coding agents), blog post, by Nick Bradford
Don’t Build Multi-Agents
blog post, by Walden Yan
We Built a Multi-Agent Research System
blog post, by Anthropic
LLMOps
What We Learned from a Year of Building with LLMs
by Eugene Yan, Bryan Bischof, Charles Frye, Hamel Husain, Jason Liu, and Shreya Shankar, Part 1, Part 2, Part 3
Traceability and Observability in Multi-Step LLM Systems
webinar by Marc Klingen
Data Flywheels for LLM Applications
blog post, by Shreya Shankar
Latency optimization
cookbook, by OpenAI
The OSS LLMOps Stack
page by LiteLLM and Langfuse
Evaluation
Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-J…
blog post, by Eugene Yan
AI Agent Observability & Evaluation
course, by Hugging Face
Frequently Asked Questions (And Answers) About AI Evals
blog post, by Hamel Husain
Your AI Product Needs Evals
blog post, by Hamel Husain
Creating an LLM-as-a-Judge That Drives Business Results
blog post, by Hamel Husain
Voice AI
Voice AI & Voice Agents - An Illustrated Primer
book, by Kwindla Hultman Kramer
Evaluating Voice AI Agents
blog post and video, by Marc Klingen and Brooke Hopkins
Voice AI Evals
repo and tweet, by Kwindla Hultman Kramer
LLM 101
Intro to Large Language Models
talk by Andrej Karpathy
How I use LLMs
talk by Andrej Karpathy
News
AI News
newsletter, daily roundup of top AI discussions from Discord, Reddit, and X/Twitter
Last Week in AI
podcast, weekly summary of AI news and research
Latent Space
podcast, deep dives and interview episodes
Stratechery
newsletter/podcast, tech/business strategy deep dives and news, many episodes related to AI/Labs, e.g. DeepSeek FAQ, D…
Libraries & Tools
DSPy
Programming - not prompting - Foundation Models. The framework for optimizing LM prompts and weights.
TensorZero
A gateway for LLMs. Rust-based, high-performance router and governance layer.
vLLM
High-throughput and memory-efficient LLM serving engine. The gold standard for production inference.
Unsloth
2-5x faster fine-tuning of LLMs with 50-70% less memory usage. Backprop magic.
SGLang
Structured Generation Language for LLMs. Fast serving with RadixAttention.
Axolotl
Go-to framework for streamlining LLM fine-tuning via configuration files.
LangGraph
Build stateful, multi-actor applications with LLMs. Graph-based control flow.
PydanticAI
Agent framework powered by Pydantic. Type-safe, production-ready, no nonsense.
Letta
State management for LLMs. Enabling persistent memory and long-running threads.
Langfuse
Open source LLM engineering platform. Tracing, evaluations, and prompt management.
Arize Phoenix
AI observability & evaluation. Trace, debug, and evaluate LLM applications.
LiteLLM
Consistent API for 100+ LLMs. The proxy server for model routing and cost tracking.
TextGrad
Automatic 'differentiation' via text. Optimizes prompts and systems using textual gradients (natural-language feedback from an LLM).
Outlines
Structured text generation. Guarantees output matches a regex or JSON schema.
Llama.cpp
LLM inference in pure C/C++. The engine running everywhere.
Infinity
High-throughput embedding and reranking server. Built for scale.
OpenLit
OpenTelemetry-native observability for GenAI. Vendor-neutral tracing.
Mem0
The memory layer for personalized AI. Long-term user state management.
UV
An extremely fast Python package installer and resolver. The future of Python dev tooling.
FastHTML
Modern web applications in pure Python. Hypermedia-driven systems for AI UIs.