AI Engineering Resources

Tools for AI builders

Resources for building AI products in production.

From prompting to agents, LLMOps to evaluation. Curated from the AI engineering community.

Prompting

Learn Prompting

learnprompting.org

docs page

The Prompt Report: A Systematic Survey of Prompting Techniqu…

arxiv.org

paper on arXiv, summary in tweets

How to prompt o1

latent.space

(o1 isn’t a chat model – and that’s the point), blog post, by Ben Hylak

Effective Context Engineering for AI Agents

anthropic.com

blog post, by Anthropic

Agents

Building Effective Agents

anthropic.com

blog post, by Anthropic

Hugging Face Agents Course

huggingface.co

course, by Hugging Face

How We Built Ellipsis

ellipsis.dev

(or: Lessons from 27 months building LLM coding agents), blog post, by Nick Bradford

Don’t Build Multi-Agents

cognition.ai

blog post, by Walden Yan

We Built a Multi-Agent Research System

anthropic.com

blog post, by Anthropic

LLMOps

What We Learned from a Year of Building with LLMs

oreilly.com

by Eugene Yan, Bryan Bischof, Charles Frye, Hamel Husain, Jason Liu, and Shreya Shankar: Part 1, Part 2, Part 3

Traceability and Observability in Multi-Step LLM Systems

langfuse.com

webinar by Marc Klingen

Data Flywheels for LLM Applications

sh-reya.com

blog post, by Shreya Shankar

Latency optimization

platform.openai.com

cookbook, by OpenAI

The OSS LLMOps Stack

oss-llmops-stack.com

page by LiteLLM and Langfuse

Evaluation

Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-J…

eugeneyan.com

blog post, by Eugene Yan

AI Agent Observability & Evaluation

huggingface.co

course, by Hugging Face

Frequently Asked Questions (And Answers) About AI Evals

hamel.dev

blog post, by Hamel Husain

Your AI Product Needs Evals

hamel.dev

blog post, by Hamel Husain

Creating an LLM-as-a-Judge That Drives Business Results

hamel.dev

blog post, by Hamel Husain

Voice AI

Voice AI & Voice Agents - An Illustrated Primer

voiceaiandvoiceagents.com

book, by Kwindla Hultman Kramer

Evaluating Voice AI Agents

langfuse.com

blog post and video, by Marc Klingen and Brooke Hopkins

Voice AI Evals

github.com

repo and tweet, by Kwindla Hultman Kramer

LLM 101

Intro to Large Language Models

youtube.com

talk by Andrej Karpathy

How I use LLMs

youtube.com

talk by Andrej Karpathy

News

AI News

buttondown.com

newsletter, daily roundup of top AI discussions from Discord, Reddit, and X/Twitter

Last Week in AI

lastweekinai.com

podcast, weekly summary of AI news and research

Latent Space

latent.space

podcast, deep dives and interview episodes

Stratechery

stratechery.com

newsletter/podcast, tech/business strategy deep dives and news, with many episodes related to AI and AI labs, e.g. DeepSeek FAQ, D…

Libraries & Tools

DSPy

dspy.ai

Programming - not prompting - Foundation Models. The framework for optimizing LM prompts and weights.

TensorZero

tensorzero.com

A gateway for LLMs. Rust-based, high-performance router and governance layer.

vLLM

vllm.ai

High-throughput and memory-efficient LLM serving engine. The gold standard for production inference.

Unsloth

unsloth.ai

2-5x faster fine-tuning of LLMs with 50-70% less memory usage. Backprop magic.

SGLang

github.com

Structured Generation Language for LLMs. Fast serving with RadixAttention.

Axolotl

github.com

Go-to framework for streamlining LLM fine-tuning via configuration files.

LangGraph

langchain-ai.github.io

Build stateful, multi-actor applications with LLMs. Graph-based control flow.

PydanticAI

ai.pydantic.dev

Agent framework powered by Pydantic. Type-safe, production-ready, no nonsense.

Letta

letta.com

State management for LLMs. Enabling persistent memory and long-running threads.

Langfuse

langfuse.com

Open source LLM engineering platform. Tracing, evaluations, and prompt management.

Arize Phoenix

phoenix.arize.com

AI observability & evaluation. Trace, debug, and evaluate LLM applications.

LiteLLM

litellm.ai

Consistent API for 100+ LLMs. The proxy server for model routing and cost tracking.

TextGrad

github.com

Automatic 'differentiation' via text. Optimizes prompts and systems using textual gradients (natural-language feedback from LLMs).

Outlines

github.com

Structured text generation. Guarantees output matches a regex or JSON schema.

llama.cpp

github.com

LLM inference in pure C/C++. The engine running everywhere.

Infinity

github.com

High-throughput embedding and reranking server. Built for scale.

OpenLIT

github.com

OpenTelemetry-native observability for GenAI. Vendor-neutral tracing.

Mem0

mem0.dev

The memory layer for personalized AI. Long-term user state management.

UV

astral.sh

An extremely fast Python package installer and resolver. The future of Python dev tooling.

FastHTML

fastht.ml

Modern web applications in pure Python. Hypermedia-driven systems for AI UIs.