Evaluation Guides
A curated collection of two evaluation guides for building with AI coding agents.
Guide
Best LLM & RAG Evaluation Tools in 2026: DeepEval vs RAGAS vs LangSmith vs Phoenix vs promptfoo
A decision guide to the LLM eval landscape — code-first frameworks vs. eval-and-observability platforms, open-source vs. hosted, and which fits your stack.
3m read · AgentsCamp
Guide
Write Evals for an LLM App: From Zero to a CI Gate
How to evaluate an LLM feature — build a dataset, choose metrics, set a baseline, score offline, add an LLM judge, and gate CI so quality changes are measured.
3m read · AgentsCamp