Evaluation Guides

A curated collection of 3 evaluation guides for building with AI coding agents.

Guide

Best LLM & RAG Evaluation Tools in 2026: DeepEval vs RAGAS vs LangSmith vs Phoenix vs promptfoo

A decision guide to the LLM eval landscape — code-first frameworks vs. eval-and-observability platforms, open-source vs. hosted, and which fits your stack.

3m read · AgentsCamp

Guide

LLM Evaluation Metrics Explained: Which One to Use and When

A practical map of LLM and RAG evaluation metrics: why BLEU/ROUGE fail on open-ended text, how LLM-as-judge and RAG metrics work, and which to pick per task.

6m read · AgentsCamp

Guide

Write Evals for an LLM App: From Zero to a CI Gate

How to evaluate an LLM feature — build a dataset, choose metrics, set a baseline, score offline, add an LLM judge, and gate CI so quality changes are measured.

3m read · AgentsCamp