Evaluation Tools
Four curated AI coding tools in the evaluation category, selected for building with AI coding agents.
Tool
Braintrust
An end-to-end platform for evaluating, iterating on, and observing LLM apps, with a prompt playground.
freemium, evaluation
Tool
DeepEval
An open-source evaluation framework for LLM apps, described as 'Pytest for LLMs', with ready-made metrics and CI integration.
open source, evaluation
Tool
promptfoo
An open-source CLI for testing, comparing, and red-teaming LLM prompts, models, and apps.
open source, evaluation
Tool
RAGAS
An open-source framework for evaluating retrieval-augmented generation with reference-free RAG metrics.
open source, evaluation
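
To make two ideas from the descriptions above concrete ("reference-free" metrics and test-style assertions on LLM output), here is a minimal sketch in plain Python. It does not use any of the listed tools and is not how any of them computes its metrics: the function names `groundedness` and `check_grounded`, the token-overlap heuristic, and the 0.7 threshold are all hypothetical, chosen only for illustration. The score needs no gold reference answer; it only compares the generated answer with the retrieved contexts.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def groundedness(answer: str, contexts: List[str]) -> float:
    """Toy reference-free score (hypothetical, for illustration only).

    Returns the fraction of distinct answer tokens that also appear in the
    retrieved contexts. No reference ("gold") answer is required.
    """
    answer_tokens = tokenize(answer)
    if not answer_tokens:
        return 0.0
    # Union of tokens across all retrieved context passages.
    context_tokens = set().union(*(tokenize(c) for c in contexts))
    return len(answer_tokens & context_tokens) / len(answer_tokens)


def check_grounded(answer: str, contexts: List[str], threshold: float = 0.7) -> float:
    """Assertion-style check, in the spirit of a unit test for LLM output.

    The threshold of 0.7 is an arbitrary assumption, not a recommended value.
    """
    score = groundedness(answer, contexts)
    assert score >= threshold, f"groundedness {score:.2f} is below {threshold}"
    return score
```

A function like `check_grounded` could be called from an ordinary pytest test so that a low score fails a CI run. Real evaluation frameworks typically rely on LLM-based or embedding-based judges rather than token overlap, so treat this only as a shape to reason about.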