Evaluation Tools

AI coding tools in the evaluation category: 4 curated picks for building with AI coding agents.

Tool

An end-to-end platform for evaluating, iterating on, and observing LLM apps, with a prompt playground.

freemium, evaluation

Tool

An open-source evaluation framework for LLM apps, billed as 'Pytest for LLMs', with ready-made metrics and CI integration.

open source, evaluation

Tool

An open-source CLI for testing, comparing, and red-teaming LLM prompts, models, and apps.

open source, evaluation

Tool

An open-source framework for evaluating retrieval-augmented generation (RAG) pipelines with reference-free metrics.

open source, evaluation