# Constitutional AI

> Constitutional AI trains models against written principles: the model critiques and revises its own outputs according to them, reducing reliance on human labels.

**Constitutional AI (CAI) is an alignment technique developed by Anthropic: instead of relying purely on human raters, the model is trained against an explicit written constitution. It first critiques and revises its own outputs according to those principles, and is then optimized with AI feedback on which responses follow them best.**
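
The critique-and-revise step described above can be sketched in a few lines. This is a minimal illustration, not Anthropic's implementation: `Model`, `critique_and_revise`, `toy_model`, and the prompt wording are all hypothetical, and the toy model stands in for a real LLM call.

```python
from typing import Callable, List, Tuple

# Hypothetical model interface: any function mapping a prompt string to text.
Model = Callable[[str], str]

CRITIQUE_INSTRUCTION = "Critique the response against the principle."
REVISE_INSTRUCTION = "Rewrite the response to address the critique."


def critique_and_revise(
    model: Model, prompt: str, principles: List[str]
) -> Tuple[str, str]:
    """Return (initial_draft, final_revision) after one pass per principle."""
    draft = model(prompt)
    response = draft
    for principle in principles:
        # Ask the model to judge its own output against one written principle.
        critique = model(
            f"Principle: {principle}\nResponse: {response}\n{CRITIQUE_INSTRUCTION}"
        )
        # Ask the model to rewrite the output in light of that critique.
        response = model(
            f"Response: {response}\nCritique: {critique}\n{REVISE_INSTRUCTION}"
        )
    return draft, response


def toy_model(prompt: str) -> str:
    """Assumed stand-in for an LLM so the sketch runs without any API."""
    if prompt.endswith(REVISE_INSTRUCTION):
        return "revised answer"
    if prompt.endswith(CRITIQUE_INSTRUCTION):
        return "too blunt"
    return "first draft"


draft, final = critique_and_revise(toy_model, "Explain recursion.", ["Be respectful."])
```

In a training pipeline, the (prompt, final revision) pairs would be collected as supervised fine-tuning data; here the function just returns both versions so the change is visible.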

It addressed two problems in classic [RLHF](/glossary/rlhf) at once. **Scale**: human preference labels are expensive and inconsistent; CAI substitutes AI-generated feedback guided by principles (RLAIF, reinforcement learning from AI feedback), producing alignment data far more cheaply. **Transparency**: RLHF encodes values implicitly in rater behavior; a constitution states them as text anyone can read, with principles drawing on sources ranging from the UN Universal Declaration of Human Rights to practical harmlessness criteria. That makes "what is this model aligned to?" an answerable question. The technique shaped Claude's character and influenced the wider industry's adoption of AI-feedback methods.
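
The AI-feedback step can be pictured as a judge model choosing between two candidate responses under a stated principle, yielding preference pairs for later optimization. The sketch below is illustrative only: `PreferencePair`, `label_pair`, `toy_judge`, and the question format are assumptions, and the keyword-based judge is a placeholder for a real model call.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical judge interface: takes a comparison question, answers "A" or "B".
Judge = Callable[[str], str]


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str


def label_pair(
    judge: Judge, principle: str, prompt: str, response_a: str, response_b: str
) -> PreferencePair:
    """Ask the judge which response better follows the principle."""
    question = (
        f"Principle: {principle}\n"
        f"Prompt: {prompt}\n"
        f"(A) {response_a}\n"
        f"(B) {response_b}\n"
        "Which response better follows the principle? Answer A or B."
    )
    verdict = judge(question).strip().upper()
    if verdict.startswith("A"):
        return PreferencePair(prompt, chosen=response_a, rejected=response_b)
    if verdict.startswith("B"):
        return PreferencePair(prompt, chosen=response_b, rejected=response_a)
    raise ValueError(f"Unexpected judge verdict: {verdict!r}")


def toy_judge(question: str) -> str:
    """Assumed stand-in for an LLM judge: rejects option A if it contains an insult."""
    for line in question.splitlines():
        if line.startswith("(A) "):
            return "B" if "idiot" in line.lower() else "A"
    return "A"


pair = label_pair(
    toy_judge,
    "Be respectful.",
    "Explain recursion.",
    "Read a book, idiot.",
    "Recursion is when a function calls itself.",
)
```

Pairs like `pair` play the role that human preference labels play in RLHF; the difference is that the label comes from a model applying a written principle.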

For builders, CAI matters as background and as pattern. Background, because it explains the behavioral texture of the models you use. Pattern, because *principles-as-explicit-text* recurs at the application layer: rules engines like NeMo Guardrails and policy-based [guardrails](/glossary/guardrails) make the same move at runtime, and writing your app's "constitution" (what it must never do, stated plainly) is a natural first step in a serious safety review.
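
At the application layer, a "constitution" can start as a plain list of named rules checked against each output. The following is a minimal sketch under stated assumptions: `Rule`, `APP_CONSTITUTION`, `violations`, and both example rules are hypothetical, the regex is a rough heuristic rather than a real card-number validator, and none of this reflects the NeMo Guardrails API.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Rule:
    name: str
    description: str  # the principle, stated plainly for human reviewers
    violates: Callable[[str], bool]  # returns True if the text breaks the rule


# Rough heuristic for 16-digit card-like numbers (illustrative only).
CARD_PATTERN = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")

# Hypothetical app constitution: what the app must never do.
APP_CONSTITUTION: List[Rule] = [
    Rule(
        name="no_card_numbers",
        description="Never output anything that looks like a payment card number.",
        violates=lambda text: CARD_PATTERN.search(text) is not None,
    ),
    Rule(
        name="no_outcome_guarantees",
        description="Never promise a guaranteed legal or financial outcome.",
        violates=lambda text: "guaranteed to win" in text.lower(),
    ),
]


def violations(text: str, rules: List[Rule]) -> List[str]:
    """Return the names of all rules the text violates, in rule order."""
    return [rule.name for rule in rules if rule.violates(text)]


flagged = violations("Your card 4111 1111 1111 1111 is on file.", APP_CONSTITUTION)
```

Keeping each rule's `description` next to its check means the same text can be read in a safety review and enforced at runtime, which is the point of stating principles explicitly.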

---

_Source: https://agentscamp.com/glossary/constitutional-ai — Term on AgentsCamp._
