NeMo Guardrails

NeMo Guardrails is NVIDIA's open-source toolkit (Apache-2.0) for adding programmable guardrails to LLM apps. You define rails — input, dialog, retrieval, and output — in the Colang modeling language to detect jailbreaks and injection, keep conversations on allowed topics, filter retrieved context, and moderate or fact-check responses.

Instead of trusting a system prompt to keep a model on-topic and safe, you define explicit rails (rules that run at specific points in the request/response flow) to constrain what the model sees, says, and does. This gives you a structured way to build the input/output validation layer that prompt-injection and safety defenses depend on.

It is aimed at developers who want enforceable boundaries around an LLM app: keeping a conversation on allowed topics, detecting jailbreak/injection attempts, moderating output, and checking responses against policy or for hallucination. Rails are authored in Colang, NeMo's modeling language for conversational guardrails, and configured alongside your app.

Highlights

Multiple rail types — input rails (e.g. jailbreak/injection detection), dialog rails (keep the conversation in bounds), retrieval rails (filter retrieved context), and output rails (moderation, fact-checking, policy).
Colang — a purpose-built language for expressing conversational flows and guardrail logic declaratively.
Composable checks — combine built-in and custom checks, and integrate third-party safety models/scanners as rails.
Framework-friendly — works alongside common LLM app stacks and providers as a wrapping safety layer.
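
To make the rail types above concrete, here is a minimal plain-Python sketch of how input and output rails wrap a model call. It is a conceptual illustration only, not NeMo Guardrails' API or implementation: `run_with_rails`, `block_injection`, `redact_secret`, and `fake_llm` are hypothetical names, the keyword check stands in for a real jailbreak detector, and `sk-12345` is a made-up secret.

```python
from typing import Callable, List, Optional

# A rail inspects text and returns the (possibly rewritten) text,
# or None to signal that the request should be blocked.
Rail = Callable[[str], Optional[str]]

REFUSAL = "Sorry, I can't help with that."


def block_injection(text: str) -> Optional[str]:
    """Toy input rail: a keyword check standing in for a real detector."""
    if "ignore previous instructions" in text.lower():
        return None
    return text


def redact_secret(text: str) -> Optional[str]:
    """Toy output rail: mask a made-up secret before it reaches the user."""
    return text.replace("sk-12345", "[REDACTED]")


def fake_llm(prompt: str) -> str:
    """Stand-in for the model call so the sketch runs offline."""
    return f"Echo: {prompt} (key sk-12345)"


def run_with_rails(
    user_input: str,
    llm: Callable[[str], str],
    input_rails: List[Rail],
    output_rails: List[Rail],
) -> str:
    text: Optional[str] = user_input
    for rail in input_rails:      # input rails run before the model sees the text
        text = rail(text)
        if text is None:
            return REFUSAL
    answer: Optional[str] = llm(text)
    for rail in output_rails:     # output rails run before the user sees the answer
        answer = rail(answer)
        if answer is None:
            return REFUSAL
    return answer


if __name__ == "__main__":
    print(run_with_rails("hello", fake_llm, [block_injection], [redact_secret]))
```

The point of the sketch is the ordering: a blocked input never reaches the model, and every model answer passes through the output rails before it is returned. In NeMo Guardrails itself, this wiring comes from the rails configuration rather than hand-written loops.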

In an AI-assisted workflow

Wrap your app with a guardrails config that defines the rails, then route calls through it:

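from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")   # rails + Colang flows live here
rails = LLMRails(config)

user_input = "How do I reset my password?"   # example prompt; replace with real user text
# input/dialog/output rails run around the model call
response = rails.generate(messages=[{"role": "user", "content": user_input}])
print(response["content"])   # with messages=..., generate() returns an assistant message dict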
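
For a sense of what might live in the config, the sketch below builds the same kind of `LLMRails` object from inline strings instead of a directory, using `RailsConfig.from_content`. It assumes Colang 1.0 syntax, and the flow, the message names, and the model settings are illustrative choices rather than anything this page prescribes. Calling `build_rails()` needs the `nemoguardrails` package installed and credentials for whichever model provider you configure.

```python
# Assumed Colang 1.0 syntax; the flow and message names are illustrative.
COLANG_CONTENT = """
define user ask about politics
  "What do you think about the election?"
  "Which party should I vote for?"

define bot refuse politics
  "I can't discuss politics, but I'm happy to help with product questions."

define flow politics
  user ask about politics
  bot refuse politics
"""

# Assumed model settings; swap in the engine and model your app uses.
YAML_CONTENT = """
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct
"""


def build_rails():
    """Build an LLMRails instance from the inline config above."""
    # Imported lazily so the config strings can be inspected without the package.
    from nemoguardrails import LLMRails, RailsConfig

    config = RailsConfig.from_content(
        colang_content=COLANG_CONTENT,
        yaml_content=YAML_CONTENT,
    )
    return LLMRails(config)
```

This is a dialog rail: when a user message matches the `ask about politics` examples, the flow steers the bot to the predefined refusal instead of a free-form model answer.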

TIP

Guardrails are one layer of defense in depth, not a guarantee that attacks are prevented: pair NeMo Guardrails with least privilege and human approval for high-impact actions (see Defending Against Prompt Injection). Use the llm-guardrails-designer skill to decide which rails you actually need.

Good to know

NeMo Guardrails is free and open source under Apache-2.0 and runs as a Python layer around your LLM app. It's strongest on programmable conversational rails. For a ready-made library of input/output scanners (PII, secrets, prompt injection, toxicity), LLM Guard is complementary, and the two can be used together.

Frequently asked questions

What is NeMo Guardrails?

NeMo Guardrails is an open-source toolkit from NVIDIA for adding programmable guardrails to LLM-based applications. Instead of trusting a system prompt, you define explicit rails that run at specific points in the request/response flow — input rails for jailbreak/injection detection, dialog rails for staying on topic, retrieval rails for filtering context, and output rails for moderation and policy checks.

Is NeMo Guardrails free?

Yes — free and open source under Apache-2.0. It runs as a Python layer that wraps your LLM app, configured alongside it.

How does NeMo Guardrails compare with LLM Guard?

They are complementary rather than competing. NeMo Guardrails is strongest on programmable conversational rails authored in Colang (dialog control and flow logic), while LLM Guard is a ready-made library of input/output scanners for PII, secrets, prompt injection, and toxicity. The two can be used together.
