AI Safety & Security — AI Agents, Skills & Tools

Agents, skills, guides, tools, and commands for ai safety & security — 17 curated resources for building with AI coding agents.

Agent

Prompt Injection Auditor

Use this agent to audit an LLM app or agent for prompt-injection exposure — mapping where untrusted content enters the model's context (user, RAG, tools, web), assessing the blast radius if an injection succeeds, probing with adversarial inputs, and recommending architectural mitigations. Examples — "audit our RAG agent for indirect prompt injection", "what's the blast radius if our agent gets injected — which tools and credentials are exposed?", "review our LLM app's trust boundaries and tell us what to fix".

sonnet4

Skill

LLM Guardrails Designer

Design input and output guardrails for an LLM app — decide what to check (injection patterns, PII, secrets, policy, schema, leakage, toxicity), place them as input vs. output rails, implement with a library like NeMo Guardrails or LLM Guard, and fail closed. Use when adding a safety/validation layer around an LLM, not relying on the prompt alone.

invocablev1.0.0

Skill

Prompt Pii Redactor

Detect and redact PII and secrets from prompts (and logs/traces) before they reach an LLM provider — mask or tokenize emails, phone numbers, names, IDs, and API keys, reversibly where the response needs the real values back. Use when sending user or document data to a third-party model, or when LLM request logs may capture sensitive data.

invocablev1.0.0

Guide

Sandboxing AI-Generated Code: E2B vs Modal vs Daytona vs Vercel Sandbox

Where should agent-written code run? E2B vs Modal vs Daytona vs Vercel Sandbox compared on isolation, persistence, and cost, plus rules for safe execution.

2m read· AgentsCamp

Guide

Are Claude Skills Safe? A Security Review Checklist

Skills are an instruction supply chain: what can go wrong with third-party SKILL.md files, and the review checklist before installing or distributing one.

4m read· AgentsCamp

Guide

Data Privacy for LLM Apps: Stop Leaking Sensitive Data

Where LLM apps leak PII and secrets — prompts, logs, traces, vector stores, providers — and the controls (redaction, ZDR, tenant isolation) that stop it.

6m read· AgentsCamp

Guide

Defending Against Prompt Injection: A Practical Guide for LLM Apps

Prompt injection can't be solved at the model layer — so you defend in depth: trust boundaries, least privilege, human approval, guardrails, and red-teaming.

5m read· AgentsCamp

Guide

Securing AI Agents: The OWASP Agentic Top 10 in Practice

Agents add risks LLM-app security misses — autonomy, tools, memory, multi-agent trust. The key OWASP agentic threats and how to mitigate each in practice.

4m read· AgentsCamp

Tool

LLM Guard

An open-source security toolkit of input and output scanners for LLM apps — prompt injection, PII/anonymize, secrets, toxicity, and more, from Protect AI.

open sourcesdk

Tool

NeMo Guardrails

NVIDIA's open-source toolkit for adding programmable guardrails to LLM apps — input, dialog, retrieval, and output rails defined in the Colang language.

open sourcesdk

Tool

promptfoo

An open-source CLI for testing, comparing, and red-teaming LLM prompts, models, and apps.

open sourceevaluation

Command

Red Team LLM

Red-team an LLM app or agent for prompt injection, jailbreaks, and data leakage — probe the real attack surface (input, RAG, tools, system prompt) with adversarial inputs and report what got through and how to fix it.

/red-team-llm<the app/endpoint/agent to test, or a description of its inputs, tools, and data>

Term

AI Safety & Security — AI Agents, Skills & Tools

Prompt Injection Auditor

LLM Guardrails Designer

Prompt Pii Redactor

Sandboxing AI-Generated Code: E2B vs Modal vs Daytona vs Vercel Sandbox

Are Claude Skills Safe? A Security Review Checklist

Data Privacy for LLM Apps: Stop Leaking Sensitive Data

Defending Against Prompt Injection: A Practical Guide for LLM Apps

Securing AI Agents: The OWASP Agentic Top 10 in Practice

LLM Guard

NeMo Guardrails

promptfoo

Red Team LLM

Constitutional AI

Guardrails

Jailbreak

Prompt Injection

Red-Teaming (AI)