DevOps & Infra — AI Agents, Skills & Tools
Agents, skills, guides, tools, and commands for DevOps and infrastructure: 15 curated resources for building with AI coding agents.
LLM Cost Optimizer
Use this agent to cut the cost and latency of an application's LLM API usage without losing quality — audit where the tokens and dollars go, then apply caching, model right-sizing, prompt trimming, batching, and budgets, proven against an eval bar. Examples — "our OpenAI bill tripled, find where the spend is and cut it", "this endpoint's p95 is 8s, bring it down", "right-size models per task and add prompt caching to our chat feature".
Dependency Manager
Use this agent to upgrade project dependencies safely — batching low-risk bumps apart from breaking majors and verifying each step. Examples — clearing months of stale packages, taking a single major version with migration notes, resolving a peer-dependency conflict.
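The split between low-risk bumps and breaking majors can be sketched as below. This is a minimal illustration, not the agent's actual logic: it assumes plain MAJOR.MINOR.PATCH version strings (no pre-release tags), treats a minor bump on a 0.x package as potentially breaking, and uses hypothetical package names.

```python
def parse(version: str) -> tuple[int, int, int]:
    """Parse a plain 'MAJOR.MINOR.PATCH' string (assumption: no pre-release suffix)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def classify(current: str, target: str) -> str:
    """Label a version bump as 'major' (review on its own) or 'low-risk' (batchable)."""
    cur, new = parse(current), parse(target)
    if new[0] != cur[0]:
        return "major"
    # Assumption: for 0.x packages a minor bump may break, so handle it like a major.
    if cur[0] == 0 and new[1] != cur[1]:
        return "major"
    return "low-risk"


def batch(bumps: dict[str, tuple[str, str]]) -> dict[str, list[str]]:
    """Group package names by risk so each group can be upgraded and verified separately."""
    batches: dict[str, list[str]] = {"low-risk": [], "major": []}
    for name, (current, target) in sorted(bumps.items()):
        batches[classify(current, target)].append(name)
    return batches


# Hypothetical pending upgrades: name -> (current, target)
pending = {
    "alpha-lib": ("1.4.2", "1.5.0"),
    "beta-lib": ("2.9.9", "3.0.0"),
    "gamma-lib": ("0.3.1", "0.4.0"),
    "delta-lib": ("5.0.0", "5.0.1"),
}
print(batch(pending))
```

The low-risk group can go in one change with a single test run, while each entry in the major group gets its own change and migration notes.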
Cloud Architect
Use this agent to design a cloud architecture on AWS, GCP, or Azure — compute, networking, data stores, IAM, and cost trade-offs. Examples — choosing serverless vs containers for a new service, designing a multi-account network boundary, picking a database and estimating its monthly cost.
DevOps Engineer
Use this agent for CI/CD, infrastructure, and automation. Examples — writing a CI pipeline, containerizing an app, infrastructure-as-code changes.
Kubernetes Specialist
Use this agent for Kubernetes — manifests, Helm, troubleshooting, scaling, and resource tuning. Examples — debugging a CrashLoopBackOff, writing a Deployment, tuning requests/limits.
SRE Engineer
Use this agent to make reliability measurable: SLIs/SLOs and error budgets, observability, symptom-based alerting, incident response, and capacity. Examples — defining an SLO for a checkout API, fixing a noisy pager, writing a blameless postmortem.
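As a rough illustration of the error-budget arithmetic behind an SLO, the sketch below converts an availability target into allowed downtime and computes how much of a request-based budget is left. The function names and the sample numbers are illustrative assumptions, not part of the agent.

```python
def error_budget_minutes(slo_target: float, window_days: int) -> float:
    """Allowed downtime in minutes for an availability SLO over a rolling window."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    return (1 - slo_target) * window_days * 24 * 60


def budget_remaining(total_requests: int, failed_requests: int, slo_target: float) -> float:
    """Fraction of a request-based error budget still unspent (negative if overspent)."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    allowed_failures = (1 - slo_target) * total_requests
    if allowed_failures == 0:
        raise ValueError("no traffic in the window, so the budget is undefined")
    return 1 - failed_requests / allowed_failures


# A 99.9% target over 30 days leaves roughly 43 minutes of downtime.
print(round(error_budget_minutes(0.999, 30), 1))
# 250 failures out of 1,000,000 requests spends about a quarter of the budget.
print(round(budget_remaining(1_000_000, 250, 0.999), 2))
```

A negative result from `budget_remaining` means the budget is overspent for the window, which is the usual trigger for slowing releases in favor of reliability work.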
Terraform Specialist
Use this agent for Terraform and infrastructure-as-code — module design, remote state, plan/apply safety, drift, and provider pinning. Examples — reviewing a plan for destroys before apply, designing a reusable module, resolving state drift after a console change.
Prompt Cache Optimizer
Restructure an LLM call to maximize prompt-cache hit rate and add response/semantic caching — move the stable prefix (system prompt, instructions, few-shot, context) to the front and variable input to the end, set cache breakpoints, and measure the hit rate and savings. Use when repeated calls share large common context and token cost or latency is too high.
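The prefix-first ordering described above can be sketched without any provider SDK. This is a simplified local model: it assumes a cache keyed on the exact stable prefix and estimates the hit rate by hashing that prefix. Real providers apply their own minimum lengths, breakpoints, and expiry rules, and report cached tokens in their usage metadata, so treat the numbers as an approximation. All names here are hypothetical.

```python
import hashlib


def build_prompt(system: str, few_shot: list[str], context: str, user_input: str) -> tuple[str, str]:
    """Return (stable_prefix, variable_suffix): stable parts first, user input last."""
    stable_prefix = "\n\n".join([system, *few_shot, context])
    return stable_prefix, user_input


def prefix_key(prefix: str) -> str:
    """Hash the stable prefix; identical prefixes map to the same cache key."""
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()


def prefix_hit_rate(prefixes: list[str]) -> float:
    """Share of calls whose prefix was already seen earlier in the sequence."""
    seen: set[str] = set()
    hits = 0
    for prefix in prefixes:
        key = prefix_key(prefix)
        if key in seen:
            hits += 1
        else:
            seen.add(key)
    return hits / len(prefixes) if prefixes else 0.0


SYSTEM = "You are a support assistant."
FEW_SHOT = ["Q: How do I reset my password?\nA: Use the account settings page."]
CONTEXT = "Product manual, version 3."

questions = ["Where is my invoice?", "How do I export data?", "Can I add a user?"]
prefixes = [build_prompt(SYSTEM, FEW_SHOT, CONTEXT, q)[0] for q in questions]
print(f"estimated prefix hit rate: {prefix_hit_rate(prefixes):.2f}")
```

If the user input were interpolated into the system prompt instead, every prefix would differ and the estimated hit rate would drop to zero, which is the failure mode the reordering is meant to avoid.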
Dependency Audit
Audit project dependencies for known vulnerabilities and turn the raw scanner output into a triaged, prioritized upgrade plan. Use when an audit is noisy, a CVE was reported, or you need to know which advisories actually matter.
LLM Cost and Latency Engineering: Caching, Right-Sizing, and p95 Budgets
A practical playbook for cutting LLM cost and tail latency — caching, model right-sizing, prompt trimming, and enforced p95 budgets — without losing quality.
LLM Gateways Compared: Portkey vs Helicone vs LiteLLM for Caching & Cost Control
How Portkey, Helicone, and LiteLLM compare for caching, cost control, and observability — each one's 2026 status and which fits self-hosted vs. hosted.
Helicone
Open-source LLM observability and AI gateway with one-line integration — logging, tracing, caching, and cost/latency tracking across providers.
LiteLLM
Call 100+ LLM APIs with one OpenAI-format interface — as a Python library or a self-hosted gateway/proxy.
Portkey
An AI gateway and LLMOps platform: route to many LLMs through one API with caching, retries, fallbacks, load balancing, guardrails, and full observability.
Set Perf Budget
Define and enforce a cost and latency budget for an LLM feature or endpoint — set p95/p99 latency and cost-per-request ceilings, instrument to measure them against real traffic, and wire a check that fails when the budget is breached.
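A budget check of this kind might look like the sketch below. It assumes latency and cost samples have already been collected from real traffic, uses the nearest-rank method for percentiles, and compares mean cost per request to the ceiling. The `Budget` fields and the thresholds are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    p95_latency_s: float
    p99_latency_s: float
    max_cost_per_request_usd: float


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct * n / 100
    return ordered[max(rank, 1) - 1]


def check_budget(latencies_s: list[float], costs_usd: list[float], budget: Budget) -> list[str]:
    """Return a list of breach messages; an empty list means the budget holds."""
    breaches = []
    p95 = percentile(latencies_s, 95)
    p99 = percentile(latencies_s, 99)
    mean_cost = sum(costs_usd) / len(costs_usd)
    if p95 > budget.p95_latency_s:
        breaches.append(f"p95 latency {p95:.2f}s exceeds {budget.p95_latency_s:.2f}s")
    if p99 > budget.p99_latency_s:
        breaches.append(f"p99 latency {p99:.2f}s exceeds {budget.p99_latency_s:.2f}s")
    if mean_cost > budget.max_cost_per_request_usd:
        breaches.append(
            f"cost per request ${mean_cost:.4f} exceeds ${budget.max_cost_per_request_usd:.4f}"
        )
    return breaches


budget = Budget(p95_latency_s=2.0, p99_latency_s=5.0, max_cost_per_request_usd=0.01)
latencies = [1.0] * 95 + [9.0] * 5   # a slow tail hidden behind a healthy median
costs = [0.005] * 100
for message in check_budget(latencies, costs, budget):
    print("BUDGET BREACH:", message)
```

In a CI job or scheduled check, a non-empty result would typically be turned into a non-zero exit code so the breach blocks the change or raises an alert.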