Provider Fallback Wrapper
Wrap LLM calls so a provider outage, rate limit, or timeout degrades gracefully — with multi-provider fallback, bounded retries with backoff, and timeouts. Use when an app depends on a single model/provider and needs production resilience.
Install to ~/.claude/skills/provider-fallback-wrapper/SKILL.md
A single-provider LLM call is a single point of failure. This skill wraps calls with timeouts, bounded retries with backoff (retrying only retryable errors), and fallback to an alternate model/provider — so a rate limit or outage degrades gracefully instead of taking the feature down.
LLM providers have outages, rate limits, and latency spikes. If your feature calls one model directly, every one of those is an incident. This skill wraps LLM calls with the resilience patterns that keep the feature up: timeouts, sensible retries, and fallback to an alternate model or provider.
When to use this skill
- A production feature depends on a single model/provider and needs to survive outages and rate limits.
- You're seeing user-facing failures from transient
429/5xx/timeout errors. - You want a cheaper/faster primary model with a stronger fallback (or vice versa).
Instructions
- Set a timeout. Every call gets a deadline. A hung provider should fail fast into retry/fallback, not block the request indefinitely.
- Retry only what's retryable. Retry transient failures — timeouts, rate limits (
429), and5xx— with exponential backoff and jitter and a hard attempt cap. Do not retry non-retryable errors (400bad request,401auth, content-policy refusals); retrying those just wastes time and money. - Fall back across providers/models. On exhausting retries (or on specific errors), route to an alternate model or provider. Decide the order by cost/quality and keep the request/response shape stable so callers don't care which served it. A gateway like LiteLLM or OpenRouter can do fallback for you; otherwise implement it explicitly.
- Mind semantic differences. Fallback models may differ in format adherence and quality — re-apply structured-output validation after fallback, and don't silently downgrade a critical response without noting it.
- Make it observable. Log which provider served each request, retry counts, and fallback events, and emit metrics so you can see when you're leaning on the fallback (a signal the primary is degraded).
- Guard cost. Fallbacks and retries cost tokens; cap attempts and consider a circuit breaker that stops hammering a provider that's clearly down.
WARNING
Don't retry non-idempotent, side-effecting calls blindly — for tool-executing agents, a naive retry can repeat an action. Retry the model call, but make any side effects idempotent (see the agent tool-calling guidance).
NOTE
Fallback adds resilience, not correctness. A degraded fallback model can still produce worse output — validate it, and surface when you're running on the backup.
Output
A wrapper around the app's LLM calls implementing timeouts, retryable-only backoff retries, multi-provider/model fallback, validation after fallback, and logging/metrics — with attempt and cost caps.
Related
- Calling Any Model: Unified LLM Gateways & SDKs in 2026Why teams put a unified layer in front of LLM providers — and how LiteLLM, OpenRouter, and the Vercel AI SDK compare for fallback and cost control.
- LiteLLMCall 100+ LLM APIs with one OpenAI-format interface — as a Python library or a self-hosted gateway/proxy.
- OpenRouterA hosted unified API to hundreds of models from many providers, with one key, one bill, and automatic fallbacks.
- LLM Integration EngineerUse this agent to add an LLM feature to an application and make it production-grade — typed/structured output, streaming, provider fallback and retries, caching, and cost/latency controls. Examples — "add an AI summary endpoint to our app", "our LLM calls return unparseable JSON and break, make them reliable", "add streaming and a fallback provider to our chat feature".