---
description: "Produce a grounded effort and complexity estimate for a task by exploring the codebase read-only."
argument-hint: ""
allowed-tools: "Read, Grep, Glob"
---

Estimate the effort to do the task in `$ARGUMENTS` by grounding it in the real codebase, not in vibes. This is an analysis pass only — read to understand, then deliver an estimate. Do not edit, run, or create anything.

## Scope

Treat `$ARGUMENTS` as the task to size — a feature, a refactor, a migration, a bug ("add SSO via SAML", "split the monolith's billing module into a service", "fix the flaky checkout test"). If it names files, symbols, or routes, those are where your investigation starts.

If `$ARGUMENTS` is empty, do not invent a task. Ask one question — *"What task should I estimate?"* — and stop until they answer.

> [!WARNING]
> Read-only mode. Your only output is the written estimate. An estimate is a range under stated assumptions, never a commitment or a deadline — say so explicitly in the report so nobody quotes the midpoint as a promise.

## Step 1 — Pin down scope and non-scope

Restate the task in one sentence. Then list what is explicitly **out of scope** — the adjacent work a reader might assume is included but isn't (migrations of old data, the admin UI, rollback tooling, docs). Unbounded scope is the single biggest source of estimate error; naming the boundary is half the job.

If the task hides a fork that changes the size by an order of magnitude (rewrite vs. patch? backward-compatible vs. breaking? one provider vs. a pluggable abstraction?), surface it as an open question — do not silently pick the cheap reading.

## Step 2 — Explore the code to ground the estimate

Read enough of the repo to size against reality. Do not guess the structure — find it.

```bash
# Find the symbols, routes, or modules the task touches
grep -rn "checkoutHandler" src/

# Map the surrounding structure and blast radius
find src -path '*billing*' -name '*.ts'

# Gauge how much code already exists vs. needs writing
grep -rln "PaymentProvider" src/
```

Open the entry points, trace one level of callers and callees, and note the seams the change crosses: shared types, config, public APIs, tests, and anything with many inbound references. A change behind a clean interface is small; one that ripples through dozens of call sites is not.

> [!NOTE]
> Existing tests are an effort multiplier in both directions. Good coverage on the touched area shrinks the estimate (you can refactor safely); zero coverage on a critical path inflates it (you must write characterization tests before you dare touch it). Check before you size.

## Step 3 — Decompose into independently-shippable subtasks

Cut the work into subtasks that each land on their own and deliver a checkable result — schema, then store, then the wiring, then tests, then docs. Estimating the whole as one lump is where guesses hide; estimating parts forces you to confront each one. Anything you can't decompose is a sign you don't understand it yet — flag it as a spike.

## Step 4 — Size each subtask and sum

Give each subtask a T-shirt size with a rough hands-on range (these are illustrative — calibrate to the team and stack):

| Size | Rough range | Looks like |
| ---- | ----------- | ---------- |
| S | < ~0.5 day | localized edit, pattern already exists in the repo |
| M | ~0.5–2 days | new code across a couple of files, clear approach |
| L | ~2–5 days | new seam or interface, touches several modules |
| XL | > ~1 week | unknown approach, cross-cutting, or needs a spike first — decompose further |

An XL is a smell: split it until the pieces are L or smaller, or call it a spike whose only deliverable is a smaller estimate. Sum the subtask ranges into a **total range** (low to high), not a single number.

> [!WARNING]
> Do not just add the optimistic ends. Integration, review, and the inevitable "while I was in there" overhead are real — fold them in as their own line or a percentage, and let the high end of the range carry the unknowns rather than burying them.

## Step 5 — Surface risks, dependencies, and assumptions

The estimate is only as good as what could blow it up. List, specifically:

- **Risks / unknowns** that would inflate the number — undocumented behavior, a flaky area, a third-party API you haven't read the docs for, a refactor that might cascade. For each, note roughly how much it could add.
- **Dependencies / sequencing** — what must land first, what's blocked on another team, what can run in parallel.
- **Assumptions** — every "I'm assuming X" you relied on to size it (env exists, no data migration, the happy path only). If an assumption is wrong, the estimate changes — that's the whole point of writing them down.

## Report

Deliver the estimate as your message — it is the whole deliverable.

```markdown
## Effort estimate —

**Scope:**
**Out of scope:**

| # | Subtask | Size | Range |
| - | ------- | ---- | ----- |
| 1 | Define provider config + types | S | < 0.5d |
| 2 | Add SAML assertion parser | L | 2–4d |
| 3 | Wire into the auth middleware | M | 1–2d |
| 4 | Characterization tests for the login path | M | 1–2d |

**Total range:** ~4.5–8.5 days (incl. review + integration)

**Top risks:**
**Dependencies / sequencing:**
**Assumptions:**
```

End with the **recommended first slice**: the smallest subtask that retires the biggest unknown (usually a spike or the riskiest interface). Shipping it first tightens the whole range — call out which assumptions it will confirm or kill. Remind the reader the total is a range under these assumptions, not a date.

---

_Source: https://agentscamp.com/commands/plan/estimate-effort — Command on AgentsCamp._
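
The Step 4 arithmetic (sum the per-subtask ranges, then fold in overhead as a percentage of both ends) can be sketched in Python as below. This is only a worked illustration for the reader; the command itself stays read-only and runs nothing. The `Subtask` and `total_range` names, the 0.25-day lower bound used for an S, and the 20% overhead are all assumptions made for the example, not values taken from any team's calibration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subtask:
    """One independently shippable piece of work with a low/high range in days."""
    name: str
    low_days: float
    high_days: float


def total_range(subtasks, overhead_pct=0.15):
    """Sum the low and high ends separately, then scale both by the overhead.

    overhead_pct is a hypothetical allowance for review and integration;
    the right value depends on the team and should be stated as an assumption.
    """
    low = sum(s.low_days for s in subtasks)
    high = sum(s.high_days for s in subtasks)
    factor = 1 + overhead_pct
    return round(low * factor, 1), round(high * factor, 1)


# Hypothetical breakdown following the schema -> store -> wiring -> tests order.
subtasks = [
    Subtask("schema", 0.25, 0.5),  # S (lower bound is an assumption)
    Subtask("store", 0.5, 2.0),    # M
    Subtask("wiring", 2.0, 5.0),   # L
    Subtask("tests", 0.5, 2.0),    # M
]

low, high = total_range(subtasks, overhead_pct=0.20)
print(f"Total range: ~{low}-{high} days (incl. 20% review + integration)")
```

Keeping the low and high ends as separate sums is what stops the total from collapsing into a single number, and applying the overhead to both ends keeps the allowance visible instead of hiding it inside one subtask.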