# The Anatomy of a Perfect AI Agent Task
A well-crafted task for an AI coding agent is essentially context engineering — you’re deliberately curating the minimum set of information the agent needs to produce the right output on the first try. Rather than pre-loading everything up front, the best approach combines focused instructions with enough pointers that the agent can pull in additional context just-in-time as it works (Anthropic — Effective Context Engineering). Below is a breakdown of every element that matters, why it matters, and a full example at the end that ties it all together.
## When to Use This
The seven elements below describe the upper-bound shape of a non-trivial task spec, not a baseline checklist. For trivial work — fixing a typo, renaming a variable, anything where the agent has no real risk of getting it wrong — skip the elaborate spec. (The companion sizing post uses “describable in one sentence” as a sizing test, not a triviality test — well-sized tasks often fit in one sentence yet still warrant a full spec when there are constraints, edge cases, or pitfalls to communicate. The worked example below is one such task.) Even for non-trivial tasks, treat these elements as a maximum rather than a minimum: frontier LLMs reliably follow only ~150–200 instructions before performance degrades, and every irrelevant detail dilutes the signal of the rest (HumanLayer: Writing a Good CLAUDE.md).
## 1. State the Goal, Not the Steps
Lead with the outcome you want, not a micro-managed sequence of instructions. Agents perform better when they understand the “why” and can plan their own approach.
Bad: “Open user.go, find the CreateUser function, add a field called PhoneNumber…”
Good: “Add phone number support to user registration, including validation, storage, and API response.”
“The best task descriptions share three properties: they state the goal, provide constraints, and define done.” — Claude Directory: Context Engineering for Claude Code
## 2. Provide Architectural Context the Agent Can’t Infer
The agent can read your code. What it can’t read is the reasoning behind your architectural decisions, team conventions, or the “why” behind structural choices. Include only what’s not derivable from the codebase itself.
Include things like:
- Why the architecture is shaped a certain way (e.g., “We use the repository pattern to keep DB logic out of handlers”)
- Relevant files and entry points (saves the agent from searching blindly and burning context window)
- Technology choices and versions (e.g., “Go 1.22, sqlc for query generation, chi router”)
- Domain-specific terminology the agent might misinterpret
“Claude already knows what your project is after reading a few files. What it needs is information it can’t derive from reading code.” — Claude Directory: Context Engineering
That said, there’s a discipline to this — more context is not always better. Research suggests frontier LLMs can reliably follow roughly 150–200 instructions before performance degrades, and broader context-rot studies show models attend to context less reliably as input grows (Chroma: Context Rot — Hong et al., 2025). Every irrelevant detail you add dilutes the signal of the details that actually matter.
“Your CLAUDE.md file should contain as few instructions as possible — ideally only ones which are universally applicable. An LLM will perform better when its context window is full of focused, relevant context compared to when it has a lot of irrelevant context.” — HumanLayer: Writing a Good CLAUDE.md
## 3. Define Explicit Constraints and Non-Goals
This is where most tasks fall apart. Without boundaries, agents will happily refactor your auth layer when all you asked for was a new field on a struct.
- Constraints: What rules must be followed (e.g., “Do not change the public API contract,” “Use the existing `validate` package, do not introduce a new dependency”)
- Non-goals: What is explicitly out of scope (e.g., “Do not modify the frontend,” “Do not refactor existing tests”)
“Without constraints, AI might miss pagination for list APIs, use field injection instead of constructor injection, or not adhere to your project’s package structure.” — JetBrains: Coding Guidelines for AI Agents
## 4. Provide Concrete Examples and Reference Implementations
One of the highest-leverage things you can do. Point the agent at an existing implementation in your codebase that follows the pattern you want replicated.
- “Follow the same pattern as `internal/order/handler.go` for the new endpoint.”
- “See `migrations/003_add_email.sql` for the migration format we use.”
“Include helpful examples for reference. ❌ ‘Implement tests for class ImageProcessor’ → ✅ ‘Implement tests for class ImageProcessor. Check text_processor.py for test organization examples.’” — Augment Code: Best Practices for AI Coding Agents
## 5. Define “Done” with Acceptance Criteria
If you don’t define what “done” looks like, the agent will decide for you — and you probably won’t agree.
Acceptance criteria should be:
- Observable (can be verified by running something)
- Specific (not “should work correctly”)
- Testable (ideally map to test cases)
“Create a set of tests that will determine if the generated code works based on your requirements.” — Google Cloud: Five Best Practices for AI Coding Assistants
## 6. Include Verification Commands
Tell the agent exactly how to confirm its own work. This is the difference between “I think it works” and “it passes the build.”
```sh
go test ./internal/user/...
go vet ./...
golangci-lint run
curl -X POST localhost:8080/api/v1/users -d '{"phone": "+1234567890"}' | jq .
```
“Claude Code’s best practices emphasize including Bash commands for verification. This gives Claude persistent context it can’t infer from code alone.” — Claude Code Docs: Best Practices
## 7. Call Out Edge Cases and Known Pitfalls
You know things about your system the agent doesn’t. If there’s a footgun, flag it. If there’s a non-obvious coupling between modules, say so.
- “The `user_id` column has a unique constraint — the migration must handle existing duplicates.”
- “The `Validate()` method is called both at the handler level and inside the repository. Don’t double-validate.”
## The Full Example
A non-trivial feature decomposes into a handful of well-sized tasks. Take adding an optional phone number to user registration — accepted on signup, persisted on the user record, and returned by the user API. That feature splits into four tasks, one per architectural layer:
- Migration — Add a nullable `phone_number` column with reversible up/down SQL.
- Model + sqlc — Update the `User` struct and regenerate sqlc queries.
- Service + validation — Add `ValidatePhone` to `UserService` using `validate.PhoneE164`, with unit tests.
- Handler + integration — Wire the field through `POST` and `GET /api/v1/users` and add integration tests.
The third is spec’d out in full below as the worked example. It’s the strongest illustration of the seven elements at the right scope: the diff fits in one sentence, it stays inside a single layer, the agent reads ~5 files, the change lands well under the 200 LOC ceiling, and it can be verified independently — passing every gate of the companion sizing post’s decision flowchart.
## Task Spec: Add E.164 phone validation to UserService
### Goal
Phone numbers submitted to user registration must be rejected at the service layer when they aren't valid E.164. This task delivers that check; handler wiring and DB persistence are separate tasks.
### Architectural Context
- Semantic validation belongs in the service, not the handler. Handler does null/shape; service owns format and bounds.
- `UserService.ValidateEmail` is the canonical example of this split — match its shape.
### Relevant Files
- `internal/user/service.go` — add `ValidatePhone` here.
- `internal/user/service_test.go` — add tests here.
- `internal/pkg/validate/phone.go` — read-only reference for `PhoneE164` and `validate.Error`.
### Reference Implementation
Mirror `UserService.ValidateEmail` in `service.go`:
- Signature: `func (s *UserService) ValidatePhone(phone *string) error`.
- Nil pointer → return nil. Empty string → return error.
- Return the `*validate.Error` from `PhoneE164` unwrapped — no `fmt.Errorf`.
- Copy the table-driven layout from `TestUserService_ValidateEmail`.
### Constraints
- Use `validate.PhoneE164`. No regex, no new dependencies.
- Don't touch `UserRepository` or its mock — validation is pure.
- Don't wrap the error; the handler relies on `errors.As(&validate.Error{})` to map it to HTTP 422.
### Non-Goals
No handler, migration, sqlc, or integration-test changes. No edits to `ValidateEmail` or other unrelated methods.
### Edge Cases
- `phone == nil` → return nil (field not provided).
- `*phone == ""` → return `validate.Error` (malformed input).
- Strict E.164: `1234567890` (no leading `+`) must fail.
- The handler already checks the JSON field is present and is a string — don't re-check those concerns here.
### Acceptance Criteria
1. `ValidatePhone(phone *string) error` on `UserService`.
2. `nil` phone → returns nil.
3. Empty or non-E.164 → returns `*validate.Error` (verifiable via `errors.As`).
4. Valid E.164 (e.g., `+14155552671`) → returns nil.
5. At least four test cases: valid, invalid, nil, empty.
6. Only `service.go` and `service_test.go` change.
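The table-driven layout criterion 5 asks for might look like the standalone sketch below. It is an approximation, not the spec’s real code: the E.164 check is a stand-in regex, the real version would live in `service_test.go` with `*testing.T`, and the real service returns the `validate.Error` unwrapped rather than a `fmt.Errorf`-wrapped one.

```go
package main

import (
	"fmt"
	"regexp"
)

var e164 = regexp.MustCompile(`^\+[1-9]\d{1,14}$`) // stand-in for validate.PhoneE164

// validatePhone approximates UserService.ValidatePhone for this sketch only.
func validatePhone(phone *string) error {
	if phone == nil {
		return nil // optional field omitted
	}
	if !e164.MatchString(*phone) {
		return fmt.Errorf("not E.164: %q", *phone) // real code returns *validate.Error unwrapped
	}
	return nil
}

func ptr(s string) *string { return &s }

func main() {
	// The four cases the acceptance criteria require: valid, invalid, nil, empty.
	cases := []struct {
		name    string
		phone   *string
		wantErr bool
	}{
		{"valid E.164", ptr("+14155552671"), false},
		{"no leading plus", ptr("1234567890"), true},
		{"field not provided", nil, false},
		{"empty string", ptr(""), true},
	}
	for _, tc := range cases {
		err := validatePhone(tc.phone)
		if gotErr := err != nil; gotErr != tc.wantErr {
			fmt.Printf("FAIL %s: got err=%v, wantErr=%v\n", tc.name, err, tc.wantErr)
		} else {
			fmt.Printf("PASS %s\n", tc.name)
		}
	}
}
```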
### Verification
```sh
go test ./internal/user/... -v -run TestValidatePhone
go vet ./...
golangci-lint run ./internal/user/...
```
## Why This Works
| Element | Purpose |
|---|---|
| Goal | Anchors the agent on what and why, not how |
| Architectural context | Provides knowledge the agent can’t infer from code |
| Relevant files | Eliminates unnecessary exploration and context burn |
| Reference implementation | “Do it like this” is worth 1,000 words of description |
| Constraints + non-goals | Prevents scope creep and unsolicited refactors |
| Edge cases | Surfaces domain knowledge only you have |
| Acceptance criteria | Defines “done” in observable, testable terms |
| Verification commands | Lets the agent self-check before declaring victory |
## References
- Anthropic — Effective Context Engineering for AI Agents — Why just-in-time context retrieval and focused instructions outperform pre-loading everything into the prompt.
- Claude Code Docs — Best Practices — Including verification commands and CLAUDE.md conventions so the agent can self-check its work.
- Claude Directory — Context Engineering for Claude Code — The task trifecta: state the goal, provide constraints, define done.
- Augment Code — Best Practices for Using AI Coding Agents — Pointing agents at reference implementations and reviewing changes after each sub-task.
- JetBrains — Coding Guidelines for Your AI Agents — How missing constraints lead agents to skip pagination, misuse injection patterns, and ignore project conventions.
- Google Cloud — Five Best Practices for AI Coding Assistants — Planning-first workflow and using tests as acceptance criteria for generated code.
- HumanLayer — Writing a Good CLAUDE.md — Why fewer, focused instructions outperform instruction overload, and the ~150–200 instruction ceiling for frontier models.
- Chroma — Context Rot (Hong et al., 2025) — Empirical study across 18 LLMs showing that attention to context degrades non-uniformly as input length grows.