A sandbox for agent skills that produce reviewable artifacts.
When an agent finishes a coding task, the human review usually falls back to either reading the diff or rubber-stamping it because the diff is too large to read. Neither preserves real human judgment in the loop.
The skills iterated on in this repo replace that surface with per-change interactive artifacts — narrative walkthroughs of what changed and why, with live before/after data, written to be legible to PMs and QA, not just engineers reading code.
View types an artifact can compose
A single feature diff can be broken into pieces shaped for different readers and different questions. Below: the view types these skills currently produce, each with a tiny example.
A one-sentence summary up top, before any code. The reader who only opens the page for ten seconds still leaves with the gist.
Group the inbox by sender
Conversations from the same sender now collapse into one row. Click a row to expand. Default state matches old behavior.
Mark the words a reviewer should not skim past. Used sparingly — highlighting everything is highlighting nothing.
The change adds a single field to the request payload. Existing clients keep working because the new field is optional and defaulted to null. The only behavior shift happens when the field is present and non-empty.
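The rule in that paragraph can be sketched in TypeScript. This is a minimal illustration, not the actual change: the source does not name the new field, so `note`, `RequestPayload`, and `usesNewBehavior` are all hypothetical.

```typescript
// Hypothetical payload shape: `note` stands in for the unnamed new field.
interface RequestPayload {
  id: string;
  note?: string | null; // optional; an omitted field is treated like null
}

// True only when the new field is present and non-empty.
// Absent, null, and "" all keep the old behavior.
function usesNewBehavior(payload: RequestPayload): boolean {
  const note = payload.note ?? null; // default to null when omitted
  return note !== null && note.length > 0;
}

const legacyClient = usesNewBehavior({ id: "abc123" });             // false
const explicitNull = usesNewBehavior({ id: "abc123", note: null }); // false
const emptyString = usesNewBehavior({ id: "abc123", note: "" });    // false
const newClient = usesNewBehavior({ id: "abc123", note: "hi" });    // true
```

Keeping the whole decision in one predicate gives a reviewer a single place to confirm that existing clients are unaffected.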
Progressive disclosure. The reviewer opens only the steps that apply to them: a PM might read step 1, while an engineer skips to step 3.
A small diff with the prose annotation right next to it — not above, not in a separate file.
  function handle(req, res) {
-   const id = req.params.id;
+   const id = parseId(req.params.id);
    return load(id);
  }
parseId throws on malformed input. The old code would have 500’d on the database call instead, with a less useful stack.
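For readers who want to see what such a helper might look like, here is a minimal sketch. The source shows only the call site, so the body of `parseId`, the accepted ID format (`ID_PATTERN`), and the error message are all assumptions made for illustration.

```typescript
// Assumed format: 1 to 32 lowercase letters or digits, such as "abc123".
// The real format is not stated in the source.
const ID_PATTERN = /^[a-z0-9]{1,32}$/;

// Returns the id unchanged when it matches, and throws otherwise, so the
// failure surfaces at the boundary instead of inside the database call.
function parseId(raw: string): string {
  if (!ID_PATTERN.test(raw)) {
    throw new Error(`malformed id: ${JSON.stringify(raw)}`);
  }
  return raw;
}

const accepted = parseId("abc123"); // "abc123"

let rejectedMessage = "";
try {
  parseId("abc 123"); // contains a space, so it fails the pattern
} catch (err) {
  rejectedMessage = err instanceof Error ? err.message : String(err);
}
```

Failing early with the offending value in the message is what gives the more useful stack the annotation refers to.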
Side-by-side data shape. A schema change becomes visible at a glance without reading the migration.
{
  "id": "abc123",
  "title": "Demo task"
}
{
  "id": "abc123",
  "title": "Demo task",
  "status": "open"
}
Same change, two layers of explanation. PM and engineer read the same artifact and find their own depth.
The list now loads about three times faster on big accounts. No visual change — same rows, same order.
Replaced N+1 fetch with a join + group-by; added covering index on (account_id, created_at). p95 600ms → 180ms.
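The difference in query shape can be illustrated with an in-memory stand-in. This is only a sketch under stated assumptions: the rows, the `sender` grouping key, and the `queryCount` counter are hypothetical, no real database is involved, and the covering index is not modeled.

```typescript
// Hypothetical rows; the source does not name the tables involved.
interface Row { accountId: string; sender: string; createdAt: number; }

const rows: Row[] = [
  { accountId: "a1", sender: "ann", createdAt: 3 },
  { accountId: "a1", sender: "bob", createdAt: 2 },
  { accountId: "a1", sender: "ann", createdAt: 1 },
];

let queryCount = 0; // stands in for database round trips

// Old shape: one query for the sender list, then one more per sender.
function countsNPlusOne(accountId: string): Map<string, number> {
  queryCount++;
  const senders = Array.from(
    new Set(rows.filter((r) => r.accountId === accountId).map((r) => r.sender)),
  );
  const counts = new Map<string, number>();
  for (const sender of senders) {
    queryCount++; // the "+1", repeated once per sender
    const matches = rows.filter((r) => r.accountId === accountId && r.sender === sender);
    counts.set(sender, matches.length);
  }
  return counts;
}

// New shape: a single round trip that groups as it scans,
// playing the role of one join + GROUP BY.
function countsGrouped(accountId: string): Map<string, number> {
  queryCount++;
  const counts = new Map<string, number>();
  for (const r of rows) {
    if (r.accountId !== accountId) continue;
    counts.set(r.sender, (counts.get(r.sender) ?? 0) + 1);
  }
  return counts;
}

queryCount = 0;
const before = countsNPlusOne("a1");
const queriesBefore = queryCount; // 3: one list query plus two senders

queryCount = 0;
const after = countsGrouped("a1");
const queriesAfter = queryCount; // 1
```

Both functions return the same counts; only the number of round trips changes, which is why the reader-facing summary can say there is no visual change.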
Chunks reveal one at a time, paced by the reviewer. A side rail shows where you are without spoiling what’s next.
2. Where the new check lives
The validator now runs once at the request boundary instead of inside each handler. Handlers stay shorter and a forgotten call site can’t bypass it.
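One way to picture "once at the request boundary" is a wrapper that every route passes through. The sketch below is an assumption-laden illustration, not the code under review: `Req`, `Res`, `isValid`, `atBoundary`, and `showTask` are hypothetical names, and the validity rule (a string `id`) is invented for the example.

```typescript
// Hypothetical minimal request and response shapes.
interface Req { body: unknown; }
interface Res { status: number; body: string; }
type Handler = (req: Req) => Res;

// Hypothetical rule: the body must be an object with a string `id`.
function isValid(body: unknown): body is { id: string } {
  return typeof body === "object" && body !== null &&
    typeof (body as { id?: unknown }).id === "string";
}

// Wraps a handler so validation runs once, before the handler is reached.
// An invalid request gets a 400 and the handler is never called.
function atBoundary(handler: Handler): Handler {
  return (req) =>
    isValid(req.body) ? handler(req) : { status: 400, body: "invalid request" };
}

// The handler no longer validates; it assumes a well-formed body.
const showTask: Handler = (req) => ({
  status: 200,
  body: `task ${(req.body as { id: string }).id}`,
});

const route = atBoundary(showTask);
const ok = route({ body: { id: "abc123" } });    // status 200, body "task abc123"
const bad = route({ body: { title: "no id" } }); // status 400
```

Because registration goes through `atBoundary`, a new handler cannot skip the check by forgetting to call it.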
A short, specific punch list. Reviewer-facing, not engineer-facing — phrased as things to do, not things to know.
- Send a request that omits the new field — confirm it still works.
- Look for callers in older modules that might still hit the old path.
- If load looks the same after deploy, check the cache hit rate didn’t drop.
The engineer-tier ‘what was actually edited’ table. Optional reading, last in the artifact, for the curious.
| File | What happened |
|---|---|
| src/api/router.ts | New route registered |
| src/lib/auth.ts | Validator factored out of two callers |
| tests/auth.test.ts | Three new cases for the boundary check |
The reviewer’s exit point. Same affordance as a code-review tool, attached to a narrative artifact instead of a diff.
The other skill is forward-looking — a planning walk before the code is written. Different palette: paper, pen, mentor voice.
Before you reach for state, ask: what’s the smallest piece of this that has to be remembered between requests?
Sketch this on paper before continuing:
Draw the boxes that hold state. Draw an arrow into each box showing what writes to it. If a box has more than one arrow in, circle it.
request ──▶ [ validator ] ──▶ [ handler ] ──▶ store
                  │
                  └── on fail ──▶ 4xx (no store write)