How We Build Rye With Claude Code: Engineering with AI Agents
Rye
Three loops we've landed on — building, investigating, and operating — and the skills we wrote to make them repeatable.
TL;DR / Key Takeaways
We build agentic commerce infrastructure. We also build it with agents. Claude Code is embedded in our daily engineering workflow across three patterns: building (shipping fixes and features), investigating (debugging messy, cross-system failures), and operating (the housekeeping that quietly eats engineering hours).
The highest-leverage move we've found is noticing a workflow we keep running by hand and encoding it as a reusable Claude Code skill. If the loop has a clear "did this actually work?" step, it's a skill candidate.
Our hero example is /fix-product-data-extraction — one command that pulls a failed scrape from storage, diagnoses the bug, spins up a worktree, implements the fix, verifies end-to-end, writes a regression test, and opens the PR.
Claude is cheap to point at messy, unstructured data a human would find tedious — walls of HTML, logs scattered across GCP, AWS, and Grafana. It doesn't get bored.
We treat Claude like a junior engineer who needs goals and firm supervision, not a consultant we defer to. That posture matters more than any individual prompt.
Rye operates at Layer 5 of the agentic commerce stack — checkout execution — and Layer 6 — product data. Both layers live downstream of the open web, which means our production problems look like: a retailer pushed a site update at 2 AM and our extraction broke, or one proxy region is suddenly getting throttled, or a checkout that passed tests yesterday now fails silently.
The work is messy, real, and production-critical. It's exactly the kind of work where "AI can automate that" sounds nice and usually doesn't survive contact with the codebase. So we started small, built trust incrementally, and paid attention to what actually worked.
1. Building — Encoding repeatable loops as skills
The highest-return pattern we've found is simple: notice a workflow you keep running by hand, then write a Claude Code skill that runs it for you.
Our hero example is /fix-product-data-extraction.
Rye's product data pipeline extracts structured product information from arbitrary merchant storefronts. When an extraction fails on a fixable error — not a hard bot-detection block — we persist the offending page to our internal capture store. We built that capture pipeline long before we had agents in the loop. It turned out to be exactly the substrate a skill needs.
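As a rough illustration of what that capture substrate does, here is a minimal sketch. The error taxonomy, the `Capture` record, and `maybe_capture` are all hypothetical stand-ins; Rye's actual pipeline and its categories are internal:

```python
from dataclasses import dataclass

# Hypothetical error categories -- illustrative, not Rye's real taxonomy.
FIXABLE_ERRORS = {"missing_selector", "schema_mismatch", "parse_error"}
HARD_BLOCKS = {"bot_detection", "captcha", "access_denied"}

@dataclass
class Capture:
    url: str
    error: str
    html: str

def maybe_capture(url: str, error: str, html: str, store: list) -> bool:
    """Persist the failing page only when the error looks fixable in code."""
    if error in HARD_BLOCKS:
        return False  # a bot-detection block is not something a code fix addresses
    if error in FIXABLE_ERRORS:
        store.append(Capture(url=url, error=error, html=html))
        return True
    return False
```

The point of the gate is selectivity: the store only accumulates pages an agent has a realistic chance of fixing.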
Here's the workflow we were running by hand:
1. Check our capture store for recent fixable failures.
2. Pull the captured HTML for one of them.
3. Run the product extraction logic against it locally and capture the error.
4. Either declare the page unfixable (no product data in the markup) or diagnose the bug in our code.
5. Spin up a git worktree and implement the fix.
6. Verify the extracted product actually matches the requested URL.
7. If the bug is novel, write a regression test.
8. Open a PR with a summary of the bug, the fix, and evidence of the end-to-end run working.
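In code terms, the loop above might look something like the following sketch. Every helper passed in (`run_extraction`, `diagnose`, `apply_fix`, `matches_url`, `open_pr`) is a hypothetical stand-in for a real step, not Rye's actual implementation:

```python
def fix_extraction_loop(store, run_extraction, diagnose, apply_fix, matches_url, open_pr):
    """Hypothetical sketch of the /fix-product-data-extraction loop."""
    failure = store.pop()                                # steps 1-2: pull a captured failure
    product, error = run_extraction(failure["html"])     # step 3: reproduce the error locally
    diagnosis = diagnose(failure["html"], error)         # step 4: unfixable page vs. code bug
    if diagnosis == "unfixable":
        return {"status": "unfixable"}
    apply_fix(diagnosis)                                 # step 5: worktree + fix
    product, error = run_extraction(failure["html"])     # re-run extraction with the fix applied
    if error or not matches_url(product, failure["url"]):  # step 6: the verification gate
        return {"status": "verification_failed"}
    return {"status": "fixed", "pr": open_pr(diagnosis)}   # steps 7-8: regression test + PR
```

Note that the happy path cannot be reached without passing the `matches_url` gate, which is the property the next paragraph argues for.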
We ran this loop often enough that the friction got obvious. So we turned it into a skill. Now one command — /fix-product-data-extraction — does the whole thing. Claude pulls the HTML, runs the extraction, diagnoses the bug, spins up the worktree, implements the fix, verifies, writes the regression test, and opens the PR. We review. We ship.
The step that makes this skill trustworthy is step 6: verification against the requested URL. It's easy to miss why this matters. Merchant product pages routinely contain related items, recommendations, and "customers also bought" blocks — multiple product structures on the same page. A "fixed" extractor can silently start returning the wrong product while your tests still pass. Without the verification step, we'd never know. With it, the skill has a built-in "did this actually work?" gate.
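One way to implement that gate is to canonicalize both URLs before comparing them, so tracking parameters and `www.` prefixes don't cause false mismatches. A minimal sketch (the normalization rules here are assumptions, not Rye's actual logic):

```python
from urllib.parse import urlparse

def canonical(url: str) -> tuple:
    """Reduce a product URL to (host, path) so query strings and a
    leading 'www.' don't cause false mismatches."""
    parsed = urlparse(url.lower())
    return (parsed.netloc.removeprefix("www."), parsed.path.rstrip("/"))

def product_matches_request(extracted_url: str, requested_url: str) -> bool:
    """The gate: the extracted product must come from the page we asked for,
    not from a 'customers also bought' block on the same page."""
    return canonical(extracted_url) == canonical(requested_url)
```

The comparison is deliberately strict on the path: a related-item URL on the same host still fails the check.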
That's the pattern. Every skill we trust in production has a verification step baked in.
Another example in the same bucket: we recently shipped a batch of improvements to our proxy selection logic. Exploring the option space by hand would have taken five times longer. Claude ran the sweeps, surfaced the winning configurations, and we shipped the improvements in an afternoon.
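The shape of that sweep is ordinary grid search. A hedged sketch, where `score` stands in for "run this configuration against a sample of requests and return its success rate" and the parameter names are illustrative:

```python
import itertools

def sweep_proxy_configs(score, regions, concurrency_levels, retry_budgets):
    """Hypothetical parameter sweep over proxy-selection settings.
    `score` is a stand-in for running a config against sample traffic."""
    results = []
    for config in itertools.product(regions, concurrency_levels, retry_budgets):
        results.append((score(*config), config))
    results.sort(reverse=True)  # best-scoring configuration first
    return results[:3]          # surface the winners for human review
```

The agent's job is running the exhaustive, boring middle; the human's job is picking among the top few and shipping.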
The rule: if a debugging loop has clear verification criteria, it's a skill candidate. Skills compound.
2. Investigating — Claude as a fast analyst over messy data
Not every problem has a clean reproduction. Some just have exhaust — logs, HTML, metrics — and a question.
Claude is surprisingly fast at parsing a wall of HTML markup and identifying the bug in extraction code once it has the offending page in context. The same strength shows up during incidents: point Claude at a messy pile of logs and ask it to reconstruct what happened.
Our incident investigations usually span three or four systems. Application logs live in GCP. Some infrastructure logs sit in AWS. Grafana owns the metric dashboards. The failure often surfaces in a MongoDB collection. Stitching all of that into a coherent story used to be the slow, tedious work at the front of every postmortem.
Claude does it fast. We point it at the signal sources, describe the symptom, and let it correlate — running analytics on the fly, counting across time windows, cross-joining against request IDs, building up a hypothesis iteratively. The output isn't always right on the first pass. But it gets us from "something went wrong" to a working hypothesis in minutes instead of an afternoon. We verify everything it produces before acting on it. That part is non-negotiable.
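The core of that correlation step is mechanical: merge entries from every source, key them by request ID, and order each group by timestamp. A simplified sketch, assuming each source yields dicts with at least `request_id` and `ts` keys (real GCP, AWS, and Mongo schemas differ and would need adapters):

```python
from collections import defaultdict

def correlate_by_request_id(*sources):
    """Stitch log entries from several systems into one timeline per request ID."""
    timeline = defaultdict(list)
    for source in sources:
        for entry in source:
            timeline[entry["request_id"]].append(entry)
    for events in timeline.values():
        events.sort(key=lambda e: e["ts"])  # one coherent story per request
    return dict(timeline)
```

What the agent adds on top of this mechanical join is the iterative part: noticing which request IDs look anomalous and deciding what to count next.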
Claude's edge here is stamina. It's cheap to point at disparate, unstructured sources a human would find tedious to stitch together. It doesn't get bored. It doesn't give up. It will happily reread a 20MB log file for the fourth time because we asked it to check one more thing.
3. Operating — Running the shop, not just writing code
Not all agent work is heroic. A lot of it is the housekeeping that quietly eats engineering hours.
Our internal skill here is called /i-hate-pushups. The name is a joke. The pain is real. It tidies our Linear board by creating and associating tickets for our open GitHub PRs — the project-management chore that everyone agrees should happen and nobody actually does consistently.
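Stripped of the API plumbing, the decision at the heart of that chore is a set difference: which open PRs have no ticket pointing at them? A hypothetical sketch of that core (the data shapes are assumptions, not Linear's or GitHub's actual schemas):

```python
def prs_needing_tickets(open_prs, tickets):
    """Hypothetical core of an /i-hate-pushups-style sync: find open PRs with
    no associated ticket. `open_prs` is a list of PR URLs; `tickets` maps
    ticket IDs to the PR URL they reference (None if unlinked)."""
    linked = {url for url in tickets.values() if url}
    return [pr for pr in open_prs if pr not in linked]
```

Everything else in the skill is fetching, creating, and linking; this is the part that decides what work exists.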
This is the bucket we're most excited about, and the one we think most teams underrate. The leverage is time. Claude reclaims the hours we were losing to cross-tool bookkeeping. PR triage, stale-branch cleanup, state sync between Linear and GitHub, keeping docs in step with config changes. None of it is glamorous. All of it compounds.
If the rule for the Building bucket is "if it has verification criteria, make it a skill," the rule for the Operating bucket is simpler: if you keep forgetting to do it, make it a skill.
What we've learned
A few things have been true across every bucket.
Goal-based prompting beats spec-based prompting. We give Claude an outcome — "fix this extraction, verify it against the original URL, open a PR" — and trust it to figure out the approach. The moment we start prescribing steps, we slow it down and introduce mistakes.
Course-correct aggressively when it drifts. Claude is a junior engineer. When it reaches for the wrong approach, we push back hard and early. The posture we've settled into is firm supervision.
Encode the loop, not the task. A one-off fix is a prompt. A workflow we keep running is a skill. The moment we notice ourselves running the same three prompts in a row, we stop and write the skill.
Verification is non-negotiable. Every skill we trust in production has a built-in "did this actually work?" step — regression tests, end-to-end re-runs, WOMM evidence attached to the PR. Without that gate, "fixed" stops meaning anything.
Frequently Asked Questions
What is a Claude Code skill?
A skill is a reusable workflow you can invoke with a slash command inside Claude Code. It packages up instructions, tools, and guardrails so Claude runs the same loop the same way every time. Think of it as the difference between explaining a task from scratch every day and writing a script you can run on demand.
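Concretely, a skill is a markdown file with YAML frontmatter that Claude Code discovers by name. A rough sketch of what one might look like; the step content here is illustrative, not Rye's actual skill:

```markdown
---
name: fix-product-data-extraction
description: Pull a captured extraction failure, fix the bug, verify, and open a PR.
---

1. Fetch a recent fixable failure from the capture store.
2. Reproduce the error locally against the captured HTML.
3. Diagnose: unfixable page vs. bug in extraction code.
4. Fix in a fresh git worktree; verify the product matches the requested URL.
5. Add a regression test and open a PR with evidence of the end-to-end run.
```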
How do you decide when to turn a workflow into a skill?
Two rules. First, if the loop has a clear "did this actually work?" step — a verification gate that tells you the output is correct — it's a strong skill candidate. Second, if you keep forgetting to do it (PR hygiene, ticket sync, stale-branch cleanup), it's a skill candidate. If neither applies, it's probably still a prompt.
How do you prevent Claude from shipping bad code?
Three layers. First, every skill we trust has a built-in verification step — the fix has to re-run the failing input and produce the right answer before the PR is opened. Second, we review every PR a skill opens. Third, we course-correct aggressively when Claude drifts during a session. Treating Claude like a junior engineer is the posture that makes the other two layers work.
What tools do you use alongside Claude Code?
For the examples in this post, Claude is wired into the substrate we already had — our data stores, version control, ticket tracker, and observability stack. That's the point: Claude is most useful where the substrate already exists. Our /fix-product-data-extraction skill only works because we'd already built the capture pipeline long before we had agents in the loop.
Can I try Rye with my own agent?
Yes. Rye is built to be agent-friendly: three-endpoint checkout, tokenized payments, universal merchant coverage, and an SDK that handles the retry and idempotency details you don't want to write yourself. Point your agent at the docs and you can start building today.
Start Building
We build Rye with agents because we built Rye for agents.
Every design decision — the three-endpoint checkout flow, the tokenized payment contract, the SDK's retry and idempotency handling, universal coverage across merchants — is aimed at the same target: making Rye the place an agent with a product URL and a payment token can actually complete a purchase.
If you're building with an agent, that's what we want you to try. Point your agent at Rye and see how the whole thing fits together.
Start building at rye.com/docs →