Two weeks. Model-tier matrix, prompt-rig audit, prompt-caching breakeven analysis, six-month roadmap, kill list, recommended next sprint.
Claude consulting for teams who picked the wrong Claude.
Most teams running Claude are using Opus where Sonnet would win and Sonnet where Haiku would win. The fix is not a different model provider. It is the right tier per call, an XML prompt rig, and prompt caching wired in by default.
Most teams are overspending on Claude by 40–70%. We fix the bill and the latency.
Each is a contract for a specific problem.
Two weeks, no extensions
For teams with twelve possible Claude bets and a budget for three. Model-tier matrix, prompt-rig audit, caching breakeven, and a six-month roadmap defensible in a board meeting.
Fourteen days to live
For teams who have decided the bet and need a Claude feature serving real traffic. The production proof of concept (PoC) runs in your environment behind a feature flag by day fourteen, refundable if it doesn't ship.
Six to twelve weeks
For teams scaling from one prompt to a portfolio. Tool use, thinking blocks, 200K context strategy, hallucination CI, monitoring with refusal alerts, cost caps per tenant.
Monthly retainer
For teams with ML engineers who want senior backup. Prompt review, eval design, model-tier calls, on-call escalation. We're in your Slack, not on the org chart.
Strategy sprint. Seven deliverables. One document.
The strategy sprint is the entry point. You get a model-tier matrix, a prompt-rig audit, a caching breakeven analysis, eval harness baseline, a six-month roadmap, a kill list, and a recommended next sprint costed and scoped.
Book a fit call →Four shapes. Ranges from $25k to $150k+.
We publish them because we have been on the other side of the table - the call where you ask the price and three weeks of email-tag begin.
Fourteen days to one Claude feature live in your stack. XML rig, eval harness, prompt caching wired, dashboard, runbook. Refundable if it doesn't ship.
Six to twelve weeks. Tool use, thinking blocks, 200K context strategy, hallucination CI, monitoring, cost caps per tenant, full integration into your stack.
Senior engineers embedded with your team. Prompt review, eval design, model-tier calls, on-call escalation. Quoted by scope.
We run Sentinel on Claude. Same rig you'll inherit.
Sentinel is JAAX's live Shopify analytics product. The Claude layer that answers operator questions over merchant data runs on the same XML-structured, prompt-cached, eval-gated rig you'll inherit. Every habit on this page was earned shipping it, not theorized for a deck.
See SentinelFor teams who have picked Claude and want the engineering right.
The buyers we do our best work for share three traits:
- A number they want moved - deflection rate, recovery rate, time-to-quote, cost-per-ticket
- At least one AI initiative already attempted - they know the difference between a working agent and a working demo
- A window, usually a quarter, to show something running
We work with Series A startups whose CTO is the buyer. We work with mid-market product teams handed Claude with no roadmap. We work with Fortune 1000 divisions that picked Claude through procurement and now need someone who can actually ship on it.
If you need vendor-evaluation help across Anthropic, OpenAI, and Google, see our anthropic-consulting page. This page is for product-level Claude buyers.
Questions we get on every fit call.
Claude consulting is the engineering practice of getting Claude-backed features into production with the right model tier, the right prompt rig, and the eval harness to know whether either is working. The honest version of the category builds and runs the systems it sells. The dressed-up version sells prompt-tweaking by the hour. We are the first kind. The proof is Sentinel.
- Opus 4.7 for high-stakes reasoning, multi-step agent loops, and the calls where being right matters more than being fast.
- Sonnet 4.6 balances cost and quality - it wins most evals. Most production calls live here as the daily driver.
- Haiku 4.5 for high-volume classification, routing, and the calls where p95 latency (the slowest 5% of requests) under 500ms is the constraint.
The rule is: write the eval first, run all three at the cheapest setting, and let the eval pick. Most teams arriving at us are using Opus where Sonnet would win and Sonnet where Haiku would win.
XML tags, not Markdown headers. Claude was trained to follow XML structure (
Prompt caching breaks even at roughly two cache hits within the five-minute TTL, since the write is 1.25x and the hit is 0.1x of base cost. That makes caching a near-universal win for any prompt with a stable system message, a long retrieved context, or a multi-turn conversation. We wire caching at the call-site by default and set the cache_control (the API flag that activates prompt caching) breakpoint at the largest stable prefix - usually the system prompt plus retrieved documents. Teams who haven't enabled it are typically overspending by 40-70% on Sonnet and Opus calls.
Treat the 200K window as a budget, not a license. Claude's recall degrades on long contexts the same way every model's does - middle-of-context information gets lost more than start or end. We retrieve narrowly, place the most important information at the top and the task at the bottom, and use prompt caching to make long stable contexts cheap. Published benchmarks for long-context recall look better than real-world performance. The production reality is to design for the first 32K tokens and let caching pay for the rest.
Yes. Tool use is the right primitive for any feature that needs to fetch, calculate, or write. We define tools with strict JSON schemas, validate every tool result, and run a retry-with-correction loop when a tool returns an error. Thinking blocks (extended thinking on Opus and Sonnet) are reserved for genuinely hard reasoning calls where the answer benefits from chain-of-thought - most production prompts do not need them and pay for them anyway when teams enable thinking globally. Eval first, then decide.
Anthropic consulting is the enterprise procurement lens - vendor selection, contract terms, security review, deployment options across Anthropic API, AWS Bedrock, and Google Vertex. Claude consulting is the product-engineering lens for teams who already picked Claude and need the model tier, prompt rig, and eval harness right. OpenAI consulting is the same engineering work but on the GPT side. We do all three. This page is for teams shopping the product side.
Strategy sprints run $25–45k. Production proofs-of-concept run $50–150k. Full implementations start at $150k and scale with integration depth. Embedded team augmentation is a monthly retainer. Pricing is the same regardless of industry - we charge by engagement shape, not by model provider.
Yes, mutual NDA before any technical conversation. We do not work for clients with conflicting active engagements in the same competitive set during a quarter - a rule we enforce on ourselves more strictly than most clients ask us to.
Send a paragraph. We'll come back the same day.
Tell us which Claude tier you're on, what feature you want shipped, and the metric you want moved. We'll come back with a yes, a no, or a sharper question. No discovery deck, no pitch meeting marathon.
Book a 30-min fit call