Two posts ago we compared Kimi K2 and Claude Sonnet for raw coding tasks and landed on "K2 covers about 80% of the work at ~15% of the cost." That post left a question hanging: how do you actually wire that up so Kimi K2 picks up the cheap work automatically and your expensive models only fire when they need to?
Answer: routing. The OpenCode Controller skill inside OpenClaw is the routing layer. This post is the playbook we use across our client projects, with token-cost numbers from real billing data.
Here's a 30-day window from one of our agency tooling stacks. Pre-routing, single-model:
| Period | Active model | Tokens (M) | Cost | Notes |
|---|---|---|---|---|
| Month 1 | Claude Opus 4.6 | 38.4 | $1,920 | Everything went through Opus |
| Month 2 | Claude Sonnet 4.6 | 41.1 | $410 | Switched everything to Sonnet |
| Month 3 (routed) | Mixed (K2 + Opus) | 43.7 | $158 | OpenCode Controller routing live |
Token volume went up between months 2 and 3 because the team trusted the tooling more and used it for more tasks. Cost still dropped 61%. The marginal cost of an additional task in month 3 was effectively zero for ~80% of work — that's the routing dividend.
Important caveat: same workload, same team, same OpenClaw setup. We didn't change agents or tasks, just routing rules.
Kimi K2 is fast, cheap, and good enough at agentic code tasks. The places it falls down are predictable:
For everything else — file edits, refactors, test scaffolding, glue code, search-and-replace, doc generation — K2 is competitive with Sonnet at a fraction of the cost.
The point of routing isn't to use K2 everywhere. It's to use K2 by default and escalate to Opus on the work where K2 actually breaks.
This is the [routing] section of our opencode-controller/config.toml, dropped into client projects unchanged:
[routing] # 1. Default: Kimi K2 for everything not otherwise tagged cheap_default = "openrouter:moonshotai/kimi-k2" # 2. Heavy tags route to Opus heavy_tags = ["schema", "migration", "auth", "security", "billing"] heavy_model = "anthropic:claude-opus-4-7" # 3. UI/design tasks go to Gemini 3 (better at design intent) ui_tags = ["ui-mock", "design", "layout"] ui_model = "google:gemini-3-pro" # 4. Reviews always use a different model than the writer review_strategy = "alternate" # 5. When K2 returns a 429 or context-window overflow, fall through fallback_chain = [ "openrouter:moonshotai/kimi-k2", "anthropic:claude-sonnet-4-6", "anthropic:claude-opus-4-7" ]
The five-line breakdown:
cheap_default is the workhorse. ~80% of token volume hits this.heavy_tags is the safety net. We learned the list by tracking which K2 outputs needed manual rework — schemas and migrations dominated, auth and billing were close behind.ui_tags is qualitative. Gemini 3 produces design intent K2 doesn't match yet.review_strategy = "alternate" is the cheap-insurance trick. If K2 wrote it, Opus reviews. If Opus wrote it, Sonnet reviews. Different model = different blind spots.fallback_chain matters more than it sounds. Kimi K2 via OpenRouter rate-limits aggressively. When you saturate the lane, the chain prevents builds from blocking.The model picks the route, but you pick the tag. Three patterns we use:
Manual tagging. Before kicking off a sub-task: /opencode tag schema. Reset after.
Convention-based tagging. A wrapper script reads the file path and auto-tags. migrations/*.sql → migration. lib/auth/* → auth. The script lives in our agency starter kit — we ship it with new client projects.
LLM-driven tagging. A first-pass routing agent reads the user request and picks the tag. We tried this; it's clever but adds 200-400ms latency per turn and doesn't beat the convention-based approach. Killed it after two weeks.
Pick one and stick with it. Mixing modes confuses the team and makes the cost dashboard hard to read.
The biggest cost lever after routing is prompt caching. When the same long system prompt or codebase context goes to Opus repeatedly, cache it.
OpenClaw doesn't do this for you — but the controller skill does, with one config flag:
[providers.anthropic] auth = "env:ANTHROPIC_API_KEY" default_model = "claude-opus-4-7" prompt_cache = "auto" # cache anything > 1024 tokens for 5 minutes
In our usage, this knocks another 30-50% off Opus costs on long-running sessions. Sonnet supports it identically. K2 via OpenRouter doesn't, but K2 is cheap enough that you don't care.
Tagging too aggressively. We started with seven heavy tags and routed almost everything to Opus. Cost barely budged from baseline. The data showed schema and migration were doing 90% of the routing work; the rest were noise. Cut to five tags, then to four. Costs dropped further.
Not measuring per-task quality. We assumed K2 was "fine" because builds passed. Builds passing isn't quality. We added a one-line manual rating after every K2-completed task for two weeks. The data surfaced that K2 outputs in auth/ were getting reworked 3x more often than other paths — that's how auth ended up in heavy_tags.
Trusting /opencode cost blindly. v1.4.2 of the skill undercounts cached Anthropic input by 90%. If your real bill is way higher than the skill's report, that's the cause. We pull actual billing data weekly and compare; the gap has stayed predictable.
Three patterns where multi-model routing isn't worth the complexity:
For everyone else — agency teams, AI-tooling-heavy startups, anyone running OpenClaw in production daily — this routing setup is the single biggest cost lever we've found short of dropping AI entirely.
If you haven't installed the controller skill yet, start with the setup guide. If you're skeptical of K2's quality claims, the Kimi K2 vs Claude benchmark post has the numbers. If you're trying to figure out the broader OpenClaw + OpenCode picture, the pillar essay is the entry point.
Founder & Lead Developer
With 8+ years building software from the Philippines, Jomar has served 50+ US, Australian, and UK clients. He specializes in construction SaaS, enterprise automation, and helping Western companies build high-performing Philippine development teams.
Tell us what you're building. We'll show you the fastest path to a production-ready launch.
Get My Free Proposal