agent workflow · routing · systems note

Coding Agent Model Routing

Plan/act separation, local fallbacks, and choosing the smallest capable model for each loop.

  • Plan/Act
  • Model routing
  • Context budgets
  • Local-first
  • Workflow

$ head -n 1

A compact routing doctrine for coding agents: planning versus acting, local versus hosted, fast versus deep, and when to split a task instead of feeding one giant context window.

$ grep -i "routing beats ranking"

Provider leaderboards are a weak basis for routing day-to-day agent work. Planning, patching, reviewing, summarizing, and broad codebase exploration all stress different parts of an agent stack, so a single ranking cannot say which model fits which loop.

A useful routing policy asks what the next loop needs: broad reasoning, fast patch iteration, structured JSON, careful code review, or a small classification. Then it picks the smallest capable model and context budget.
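As a minimal sketch of that policy, assuming a hand-maintained table of model tiers: the tier names, context budgets, and capability labels below are hypothetical placeholders, not real products or measured limits. The router walks the tiers from smallest to largest and returns the first one that covers both the need and the context size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str                 # hypothetical label, not a real model name
    max_context_tokens: int   # assumed context budget for this tier
    capabilities: frozenset   # loop types this tier is trusted with


# Ordered smallest to largest; the contents are illustrative assumptions.
TIERS = [
    Tier("small-local", 8_000,
         frozenset({"classification", "structured_json"})),
    Tier("mid-local", 32_000,
         frozenset({"classification", "structured_json", "patch_iteration"})),
    Tier("large-hosted", 128_000,
         frozenset({"classification", "structured_json", "patch_iteration",
                    "code_review", "broad_reasoning"})),
]


def route(need: str, context_tokens: int) -> Tier:
    """Return the smallest tier that handles `need` within its context budget."""
    for tier in TIERS:
        if need in tier.capabilities and context_tokens <= tier.max_context_tokens:
            return tier
    # No tier fits: the caller should split the task rather than force it through.
    raise ValueError(f"no tier fits need={need!r} at {context_tokens} tokens")
```

For example, `route("classification", 2_000)` lands on the smallest tier, while the same need at 20,000 tokens moves up one tier only because of the context budget.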

$ grep -i "plan and act"

Plan mode wants room to inspect, compare, and choose a path. Act mode wants determinism, low temperature, and a bounded file surface. Keeping those jobs separate reduces both latency and drift.
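One way to make the separation concrete is to give each mode its own profile and enforce the file bound before an act loop starts. This is a sketch under stated assumptions: the temperature values, the three-file limit, and the `LoopProfile` and `check_file_surface` names are illustrative choices, not settings taken from any particular tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LoopProfile:
    mode: str
    temperature: float        # assumed sampling setting for this mode
    max_files: Optional[int]  # None means no bound on the file surface
    can_write: bool


# Illustrative values: plan may read broadly but not write,
# act is deterministic and limited to a few files.
PLAN = LoopProfile("plan", temperature=0.7, max_files=None, can_write=False)
ACT = LoopProfile("act", temperature=0.0, max_files=3, can_write=True)


def check_file_surface(profile: LoopProfile, files: list) -> list:
    """Reject a loop whose file list exceeds the profile's bound."""
    if profile.max_files is not None and len(files) > profile.max_files:
        raise ValueError(
            f"{profile.mode} mode allows {profile.max_files} files, got {len(files)}"
        )
    return files
```

A plan that names more files than the act profile allows is a signal to split the patch into smaller acts, not to raise the limit.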

The same idea applies to local and hosted models. Local-first keeps common loops private and fast enough; hosted reasoning is reserved for situations where the extra headroom changes the outcome.
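A local-first fallback can be sketched as a small wrapper: try the local model, and escalate only when a check says the draft is not good enough. The `local`, `hosted`, and `good_enough` callables are hypothetical stand-ins for whatever clients and acceptance checks a given setup uses (tests passing, a schema validating, and so on).

```python
from typing import Callable, Tuple


def run_local_first(
    prompt: str,
    local: Callable[[str], str],
    hosted: Callable[[str], str],
    good_enough: Callable[[str], bool],
) -> Tuple[str, str]:
    """Return (route_taken, output), escalating only when the local draft fails."""
    draft = local(prompt)
    if good_enough(draft):
        return "local", draft
    # The prompt leaves the machine only on this path.
    return "hosted", hosted(prompt)
```

The acceptance check carries most of the design weight here: if it is too lenient, weak drafts slip through, and if it is too strict, every loop pays for both calls.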

$ grep -i "failure mode"

Quality usually drops when the task is too broad: the context keeps growing until the agent becomes expensive, slow, and less disciplined. Good routing includes a stop rule: summarize state, split the work, then continue in a narrower phase.
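A stop rule of this kind might look like the sketch below, assuming a plain list of message strings as history. The whitespace token count is a crude stand-in for a real tokenizer, and `summarize` is a hypothetical callable supplied by the caller.

```python
from typing import Callable, List, Tuple


def count_tokens(text: str) -> int:
    # Crude stand-in: whitespace-separated words, not a real tokenizer.
    return len(text.split())


def apply_stop_rule(
    history: List[str],
    budget: int,
    summarize: Callable[[List[str]], str],
) -> Tuple[List[str], bool]:
    """Collapse history to a single summary once it exceeds the token budget.

    Returns (new_history, stopped). When stopped is True, the caller is
    expected to split the remaining work and continue in a narrower phase.
    """
    used = sum(count_tokens(message) for message in history)
    if used <= budget:
        return history, False
    return [summarize(history)], True
```

The rule only decides when to stop and compact. How the remaining work is split is left to the caller, since that depends on the task.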