Summary
A compact routing note for coding agents: planning versus acting, local versus hosted, fast versus deep, and when to split a task instead of feeding one giant context window.
Routing Beats Ranking
Provider leaderboards are a weak routing plan for day-to-day agent work. Planning, patching, reviewing, summarizing, and broad codebase exploration all stress different parts of an agent stack.
A useful routing policy asks what the next loop needs: broad reasoning, fast patch iteration, structured JSON, careful code review, or a small classification. Then it picks the smallest capable model and context budget.
Plan And Act
Plan mode wants room to inspect, compare, and choose a path. Act mode wants determinism, low temperature, and a bounded file set. Keeping those jobs separate reduces both latency and drift.
The same idea applies to local and hosted models. Local-first keeps common loops private and fast enough; hosted reasoning is reserved for situations where the extra headroom changes the outcome.
Failure Mode
Quality usually drops when the task is too broad and the context keeps growing until the agent becomes expensive, slow, and less disciplined. Good routing includes a stop rule: summarize state, split the work, then continue in a narrower phase.