Model Routing: The Art of Using the Right Intelligence | AI Academy

Every military commander knows you don't deploy your elite forces against low-value targets. You match asset capability to mission requirement. AI is no different — and most builders have no routing strategy at all.

They pick a model, usually the most powerful one they've heard of, and run everything through it. That's not a strategy. That's a budget leak with good vibes.

Model Routing Decision Tree

The Model Hierarchy

This lesson divides the current model landscape into four tiers of intelligence: Gemini Flash and Pro at the lower end, Claude Sonnet and Opus at the top. Each tier has a cost profile, a latency profile, and a capability ceiling. Understanding where each tier operates is the routing problem.

Flash — Free/Fast. Sub-second responses, near-zero cost. Handles extraction, classification, formatting, simple summarization, and pattern matching on structured data. This is your high-volume workhorse. Use it for anything where "good enough" is the actual standard because the downstream consumer doesn't need more.

Pro: Moderate. Solid analytical depth, reasonable cost. Handles synthesis, writing tasks with moderate complexity, advisory summarization, and analysis where the output is a human-readable document but not a brand artifact. Most everyday analytical and writing tasks should live here.

Sonnet — Complex. When you need judgment, voice, novel synthesis, or reasoning chains longer than a few steps. This is where quality-critical AI work happens. Brand-voice content. Architecture decisions. Complex multi-step analysis. Not cheap, but not Opus. The sweet spot for most serious production work.

Opus — Maximum Reasoning. Deep research, strategic synthesis, tasks where the model needs to hold contradictory information in context and reason through it. Reserve this for work where being wrong has real downstream cost.
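The hierarchy can be captured as an ordered type so that routing code can compare tiers directly. This is a minimal sketch that assumes Python's `IntEnum`. The integer values encode only the capability order, not prices, and the workload lists paraphrase the tier descriptions above.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Model tiers ordered by capability ceiling (values encode order only)."""
    FLASH = 1
    PRO = 2
    SONNET = 3
    OPUS = 4


# Typical workloads per tier, paraphrased from the tier descriptions.
TYPICAL_WORK = {
    Tier.FLASH: ["extraction", "classification", "formatting", "simple summarization"],
    Tier.PRO: ["synthesis", "moderate writing", "advisory summarization"],
    Tier.SONNET: ["brand-voice content", "architecture decisions", "multi-step analysis"],
    Tier.OPUS: ["deep research", "strategic synthesis"],
}


if __name__ == "__main__":
    # Print the hierarchy from least to most capable.
    for tier in sorted(TYPICAL_WORK):
        print(f"{tier.name:<7} {', '.join(TYPICAL_WORK[tier])}")
```

Because `IntEnum` members compare as integers, a check like `used > needed` reads directly as "this task ran on a higher tier than it required."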

He will win who knows when to fight and when not to fight.
— Sun Tzu · The Art of War

The Routing Decision Framework

The routing question is not "what's the best model?" It's "what's the minimum capability needed to produce an acceptable output?"

Work backward from the output requirement:

Output is a classified label or extracted field → Flash
Output is a coherent paragraph or moderate analysis → Pro
Output needs brand voice, analytical depth, or complex reasoning → Sonnet
Output is a research brief or strategic decision → Opus
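The four rules above amount to a lookup from output type to minimum tier. Here is a minimal sketch of that lookup. The output-kind labels such as `"extracted_field"` are hypothetical names chosen for illustration, and unknown kinds raise an error instead of defaulting to the most powerful model.

```python
from enum import IntEnum


class Tier(IntEnum):
    FLASH = 1
    PRO = 2
    SONNET = 3
    OPUS = 4


# Hypothetical output-kind labels mapped to the minimum acceptable tier,
# following the "work backward from the output" rules.
MIN_TIER_BY_OUTPUT = {
    "label": Tier.FLASH,
    "extracted_field": Tier.FLASH,
    "paragraph": Tier.PRO,
    "moderate_analysis": Tier.PRO,
    "brand_voice": Tier.SONNET,
    "complex_reasoning": Tier.SONNET,
    "research_brief": Tier.OPUS,
    "strategic_decision": Tier.OPUS,
}


def route(output_kind: str) -> Tier:
    """Return the minimum tier that can produce an acceptable output.

    Unknown kinds raise instead of silently falling back to the top tier,
    so every new task type gets an explicit routing decision.
    """
    try:
        return MIN_TIER_BY_OUTPUT[output_kind]
    except KeyError:
        raise ValueError(f"no routing rule for output kind {output_kind!r}") from None


if __name__ == "__main__":
    print(route("extracted_field").name)  # FLASH
    print(route("brand_voice").name)      # SONNET
```

Raising on unknown kinds is a deliberate choice. Defaulting to the top tier is the habit this lesson argues against.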

There are two failure modes, and they are not equally visible. Route too high and you overspend. Route too low and you produce mediocre output that erodes brand trust.

⚠WARNING

The quality trap is real. Using Flash where you need Sonnet doesn't save money. It costs you in eroded brand signal, published mediocre content, and the downstream work of fixing what the model got wrong. Underpowered routing has a hidden tax.
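You can make the hidden tax explicit with an expected-cost comparison. A call's true cost is its price plus the probability that the output is unusable, multiplied by the cost of fixing it. This is a minimal sketch. Every number in it (relative prices, failure rates, rework cost) is made up for illustration and is not a measurement from the stack described in this lesson.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    call_cost: float     # price per call in arbitrary units (hypothetical)
    failure_rate: float  # share of outputs that miss the quality bar (hypothetical)


def expected_cost(option: Option, rework_cost: float) -> float:
    """Call price plus the expected cost of fixing unacceptable output."""
    return option.call_cost + option.failure_rate * rework_cost


if __name__ == "__main__":
    # Illustrative brand-voice task: Flash is 10x cheaper per call
    # but misses the quality bar far more often.
    flash = Option("Flash", call_cost=1.0, failure_rate=0.5)
    sonnet = Option("Sonnet", call_cost=10.0, failure_rate=0.05)
    rework = 100.0  # cost of a human fixing or rewriting one bad output
    for opt in (flash, sonnet):
        print(f"{opt.name}: {expected_cost(opt, rework):.1f}")
    # Flash: 51.0, Sonnet: 15.0
```

Under these assumed figures the "cheap" route costs more than three times as much once rework is counted. The exact ratio depends entirely on the failure rate and rework cost you plug in.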

Real Routing Decisions From Our Stack

These routing calls are already running in production. The reasoning behind each one matters more than the decision itself.

blog-autopilot article generation → Sonnet. This is a 1,200-word article in Knox's analytical voice published on a personal brand site with 31 posts. The reader's expectation is set by the body of work. Flash would flatten the voice and miss the cross-domain analogies that define the style. Sonnet is the minimum acceptable.

Topic selection in gather.py → No AI. Python reads a list of topics, tracks which ones have been used, and rotates through them in sequence. This requires zero intelligence. Routing it to any model wastes tokens and adds latency to a stage that should complete in milliseconds.
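To show how little this stage needs, here is a minimal sketch of a deterministic topic rotation. It is hypothetical and is not the actual `gather.py`. The function name `next_topic` and the JSON state file used to track used topics are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def next_topic(topics: list[str], state_file: Path) -> str:
    """Return the next unused topic in list order and record it as used.

    Used topics are persisted as a JSON list; once every topic has been used,
    the rotation starts over from the top. No model is involved.
    """
    if not topics:
        raise ValueError("topic list is empty")
    used: list[str] = json.loads(state_file.read_text()) if state_file.exists() else []
    remaining = [t for t in topics if t not in used]
    if not remaining:  # full cycle complete: reset the rotation
        used, remaining = [], list(topics)
    topic = remaining[0]
    used.append(topic)
    state_file.write_text(json.dumps(used))
    return topic


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        state = Path(tmp) / "used_topics.json"
        for _ in range(4):
            print(next_topic(["routing", "evals", "caching"], state))
```

Each call is a file read, a list filter, and a file write, so it completes in milliseconds and produces the same answer every time.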

Invictus Sentinel post-mortem → Gemini Flash. After an incident, the system collects structured log data, timestamps, and error codes. Flash pattern-matches against known failure signatures and generates the incident report. The data is structured, the report is templated, and Flash handles this at near-zero cost.
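Part of what makes this a Flash task is that the input arrives already structured. Here is a minimal sketch of assembling such a prompt. The record fields, the signature catalogue, and the report template are all hypothetical. The model call itself is omitted because this lesson does not describe the actual Gemini client or the prompt that Invictus Sentinel uses.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    timestamp: str  # ISO 8601 string
    error_code: str
    message: str


# Hypothetical catalogue of known failure signatures.
KNOWN_SIGNATURES = {
    "E_TIMEOUT": "Upstream request timed out",
    "E_AUTH": "Credential expired or revoked",
}

REPORT_TEMPLATE = """Incident report
Summary: <one sentence>
Matched signature: <code from the catalogue, or UNKNOWN>
Timeline: <bullet per log line>
"""


def build_postmortem_prompt(records: list[LogRecord]) -> str:
    """Assemble a fully structured prompt so a Flash-tier model only has to
    pattern-match against known signatures and fill in a fixed template."""
    catalogue = "\n".join(f"{code}: {desc}" for code, desc in KNOWN_SIGNATURES.items())
    logs = "\n".join(f"{r.timestamp} {r.error_code} {r.message}" for r in records)
    return (
        "Match these logs against the known failure signatures and fill in the template.\n\n"
        f"Known signatures:\n{catalogue}\n\n"
        f"Logs:\n{logs}\n\n"
        f"Template:\n{REPORT_TEMPLATE}"
    )


if __name__ == "__main__":
    records = [LogRecord("2024-05-01T12:00:00Z", "E_TIMEOUT", "gateway returned 504")]
    print(build_postmortem_prompt(records))
```

The code does the structuring, so the model only has to match patterns and fill in a template, which is exactly the work the Flash tier is described as handling.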

Advisory council summaries → Pro. Moderate synthesis of multi-source research into a coherent briefing. This is writing work, not brand work. Pro produces acceptable quality at one-third the cost of Sonnet.
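You can keep the four decisions above in a single routing table, so that every task's route is written down and reviewable. This is a minimal sketch. The routes come from this section, but the task keys are illustrative names, not identifiers from the real stack.

```python
from enum import Enum


class Route(Enum):
    NO_AI = "code"
    FLASH = "flash"
    PRO = "pro"
    SONNET = "sonnet"
    OPUS = "opus"


# The production decisions described above. Task keys are hypothetical.
ROUTING_TABLE: dict[str, Route] = {
    "blog_autopilot.article": Route.SONNET,
    "gather.topic_selection": Route.NO_AI,
    "sentinel.postmortem": Route.FLASH,
    "advisory_council.summary": Route.PRO,
}


def route_for(task: str) -> Route:
    """Look up a task's route; fail loudly if nobody has decided yet."""
    if task not in ROUTING_TABLE:
        raise KeyError(f"unrouted task: {task}")
    return ROUTING_TABLE[task]


if __name__ == "__main__":
    for task, route in ROUTING_TABLE.items():
        print(f"{task:<28} -> {route.value}")
```

With one table, a routing audit becomes a diff review rather than a search through call sites.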

Model Tiers: Flash / Pro / Sonnet / Opus
Cost Delta: ~10x (Flash to Sonnet price difference)
Routing ROI: Quality × Efficiency (the only metric that matters)

The Compounding Cost Problem

One misrouted call is noise. One hundred misrouted calls per day is a structural problem.

At 100 daily AI calls, routing everything through Sonnet when 60% of those tasks should run on Flash is a real cost differential — not a rounding error. At 1,000 daily calls, it becomes the difference between a sustainable operation and a money pit.
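Here is the arithmetic behind that claim as a minimal sketch. Costs are in relative units per call, with Sonnet at 1.0 and Flash at 0.1, based on the ~10x Flash-to-Sonnet price difference cited above. The sketch also assumes the 40% of tasks that are not Flash-eligible stay on Sonnet. Under those assumptions, correct routing cuts spend from 100 to 46 units a day at 100 calls, and from 1,000 to 460 units at 1,000 calls. That is the same 54% reduction at any volume, so the absolute gap grows linearly with call count.

```python
def daily_cost(calls: int, flash_share: float,
               sonnet_cost: float = 1.0, flash_cost: float = 0.1) -> float:
    """Daily spend when `flash_share` of calls run on Flash and the rest on Sonnet.

    Costs are relative units per call (Sonnet = 1.0), assuming the ~10x delta.
    """
    flash_calls = calls * flash_share
    return flash_calls * flash_cost + (calls - flash_calls) * sonnet_cost


if __name__ == "__main__":
    for calls in (100, 1_000):
        everything_on_sonnet = daily_cost(calls, flash_share=0.0)
        routed = daily_cost(calls, flash_share=0.6)
        saved = everything_on_sonnet - routed
        print(f"{calls} calls/day: {everything_on_sonnet:.0f} vs {routed:.0f} units "
              f"({saved / everything_on_sonnet:.0%} saved)")
```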

More critically: the cases where you route too low compound differently. Mediocre output at scale isn't just wasted compute — it trains your audience to expect less. The brand tax accrues invisibly until engagement drops and you can't trace it back to the routing decision you made six months ago.

⚔DOCTRINE

Match capability to requirement. The right model isn't the most powerful one — it's the cheapest one that produces an acceptable output for the specific use case. Know the difference between those two standards before every routing decision.
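One way to apply this doctrine empirically is to run a sample of a given task through each tier, have a reviewer mark each output acceptable or not, and then pick the cheapest tier that clears the bar. This is a minimal sketch. The acceptance rates and the Opus cost are invented for illustration. The Flash and Pro costs follow this lesson's ~10x and one-third figures relative to Sonnet.

```python
from typing import Optional

# Hypothetical evaluation for one task type:
# tier -> (relative cost per call, share of sampled outputs judged acceptable)
SAMPLE_RESULTS: dict[str, tuple[float, float]] = {
    "flash": (0.1, 0.62),
    "pro": (0.33, 0.91),
    "sonnet": (1.0, 0.97),
    "opus": (5.0, 0.98),  # Opus cost is a placeholder assumption
}


def cheapest_acceptable(results: dict[str, tuple[float, float]],
                        threshold: float) -> Optional[str]:
    """Return the cheapest tier whose acceptance rate meets the threshold,
    or None if no tier clears it."""
    passing = [(cost, tier) for tier, (cost, rate) in results.items() if rate >= threshold]
    if not passing:
        return None
    return min(passing)[1]  # lowest cost wins


if __name__ == "__main__":
    for bar in (0.90, 0.95, 0.99):
        print(bar, cheapest_acceptable(SAMPLE_RESULTS, bar))
```

The threshold is where "acceptable" gets defined for a specific use case. Raising it moves the route up the hierarchy, which makes the trade-off explicit instead of habitual.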

The Hidden Routing Decision: No AI

The most underrated routing option is removing the model entirely.

Any task that follows a deterministic logic path — rotate topics, check if a file exists, format a timestamp, send a webhook — belongs in Python, not in a model. This isn't a routing decision between Flash and Sonnet. It's a routing decision between code and AI. Always choose code when code is sufficient.

The no-AI route is always cheaper, faster, and more auditable than the cheapest model.
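In practice, "always choose code when code is sufficient" can be a gate in front of any model call. This is a minimal sketch covering two of the deterministic tasks named above: checking whether a file exists and formatting a timestamp. The task names and the `try_without_ai` helper are hypothetical, and anything the gate cannot handle returns `None` so it can go through normal tier routing.

```python
import os
from datetime import datetime, timezone
from typing import Callable, Optional

# Deterministic tasks handled by plain code. Task names are illustrative.
CODE_HANDLERS: dict[str, Callable[[str], str]] = {
    "file_exists": lambda path: str(os.path.exists(path)),
    "format_timestamp": lambda epoch: datetime.fromtimestamp(
        float(epoch), tz=timezone.utc
    ).isoformat(),
}


def try_without_ai(task: str, arg: str) -> Optional[str]:
    """Return a result if plain code can do the job, or None to signal that
    the task needs a model-tier routing decision."""
    handler = CODE_HANDLERS.get(task)
    return handler(arg) if handler is not None else None


if __name__ == "__main__":
    print(try_without_ai("format_timestamp", "0"))  # 1970-01-01T00:00:00+00:00
    print(try_without_ai("summarize_thread", "..."))  # None: route to a model
```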

Lesson 9 Drill

List your last five AI tasks — things you actually ran through a model in the past week. For each one, assign it to the correct tier: Flash, Pro, Sonnet, Opus, or No AI.

Then check whether you actually used that tier or routed higher out of habit.

The gap between where you routed and where you should have routed is your current waste ratio. That number tells you how much headroom you have before you hit cost ceilings.
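The drill can be scored with a few lines of code. This is a minimal sketch that assumes relative per-call costs with Sonnet at 1.0. Flash and Pro follow this lesson's ~10x and one-third figures, while the Opus value is a placeholder. Waste counts only over-routing. Under-routed tasks are reported separately, because they are a quality risk rather than a saving.

```python
RANK = {"none": 0, "flash": 1, "pro": 2, "sonnet": 3, "opus": 4}
# Relative cost per call (Sonnet = 1.0); the Opus value is a placeholder assumption.
COST = {"none": 0.0, "flash": 0.1, "pro": 0.33, "sonnet": 1.0, "opus": 5.0}


def audit(tasks: list[tuple[str, str]]) -> dict[str, float]:
    """Score a drill list of (tier used, tier needed) pairs.

    cost_waste_ratio is the share of spend that would disappear if every
    over-routed task had run on the tier it actually needed.
    """
    if not tasks:
        raise ValueError("no tasks to audit")
    over = [(u, n) for u, n in tasks if RANK[u] > RANK[n]]
    under = [(u, n) for u, n in tasks if RANK[u] < RANK[n]]
    spent = sum(COST[u] for u, _ in tasks)
    wasted = sum(COST[u] - COST[n] for u, n in over)
    return {
        "over_routed_share": len(over) / len(tasks),
        "under_routed_share": len(under) / len(tasks),
        "cost_waste_ratio": wasted / spent if spent else 0.0,
    }


if __name__ == "__main__":
    # A hypothetical week: five tasks, as (tier used, tier needed).
    week = [("sonnet", "flash"), ("sonnet", "pro"), ("sonnet", "sonnet"),
            ("opus", "sonnet"), ("flash", "none")]
    print(audit(week))
```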

Bottom Line

Routing is the leverage point most builders ignore because it feels like optimization, not strategy. It's both.

When you route correctly, you get better output at lower cost because focused capability on the right task outperforms overpowered capability on an unfocused brief. Match the asset to the mission, every time.

The goal isn't to use the best model. The goal is to produce the best output per dollar across the full portfolio of work your system runs.