Stop Asking Which AI Model Is Best. Ask Which Problem You're Solving.
The competitive edge in AI tooling isn't which model you have access to. It's routing discipline — knowing exactly which model handles which class of problem, and building that logic into your systems.

Every week someone publishes a new benchmark and half the internet starts debating whether to switch their entire stack. That's the wrong conversation.
The competitive moat in AI tooling right now is not which model you have access to. Every serious operator has access to the same frontier models. The moat is routing discipline — knowing precisely which model to use for which class of problem, and building that logic into your systems rather than leaving it to individual habit.
He who knows when he can fight and when he cannot will be victorious.
— Sun Tzu · The Art of War
What I Actually Run in Production
I operate 54+ apps on a single Mac Mini — trading bots, agentic pipelines, monitoring daemons, content systems, signal infrastructure. Across all of it, I use five model families and the routing logic is explicit, not tribal knowledge.
Here's the actual table:
Claude Opus — Architecture decisions, multi-document reasoning, strategy synthesis. I use this when I need to reason across the entire system and the cost of a wrong answer compounds. Slow and expensive on purpose. You don't use a precision instrument for volume work.
Claude Sonnet via Claude Code — Implementation. This is my default execution layer for all non-trivial coding: multi-file refactors, new features, complex bug investigation. Claude Code runs in terminal, writes files, runs tests, iterates on failures. It's an agent, not an assistant. The blog-autopilot skill, the invictus-sentinel monitoring daemon, the polymarket-bot v4.1 — all built through this loop.
Gemini Flash — High-volume gather tasks. This is the workhorse for anything at scale. The blog-autopilot pipeline's gather stage uses Gemini Flash to pull and synthesize research: RSS feeds, source scraping, topic discovery. At $0 effective cost on the free tier, it runs on a cron every morning and outputs intermediate JSON that the synthesis stage consumes. You don't pay Sonnet-tier prices to summarize an RSS feed.
Grok — Real-time social signal. What's actually moving on X right now. Sentiment on a specific ticker or narrative. Which posts are gaining velocity. The smart-engage skill uses Grok to identify high-engagement creator content before it peaks. Static models trained months ago can't tell you what's hot at 9 AM today.
GPT-4o — When clients or integrations require it. Not my first choice, but sometimes the tools around a model matter as much as the model itself. OpenAI's ecosystem integrations are mature. I don't fight that.
Local/open-source — When data cannot leave the machine. Some workflows touch financial data, private signals, or API keys that can't be sent to a cloud endpoint. For those, a local model running on-device is the only correct answer regardless of quality tradeoffs.
Model monoculture is a tax. Routing a $0.03/task Gemini Flash job through a $1.80/task Opus call because you're comfortable with one interface is pure friction cost — and it compounds across 40+ daily pipelines.
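What "routing logic in the code, not in individual habit" can look like, as a minimal sketch. The task classes and cost figures mirror the ones in this post, but the model identifiers and the dict structure are illustrative assumptions, not real API names, current pricing, or my actual implementation:

```python
# Illustrative routing table: task class -> model tier.
# Model names and per-task costs are assumptions for the sketch,
# not real API identifiers or current pricing.
ROUTES = {
    "architecture":   {"model": "claude-opus",   "cost_per_task": 1.80},
    "implementation": {"model": "claude-sonnet", "cost_per_task": 0.40},
    "gather":         {"model": "gemini-flash",  "cost_per_task": 0.03},
    "social-signal":  {"model": "grok",          "cost_per_task": 0.10},
    "client-required": {"model": "gpt-4o",       "cost_per_task": 0.25},
    "private-data":   {"model": "local-llm",     "cost_per_task": 0.00},
}

def route(task_class: str) -> str:
    """Return the model for a task class. Fail loudly on unknown
    classes so unclassified work never silently defaults to the
    expensive tier out of habit."""
    if task_class not in ROUTES:
        raise ValueError(f"unclassified task: {task_class!r}")
    return ROUTES[task_class]["model"]
```

The key design choice is the loud failure: an unrecognized task class raises instead of falling through to a default model, which forces the classification step the rest of this post argues for.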
The Blog-Autopilot Cost Math
This is concrete enough to show in numbers.
The blog-autopilot pipeline runs every other day at 9 AM ET. It has four stages: gather, write, image, deliver. The gather stage — pulling YouTube transcripts, scraping sources, synthesizing research context — runs on Gemini Flash. The write stage — actual article generation — runs on Claude Sonnet.
If I ran the entire pipeline on Sonnet, gather would cost roughly 60× more per run. On a pipeline that runs 180 times a year, that's not a rounding error; it's a system design decision. Flash handles what Flash is good at: volume, synthesis, structured data extraction. Sonnet handles the work where output quality is the point.
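The arithmetic is worth doing explicitly. Using the figures already in this post (roughly $0.03 per Flash gather run, a 60× multiplier to the frontier tier, 180 runs a year), a back-of-envelope sketch:

```python
# Back-of-envelope cost comparison for the gather stage,
# using this post's illustrative figures.
FLASH_COST_PER_RUN = 0.03   # approximate per-run cost on Gemini Flash
TIER_MULTIPLIER = 60        # roughly 60x more on a frontier-tier model
RUNS_PER_YEAR = 180         # pipeline fires every other day

flash_annual = FLASH_COST_PER_RUN * RUNS_PER_YEAR
frontier_annual = flash_annual * TIER_MULTIPLIER
savings_per_stage = frontier_annual - flash_annual

print(f"flash:    ${flash_annual:.2f}/yr")
print(f"frontier: ${frontier_annual:.2f}/yr")
print(f"saved:    ${savings_per_stage:.2f}/yr on one stage of one pipeline")
```

One stage of one pipeline is a few hundred dollars a year; the point is that the same multiplier applies to every gather-class stage across dozens of pipelines, which is where "monoculture is a tax" stops being rhetoric.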
Why Most Teams Don't Route
The honest answer: routing requires you to classify the problem before you solve it. That's cognitive overhead most teams skip because they're already comfortable with one model and the switching cost feels higher than the savings.
It isn't. Once routing logic is built into your pipeline — not just a personal habit but an explicit system parameter — the savings accrue automatically. The invictus-sentinel monitoring system routes its health check synthesis through Flash and only escalates to Sonnet when a post-mortem requires deep root cause reasoning. That decision is in the code, not in whoever happens to be running the check.
Teams that leave routing to individual engineer preference will have inconsistent costs, inconsistent quality, and no leverage as model pricing evolves. The model landscape will keep changing. Your routing logic adapts. Your habits don't.
This week: audit one pipeline you own. Identify every place an LLM call happens and ask: does this task actually require frontier-model reasoning, or is this a synthesis/gather job that Flash can handle? Start there. Build the logic in, not just the intention.
The Gemini 3.1 Pro Signal, Actually
Yes — Gemini 3.1 Pro's reasoning gains are real and significant. I'm watching it closely for the class of problems that currently sits at the Opus tier: deep multi-document synthesis, novel inference chains, high-stakes architecture reasoning. If Gemini 3.1 Pro matches or beats Opus on those tasks at lower cost, the routing table updates.
That's what a mature operator does when a new model drops. Not "should I switch everything?" — that's consumer thinking. The question is: "does this new capability change which tier a specific class of problems belongs in?" Maybe it does for complex reasoning. It definitely doesn't change where RSS digest synthesis belongs.
The model landscape will keep rotating. Something better than 3.1 Pro ships in Q3. The routing framework is durable. The monoculture habit is not.
Bottom Line
Your routing table is a strategic asset. Build it explicitly — which model family, which problem class, which cost tier — and wire it into your systems. The operator who runs 40 pipelines with deliberate routing will outperform the operator who runs the same 40 pipelines on one model every time, not because they have better access, but because they've turned a cost structure into leverage. Gemini 3.1 Pro is a genuine improvement. Use it where it earns its place. Route everything else accordingly.