AI Workflow VIP 2026-06-22

Separate Design from Execution — The Dual-Model Strategy

Running every task on the most expensive AI is wasteful. Design with the big model, execute with the fast one. The general-and-soldier structure cuts cost and raises speed.

Many people try to cut AI costs this way and that. Switch to a cheaper model — the output gets sloppy. Switch back to the expensive one — next month the bill is massive. A lot of people ping-pong between the two. The answer isn't ping-pong. It's dividing the work. Today we unpack the principle. Slowly.


Here's the spine. Design and execution are different tasks. Different tasks get different tools. Design with the high-intelligence model. Execute with the high-speed one.

AI models usually come in three tiers. Claude has Opus (most expensive, smartest), Sonnet (middle), Haiku (cheapest, fastest). ChatGPT is similar (GPT-5.4, GPT-5, Mini). The gap: the most expensive model costs about 18× as much as the cheapest. Eighteen. That number is today's key variable.

The Shape of Confusion — Everything on the Biggest Model

Most people leave defaults on and run everything on the top model. A 3-line email — Opus. Cleaning up meeting notes — Opus. Renaming 50 files — Opus. Each is a few cents, but 50 times a day stacks past $200 a month. That's the source of bill shock.

The other extreme exists too. Some people, to save money, run everything on the cheapest model. Now the problem flips. Complex debugging handed to Haiku stays stuck for 3 hours and still isn't solved. Opus would have finished in 30 minutes. You saved dollars and lost hours. Hours cost more than dollars.

Both extremes fail for the same reason. You tried to do all work with one model. Work isn't one kind. So the tool shouldn't be one kind.

Analogy — General and Soldier

Think of a military operation. Planning a battle is the general's job. On the map, they consider the enemy position, friendly movement, weather, supply lines — all at once — and make the large calls. The general sees the whole.

But the general doesn't carry a rifle to the front. That's the soldier's job. The soldier executes the order. Accurately and fast. Thousands of soldiers each execute the plan in their positions, and the operation moves.

If the general did every soldier's work, the front would collapse, because one person can execute only so much. If soldiers acted without a general, strategy would scatter and the operation would fall apart.

AI models work the same way.

  • Opus (general) — big judgments, whole structure, complex reasoning. Slow and expensive.
  • Haiku (soldier) — repetitive execution, fixed patterns, high volume. Fast and cheap.

Put the general on soldier's work and money explodes. Put soldiers in the general's chair and the operation fails. Dividing roles correctly is the core skill.

Aha

The real skill is knowing when to switch models. Use the execution model at the design stage and the structure breaks. Use the design model at the execution stage and cost blows up.

This sentence is the key. Let's unpack.

Every task actually has two stages.

Design stage — deciding what to do, in what order, in what format. Many judgments, few repetitions. Once done well, it's done.

Execution stage — following the design to actually produce. Few judgments, many repetitions. Must be done many times fast.

Every task contains both. Most people blur them together and run on one model. Don't blur. Separate them and both sides improve.

Diagnosis — Which Stage Is Your Task

Ask this question.

'Does this task actually need judgment?'

Three answers.

  • Yes, a lot → strategy, complex debugging, architecture. Opus tier.
  • In between → implementing a defined spec, refactoring, writing a report. Sonnet tier.
  • Barely any → renaming files, standardizing formats, simple classification. Haiku tier.

Three words to memorize: Judgment → Opus. Making → Sonnet. Repetition → Haiku. Keep these three words in mind and ask once at the start of every task.
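
The three-way question can be sketched as a small lookup. This is only an illustration, not any real API: the Need enum, the pick_tier function, and the tier names as plain strings are all made up for this example.

```python
from enum import Enum


class Need(Enum):
    """How much judgment a task needs (hypothetical labels for this sketch)."""
    JUDGMENT = "judgment"      # strategy, complex debugging, architecture
    MAKING = "making"          # implementing a defined spec, refactoring, reports
    REPETITION = "repetition"  # renaming files, format cleanup, simple classification


# Tier names are plain labels here, not real model identifiers.
TIER_FOR_NEED = {
    Need.JUDGMENT: "opus",
    Need.MAKING: "sonnet",
    Need.REPETITION: "haiku",
}


def pick_tier(need: Need) -> str:
    """Return the model tier suggested for this kind of task."""
    return TIER_FOR_NEED[need]


if __name__ == "__main__":
    for need in Need:
        print(f"{need.value} -> {pick_tier(need)}")
```

The point of writing it as a table is that the decision happens once, at the start of the task, before any prompt is sent.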

Real Example — 20-Page Meeting Record

Let's walk through. Task: 'extract action items from a 20-page meeting record.'

One-model way: all Opus

Hand all 20 pages to Opus. 'Extract action items.' Opus reads, understands, pulls them out. Good result. Cost $3.

Dual-track way: Opus for design, Haiku for execution

Step 1: ask Opus for the plan only.

'I want to extract action items from a 20-page meeting record. Don't execute — design the method. What keywords to scan first, which sections usually hold them, what output format.'

Opus returns a short answer. 'Scan for phrases like must-do, decided-to. Decisions cluster at the end of sections, so read those first. Format as 3 columns: owner / action / deadline.' Cost $0.20.

Step 2: hand that plan to Haiku.

'Follow the plan above and extract action items from this 20-page record, owner / action / deadline.'

Haiku executes. Cost $0.30.

Total $0.50. One-sixth of the baseline. And what's interesting — the result is often better. When Opus focuses only on 'how to do it,' the thinking sharpens. Same as a general focusing only on strategy while soldiers execute — the operation runs better.
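
The arithmetic behind that comparison, written out as a quick check. The dollar figures are the ones quoted in this example. The function names are invented for the sketch, and real costs will vary with document length and pricing.

```python
# Dollar figures are the ones quoted in the meeting-record example.
SINGLE_MODEL_COST = 3.00   # all 20 pages handled by Opus
DESIGN_COST = 0.20         # Opus writes the plan only
EXECUTION_COST = 0.30      # Haiku follows the plan


def dual_track_total(design: float, execution: float) -> float:
    """Total cost when design and execution run on different models."""
    return design + execution


def savings_ratio(baseline: float, total: float) -> float:
    """How many times more the baseline costs than the split approach."""
    return baseline / total


total = dual_track_total(DESIGN_COST, EXECUTION_COST)
ratio = savings_ratio(SINGLE_MODEL_COST, total)
print(f"dual-track total: ${total:.2f}")
print(f"baseline costs {ratio:.0f}x the dual-track total")
```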

Commands — Switching Models

In Claude Code, switch mid-session with /model opus, /model sonnet, /model haiku. On Claude.ai web, use the top dropdown. With the API, change the model parameter in the request.
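
For the API route, the two-step flow from the meeting-record example comes down to two requests that differ in the model field. Here is a minimal sketch under stated assumptions: send is a hypothetical stand-in for whatever client call you use, the request dict shape is invented for illustration rather than taken from any provider's schema, and the model names are placeholders, not real model identifiers.

```python
from typing import Callable

# Hypothetical request shape: {"model": str, "prompt": str} -> response text.
Send = Callable[[dict], str]

DESIGN_MODEL = "opus"       # placeholder name, not a real model ID
EXECUTION_MODEL = "haiku"   # placeholder name, not a real model ID


def design_then_execute(task: str, document: str, send: Send) -> str:
    """Ask the design model for a plan, then have the execution model apply it."""
    # Step 1: the expensive model only designs the method.
    plan = send({
        "model": DESIGN_MODEL,
        "prompt": f"{task}\nDon't execute. Design the method only.",
    })
    # Step 2: the cheap model follows that plan on the full document.
    return send({
        "model": EXECUTION_MODEL,
        "prompt": f"Follow this plan:\n{plan}\n\nDocument:\n{document}",
    })


if __name__ == "__main__":
    # Fake sender so the sketch runs without any network access.
    def fake_send(request: dict) -> str:
        return f"[{request['model']}] reply"

    print(design_then_execute("Extract action items.", "(meeting record)", fake_send))
```

Passing send in as an argument keeps the sketch independent of any particular SDK. To use it for real, wrap your client's request call in a function with the same shape.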

Try it for a week. At the start of each task, whisper the one question — 'is this judgment, making, or repetition?' — and pick accordingly. The first 2-3 days feel awkward. After a week it's automatic. The bill drops visibly.

Summary

Design and execution are different tasks. Use different tools. One model for everything means either expensive or sloppy. Divide and both sides win.

This principle isn't only for AI. It applies at work, at home, in solo projects. Asking one person to both plan and execute leaves both half-baked. Divide and give each role to the right fit and both thrive. AI is the tool that makes you feel this principle 10 times a day.

Judgment → making → repetition. Run your own work through these three words.
