Newer is not always better. Version numbers only tell you the order. Today we unpack this principle through Opus 4.7 vs 4.6.
A new version drops and your heart moves first. "Opus 4.7 is out — does that mean 4.6 is done?" Bigger number, obviously better. But run them side by side on real work and something strange happens. Some tasks, 4.7 wins clearly. Other tasks, 4.6 is better. Why?
This essay walks through what the version number doesn't tell you. Opus 4.6 and 4.7 are today's example, but the principle applies to any numbered product. Three years from now, when "Opus" is a different name, the spine of this essay still holds. We'll go slowly.
The notation came from the software era. The original meaning was defined like this.
This worked because software is built by adding features. Add a feature, bump 0.1. Add another, bump again. Predictable math.
AI models aren't software. More precisely, they're not made by the addition logic software runs on. Let's see why.
In March 2026, Opus 4.7 shipped. A 0.1 bump from 4.6. Nuance: "a bit smarter."
I threw the same tasks at both side by side. The table:
| Task | 4.6 | 4.7 | Winner |
|---|---|---|---|
| Complex code refactor | OK | Excellent | 4.7 |
| 30-page doc summary | Deep | Flat | 4.6 |
| Korean sentence rhythm | Natural | Slightly stiff | 4.6 |
| Math reasoning | Fine | Excellent | 4.7 |
| Emotionally subtle writing | Delicate | Logical but dry | 4.6 |
Five tasks — 4.7 won two, 4.6 won three.
The Korean rhythm row surprised me most. 4.7 is the smarter model overall, yet 4.6's Korean flows more smoothly. Same prompt asking for a sentence that opens with "자," — 4.6 produces it naturally on the first try, while 4.7 often sneaks in a stiffer clause. For my blog drafts, that single difference is decisive. A 0.1 bump, but which side wins depends on the domain. Why?
Because AI doesn't upgrade by addition. Each new version, the company retrains the model. Training data changes, training method changes, tuning direction changes. The result isn't addition — it's reconstruction. Some capabilities rise; others pay the cost and fall. That's the nature of AI models.
The version number encodes this reconstruction as pure ordering. "Came out later" — that's it. Not "better in every respect." Whatever AI company ships what next, this property repeats.
Picture a car brand that ships a new model every year. 2024, 2025, 2026.
Looking at numbers, 2026 is obviously best. But walk into an owner forum and you hear:
Car makers don't add everything to each new model. Budget, weight, constraints. Put something in, something comes out. So the right car for your road isn't necessarily the latest. If you commute through the city, 2024 might suit you. If you drive country roads, 2026 might.
AI models are identical. Your model isn't guaranteed to be the latest model.
Here's the first aha moment.
A version number tells you order. It doesn't tell you fit.
If numbers aren't the criterion, what is? Context. More precisely, your task. Keep one question handy.
"Have I thrown this task at both models myself?"
If not, numbers are your only evidence. If yes, results are. Which is more accurate? Results, obviously.
Do this. A/B test.
Ten minutes. That ten minutes automates your choices for the next month. "Code goes to 4.7, summaries go to 4.6," and so on.
Concrete commands. If you're on Claude Code:
# Start with the new version
claude --model claude-opus-4-7
# Start with the previous version
claude --model claude-opus-4-6
# Switch mid-session
/model opus-4-6
/model opus-4-7
On Claude.ai web, settings has a "Previous version" option. Pick 4.6 there.
My current split:
This split will shift in six months. When 4.8 arrives, I'll rerun the 10-minute A/B and update. The criterion isn't the number — it's the task.
Version numbers are the language of the software era. AI doesn't fit that language cleanly. 0.1 from 4.6 to 4.7 is reconstruction, not addition. Some domains go up; others come down. Today we unpacked this with Opus 4.6/4.7, but the same applies to any AI that comes next. When the names become Opus 7 or GPT-9, the reconstruction property repeats.
Keep one question handy — "Have I thrown this task at both models myself?" Trust the results, not the number. A 10-minute A/B decides the next month.
Three words to remember: Number. Context. Check. Don't look at the number — look at the context, and always check. The technology changes. The principle does not.