AI Strategy VIP 2026-04-18

What the 0.1 Doesn't Say

Newer is not always better. Version numbers only tell you the order. Today we unpack this principle through Opus 4.7 vs 4.6.

A new version drops and your gut reacts first. "Opus 4.7 is out. Does that mean 4.6 is done?" A bigger number must be better, or so it seems. But run the two side by side on real work and something odd happens: on some tasks 4.7 wins clearly, and on others 4.6 does better. Why?

This essay walks through what the version number doesn't tell you. Opus 4.6 and 4.7 are today's example, but the principle applies to any numbered product. Three years from now, when "Opus" is a different name, the spine of this essay still holds. We'll go slowly.

Where did version numbers come from?

The notation comes from the software era. By convention, each position in the number means something:

First number (major): big change. 4 → 5.
Second number (minor): small feature added. 4.6 → 4.7.
Third number (patch): bug fix. 4.7 → 4.7.1.

This worked because software is built by adding features. Add a feature, bump 0.1. Add another, bump again. Predictable math.
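As a rough illustration of that convention, a dotted label can be split into integer parts and compared position by position. The sketch below is in Python; `parse_version` and `is_later` are hypothetical helper names made up for this example, not part of any tool mentioned in this essay. Note what the comparison gives back: an ordering and nothing else.

```python
def parse_version(label: str) -> tuple:
    """Split a dotted label such as '4.7.1' into integer parts."""
    return tuple(int(part) for part in label.split("."))


def is_later(a: str, b: str) -> bool:
    """True if label a sorts after label b. This is ordering only;
    it says nothing about which version suits a given task."""
    return parse_version(a) > parse_version(b)


print(parse_version("4.7.1"))   # (4, 7, 1)
print(is_later("4.7", "4.6"))   # True
print(is_later("4.6", "4.7"))   # False
```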

AI models aren't software. More precisely, they aren't built by the additive logic that software follows. Let's see why.

Example — the odd difference between 4.7 and 4.6

In March 2026, Opus 4.7 shipped, a 0.1 bump from 4.6. The implied message: "a bit smarter."

I threw the same tasks at both side by side. The table:

Task	4.6	4.7	Winner
Complex code refactor	OK	Excellent	4.7
30-page doc summary	Deep	Flat	4.6
Korean sentence rhythm	Natural	Slightly stiff	4.6
Math reasoning	Fine	Excellent	4.7
Emotionally subtle writing	Delicate	Logical but dry	4.6

Five tasks — 4.7 won two, 4.6 won three.
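The tally is easy to reproduce once the table is written down as data. A minimal Python sketch, using only the rows from the table above; the variable names are chosen for this example.

```python
from collections import Counter

# (task, winner) pairs copied from the comparison table.
results = [
    ("Complex code refactor", "4.7"),
    ("30-page doc summary", "4.6"),
    ("Korean sentence rhythm", "4.6"),
    ("Math reasoning", "4.7"),
    ("Emotionally subtle writing", "4.6"),
]

# Count wins per model.
wins = Counter(winner for _task, winner in results)
print(dict(wins))  # {'4.7': 2, '4.6': 3}

# Group tasks by the model that won them.
by_winner = {}
for task, winner in results:
    by_winner.setdefault(winner, []).append(task)
print(by_winner["4.7"])  # ['Complex code refactor', 'Math reasoning']
```

The grouping matters more than the count: it shows which kind of work each model won, and that is the part the version number hides.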

The Korean rhythm row surprised me most. 4.7 is the smarter model overall, yet 4.6's Korean flows more smoothly. Given the same prompt asking for a sentence that opens with "자," (a conversational opener, roughly "Now then,"), 4.6 produces it naturally on the first try, while 4.7 often slips in a stiffer clause. For my blog drafts, that single difference is decisive. It is only a 0.1 bump, yet which side wins depends on the domain. Why?

Because AI doesn't upgrade by addition. For each new version, the company trains the model again. The training data changes, the training method changes, the tuning direction changes. The result is reconstruction, not addition: some capabilities rise, and others fall as the price. That's the nature of AI models.

The version number encodes this reconstruction as pure ordering. "Came out later" — that's it. Not "better in every respect." Whatever AI company ships what next, this property repeats.

Analogy — new car models

Picture a car brand that ships a new model every year. 2024, 2025, 2026.

Looking at numbers, 2026 is obviously best. But walk into an owner forum and you hear:

"2024's ride is smoother."
"2025's fuel economy is great but the engine is louder."
"2026 looks beautiful but the trunk shrank."

Car makers don't add everything to each new model. Budget, weight, constraints. Put something in, something comes out. So the right car for your road isn't necessarily the latest. If you commute through the city, 2024 might suit you. If you drive country roads, 2026 might.

AI models work exactly the same way. There is no guarantee that the model that fits your road is the latest one.

Here's the first aha moment.

A version number tells you order. It doesn't tell you fit.

So what's the real criterion?

If numbers aren't the criterion, what is? Context. More precisely, your task. Keep one question handy.

"Have I thrown this task at both models myself?"

If not, numbers are your only evidence. If yes, results are. Which is more accurate? Results, obviously.

Do this. A/B test.

Pick 3 tasks you do often (code review, doc summary, email reply, etc.).
Throw one of each at both models (new + previous).
Lay results side by side. Which is better?
Write it into a table.

It takes ten minutes. Those ten minutes settle your choices for the next month: "Code goes to 4.7, summaries go to 4.6," and so on.
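The four steps above can be kept in a small script, so the table is regenerated the same way every time. This is a sketch in Python under stated assumptions: `Trial`, `format_table`, and `routing` are hypothetical names invented here, the verdicts are typed in by hand after you read both outputs, and the two sample rows are taken from the comparison table earlier in this essay.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One task run on both models, with your own short verdicts."""
    task: str
    old_note: str  # how the previous model did
    new_note: str  # how the new model did
    winner: str    # label of the model you preferred


def format_table(trials, old_label, new_label):
    """Render the trials as tab-separated rows, one task per line."""
    lines = ["\t".join(["Task", old_label, new_label, "Winner"])]
    for t in trials:
        lines.append("\t".join([t.task, t.old_note, t.new_note, t.winner]))
    return "\n".join(lines)


def routing(trials):
    """Map each task to the model that won it."""
    return {t.task: t.winner for t in trials}


# Two rows from the comparison table earlier in this essay.
trials = [
    Trial("Complex code refactor", "OK", "Excellent", "4.7"),
    Trial("30-page doc summary", "Deep", "Flat", "4.6"),
]

print(format_table(trials, "4.6", "4.7"))
print(routing(trials))
```

The script does not judge anything. The verdict still comes from you reading both results side by side; the code only keeps the record consistent.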

Apply — running both models

Concrete commands. If you're on Claude Code:

# Start with the new version (run in your shell)
claude --model claude-opus-4-7

# Start with the previous version (run in your shell)
claude --model claude-opus-4-6

# Switch mid-session (type these at the Claude Code prompt, not in the shell)
/model opus-4-6
/model opus-4-7

On Claude.ai web, settings has a "Previous version" option. Pick 4.6 there.

My current split:

Code / math / logical analysis → Opus 4.7
Long Korean writing / subtle summaries / tone-sensitive work → Opus 4.6

This split will shift in six months. When 4.8 arrives, I'll rerun the 10-minute A/B and update. The criterion isn't the number — it's the task.
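One way to make that update cheap is to keep the split in a single lookup table and build the launch command from it. A minimal Python sketch: `SPLIT` and `launch_command` are hypothetical names for this example, the categories are the ones listed above, and the model identifiers are copied from the commands in this section as written (check that your own installation accepts them).

```python
# Model identifiers as they appear in the commands in this section.
NEW_MODEL = "claude-opus-4-7"
OLD_MODEL = "claude-opus-4-6"

# The current split. Edit this table after each new A/B round.
SPLIT = {
    "code": NEW_MODEL,
    "math": NEW_MODEL,
    "logical analysis": NEW_MODEL,
    "long Korean writing": OLD_MODEL,
    "subtle summaries": OLD_MODEL,
    "tone-sensitive work": OLD_MODEL,
}


def launch_command(task_kind):
    """Return the argument list that starts Claude Code with the
    model assigned to this kind of task."""
    try:
        model = SPLIT[task_kind]
    except KeyError:
        # An unknown category means no A/B result exists for it yet.
        raise ValueError(f"No A/B result for {task_kind!r}; test it first") from None
    return ["claude", "--model", model]


print(" ".join(launch_command("math")))
# claude --model claude-opus-4-7
```

An unlisted task raises an error on purpose: a category with no A/B result should send you back to the test, not to the bigger number.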

Summary

Version numbers are the language of the software era. AI doesn't fit that language cleanly. The 0.1 from 4.6 to 4.7 is reconstruction, not addition. Some domains go up; others come down. Today we unpacked this with Opus 4.6/4.7, but the same applies to any AI that comes next. When the names become Opus 7 or GPT-9, the reconstruction property repeats.

Keep one question handy — "Have I thrown this task at both models myself?" Trust the results, not the number. A 10-minute A/B decides the next month.

Three words to remember: Number. Context. Check. Don't look at the number — look at the context, and always check. The technology changes. The principle does not.

