Context Engineering VIP 2026-06-04

Don't Run Every Role in One Window

When your AI suddenly drifts off topic, the model didn't get stupid. You probably gave it too many roles in one window. Let's unpack this principle through a simple meeting-room analogy.

After using an AI agent for a month, most people hit the same moment. You asked for a script, and it started writing code. You asked for a plan revision, and the tone from an earlier conversation bled in. The result gets weird. The model looks like it suddenly got stupid. It didn't. You had too many people in one meeting room.

This principle predates AI. Claude, GPT, any agent three years from now — same story. The product names change. The principle doesn't. Let's go slowly.

When Roles Mix, Memory Mixes

Picture a real office. A director, a writer, a designer, and a programmer in one room, all four agendas on the table at the same time. What happens? Code talk drifts toward the director. Thumbnail notes fly at the programmer. The designer is somehow giving script feedback. It's chaos.

Now look at how you actually run a company. When the marketing meeting ends, people walk out of that room. The dev meeting happens in a different room. The finance review gets its own time slot. Why? Because when roles differ, separating the space works better than keeping everyone together. Same humans, different rooms, cleaner heads.

Somehow that common sense disappears the moment we open an AI chat. One window — and in there we write the blog, fix the code, review the design, draft the email, plan the video. Everything piles into a single conversation. That pile — what engineers call the "context window" — is simply every piece of text the AI is holding in its head for this chat. Once it gets polluted, your variable names sneak into the blog intro and the email tone bleeds into the script.

I call this context bleed. Like ink soaking through paper, the earlier task seeps into the next one.

The Day My 100-File Session Collapsed

A real story. I had 100 portfolio images with broken external links. I told Claude: "Read them one by one and fix them. Keep going until you're done."

The first few worked. By item ten, things got odd. By thirty, old file content was showing up in new files. By fifty, the session died outright. "Prompt is too long." You've seen that error.

Here's the mechanism. When you use an MCP tool (think of it as a direct bridge that lets the AI drive an external app), every page you pull in lands, in full, inside the chat, and it stays there for the rest of the session. Twenty calls means twenty full pages piled up. One chat window becomes a twenty-volume book. No human can stay sharp after reading twenty books at once. Neither can the model.
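The pile-up can be sketched in a few lines of Python. This is a toy model, not how any particular client counts tokens: the 4,000 tokens per page and the 200,000-token limit are assumed round numbers, and `simulate_session` is a hypothetical helper.

```python
def simulate_session(pages: int, tokens_per_page: int, context_limit: int) -> tuple[int, int]:
    """Count how many tool calls fit before the context window overflows.

    Every fetched page is appended to the conversation and never removed,
    so the context only grows.
    """
    context_tokens = 0
    for call in range(1, pages + 1):
        if context_tokens + tokens_per_page > context_limit:
            # The next page would not fit: the session fails here.
            return call - 1, context_tokens
        context_tokens += tokens_per_page
    return pages, context_tokens


# Assumed numbers, for illustration only.
TOKENS_PER_PAGE = 4_000
CONTEXT_LIMIT = 200_000

# One long session: all 100 pages go into the same window.
done, used = simulate_session(100, TOKENS_PER_PAGE, CONTEXT_LIMIT)
print(done, used)  # 50 200000

# A fresh session per batch of 10 pages: each window stays small.
batch_done, batch_used = simulate_session(10, TOKENS_PER_PAGE, CONTEXT_LIMIT)
print(batch_done, batch_used)  # 10 40000
```

Under these assumptions the single session stalls at page 50, while ten-page batches in fresh windows never hold more than 40,000 tokens at a time.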

There's a second, nastier layer. When roles mix, judgment standards mix too. Mid-fix, I told Claude, "Oh, while you're at it, clean up the file names." The carefulness from the repair task bled into the rename task. Claude started asking me, "Is the original filename meaningful for this image?" for what should have been a plain lowercase conversion. The weight of the earlier role got layered onto the later one. Both jobs slowed down and got muddier.

Analogy — The Habit of Closing the Room

The fix is simple. When the role changes, open a new window. It's the same move as switching meeting rooms at the office.

A writing window. A coding window. A planning window. An email window. To a human it feels fussy. To an AI it's the natural work environment. A fresh window is like sitting down at an empty desk, with no paperwork left behind by the last visitor. The thinking stays crisp from the first sentence.

But what about the context from the old window? That's the key move. You write a one-page handoff.

A Number to Pin It Down

I measured the same task two ways.

Method	Sessions	Total tokens used	Quality
One window for all	1	~180,000	Fades in the back half
Split by role	3	~60,000 (all three combined)	Even throughout

Tokens are money for an AI. A token is roughly three-quarters of an English word, and a million output tokens on Opus has cost as much as $75. Splitting used a third of the tokens, and the output was actually better.

But splitting sessions isn't mainly about saving money. It's about keeping your thinking sharp.
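The arithmetic behind that comparison, as a rough sketch. It prices every token at the $75-per-million figure quoted above, which is an upper bound, since real bills charge input and output tokens at different rates. `cost_usd` is a hypothetical helper.

```python
def cost_usd(tokens: int, usd_per_million: float) -> float:
    """Upper-bound cost if every token were billed at the same rate."""
    return tokens * usd_per_million / 1_000_000


ONE_WINDOW_TOKENS = 180_000  # from the table: one window for all
SPLIT_TOKENS = 60_000        # from the table: three sessions combined
USD_PER_MILLION = 75.0       # the "up to $75" figure, used as a ceiling

one = cost_usd(ONE_WINDOW_TOKENS, USD_PER_MILLION)
split = cost_usd(SPLIT_TOKENS, USD_PER_MILLION)
ratio = ONE_WINDOW_TOKENS / SPLIT_TOKENS

print(f"one window: ${one:.2f}")    # one window: $13.50
print(f"split:      ${split:.2f}")  # split:      $4.50
print(f"ratio:      {ratio:.0f}x")  # ratio:      3x
```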

Apply It — One Question

Before you open your next AI window, ask yourself one thing.

"What role am I assigning this time?"

One role? Use the window. Two or more? Stop. You need to split.

Writing work → writing window
Code work → coding window
Decision work → judgment window
Research and cleanup → research window

Stuck? Look at the final output. What shape does the deliverable take? Prose means writing. An executable means code. A brief means research. Different shapes, different rooms.
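The one-question check can be written down as a lookup, if that helps make it a habit. A minimal sketch: the deliverable labels and the `rooms_needed` and `should_split` helpers are made up for illustration, and the mapping simply mirrors the four windows listed above.

```python
# Hypothetical labels for the shape of the final deliverable.
ROOM_FOR_DELIVERABLE = {
    "prose": "writing window",
    "executable": "coding window",
    "decision": "judgment window",
    "brief": "research window",
}


def rooms_needed(deliverables: list[str]) -> list[str]:
    """Return the distinct windows the deliverables call for, in first-seen order."""
    rooms: list[str] = []
    for shape in deliverables:
        room = ROOM_FOR_DELIVERABLE[shape]  # a KeyError means: decide the shape first
        if room not in rooms:
            rooms.append(room)
    return rooms


def should_split(deliverables: list[str]) -> bool:
    """Two or more rooms means the work should not share one window."""
    return len(rooms_needed(deliverables)) > 1


print(rooms_needed(["prose", "prose"]))       # ['writing window']
print(should_split(["prose", "prose"]))       # False
print(should_split(["prose", "executable"]))  # True
```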

A Real Example — The One-Page Handoff

The second a role is about to change, stop the session and ask this.

"Summarize everything we've done as a handoff document. The next agent should be able to read this and pick up immediately. Four sections: background, current state, next steps, watch-outs."

The AI produces one page. Copy it. Now open a fresh window. Empty. Start it like this.

"Read the handoff document below, then continue as [new role]. Assume this document contains the only context you have from the previous window."

Paste. Done. The new AI gets exactly what it needs, with none of the debris from the previous chat. A new hire sitting down at a clean desk.
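If you keep handoffs as files, the four-section shape is easy to template. A sketch under assumptions: `build_handoff` and `opening_prompt` are hypothetical helpers, the section headings follow the prompt above, and the sample content is invented for the example.

```python
SECTIONS = ("Background", "Current state", "Next steps", "Watch-outs")


def build_handoff(background: str, current_state: str, next_steps: str, watch_outs: str) -> str:
    """Assemble the four sections into one plain-text handoff page."""
    bodies = (background, current_state, next_steps, watch_outs)
    parts = [f"{title}\n{body.strip()}" for title, body in zip(SECTIONS, bodies)]
    return "\n\n".join(parts)


def opening_prompt(handoff: str, new_role: str) -> str:
    """Wrap the handoff in the first message for the fresh window."""
    return (
        f"Read the handoff document below, then continue as {new_role}. "
        "Assume this document contains the only context you have "
        "from the previous window.\n\n"
        f"{handoff}"
    )


# Invented sample content, for illustration only.
handoff = build_handoff(
    background="Portfolio images have broken external links.",
    current_state="Links repaired in the first batch of files.",
    next_steps="Convert the file names to lowercase.",
    watch_outs="Do not change the image content.",
)
print(opening_prompt(handoff, "a file-renaming assistant"))
```

In practice the AI writes the four section bodies; the template only keeps the shape consistent from one handoff to the next.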

Commands You Can Use Today

In Claude Code, open a fresh session from the terminal.

claude                  # in the terminal: start a brand-new session
/clear                  # inside a session: clear the current conversation history
/resume <session-id>    # inside a session: deliberately continue a prior session

In Claude.ai web or ChatGPT, hit "New chat" in the top-left. In ChatGPT the shortcut is Ctrl/Cmd + Shift + O; other apps may use a different one. Three seconds of work.

Keep your handoffs in their own folder. I use handoffs/ with filenames like 2026-04-23_writer_to_coder.md — date plus who-to-who. Easy to trace back later.
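That naming scheme is simple enough to generate. A small sketch, assuming the `handoffs/` folder and the date-plus-roles pattern described above; `handoff_path` is a hypothetical helper, not part of any tool.

```python
from datetime import date
from pathlib import PurePosixPath


def handoff_path(day: date, from_role: str, to_role: str, folder: str = "handoffs") -> PurePosixPath:
    """Build a path like handoffs/2026-04-23_writer_to_coder.md."""
    name = f"{day.isoformat()}_{from_role}_to_{to_role}.md"
    return PurePosixPath(folder) / name


print(handoff_path(date(2026, 4, 23), "writer", "coder"))
# handoffs/2026-04-23_writer_to_coder.md
```

Because the date comes first in ISO format, a plain alphabetical sort of the folder is also a chronological one.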

One Objection — "What About Huge Context Windows?"

A question I get often: "Gemini handles a million tokens. Can't I just make the window bigger and mix freely?"

Sadly no. A larger room doesn't help when four people shout four agendas at once. The problem isn't context size. It's role count. Once multiple roles share a window, the model has to work out, instruction by instruction, which role each one belongs to. That bookkeeping cost eats the benefit of a big window.

I still split windows even on a million-token model. Four small rooms finish the work faster than one big room.

Wrap-Up

So what did we cover?

The biggest enemy when working with AI isn't the model's intelligence. It's the habit of cramming every role into one window. When memory mixes, output mixes. You spend three times the tokens and get worse results.

The examples used Claude Code's language, but this applies to any AI. Gemini, GPT, any agent that ships three years from now — the same structure returns. Because it isn't really an AI problem. It's a work-division problem. Same reason offices separate their meeting rooms.

Change one habit starting today. Before you open a window, ask once: "What role am I assigning?" Two or more? Split the window. Every split gets a one-page handoff. Three years from now, when "Claude" is no longer Claude, this habit still works. The tech changes. The principle doesn't.

One role, one window, one handoff.
