Context Engineering VIP 2026-06-04

Don't Run Every Role in One Window

When your AI suddenly drifts off topic, the model didn't get stupid. You probably gave it too many roles in one window. Let's unpack this principle through a simple meeting-room analogy.

After using an AI agent for a month, most people hit the same moment. You asked for a script, and it started writing code. You asked for a plan revision, and the tone from an earlier conversation bled in. The result gets weird. The model looks like it suddenly got stupid. It didn't. You had too many people in one meeting room.

This principle predates AI. Claude, GPT, any agent three years from now — same story. The product names change. The principle doesn't. Let's go slowly.

When Roles Mix, Memory Mixes

Picture a real office. A director, a writer, a designer, and a programmer in one room, all four agendas on the table at the same time. What happens? Code talk drifts toward the director. Thumbnail notes fly at the programmer. The designer is somehow giving script feedback. It's chaos.

Now look at how you actually run a company. When the marketing meeting ends, people walk out of that room. The dev meeting happens in a different room. The finance review gets its own time slot. Why? Because when roles differ, separating the space works better than keeping everyone together. Same humans, different rooms, cleaner heads.

Somehow that common sense disappears the moment we open an AI chat. One window — and in there we write the blog, fix the code, review the design, draft the email, plan the video. Everything piles into a single conversation. That pile — what engineers call the "context window" — is simply every piece of text the AI is holding in its head for this chat. Once it gets polluted, your variable names sneak into the blog intro and the email tone bleeds into the script.

I call this context bleed. Like ink soaking through paper, the earlier task seeps into the next one.

The Day My 100-File Session Collapsed

A real story. I had 100 portfolio images with broken external links. I told Claude: "Read them one by one and fix them. Keep going until you're done."

The first few worked. By item ten, things got odd. By thirty, old file content was showing up in new files. By fifty, the session died outright. "Prompt is too long." You've seen that error.

Here's the mechanism. When you use an MCP tool — think of it as a direct bridge that lets the AI drive an external app — every page you pull in lands, in full, inside the chat. Twenty calls means twenty full pages piled up. One chat window becomes a twenty-volume book. No human can stay sharp after reading twenty books at once. Neither can the model.

There's a second, nastier layer. When roles mix, judgment standards mix too. Mid-fix, I told Claude, "Oh, while you're at it, clean up the file names." The carefulness from the repair task bled into the rename task. Claude started asking me, "Is the original filename meaningful for this image?" for what should have been a plain lowercase conversion. The weight of the earlier role got layered onto the later one. Both jobs slowed down and got muddier.

Analogy — The Habit of Closing the Room

The fix is simple. When the role changes, open a new window. It's the same move as switching meeting rooms at the office.

A writing window. A coding window. A planning window. An email window. To a human it feels fussy. To an AI it's the natural work environment. A fresh window is like walking into an empty desk — no paperwork left behind by the last visitor. The thinking stays crisp from the first sentence.

But what about the context from the old window? That's the key move. You write a one-page handoff.

A Number to Pin It Down

I measured the same task two ways.

Method Sessions Avg tokens used Quality
One window for all 1 ~180,000 tokens Fades in the back half
Split by role 3 ~60,000 tokens (total) Even throughout

Tokens are money for an AI. Roughly a word and a half of text per token, and a million tokens on Opus runs up to $75. Three times fewer tokens, and the output was actually better.

Splitting sessions isn't about saving money. It's about keeping your thinking sharp.

Apply It — One Question

Before you open your next AI window, ask yourself one thing.

"What role am I assigning this time?"

One role? Use the window. Two or more? Stop. You need to split.

  • Writing work → writing window
  • Code work → coding window
  • Decision work → judgment window
  • Research and cleanup → research window

Stuck? Look at the final output. What shape does the deliverable take? Prose means writing. An executable means code. A brief means research. Different shapes, different rooms.

A Real Example — The One-Page Handoff

The second a role is about to change, stop the session and ask this.

"Summarize everything we've done as a handoff document. The next agent should be able to read this and pick up immediately. Four sections: background, current state, next steps, watch-outs."

The AI produces one page. Copy it. Now open a fresh window. Empty. Start it like this.

"Read the handoff document below, then continue as [new role]. Assume this document contains the only context you have from the previous window."

Paste. Done. The new AI gets exactly what it needs, with none of the debris from the previous chat. A fresh employee walking into a clean desk.

Commands You Can Use Today

In Claude Code, open a fresh session from the terminal.

claude                  # start a brand-new session
/clear                  # wipe the memory of the current session only
/resume <session-id>    # deliberately continue a prior session

In Claude.ai web or ChatGPT, hit "New chat" in the top-left. The shortcut is usually Ctrl/Cmd + Shift + O. Three seconds of work.

Keep your handoffs in their own folder. I use handoffs/ with filenames like 2026-04-23_writer_to_coder.md — date plus who-to-who. Easy to trace back later.

One Objection — "What About Huge Context Windows?"

A question I get often: "Gemini handles a million tokens. Can't I just make the window bigger and mix freely?"

Sadly no. A larger room doesn't help when four people shout four agendas at once. The problem isn't context size. It's role count. The moment multiple roles enter, the model has to check, sentence by sentence, which role that instruction belongs to. That bookkeeping cost eats the benefit of a big window.

I still split windows even on a million-token model. Four small rooms finish the work faster than one big room.

Wrap-Up

So what did we cover.

The biggest enemy when working with AI isn't the model's intelligence. It's the habit of cramming every role into one window. When memory mixes, output mixes. You spend three times the tokens and get worse results.

The examples used Claude Code's language, but this applies to any AI. Gemini, GPT, any agent that ships three years from now — the same structure returns. Because it isn't really an AI problem. It's a work-division problem. Same reason offices separate their meeting rooms.

Change one habit starting today. Before you open a window, ask once: "What role am I assigning?" Two or more? Split the window. Every split gets a one-page handoff. Three years from now, when "Claude" is no longer Claude, this habit still works. The tech changes. The principle doesn't.

One role, one window, one handoff.

Edit Section