Tapping app icons was a temporary era. We are moving into a world where you say what you want, and the OS assembles the apps to get it done. Today we unpack that shift through Rabbit R1.
How many apps do you have on your phone right now? If you count them, it's probably 50, 100, maybe over 200 for some of you. Out of those, how many have you opened in the last week? Probably fewer than 10. The rest sit somewhere on your screen, and when you finally need one, you type its name into the search bar. Why did it end up like this? The answer is simple. The operating system can't understand human language.
This essay explains why that wall is collapsing. Even if you've never touched a Rabbit R1, you can follow along — we'll go slowly. Today's example is this small $199 device, but the principle applies equally to every personal computer, pair of glasses, and car dashboard yet to come. Five years from now, even if the company Rabbit disappears, the spine of this essay still holds.
Let's start with the principle. Break down what you do to get something done on a phone. "I want pizza" → find the delivery app → tap the icon → type the restaurant name → pick a menu → confirm address → pay.
Six steps. But what you actually wanted was one thing — "order pizza." Five of those steps are overhead created because the OS couldn't understand you.
This overhead isn't new. In the early days of computing, you typed commands. The mouse came, and you clicked icons. Smartphones came, and you tapped with your fingers. Interfaces got friendlier, yes. But the structure, where the user hunts for an app and drives it, stayed exactly the same. Humans adjusted to the computer's language. Why? Because the machine couldn't understand ours.
To make the principle concrete, let's look at the Rabbit R1. It's a pocket-sized AI computer unveiled in January 2024. Price $199, size roughly half a phone. Camera on top, push-to-talk button on the side, one analog scroll wheel. That's it. There's a screen, but almost no app icons.
The way you use it is simple. Press the button and speak. "Order me a pizza." Then the LAM (Large Action Model) inside goes to work. Here's what that means — existing LLMs are specialized for understanding words and answering in sentences. LAM takes one more step. It's a model that understands you, then operates existing apps on your behalf to finish the job. It opens the delivery app, picks a menu, and hits pay — all for you.
How is this possible? Rabbit's explanation is compelling. It's like a self-driving car. A self-driving car watches roads with a camera and learns driving movements. Rabbit's LAM watches people tapping and scrolling on screens and learns those movements. Demonstrate booking a room on Airbnb once, and from then on, "book me 3 nights in Jeju next month" is enough.
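To make "demonstrate once, then just ask" tangible, here is a minimal sketch in Python. It is not how Rabbit's LAM works internally; a real model generalizes from what it watched, while this toy only fills placeholders in a recorded script. The step names (`destination_field`, `reserve_button`, and so on) and the `replay` helper are hypothetical, made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UiStep:
    action: str      # "open", "type", or "tap"
    target: str      # hypothetical name of an app or UI element
    value: str = ""  # text to enter; may contain {slot} placeholders


# One recorded demonstration of booking a room (illustrative steps only).
BOOKING_DEMO = [
    UiStep("open", "Airbnb"),
    UiStep("type", "destination_field", "{place}"),
    UiStep("type", "dates_field", "{month}, {nights} nights"),
    UiStep("tap", "search_button"),
    UiStep("tap", "first_result"),
    UiStep("tap", "reserve_button"),
]


def replay(demo, **slots):
    """Return a copy of the demo with its placeholders filled in."""
    return [UiStep(s.action, s.target, s.value.format(**slots)) for s in demo]


# "Book me 3 nights in Jeju next month" reduced to three slot values.
steps = replay(BOOKING_DEMO, place="Jeju", month="next month", nights=3)
for s in steps:
    print(s.action, s.target, s.value)
```

The point of the sketch is the shape: the demonstration is stored once, and every later request only supplies the parts that change.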
A natural question follows: "Isn't this just Siri? How is it different from voice recognition?" The answer lies in structural depth. Siri can only talk to a limited set of apps and actions that were wired up in advance. LAM's goal is to operate any app on your behalf, through the same screens you would use yourself. If Siri is a "voice shortcut," LAM is "language becoming the OS itself." Whether it's Rabbit or another company, the same 3-layer structure will repeat: intention (what you want), interface (the app screens that already exist), and interaction (the taps and scrolls performed for you). The principle demands it.
To picture this easily, think of a company. Once you reach a certain level, you get an assistant. You say "book my business trip" and the assistant opens several systems, reserves, pays, updates the calendar, and organizes receipts. You don't have to know the middle steps. You only say the desired outcome.
That's exactly what Rabbit R1 does. The OS becomes the assistant. The apps still exist, but you never tap them. You say what you want, and the OS runs the required apps in order.
Let's check concretely. I compared three everyday tasks.
| Task | Old way | Natural-language OS | Steps saved |
|---|---|---|---|
| Carrier discount | Find app → open → pull up barcode → scan | "Show my discount barcode" | 4 → 1 |
| Restaurant booking | App → search → pick time → headcount → pay | "Book that place Friday 7pm for 2" | 5 → 1 |
| Hail a taxi | App → destination → vehicle → pay | "Take me home now" | 4 → 1 |
On average, 4-5 steps collapse into 1. As of 2025, one Rabbit device may not be perfect. But this structure is close to the correct answer. Apple is moving in the same direction with Vision Pro and spatial computing: once a device sits on your head, a mouse and keyboard stop being the default. Eyes, hands, and voice take over, and for anything more complex than pointing, natural language is the obvious way to say what you want.
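If you want to check the arithmetic, the step counts from the table can be tallied in a few lines of Python. This is only a sketch that counts the arrows in the "Old way" column; the step labels are copied from the table, and treating the spoken request as exactly 1 step is the essay's own simplification.

```python
# Old-way steps for each task, copied from the comparison table.
OLD_WAY = {
    "Carrier discount": ["find app", "open", "pull up barcode", "scan"],
    "Restaurant booking": ["app", "search", "pick time", "headcount", "pay"],
    "Hail a taxi": ["app", "destination", "vehicle", "pay"],
}

NEW_WAY_STEPS = 1  # one spoken sentence

old_counts = {task: len(steps) for task, steps in OLD_WAY.items()}
saved = {task: count - NEW_WAY_STEPS for task, count in old_counts.items()}
average_old = sum(old_counts.values()) / len(old_counts)

for task, count in old_counts.items():
    print(f"{task}: {count} -> {NEW_WAY_STEPS} (saves {saved[task]})")
print(f"Average old-way steps: {average_old:.2f}")
```

The average comes out a little above 4, which is where the "4-5 steps collapse into 1" summary comes from.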
Here comes the first aha moment.
Apps don't disappear. The labor of opening them disappears.
That's the core. The existing app ecosystem isn't collapsing. Instead, a new layer is forming above it. That layer is called the "natural-language OS." It takes what you want and operates the necessary apps on your behalf.
To get a feel for where this matters, ask one question — "How many apps do I have to open to finish this task?" If the answer is 3 or more, that zone belongs to the natural-language OS. Planning a trip, sorting receipts, coordinating family schedules, writing a weekly report. All of them require several apps. These tasks get automated first.
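The "3 or more apps" question is simple enough to write down as a rule of thumb. The sketch below does that in Python. The function name `belongs_to_nl_os` is hypothetical, and the app lists attached to each task are illustrative guesses, not measurements; swap in your own.

```python
def belongs_to_nl_os(apps_needed, threshold=3):
    """Apply the rule of thumb: a task needing 3 or more distinct apps qualifies."""
    return len(set(apps_needed)) >= threshold


# Illustrative guesses at which apps each task touches.
TASKS = {
    "Plan a trip": ["maps", "flights", "hotels", "calendar"],
    "Sort receipts": ["camera", "email", "spreadsheet"],
    "Coordinate family schedules": ["messenger", "calendar", "maps"],
    "Write a weekly report": ["email", "calendar", "documents"],
    "Check the weather": ["weather"],
}

for name, apps in TASKS.items():
    if belongs_to_nl_os(apps):
        verdict = "natural-language OS zone"
    else:
        verdict = "fine as a single app"
    print(f"{name}: {len(set(apps))} apps -> {verdict}")
```

Counting distinct apps rather than total steps is a deliberate choice here: opening the same app twice is friction, but hopping between different apps is where the overhead piles up.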
Let's try it concretely. Task: prepare for a Friday night date.
The old way: search for the restaurant on maps → book on a reservation app → order a bouquet on a flower app → reserve a taxi home. At least 20 minutes, 4 apps, 3 payments.
The natural-language OS way: "Book that restaurant Friday 7pm for 2, deliver a rose bouquet to my office at 6pm, and call a taxi home at 11pm."
The OS handles it. 1 minute. The apps all run in the background, out of sight. Back to the assistant analogy: you speak once and walk off for coffee. The actual work happens at the assistant's desk; you only confirm the outcome.
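Here is a rough Python sketch of what "the OS handles it" could look like structurally. Everything in it is an assumption for illustration: the `Task` fields, the app names, and the stub handlers are hypothetical, and the plan is written by hand as a stand-in for what a language model would extract from the sentence. No real app or API is being called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    app: str        # which background app should do the work
    action: str     # what that app is asked to do
    when: str       # time taken from the spoken request
    details: str = ""


# Hand-written stand-in for the plan a model would extract from the sentence.
DATE_PLAN = [
    Task("reservations", "book_table", "Friday 7pm", "party of 2"),
    Task("flowers", "deliver_bouquet", "6pm", "roses, to my office"),
    Task("taxi", "schedule_ride", "11pm", "destination: home"),
]


def make_stub(app_name):
    """Build a fake app handler that only reports what it would have done."""
    def handler(task):
        return f"[{app_name}] {task.action} at {task.when} ({task.details})"
    return handler


HANDLERS = {name: make_stub(name) for name in ("reservations", "flowers", "taxi")}


def run_plan(plan, handlers):
    """Dispatch each task to its app in order and collect the confirmations."""
    return [handlers[task.app](task) for task in plan]


confirmations = run_plan(DATE_PLAN, HANDLERS)
for line in confirmations:
    print(line)
```

Notice what the user sees in this sketch: one sentence going in and a short list of confirmations coming out. The dispatching in the middle is the assistant's desk.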
Let's sum up what you learned today.
The era of tapping app icons was temporary. It was a short-lived structure built because machines couldn't understand us. Now the OS is starting to understand natural language. Today I used the Rabbit R1, but the same principle will apply to every pair of AR glasses, spatial computer, car dashboard, and smart speaker yet to come. Specific names will change; the natural-language OS structure will repeat.
Keep one question with you: "How many apps do I need to open to finish this?" That single question shows which parts of your daily life will be eaten by the natural-language OS. Three or more apps, and that zone will soon vanish.
Don't spend time arranging app icons beautifully. Five years from now the home screen will look different altogether. What's more important is practicing the habit of saying what you want in one sentence. Tools change. Principles don't.
Speak it. OS runs it. Done.