Future VIP 2026-05-13

The OS Now Speaks Your Language

Tapping app icons was a temporary era. We are moving into a world where you say what you want, and the OS assembles the apps to get it done. Today we unpack that shift through Rabbit R1.

How many apps do you have on your phone right now? If you count them, it's probably 50, 100, maybe over 200 for some of you. Out of those, how many have you opened in the last week? Probably fewer than 10. The rest sit somewhere on your screen, and when you finally need one, you type its name into the search bar. Why did it end up like this? The answer is simple. The operating system can't understand human language.

This essay explains why that wall is collapsing. Even if you've never touched a Rabbit R1, you can follow along — we'll go slowly. Today's example is this small $199 device, but the principle applies equally to every personal computer, pair of glasses, and car dashboard yet to come. Five years from now, even if the company Rabbit disappears, the spine of this essay still holds.

App icons were temporary

Let's start with the principle. Break down what you do to get something done on a phone. "I want pizza" → find the delivery app → tap the icon → type the restaurant name → pick a menu → confirm address → pay.

Six steps. But what you actually wanted was one thing — "order pizza." Five of those steps are overhead created because the OS couldn't understand you.

This overhead isn't new. In the early days of computing, you typed commands. Then the mouse arrived, and you clicked icons. Then smartphones arrived, and you tapped with your fingers. Interfaces got friendlier, yes. But the structure, in which the user hunts for an app and drives it, stayed exactly the same. Humans adjusted to the computer's language. Why? Because the machine couldn't understand ours.

Example — Rabbit R1's LAM

To make the principle concrete, let's look at Rabbit R1, a pocket-sized AI computer unveiled in January 2024. It costs $199 and is roughly half the size of a phone. It has a camera on top, a push-to-talk button on the side, and one analog scroll wheel. That's it. There's a screen, but almost no app icons.

The way you use it is simple. Press the button and speak. "Order me a pizza." Then the LAM (Large Action Model) inside goes to work. Here's what that means — existing LLMs are specialized for understanding words and answering in sentences. LAM takes one more step. It's a model that understands you, then operates existing apps on your behalf to finish the job. It opens the delivery app, picks a menu, and hits pay — all for you.

How is this possible? Rabbit's explanation is compelling. It's like a self-driving car. A self-driving car watches roads with a camera and learns driving movements. Rabbit's LAM watches people tapping and scrolling on screens and learns those movements. Demonstrate booking a room on Airbnb once, and from then on, "book me 3 nights in Jeju next month" is enough.
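One way to picture "demonstrate once, then just ask" is a recorded sequence of screen actions with blanks that later requests fill in. The sketch below is only a toy under that assumption: the `UiStep` type, the element names, and the `replay` function are all hypothetical, and a real action model would presumably generalize from demonstrations rather than replay a fixed script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UiStep:
    action: str      # "tap" or "type"
    target: str      # hypothetical name of an on-screen element
    value: str = ""  # text template; {slots} are filled in at replay time


# Hypothetical recording of one room-booking demonstration.
DEMO = [
    UiStep("tap", "search_box"),
    UiStep("type", "search_box", "{place}"),
    UiStep("type", "nights_field", "{nights}"),
    UiStep("tap", "reserve_button"),
]


def replay(demo, slots):
    """Fill the recorded steps with new slot values and return a readable trace."""
    trace = []
    for step in demo:
        value = step.value.format(**slots)
        suffix = f" <- {value}" if value else ""
        trace.append(f"{step.action} {step.target}{suffix}")
    return trace


# "Book me 3 nights in Jeju" reduced to the two slots the demo needs.
trace = replay(DEMO, {"place": "Jeju", "nights": 3})
for line in trace:
    print(line)
```

The point of the sketch is the split between what was shown once (the steps) and what changes per request (the slots).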

A natural question follows: "Isn't this just Siri? How is it different from voice recognition?" The answer lies in how deep the integration goes. Siri can talk only to a few predefined apps. LAM's goal is to operate every app on your behalf. If Siri is a "voice shortcut," LAM is "language becoming the OS itself." Whether it's Rabbit or another company, this three-layer structure of intention, interface, and interaction will repeat: you state an intention, language serves as the interface, and the model carries out the interaction with the apps. The principle demands it.

Analogy — the company assistant

To picture this easily, think of a company. Once you reach a certain level, you get an assistant. You say "book my business trip" and the assistant opens several systems, reserves, pays, updates the calendar, and organizes receipts. You don't have to know the middle steps. You only say the desired outcome.

That's exactly what Rabbit R1 does. The OS becomes the assistant. The apps still exist, but you never tap them. You say what you want, and the OS runs the required apps in order.

How much does it shrink — in numbers

Let's check concretely. I compared three everyday tasks.

Task	Old way	Natural-language OS	Steps saved
Carrier discount	Find app → open → pull up barcode → scan	"Show my discount barcode"	4 → 1
Restaurant booking	App → search → pick time → headcount → pay	"Book that place Friday 7pm for 2"	5 → 1
Hail a taxi	App → destination → vehicle → pay	"Take me home now"	4 → 1

Across these three tasks, 4 to 5 steps collapse into 1. As of 2025, a single Rabbit device may not perform perfectly. But this structure is close to the correct answer. Apple is moving in the same direction with Vision Pro and spatial computing: once a device sits on your head, mouse and keyboard stop being the natural way in, and natural language becomes the obvious answer.
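The step counts in the table above can be tallied directly. The snippet below is a small check, assuming each arrow-separated item in the "Old way" column counts as one step and each spoken request counts as one; the variable names are illustrative.

```python
from statistics import mean

# "Old way" flows copied from the table; each arrow-separated item is one step.
OLD_WAY = {
    "Carrier discount": "Find app → open → pull up barcode → scan",
    "Restaurant booking": "App → search → pick time → headcount → pay",
    "Hail a taxi": "App → destination → vehicle → pay",
}


def count_steps(flow):
    """Count the arrow-separated steps in one flow description."""
    return len([step for step in flow.split("→") if step.strip()])


counts = {task: count_steps(flow) for task, flow in OLD_WAY.items()}
average_old = mean(list(counts.values()))

for task, steps in counts.items():
    print(f"{task}: {steps} -> 1")
print(f"average old-way steps: {average_old:.2f}")
```

For these three tasks the average comes to about 4.3 old-way steps against a single spoken request.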

Here comes the first aha moment.

Apps don't disappear. The labor of opening them disappears.

That's the core. The existing app ecosystem isn't collapsing. Instead, a new layer is forming above it. That layer is called the "natural-language OS." It takes what you want and operates the necessary apps on your behalf.

How do we split the work — one question

To get a feel for where this matters, ask one question — "How many apps do I have to open to finish this task?" If the answer is 3 or more, that zone belongs to the natural-language OS. Planning a trip, sorting receipts, coordinating family schedules, writing a weekly report. All of them require several apps. These tasks get automated first.
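The one-question test is simple enough to write down as a rule. The sketch below assumes a task can be described by the list of apps it touches; the app lists come from examples in this essay, while the function and constant names are made up for illustration.

```python
APP_THRESHOLD = 3  # the essay's rule of thumb: three or more apps

# Apps each task touches, taken from the examples in this essay.
TASKS = {
    "Friday night date": ["maps", "reservation", "flowers", "taxi"],
    "Restaurant booking": ["reservation"],
    "Hail a taxi": ["taxi"],
}


def belongs_to_nl_os(apps, threshold=APP_THRESHOLD):
    """Return True when a task needs at least `threshold` distinct apps."""
    return len(set(apps)) >= threshold


candidates = [task for task, apps in TASKS.items() if belongs_to_nl_os(apps)]
print(candidates)
```

Only the multi-app task passes the test; the single-app tasks still get shorter, but they are not where the biggest savings are.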

Real example — Friday night date

Let's try it concretely. Task: prepare for a Friday night date.

Old way — open 4 apps

Search restaurant on maps → book on a reservation app → order a bouquet on a flower app → reserve a taxi home. At least 20 minutes, 4 apps, 3 payments.

New way — one sentence

"Book that restaurant Friday 7pm for 2, deliver a rose bouquet to my office at 6pm, and call a taxi home at 11pm."

The OS handles it in about a minute. The apps all run in the background, out of sight. In terms of the assistant analogy, you speak once and walk off for coffee. The actual work happens at the assistant's desk; you only confirm the outcome.
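The one-sentence request above can be pictured as a short plan that the OS dispatches to apps in order. The sketch below assumes the sentence has already been turned into structured steps (the hard part, which is not shown); the handler functions, the `HANDLERS` table, and the `PLAN` format are hypothetical stand-ins, not any real device's API.

```python
from typing import Callable


# Hypothetical stand-ins for the apps the OS would drive in the background.
def book_table(place: str, time: str, people: int) -> str:
    return f"reservation app: table for {people} at {place}, {time}"


def deliver_flowers(item: str, to: str, time: str) -> str:
    return f"flower app: {item} to {to}, {time}"


def call_taxi(to: str, time: str) -> str:
    return f"taxi app: ride to {to}, {time}"


HANDLERS: dict[str, Callable[..., str]] = {
    "book_table": book_table,
    "deliver_flowers": deliver_flowers,
    "call_taxi": call_taxi,
}

# Hand-written plan; assumed to be what the language layer would extract.
PLAN = [
    ("book_table", {"place": "that restaurant", "time": "Friday 7pm", "people": 2}),
    ("deliver_flowers", {"item": "rose bouquet", "to": "my office", "time": "6pm"}),
    ("call_taxi", {"to": "home", "time": "11pm"}),
]


def run_plan(plan):
    """Run each step in order and collect one confirmation line per app."""
    return [HANDLERS[intent](**args) for intent, args in plan]


confirmations = run_plan(PLAN)
for line in confirmations:
    print(line)
```

The user sees only the confirmation lines, which matches the assistant analogy: the work happens elsewhere and only the outcome comes back.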

Summary

Let's sum up what you learned today.

The era of tapping app icons was temporary. It was a short-lived structure built because machines couldn't understand us. Now the OS is starting to understand natural language. Today I used Rabbit R1, but the same principle will apply to every pair of AR glasses, spatial computer, car dashboard, and smart speaker yet to come. Specific names will change; the natural-language OS structure will repeat.

Make one question second nature: "How many apps do I have to open to finish this task?" That single question shows which parts of your daily life will be eaten by the natural-language OS. If the answer is three or more, that is where the manual work will vanish first.

Don't spend time arranging app icons beautifully. Five years from now the home screen will look different altogether. It matters more to practice saying what you want in one sentence: the shorter the request, the shorter the execution, and the clearer the request, the clearer the result. Tools change. Principles don't.

Speak it. OS runs it. Done.
