I Read All 500,000 Lines of Claude Code. It Dreams About Your Codebase While You Sleep.
What half a million lines of bundled TypeScript reveal about where AI coding agents are actually going
I go into quite a bit of depth, so buckle up; if you’re into Claude Code and AI, you won’t want to miss a single detail of this one!
Speculative execution, multi-agent swarms, a YOLO classifier, and the autonomous AI system Anthropic hasn’t shipped yet.
Anthropic ships Claude Code as an npm package, @anthropic-ai/claude-code. You install it, you run it, it’s on your machine. And like any npm package, it’s just JavaScript sitting in your node_modules folder, waiting to be read. Someone unpacked the bundled TypeScript source, and what fell out isn’t just a CLI tool. It’s the skeleton of an autonomous, self-improving AI system, most of which is sitting behind feature flags on your machine right now.
This is the same kind of analysis people have done with VS Code, Slack, Discord, basically any Electron or Node-based tool that ships as a bundle. You download publicly distributed software, you look at what’s inside. There’s nothing adversarial about it; you’re just reading code that’s already on your disk.
So let’s pick this apart.
The codename is Tengu
Every analytics event in Claude Code is prefixed with tengu_. Every feature flag starts with tengu_. The internal codename for this project is Tengu, a creature from Japanese folklore known for being skilled and mischievous. Kind of fitting for an AI coding agent.
This isn’t just a cute naming choice; it tells you something about how Anthropic thinks about the product internally. Tengu is the identity of this thing across their infrastructure. When their engineers look at dashboards, they’re looking at tengu metrics. When they gate a feature, it’s behind a tengu flag. It’s the real name.
Claude Code literally dreams
This is the wildest one. Every 24 hours, if 5 or more sessions have accumulated since the last consolidation, Claude Code spawns a background forked agent that reviews your past sessions and consolidates what it learned. The code calls this “dreaming.”
It has a triple gate system (time gate, session gate, filesystem lock) and runs a 4-stage consolidation prompt: orient, gather, consolidate, prune. If the dream fails, the lock file rewinds so the next session retries. The dream agent’s edits are visible in a background tasks dialog (Shift+Down).
This is combined with an EXTRACT_MEMORIES system that runs at the end of each query loop as a fire-and-forget forked agent, pulling durable insights out of your conversation and writing them to ~/.claude/projects/<path>/memory/. It pre-injects a manifest of existing memories so the agent doesn’t rediscover what it already knows. The memory extraction agent is restricted to read-only bash, file reading, grep, glob, and can only write within the memory directory.
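The triple gate that decides whether a dream runs can be sketched as a simple predicate. This is an illustration, not Anthropic's code: the `DreamGateState` shape and `shouldDream` name are mine; only the 24-hour window, the five-session threshold, and the filesystem lock come from the source.

```typescript
// Hypothetical sketch of the dream "triple gate": time, sessions, lock.
interface DreamGateState {
  lastConsolidationMs: number; // timestamp of the last successful dream
  sessionsSinceLast: number;   // sessions accumulated since then
  lockHeld: boolean;           // is the filesystem lock owned by another process?
}

const DAY_MS = 24 * 60 * 60 * 1000;
const MIN_SESSIONS = 5;

function shouldDream(state: DreamGateState, nowMs: number): boolean {
  const timeGateOpen = nowMs - state.lastConsolidationMs >= DAY_MS;
  const sessionGateOpen = state.sessionsSinceLast >= MIN_SESSIONS;
  const lockFree = !state.lockHeld;
  return timeGateOpen && sessionGateOpen && lockFree;
}
```

The rewind-on-failure behavior would then just mean resetting `lastConsolidationMs` in the lock file so the next session retries the gate.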
The /remember skill sits on top of this as a manual override. It reviews auto-extracted memories and proposes promotions across four layers: CLAUDE.md (project conventions for all contributors), CLAUDE.local.md (personal instructions for this user), team memory (org-wide knowledge), or keeping them in auto-memory as working notes. It identifies duplicates, outdated entries, and conflicts between layers, noting which is more recent. It presents all proposals before making changes and won’t modify files without explicit approval.
This is a genuine learning loop. Sessions produce memories, dreams consolidate memories, consolidated memories inform future sessions. The system improves from its own experience across sessions without any human feedback loop. That’s machine learning in the most classical sense, just happening at the application layer instead of the weights.
Speculative execution, or branch prediction for AI coding
Claude Code can pre-execute your next likely action while you’re still typing. After each response, a prompt suggestion runs speculatively in the background using a copy-on-write overlay filesystem (temp directory under ~/.claude/temp/speculation/<pid>/<id>). It stops at safety boundaries like file writes (unless in acceptEdits mode), non-read-only bash, or network tools.
When speculation completes, it automatically pipelines the next suggestion. If you accept, the clean messages are injected into your conversation. It tracks timeSavedMs per speculation and caps at 20 turns or 100 messages. The speculation uses identical cache parameters to the main conversation to hit prompt cache.
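The safety boundaries can be thought of as a per-tool-call classifier that decides whether speculation may proceed. A minimal sketch, assuming simplified tool names and a `ToolCall` shape of my own invention; the boundary rules (writes only in acceptEdits mode, read-only bash only, no network tools) are from the source:

```typescript
// Illustrative speculation boundary check, not the real implementation.
interface ToolCall {
  name: string;
  readOnly?: boolean; // for bash: does the command only read?
}

function canSpeculate(call: ToolCall, acceptEdits: boolean): boolean {
  if (call.name === "FileWrite") return acceptEdits;       // writes allowed only in acceptEdits mode
  if (call.name === "Bash") return call.readOnly === true; // non-read-only bash stops speculation
  if (call.name === "WebFetch") return false;              // network tools are a hard boundary
  return true; // reads, greps, globs are safe to pre-execute
}
```

Anything that fails the check simply halts the speculative branch; the overlay filesystem means nothing it did needs to be undone.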
This is branch prediction for AI coding. The system is guessing what you’ll ask next and pre-computing the answer so it’s ready the instant you confirm. CPU architects have been doing this for decades; Anthropic is doing it at the agent layer.
Skillify lets Claude Code program itself
The /skillify command captures what you just did in a session and turns it into a reusable skill. It runs a 4-round interview: first it analyzes the session to identify what was performed, what tools were used, and where you corrected the AI. Second round names the skill, determines if it should run inline or as a forked subagent, and asks where to save it (repo-specific .claude/skills/ or personal ~/.claude/skills/). Third round breaks each step into success criteria, execution type, artifacts, human checkpoints, and rules. Fourth round confirms trigger phrases and gotchas.
The generated SKILL.md includes a when_to_use field that enables auto-invocation, so the skill fires automatically when trigger phrases appear in future prompts. Skills can create other skills. This is an AI that watches what it did, extracts the pattern, packages it, and uses it next time without being told.
Magic Docs maintain themselves
Files with a # MAGIC DOC: [title] header are automatically kept in sync with the codebase. When Claude Code reads a magic doc, it registers the file for background updates. After each assistant response, if no tool calls are active, a forked agent spawns to update all tracked magic docs using instructions from italicized text after the header (e.g., *Keep this focused on API changes*).
The agent can only use the file edit tool on that specific file. If you delete the file or remove the header, it auto-unregisters. Documentation that maintains itself. Every engineering team I’ve ever worked on has had docs that rot within weeks of being written. This is a direct answer to that problem.
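Detecting whether a file is a magic doc comes down to parsing the header and the italicized instruction line. A sketch under stated assumptions: the header format and the italicized-instructions convention are from the article, but the regexes and `parseMagicDoc` are illustrative.

```typescript
// Hypothetical magic-doc parser; returns null for untracked files.
const MAGIC_HEADER = /^# MAGIC DOC: (.+)$/m;
const INSTRUCTIONS = /^\*(.+)\*$/m; // italicized line after the header

function parseMagicDoc(content: string): { title: string; instructions?: string } | null {
  const header = MAGIC_HEADER.exec(content);
  if (!header) return null; // no header: file is not (or no longer) registered
  const instr = INSTRUCTIONS.exec(content);
  return { title: header[1], instructions: instr?.[1] };
}
```

Auto-unregistration falls out naturally: re-parse on each pass, and a `null` result drops the file from the tracked set.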
Auto-mode is literally called “YOLO”
The permission system that lets Claude Code auto-approve tool calls without asking you every time? It’s internally called the “YOLO Classifier.” I’m not making this up: the file is yoloClassifier.ts, and the dangerous pattern detection system is explicitly labeled “YOLO mode prevention.”
It works in three layers. First, a dangerous pattern detector strips overly broad permission rules like Bash(*) or python:* before the classifier even sees them. If you try to give Claude Code blanket permission to run any bash command, the system catches it and removes it. The LLM classifier then makes instant allow/deny decisions on whatever’s left.
The Anthropic-internal builds have extra dangerous patterns beyond what external users see. Things like kubectl, aws, gh api, curl, git, and coo, which appears to be their internal cluster launcher. So even Anthropic’s own engineers have guardrails preventing Claude Code from running wild on their infrastructure. The tool that builds AI has its own AI safety layer, which is, indeed, exactly what you’d want.
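The first layer, stripping overly broad rules before the classifier sees them, can be sketched as a filter. The two example rules (`Bash(*)`, `python:*`) are from the article; the regexes and function are my own illustration and won't match the real pattern set.

```typescript
// Illustrative pre-classifier filter for dangerously broad permission rules.
const DANGEROUS_RULES: RegExp[] = [
  /^Bash\(\*\)$/, // blanket permission to run any bash command
  /^[a-z]+:\*$/,  // wildcard interpreter rules like python:*
];

function stripDangerousRules(rules: string[]): string[] {
  return rules.filter((rule) => !DANGEROUS_RULES.some((p) => p.test(rule)));
}
```

Narrow rules like `Bash(ls *)` pass through untouched; only the blanket grants get removed before the LLM classifier runs.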
The verification agent is adversarial by design
There’s a built-in verification agent whose job description literally says “your job is NOT to confirm it works, it’s to try to break it.” It refuses to accept reading code and narrating tests as verification. Every single PASS step must have command output, and if a re-run doesn’t match, the report is rejected.
It has type-specific attack strategies: for frontend it launches dev servers and uses browser automation, for APIs it curls endpoints and tests error handling, for database migrations it runs up then down to test reversibility, for refactors it demands existing tests pass unchanged. It runs adversarial probes: concurrent create-if-not-exists to find duplicates, boundary values (0, -1, empty, MAX_INT, unicode), idempotency checks (same mutating request twice), orphan operations (reference non-existent IDs).
Final output must be VERDICT: PASS, VERDICT: FAIL, or VERDICT: PARTIAL. This is AI that’s been explicitly instructed to distrust another AI’s work.
Fork children get yelled at in XML
When Claude Code spawns a forked worker process, the child receives a directive wrapped in an XML tag that literally reads STOP. READ THIS FIRST. The directive tells the fork: “You are a forked worker process. You are NOT the main agent.” It explicitly forbids the child from spawning sub-agents, asking questions, or suggesting next steps. Output has to follow a structured format: Scope, Result, Key files, Files changed, Issues.
The engineering behind this is smart. All fork children are built to produce byte-identical API request prefixes up to the final directive, which maximizes prompt cache hits across parallel workers. The placeholder result text is the same for every child: “Fork started, processing in background.” Only the actual task directive differs. Anthropic pays for the prompt cache once and all parallel workers benefit from it.
The multi-agent swarm runs in your terminal
The swarm system has backends for tmux and iTerm2 as pane-based executors. Each agent runs in its own terminal pane, coordinated by the orchestrator. The coordinator gets a 370+ line system prompt teaching it how to decompose work, spawn workers, synthesize results, and do verification rather than just delegating and trusting.
The coordinator prompt contains a genuinely interesting constraint about multi-agent orchestration. Workers can’t see the coordinator’s conversation with the user, so every prompt must be self-contained. The prompt explicitly states: “you must understand findings before directing follow-up; never write ‘based on your findings.’” The coordinator must synthesize worker outputs into its own understanding before issuing new instructions. It can’t be a dumb router, it has to actually comprehend what workers discovered. The prompt teaches “choose continue vs. spawn”: continue a worker if its context overlaps with the next task, spawn fresh if the task is narrow, and always use fresh eyes for verification to avoid anchoring on failed approaches.
Workers communicate back via XML-wrapped <task-notification> messages. The coordinator can spawn them, stop them, or continue conversations with them via a SendMessage tool. The SendMessage tool supports multiple address schemes: teammate name for local communication, * for broadcast to all teammates, uds:<socket-path> for Unix Domain Socket to local peer processes, and bridge:<session-id> for cross-machine communication via Anthropic’s servers. Messages persist to ~/.claude/projects/<path>/.team/<teammate>/mailbox/. Cross-machine sends require explicit user consent.
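The address schemes listed above suggest a straightforward dispatch on prefix. A hypothetical parser, where the wire formats (`*`, `uds:`, `bridge:`, bare teammate name) come from the source but the `Address` union and `parseAddress` are mine:

```typescript
// Illustrative parser for SendMessage address schemes.
type Address =
  | { kind: "broadcast" }
  | { kind: "uds"; socketPath: string }     // Unix Domain Socket to a local peer
  | { kind: "bridge"; sessionId: string }   // cross-machine via Anthropic's servers
  | { kind: "teammate"; name: string };     // local teammate by name

function parseAddress(raw: string): Address {
  if (raw === "*") return { kind: "broadcast" };
  if (raw.startsWith("uds:")) return { kind: "uds", socketPath: raw.slice(4) };
  if (raw.startsWith("bridge:")) return { kind: "bridge", sessionId: raw.slice(7) };
  return { kind: "teammate", name: raw };
}
```

The consent requirement would hang off the `bridge` arm: only that scheme leaves the machine, so only that scheme needs explicit user approval.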
Claude Code controls how hard the model thinks
There’s an effort system with four levels: low, medium, high, max. Only Opus 4.6 supports max effort. The mapping from numeric values: 0-50 is low, 51-85 is medium, 86-100 is high, 101+ is max. Default for Pro subscribers on Opus 4.6 is medium. Max effort doesn’t persist to settings.json for non-Anthropic users, it’s session-only.
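The numeric-to-level mapping above is simple enough to write down directly. The thresholds are from the source; the `effortLevel` function is just a sketch of the mapping, not the actual code:

```typescript
// The described effort mapping: 0-50 low, 51-85 medium, 86-100 high, 101+ max.
type Effort = "low" | "medium" | "high" | "max";

function effortLevel(value: number): Effort {
  if (value <= 50) return "low";
  if (value <= 85) return "medium";
  if (value <= 100) return "high";
  return "max"; // 101+; only Opus 4.6 supports this level
}
```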
There’s also a separate “ultrathink” keyword system where typing the word ultrathink in your prompt triggers enhanced reasoning, and the UI highlights it with rainbow color cycling via getRainbowColor(charIndex % RAINBOW_COLORS.length). The thinking system itself supports three modes: adaptive (no budget, uses native thinking), enabled with explicit token budget, and disabled. Budget is hard-capped at max_tokens - 1 per API constraint.
Quick reads get low effort, architectural decisions get max effort with ultrathink. This granularity means AI assistants can be always-on without being always-expensive, running in low-effort background mode until something important happens, then ramping up.
Prompt cache preservation is the architectural foundation
Everything in Claude Code is built around prompt cache optimization. There’s a CacheSafeParams type that carries the exact parameters needed to guarantee cache hits in forked agents: system prompt, user context, system context, tool use context, and fork context messages. If any of these change, the cache misses. A code comment warns: “Thinking config is part of the cache key.”
The entire forked agent system, dreams, memory extraction, magic docs, speculation, all of it is designed so that parallel workers share the parent’s prompt cache prefix. This is why fork children produce byte-identical API request prefixes. The prompt cache break detection system (promptCacheBreakDetection.ts) monitors cache effectiveness in real-time, flags drops greater than 5%, performs root cause analysis (model change? tool schema change? system prompt change?), and writes unified diffs to ~/.claude/temp/cache-break-*.diff for debugging.
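The two mechanisms described here, flagging hit-rate drops over 5% and diffing request prefixes for root-cause analysis, can be sketched as follows. The 5% threshold is from the source; both function names and shapes are illustrative:

```typescript
// Illustrative cache-break detection: flag drops greater than 5%.
const DROP_THRESHOLD = 0.05;

function cacheBreakDetected(baselineHitRate: number, currentHitRate: number): boolean {
  return baselineHitRate - currentHitRate > DROP_THRESHOLD;
}

// Find where two serialized request prefixes diverge, for diff output.
function firstDivergence(a: string, b: string): number {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return i;
  }
  return a.length === b.length ? -1 : n; // -1 means byte-identical
}
```

Once you know the first divergent byte, classifying the break (model change, tool schema change, system prompt change) is a matter of mapping that offset back to the request section it falls in.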
They’re treating prompt cache hit rate as a primary engineering metric. When you’re running parallel agents that each consume millions of tokens, the difference between 90% and 50% cache hit rates is the difference between viable and bankrupt. This is the economic constraint shaping how AI agents are actually built.
The unreleased features behind flags
There are dozens of feature flags pointing to capabilities that aren’t publicly available yet.
ULTRAPLAN is a multi-agent planning system that runs via Claude Code Remote (CCR) on Anthropic’s web infrastructure. It has a 30-minute timeout, uses Opus 4.6 by default, and supports a “teleport” flow where the plan comes back to your local CLI for execution. Plan mode itself is being A/B tested with four experimental arms: Control (full context, recommended approach, critical file paths, verification steps), Trim (reduced prose), Cut (further reduced), and Cap (40-line hard limit). They’re scientifically testing how much planning detail actually helps versus hurts.
PROACTIVE is an autonomous agent mode where Claude can take initiative without waiting for prompts. KAIROS is a whole family of flags (KAIROS_BRIEF, KAIROS_CHANNELS, KAIROS_GITHUB_WEBHOOKS) pointing to a persistent assistant mode with channel support and GitHub webhook integration. This is Claude Code evolving from a CLI tool into something more like a persistent team member that watches your repos.
Teleport handles session transfer between environments. Grove mode is policy-based interaction. There’s a Backseat observer mode with a coaching classifier. And features called Undercover mode and Scratch mode that are referenced but whose purposes aren’t clear from the code alone.
The obfuscated flag names
Anthropic uses a naming convention for their feature flags that’s clearly designed to be opaque to outsiders. Instead of descriptive names like enable_multi_agent or new_voice_model, they use nature and object themed codenames. Things like amber_flint, slate_heron, copper_bridge, coral_fern, timber_lark, surreal_dali, birch_trellis, bramble_lintel.
Smart practice. If your feature flags are in client-side code, which they are via GrowthBook SDK, giving them descriptive names would basically publish your product roadmap. You can see tengu_cobalt_frost gates something, and from context clues it appears to be a Deepgram Nova 3 voice model, but that mapping requires reading the surrounding code carefully.
Token budgets reveal how the system thinks
The source code exposes the exact resource limits Claude Code operates within, and these numbers tell you a lot about the engineering tradeoffs.
Each tool result gets capped at 50,000 characters. Max tool result in tokens is 100,000, roughly 400KB. When you run multiple tools in parallel, the aggregate caps at 200,000 characters per message. The default max output tokens is 8,000, and the code has a comment explaining why: it’s a “slot-reservation optimization,” because their p99 output is actually 4,911 tokens. They’re reserving just enough headroom above the 99th percentile to avoid wasting inference capacity. If a response overflows that budget, the system escalates once to 64,000 tokens and retries.
These aren’t arbitrary numbers. They’re the result of production data analysis: the engineering team measured actual usage patterns and tuned accordingly.
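The escalate-once retry can be sketched as a tiny state transition. The 8,000 and 64,000 token limits are from the source; the `nextOutputBudget` helper is my own framing of the retry logic:

```typescript
// Illustrative slot-reservation escalation: one retry at a larger budget.
const DEFAULT_MAX_OUTPUT = 8_000;
const ESCALATED_MAX_OUTPUT = 64_000;

function nextOutputBudget(current: number, overflowed: boolean): number | null {
  if (!overflowed) return current;                                  // response fit: keep budget
  if (current < ESCALATED_MAX_OUTPUT) return ESCALATED_MAX_OUTPUT;  // escalate once and retry
  return null;                                                      // already escalated: give up
}
```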
Claude Code also uses cache_edit blocks in the Anthropic API to delete old tool results from the prompt cache without invalidating the cache hash. They can trim stale content (bash outputs, file reads, search results) while keeping the cache hit. The edits are “pinned” and re-sent in subsequent requests to maintain cache validity. This isn’t publicly documented in Anthropic’s API docs, it’s an internal mechanism for efficient context window management. This is how Claude Code stays fast in long sessions, surgically removing stale content without paying the re-caching cost.
The secret scanner that hides from itself
Claude Code has a built-in secret scanner that prevents credentials from leaking into shared team memory files. It uses curated gitleaks rules to detect AWS tokens (looking for the AKIA/ASIA prefix), GCP keys, OpenAI keys, GitHub PATs, Slack tokens, Stripe tokens, private key blocks, the works.
The clever part is how it handles its own key format. The Anthropic API key prefix, sk-ant-api, is assembled at runtime from separate string fragments so the literal byte sequence never appears in the published bundle. They’re hiding the detection pattern from the scanner’s own distribution. If the prefix appeared as a contiguous string in the source code, it would trigger the very scanner that’s supposed to detect it in user code.
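The fragmentation trick itself is a one-liner. A sketch of the idea, where the fragment boundaries are my own choice (the article only confirms the prefix is assembled from separate string fragments at runtime):

```typescript
// Assemble the detection prefix at runtime so the literal byte sequence
// never appears contiguously in the published bundle.
const FRAGMENTS = ["sk-", "ant-", "api"];
const ANTHROPIC_KEY_PREFIX = FRAGMENTS.join("");

function looksLikeAnthropicKey(candidate: string): boolean {
  return candidate.startsWith(ANTHROPIC_KEY_PREFIX);
}
```

Note that the same care has to extend to the tests and any logging: one contiguous occurrence anywhere in the bundle would retrigger the scanner.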
The system prompt is ~15,000 tokens of carefully tuned instructions
The main system prompt lives in constants/prompts.ts and weighs in at approximately 14,902 tokens. It’s built dynamically from modular sections: introduction, system behavior, task execution, tool usage, tone guidelines, output efficiency, plus dynamic sections for memory, language, MCP instructions, and more.
Some specific instructions: Claude Code is told to “only use emojis if the user explicitly requests it,” to never use git push --force without explicit user request, to prefer editing existing files over creating new ones, and to “go straight to the point.” The git safety section mandates new commits over amends, specific file staging over git add -A, and never skipping hooks.
The prompt cache system marks most sections as cacheable, with an explicitly scary function name for the exception: DANGEROUS_uncachedSystemPromptSection. The override hierarchy goes coordinator > agent > custom > default, with an appendSystemPrompt that always gets added at the end regardless of mode, which is how CLAUDE.md files and memory stay in context.
There are 15+ bundled skills loaded at startup and a tool result repair system (ensureToolResultPairing()) that fixes five categories of broken message state: orphaned tool results, missing tool results, orphaned result blocks, duplicate tool_use IDs, and duplicate tool_result IDs. It synthesizes error blocks with placeholder content and runs after every message normalization. The conversation protocol self-heals from corruption.
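One of the five repair categories, synthesizing a result for a tool_use block with no matching tool_result, can be sketched like this. The message shapes are heavily simplified and the function is illustrative, not the real ensureToolResultPairing():

```typescript
// Illustrative repair for the "missing tool results" category.
interface Block {
  type: "tool_use" | "tool_result";
  id: string;
  content?: string;
}

function repairMissingResults(blocks: Block[]): Block[] {
  const resultIds = new Set(
    blocks.filter((b) => b.type === "tool_result").map((b) => b.id),
  );
  const repaired: Block[] = [];
  for (const b of blocks) {
    repaired.push(b);
    if (b.type === "tool_use" && !resultIds.has(b.id)) {
      // Synthesize a placeholder error block so the protocol stays valid.
      repaired.push({ type: "tool_result", id: b.id, content: "[missing result]" });
    }
  }
  return repaired;
}
```

The other categories (orphaned results, duplicate IDs) would be analogous passes over the same block list, which is why running the whole repair after every message normalization is cheap.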
25+ lifecycle hooks for full automation
The hooks system covers essentially every lifecycle event: pre/post tool execution, permission denied, notification, user prompt submission (can block or erase the prompt), session start/end, subagent start/stop, pre/post compact, permission requests, repo setup, teammate idle, task created/completed, MCP elicitation, config changes, CLAUDE.md loading, worktree create/remove, working directory change, and file watching.
Three execution modes: shell scripts (exit code 0 passes, 2 blocks), prompt hooks (Claude responds inline), and agent hooks (full subagent with tools). Plugins can register hooks via settings. This is the extension API, and it’s comprehensive.
The Agent SDK and daemon infrastructure
The public SDK (entrypoints/agentSdkTypes.ts) exposes forkSession() which creates a new conversation branch with a fresh UUID from any point in a session. There’s watchScheduledTasks() that yields fire/missed events from .claude/scheduled_tasks.json. And connectRemoteControl() holds a persistent WebSocket bridge connection to claude.ai from daemon processes. The SDK also exposes unstable_v2_createSession() and unstable_v2_resumeSession() for persistent multi-turn conversations.
The daemon mode supports up to 50 concurrent scheduled tasks with jitter to prevent thundering herd, filesystem locks to prevent double-firing, and automatic takeover when owning sessions crash. Background housekeeping fires automatically at startup: magic docs initialization, skill improvement, memory extraction, auto-dream setup, plugin auto-updates, and deep link protocol registration. On a 10-minute delay: old message and version cleanup. On a 24-hour recurring schedule for Anthropic-internal builds: npm cache cleanup.
Claude Code isn’t just a tool you run. It’s a platform you build on.
The telemetry is extensive
Claude Code tracks over 150 distinct event types through a dual-pipeline analytics system. Events go to both Datadog and a first-party logging service simultaneously. There’s event sampling configuration, batch processing, and PII-tagged metadata columns (prefixed with _PROTO_) that route to privileged BigQuery columns.
What gets tracked is comprehensive. Every tool invocation, every permission decision, voice recording events, session compaction, memory loading, auto-mode switches, MCP server connections, OAuth token refreshes. A killswitch flag called tengu_frond_boric can shut the analytics sink down entirely. The Datadog pipeline has an explicit allowlist of about 22 event types that get forwarded there, everything else only goes to the first-party logger.
Telemetry in developer tools isn’t unusual. But the granularity here is notable.
Connector text is an anti-distillation mechanism
The connector_text block type tracks attribution and provenance data from MCP connectors. The beta header is summarize-connector-text-2026-03-13 and it’s described as an “anti-distillation POC.” Connector text blocks are stripped from signature-bearing blocks when sharing transcripts and can be summarized server-side.
This appears to be an early mechanism for preventing AI-generated content from being used to train competing models, by embedding provenance metadata that can be detected and stripped. The AI equivalent of a watermark, sitting at the protocol layer.
Model pricing, straight from the source
The code contains a complete pricing table for all Claude models, defined as cost-per-million-tokens:
Sonnet models (3.5, 3.7, 4, 4.5, 4.6): $3 input / $15 output
Opus 4 and 4.1: $15 input / $75 output
Opus 4.5 and 4.6 (standard): $5 input / $25 output
Opus 4.6 fast mode: $30 input / $150 output
Haiku 3.5: $0.80 input / $4 output
Haiku 4.5: $1 input / $5 output
When you toggle /fast in Claude Code, you’re still using the same Opus 4.6 model, but the pricing jumps to 6x the standard rate. That $30/$150 per million tokens confirms that “fast mode” is a priority queue with faster output generation, not a different model. The code explicitly checks isFastModeEnabled() to determine which cost tier to apply.
Prompt caching multipliers: cache writes cost 1.25x the input token price, cache reads are roughly 10% of input cost. Web search requests are $0.01 each across all models.
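Putting the table and the multipliers together, the per-request arithmetic looks roughly like this. The rates, the 1.25x write multiplier, the ~10% read cost, and the fast-mode tiering are from the source; the helper itself and its shape are illustrative:

```typescript
// Cost arithmetic from the pricing table, in USD per million tokens.
const OPUS_46 = { input: 5, output: 25 };
const OPUS_46_FAST = { input: 30, output: 150 }; // same model, priority queue

function requestCostUsd(
  inputTokens: number,
  outputTokens: number,
  cacheWriteTokens: number,
  cacheReadTokens: number,
  fastMode: boolean, // mirrors the described isFastModeEnabled() check
): number {
  const tier = fastMode ? OPUS_46_FAST : OPUS_46;
  const perTok = (rate: number) => rate / 1_000_000;
  return (
    inputTokens * perTok(tier.input) +
    outputTokens * perTok(tier.output) +
    cacheWriteTokens * perTok(tier.input) * 1.25 + // cache writes: 1.25x input price
    cacheReadTokens * perTok(tier.input) * 0.1     // cache reads: ~10% of input price
  );
}
```

Run the numbers and the cache economics point makes itself: a million cached-read tokens on standard Opus 4.6 costs about $0.50 instead of $5, which is exactly why the whole architecture bends around preserving cache hits.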
Rate limits and the Max tier
Five rate limit window types: five_hour, seven_day, seven_day_opus, seven_day_sonnet, and overage. The Max subscription tier gets a 20x rate limit boost, keyed to rateLimitTier === "default_claude_max_20x". Enterprise and C4E subscriptions get custom policy limits fetched via API with 1-hour background polling and ETag caching. The overage system lets subscribers spend beyond their tier limit, OAuth only.
The integration surface
Voice input connects to Anthropic’s private voice_stream WebSocket endpoint, requiring OAuth and a Claude.ai subscription. The audio capture chain tries native C++ bindings first (CoreAudio on macOS, cpal on Linux/Windows), falls back to arecord or SoX, and lazy-loads on first voice keypress to avoid a 1-8 second initialization freeze. The tengu_cobalt_frost flag gates Deepgram Nova 3 as an alternative STT provider.
There’s a Chrome extension creating bi-directional communication between the CLI and Claude.ai in your browser, exposing itself as an MCP server. IDEs connect via .claudeide.lock files, with support for VSCode, IntelliJ, PyCharm, WebStorm, GoLand, CLion, RustRover, and Fleet. Settings sync across machines via background upload/download. Mid-conversation model switching is supported with automatic fallback when Opus is under high load.
The git worktree system (600+ lines) creates isolated repo copies for safe parallel work with strict path traversal prevention and sparse checkout for large repos.
The cyber risk instruction has named authors
A safety instruction in constants/cyberRiskInstruction.ts governs how Claude Code handles pentesting, CTF challenges, and defensive security work. The file lists specific Safeguards team members who wrote it (David Forsythe, Kyla Guru) and includes a note that it shouldn’t be modified without their approval. One of the few files in the codebase with that kind of ownership annotation.
FedStart means government deployments
References to claude.fedstart.com and claude-staging.fedstart.com appear as approved OAuth base URLs. FedStart is a compliance framework for deploying SaaS products in US government environments. Pointing the OAuth flow at an unapproved URL throws: “CLAUDE_CODE_CUSTOM_OAUTH_URL is not an approved endpoint.”
And yes, there’s a /buddy
The /buddy companion sprite only appears April 1-7, 2026 for external users (always on for Anthropic employees). It shows a rainbow-colored notification on startup with a 15-second timeout and includes hatching/spawning animations. It’s behind the BUDDY feature flag. This is shipping in the current build right now.
What’s hardcoded, and what isn’t
There are credentials embedded in the published code, but they’re all designed to be public. A Datadog client token (the pub prefix is Datadog’s convention for write-only client tokens). Three GrowthBook SDK keys for feature flag fetching. OAuth client IDs, public by definition in the OAuth spec. Staging infrastructure URLs under the ant.dev domain.
In remote CCR sessions, authentication uses session-ingress tokens (prefix sk-ant-si-). The token is read from /run/ccr/session_token at startup, then the file is immediately unlinked. The token stays heap-only. On Linux, prctl(PR_SET_DUMPABLE, 0) blocks ptrace heap scraping, preventing other processes from reading the token from memory. The upstream proxy runs on 127.0.0.1 with a NO_PROXY list that excludes anthropic.com and github.com to prevent MITM on auth traffic.
No private API keys, no AWS credentials, no database passwords, no user data. The Anthropic API key prefix is fragmented at build time to avoid appearing in the bundle. Their security practices on the things that actually matter are solid.
What this actually means
Half a million lines of TypeScript, and the picture that emerges isn’t a CLI tool with some cool features. It’s the foundation of something that hasn’t fully arrived yet but is clearly being built with intention.
The learning loop is the big one. Claude Code extracts memories from sessions, consolidates them during dreams, and feeds consolidated knowledge back into future sessions. Every developer who uses it long enough will have a version that understands their codebase, their patterns, their preferences. The implications for productivity are real, but so are the implications for dependence. When your AI knows your codebase better than you do, the relationship between developer and tool starts to look different.
Speculative execution means the AI is thinking ahead of you. Right now it pre-computes your likely next action while you’re still deciding what to do. The timeSavedMs metric tells you they’re optimizing for this explicitly. Eventually the speculation gets good enough that you’re just approving a stream of correct guesses. At that point you’re supervising, not coding.
Skillify is self-programming. An AI that watches what it just did, extracts the pattern, and packages it as a reusable skill is an AI that accumulates capabilities over time. Today it’s bounded, skills are markdown templates with tool restrictions. But the architecture is there for compound growth. Each skill makes the next session more capable. Each dream consolidation makes the memory layer richer. The system is designed to get better without anyone shipping a new model.
The coordinator insight is worth paying attention to. Workers can’t see the conversation, so the coordinator has to genuinely understand what they found before directing follow-up. It can’t be a dumb router. This is the template for how AI teams will work: hierarchical systems where intelligence concentrates at the coordination layer, not flat swarms where everyone sees everything. That’s also how human organizations work, which is either reassuring or unsettling depending on where you sit.
The verification agent is AI auditing AI. An adversarial agent whose entire purpose is to distrust the work of other agents and try to break it. When you have AI writing code and AI testing code, the human’s role shifts from implementer to judge. The VERDICT system is designed for human review of AI-vs-AI outcomes. That’s a workflow that barely exists yet in most organizations.
The cache architecture is the economic story underneath all of it. Everything, dreams, speculation, forked agents, multi-agent swarms, is built so that parallel workers share the parent’s prompt cache prefix. The CacheSafeParams type, the byte-identical prefixes, the cache break detection with diff output, these are the economic constraints shaping how AI agents are actually built. The companies that solve cache economics will be the ones that can afford to run persistent multi-agent AI at scale. The ones that don’t will burn through inference budgets so fast that autonomous agents remain a demo, not a product.
The effort system changes what’s economically possible. Being able to control how hard the model thinks (low through max, adaptive, ultrathink) means AI assistants can be always-on without being always-expensive. Low-effort background mode until something important happens, then ramp up. That’s the pattern for persistent AI that doesn’t bankrupt its operator.
What you’re running as a CLI tool today is the visible surface of something much larger. The infrastructure for autonomous, persistent, self-improving AI agents is already on your machine, waiting behind flags with names like amber_flint and surreal_dali. Claude Code dreams about your codebase while you sleep, pre-computes answers before you ask, programs its own capabilities, runs adversarial verification against its own work, and orchestrates multi-agent swarms across your terminal panes.
The question isn’t whether this kind of AI is coming. It’s sitting in your node_modules folder right now.


