<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Internals on Claude Code Wiki</title><link>http://www.markalston.net/claude-code-wiki/internals/</link><description>Recent content in Internals on Claude Code Wiki</description><generator>Hugo</generator><language>en-us</language><atom:link href="http://www.markalston.net/claude-code-wiki/internals/index.xml" rel="self" type="application/rss+xml"/><item><title>The System Prompt: What Claude Reads Before You Say Anything</title><link>http://www.markalston.net/claude-code-wiki/internals/system-prompt/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>http://www.markalston.net/claude-code-wiki/internals/system-prompt/</guid><description>&lt;h1 id="the-system-prompt-what-claude-reads-before-you-say-anything"&gt;The System Prompt: What Claude Reads Before You Say Anything&lt;a class="anchor" href="#the-system-prompt-what-claude-reads-before-you-say-anything"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="executive-summary"&gt;Executive Summary&lt;a class="anchor" href="#executive-summary"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The system prompt is the hidden instruction text sent to the model on every API call, before any conversation messages. It defines Claude&amp;rsquo;s behavior, available tools, safety rules, and injected knowledge. In Claude Code, it&amp;rsquo;s assembled from multiple sources and re-sent with every message &amp;ndash; making it the single largest factor in per-message token overhead.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Component&lt;/th&gt;
 &lt;th&gt;Source&lt;/th&gt;
 &lt;th&gt;Typical Size&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Core instructions&lt;/td&gt;
 &lt;td&gt;Claude Code built-in&lt;/td&gt;
 &lt;td&gt;~3,000-5,000 tokens&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Tool definitions&lt;/td&gt;
 &lt;td&gt;Built-in + MCP servers&lt;/td&gt;
 &lt;td&gt;~3,000-5,000 tokens&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;CLAUDE.md files&lt;/td&gt;
 &lt;td&gt;User, project, enterprise scopes&lt;/td&gt;
 &lt;td&gt;~2,000-4,000 tokens&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Skill catalog&lt;/td&gt;
 &lt;td&gt;Enabled skills + plugins&lt;/td&gt;
 &lt;td&gt;~2,000-5,000 tokens&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Subagent catalog&lt;/td&gt;
 &lt;td&gt;Plugin subagent descriptions&lt;/td&gt;
 &lt;td&gt;~1,000-2,000 tokens&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Environment + git context&lt;/td&gt;
 &lt;td&gt;Auto-detected&lt;/td&gt;
 &lt;td&gt;~200-500 tokens&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Typical total&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;~12,000-20,000 tokens&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
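&lt;p&gt;The component ranges above can be summed to sketch the per-message overhead. A minimal estimate in Python, using the table&amp;rsquo;s illustrative ranges (rough figures, not measured values):&lt;/p&gt;

```python
# Rough system-prompt size estimate, built from the component ranges in
# the table above. The numbers are illustrative, not measured.
COMPONENTS = {
    "core_instructions": (3_000, 5_000),
    "tool_definitions":  (3_000, 5_000),
    "claude_md":         (2_000, 4_000),
    "skill_catalog":     (2_000, 5_000),
    "subagent_catalog":  (1_000, 2_000),
    "environment_git":   (200, 500),
}

low = sum(lo for lo, _ in COMPONENTS.values())
high = sum(hi for _, hi in COMPONENTS.values())
print(f"Estimated system prompt: {low:,}-{high:,} tokens per message")
```

&lt;p&gt;Summing the ranges gives about 11,200&amp;ndash;21,500 tokens, consistent with the &amp;ldquo;typical total&amp;rdquo; row.&lt;/p&gt;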
&lt;h2 id="table-of-contents"&gt;Table of Contents&lt;a class="anchor" href="#table-of-contents"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#the-system-prompt-what-claude-reads-before-you-say-anything"&gt;The System Prompt: What Claude Reads Before You Say Anything&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#executive-summary"&gt;Executive Summary&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#table-of-contents"&gt;Table of Contents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#what-the-system-prompt-is"&gt;What the System Prompt Is&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#anatomy-of-a-claude-code-api-call"&gt;Anatomy of a Claude Code API Call&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#system-prompt-components"&gt;System Prompt Components&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#core-instructions"&gt;Core Instructions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#tool-definitions"&gt;Tool Definitions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#claudemd-files"&gt;CLAUDE.md Files&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#skill-catalog"&gt;Skill Catalog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#subagent-catalog"&gt;Subagent Catalog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#mcp-server-instructions"&gt;MCP Server Instructions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#environment-and-git-context"&gt;Environment and Git Context&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#plugin-injected-content"&gt;Plugin-Injected Content&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#how-it-gets-assembled"&gt;How It Gets Assembled&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#why-this-matters"&gt;Why This Matters&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#token-cost"&gt;Token Cost&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#behavior-shaping"&gt;Behavior Shaping&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#prompt-caching"&gt;Prompt Caching&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#what-you-can-control"&gt;What You Can Control&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#references"&gt;References&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="what-the-system-prompt-is"&gt;What the System Prompt Is&lt;a class="anchor" href="#what-the-system-prompt-is"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Every API call to Claude has three parts: system prompt, conversation history, and the current message. The system prompt is the first part &amp;ndash; instructions the model reads before seeing any conversation.&lt;/p&gt;</description></item><item><title>Context Management: Working Within the Token Budget</title><link>http://www.markalston.net/claude-code-wiki/internals/context-management/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>http://www.markalston.net/claude-code-wiki/internals/context-management/</guid><description>&lt;h1 id="context-management-working-within-the-token-budget"&gt;Context Management: Working Within the Token Budget&lt;a class="anchor" href="#context-management-working-within-the-token-budget"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="executive-summary"&gt;Executive Summary&lt;a class="anchor" href="#executive-summary"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The context window is Claude&amp;rsquo;s working memory &amp;ndash; everything the model can reference when generating a response. In Claude Code, it fills with the system prompt, conversation history, tool results, and file contents. Managing this space is the single most important factor in keeping sessions effective as they grow longer.&lt;/p&gt;

&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Model&lt;/th&gt;
 &lt;th&gt;Standard Window&lt;/th&gt;
 &lt;th&gt;Extended (Beta)&lt;/th&gt;
 &lt;th&gt;Long Context Pricing&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Claude Opus 4.6&lt;/td&gt;
 &lt;td&gt;200K tokens&lt;/td&gt;
 &lt;td&gt;1M tokens&lt;/td&gt;
 &lt;td&gt;2x input, 1.5x output above 200K&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Claude Sonnet 4.6&lt;/td&gt;
 &lt;td&gt;200K tokens&lt;/td&gt;
 &lt;td&gt;1M tokens&lt;/td&gt;
 &lt;td&gt;2x input, 1.5x output above 200K&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Claude Sonnet 4.5&lt;/td&gt;
 &lt;td&gt;200K tokens&lt;/td&gt;
 &lt;td&gt;&amp;ndash;&lt;/td&gt;
 &lt;td&gt;&amp;ndash;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Claude Sonnet 4&lt;/td&gt;
 &lt;td&gt;200K tokens&lt;/td&gt;
 &lt;td&gt;&amp;ndash;&lt;/td&gt;
 &lt;td&gt;&amp;ndash;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Claude Haiku 4.5&lt;/td&gt;
 &lt;td&gt;200K tokens&lt;/td&gt;
 &lt;td&gt;&amp;ndash;&lt;/td&gt;
 &lt;td&gt;&amp;ndash;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
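&lt;p&gt;The long-context pricing rule in the table can be expressed as a small helper. This is a sketch assuming, as the table suggests, that the 2x/1.5x multipliers apply to the whole request once input exceeds 200K tokens; the rates passed in are placeholders, not actual model prices:&lt;/p&gt;

```python
def long_context_cost(input_tokens, output_tokens,
                      input_rate, output_rate, threshold=200_000):
    """Estimate request cost with the table's long-context multipliers:
    2x input and 1.5x output once input exceeds the 200K threshold.
    Rates are dollars per million tokens; pass your model's actual rates."""
    over = input_tokens > threshold
    in_mult = 2.0 if over else 1.0
    out_mult = 1.5 if over else 1.0
    return ((input_tokens / 1e6) * input_rate * in_mult
            + (output_tokens / 1e6) * output_rate * out_mult)

# With hypothetical rates of $3/MTok input and $15/MTok output:
# a 300K-token prompt is billed at the long-context multipliers,
# while a 100K-token prompt stays at base rates.
print(long_context_cost(300_000, 4_000, 3.0, 15.0))
print(long_context_cost(100_000, 4_000, 3.0, 15.0))
```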
&lt;p&gt;The 1M token context window is currently available in beta on the API only. Standard claude.ai and Claude Code users access the 200K window unless the beta header is explicitly enabled.&lt;/p&gt;</description></item><item><title>Prompt Caching: Why Your System Prompt Doesn't Cost What You Think</title><link>http://www.markalston.net/claude-code-wiki/internals/prompt-caching/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>http://www.markalston.net/claude-code-wiki/internals/prompt-caching/</guid><description>&lt;h1 id="prompt-caching-why-your-system-prompt-doesnt-cost-what-you-think"&gt;Prompt Caching: Why Your System Prompt Doesn&amp;rsquo;t Cost What You Think&lt;a class="anchor" href="#prompt-caching-why-your-system-prompt-doesnt-cost-what-you-think"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="executive-summary"&gt;Executive Summary&lt;a class="anchor" href="#executive-summary"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Prompt caching allows the API to reuse previously processed prompt prefixes, reducing both cost and latency. Since Claude Code re-sends the system prompt on every API call, caching is what makes large system prompts economically viable. Without caching, a 200-message session with a 15,000-token system prompt would cost ~$15 on Opus 4.6. With caching, it costs ~$1.60 &amp;ndash; an 89% reduction.&lt;/p&gt;</description></item><item><title>Optimizing Token Usage: Skills, Plugins, and Context Budget</title><link>http://www.markalston.net/claude-code-wiki/internals/token-optimization/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>http://www.markalston.net/claude-code-wiki/internals/token-optimization/</guid><description>&lt;h1 id="optimizing-token-usage-skills-plugins-and-context-budget"&gt;Optimizing Token Usage: Skills, Plugins, and Context Budget&lt;a class="anchor" href="#optimizing-token-usage-skills-plugins-and-context-budget"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="executive-summary"&gt;Executive Summary&lt;a class="anchor" href="#executive-summary"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Every Claude Code session carries a baseline token cost from skills, plugins, and system configuration loaded into the context window. This cost is per-message &amp;ndash; paid on every API round-trip regardless of whether the features are used. Understanding and managing this overhead directly impacts session cost, available context for actual work, and response latency.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Component&lt;/th&gt;
 &lt;th&gt;When Loaded&lt;/th&gt;
 &lt;th&gt;Token Cost Pattern&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Skill catalog&lt;/strong&gt; (names + descriptions)&lt;/td&gt;
 &lt;td&gt;Every message&lt;/td&gt;
 &lt;td&gt;~25-100 tokens per skill&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Skill content&lt;/strong&gt; (full SKILL.md)&lt;/td&gt;
 &lt;td&gt;On invocation only&lt;/td&gt;
 &lt;td&gt;Varies (can be significant)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Plugin subagents&lt;/strong&gt; (descriptions in Task tool)&lt;/td&gt;
 &lt;td&gt;Every message&lt;/td&gt;
 &lt;td&gt;~50-150 tokens per subagent&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;CLAUDE.md files&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Every message&lt;/td&gt;
 &lt;td&gt;Entire file contents&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;MCP tool definitions&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Every message&lt;/td&gt;
 &lt;td&gt;~30-80 tokens per tool&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
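&lt;p&gt;The per-item ranges above can be turned into a quick estimate of catalog overhead for a given setup. A sketch using the table&amp;rsquo;s illustrative per-item costs (CLAUDE.md is excluded, since its cost is simply the file&amp;rsquo;s own token count):&lt;/p&gt;

```python
def catalog_overhead(n_skills, n_subagents, n_mcp_tools):
    """Rough per-message catalog overhead in tokens, using the per-item
    ranges from the table above. Illustrative, not measured."""
    low = n_skills * 25 + n_subagents * 50 + n_mcp_tools * 30
    high = n_skills * 100 + n_subagents * 150 + n_mcp_tools * 80
    return low, high

# A hypothetical setup: 30 skills, 10 subagents, 15 MCP tools.
low, high = catalog_overhead(30, 10, 15)
print(f"~{low:,}-{high:,} tokens per message before any work happens")
```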
&lt;p&gt;A typical setup with 20+ plugins can consume &lt;strong&gt;4,000-5,000+ tokens per message&lt;/strong&gt; just for the skill/subagent catalog &amp;ndash; before any actual work happens.&lt;/p&gt;</description></item><item><title>Extended Thinking: How Claude Reasons Through Complex Problems</title><link>http://www.markalston.net/claude-code-wiki/internals/extended-thinking/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>http://www.markalston.net/claude-code-wiki/internals/extended-thinking/</guid><description>&lt;h1 id="extended-thinking-how-claude-reasons-through-complex-problems"&gt;Extended Thinking: How Claude Reasons Through Complex Problems&lt;a class="anchor" href="#extended-thinking-how-claude-reasons-through-complex-problems"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="executive-summary"&gt;Executive Summary&lt;a class="anchor" href="#executive-summary"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Extended thinking gives Claude additional tokens to reason before responding. On Opus 4.6 and Sonnet 4.6, thinking is adaptive: Claude decides how much to think based on task complexity. Thinking tokens are billed as output tokens ($25/MTok on Opus 4.6), making thinking depth the second-biggest cost lever after model selection. Effort levels control how much Claude thinks: Opus 4.6 supports low/medium/high/max, while Sonnet 4.6 supports low/medium/high.&lt;/p&gt;</description></item><item><title>Tool Execution Context</title><link>http://www.markalston.net/claude-code-wiki/internals/tool-execution-context/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>http://www.markalston.net/claude-code-wiki/internals/tool-execution-context/</guid><description>&lt;h1 id="tool-execution-context"&gt;Tool Execution Context&lt;a class="anchor" href="#tool-execution-context"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;When Claude runs a bash command or any other tool, the system prompt and skill instructions are &lt;strong&gt;not re-injected&lt;/strong&gt; alongside the tool result. The result passes back to the model as bare output. This is sometimes called &amp;ldquo;pass-through&amp;rdquo; behavior.&lt;/p&gt;
&lt;p&gt;The distinction matters when your workflows depend on behavioral rules defined in CLAUDE.md or a skill file &amp;ndash; those rules are active when Claude is reasoning, but they are not re-delivered when Claude processes tool output.&lt;/p&gt;</description></item></channel></rss>