<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Internals on Claude Code Wiki</title><link>http://www.markalston.net/claude-code-wiki/internals/</link><description>Recent content in Internals on Claude Code Wiki</description><generator>Hugo</generator><language>en-us</language><atom:link href="http://www.markalston.net/claude-code-wiki/internals/index.xml" rel="self" type="application/rss+xml"/><item><title>The System Prompt: What Claude Reads Before You Say Anything</title><link>http://www.markalston.net/claude-code-wiki/internals/system-prompt/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>http://www.markalston.net/claude-code-wiki/internals/system-prompt/</guid><description>&lt;h1 id="the-system-prompt-what-claude-reads-before-you-say-anything"&gt;The System Prompt: What Claude Reads Before You Say Anything&lt;a class="anchor" href="#the-system-prompt-what-claude-reads-before-you-say-anything"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="executive-summary"&gt;Executive Summary&lt;a class="anchor" href="#executive-summary"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The system prompt is the hidden instruction text sent to the model on every API call, before any conversation messages. It defines Claude&amp;rsquo;s behavior, available tools, safety rules, and injected knowledge. In Claude Code, it&amp;rsquo;s assembled from multiple sources and re-sent with every message &amp;ndash; making it the single largest factor in per-message token overhead.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Component&lt;/th&gt;
 &lt;th&gt;Source&lt;/th&gt;
 &lt;th&gt;Typical Size&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Core instructions&lt;/td&gt;
 &lt;td&gt;Claude Code built-in&lt;/td&gt;
 &lt;td&gt;~3,000-5,000 tokens&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Tool definitions&lt;/td&gt;
 &lt;td&gt;Built-in + MCP servers&lt;/td&gt;
 &lt;td&gt;~3,000-5,000 tokens&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;CLAUDE.md files&lt;/td&gt;
 &lt;td&gt;User, project, enterprise scopes&lt;/td&gt;
 &lt;td&gt;~2,000-4,000 tokens&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Skill catalog&lt;/td&gt;
 &lt;td&gt;Enabled skills + plugins&lt;/td&gt;
 &lt;td&gt;~2,000-5,000 tokens&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Subagent catalog&lt;/td&gt;
 &lt;td&gt;Plugin subagent descriptions&lt;/td&gt;
 &lt;td&gt;~1,000-2,000 tokens&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Environment + git context&lt;/td&gt;
 &lt;td&gt;Auto-detected&lt;/td&gt;
 &lt;td&gt;~200-500 tokens&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Typical total&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;~12,000-20,000 tokens&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
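&lt;p&gt;The component ranges above can be summed to sketch the per-message overhead. A minimal estimate in Python, using the table&amp;rsquo;s illustrative ranges (rough figures, not measured values):&lt;/p&gt;

```python
# Rough system-prompt size estimate, built from the component ranges in
# the table above. The numbers are illustrative, not measured.
COMPONENTS = {
    "core_instructions": (3_000, 5_000),
    "tool_definitions":  (3_000, 5_000),
    "claude_md":         (2_000, 4_000),
    "skill_catalog":     (2_000, 5_000),
    "subagent_catalog":  (1_000, 2_000),
    "environment_git":   (200, 500),
}

low = sum(lo for lo, _ in COMPONENTS.values())
high = sum(hi for _, hi in COMPONENTS.values())
print(f"Estimated system prompt: {low:,}-{high:,} tokens per message")
```

&lt;p&gt;Summing the ranges gives about 11,200&amp;ndash;21,500 tokens, consistent with the &amp;ldquo;typical total&amp;rdquo; row.&lt;/p&gt;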
&lt;h2 id="table-of-contents"&gt;Table of Contents&lt;a class="anchor" href="#table-of-contents"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#the-system-prompt-what-claude-reads-before-you-say-anything"&gt;The System Prompt: What Claude Reads Before You Say Anything&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#executive-summary"&gt;Executive Summary&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#table-of-contents"&gt;Table of Contents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#what-the-system-prompt-is"&gt;What the System Prompt Is&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#anatomy-of-a-claude-code-api-call"&gt;Anatomy of a Claude Code API Call&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#system-prompt-components"&gt;System Prompt Components&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#core-instructions"&gt;Core Instructions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#tool-definitions"&gt;Tool Definitions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#claudemd-files"&gt;CLAUDE.md Files&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#skill-catalog"&gt;Skill Catalog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#subagent-catalog"&gt;Subagent Catalog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#mcp-server-instructions"&gt;MCP Server Instructions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#environment-and-git-context"&gt;Environment and Git Context&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#plugin-injected-content"&gt;Plugin-Injected Content&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#how-it-gets-assembled"&gt;How It Gets Assembled&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#why-this-matters"&gt;Why This Matters&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#token-cost"&gt;Token Cost&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#behavior-shaping"&gt;Behavior Shaping&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#prompt-caching"&gt;Prompt Caching&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#what-you-can-control"&gt;What You Can Control&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#references"&gt;References&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="what-the-system-prompt-is"&gt;What the System Prompt Is&lt;a class="anchor" href="#what-the-system-prompt-is"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Every API call to Claude has three parts: system prompt, conversation history, and the current message. The system prompt is the first part &amp;ndash; instructions the model reads before seeing any conversation.&lt;/p&gt;</description></item><item><title>Context Management: Working Within the Token Budget</title><link>http://www.markalston.net/claude-code-wiki/internals/context-management/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>http://www.markalston.net/claude-code-wiki/internals/context-management/</guid><description>&lt;h1 id="context-management-working-within-the-token-budget"&gt;Context Management: Working Within the Token Budget&lt;a class="anchor" href="#context-management-working-within-the-token-budget"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="executive-summary"&gt;Executive Summary&lt;a class="anchor" href="#executive-summary"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The context window is Claude&amp;rsquo;s working memory &amp;ndash; everything the model can reference when generating a response. In Claude Code, it fills with the system prompt, conversation history, tool results, and file contents. Managing this space is the single most important factor in keeping sessions effective as they grow longer.&lt;/p&gt;

&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Model&lt;/th&gt;
 &lt;th&gt;Standard Window&lt;/th&gt;
 &lt;th&gt;Extended (Beta)&lt;/th&gt;
 &lt;th&gt;Long Context Pricing&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Claude Opus 4.6&lt;/td&gt;
 &lt;td&gt;200K tokens&lt;/td&gt;
 &lt;td&gt;1M tokens&lt;/td&gt;
 &lt;td&gt;2x input, 1.5x output above 200K&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Claude Sonnet 4.6&lt;/td&gt;
 &lt;td&gt;200K tokens&lt;/td&gt;
 &lt;td&gt;1M tokens&lt;/td&gt;
 &lt;td&gt;2x input, 1.5x output above 200K&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Claude Sonnet 4.5&lt;/td&gt;
 &lt;td&gt;200K tokens&lt;/td&gt;
 &lt;td&gt;&amp;ndash;&lt;/td&gt;
 &lt;td&gt;&amp;ndash;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Claude Sonnet 4&lt;/td&gt;
 &lt;td&gt;200K tokens&lt;/td&gt;
 &lt;td&gt;&amp;ndash;&lt;/td&gt;
 &lt;td&gt;&amp;ndash;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Claude Haiku 4.5&lt;/td&gt;
 &lt;td&gt;200K tokens&lt;/td&gt;
 &lt;td&gt;&amp;ndash;&lt;/td&gt;
 &lt;td&gt;&amp;ndash;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
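&lt;p&gt;The long-context pricing rule in the table can be expressed as a small helper. This is a sketch assuming, as the table suggests, that the 2x/1.5x multipliers apply to the whole request once input exceeds 200K tokens; the rates passed in are placeholders, not actual model prices:&lt;/p&gt;

```python
def long_context_cost(input_tokens, output_tokens,
                      input_rate, output_rate, threshold=200_000):
    """Estimate request cost with the table's long-context multipliers:
    2x input and 1.5x output once input exceeds the 200K threshold.
    Rates are dollars per million tokens; pass your model's actual rates."""
    over = input_tokens > threshold
    in_mult = 2.0 if over else 1.0
    out_mult = 1.5 if over else 1.0
    return ((input_tokens / 1e6) * input_rate * in_mult
            + (output_tokens / 1e6) * output_rate * out_mult)

# With hypothetical rates of $3/MTok input and $15/MTok output:
# a 300K-token prompt is billed at the long-context multipliers,
# while a 100K-token prompt stays at base rates.
print(long_context_cost(300_000, 4_000, 3.0, 15.0))
print(long_context_cost(100_000, 4_000, 3.0, 15.0))
```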
&lt;p&gt;The 1M token context window is currently available in beta on the API only. Standard claude.ai and Claude Code users access the 200K window unless the beta header is explicitly enabled.&lt;/p&gt;</description></item><item><title>Prompt Caching: Why Your System Prompt Doesn't Cost What You Think</title><link>http://www.markalston.net/claude-code-wiki/internals/prompt-caching/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>http://www.markalston.net/claude-code-wiki/internals/prompt-caching/</guid><description>&lt;h1 id="prompt-caching-why-your-system-prompt-doesnt-cost-what-you-think"&gt;Prompt Caching: Why Your System Prompt Doesn&amp;rsquo;t Cost What You Think&lt;a class="anchor" href="#prompt-caching-why-your-system-prompt-doesnt-cost-what-you-think"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="executive-summary"&gt;Executive Summary&lt;a class="anchor" href="#executive-summary"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Prompt caching allows the API to reuse previously processed prompt prefixes, reducing both cost and latency. Since Claude Code re-sends the system prompt on every API call, caching is what makes large system prompts economically viable. Without caching, a 200-message session with a 15,000-token system prompt would cost ~$15 on Opus 4.6. With caching, it costs ~$1.60 &amp;ndash; an 89% reduction.&lt;/p&gt;</description></item><item><title>Optimizing Token Usage: Skills, Plugins, and Context Budget</title><link>http://www.markalston.net/claude-code-wiki/internals/token-optimization/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>http://www.markalston.net/claude-code-wiki/internals/token-optimization/</guid><description>&lt;h1 id="optimizing-token-usage-skills-plugins-and-context-budget"&gt;Optimizing Token Usage: Skills, Plugins, and Context Budget&lt;a class="anchor" href="#optimizing-token-usage-skills-plugins-and-context-budget"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="executive-summary"&gt;Executive Summary&lt;a class="anchor" href="#executive-summary"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Every Claude Code session carries a baseline token cost from skills, plugins, and system configuration loaded into the context window. This cost is per-message &amp;ndash; paid on every API round-trip regardless of whether the features are used. Understanding and managing this overhead directly impacts session cost, available context for actual work, and response latency.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Component&lt;/th&gt;
 &lt;th&gt;When Loaded&lt;/th&gt;
 &lt;th&gt;Token Cost Pattern&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Skill catalog&lt;/strong&gt; (names + descriptions)&lt;/td&gt;
 &lt;td&gt;Every message&lt;/td&gt;
 &lt;td&gt;~25-100 tokens per skill&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Skill content&lt;/strong&gt; (full SKILL.md)&lt;/td&gt;
 &lt;td&gt;On invocation only&lt;/td&gt;
 &lt;td&gt;Varies (can be significant)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Plugin subagents&lt;/strong&gt; (descriptions in Task tool)&lt;/td&gt;
 &lt;td&gt;Every message&lt;/td&gt;
 &lt;td&gt;~50-150 tokens per subagent&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;CLAUDE.md files&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Every message&lt;/td&gt;
 &lt;td&gt;Entire file contents&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;MCP tool definitions&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Every message&lt;/td&gt;
 &lt;td&gt;~30-80 tokens per tool&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
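&lt;p&gt;The per-item ranges above can be turned into a quick estimate of catalog overhead for a given setup. A sketch using the table&amp;rsquo;s illustrative per-item costs (CLAUDE.md is excluded, since its cost is simply the file&amp;rsquo;s own token count):&lt;/p&gt;

```python
def catalog_overhead(n_skills, n_subagents, n_mcp_tools):
    """Rough per-message catalog overhead in tokens, using the per-item
    ranges from the table above. Illustrative, not measured."""
    low = n_skills * 25 + n_subagents * 50 + n_mcp_tools * 30
    high = n_skills * 100 + n_subagents * 150 + n_mcp_tools * 80
    return low, high

# A hypothetical setup: 30 skills, 10 subagents, 15 MCP tools.
low, high = catalog_overhead(30, 10, 15)
print(f"~{low:,}-{high:,} tokens per message before any work happens")
```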
&lt;p&gt;A typical setup with 20+ plugins can consume &lt;strong&gt;4,000-5,000+ tokens per message&lt;/strong&gt; just for the skill/subagent catalog &amp;ndash; before any actual work happens.&lt;/p&gt;</description></item><item><title>Extended Thinking: How Claude Reasons Through Complex Problems</title><link>http://www.markalston.net/claude-code-wiki/internals/extended-thinking/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>http://www.markalston.net/claude-code-wiki/internals/extended-thinking/</guid><description>&lt;h1 id="extended-thinking-how-claude-reasons-through-complex-problems"&gt;Extended Thinking: How Claude Reasons Through Complex Problems&lt;a class="anchor" href="#extended-thinking-how-claude-reasons-through-complex-problems"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="executive-summary"&gt;Executive Summary&lt;a class="anchor" href="#executive-summary"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Extended thinking gives Claude additional tokens to reason before responding. On Opus 4.6 and Sonnet 4.6, thinking is adaptive: Claude decides how much to think based on task complexity. Thinking tokens are billed as output tokens ($25/MTok on Opus 4.6), making thinking depth the second-biggest cost lever after model selection. Effort levels control how much Claude thinks: Opus 4.6 supports low/medium/high/max, while Sonnet 4.6 supports low/medium/high.&lt;/p&gt;</description></item><item><title>Tool Execution Context</title><link>http://www.markalston.net/claude-code-wiki/internals/tool-execution-context/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>http://www.markalston.net/claude-code-wiki/internals/tool-execution-context/</guid><description>&lt;h1 id="tool-execution-context"&gt;Tool Execution Context&lt;a class="anchor" href="#tool-execution-context"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;When Claude runs a bash command or any other tool, the system prompt and skill instructions are &lt;strong&gt;not re-injected&lt;/strong&gt; alongside the tool result. The result passes back to the model as bare output. This is sometimes called &amp;ldquo;pass-through&amp;rdquo; behavior.&lt;/p&gt;
&lt;p&gt;The distinction matters when your workflows depend on behavioral rules defined in CLAUDE.md or a skill file &amp;ndash; those rules are active when Claude is reasoning, but they are not re-delivered when Claude processes tool output.&lt;/p&gt;</description></item></channel></rss>