Extended Thinking: How Claude Reasons Through Complex Problems#

Executive Summary#

Extended thinking gives Claude additional tokens to reason before responding. On Opus 4.6, 4.7, and 4.8 and Sonnet 4.6, thinking is adaptive: Claude decides how much to think based on task complexity. Thinking tokens are billed as output tokens ($25/MTok on the Opus tier), which makes thinking depth one of the two biggest cost levers, alongside model selection. Effort levels control how deeply Claude thinks: Opus 4.6 supports low/medium/high/max, Opus 4.7 and 4.8 add xhigh, and Sonnet 4.6 supports low/medium/high.

Aspect	Details
Default state	Enabled by default in Claude Code
Adaptive models	Opus 4.6, 4.7, 4.8 and Sonnet 4.6 (dynamic depth based on complexity)
Other models	Manual (fixed budget via `budget_tokens`)
Manual budget default	63,999 tokens (Sonnet 4.5, Haiku 4.5); adaptive models set depth dynamically
Billing	Thinking tokens billed as output tokens
Visibility	Summarized reasoning; `Ctrl+O` shows it in the transcript

How Extended Thinking Works#

Why Intermediate Tokens Help#

Token generation is autoregressive: each token is predicted from all prior tokens in the context window. When Claude generates intermediate reasoning tokens, those tokens become context for the tokens that follow. The model is using its own output as working memory.

A direct prompt-to-answer jump gives the final answer a short context to condition on. A chain of intermediate steps – exploring an approach, identifying a constraint, reconsidering – gives the final answer more to work from. This is why thinking improves quality on problems that require multiple dependent steps, and why it has no effect on problems where the answer is immediate.

There is no separate reasoning system running in parallel. The thinking phase and the response phase use the same next-token prediction mechanism. The difference is that thinking tokens are generated first, accumulate in context, and are then available when the response tokens are generated.
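
The ordering described above can be illustrated with a toy generation loop. This is a sketch only: `generate` and `toy_next_token` are hypothetical names, and the stand-in "model" merely reports how many tokens it could see, which is nothing like a real language model. What it does show is that one predictor serves both phases, and that thinking tokens are appended to the context before any response token is produced.

```python
from typing import Callable, List, Tuple


def generate(
    prompt: List[str],
    next_token: Callable[[List[str]], str],
    n_thinking: int,
    n_response: int,
) -> Tuple[List[str], List[str]]:
    """Toy autoregressive loop: one predictor, two phases (hypothetical helper)."""
    context = list(prompt)
    thinking: List[str] = []
    response: List[str] = []

    # Thinking phase: each token is predicted from everything before it,
    # then appended so that it becomes context for later tokens.
    for _ in range(n_thinking):
        token = next_token(context)
        thinking.append(token)
        context.append(token)

    # Response phase: the same predictor, now conditioned on prompt + thinking.
    for _ in range(n_response):
        token = next_token(context)
        response.append(token)
        context.append(token)

    return thinking, response


def toy_next_token(context: List[str]) -> str:
    """Stand-in model: emits a token recording how many tokens it could see."""
    return f"saw{len(context)}"


prompt = ["fix", "the", "bug"]
_, direct = generate(prompt, toy_next_token, n_thinking=0, n_response=1)
_, with_thinking = generate(prompt, toy_next_token, n_thinking=5, n_response=1)
print(direct)         # ['saw3']
print(with_thinking)  # ['saw8']
```

With five thinking tokens, the first response token is conditioned on eight tokens instead of three. Whether that extra context helps depends on the task.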

The Thinking Process#

When thinking is enabled, Claude generates internal reasoning before crafting its response:

User prompt arrives
    │
    ▼
┌─────────────────────┐
│  Thinking Phase     │  Claude generates intermediate reasoning:
│  (thinking tokens)  │  - Explores approaches
│                     │  - Identifies constraints and edge cases
│                     │  - Backtracks from dead ends
│                     │  - Commits to an approach
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Summary Phase      │  Full thinking summarized
│  (no extra charge)  │  for user visibility
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Response Phase     │  Final answer conditioned
│  (output tokens)    │  on all prior thinking tokens
└─────────────────────┘

The thinking phase is where quality differences appear. On problems that require multiple dependent steps, thinking tokens give the response tokens more context to condition on. On problems that don’t – simple lookups, mechanical edits – thinking tokens add cost without improving output.

Adaptive Thinking#

Opus 4.6, 4.7, and 4.8 and Sonnet 4.6 use adaptive thinking by default. Instead of a fixed budget, Claude decides how much to think based on the complexity of each request:

Simple requests (rename a variable, fix a typo): minimal or no thinking
Moderate requests (implement a function, fix a bug): moderate thinking
Complex requests (architect a system, debug a race condition): deep thinking

This replaces the manual budget_tokens approach used on earlier models. You don’t need to estimate how many thinking tokens a task needs – Claude adjusts automatically.

Combined with effort levels, adaptive thinking gives you a spectrum from fast/cheap to thorough/expensive without manual tuning.

Summarized Thinking#

Claude 4 models return a summarized version of the thinking, not the raw thinking output:

You see a summary of the key reasoning steps
You are billed for the full thinking tokens, not the summary
The summary is generated by a separate model at no extra charge
The thinking model does not see the summarized output

The billed output token count will be higher than the visible token count. This is expected.

In Claude Code, toggle verbose mode (Ctrl+O) to see the thinking text as gray italic text in the transcript.

On Opus 4.7 and 4.8, the API returns no thinking text by default; summaries must be requested explicitly. In Claude Code, Ctrl+O still surfaces the summarized reasoning.

Interleaved Thinking#

With tool use, Claude can think between tool calls, including between each tool result:

Think → Call tool → Read result → Think again → Call another tool → Think → Respond

On Opus 4.6, 4.7, and 4.8 and Sonnet 4.6 with adaptive thinking, interleaved thinking is automatic. On earlier models, it requires a beta header. In Claude Code, this is handled transparently – you don’t need to configure anything.

Interleaved thinking is useful for multi-step tasks where each tool result changes what to do next. Claude can reconsider its approach after seeing actual file contents or command output rather than committing to a plan upfront.

Effort Levels#

Available Levels#

Level	Thinking Behavior	Speed	Cost	Availability
max	No depth limit, absolute maximum	Slowest	Highest	Opus 4.6, 4.7, 4.8
xhigh	Between high and max	Slower	Higher	Opus 4.7, 4.8
high	Almost always thinks deeply	Slow	High	Effort-capable models
medium	May skip thinking for simple queries	Medium	Medium	Effort-capable models
low	Minimizes or skips thinking	Fast	Low	Effort-capable models

Effort applies to Opus 4.5, 4.6, 4.7, and 4.8 and to Sonnet 4.6. Sonnet 4.5 and Haiku 4.5 do not support the effort parameter, and requests that set it return an error, so control their thinking with MAX_THINKING_TOKENS instead. max is available only on Opus 4.6, 4.7, and 4.8. xhigh exists on Opus 4.7 and 4.8; other effort-capable models fall back to high.
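
The availability rules above can be written down as a small lookup. This is a minimal sketch, not how Claude Code resolves effort internally: `resolve_effort` and the short model labels are hypothetical, and the behavior for unsupported levels other than xhigh is not stated in this guide, so the sketch raises an error rather than guessing.

```python
from typing import Optional

# Effort levels per model, transcribed from the availability rules above.
# The keys are illustrative labels, not official API model IDs.
SUPPORTED_EFFORT = {
    "opus-4.8": ("low", "medium", "high", "xhigh", "max"),
    "opus-4.7": ("low", "medium", "high", "xhigh", "max"),
    "opus-4.6": ("low", "medium", "high", "max"),
    "opus-4.5": ("low", "medium", "high"),
    "sonnet-4.6": ("low", "medium", "high"),
    "sonnet-4.5": (),  # effort parameter not supported
    "haiku-4.5": (),   # effort parameter not supported
}


def resolve_effort(model: str, requested: str) -> Optional[str]:
    """Return the effort level that would apply, or None if effort is unsupported."""
    levels = SUPPORTED_EFFORT[model]
    if not levels:
        return None  # control thinking with MAX_THINKING_TOKENS instead
    if requested in levels:
        return requested
    if requested == "xhigh":
        return "high"  # documented fallback on models without xhigh
    # Assumption: the guide does not say what happens to other unsupported
    # levels (for example max on Sonnet 4.6), so this sketch refuses to guess.
    raise ValueError(f"{requested!r} is not available on {model}")


print(resolve_effort("opus-4.8", "xhigh"))  # xhigh
print(resolve_effort("opus-4.6", "xhigh"))  # high
print(resolve_effort("haiku-4.5", "high"))  # None
```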

Default effort by subscription tier:

Subscription	Default Model	Default Effort
Max, Team Premium	Opus 4.8	high
Pro	Sonnet 4.6	high
Free / 3P	Sonnet 4.5	(not set)

The default is high for Pro, Max, and Team subscribers on the current Opus and Sonnet models. This matches the API, which treats an unset effort as high.

Effort is a behavioral signal, not a strict token budget. At lower effort, Claude still thinks on genuinely difficult problems – it just thinks less than it would at higher effort for the same problem.

What Effort Controls#

Effort affects all tokens in the response, including non-thinking output:

Aspect	Low Effort	High Effort
Thinking depth	Minimal or skipped for simple tasks	Deep reasoning on most tasks
Tool calls	Fewer, combined operations	More, thorough exploration
Explanations	Terse confirmations	Detailed plans and summaries
Code comments	Minimal	Comprehensive
Action style	Proceeds directly	Explains approach before acting

Setting Effort in Claude Code#

Four methods:

/effort command: opens an interactive slider (arrow keys), with the ends labeled Faster and Smarter. Pass a level to set it directly, e.g. /effort xhigh.
/model menu: adjust the effort slider with arrow keys while selecting a model.
Environment variable: CLAUDE_CODE_EFFORT_LEVEL=low|medium|high|xhigh|max
Settings file: set effortLevel in your settings JSON.
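
For scripted use, the environment-variable method can be wrapped in a small helper. The sketch below is illustrative: `env_with_effort` is a hypothetical name, and it only builds the environment mapping rather than launching anything. The variable name and the accepted values come from the list above.

```python
import os
from typing import Dict, Mapping, Optional

VALID_LEVELS = ("low", "medium", "high", "xhigh", "max")


def env_with_effort(level: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy an environment and set CLAUDE_CODE_EFFORT_LEVEL on the copy."""
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown effort level: {level!r}")
    env = dict(os.environ if base is None else base)
    env["CLAUDE_CODE_EFFORT_LEVEL"] = level
    return env


env = env_with_effort("low", base={"PATH": "/usr/bin"})
print(env["CLAUDE_CODE_EFFORT_LEVEL"])  # low
# The mapping could then be handed to a child process, for example
# subprocess.run(["claude"], env=env_with_effort("low")).
```

Copying the environment instead of mutating `os.environ` keeps the setting scoped to the one child process that needs it.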

Configuration#

Toggle Thinking On/Off#

Method	Scope	Details
`Option+T` / `Alt+T`	Current session	Toggles thinking for this session only
`/config`	Permanent	Saved as `alwaysThinkingEnabled` in settings
`Ctrl+O`	Display only	Shows thinking text in verbose mode

MAX_THINKING_TOKENS#

Controls the thinking token budget for manual-mode models:

Setting	Value
Default	63,999 tokens (Sonnet 4.5, Haiku 4.5)
Maximum	63,999 tokens
Minimum	1,024 tokens
Disable thinking	Set to `0` (applies on any model)

# Temporary (session only)
MAX_THINKING_TOKENS=63999 claude

# Permanent
export MAX_THINKING_TOKENS=63999  # in ~/.zshrc or ~/.bashrc

On the adaptive models (Opus 4.6, 4.7, and 4.8 and Sonnet 4.6), MAX_THINKING_TOKENS is ignored because adaptive thinking controls depth dynamically. Exception: setting it to 0 still disables thinking entirely. On Opus 4.6 and Sonnet 4.6, manual budget_tokens is deprecated; on Opus 4.7 and 4.8 it is removed at the API level.
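
The rules in this section reduce to a short decision function. This is a sketch under stated assumptions: `effective_thinking_budget` and the model labels are hypothetical, and clamping out-of-range values to the documented minimum and maximum is an assumption, since the table gives only the limits.

```python
from typing import Optional

ADAPTIVE_MODELS = {"opus-4.6", "opus-4.7", "opus-4.8", "sonnet-4.6"}  # illustrative labels
MIN_BUDGET = 1_024
MAX_BUDGET = 63_999


def effective_thinking_budget(model: str, max_thinking_tokens: Optional[int]) -> Optional[int]:
    """Return the budget that applies: 0 means disabled, None means adaptive."""
    if max_thinking_tokens == 0:
        return 0  # disables thinking on any model
    if model in ADAPTIVE_MODELS:
        return None  # variable ignored; depth is chosen dynamically
    if max_thinking_tokens is None:
        return MAX_BUDGET  # documented default for manual-mode models
    # Assumption: out-of-range values are clamped to the documented limits.
    return max(MIN_BUDGET, min(MAX_BUDGET, max_thinking_tokens))


print(effective_thinking_budget("opus-4.8", 20_000))    # None
print(effective_thinking_budget("opus-4.8", 0))         # 0
print(effective_thinking_budget("sonnet-4.5", None))    # 63999
print(effective_thinking_budget("sonnet-4.5", 20_000))  # 20000
```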

Thinking Budget vs Output Budget#

budget_tokens must be less than max_tokens (standard mode)
With interleaved thinking (tool use), budget_tokens can exceed max_tokens
Opus 4.6, 4.7, 4.8: up to 128K output tokens
Sonnet 4.6 and earlier models: up to 64K output tokens

The budget is a target, not a strict limit – actual usage varies by task.
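
The relationship between the two limits can be checked before sending a request. The helper below is a hypothetical sketch of the first two rules only; it does not model the per-model output caps.

```python
def budget_fits(budget_tokens: int, max_tokens: int, interleaved: bool = False) -> bool:
    """Check the budget_tokens / max_tokens relationship described above."""
    if interleaved:
        # With interleaved thinking (tool use), the budget may exceed max_tokens.
        return True
    # Standard mode: the thinking budget must be strictly below max_tokens.
    return budget_tokens < max_tokens


print(budget_fits(16_000, 32_000))                    # True
print(budget_fits(32_000, 32_000))                    # False
print(budget_fits(48_000, 32_000, interleaved=True))  # True
```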

Context Window Interaction#

Thinking tokens interact with the context window differently depending on the model:

Opus 4.5 and later (including 4.6): Thinking blocks from previous turns are preserved in context. Thinking tokens consume context window space across turns.

Earlier models: Thinking blocks are stripped from context between turns. Only the final response carries forward.

With tool use (all models):

context = input tokens + previous thinking tokens + tool tokens + new thinking + response

Without tool use (models that strip previous thinking blocks):

context = input tokens - previous thinking tokens + new thinking + response
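
The two formulas can be evaluated directly. In this sketch `context_tokens` is a hypothetical helper that applies them exactly as written. One reading is assumed and worth stating: in the case without tool use, `input_tokens` is taken to include the previous thinking blocks that were passed back in, which is why they are subtracted; in the tool-use case they are counted separately and added.

```python
def context_tokens(
    input_tokens: int,
    previous_thinking: int,
    new_thinking: int,
    response: int,
    tool_tokens: int = 0,
    with_tool_use: bool = False,
) -> int:
    """Evaluate the context formulas above for a single turn."""
    if with_tool_use:
        # Previous thinking is kept and counted on top of the input.
        return input_tokens + previous_thinking + tool_tokens + new_thinking + response
    # Without tool use, previous thinking is subtracted from the input.
    return input_tokens - previous_thinking + new_thinking + response


# Without tool use: 10,000 input tokens, of which 3,000 are old thinking.
print(context_tokens(10_000, 3_000, 2_000, 500))  # 9500

# With tool use: 7,000 input tokens plus 3,000 retained thinking tokens.
print(context_tokens(7_000, 3_000, 2_000, 500, tool_tokens=1_200, with_tool_use=True))  # 13700
```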

Thinking in Subagents#

Each subagent has its own context window and thinking budget. Considerations:

Set model: haiku or model: sonnet on subagents that don’t need deep reasoning
CLAUDE_CODE_SUBAGENT_MODEL overrides all subagent model settings
Low effort is recommended for subagents doing simple tasks (research, file searching)
Thinking adds latency – for parallel subagent work, lower effort means faster results

When to Use Extended Thinking#

The deciding factor is whether the task requires multiple dependent reasoning steps. If the answer requires working through constraints, dependencies, or tradeoffs that build on each other, thinking tokens improve output. If the answer is deterministic or immediate, they don’t.

Tasks That Benefit#

Task Type	Why Thinking Helps
Architecture decisions	Dependencies between components require multi-step analysis
Complex debugging	Hypothesis → test → revise cycles benefit from working memory
Implementation planning	Sequencing work correctly requires tracking many constraints
Algorithm design	Edge cases interact; reasoning through them compounds
Security review	Data flows require tracing across many steps
Multi-file refactoring	Cross-file dependencies need to be tracked simultaneously

Tasks Where It’s Overkill#

Task Type	Why Thinking Adds No Value
Simple file edits	No ambiguity to reason about
Formatting changes	Mechanical transformation, no judgment needed
Find-and-replace	Deterministic operation
File reads and searches	No decision-making involved
Quick lookups	Answer is immediate, no reasoning chain needed

On the adaptive models (Opus 4.6, 4.7, and 4.8 and Sonnet 4.6), Claude already allocates less thinking to simple tasks. For explicit control, use low effort for routine work and high, or xhigh or max where the model supports them, for complex tasks.

Cost Management#

How Thinking Tokens Are Billed#

Thinking tokens are billed as output tokens at the model’s output rate:

Model	Output Rate (including thinking)	Cache Read
Opus 4.8	$25/MTok	$0.50/MTok
Sonnet 4.6	$15/MTok	$0.30/MTok
Haiku 4.5	$5/MTok	$0.10/MTok

A request that generates 10,000 thinking tokens + 2,000 response tokens costs the same as 12,000 output tokens.
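
As a worked example, the same 12,000 billed tokens can be priced at each rate in the table. The sketch below covers output tokens only; input and cache costs are left out, and `output_cost` is a hypothetical helper name.

```python
# Output rates in USD per million tokens, copied from the table above.
OUTPUT_RATE_PER_MTOK = {"Opus 4.8": 25.0, "Sonnet 4.6": 15.0, "Haiku 4.5": 5.0}


def output_cost(model: str, thinking_tokens: int, response_tokens: int) -> float:
    """Thinking and response tokens are summed and billed at one output rate."""
    billed = thinking_tokens + response_tokens
    return billed * OUTPUT_RATE_PER_MTOK[model] / 1_000_000


for model in OUTPUT_RATE_PER_MTOK:
    print(f"{model}: ${output_cost(model, 10_000, 2_000):.2f}")
# Opus 4.8: $0.30
# Sonnet 4.6: $0.18
# Haiku 4.5: $0.06
```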

Cost Control Levers#

Lever	Effect	How to Set
Effort level	Controls thinking depth (biggest lever)	`/effort`, `/model` slider, env var, settings
Model selection	Sonnet at $15/MTok vs Opus at $25/MTok	`/model` command
Disable thinking	Zero thinking tokens	`Option+T` / `Alt+T`, `/config`
MAX_THINKING_TOKENS	Cap thinking budget (non-adaptive models)	Environment variable
Subagent model choice	Use cheaper models for simple delegated tasks	Per-agent `model:` field

Cost Estimates#

Rough estimates for a 100-message Opus 4.8 session:

Effort Level	Avg Thinking Tokens/Msg	Thinking Cost (100 msgs)	Total Session Cost
max	~20,000	~$50	~$60-75
high	~10,000	~$25	~$35-50
medium	~5,000	~$12.50	~$20-30
low	~1,000	~$2.50	~$10-15

Actual costs vary based on task complexity, response length, and tool call volume.
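
The thinking-cost column follows directly from the average token counts and the $25/MTok Opus output rate. The sketch below reproduces only that column. The total session cost also includes response and input tokens, which are not modeled here.

```python
OPUS_OUTPUT_RATE = 25.0  # USD per million output tokens (Opus tier)
MESSAGES = 100

# Average thinking tokens per message, taken from the estimates table.
AVG_THINKING_TOKENS = {"max": 20_000, "high": 10_000, "medium": 5_000, "low": 1_000}

thinking_cost = {
    level: tokens * MESSAGES * OPUS_OUTPUT_RATE / 1_000_000
    for level, tokens in AVG_THINKING_TOKENS.items()
}

for level, cost in thinking_cost.items():
    print(f"{level:>6}: ${cost:.2f}")
#    max: $50.00
#   high: $25.00
# medium: $12.50
#    low: $2.50
```

Because the averages are rough, the same calculation is more useful with token counts measured from your own sessions.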

Model Support#

Model	Thinking Mode	Effort Levels	Interleaved	Max Output
Opus 4.8	Adaptive	low, medium, high, xhigh, max	Automatic	128K
Opus 4.7	Adaptive	low, medium, high, xhigh, max	Automatic	128K
Opus 4.6	Adaptive	low, medium, high, max	Automatic	128K
Sonnet 4.6	Adaptive	low, medium, high	Automatic	64K
Opus 4.5	Manual (budget)	low, medium, high	Beta header	64K
Sonnet 4.5	Manual (budget)	Not supported	Beta header	64K
Haiku 4.5	Manual (budget)	Not supported	Beta header	64K

Adaptive thinking is available on Opus 4.6, 4.7, and 4.8 and Sonnet 4.6. On Opus 4.6 and Sonnet 4.6, manual budget_tokens is deprecated but still functional; on Opus 4.7 and 4.8 it is removed entirely. max effort is Opus-tier (4.6, 4.7, 4.8); xhigh is Opus 4.7 and 4.8. Sonnet 4.5 and Haiku 4.5 do not support the effort parameter.

Feature Compatibility#

On Opus 4.6, 4.7, and 4.8 and Sonnet 4.6, sampling parameters (temperature, top_p, top_k) and response pre-filling are removed – requesting them returns a 400 error. Use output_config.format (structured outputs) or system prompt instructions to control response format instead.

On older models that still use manual budget_tokens (Sonnet 4.5, Haiku 4.5), extended thinking is not compatible with:

temperature modifications (must be default)
top_k modifications
Forced tool use (tool_choice: "any" or tool_choice: "tool")
Response pre-filling

On those models, top_p can be set between 0.95 and 1.0 when thinking is enabled.
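
The compatibility rules above can be collected into a pre-flight check. This is a simplified sketch, not an API client: `compatibility_problems` and the model labels are hypothetical, `prefill` is an invented flag standing in for response pre-filling, and `tool_choice` is treated as a plain string, as it is written in this guide.

```python
from typing import Any, Dict, List

ADAPTIVE_MODELS = {"opus-4.6", "opus-4.7", "opus-4.8", "sonnet-4.6"}  # illustrative labels


def compatibility_problems(model: str, params: Dict[str, Any]) -> List[str]:
    """List the documented conflicts for a request with thinking enabled."""
    problems: List[str] = []
    if model in ADAPTIVE_MODELS:
        # Sampling parameters and pre-filling are removed on these models.
        for name in ("temperature", "top_p", "top_k", "prefill"):
            if name in params:
                problems.append(f"{name} is removed on {model}")
        return problems

    # Manual-budget models with thinking enabled.
    if "temperature" in params:
        problems.append("temperature must stay at its default")
    if "top_k" in params:
        problems.append("top_k cannot be modified")
    if "top_p" in params and not 0.95 <= params["top_p"] <= 1.0:
        problems.append("top_p must be between 0.95 and 1.0")
    if params.get("tool_choice") in ("any", "tool"):
        problems.append("forced tool use is not compatible")
    if "prefill" in params:
        problems.append("response pre-filling is not compatible")
    return problems


print(compatibility_problems("opus-4.8", {"temperature": 0.2}))
# ['temperature is removed on opus-4.8']
print(compatibility_problems("sonnet-4.5", {"top_p": 0.97}))
# []
```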

Changing thinking parameters invalidates prompt cache for messages, though system prompts and tool definitions remain cached.

Best Practices#

Use adaptive thinking where it is available. On Opus 4.6, 4.7, and 4.8 and Sonnet 4.6, don’t set manual budgets – let Claude decide how much to think. This is the default in Claude Code and works well for most workflows.
Use effort levels as your primary cost control. Instead of toggling thinking on/off, adjust effort. medium provides a good balance for daily work; use high or max for complex architecture and debugging.
Use lower effort for subagents. Subagents doing research, file searching, or simple analysis don’t need deep thinking. Set model: sonnet or effort to low on delegated tasks.
Don’t optimize thinking for simple tasks. On the adaptive models, Claude already minimizes thinking for simple requests.
Enable verbose mode for debugging. Ctrl+O shows thinking text, which helps understand why Claude made specific decisions. Useful when Claude’s response doesn’t match expectations.
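Budget thinking tokens for cost-sensitive workflows. In CI/CD or headless mode, cap thinking to prevent runaway costs on unexpectedly complex inputs: set MAX_THINKING_TOKENS on manual-budget models (Sonnet 4.5, Haiku 4.5), and lower the effort level on adaptive models, where the variable is ignored.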
Use the opusplan alias. Opus for planning (where thinking helps most) and Sonnet for execution (where thinking is less critical) is a reasonable cost/quality split.

Anti-Patterns#

Setting MAX_THINKING_TOKENS on an adaptive model. The variable is ignored on Opus 4.6, 4.7, and 4.8 and Sonnet 4.6, except that a value of 0 still disables thinking. Use effort levels instead.
Disabling thinking globally. On complex tasks, thinking tokens are where quality improvements come from. Disable it per-task if needed, not globally.
Assuming “ultrathink” is deprecated. The ultrathink keyword is an active feature – typing it in your prompt triggers high effort via keyword detection, with rainbow highlighting in the input box. It works as a convenient shortcut for requesting deeper reasoning without changing settings.
Max effort on routine tasks. max effort on simple file edits wastes tokens and adds latency. Reserve max for tasks that require many dependent reasoning steps.
Expecting visible token counts to match billing. Claude 4 models show summarized thinking. The billed count (full thinking) is higher than what you see. This is expected, not a bug.
Ignoring thinking costs in headless mode. Automated pipelines can run many requests. Without MAX_THINKING_TOKENS or effort limits, thinking costs accumulate quickly.
