Extended Thinking: How Claude Reasons Through Complex Problems#

Executive Summary#

Extended thinking gives Claude additional tokens to reason before responding. On Opus 4.6, 4.7, and 4.8 and Sonnet 4.6, thinking is adaptive: Claude decides how much to think based on task complexity. Thinking tokens are billed as output tokens ($25/MTok on the Opus tier), making thinking depth the second-biggest cost lever after model selection. Effort levels control how deeply Claude thinks: Opus 4.6 supports low/medium/high/max, Opus 4.7 and 4.8 add xhigh, and Sonnet 4.6 supports low/medium/high.

AspectDetails
Default stateEnabled by default in Claude Code
Adaptive modelsOpus 4.6, 4.7, 4.8 and Sonnet 4.6 (dynamic depth based on complexity)
Other modelsManual (fixed budget via budget_tokens)
Manual budget default63,999 tokens (Sonnet 4.5, Haiku 4.5); adaptive models set depth dynamically
BillingThinking tokens billed as output tokens
VisibilitySummarized reasoning; Ctrl+O shows it in the transcript

Table of Contents#

How Extended Thinking Works#

Why Intermediate Tokens Help#

Token generation is autoregressive: each token is predicted from all prior tokens in the context window. When Claude generates intermediate reasoning tokens, those tokens become context for the tokens that follow. The model is using its own output as working memory.

A direct prompt-to-answer jump gives the final answer a short context to condition on. A chain of intermediate steps – exploring an approach, identifying a constraint, reconsidering – gives the final answer more to work from. This is why thinking improves quality on problems that require multiple dependent steps, and why it has no effect on problems where the answer is immediate.

There is no separate reasoning system running in parallel. The thinking phase and the response phase use the same next-token prediction mechanism. The difference is that thinking tokens are generated first, accumulate in context, and are then available when the response tokens are generated.

The Thinking Process#

When thinking is enabled, Claude generates internal reasoning before crafting its response:

User prompt arrives
┌─────────────────────┐
│  Thinking Phase     │  Claude generates intermediate reasoning:
│  (thinking tokens)  │  - Explores approaches
│                     │  - Identifies constraints and edge cases
│                     │  - Backtracks from dead ends
│                     │  - Commits to an approach
└──────────┬──────────┘
┌─────────────────────┐
│  Summary Phase      │  Full thinking summarized
│  (no extra charge)  │  for user visibility
└──────────┬──────────┘
┌─────────────────────┐
│  Response Phase     │  Final answer conditioned
│  (output tokens)    │  on all prior thinking tokens
└─────────────────────┘

The thinking phase is where quality differences appear. On problems that require multiple dependent steps, thinking tokens give the response tokens more context to condition on. On problems that don’t – simple lookups, mechanical edits – thinking tokens add cost without improving output.

Adaptive Thinking#

Opus 4.6, 4.7, and 4.8 and Sonnet 4.6 use adaptive thinking by default. Instead of a fixed budget, Claude decides how much to think based on the complexity of each request:

  • Simple requests (rename a variable, fix a typo): minimal or no thinking
  • Moderate requests (implement a function, fix a bug): moderate thinking
  • Complex requests (architect a system, debug a race condition): deep thinking

This replaces the manual budget_tokens approach used on earlier models. You don’t need to estimate how many thinking tokens a task needs – Claude adjusts automatically.

Combined with effort levels, adaptive thinking gives you a spectrum from fast/cheap to thorough/expensive without manual tuning.

Summarized Thinking#

Claude 4 models return a summarized version of the thinking, not the raw thinking output:

  • You see a summary of the key reasoning steps
  • You are billed for the full thinking tokens, not the summary
  • The summary is generated by a separate model at no extra charge
  • The thinking model does not see the summarized output

The billed output token count will be higher than the visible token count. This is expected.

In Claude Code, toggle verbose mode (Ctrl+O) to see the thinking text as gray italic text in the transcript.

On Opus 4.7 and 4.8, the API returns no thinking text by default and must be asked for summaries explicitly. In Claude Code, Ctrl+O still surfaces the summarized reasoning.

Interleaved Thinking#

With tool use, Claude can think between tool calls, including between each tool result:

Think → Call tool → Read result → Think again → Call another tool → Think → Respond

On Opus 4.6, 4.7, and 4.8 and Sonnet 4.6 with adaptive thinking, interleaved thinking is automatic. On earlier models, it requires a beta header. In Claude Code, this is handled transparently – you don’t need to configure anything.

Interleaved thinking is useful for multi-step tasks where each tool result changes what to do next. Claude can reconsider its approach after seeing actual file contents or command output rather than committing to a plan upfront.

Effort Levels#

Available Levels#

LevelThinking BehaviorSpeedCostAvailability
maxNo depth limit, absolute maximumSlowestHighestOpus 4.6, 4.7, 4.8
xhighBetween high and maxSlowerHigherOpus 4.7, 4.8
highAlmost always thinks deeplySlowHighEffort-capable models
mediumMay skip thinking for simple queriesMediumMediumEffort-capable models
lowMinimizes or skips thinkingFastLowEffort-capable models

Effort applies to Opus 4.5, 4.6, 4.7, and 4.8 and to Sonnet 4.6. Sonnet 4.5 and Haiku 4.5 do not support the effort parameter – requesting it errors, so control their thinking with MAX_THINKING_TOKENS instead. max is Opus-tier only. xhigh exists on Opus 4.7 and 4.8; other models fall back to high.

Default effort by subscription tier:

SubscriptionDefault ModelDefault Effort
Max, Team PremiumOpus 4.8high
ProSonnet 4.6high
Free / 3PSonnet 4.5(not set)

The default is high for Pro, Max, and Team subscribers on the current Opus and Sonnet models, matching the API, which treats an unset effort as high. Opus 4.8 defaults to high.

Effort is a behavioral signal, not a strict token budget. At lower effort, Claude still thinks on genuinely difficult problems – it just thinks less than it would at higher effort for the same problem.

What Effort Controls#

Effort affects all tokens in the response, including non-thinking output:

AspectLow EffortHigh Effort
Thinking depthMinimal or skipped for simple tasksDeep reasoning on most tasks
Tool callsFewer, combined operationsMore, thorough exploration
ExplanationsTerse confirmationsDetailed plans and summaries
Code commentsMinimalComprehensive
Action styleProceeds directlyExplains approach before acting

Setting Effort in Claude Code#

Four methods:

  • /effort command: opens an interactive slider (arrow keys), with the ends labeled Faster and Smarter. Pass a level to set it directly, e.g. /effort xhigh.
  • /model menu: adjust the effort slider with arrow keys while selecting a model.
  • Environment variable: CLAUDE_CODE_EFFORT_LEVEL=low|medium|high|xhigh|max
  • Settings file: set effortLevel in your settings JSON.

Configuration#

Toggle Thinking On/Off#

MethodScopeDetails
Option+T / Alt+TCurrent sessionToggles thinking for this session only
/configPermanentSaved as alwaysThinkingEnabled in settings
Ctrl+ODisplay onlyShows thinking text in verbose mode

MAX_THINKING_TOKENS#

Controls the thinking token budget for manual-mode models:

SettingValue
Default63,999 tokens (Sonnet 4.5, Haiku 4.5)
Maximum63,999 tokens
Minimum1,024 tokens
Disable thinkingSet to 0 (applies on any model)
# Temporary (session only)
MAX_THINKING_TOKENS=63999 claude

# Permanent
export MAX_THINKING_TOKENS=63999  # in ~/.zshrc or ~/.bashrc

On the adaptive models (Opus 4.6, 4.7, and 4.8 and Sonnet 4.6), MAX_THINKING_TOKENS is ignored because adaptive thinking controls depth dynamically. Exception: setting it to 0 still disables thinking entirely. On Opus 4.6 manual budget_tokens is deprecated; on Opus 4.7 and 4.8 it is removed at the API level.

Thinking Budget vs Output Budget#

  • budget_tokens must be less than max_tokens (standard mode)
  • With interleaved thinking (tool use), budget_tokens can exceed max_tokens
  • Opus 4.6, 4.7, 4.8: up to 128K output tokens
  • Sonnet 4.6 and earlier models: up to 64K output tokens

The budget is a target, not a strict limit – actual usage varies by task.

Context Window Interaction#

Thinking tokens interact with the context window differently depending on the model:

Opus 4.5 and later (including 4.6): Thinking blocks from previous turns are preserved in context. Thinking tokens consume context window space across turns.

Earlier models: Thinking blocks are stripped from context between turns. Only the final response carries forward.

With tool use (all models):

context = input tokens + previous thinking tokens + tool tokens + new thinking + response

Without tool use:

context = input tokens - previous thinking tokens + new thinking + response

Thinking in Subagents#

Each subagent has its own context window and thinking budget. Considerations:

  • Set model: haiku or model: sonnet on subagents that don’t need deep reasoning
  • CLAUDE_CODE_SUBAGENT_MODEL overrides all subagent model settings
  • Low effort is recommended for subagents doing simple tasks (research, file searching)
  • Thinking adds latency – for parallel subagent work, lower effort means faster results

When to Use Extended Thinking#

The deciding factor is whether the task requires multiple dependent reasoning steps. If the answer requires working through constraints, dependencies, or tradeoffs that build on each other, thinking tokens improve output. If the answer is deterministic or immediate, they don’t.

Tasks That Benefit#

Task TypeWhy Thinking Helps
Architecture decisionsDependencies between components require multi-step analysis
Complex debuggingHypothesis → test → revise cycles benefit from working memory
Implementation planningSequencing work correctly requires tracking many constraints
Algorithm designEdge cases interact; reasoning through them compounds
Security reviewData flows require tracing across many steps
Multi-file refactoringCross-file dependencies need to be tracked simultaneously

Tasks Where It’s Overkill#

Task TypeWhy Thinking Adds No Value
Simple file editsNo ambiguity to reason about
Formatting changesMechanical transformation, no judgment needed
Find-and-replaceDeterministic operation
File reads and searchesNo decision-making involved
Quick lookupsAnswer is immediate, no reasoning chain needed

On the adaptive Opus models (4.6, 4.7, 4.8), Claude already allocates less thinking to simple tasks. For explicit control, use low effort for routine work and high, xhigh, or max for complex tasks.

Cost Management#

How Thinking Tokens Are Billed#

Thinking tokens are billed as output tokens at the model’s output rate:

ModelOutput Rate (including thinking)Cache Read
Opus 4.8$25/MTok$0.50/MTok
Sonnet 4.6$15/MTok$0.30/MTok
Haiku 4.5$5/MTok$0.10/MTok

A request that generates 10,000 thinking tokens + 2,000 response tokens costs the same as 12,000 output tokens.

Cost Control Levers#

LeverEffectHow to Set
Effort levelControls thinking depth (biggest lever)/model slider, env var, settings
Model selectionSonnet at $15/MTok vs Opus at $25/MTok/model command
Disable thinkingZero thinking tokensOption+T / Alt+T, /config
MAX_THINKING_TOKENSCap thinking budget (non-adaptive models)Environment variable
Subagent model choiceUse cheaper models for simple delegated tasksPer-agent model: field

Cost Estimates#

Rough estimates for a 100-message Opus 4.8 session:

Effort LevelAvg Thinking Tokens/MsgThinking Cost (100 msgs)Total Session Cost
max~20,000~$50~$60-75
high~10,000~$25~$35-50
medium~5,000~$12.50~$20-30
low~1,000~$2.50~$10-15

Actual costs vary based on task complexity, response length, and tool call volume.

Model Support#

ModelThinking ModeEffort LevelsInterleavedMax Output
Opus 4.8Adaptivelow, medium, high, xhigh, maxAutomatic128K
Opus 4.7Adaptivelow, medium, high, xhigh, maxAutomatic128K
Opus 4.6Adaptivelow, medium, high, maxAutomatic128K
Sonnet 4.6Adaptivelow, medium, highAutomatic64K
Opus 4.5Manual (budget)low, medium, highBeta header128K
Sonnet 4.5Manual (budget)Not supportedBeta header64K
Haiku 4.5Manual (budget)Not supportedBeta header64K

Adaptive thinking is available on Opus 4.6, 4.7, and 4.8 and Sonnet 4.6. On Opus 4.6 and Sonnet 4.6, manual budget_tokens is deprecated but still functional; on Opus 4.7 and 4.8 it is removed entirely. max effort is Opus-tier (4.6, 4.7, 4.8); xhigh is Opus 4.7 and 4.8. Sonnet 4.5 and Haiku 4.5 do not support the effort parameter.

Feature Compatibility#

On Opus 4.6, 4.7, and 4.8 and Sonnet 4.6, sampling parameters (temperature, top_p, top_k) and response pre-filling are removed – requesting them returns a 400 error. Use output_config.format (structured outputs) or system prompt instructions to control response format instead.

On older models that still use manual budget_tokens (Sonnet 4.5, Haiku 4.5), extended thinking is not compatible with:

  • temperature modifications (must be default)
  • top_k modifications
  • Forced tool use (tool_choice: "any" or tool_choice: "tool")
  • Response pre-filling

On those models, top_p can be set between 0.95 and 1.0 when thinking is enabled.

Changing thinking parameters invalidates prompt cache for messages, though system prompts and tool definitions remain cached.

Best Practices#

  1. Use adaptive thinking on the Opus models. On Opus 4.6, 4.7, and 4.8, don’t set manual budgets – let Claude decide how much to think. This is the default in Claude Code and works well for most workflows.

  2. Use effort levels as your primary cost control. Instead of toggling thinking on/off, adjust effort. medium provides a good balance for daily work. high or max for complex architecture and debugging.

  3. Use lower effort for subagents. Subagents doing research, file searching, or simple analysis don’t need deep thinking. Set model: sonnet or effort to low on delegated tasks.

  4. Don’t optimize thinking for simple tasks. On the adaptive models, Claude already minimizes thinking for simple requests.

  5. Enable verbose mode for debugging. Ctrl+O shows thinking text, which helps understand why Claude made specific decisions. Useful when Claude’s response doesn’t match expectations.

  6. Budget thinking tokens for cost-sensitive workflows. In CI/CD or headless mode, set MAX_THINKING_TOKENS to a reasonable cap to prevent runaway costs on unexpectedly complex inputs.

  7. Use the opusplan alias. Opus for planning (where thinking helps most) and Sonnet for execution (where thinking is less critical) is a reasonable cost/quality split.

Anti-Patterns#

  1. Setting MAX_THINKING_TOKENS on an adaptive model. The variable is ignored on Opus 4.6, 4.7, and 4.8 and Sonnet 4.6 (except 0). Use effort levels instead.

  2. Disabling thinking globally. On complex tasks, thinking tokens are where quality improvements come from. Disable it per-task if needed, not globally.

  3. Assuming “ultrathink” is deprecated. The ultrathink keyword is an active feature – typing it in your prompt triggers high effort via keyword detection, with rainbow highlighting in the input box. It works as a convenient shortcut for requesting deeper reasoning without changing settings.

  4. Max effort on routine tasks. max effort on simple file edits wastes tokens and adds latency. Reserve max for tasks that require many dependent reasoning steps.

  5. Expecting visible token counts to match billing. Claude 4 models show summarized thinking. The billed count (full thinking) is higher than what you see. This is expected, not a bug.

  6. Ignoring thinking costs in headless mode. Automated pipelines can run many requests. Without MAX_THINKING_TOKENS or effort limits, thinking costs accumulate quickly.

References#