Extended thinking — the inference-time computation mode where Claude spends additional tokens on chain-of-thought reasoning before producing a response — was introduced as a binary capability: on or off. When enabled, the model always thought before responding. This produced better results on complex tasks at the cost of additional latency and tokens on simple tasks that did not benefit from the extra reasoning.
Claude 4.6 introduces adaptive thinking: `thinking: {type: "adaptive"}`. Rather than always thinking or never thinking, the model dynamically decides whether to think and how much to think based on the complexity of the current turn. A simple factual question gets an immediate response. A complex planning task gets extended reasoning. The model makes the decision, not the developer.
Alongside adaptive thinking, the effort parameter has graduated from beta to general availability. The four effort levels — low, medium, high, and max — provide developer-side control over the model's reasoning budget. The max level, exclusive to Opus 4.6, unlocks the deepest reasoning with no constraint on token spending. And the ultrathink keyword, included in the prompt text, temporarily raises the effort to maximum for individual turns.
Together, adaptive thinking and the effort parameter give agent developers the most fine-grained control over the cost-latency-quality tradeoff that any model API has provided.
How Adaptive Thinking Works
In non-adaptive mode, extended thinking has a fixed cost: every model call includes a thinking phase that consumes tokens and adds latency. For a ten-turn agent session with three complex planning calls and seven routine tool-dispatch calls, the fixed thinking overhead on those seven routine calls is wasted budget.
Adaptive thinking eliminates this waste. The model evaluates each turn's complexity and allocates thinking budget accordingly:
| Turn Type | Non-Adaptive (Extended Thinking On) | Adaptive |
|---|---|---|
| Simple tool dispatch | Full thinking (wasteful) | Minimal or no thinking |
| Routine code generation | Full thinking (some benefit) | Moderate thinking |
| Complex planning | Full thinking (high value) | Full thinking |
| Ambiguous problem decomposition | Full thinking (high value) | Extended thinking |
The model's assessment of turn complexity is not perfect — it occasionally under-thinks on turns that would benefit from more reasoning, or over-thinks on turns that are genuinely simple. But empirically, adaptive thinking produces better cost-efficiency than either "always on" or "always off" across mixed-complexity agent sessions.
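As a concrete sketch, enabling adaptive thinking is a single request-level switch. The `thinking: {type: "adaptive"}` field follows the description above; the rest of the request shape mirrors the Anthropic Messages API, and `buildAdaptiveRequest` is a hypothetical helper, not part of any released SDK:

```typescript
// Hypothetical request-builder sketch. The `thinking` field shape follows
// the adaptive-thinking description in this article; other fields mirror
// the Anthropic Messages API. Not verified against any SDK version.
interface ThinkingConfig {
  type: "adaptive" | "enabled" | "disabled";
}

interface MessageRequest {
  model: string;
  max_tokens: number;
  thinking?: ThinkingConfig;
  messages: { role: "user" | "assistant"; content: string }[];
}

function buildAdaptiveRequest(model: string, prompt: string): MessageRequest {
  return {
    model,
    max_tokens: 4096,
    // The model, not the developer, decides per turn whether (and how
    // much) to think before responding.
    thinking: { type: "adaptive" },
    messages: [{ role: "user", content: prompt }],
  };
}
```

The same request works for both a trivial lookup and a deep planning turn; the per-turn thinking budget is the model's call.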
The Effort Parameter
The effort parameter provides developer-side control that complements the model's adaptive thinking. Where adaptive thinking lets the model decide, the effort parameter lets the developer constrain or expand the decision space.
Low effort biases the model toward quick responses with minimal thinking. This is appropriate for high-volume, low-complexity tasks — batch classification, simple data extraction, routine tool dispatch. The latency reduction is significant: low-effort responses are typically 2-3x faster than high-effort responses on the same input.
Medium effort is the default and provides balanced behavior. The model thinks when it judges thinking would help, but does not over-invest in reasoning for routine tasks.
High effort biases the model toward thorough reasoning. The model is more likely to think, thinks for longer, and explores more candidate approaches before committing to a response. This is appropriate for critical decisions, complex debugging, and tasks where the cost of an incorrect response is high.
Max effort is exclusive to Opus 4.6 and provides the absolute highest capability with no constraints on token spending. Responses are slower and more expensive than high effort, but the model applies its deepest reasoning chains — useful for the most complex planning and analysis tasks.
The ultrathink keyword, triggered by including "ultrathink" in the prompt text, temporarily raises the reasoning effort to maximum for a single turn. This is useful for the specific turns in an agent session where you want the model to invest heavily in reasoning — the initial planning phase, a critical decision point, a complex debugging step — without changing the session-wide effort setting.
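Because the trigger is plain prompt text, it can be applied per turn with a trivial helper. This is a sketch: the exact placement of the keyword within the prompt is an assumption, and `withUltrathink` is a hypothetical function name:

```typescript
// Hypothetical helper: per the ultrathink behavior described above, the
// keyword only needs to appear in the prompt text for that single turn.
// Prepending it on its own line is an assumed convention, not a spec.
function withUltrathink(prompt: string, critical: boolean): string {
  return critical ? `ultrathink\n\n${prompt}` : prompt;
}

// Only the planning turn pays the maximum-effort cost; later turns keep
// the session-wide effort setting.
const turns = [
  withUltrathink("Plan the migration of the billing service.", true),
  withUltrathink("Run the unit tests and report failures.", false),
];
```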
Agent Architecture Implications
The effort parameter maps cleanly to agent phase architecture:
```
Agent Session
├─ Planning Phase (high effort / ultrathink)
│  └─ Task decomposition, approach selection, risk assessment
├─ Execution Phase (medium or low effort)
│  └─ Tool calls, code generation, file operations
├─ Verification Phase (medium effort)
│  └─ Test interpretation, result validation
└─ Error Recovery (high effort)
   └─ Failure analysis, approach revision
```
Setting effort per-phase — rather than per-session — optimizes the cost-quality tradeoff at the granularity where it matters. Planning benefits from high effort because early decisions constrain everything that follows. Execution can run at medium or low effort because the plan provides sufficient structure. Verification runs at medium effort because interpreting test results requires judgment but not the deep reasoning that planning demands. Error recovery escalates to high effort because understanding why something failed and choosing an alternative approach is genuinely complex reasoning.
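The per-phase mapping can be expressed as a small lookup. The phase names and effort levels follow this article; the `effortFor` helper is hypothetical, standing in for however your agent loop resolves configuration per phase:

```typescript
// Sketch of the per-phase effort mapping described above.
type EffortLevel = "low" | "medium" | "high" | "max";
type AgentPhase = "planning" | "execution" | "verification" | "errorRecovery";

const phaseEffort: Record<AgentPhase, EffortLevel> = {
  planning: "high",       // early decisions constrain everything downstream
  execution: "medium",    // the plan already provides structure
  verification: "medium", // judgment, but not deep reasoning
  errorRecovery: "high",  // failure analysis is genuinely complex
};

function effortFor(phase: AgentPhase): EffortLevel {
  return phaseEffort[phase];
}
```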
For Neumar's Claude Agent SDK integration, the effort parameter can be configured in the `AgentOptions`:

```typescript
const agentConfig: AgentOptions = {
  model: 'opus',
  thinking: { type: 'adaptive' },
  // Effort can be set per-agent or adjusted per-turn
  // via the system prompt or tool_runner configuration
};
```
For LangGraph workflow agents, effort can be configured per-node — high effort on planning nodes, low effort on data transformation nodes, medium effort on synthesis nodes. The state annotation system tracks which phase the workflow is in, and node-level configuration adjusts effort accordingly.
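A minimal sketch of the per-node idea, assuming nothing about LangGraph's own API: a plain lookup table maps node names (illustrative here) to effort levels, and each node resolves its effort before making its model call. `resolveEffort` is a hypothetical helper:

```typescript
// Hypothetical per-node effort table for a LangGraph-style workflow.
// Node names are illustrative; how the resolved value reaches the model
// call depends on your node implementation.
type NodeEffort = "low" | "medium" | "high";

const nodeEffort: Record<string, NodeEffort> = {
  plan: "high",        // planning node
  transform: "low",    // data transformation node
  synthesize: "medium" // synthesis node
};

// Inside a node, read that node's effort, falling back to a default.
function resolveEffort(node: string, fallback: NodeEffort = "medium"): NodeEffort {
  return nodeEffort[node] ?? fallback;
}
```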
Cost Impact
The cost impact of adaptive thinking versus fixed extended thinking is meaningful for production agent workloads.
Consider a typical agent session with 20 model calls:
- 3 planning calls that genuinely benefit from extended thinking
- 12 execution calls that need basic reasoning
- 5 verification/error-handling calls that need moderate reasoning
With fixed extended thinking (always on), all 20 calls incur thinking token costs. With adaptive thinking, approximately 3-5 calls incur significant thinking costs, 5-8 incur moderate thinking costs, and the remainder incur minimal or no thinking costs.
The token savings depend on task composition, but 30-50% reduction in thinking token usage is typical for mixed-complexity agent sessions — with no meaningful quality degradation on the execution and verification phases.
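The 20-call scenario above can be made concrete with assumed per-call thinking budgets. All token figures below are illustrative assumptions chosen to match the 3-5 / 5-8 / remainder split described in the text, not measured values:

```typescript
// Illustrative cost model for the 20-call session above.
const FULL_THINKING = 4000; // assumed thinking tokens per call, fixed mode
const totalCalls = 20;

// Fixed extended thinking: every call pays the full thinking budget.
const fixedThinkingTokens = totalCalls * FULL_THINKING; // 80,000

// Adaptive: 4 heavy, 7 moderate, 9 minimal calls (assumed budgets).
const adaptiveThinkingTokens = 4 * 4000 + 7 * 2500 + 9 * 1000; // 42,500

const savings = 1 - adaptiveThinkingTokens / fixedThinkingTokens;
console.log(`thinking-token savings: ~${Math.round(savings * 100)}%`); // ~47%
```

Under these assumptions the session lands near the upper end of the 30-50% range; a session with fewer heavy planning calls would land lower.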
Combined with the Sonnet 4.6 tiered approach (using Sonnet for execution and Opus for planning), adaptive thinking provides a second axis of cost optimization: not just choosing the right model for each call, but choosing the right reasoning depth for each call.
The Control Spectrum
The effort parameter and adaptive thinking together create a control spectrum for agent reasoning:
```
More developer control ←──────────────────→ More model control

effort: "low"    effort: "medium"    thinking: "adaptive"    ultrathink
(fast, cheap)    (balanced)          (model decides)         (maximum reasoning)
```
Agent developers can position their applications anywhere on this spectrum, and can move along it dynamically within a single session. The initial planning turn uses ultrathink. The subsequent execution turns use medium effort with adaptive thinking. A failed verification escalates to high effort. The post-recovery execution returns to medium.
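That escalation path can be sketched as a tiny controller. The effort names follow this article; the controller itself and its event names are hypothetical:

```typescript
// Hypothetical session controller: escalate to high effort after a
// failed verification, return to medium once recovery succeeds,
// otherwise keep the current setting.
type SessionEffort = "low" | "medium" | "high" | "max";
type SessionEvent = "verificationFailed" | "recovered" | "routine";

function nextEffort(current: SessionEffort, event: SessionEvent): SessionEffort {
  if (event === "verificationFailed") return "high";
  if (event === "recovered") return "medium";
  return current;
}
```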
This is meaningfully more control than "pick a model and hope for the best." It represents the maturation of the model API from a simple input-output interface to a configurable reasoning system where the developer and the model share control over how much computation is applied to each decision.
Neumar's agent orchestration supports per-agent and per-phase effort configuration through the Claude Agent SDK wrapper. Adaptive thinking with the effort parameter enables cost-optimized agent sessions without sacrificing quality on the turns that matter most.
