Every conversation with a language model has an expiration date: the context window limit. For Claude Opus 4.6, that limit is one million tokens — a substantial budget that handles most single-session tasks comfortably. But for long-running agent workflows, multi-hour debugging sessions, or conversations that accumulate significant tool output over dozens of turns, even a million tokens eventually runs out.
The Compaction API, launched in beta alongside Opus 4.6 in early 2026, addresses this constraint by providing server-side context summarization. When a conversation approaches a configured token threshold, the API automatically summarizes older messages — preserving critical information while freeing token budget for new content. The result is effectively unbounded conversations that maintain coherence across arbitrarily long interactions.
This is a significant infrastructure primitive for agent developers. Understanding how it works, and what its summaries can and cannot preserve, informs context-management decisions in every long-running agent system.
How Compaction Works
The API surface is straightforward. Include the beta header compact-2026-01-12 in your API requests and configure a compaction threshold. When the conversation's token count approaches that threshold, the API returns a compaction block — a structured summary of the older conversation context — alongside the model's response.
On subsequent requests, you append the response to your messages as normal. The API automatically drops all message blocks prior to the compaction block, continuing the conversation from the summary. The developer does not need to implement summarization logic, manage conversation truncation, or decide what to preserve.
```
Request 1..N:  Normal conversation
               → Messages accumulate toward the threshold
Request N+1:   Threshold reached
               → Response includes a compaction block
                 (summary of messages 1 through N)
               → Older messages dropped
Request N+2:   Conversation continues from the summary
               → Context = compaction summary + recent messages
               → Full token budget available for new content
```
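The client-side bookkeeping this implies is small: once a compaction block appears in the history, everything before it is redundant. The sketch below models that trimming step; the exact wire format of a compaction block is not specified here, so representing it as a content block with `type == "compaction"` is an assumption made for illustration.

```python
def trim_to_compaction(messages: list[dict]) -> list[dict]:
    """Drop every message that precedes the most recent compaction block.

    Mirrors the behavior described above: the conversation continues
    from the summary, so earlier messages no longer need to be sent.
    The "compaction" block type is a hypothetical stand-in.
    """
    last_compaction_idx = None
    for i, msg in enumerate(messages):
        content = msg.get("content", [])
        if isinstance(content, list) and any(
            block.get("type") == "compaction" for block in content
        ):
            last_compaction_idx = i
    if last_compaction_idx is None:
        return messages  # no compaction yet; send the history unchanged
    return messages[last_compaction_idx:]
```

With a ten-message history whose seventh message carries a compaction block, only messages seven through ten are re-sent on the next request.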
The compaction block itself is a structured summary optimized for conversation continuation. It preserves:
- Decisions made. Choices the model or user committed to during the conversation
- File paths and identifiers. Specific references that would be needed to continue work
- Error history. Problems encountered and how they were resolved
- Current task state. Where the conversation left off and what remains to be done
What it does not preserve in full: the verbatim text of every tool result, the complete reasoning chains behind decisions already made, and conversational pleasantries that carry no information relevant to continuation.
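Concretely, a summary along these lines is what a continuation needs. Every field name below is a hypothetical chosen to mirror the four categories above; the real block format is not documented here.

```python
# Hypothetical illustration of the information a compaction summary
# carries forward. Field names are assumptions, not the actual schema.
compaction_summary = {
    "decisions": [
        "use exponential backoff in the retry wrapper",
    ],
    "identifiers": ["src/retry.py", "tests/test_retry.py"],
    "error_history": [
        {"error": "TimeoutError in integration tests",
         "resolution": "raised the request timeout to 30s"},
    ],
    "task_state": "retry wrapper implemented; two integration tests still failing",
}

# Verbatim tool output and full reasoning chains are deliberately absent.
assert "tool_results" not in compaction_summary
```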
Client-Side vs. Server-Side Compaction
The Compaction API provides server-side compaction — Anthropic's infrastructure handles the summarization. But the Python and TypeScript SDKs also include client-side compaction, which adds automatic context management when using the tool_runner interface.
The distinction matters for agent developers:
| Property | Server-Side (API) | Client-Side (SDK) |
|---|---|---|
| Implementation | API beta header | SDK tool_runner integration |
| Summarization | Runs on Anthropic's servers | Runs as additional model calls |
| Cost | No additional charge | Additional API calls for summarization |
| Control | API-managed threshold | Developer-configurable |
| Availability | Opus 4.6 and Sonnet 4.6 | Works with supported models |
Server-side compaction has a notable cost advantage: the summarization itself is free. You pay only for the tokens in your actual conversation — the compaction is handled as part of the API's context management without additional billing. For long-running agent tasks that would otherwise require multiple expensive context-window-sized conversations, this produces meaningful cost savings.
Client-side compaction provides more control. The developer can configure when compaction triggers, what information to prioritize in summaries, and how to handle the transition between pre- and post-compaction context. For applications with domain-specific summarization requirements — preserving certain types of context over others — client-side compaction provides the flexibility that server-side compaction's one-size-fits-all approach cannot.
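A client-side loop of this kind can be sketched in a few lines. Two stated assumptions: token counting is approximated with a crude 4-characters-per-token heuristic (a real implementation would use the provider's token-counting endpoint), and `summarize` is any callable the developer supplies, typically an extra model call, which is where the domain-specific prioritization lives.

```python
from typing import Callable

def estimate_tokens(messages: list[dict]) -> int:
    # Crude heuristic: ~4 characters per token.
    chars = sum(len(str(m.get("content", ""))) for m in messages)
    return chars // 4

def maybe_compact(
    messages: list[dict],
    threshold: int,
    keep_recent: int,
    summarize: Callable[[list[dict]], str],
) -> list[dict]:
    """Client-side compaction sketch: when the estimated token count
    crosses the threshold, summarize older messages and keep only the
    summary plus the most recent `keep_recent` messages."""
    if estimate_tokens(messages) < threshold or len(messages) <= keep_recent:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary_msg = {
        "role": "user",
        "content": f"[Conversation summary] {summarize(older)}",
    }
    return [summary_msg] + recent
```

The threshold, the retention window, and the summarizer are all in the developer's hands, which is exactly the control that server-side compaction trades away for simplicity and cost.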
What This Means for Agent Applications
Compaction turns context management for long-running agent tasks from a problem the developer must solve into a property the platform provides.
Without compaction, an agent hitting the context window limit during a complex task faces an unpleasant choice: terminate the task and start a new session (losing accumulated context), attempt manual summarization of the conversation (adding complexity and latency), or truncate older messages and hope nothing important is lost (unreliable).
With compaction, the agent simply continues. The conversation can span hundreds of turns, accumulate megabytes of tool output, and run for hours — the compaction system manages the context automatically. The agent developer does not need to implement context window management at all.
This is particularly impactful for three agent task categories:
Multi-step debugging. A debugging session that spans reading logs, examining code, running tests, making changes, and verifying fixes can easily generate more context than even a million-token window accommodates. Compaction lets the agent maintain awareness of the entire debugging arc without losing track of earlier findings.
Research and analysis. An agent that reads multiple documents, cross-references information, and synthesizes findings accumulates substantial context from the source material. Compaction preserves the synthesized understanding while freeing the token budget consumed by raw source text.
Iterative development. A development task that involves multiple rounds of implementation, testing, and revision generates significant context from code changes and test output. Compaction maintains awareness of what was tried, what worked, and what failed while freeing tokens for the current iteration.
Integration with Neumar
Neumar's agent orchestration layer wraps the Claude Agent SDK's streaming interface through the runAgentWithTracing function, which handles both agent execution and Langfuse observability tracing. The Compaction API integrates at this layer — enabling compaction for agent sessions managed by Neumar requires configuration at the SDK wrapper level rather than changes to individual agent implementations.
For Neumar users, the practical benefit is that agent sessions can run longer and handle more complex tasks without hitting context limits. A Linear ticket-to-PR pipeline that reads issue history, analyzes the codebase, implements changes, runs tests, and opens a pull request — a workflow that might span 50+ tool calls and generate substantial context — can execute as a single continuous session rather than being broken into multiple sessions with manual context transfer.
The combination of the 1M token context window and the Compaction API means that the practical limit on agent task complexity is no longer the context window. The limit is now the task itself — specifically, whether the accumulated decisions and context from earlier in the task are accurately preserved through compaction summaries. For well-structured tasks with clear state progression, compaction preserves this context reliably. For tasks with subtle interdependencies across distant parts of the conversation, the summarization may lose nuance.
The Broader Pattern
The Compaction API is part of a consistent pattern in Anthropic's platform evolution: moving infrastructure concerns from the developer to the platform. Authentication, rate limiting, tool dispatch, streaming, and now context management — each release moves another piece of the agent development stack from "something the developer implements" to "something the platform handles."
For agent developers, this trajectory is worth tracking because it progressively shrinks the surface area you must implement by hand to build reliable agent systems. The compaction logic you would have spent a week building, and months debugging for edge cases around summarization quality, context preservation, and continuation coherence, is now a beta header in your API request.
In Neumar's architecture, that shift lands at the SDK layer beneath the Langfuse tracing: compaction makes effectively unbounded agent sessions possible without changes to individual agent implementations or to the AG-UI streaming protocol.
