Claude Code v2.1.89 + 300K max_tokens: The Updates That Change How You Build with Claude
PreToolUse defer support, PermissionDenied hooks, CJK history bug fix, and Batches API with 300K max_tokens for Opus/Sonnet 4.6. Here's what actually matters in the April 2026 Claude releases.
Claude Code v2.1.89 shipped alongside a batch of platform API updates that, taken together, represent a significant upgrade to how precisely you can control agentic behavior — and how much context you can work with in a single call.
Here’s the breakdown of what actually matters.
Hook System Upgrades: defer and PermissionDenied
The two most significant Claude Code changes are in the hook system.
PreToolUse defer Decision
Previously, PreToolUse hooks could return approve or block. Now they can return defer — which passes the decision up to the user rather than making an automated call.
Why this matters: before defer, hooks had to make binary decisions. For edge cases or ambiguous situations, hooks were forced to either block (too restrictive) or approve (too permissive). defer adds a third path: “I’m not sure, ask the human.”
Practical use case: a code execution hook that defers when it detects commands with side effects it can’t fully evaluate. Rather than blocking all shell commands (too broad) or allowing all (too risky), it defers exactly the ambiguous ones.
```json
{
  "decision": "defer",
  "reason": "Shell command with potential side effects: detected network call"
}
```
PermissionDenied Hook and Automatic Denial Notifications
When an agent attempts an action that’s denied by permissions, Claude Code now fires a PermissionDenied hook. This is significant for two reasons:
Observability: You can now log, monitor, and alert on permission denials. Before this, denials were silent from an automation perspective. Now you can build dashboards that track what your agent is attempting and being blocked from.
Agent feedback loops: The agent itself receives structured notification about denials, enabling it to adjust its strategy rather than failing silently or retrying indefinitely. This pairs directly with the defer pattern.
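A minimal observability sketch for the first point. The event payload shape is an assumption (tool name plus the denied input), and `log_denial` is a hypothetical helper, but the shape of the solution is the same regardless: append structured records somewhere your monitoring can read them.

```python
import json
import time

def log_denial(event: dict, log_path: str = "permission-denials.jsonl") -> dict:
    """Append one JSONL record per PermissionDenied event.

    Assumes the event includes "tool_name" and "tool_input"; adjust the
    field names to the actual hook payload.
    """
    record = {
        "ts": time.time(),
        "tool": event.get("tool_name", "unknown"),
        "input": event.get("tool_input", {}),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

A JSONL file is the simplest possible sink; in production the same record would go to your metrics pipeline so you can alert on denial spikes.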
CJK and Emoji History Bug Fix
This one is underrated for non-English developers. A bug caused 4KB truncation of conversation history when messages contained Chinese, Japanese, or Korean characters, or emoji. The issue: multi-byte character handling calculated byte length instead of character length for the history buffer.
For CJK-language development workflows, this bug was silently corrupting context in long sessions. The v2.1.89 fix restores full history integrity for multi-byte character sets.
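The byte-versus-character mismatch is easy to reproduce in any language with a string/bytes distinction. In UTF-8, CJK characters take 3 bytes each and most emoji take 4, so a buffer sized in bytes holds far fewer characters than intended:

```python
# Each CJK character below encodes to 3 bytes in UTF-8, so counting
# bytes where characters were intended shrinks the effective buffer
# to roughly a third of its nominal size.
history = "こんにちは"                      # 5 characters
char_len = len(history)                     # 5: what the buffer should count
byte_len = len(history.encode("utf-8"))     # 15: what byte-based sizing counts

emoji_bytes = len("👍".encode("utf-8"))     # 4 bytes for a single emoji
```

For ASCII-only text the two counts coincide, which is exactly why this class of bug ships: it is invisible until a multi-byte character enters the history.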
Autocompact Infinite Loop Fix
Autocompact — Claude Code’s automatic context compression for long sessions — had an edge case where the compaction loop could trigger itself recursively under specific token boundary conditions. The fix is operational; if you were seeing runaway context compression, update.
API: 300K max_tokens for Batches (Opus/Sonnet 4.6 GA)
The platform-level announcement is bigger than the Claude Code changelog entry suggests.
Anthropic’s Message Batches API now supports 300,000 max_tokens per call for Claude Opus 4.6 and Claude Sonnet 4.6. This is a 50% increase from the previous 200K limit.
What 300K actually enables
The practical jump isn’t just “more tokens.” It’s the qualitative shift into new task categories:
Whole-codebase analysis in one call: A medium-sized React application (40–60 files, 20,000–30,000 lines of TypeScript) now fits comfortably in a single call with room for detailed output. Previously, this required chunking and multi-call aggregation.
Full documentation generation: Send the entire codebase + a documentation template and receive comprehensive API documentation in one pass. No stitching multiple partial outputs together.
Cross-file architectural reports: “Analyze all 80 files in this service and identify coupling violations, dead code, and performance antipatterns” — now a single call that delivers a coherent report, not fragmented partial analyses.
Full transcript summarization: For AI applications that accumulate long conversation history, 300K means you can analyze and summarize entire interaction transcripts without truncation.
Batches API specifically
The 300K limit is confirmed for the Batches API (async, up to 24h processing, 50% cost discount). This makes it particularly useful for:
- Nightly codebase audit jobs
- Scheduled documentation refreshes
- CI/CD integration for comprehensive code review on large PRs
For real-time applications, the standard API limit remains lower. But for asynchronous workflows, the combination of 300K context + 50% batch discount makes comprehensive single-call analysis economically viable at scale.
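A sketch of what a nightly-audit batch request could look like under the new limit. The request shape follows the Message Batches API, but the model identifier and the `build_audit_request` helper are illustrative assumptions; verify both against the current API reference before relying on them:

```python
def build_audit_request(custom_id: str, codebase: str) -> dict:
    """Build one batch request for a whole-codebase audit.

    "claude-opus-4-6" is an assumed model identifier for illustration.
    """
    return {
        "custom_id": custom_id,
        "params": {
            "model": "claude-opus-4-6",
            "max_tokens": 300_000,
            "messages": [
                {
                    "role": "user",
                    "content": (
                        "Identify coupling violations, dead code, and "
                        "performance antipatterns in this codebase:\n\n" + codebase
                    ),
                }
            ],
        },
    }

# Submission (requires the anthropic SDK and an API key), sketched:
# client.messages.batches.create(requests=[build_audit_request("audit-1", source)])
```

Because batch results arrive asynchronously (within the 24h window), this fits naturally into a cron job that submits at night and collects results before the morning standup.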
Fine-Grained Tool Streaming + Structured Outputs GA
Both features are now generally available (out of beta):
Fine-grained tool streaming: Tool inputs stream token-by-token as they’re generated, rather than delivering the complete tool call at the end. For UX: users see the agent “thinking” in real time. For latency: time-to-first-token on tool results drops.
Structured outputs: Native JSON schema enforcement at the API level. The model’s output is guaranteed to match your specified schema — no parsing, no validation fallbacks, no “sometimes the model wraps the JSON in markdown” edge cases.
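A sketch of what enforcement buys you, using a hypothetical finding schema (the exact request parameter that carries the schema is omitted here; consult the API docs). With a guaranteed schema-valid response, deserialization is the entire post-processing step:

```python
import json

# Hypothetical JSON schema for a code-review finding.
FINDING_SCHEMA = {
    "type": "object",
    "properties": {
        "file": {"type": "string"},
        "severity": {"type": "string", "enum": ["low", "medium", "high"]},
        "description": {"type": "string"},
    },
    "required": ["file", "severity", "description"],
}

# Sample of what a schema-enforced response would look like: no markdown
# wrapper to strip, no missing-key fallbacks, just one json.loads call.
raw = '{"file": "app.ts", "severity": "high", "description": "N+1 query"}'
finding = json.loads(raw)
```

The contrast with prompt-only JSON is the deleted code: retry loops, regex unwrapping, and defensive `.get()` chains all disappear when validity is enforced at the API level.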
Upgrade Checklist
If you’re running Claude Code or building on the Anthropic API:
- Update to v2.1.89:
npm install -g @anthropic-ai/claude-code@latest
- Review your PreToolUse hooks and add defer cases for ambiguous edge cases
- Add a PermissionDenied hook for observability in production deployments
- Test history integrity if you have CJK/emoji in your workflow
- Evaluate Batches API for any nightly analysis workflows you’re running manually
The hook improvements in particular are worth implementing immediately. The defer decision type and PermissionDenied notifications are the building blocks of production-grade agent reliability — not optional features but operational necessities at scale.
Sources: Claude Code Changelog · Claude Platform Release Notes