Claude Code v2.1.89 + 300K max_tokens: The Updates That Change How You Build with Claude
PreToolUse defer support, PermissionDenied hooks, CJK history bug fix, and Batches API with 300K max_tokens for Opus/Sonnet 4.6. Here's what actually matters in the April 2026 Claude releases.
Claude Code v2.1.89 shipped alongside a batch of platform API updates that, taken together, represent a significant upgrade to how precisely you can control agentic behavior — and how much context you can work with in a single call.
Here’s the breakdown of what actually matters.
Hook System Upgrades: defer and PermissionDenied
The two most significant Claude Code changes are in the hook system.
PreToolUse defer Decision
Previously, PreToolUse hooks could return approve or block. Now they can return defer — which passes the decision up to the user rather than making an automated call.
Why this matters: before defer, hooks had to make binary decisions. For edge cases or ambiguous situations, hooks were forced to either block (too restrictive) or approve (too permissive). defer adds a third path: “I’m not sure, ask the human.”
Practical use case: a code execution hook that defers when it detects commands with side effects it can’t fully evaluate. Rather than blocking all shell commands (too broad) or allowing all (too risky), it defers exactly the ambiguous ones.
```json
{
  "decision": "defer",
  "reason": "Shell command with potential side effects: detected network call"
}
```
PermissionDenied Hook and Automatic Denial Notifications
When an agent attempts an action that’s denied by permissions, Claude Code now fires a PermissionDenied hook. This is significant for two reasons:
Observability: You can now log, monitor, and alert on permission denials. Before this, denials were silent from an automation perspective. Now you can build dashboards that track what your agent is attempting and being blocked from.
Agent feedback loops: The agent itself receives structured notification about denials, enabling it to adjust its strategy rather than failing silently or retrying indefinitely. This pairs directly with the defer pattern.
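A minimal observability sketch for the first point. The event payload shape is an assumption (tool name plus the denied input), and `log_denial` is a hypothetical helper, but the shape of the solution is the same regardless: append structured records somewhere your monitoring can read them.

```python
import json
import time

def log_denial(event: dict, log_path: str = "permission-denials.jsonl") -> dict:
    """Append one JSONL record per PermissionDenied event.

    Assumes the event includes "tool_name" and "tool_input"; adjust the
    field names to the actual hook payload.
    """
    record = {
        "ts": time.time(),
        "tool": event.get("tool_name", "unknown"),
        "input": event.get("tool_input", {}),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

A JSONL file is the simplest possible sink; in production the same record would go to your metrics pipeline so you can alert on denial spikes.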
CJK and Emoji History Bug Fix
This one is underrated for non-English developers. A bug caused 4KB truncation of conversation history when messages contained Chinese, Japanese, or Korean characters, or emoji. The issue: multi-byte character handling calculated byte length instead of character length for the history buffer.
For CJK-language development workflows, this bug was silently corrupting context in long sessions. The v2.1.89 fix restores full history integrity for multi-byte character sets.
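The byte-versus-character mismatch is easy to reproduce in any language with a string/bytes distinction. In UTF-8, CJK characters take 3 bytes each and most emoji take 4, so a buffer sized in bytes holds far fewer characters than intended:

```python
# Each CJK character below encodes to 3 bytes in UTF-8, so counting
# bytes where characters were intended shrinks the effective buffer
# to roughly a third of its nominal size.
history = "こんにちは"                      # 5 characters
char_len = len(history)                     # 5: what the buffer should count
byte_len = len(history.encode("utf-8"))     # 15: what byte-based sizing counts

emoji_bytes = len("👍".encode("utf-8"))     # 4 bytes for a single emoji
```

For ASCII-only text the two counts coincide, which is exactly why this class of bug ships: it is invisible until a multi-byte character enters the history.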
Autocompact Infinite Loop Fix
Autocompact — Claude Code’s automatic context compression for long sessions — had an edge case where the compaction loop could trigger itself recursively under specific token boundary conditions. The fix is operational; if you were seeing runaway context compression, update.
API: 300K max_tokens for Batches (Opus/Sonnet 4.6 GA)
The platform-level announcement is bigger than the Claude Code changelog entry suggests.
Anthropic’s Message Batches API now supports 300,000 max_tokens per call for Claude Opus 4.6 and Claude Sonnet 4.6. This is a 50% increase from the previous 200K limit.
What 300K actually enables
The practical jump isn’t just “more tokens.” It’s the qualitative shift into new task categories:
Whole-codebase analysis in one call: A medium-sized React application (40–60 files, 20,000–30,000 lines of TypeScript) now fits comfortably in a single call with room for detailed output. Previously, this required chunking and multi-call aggregation.
Full documentation generation: Send the entire codebase + a documentation template and receive comprehensive API documentation in one pass. No stitching multiple partial outputs together.
Cross-file architectural reports: “Analyze all 80 files in this service and identify coupling violations, dead code, and performance antipatterns” — now a single call that delivers a coherent report, not fragmented partial analyses.
Full transcript summarization: For AI applications that accumulate long conversation history, 300K means you can analyze and summarize entire interaction transcripts without truncation.
Batches API specifically
The 300K limit is confirmed for the Batches API (async, up to 24h processing, 50% cost discount). This makes it particularly useful for:
- Nightly codebase audit jobs
- Scheduled documentation refreshes
- CI/CD integration for comprehensive code review on large PRs
For real-time applications, the standard API limit remains lower. But for asynchronous workflows, the combination of 300K context + 50% batch discount makes comprehensive single-call analysis economically viable at scale.
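A sketch of what a nightly-audit batch request could look like under the new limit. The request shape follows the Message Batches API, but the model identifier and the `build_audit_request` helper are illustrative assumptions; verify both against the current API reference before relying on them:

```python
def build_audit_request(custom_id: str, codebase: str) -> dict:
    """Build one batch request for a whole-codebase audit.

    "claude-opus-4-6" is an assumed model identifier for illustration.
    """
    return {
        "custom_id": custom_id,
        "params": {
            "model": "claude-opus-4-6",
            "max_tokens": 300_000,
            "messages": [
                {
                    "role": "user",
                    "content": (
                        "Identify coupling violations, dead code, and "
                        "performance antipatterns in this codebase:\n\n" + codebase
                    ),
                }
            ],
        },
    }

# Submission (requires the anthropic SDK and an API key), sketched:
# client.messages.batches.create(requests=[build_audit_request("audit-1", source)])
```

Because batch results arrive asynchronously (within the 24h window), this fits naturally into a cron job that submits at night and collects results before the morning standup.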
Fine-Grained Tool Streaming + Structured Outputs GA
Both features are now generally available (out of beta):
Fine-grained tool streaming: Tool inputs stream token-by-token as they’re generated, rather than delivering the complete tool call at the end. For UX: users see the agent “thinking” in real time. For latency: time-to-first-token on tool results drops.
Structured outputs: Native JSON schema enforcement at the API level. The model’s output is guaranteed to match your specified schema — no parsing, no validation fallbacks, no “sometimes the model wraps the JSON in markdown” edge cases.
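A sketch of what enforcement buys you, using a hypothetical finding schema (the exact request parameter that carries the schema is omitted here; consult the API docs). With a guaranteed schema-valid response, deserialization is the entire post-processing step:

```python
import json

# Hypothetical JSON schema for a code-review finding.
FINDING_SCHEMA = {
    "type": "object",
    "properties": {
        "file": {"type": "string"},
        "severity": {"type": "string", "enum": ["low", "medium", "high"]},
        "description": {"type": "string"},
    },
    "required": ["file", "severity", "description"],
}

# Sample of what a schema-enforced response would look like: no markdown
# wrapper to strip, no missing-key fallbacks, just one json.loads call.
raw = '{"file": "app.ts", "severity": "high", "description": "N+1 query"}'
finding = json.loads(raw)
```

The contrast with prompt-only JSON is the deleted code: retry loops, regex unwrapping, and defensive `.get()` chains all disappear when validity is enforced at the API level.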
Upgrade Checklist
If you’re running Claude Code or building on the Anthropic API:
- Update to v2.1.89:
npm install -g @anthropic-ai/claude-code@latest
- Review your PreToolUse hooks and add defer cases for ambiguous edge cases
- Add a PermissionDenied hook for observability in production deployments
- Test history integrity if you have CJK/emoji in your workflow
- Evaluate Batches API for any nightly analysis workflows you’re running manually
The hook improvements in particular are worth implementing immediately. The defer decision type and PermissionDenied notifications are the building blocks of production-grade agent reliability — not optional features but operational necessities at scale.
Sources: Claude Code Changelog · Claude Platform Release Notes