context-raii: Task-Scoped Context Management for Claude Code
The problem
When Claude Code's context window fills up, it compacts: the agent summarizes the conversation history and discards the raw content. This is lossy by design. Summaries can't be decompacted back to tool call results.
The compaction algorithm has no information about which pieces of context are still needed. A 4,000-token file read done for a task that completed ten minutes ago looks identical to a file the current task depends on. Both become compressed prose. After compaction, Claude re-reads files it already ingested, wasting tokens recovering context that should have been evictable in the first place.
The problem is that context has no structure. Each tool result is a chunk of bytes in a flat sequence, with no record of why it exists, which task produced it, or whether it's still needed. The compaction system can't make a principled decision because the information isn't there.
The RAII insight
In systems programming, RAII ties resource lifetime to object scope. Memory, file handles, and locks are acquired when an object is created and released when it goes out of scope. The resource lifecycle matches the code that needed it.
The same principle applies to context. Every file read, bash output, and search result was acquired for a reason: to complete some task. When that task finishes, the context it produced should be eligible for release. Not immediately deleted, but marked lower-priority: something the compaction pass can safely condense or drop.
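The mapping can be made concrete with a small sketch. This is an illustrative analogy only: names like `task_scope` and `tag_chunk` are invented here, not part of context-raii's API.

```python
from contextlib import contextmanager

# Hypothetical illustration of RAII-for-context. `owner` maps each
# context chunk to the task that acquired it; `evictable` is the set
# of chunks released when their owning task's scope closes.
owner = {}       # chunk_id -> task_id
evictable = set()

def tag_chunk(chunk_id, task_id):
    owner[chunk_id] = task_id

@contextmanager
def task_scope(task_id):
    try:
        yield task_id
    finally:
        # On scope exit, every chunk the task acquired becomes
        # evictable, mirroring RAII's release-on-destruction.
        for chunk, task in owner.items():
            if task == task_id:
                evictable.add(chunk)

with task_scope("fix-auth-bug") as t:
    tag_chunk("read:src/auth.py", t)
    tag_chunk("bash:pytest-output", t)

assert evictable == {"read:src/auth.py", "bash:pytest-output"}
```

The key move is the same as in C++ or Rust: release is attached to scope exit, so nothing has to remember to free the context by hand.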
context-raii operationalizes this through Claude Code hooks. Each tool result gets tagged to the active task at ingestion. When a task completes, its chunks are marked evictable. When compaction fires, the system injects structured guidance about what to drop and what to preserve.
How it works
When a tool result comes in, the system tags it with the task that was active when the call fired, and stores the tag in SQLite alongside the content. That tag is the core primitive; everything else follows from it.
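A minimal sketch of that primitive, assuming a SQLite schema invented for illustration (the table and column names are not context-raii's actual schema):

```python
import sqlite3

# Hypothetical schema: one row per tool result, tagged with the task
# that was active when the tool call fired.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE chunks (
        id        INTEGER PRIMARY KEY,
        task_id   TEXT NOT NULL,     -- task active at ingestion time
        tool      TEXT NOT NULL,     -- e.g. 'Read', 'Bash', 'Grep'
        path      TEXT,              -- file path, when the tool touched one
        content   TEXT,
        evictable INTEGER DEFAULT 0  -- flipped later by the eviction engine
    )
""")

def tag_result(task_id, tool, path, content):
    conn.execute(
        "INSERT INTO chunks (task_id, tool, path, content) VALUES (?, ?, ?, ?)",
        (task_id, tool, path, content),
    )

tag_result("task-1", "Read", "src/main.py", "...file contents...")
```

Everything downstream (write-invalidation, the eviction scan, compaction hints) is a query or update over rows shaped like this.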
Write-invalidation is the first thing that consumes it. If the result was a file edit, any existing reads of that path are freed immediately. A file that has been written is stale, and there is no reason to hold onto the old read.
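Write-invalidation reduces to one rule: an edit to a path frees all earlier reads of that path. A self-contained sketch, with in-memory structures invented for illustration (the real system operates on its SQLite store):

```python
# Hypothetical in-memory chunk records for illustration only.
chunks = [
    {"id": 1, "tool": "Read", "path": "src/auth.py", "evictable": False},
    {"id": 2, "tool": "Read", "path": "src/db.py",   "evictable": False},
]

def on_file_edit(path):
    # A write makes every earlier read of the same path stale; free
    # those reads immediately, even if the owning task is still active.
    for c in chunks:
        if c["tool"] == "Read" and c["path"] == path:
            c["evictable"] = True

on_file_edit("src/auth.py")
assert [c["evictable"] for c in chunks] == [True, False]
```

Note that this deliberately ignores task status: staleness from a write trumps the owning task's liveness.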
The second consumer is task completion. When a task closes, the eviction engine scans every chunk it owns. A chunk is evictable if its task is done. Two things can keep it alive: an active task that has explicitly referenced it, or an active task that declared a dependency on the owning task. One automatic override: a task that accumulates 50+ chunks without closing gets auto-abandoned. It is a proxy for "the user moved on" and prevents stale exploratory context from staying pinned indefinitely.
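The scan's rules can be sketched as a pure function. All names and data structures here are assumptions for illustration, not the real engine's API:

```python
ABANDON_THRESHOLD = 50  # chunks; configurable in the real system

def scan(tasks, owner, deps, refs):
    """tasks: task_id -> 'active' | 'done' | 'abandoned'
       owner: chunk_id -> owning task_id
       deps:  active task_id -> set of owning task_ids it depends on
       refs:  chunk_ids explicitly referenced by an active task"""
    # Auto-abandon: a still-open task holding 50+ chunks is a proxy
    # for "the user moved on", so it is closed and its chunks freed.
    for task, status in tasks.items():
        if status == "active":
            held = sum(1 for t in owner.values() if t == task)
            if held >= ABANDON_THRESHOLD:
                tasks[task] = "abandoned"
    # Owners pinned by an active task's declared dependency stay alive.
    pinned = set()
    for task, status in tasks.items():
        if status == "active":
            pinned |= deps.get(task, set())
    # Evictable: owner is closed, owner not pinned, chunk not referenced.
    return {
        c for c, t in owner.items()
        if tasks[t] != "active" and t not in pinned and c not in refs
    }

tasks = {"A": "done", "B": "active"}
owner = {"c1": "A", "c2": "A", "c3": "B"}
assert scan(tasks, owner, deps={"B": {"A"}}, refs=set()) == set()   # dependency pins A
assert scan(tasks, owner, deps={}, refs={"c2"}) == {"c1"}           # reference pins c2
```

The two asserts exercise both keep-alive paths: a declared dependency protects every chunk the upstream task owns, while an explicit reference protects only the referenced chunk.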
At compaction, the system translates all of this into a prioritized hint to the summarizer: here is what is safe to drop, here is what to preserve. When Claude wakes in a new context window after compaction, it gets a state summary of active tasks and tagging status so the session can continue without starting over.
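The hint itself is just structured text handed to the summarizer. A sketch of what generating it might look like; the exact format is invented here, and the real hook emits whatever Claude Code's PreCompact interface expects:

```python
# Hypothetical PreCompact hint builder: turns eviction state into a
# prioritized instruction for the summarizer.
def compaction_hint(evictable, preserved):
    lines = ["When compacting, apply these priorities:"]
    lines.append("SAFE TO DROP (owning tasks are closed):")
    lines += [f"  - {c}" for c in sorted(evictable)]
    lines.append("PRESERVE VERBATIM (active tasks depend on these):")
    lines += [f"  - {c}" for c in sorted(preserved)]
    return "\n".join(lines)

hint = compaction_hint(
    evictable={"read:src/old_feature.py"},
    preserved={"read:src/auth.py", "bash:test-output"},
)
print(hint)
```

The post-compaction state summary works the same way in reverse: a short rendering of active tasks and tagging status, injected so the fresh context window starts with the task graph instead of nothing.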
Benchmark results
| Scenario | Eviction | Completion | Refetch | Result |
|---|---|---|---|---|
| sequential_clean | 85.7% | 100.0% | 0.0% | PASS |
| cross_cutting_refactor | 95.8% | 100.0% | 0.0% | PASS |
| exploratory_abandon | 100.0% | 0.0% | 0.0% | PASS |
| parallel_tasks | 95.0% | 100.0% | 0.0% | PASS |
| long_chain | 87.5% | 100.0% | 0.0% | PASS |
Most pass without comment. Two show behaviors that are not obvious from the design alone.
cross_cutting_refactor is the write-invalidation case: Task A reads 10 files while active; Task B then edits 6 of them. Each edit immediately frees A's stale reads, even though A is still in-progress. Without write-invalidation, those reads would stay pinned until A completed. Final eviction: 95.8%.
exploratory_abandon is the blunt-instrument case: 55 files read into a task that's never closed. The 50-chunk threshold triggers auto-abandonment, freeing all 57 chunks. Eviction rate: 100%, completion rate: 0%. The task is abandoned, not completed, and the distinction is preserved so you can tell timeouts from intentional closures.
Zero refetches across all five scenarios. No chunk was marked evictable while an active task still needed it.
Known limitations
The abandoned-task threshold is blunt. A legitimate long-running task that reads 60 files will get auto-abandoned. The threshold is configurable, but the engine counts chunks, not elapsed time or explicit user intent, so it can misfire on genuinely large tasks.
Compliance verification is a proxy, not a guarantee. PreCompact injects eviction hints and PostToolUse watches subsequent reads as a signal for whether compaction honored them. But there's no API to inspect what the summarizer actually included. The compliance numbers are informative, not authoritative.
The system trusts task declarations. A task incorrectly marked completed while work is ongoing will free its chunks. PreToolUse reduces this risk by requiring an active task for work tools, but doesn't eliminate it.
Getting started
Setup, hook configuration, and inspection commands are in the README at github.com/sanjitr11/context-raii. The benchmark harness runs against isolated databases and produces pass/fail results for all five scenarios, with pass thresholds encoded alongside each scenario definition.