Session Intelligence
Maturity note: Session intelligence is heuristic. It improves sessions but doesn’t carry the same guarantee as deterministic safety rules. Signals are bounded, advisory in nature, and degrade gracefully when ambiguous.
Warden tracks session health continuously and intervenes only when signals indicate degradation. Most of this state is kept out-of-band: it is stored outside the model’s context window and surfaced selectively. In a healthy session, you won’t see any of it.
| Signal | What it detects |
|---|---|
| Focus | How concentrated the agent’s work is across the file system. |
| Loop detection | Repeated command patterns that aren’t producing progress. |
| Verification debt | Edits accumulating without test or build runs. |
| Drift | How far the session has moved from its original goal. |
| Phase | Whether the session is warming up, productive, exploring, struggling, or late. |
Session Phases
Warden assigns each session a current phase, which can change as the session progresses. The current phase influences how sensitive Warden is to signals and how aggressively it intervenes.
| Phase | Behavior |
|---|---|
| Warmup | The agent is reading files and understanding the task. Warden is lenient. |
| Productive | Core working phase. Warden watches for loops and drift but keeps a light touch. |
| Exploring | The agent has shifted from its original task. Not necessarily bad, but tracked. |
| Struggling | Error rates are climbing, commands are repeating. Warden increases advisory frequency. |
| Late | Deep into the session, context pressure is mounting. Compression tightens, guidance increases. |
Phases aren’t strictly sequential. A session can move from Struggling back to Productive if the agent resolves its issues.
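To illustrate how a non-sequential phase model could work, here is a minimal sketch that recomputes the phase from a snapshot of session metrics on every turn. The `Snapshot` fields, the `classify` function, and every threshold are assumptions made for illustration; Warden's actual inputs and cutoffs are not documented here.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    WARMUP = "warmup"
    PRODUCTIVE = "productive"
    EXPLORING = "exploring"
    STRUGGLING = "struggling"
    LATE = "late"


@dataclass
class Snapshot:
    # Hypothetical metrics; names and scales are illustrative only.
    turn: int                # turns elapsed in the session
    edits: int               # file edits made so far
    error_rate: float        # fraction of recent commands that failed (0..1)
    repeated_commands: int   # recent commands identical to an earlier one
    drift: float             # 0 = on the original goal, 1 = far from it
    context_used: float      # fraction of the context window consumed (0..1)


def classify(s: Snapshot) -> Phase:
    """Derive the phase from the current snapshot; no ordering is enforced."""
    if s.context_used >= 0.8:                           # assumed cutoff
        return Phase.LATE
    if s.error_rate >= 0.5 and s.repeated_commands >= 2:
        return Phase.STRUGGLING
    if s.edits == 0 and s.turn <= 5:
        return Phase.WARMUP
    if s.drift >= 0.6:
        return Phase.EXPLORING
    return Phase.PRODUCTIVE
```

Because the phase is derived from current metrics rather than stored as a one-way state, a session that clears its errors falls back to `PRODUCTIVE` on the next evaluation.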
How Signals Work
Focus tracks how concentrated the agent’s work is. Editing three files in `src/auth/` is high focus. Jumping between `src/database/`, `src/ui/`, and `package.json` lowers it. When focus drops significantly, Warden suggests returning to the original subsystem.
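One simple way to quantify focus is the share of recently touched files that fall in the single most common subsystem. The sketch below assumes a subsystem is the first two directory levels of a path; both that definition and the `focus_score` function are hypothetical, not Warden's documented metric.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Sequence


def subsystem(path: str, depth: int = 2) -> str:
    """Return the first `depth` directory components, or "." for root-level files."""
    dirs = PurePosixPath(path).parts[:-1][:depth]
    return "/".join(dirs) if dirs else "."


def focus_score(paths: Sequence[str]) -> float:
    """Fraction of touched files in the most common subsystem (1.0 = fully focused)."""
    if not paths:
        return 1.0
    counts = Counter(subsystem(p) for p in paths)
    return max(counts.values()) / len(paths)
```

Under this definition, three files in the same `src/auth/` directory score 1.0, while one file each in `src/database/`, `src/ui/`, and the project root score one third.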
Loop detection identifies repeated patterns without progress. The simplest loop is running the same command and getting the same error — but Warden also detects subtler patterns like edit-build-fail cycles where edits aren’t addressing the actual error.
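A minimal sketch of the edit-build-fail case: fingerprint each failing command together with its error output, and flag a loop when the same fingerprint recurs several times in a row, regardless of edits in between. The `LoopDetector` class, the threshold of three, and the digit-stripping normalization are all assumptions for illustration.

```python
import hashlib
import re
from typing import Optional, Tuple


def error_signature(output: str) -> str:
    """Hash error output with digits masked, so shifted line numbers still match."""
    normalized = re.sub(r"\d+", "N", output.strip())
    return hashlib.sha256(normalized.encode()).hexdigest()[:12]


class LoopDetector:
    def __init__(self, threshold: int = 3):  # assumed threshold
        self.threshold = threshold
        self.last_key: Optional[Tuple[str, str]] = None
        self.streak = 0

    def observe(self, command: str, exit_code: int, output: str) -> bool:
        """Record one command result; return True when a loop is suspected."""
        if exit_code == 0:
            # Any success counts as progress and resets the streak.
            self.last_key = None
            self.streak = 0
            return False
        key = (command, error_signature(output))
        if key == self.last_key:
            self.streak += 1
        else:
            self.last_key = key
            self.streak = 1
        return self.streak >= self.threshold
```

Masking digits is a crude choice: it treats the same error at a different line number as unchanged, which suits the case where edits move code around without fixing the fault, but it can also conflate errors that differ only in a number.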
Verification debt accumulates when the agent edits files without running tests or builds. Editing 2-3 files before building is normal. Editing many files across several directories without verification is risky.
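Verification debt can be modeled as a set of distinct files edited since the last test or build run. The sketch below is hypothetical: the `VerificationDebt` class and its default threshold of six files are illustrative assumptions, not documented Warden behavior.

```python
from typing import Optional, Set


class VerificationDebt:
    def __init__(self, threshold: int = 6):  # assumed advisory threshold
        self.threshold = threshold
        self.edited: Set[str] = set()

    def record_edit(self, path: str) -> None:
        # A set means repeated edits to one file count once.
        self.edited.add(path)

    def record_verification(self) -> None:
        # A test or build run clears the accumulated debt.
        self.edited.clear()

    @property
    def debt(self) -> int:
        return len(self.edited)

    def advisory(self) -> Optional[str]:
        """Return an advisory message once debt reaches the threshold, else None."""
        if self.debt < self.threshold:
            return None
        return (
            f"Verification debt: {self.debt} files edited since last build. "
            "Consider running tests."
        )
```

Counting distinct files rather than individual edits keeps repeated small changes to one file from inflating the number.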
Drift measures divergence from the session’s initial goal. It combines subsystem switching, file distance from the original working set, and whether the agent is still touching relevant files.
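The three ingredients named above can be combined as a weighted sum. The following sketch is an assumption-heavy illustration: the weights, the `max_distance` normalizer, and the choice to treat "relevant" as "in the original working set" are not taken from Warden itself.

```python
from pathlib import PurePosixPath
from typing import Sequence


def dir_distance(a: str, b: str) -> int:
    """Directory hops between the parent directories of two files."""
    pa = PurePosixPath(a).parent.parts
    pb = PurePosixPath(b).parent.parts
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def drift_score(original: Sequence[str], recent: Sequence[str],
                max_distance: int = 6) -> float:
    """Return drift in [0, 1]; 0 means work is still on the original working set."""
    if not original or not recent:
        return 0.0
    # 1. Subsystem switching: how often consecutive files change directory.
    parents = [PurePosixPath(p).parent for p in recent]
    pairs = list(zip(parents, parents[1:]))
    switching = sum(a != b for a, b in pairs) / len(pairs) if pairs else 0.0
    # 2. Distance: mean hops from each recent file to its nearest original file.
    nearest = [min(dir_distance(p, o) for o in original) for p in recent]
    distance = min(1.0, sum(nearest) / len(nearest) / max_distance)
    # 3. Relevance: share of recent files outside the original working set.
    original_set = set(original)
    off_set = sum(p not in original_set for p in recent) / len(recent)
    # Hypothetical weights that sum to 1.
    return 0.3 * switching + 0.4 * distance + 0.3 * off_set
```

A weighted sum keeps each component individually inspectable, which fits an advisory signal: a high score can be explained by pointing at whichever term contributed most.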
How Interventions Work in Practice
Early turns (Warmup): The agent reads files, runs searches, examines the directory. Warden is silent.
Mid-session (Productive): The agent edits two files. No advisory — that’s normal.
Debt rising: Six files edited without a build. Warden injects: “Verification debt: 6 files edited since last build. Consider running tests.”
Looping: The agent builds, gets an error, edits, builds again, same error. After several cycles: “Loop detected on the same error. The approach may need rethinking.”
Recovery: The agent takes a different approach, the build passes, tests are green. Warden goes silent again.
Advisory frequency scales with session health — healthy sessions hear almost nothing, struggling sessions get more guidance. Safety signals always fire regardless.
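One way to scale advisory frequency with session health is a per-phase cooldown that safety signals bypass. The cooldown values and the `should_emit` function below are hypothetical; they only illustrate the gating idea, applied after a signal has already crossed its own threshold.

```python
from typing import Optional

# Assumed minimum number of turns between advisories, per phase.
COOLDOWN_TURNS = {
    "warmup": 10,
    "productive": 8,
    "exploring": 5,
    "struggling": 2,
    "late": 3,
}


def should_emit(kind: str, phase: str, turn: int,
                last_advisory_turn: Optional[int]) -> bool:
    """Decide whether a triggered signal is surfaced on this turn."""
    if kind == "safety":
        # Safety signals are never rate-limited.
        return True
    if last_advisory_turn is None:
        return True
    return turn - last_advisory_turn >= COOLDOWN_TURNS[phase]
```

With these numbers, an advisory one turn after the previous one is suppressed in a productive session, while a struggling session hears from Warden again after two turns.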
Cross-Session Learning
When a session ends, Warden processes what happened in the background:
- Resume packets: compact summaries injected when the next session starts or after context compaction, so the agent doesn’t lose hard-won context.
- Repair patterns: a map from error types to the fixes that worked.
- Working set ranking: which files and directories matter most for this project.
- Effectiveness tracking: which advisories preceded progress and which were ignored, used to improve future guidance.
This learning is local, bounded, and project-scoped. It helps the next session start with context rather than from scratch.
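As an illustration of local, bounded, project-scoped learning, the sketch below stores repair patterns in a JSON file inside the project and caps the number of error types it remembers. The `RepairPatterns` class, the file layout, and the eviction policy are assumptions, not Warden's actual storage format.

```python
import json
from pathlib import Path
from typing import Optional


class RepairPatterns:
    def __init__(self, store: Path, max_error_types: int = 200):
        self.store = store                      # hypothetical per-project file
        self.max_error_types = max_error_types  # bound on remembered error types
        self.data = json.loads(store.read_text()) if store.exists() else {}

    def record(self, error_type: str, fix: str) -> None:
        """Count one successful use of `fix` for `error_type` and persist it."""
        fixes = self.data.pop(error_type, {})   # pop and reinsert: most recent last
        fixes[fix] = fixes.get(fix, 0) + 1
        self.data[error_type] = fixes
        while len(self.data) > self.max_error_types:
            oldest = next(iter(self.data))      # least recently updated entry
            del self.data[oldest]
        self.store.write_text(json.dumps(self.data))

    def best_fix(self, error_type: str) -> Optional[str]:
        """Return the fix that has worked most often, or None if unknown."""
        fixes = self.data.get(error_type)
        if not fixes:
            return None
        return max(fixes, key=fixes.get)
```

A later session can construct `RepairPatterns` against the same file and call `best_fix` when it meets a known error type, which is the sense in which it starts with context rather than from scratch.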
Session Scorecard
On demand via `warden scorecard`, Warden generates a quality summary:
| Dimension | What it measures |
|---|---|
| Safety | How many safety rules fired |
| Efficiency | Compression ratio, unnecessary command count |
| Focus | Subsystem switches, drift from original goal |
| Session health | Loops detected, verification debt at end |
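The four dimensions map naturally onto a small record type. The sketch below is a hypothetical data model: the field names and the `rows` formatting are illustrative and do not reflect the actual scorecard output.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Scorecard:
    # Hypothetical fields, one or two per dimension in the table above.
    safety_rules_fired: int
    compression_ratio: float
    unnecessary_commands: int
    subsystem_switches: int
    drift: float
    loops_detected: int
    verification_debt_at_end: int

    def rows(self) -> List[Tuple[str, str]]:
        """Return (dimension, summary) pairs in table order."""
        return [
            ("Safety", f"{self.safety_rules_fired} safety rules fired"),
            ("Efficiency",
             f"compression ratio {self.compression_ratio:.2f}, "
             f"{self.unnecessary_commands} unnecessary commands"),
            ("Focus",
             f"{self.subsystem_switches} subsystem switches, drift {self.drift:.2f}"),
            ("Session health",
             f"{self.loops_detected} loops detected, "
             f"verification debt {self.verification_debt_at_end} at end"),
        ]
```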