Warden
beta · v2.14.0
Documentation

Session Intelligence

Maturity note: Session intelligence is heuristic. It improves sessions but doesn’t carry the same guarantee as deterministic safety rules. Signals are bounded, advisory in nature, and degrade gracefully when ambiguous.

Warden tracks session health continuously and intervenes only when signals indicate degradation. Most of this runs out-of-band: the tracking data is stored outside the model’s context window and surfaced selectively. In a healthy session, you won’t see any of it.

Signal             What it detects
Focus              How concentrated the agent’s work is across the file system.
Loop detection     Repeated command patterns that aren’t producing progress.
Verification debt  Edits accumulating without test or build runs.
Drift              How far the session has moved from its original goal.
Phase              Whether the session is warming up, productive, exploring, struggling, or late.

Session Phases

Warden assigns each session a current phase, which can change as the session progresses. The current phase influences how sensitive Warden is to signals and how aggressively it intervenes.

Phase       Behavior
Warmup      The agent is reading files and understanding the task. Warden is lenient.
Productive  Core working phase. Warden watches for loops and drift but keeps a light touch.
Exploring   The agent has shifted from its original task. Not necessarily bad, but tracked.
Struggling  Error rates are climbing, commands are repeating. Warden increases advisory frequency.
Late        Deep into the session, context pressure is mounting. Compression tightens, guidance increases.

Phases aren’t strictly sequential. A session can move from Struggling back to Productive if the agent resolves its issues.
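
One way to picture non-sequential phases is a classifier that looks only at current statistics, so any phase can follow any other. The sketch below is illustrative, not Warden’s implementation: the SessionStats fields, the classify_phase function, and every threshold are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    WARMUP = "warmup"
    PRODUCTIVE = "productive"
    EXPLORING = "exploring"
    STRUGGLING = "struggling"
    LATE = "late"


@dataclass
class SessionStats:
    # Hypothetical inputs; the docs do not list what Warden actually measures.
    edits: int            # files edited so far
    error_rate: float     # share of recent commands that failed, 0.0-1.0
    drift: float          # divergence from the original goal, 0.0-1.0
    context_used: float   # share of the context window consumed, 0.0-1.0


def classify_phase(stats: SessionStats) -> Phase:
    """Pick a phase from current stats alone, so any transition is possible."""
    # All thresholds are illustrative guesses.
    if stats.context_used >= 0.8:
        return Phase.LATE
    if stats.error_rate >= 0.5:
        return Phase.STRUGGLING
    if stats.edits == 0:
        return Phase.WARMUP
    if stats.drift >= 0.5:
        return Phase.EXPLORING
    return Phase.PRODUCTIVE
```

Because the function keeps no memory of the previous phase, a session whose error rate falls from 0.7 to 0.1 moves from Struggling straight back to Productive.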

How Signals Work

Focus tracks how concentrated the agent’s work is. Editing three files in src/auth/ is high focus. Jumping between src/database/, src/ui/, and package.json drops it. When focus drops significantly, Warden suggests returning to the original subsystem.
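
A rough way to express this idea in code is to ask what share of touched files sit in the single most common directory. The focus_score function and the file names below are hypothetical; Warden’s actual focus metric is not documented here.

```python
from collections import Counter
from pathlib import PurePosixPath


def focus_score(touched_paths: list[str]) -> float:
    """Share of touched files that live in the most common directory."""
    if not touched_paths:
        return 1.0  # nothing touched yet: treat as fully focused
    dirs = Counter(str(PurePosixPath(p).parent) for p in touched_paths)
    most_common_count = dirs.most_common(1)[0][1]
    return most_common_count / len(touched_paths)


# Three files in one directory: every touch is in the same place.
high = focus_score(["src/auth/login.py", "src/auth/token.py", "src/auth/session.py"])

# Three files in three directories: only a third of the touches share a directory.
low = focus_score(["src/database/schema.sql", "src/ui/app.tsx", "package.json"])
```

Here high is 1.0 and low is about 0.33. A real metric would probably also weight recent edits more heavily and treat nested directories as related.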

Loop detection identifies repeated patterns without progress. The simplest loop is running the same command and getting the same error — but Warden also detects subtler patterns like edit-build-fail cycles where edits aren’t addressing the actual error.
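
The simplest case can be sketched as counting identical (command, error) pairs since the last success. This is a minimal illustration under stated assumptions: detect_loop, the threshold of three, and the idea of recording only build and test runs are all hypothetical, and Warden’s own detector handles subtler patterns.

```python
from collections import Counter
from typing import Optional


def detect_loop(history: list[tuple[str, str]], threshold: int = 3) -> Optional[str]:
    """Return an error that the same command keeps producing, if any.

    history holds (command, error) pairs for build and test runs, in order;
    error is "" when the run succeeded. Edits between runs are left out, so
    an edit-build-fail cycle shows up as the same pair repeating.
    """
    counts: Counter = Counter()
    for command, error in history:
        if not error:
            counts.clear()  # a passing run counts as progress: start over
            continue
        counts[(command, error)] += 1
    for (_command, error), seen in counts.items():
        if seen >= threshold:
            return error
    return None
```

Comparing raw error strings is brittle, since line numbers and timestamps change between runs; a practical version would normalize the error text first.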

Verification debt accumulates when the agent edits files without running tests or builds. Editing 2-3 files before building is normal. Editing many files across several directories without verification is risky.
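
As a sketch, verification debt behaves like a counter of distinct edited files that resets whenever a build or test run happens. The VerificationDebt class and its threshold of six are assumptions chosen to match the example advisory later on this page; Warden’s real threshold is not documented.

```python
from typing import Optional


class VerificationDebt:
    """Counts distinct files edited since the last build or test run."""

    def __init__(self, threshold: int = 6) -> None:
        self.threshold = threshold  # illustrative, not a documented default
        self.edited: set[str] = set()

    def record_edit(self, path: str) -> None:
        # A set means repeated edits to one file count once.
        self.edited.add(path)

    def record_verification(self) -> None:
        # Any build or test run clears the debt.
        self.edited.clear()

    def advisory(self) -> Optional[str]:
        count = len(self.edited)
        if count < self.threshold:
            return None
        return (
            f"Verification debt: {count} files edited since last build. "
            "Consider running tests."
        )
```

With this shape, editing two or three files produces no advisory, and the message only appears once the count reaches the threshold.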

Drift measures divergence from the session’s initial goal. It combines subsystem switching, file distance from the original working set, and whether the agent is still touching relevant files.
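
One plausible way to combine three such components is a weighted average, sketched below. The DriftInputs fields, the drift_score function, and the weights are all hypothetical; the docs do not say how Warden weights or normalizes these inputs.

```python
from dataclasses import dataclass


@dataclass
class DriftInputs:
    # Each field is assumed to be pre-normalized to the range 0.0-1.0.
    switch_rate: float     # how often the agent switches subsystems
    file_distance: float   # distance of touched files from the original working set
    relevant_share: float  # share of recent touches that hit relevant files


def drift_score(inputs: DriftInputs, weights=(0.3, 0.4, 0.3)) -> float:
    """Weighted average of the three components, clamped to 0.0-1.0."""
    w_switch, w_distance, w_relevance = weights
    raw = (
        w_switch * inputs.switch_rate
        + w_distance * inputs.file_distance
        # Relevance lowers drift, so it enters inverted.
        + w_relevance * (1.0 - inputs.relevant_share)
    )
    return min(1.0, max(0.0, raw / sum(weights)))
```

A session that never switches subsystems and stays on relevant files scores 0.0; one that switches constantly, works far from the original files, and touches nothing relevant scores 1.0.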

How Interventions Work in Practice

Early turns (Warmup): The agent reads files, runs searches, examines the directory. Warden is silent.

Mid-session (Productive): The agent edits two files. No advisory — that’s normal.

Debt rising: Six files edited without a build. Warden injects: “Verification debt: 6 files edited since last build. Consider running tests.”

Looping: The agent builds, gets an error, edits, builds again, same error. After several cycles: “Loop detected on the same error. The approach may need rethinking.”

Recovery: The agent takes a different approach, the build passes, tests are green. Warden goes silent again.

Advisory frequency scales with session health — healthy sessions hear almost nothing, struggling sessions get more guidance. Safety signals always fire regardless.
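
The rule above can be sketched as a throttle whose minimum gap between advisories grows with session health, with safety signals skipping the throttle. The should_emit function, the health scale, and the gap bounds are assumptions for illustration only.

```python
def should_emit(kind: str, health: float, turns_since_last: int) -> bool:
    """Decide whether to surface a signal on this turn.

    kind is "safety" or "advisory"; health runs from 0.0 (struggling)
    to 1.0 (healthy).
    """
    if kind == "safety":
        return True  # safety signals bypass throttling entirely
    # Healthier sessions wait longer between advisories: 2 turns at
    # health 0.0, 20 turns at health 1.0. Both bounds are illustrative.
    min_gap = round(2 + 18 * health)
    return turns_since_last >= min_gap
```

Five turns after the last advisory, a struggling session (health 0.0) would hear the next one, while a healthy session (health 1.0) would not.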

Cross-Session Learning

When a session ends, Warden processes what happened in the background:

  • Resume packets: compact summaries injected when the next session starts or after context compaction, so the agent doesn’t lose hard-won context.
  • Repair patterns: a map from error types to the fixes that worked.
  • Working set ranking: a ranking of the files and directories that matter most for this project.
  • Effectiveness tracking: a record of which advisories preceded progress and which were ignored, used to improve future guidance.

This learning is local, bounded, and project-scoped. It helps the next session start with context rather than from scratch.
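
To make the repair-pattern idea concrete, here is a minimal sketch of a bounded map from error types to fixes, ranked by how often each fix worked. The RepairPatterns class, its method names, and the cap of five fixes per error are hypothetical; Warden’s storage format is not described here.

```python
class RepairPatterns:
    """Bounded map from an error type to the fixes that resolved it."""

    def __init__(self, max_fixes_per_error: int = 5) -> None:
        self.max_fixes = max_fixes_per_error  # illustrative bound
        self.fixes: dict[str, dict[str, int]] = {}

    def record(self, error_type: str, fix: str) -> None:
        """Count one more success for fix, keeping only the top entries."""
        counts = self.fixes.setdefault(error_type, {})
        counts[fix] = counts.get(fix, 0) + 1
        top = sorted(counts.items(), key=lambda item: -item[1])[: self.max_fixes]
        self.fixes[error_type] = dict(top)

    def suggest(self, error_type: str) -> list[str]:
        """Fixes recorded for this error type, most successful first."""
        counts = self.fixes.get(error_type, {})
        return sorted(counts, key=lambda fix: -counts[fix])
```

Keeping one such store per project, and capping its size, is one way to get the local, bounded, project-scoped behavior described above.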

Session Scorecard

On demand via warden scorecard, Warden generates a quality summary:

Dimension       What it measures
Safety          How many safety rules fired
Efficiency      Compression ratio, unnecessary command count
Focus           Subsystem switches, drift from original goal
Session health  Loops detected, verification debt at end
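
The four dimensions map naturally onto a small record type. The sketch below is a hypothetical data model, not the real output format of warden scorecard: the Scorecard class, its field names, and the render layout are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Scorecard:
    # Field names are illustrative; they mirror the table above.
    safety_rules_fired: int
    compression_ratio: float
    unnecessary_commands: int
    subsystem_switches: int
    drift: float
    loops_detected: int
    verification_debt_at_end: int

    def render(self) -> str:
        """Format one line per dimension with aligned labels."""
        rows = [
            ("Safety", f"{self.safety_rules_fired} safety rules fired"),
            ("Efficiency", f"compression ratio {self.compression_ratio:.2f}, "
                           f"{self.unnecessary_commands} unnecessary commands"),
            ("Focus", f"{self.subsystem_switches} subsystem switches, "
                      f"drift {self.drift:.2f}"),
            ("Session health", f"{self.loops_detected} loops detected, "
                               f"verification debt {self.verification_debt_at_end}"),
        ]
        width = max(len(name) for name, _ in rows)
        return "\n".join(f"{name.ljust(width)}  {value}" for name, value in rows)
```

Grouping the raw counts under the four dimension labels keeps the summary short while leaving the underlying numbers available for comparison across sessions.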