Design Philosophy
Warden exists because of a single observation: intelligence is only valuable if it improves the next decision. A system that detects drift but cannot course-correct is just expensive logging. A system that learns patterns but never applies them is academic. Every feature in Warden traces back to this premise.
Principle 1: Invisible First
The highest compliment a developer can pay Warden is forgetting it exists.
Zero-friction operation means:
- No manual setup per project. Warden detects the assistant, reads its configuration, and starts working. No config to create, no init wizard to run per-project, no onboarding flow to complete.
- No visible output during normal operation. Hooks that approve a tool call produce no terminal output. Only denials, warnings, and injections are surfaced — and even those are delivered through the assistant’s own UI.
- No performance tax. Warden has an internal latency budget per hook. If it cannot decide in time, it approves and logs the miss. The developer never waits.
- No cognitive load. There are no commands to memorize for daily use. `warden doctor` exists for debugging; `warden server start` exists for CI. Neither is required for normal development.
If a feature requires the developer to change their workflow, it must justify itself against the alternative of not existing.
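The latency-budget behavior above can be sketched as follows. This is a minimal illustration, not Warden's actual implementation; the budget value, the `analyze` callable, and the logger name are all assumptions:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import logging

log = logging.getLogger("warden")

HOOK_BUDGET_SECONDS = 0.05  # hypothetical per-hook latency budget

def decide_within_budget(analyze, payload):
    """Run an analysis function, but never make the developer wait.

    If the analysis cannot finish inside the budget, approve and
    log the miss instead of blocking the tool call.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(analyze, payload)
    try:
        return future.result(timeout=HOOK_BUDGET_SECONDS)
    except TimeoutError:
        log.warning("hook budget exceeded; approving")
        return "approve"
    finally:
        # Don't wait for a slow analysis; let it finish in the background.
        pool.shutdown(wait=False)
```

A fast check returns its own verdict; a slow one is overridden by an approval, which is what "the developer never waits" means in practice.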
Principle 2: Bounded Intelligence
Every capability in Warden has an explicit cost ceiling.
- Time bounds. Pattern matching evaluates all rules in a single compiled regex pass. No backtracking, no rule-by-rule iteration.
- Space bounds. Session state and cross-session learning are capped. Warden will never fill your disk.
- Complexity bounds. Session health uses a fixed formula with a small set of input variables. No machine learning model, no neural network, no gradient descent. The formula is legible, debuggable, and deterministic.
- Token bounds. Context injections are capped per trust tier. A high-trust session gets at most 1 injection; a low-trust session gets up to 15. Low-trust sessions need more guardrails, not fewer.
When a capability approaches its ceiling, it degrades gracefully. If session phase can’t be confidently determined, Warden falls back to conservative defaults.
Principle 3: Fail-Open
Warden must never be the reason a developer cannot complete their work.
- Errors return approval. If Warden crashes, if the server is unreachable, if the process is missing — the hook returns exit code 0 (approve). The assistant continues unimpeded.
- Parse failures skip analysis. If a tool call’s payload is malformed, Warden logs the error and skips rather than denying.
- Version mismatches degrade gracefully. Warden detects the mismatch, logs a warning, and falls back to stateless mode rather than refusing to operate.
- Missing state initializes defaults. If session data is corrupted or absent, Warden creates a fresh session with conservative defaults.
The only exception is the safety layer. A pattern that matches a known-dangerous command (like `rm -rf /`) will deny even if surrounding context is ambiguous. Safety overrides liveness.
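The fail-open contract, including the safety-layer exception, can be sketched like this. The exit-code convention, the pattern list, and both helper functions are assumptions for illustration:

```python
import logging

log = logging.getLogger("warden")

EXIT_APPROVE, EXIT_DENY = 0, 2  # hypothetical exit-code convention

DANGEROUS = ("rm -rf /",)  # illustrative safety patterns

def is_known_dangerous(cmd):
    return any(p in cmd for p in DANGEROUS)

def analyze(cmd):
    # Placeholder for the full (fallible) analysis pipeline.
    return "allow"

def run_hook(payload) -> int:
    """Fail-open wrapper: any internal error approves, but the
    safety layer is checked first and its denials always stand."""
    try:
        if is_known_dangerous(payload):   # safety overrides liveness
            return EXIT_DENY
        verdict = analyze(payload)
        return EXIT_DENY if verdict == "deny" else EXIT_APPROVE
    except Exception as exc:
        log.error("hook error, failing open: %s", exc)
        return EXIT_APPROVE
```

Even a malformed payload that crashes the analysis yields an approval, so the assistant is never blocked by a Warden bug.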
Principle 4: Hooks Over Prompts
Hooks are more reliable than prompts. This is the core architectural bet.
An advisory system might inject “Please avoid running destructive commands.” The LLM might ignore it, rephrase it, or hallucinate a justification for why this particular command is fine.
A hook-based system intercepts the tool call before execution and returns a verdict: Allow, Deny, or Modify. The LLM never sees the dangerous command succeed. There is no ambiguity, no negotiation, no prompt injection that can bypass the check.
- Hooks are synchronous gates. The tool call cannot proceed until the hook returns.
- Hooks see the actual payload. Not a summary — the exact command or file path.
- Hooks compose. Multiple checks analyze the same call independently. A deny from any check is final.
- Hooks are testable. Given input X, produce output Y. No temperature, no sampling, no stochastic variation.
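The composition and testability properties above can be sketched together. The check signature is an assumption, and Modify handling is omitted for brevity:

```python
from typing import Callable, Iterable

Verdict = str  # "allow" | "deny" | "modify" in this sketch

def compose(checks: Iterable[Callable[[dict], Verdict]], call: dict) -> Verdict:
    """Every check sees the same payload; a deny from any check is final."""
    for check in checks:
        if check(call) == "deny":
            return "deny"
    return "allow"
```

Because the function is pure, it is testable in the sense the last bullet describes: given input X, it always produces output Y.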
Context injections (advisories) are Warden’s secondary mechanism, used for guidance that can’t be expressed as a binary gate — phase awareness, focus tracking, convention reminders. But they’re always subordinate to hook verdicts.
Principle 5: Runtime Value Only
Warden does not generate code. It does not write tests. It does not refactor. It does not create files.
Warden’s entire value proposition is making the AI assistant better at doing those things. The distinction matters because it limits the blast radius. If Warden has a bug in phase detection, the worst outcome is a slightly mistuned advisory. The assistant still writes code, still runs tests — just with slightly less contextual guidance.
This also means Warden has no opinion about what you’re building. It doesn’t know or care whether you’re writing a React app or a Kubernetes operator. Its analysis operates at the tool-call level: is this command safe? Is this write consistent with the session’s focus? Is the assistant looping?
Principle 6: Small Outputs
Every byte Warden injects into the context window has a cost — tokens displaced that could have been source code, documentation, or user instructions.
- Deny messages are one sentence.
- Advisories are 2-5 lines. A phase indicator, a focus reminder, a convention note. Never a paragraph.
- Session summaries are capped at 500 tokens.
If output exceeds its budget, it’s truncated, not wrapped. Truncation forces authors to front-load the most important information.
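Truncation-over-wrapping can be sketched as below. Counting whitespace-separated words as tokens is a simplifying assumption; a real implementation would presumably use the model's tokenizer:

```python
def truncate_to_budget(text: str, max_tokens: int) -> str:
    """Hard-truncate to the budget rather than letting output run on.

    Approximates tokens as whitespace-separated words for illustration.
    """
    words = text.split()
    if len(words) <= max_tokens:
        return text
    return " ".join(words[:max_tokens]) + " …"
```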
Local-Only, Always
Warden executes entirely within your environment. No rules are downloaded. No session data is sent anywhere. The server runs as a local process; the rules are built in.
This means Warden works offline, behind firewalls, in air-gapped environments, and in enterprise settings where data sovereignty matters. Your code, your rules, your data — all on your machine.
Harness Engineering
Warden represents a bet on harness engineering (building structured systems around LLMs) over prompt engineering (crafting natural-language instructions for LLMs).
Prompt engineering is fragile. It depends on the LLM following instructions reliably, which varies across models, versions, and context lengths.
Harness engineering is structural. Hooks fire regardless of which LLM is behind the assistant. Session signals compute the same way whether the session has 10 turns or 10,000. Pattern matching doesn’t depend on the model understanding what “dangerous” means — it depends on a pattern matching a string.
Anthropic’s harness design work explores multi-agent harnesses — planner/generator/evaluator loops coordinated from the outside. Warden takes the complementary approach: runtime governance from the inside. A multi-agent harness coordinates the work; Warden governs each individual agent session. They compose naturally.
One insight from that work applies directly: “every component in a harness encodes an assumption about what the model can’t do.” As models improve, some assumptions become unnecessary. Warden addresses this by clearly separating deterministic protections (permanent, model-agnostic) from heuristic guidance (may evolve as models get better at self-correction).
The goal is a system where the LLM’s reliability determines the quality of the output, but the harness’s reliability determines the safety of the process. Warden is the harness.