Spendwall: an agent spending firewall - Catalin-Stefan Niculescu

Week 1 · 1 May 2026

Two days ago I opened a fresh folder on my Desktop and started writing Spendwall. The workspace is up, the daemon runs, the CLI moves it around, but most of what makes it useful is still ahead of me. I'm writing this post anyway, while it's early, so the second post about it can talk about what actually changed.

What it's going to be

Meter tracks one developer's cost on one CLI. Spendwall is the firewall version of the same idea: a local-first daemon that captures, attributes, and gates AI-agent spending across multiple agents and multiple vendors at once. Same intuition (cost awareness changes how you build with these tools), different scale (a team, an org, several agents, OpenAI and Anthropic together, eventually Stripe Link agent wallets too).

Three jobs, in the order I'll get them done:

Observe. Capture every API call from the agents you run (Claude Code, Codex, your own Python or TypeScript scripts hitting OpenAI or Anthropic) and show what each one spent, tagged by whatever axis you care about (agent, customer, project, environment).

Cap. Set a budget per agent, per tag, per rolling window. When it's hit, choose: alert, block, or require approval. There's a 250ms emergency kill switch on the dashboard for when something has visibly gone off the rails.

Hand the budget back to the agent. Through an MCP server, the agent itself can call get_remaining_budget or estimate_cost before it commits to a million-token plan. Cost-aware agents make better decisions than cost-blind ones. That's the architectural bet I most want to test. Almost every cost-control tool I've seen treats the agent as the adversary, but the agent is in the best position to do something about a tight budget if you just let it ask.
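To make the Cap job concrete, here is a rough Python sketch of a per-agent rolling-window cap. It is an illustration only, not the policy crate's Rust code; the RollingCap class, its fields, and the "allow" return value are all hypothetical names.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RollingCap:
    """Budget for one agent over a sliding time window (hypothetical shape)."""
    limit_usd: float
    window_secs: float
    on_hit: str = "block"  # "alert", "block", or "approval"
    _events: deque = field(default_factory=deque)  # (timestamp, cost) pairs

    def _evict(self, now: float) -> None:
        # Drop spend that has aged out of the window.
        while self._events and self._events[0][0] <= now - self.window_secs:
            self._events.popleft()

    def spent(self, now: float) -> float:
        self._evict(now)
        return sum(cost for _, cost in self._events)

    def check(self, cost_usd: float, now: float) -> str:
        """Return "allow", or the configured action if this call would exceed the cap."""
        if self.spent(now) + cost_usd > self.limit_usd:
            return self.on_hit
        return "allow"

    def record(self, cost_usd: float, now: float) -> None:
        self._events.append((now, cost_usd))
```

The point of the window is that old spend stops counting: a call that would be blocked ten seconds after a burst is allowed again once the burst has aged out.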

Local-first by default. SQLite for the audit ledger, TOML for human-edited config. Fail-open by default, configurable to fail-closed for compliance customers. No prompt rewriting, no model substitution, no context compression: the proxy is byte-for-byte transparent. Cloud sync is M3+, a strict superset of the local mode rather than a prerequisite.

What stands up today

Two days of commits. The honest inventory:

A Cargo workspace with five foundational crates (core, config, audit, policy, panel) and two binaries (spendwall, spendwall-daemon). The CLI has the subcommands you'd want for a daemon you don't want to think about: init writes default configs without clobbering existing ones, start and stop manage the daemon via PID file, status reports port and PID liveness, doctor emits a Markdown health report, panel opens the dashboard in a browser.

The daemon itself brings up config, audit, policy, and panel in order, listens on a chosen port, handles SIGINT and SIGTERM cleanly, and supervises its child tasks with a restart-rate cap so a flapping component can't burn CPU. There's an in-process integration harness for tests that spin up the whole daemon end-to-end without leaking ports or PIDs between runs, plus smoke tests against a clean macOS host and a fresh Ubuntu container.
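A restart-rate cap can be as small as a sliding window of recent restart times. The Python sketch below shows the idea under assumed names (RestartBudget, try_restart); the daemon's real supervisor is Rust and may work differently.

```python
from collections import deque


class RestartBudget:
    """Allow at most `max_restarts` restarts within any `window_secs` span."""

    def __init__(self, max_restarts: int, window_secs: float) -> None:
        self.max_restarts = max_restarts
        self.window_secs = window_secs
        self._restarts: deque = deque()  # timestamps of recent restarts

    def try_restart(self, now: float) -> bool:
        # Forget restarts that are older than the window.
        while self._restarts and now - self._restarts[0] >= self.window_secs:
            self._restarts.popleft()
        if len(self._restarts) >= self.max_restarts:
            return False  # flapping: leave the task down instead of spinning
        self._restarts.append(now)
        return True
```

With a budget of three restarts per ten seconds, a component that crashes four times in quick succession stays down on the fourth, and gets another chance once the oldest restart falls out of the window.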

That's it. The skeleton stands. The muscles haven't been wired up.

What's still WIP

Roughly the next eight weeks, by the spec:

Path B: the loopback HTTP proxy that lets you instrument Claude Code or Codex by setting two env vars. This is the wedge that turns Spendwall from a workspace into a useful thing.

Path A: the Python and TypeScript SDK wrappers for custom agents that import the OpenAI or Anthropic SDK directly.

Per-agent rolling-window caps, the kill switch, and attribution by user-defined tag. The data model and the actor are sketched in the policy crate but nothing acts on them yet.

The MCP server. The interesting bit. Read-only budget tools first; request_preapproval in M2.

Stripe webhook integration for gating Link agent-wallet purchases against policy, with HMAC verification on the receive side.

Slack alerts for cap events, signed-binary distribution via Homebrew tap and curl | sh, and a per-user supervisor (launchd on macOS, systemd --user on Linux) so the daemon survives terminal closes.

Why I'm posting now

The post-it-when-it-ships pattern means I never write down the early decisions, which are usually the most interesting ones to other engineers. The fact that the architecture is already shaped before any of the actual capture paths exist (supervisor, audit pipeline, panel, hot-swap config, kill-switch dashmap) means there are real engineering choices behind the skeleton, even if you can't see them yet from the outside.

Also: I want a snapshot of what the day-two version of the project felt like, so the day-sixty post has something to compare itself to.

Repo lives at github.com/catancs/spendwall, or will the moment I push it public. I'll come back here with a follow-up the day Path B is forwarding real telemetry.

· · ·

Week 2 · 7 May 2026

Six days and 200-odd commits later. The skeleton has muscles now: Path B started forwarding real telemetry on Monday, the SDKs ship, the MCP server is live, and the kill switch cuts an in-flight upstream call in under 250ms in tests. This is the follow-up I promised at the end of week one, the one I said I'd write the day Path B was real.

What's actually working

One policy engine, four interfaces. The MCP spoke is the one most cost-control tools don't ship.

Path B: the loopback proxy. Point OPENAI_BASE_URL and ANTHROPIC_BASE_URL at the local proxy and Spendwall is in front of every API call. /v1/messages for Anthropic, /v1/chat/completions, /v1/responses, and /v1/completions for OpenAI. Forwarded byte-for-byte to upstream, audited on the way back, broadcast as SpendObserved events to anything subscribed to the panel websocket. Blocked calls return a real 402 PAYMENT_REQUIRED with a structured error body, which feels like the most HTTP-honest thing the proxy can do when policy says no.

Path A: Python and TypeScript SDKs. Both ship as importable wrappers. In Python, spendwall.openai.wrap(client) patches chat.completions and responses in place; spendwall.anthropic.wrap(client) patches messages.create. In TypeScript, createSpendwallOpenAI() and createSpendwallAnthropic() return drop-in clients. Both SDKs talk to the daemon over a Unix domain socket with 0600 perms, which is faster than loopback HTTP and lets the daemon trust the caller via filesystem ownership rather than an additional auth layer.

Both SDKs default to fail-open. If the daemon is down, the API call still goes through; you just don't get policy enforcement. That's the right default for "I added Spendwall to my agent" not to mean "my agent stopped working." There's a per-call mode="strict" override and an env-var override for compliance customers who'd rather fail-closed.
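The fail-open versus strict behaviour comes down to one branch. The sketch below is not the SDK's code: guarded_call, DaemonUnavailable, and PolicyDenied are hypothetical names, and only the mode="strict" spelling comes from the SDK description above.

```python
class DaemonUnavailable(Exception):
    """Raised by the (hypothetical) policy check when the daemon can't be reached."""


class PolicyDenied(Exception):
    """Raised when policy says no, or when strict mode can't confirm a yes."""


def guarded_call(check_policy, send_request, mode="open"):
    """Run send_request() behind a policy check.

    mode="open":   if the daemon is unreachable, send anyway (no enforcement).
    mode="strict": if the daemon is unreachable, refuse the call.
    """
    try:
        allowed = check_policy()
    except DaemonUnavailable:
        if mode == "strict":
            raise PolicyDenied("daemon unreachable and mode is strict")
        allowed = True  # fail open: the agent keeps working
    if not allowed:
        raise PolicyDenied("blocked by policy")
    return send_request()
```

Note that fail-open only covers the daemon being unreachable. A reachable daemon that answers "no" still blocks the call in either mode.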

The kill switch. Per-agent and global. The integration test that asserts a kill-switch flip cancels in-flight upstream calls within 250ms passes consistently. Mid-stream kills are handled too: if you hit the kill while an SSE response is streaming back, the proxy aborts the upstream connection and writes a partial-cost audit row using whatever usage tokens it had already parsed off the side channel. Same path handles client disconnects.
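For the partial-cost row, the proxy only needs the latest usage counts it has seen so far. A minimal Python sketch of reading those off SSE data lines follows. It assumes usage appears as a JSON object either at the top level of an event or under a "message" key, and that counts are cumulative; both are assumptions about the upstream format, and usage_so_far is a hypothetical name.

```python
import json


def usage_so_far(sse_lines):
    """Collect the latest token counts seen in `data:` lines without altering them."""
    usage = {}
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # event names, comments, blank separators
        payload = line[len("data:"):].strip()
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # e.g. a sentinel line, or a chunk cut off by the kill
        if not isinstance(event, dict):
            continue
        # Assumption: usage sits at the top level or under "message".
        candidates = [event.get("usage")]
        message = event.get("message")
        if isinstance(message, dict):
            candidates.append(message.get("usage"))
        for found in candidates:
            if isinstance(found, dict):
                for key, value in found.items():
                    if isinstance(value, int):
                        usage[key] = value  # treated as cumulative; last one wins
    return usage
```

Because the function only reads the lines it is handed, the bytes forwarded to the client stay untouched, and a stream that is cut off mid-event still yields whatever counts arrived before the cut.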

Approval flow. When a policy decision is "approval required", the proxy doesn't 4xx and walk away. It holds the request open, inserts a row into the approvals table, broadcasts a websocket event so the panel pings, and polls the row until you grant, deny, or it times out. Holding the request rather than streaming the approval back means it works with any HTTP client unchanged. There's a background expirer that clears stale pending approvals, and panel endpoints (GET /api/approvals, POST /api/approvals/:id/resolve) for the dashboard.
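The hold-and-poll loop is simple enough to sketch in a few lines of Python with sqlite3. The table name approvals comes from the description above; the columns, status strings, and function names are hypothetical, and the real proxy is async Rust rather than a blocking loop.

```python
import sqlite3
import time


def open_approval(db, approval_id):
    """Insert the pending row that the panel will later resolve."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS approvals (id TEXT PRIMARY KEY, status TEXT)"
    )
    db.execute("INSERT INTO approvals VALUES (?, 'pending')", (approval_id,))
    db.commit()


def wait_for_approval(db, approval_id, timeout_secs, poll_secs=0.05):
    """Poll one approvals row until it leaves 'pending' or the deadline passes."""
    deadline = time.monotonic() + timeout_secs
    while True:
        row = db.execute(
            "SELECT status FROM approvals WHERE id = ?", (approval_id,)
        ).fetchone()
        if row is not None and row[0] in ("granted", "denied"):
            return row[0]
        if time.monotonic() >= deadline:
            return "timed_out"
        time.sleep(poll_secs)
```

From the HTTP client's point of view nothing special happens: the request just takes longer to return, which is why no client changes are needed.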

The MCP server. The architectural bet from week one. It's running. spendwall-mcp is an rmcp stdio server that exposes get_remaining_budget and estimate_cost as MCP tools, backed by a UDS read-only projection from the daemon (/v1/policy/snapshot). The agent (Claude Code, Codex, whatever) can now ask get_remaining_budget before it commits to a million-token plan. Cost-aware agents really do make different decisions than cost-blind ones; I've watched it happen in my own sessions for the past three days.

Slack and email alerts. Two transports behind a common Transport trait, dispatched through AlertDispatcher with capped exponential backoff and a SQLite dead-letter table for messages that exhaust their retry budget. The proxy emits CapBlocked, ApprovalRequested, and CapAlert events; the dispatcher fans them out. Email goes through lettre with STARTTLS; Slack is HTTP webhooks with a careful retryable-vs-permanent error mapping (the kind of thing where 408 belongs in the retryable bucket and 401 doesn't, and you only learn that by getting it wrong once).
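Capped exponential backoff plus a retryable-versus-permanent split fits in one small function. This Python sketch is illustrative, not AlertDispatcher itself: the post only states that 408 is retryable and 401 is not, so the rest of RETRYABLE_STATUS, the delay values, and the function names are assumptions.

```python
import time

# Assumption: beyond 408 (retryable) and 401 (permanent), this set is a guess.
RETRYABLE_STATUS = {408, 429, 500, 502, 503, 504}


def is_retryable(status: int) -> bool:
    """Timeouts, rate limits, and server errors may succeed later; auth errors won't."""
    return status in RETRYABLE_STATUS


def backoff_delays(base_secs: float, cap_secs: float, max_attempts: int):
    """Delay after each attempt: base * 2^n, never more than cap_secs."""
    return [min(cap_secs, base_secs * (2 ** n)) for n in range(max_attempts)]


def deliver(send, max_attempts=5, base_secs=0.5, cap_secs=8.0, sleep=time.sleep):
    """Call send() until it succeeds, hits a permanent error, or runs out of attempts.

    Returns "sent" or "dead_letter" (the caller would persist the latter).
    """
    for delay in backoff_delays(base_secs, cap_secs, max_attempts):
        status = send()
        if 200 <= status < 300:
            return "sent"
        if not is_retryable(status):
            return "dead_letter"  # e.g. 401: retrying can't fix a bad credential
        sleep(delay)
    return "dead_letter"
```

Passing sleep as a parameter keeps the retry logic testable without real waiting.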

Stripe webhook ingestion and reconciliation. HMAC-SHA256 signature verification, a replay window with saturating arithmetic (a negative window almost slipped through, tests caught it), and a StripeReconciler that matches invoice line items back to spend_events rows. The day Anthropic or OpenAI invoices arrive, the daemon reconciles the local audit ledger against what was actually billed and flags drift.
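For readers who haven't implemented this check before, here is a standalone Python sketch of verifying a Stripe-Signature header (t=<unix timestamp>,v1=<hex HMAC>, where the signed payload is the timestamp, a dot, and the raw body). The function name and the 300-second default tolerance are choices made for this example, not Spendwall's.

```python
import hashlib
import hmac


def verify_stripe_signature(payload: bytes, header: str, secret: str,
                            now: int, tolerance_secs: int = 300) -> bool:
    """Check a `Stripe-Signature` header of the form `t=<unix>,v1=<hex>`."""
    timestamp = None
    candidates = []
    for part in header.split(","):
        key, _, value = part.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            candidates.append(value)
    if timestamp is None or not timestamp.isdigit() or not candidates:
        return False

    # age goes negative when the sender's clock is ahead of ours. Python ints
    # handle that; with fixed-width unsigned integers this subtraction is the
    # spot where saturating arithmetic matters.
    age = now - int(timestamp)
    if abs(age) > tolerance_secs:
        return False  # outside the replay window

    signed = timestamp.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return any(
        hmac.compare_digest(expected.encode(), c.encode()) for c in candidates
    )
```

The body must be the raw bytes as received; re-serialising parsed JSON before hashing will usually change the bytes and fail verification.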

Pre-flight cost estimation. Tokenizer-based, with a per-provider story. OpenAI uses tiktoken-rs with lazy-cached encoders, entirely local, microseconds per call. Anthropic doesn't have a local tokenizer, so the proxy makes a /v1/messages/count_tokens call against the upstream API with a 50ms timeout. If it times out, it falls back to a byte-based estimate. That fallback path is the kind of thing I'd normally over-engineer; here I just want the proxy to not become the bottleneck.
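The timeout-then-fallback shape looks roughly like this in Python. The count_remote callable stands in for the upstream token-count request, and the 4-bytes-per-token ratio is a generic heuristic chosen for the example, not a figure from the post or from any provider.

```python
import concurrent.futures
import time

_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def byte_estimate(body: bytes, bytes_per_token: float = 4.0) -> int:
    """Rough fallback: assume about 4 bytes per token (a heuristic only)."""
    return max(1, round(len(body) / bytes_per_token))


def estimate_tokens(body: bytes, count_remote, timeout_secs: float = 0.05) -> int:
    """Ask count_remote(body) for a count, but never wait longer than timeout_secs."""
    future = _POOL.submit(count_remote, body)
    try:
        return future.result(timeout=timeout_secs)
    except concurrent.futures.TimeoutError:
        future.cancel()  # best effort; an already-running thread isn't interrupted
        return byte_estimate(body)
    except Exception:
        return byte_estimate(body)  # upstream error: same fallback
```

The estimate is allowed to be wrong; what it is not allowed to do is add more than the timeout to the request path.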

Hot-reload everything. caps.toml, pricing.toml, alerts.toml, all watched on disk via a debounced file watcher, all hot-swapped through ArcSwap handles with last-good-wins semantics. Edit a cap, save, the daemon picks it up without a restart. If you save garbage, the daemon keeps running on the last good config and logs the parse error.

Distribution. cargo-dist for the Rust binaries, npm for the TypeScript SDK, pypi for the Python SDK, Homebrew tap for the daemon. Tagged releases trigger the whole pipeline. There's a fail-loud guard on the release workflow so an unmapped tag suffix can't silently produce empty packages.

The panel. A Vite + React UI embedded into the daemon binary via rust-embed. Auth-check against a tri-state endpoint, live /api/events websocket for spend events and approval requests, kill-switch button that hits /api/kill_switch. It's a skeleton compared to where it'll need to be at GA, but for a build-in-public alpha the bones are honest.

Things that hardened

The architectural bet from week one (let the agent ask about the budget) held up. The MCP server is the smallest, simplest crate in the workspace, and the most useful part of the system from where I'm sitting. Two read-only tools and a UDS-backed snapshot client. That's it. Cost-aware agents really do scope their plans differently when they can ask; I no longer think this is a hopeful claim.

The fail-open default, on the other hand, took longer to feel right than I expected. Every instinct says compliance-shaped tools should fail closed. But for the audience that adopts this first (solo devs and small teams putting it in front of their agents), fail-closed means "your agent stops working when my daemon crashes," which is a worse outcome than missed enforcement. Strict mode is one env var away. That asymmetry felt wrong for a day and now feels obviously correct.

The proxy being byte-for-byte transparent is the constraint that keeps paying off. No prompt rewriting. No model substitution. No context compression. The only thing the proxy does to the response is parse usage tokens off a side channel for streaming responses, and even that is non-mutating. Every time I've been tempted to "just add" a small mutation I've talked myself out of it within a day. The class of bugs that opens up is enormous and the upside is small.

What's still WIP

The release pipeline ships binaries but the signed-binary distribution and the Gatekeeper notarization on macOS aren't done; first-run on a fresh Mac currently requires the right-click-Open dance. launchd on macOS and systemd --user on Linux supervisor units are written but not yet installed by spendwall init. Cloud sync is M3 work and hasn't started. That's by design: local-first means local-first works first. And the panel needs real charts; the current "show me the last hour by tag" view is a list, not a graph.

On the agent-side ergonomics, the next thing I want is request_preapproval as a third MCP tool. Right now an agent that's about to do something expensive can ask how much budget it has, but it can't proactively reserve it. That belongs in M2.

Closing

The whole reason I wrote a Week 1 post about a project that did almost nothing yet was so this Week 2 update would have something to compare itself to. That worked. Reading the Week 1 post a week later, the honest things hit hardest: the inventory of WIP at the end, the bit about the muscles not being wired up. None of that aged badly. The skeleton I described was the skeleton, and the muscles got wired in roughly the order I expected.

The next post will be when something genuinely surprises me: either a compliance customer asks for something I haven't designed for, or someone uses the MCP tools in a way I didn't anticipate, or the proxy hits a class of upstream behavior I didn't plan for. That post is more interesting to write than this one. This one is just bookkeeping.

Repo is at github.com/catancs/spendwall. The whole onboarding for the local mode is brew install catancs/tap/spendwall, then spendwall init, spendwall start, and spendwall panel. SDKs are pip install spendwall and npm install spendwall.

· · ·

Evening · 7 May 2026

Two things happened between the Week 2 post above and dinner: M1 closed out, and most of M2's visibility work landed. Cut a release as v0.0.9.

M1 closeout

The release pipeline that was "ships binaries" by the morning's post now actually ships them. The npm provenance attestation was wrong (sigstore couldn't verify the package), so npm install spendwall failed for anyone who'd opted into --require-provenance. The Homebrew formula was landing in the tap repo's root rather than Formula/, so brew install didn't see it until I went looking. There was an arithmetic bug in the Stripe replay-window check that let a negative window slip through under specific clock skew; tests catch it now, and the math saturates at the boundary.

The Windows MSVC target got dropped from the release matrix. The supervisor-unit story (launchd on macOS, systemd --user on Linux) doesn't have a Windows equivalent I'm willing to ship yet, so distributing a binary nobody can run as a service feels worse than not shipping. cargo-deny moved to running on the host rather than in a target-specific container, and the wildcard-version rule got softened so a transitive crate I don't control can't fail the release.

That's the dull part. Now the interesting part.

M2: per-app caps via UDS peer credentials

The thing I most wanted to do in M2. When an SDK call comes in over the Unix domain socket, Spendwall now captures peer credentials at accept time: PID, UID, executable path, and a derived app name. Each is attached as a request extension and persisted on every spend_events row alongside the existing agent_id. Caps can now scope by which application is making the call, not just by the agent_id the caller chose to send.

This matters because agent_id is self-reported. A misconfigured agent can claim to be anything; a compromised one definitely will. peer_exe can't lie: it's the actual binary the kernel sees on the other end of the socket. So you get a real ground-truth axis for attribution. The CLI surfaces it: spendwall ls --by app|agent --since 1h|1d|7d|30d shows actual spend per real binary. The panel groups by app instead of by self-declared agent.
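On Linux, the kernel-side lookup can be sketched in Python with SO_PEERCRED and /proc. This is a Linux-only illustration (macOS exposes peer credentials through a different socket option), and peer_identity plus the dictionary keys are hypothetical names, not the daemon's.

```python
import os
import socket
import struct


def peer_identity(conn: socket.socket) -> dict:
    """Read the kernel-reported PID/UID/GID of the process on the other end (Linux)."""
    # struct ucred is { pid_t pid; uid_t uid; gid_t gid; }
    raw = conn.getsockopt(
        socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize("iII")
    )
    pid, uid, gid = struct.unpack("iII", raw)
    try:
        exe = os.readlink(f"/proc/{pid}/exe")  # the binary behind that PID
    except OSError:
        exe = None  # process already exited, or /proc is not readable
    return {
        "pid": pid,
        "uid": uid,
        "gid": gid,
        "exe": exe,
        "app": os.path.basename(exe) if exe else None,
    }
```

The caller never sends any of these values; they come from the socket itself, which is what separates them from a self-reported agent_id.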

Curated peer-label table. Raw paths look bad in a UI. ~/.spendwall/labels.toml ships seeded with mappings for the common LLM CLIs (claude, codex, cursor) and the obvious tool runners, so the panel shows "Claude Code" instead of /Users/cata/.local/bin/claude. Users extend it for their own binaries. The seed is deliberately conservative: if I'm not sure what something is, I'd rather show the path than guess wrong.

Tag redaction, mid-flight warnings, fail-closed

Tag-value redaction. At the on-disk persistence boundary, configured tag values get redacted before they ever land in the audit ledger. The kind of thing compliance customers ask for in the first meeting: "what if my agent stuffs the customer email into the request body and we end up with an audit row containing PII?" Now the redactor runs between the proxy's audit-emit helper and the SQLite writer.
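Placing the redactor at the write boundary means there is exactly one function that turns an event into a row. A minimal Python sketch follows; the spend_events table name is from the post, while the columns, the "[redacted]" placeholder, and the function names are assumptions.

```python
import json
import sqlite3

REDACTED = "[redacted]"


def redact_tags(tags: dict, redact_keys: set) -> dict:
    """Return a copy that is safe to persist; the caller's dict is left untouched."""
    return {k: (REDACTED if k in redact_keys else v) for k, v in tags.items()}


def persist_event(db, agent_id: str, cost_usd: float, tags: dict, redact_keys: set):
    """Redact, then write. This is the only path from an event to disk."""
    safe = redact_tags(tags, redact_keys)
    db.execute(
        "INSERT INTO spend_events (agent_id, cost_usd, tags) VALUES (?, ?, ?)",
        (agent_id, cost_usd, json.dumps(safe, sort_keys=True)),
    )
    db.commit()
```

In-memory consumers such as a live dashboard could still see the original tags; only the persisted copy is scrubbed.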

Mid-flight 80% cap warnings. Parallel signal alongside hard enforcement. The proxy emits a CapAlert when an in-flight call would push usage past 80% of the cap; alerts route through the Slack/email pipeline that landed in week two. It catches cap creep before it becomes a cap hit, which is what you actually want for the soft side of the budget conversation. You don't want every cap to feel like a wall; you want the wall to be the last warning, not the only one.
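One way to read "would push usage past 80%" is as a threshold crossing, so the alert fires once rather than on every call above the line. The Python sketch below takes that reading, which is an assumption; cap_signal is a hypothetical name, and only the CapAlert and CapBlocked event names come from the post.

```python
def cap_signal(spent_usd: float, in_flight_usd: float, cap_usd: float,
               warn_ratio: float = 0.8):
    """Classify one in-flight call against a cap.

    Returns "CapBlocked" if the call would exceed the cap, "CapAlert" if it
    would cross the warning threshold for the first time, otherwise None.
    """
    projected = spent_usd + in_flight_usd
    if projected > cap_usd:
        return "CapBlocked"
    threshold = warn_ratio * cap_usd
    if spent_usd <= threshold < projected:
        return "CapAlert"  # crossing, not merely being above, triggers the warning
    return None
```

With a 10-dollar cap, going from 7.90 to 8.10 raises the alert, while a later call from 8.50 to 8.70 stays quiet because the warning has already been given.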

Fail-closed-on-policy-unavailable. The fail-open default I argued for in week two stays. But there's now a config flag (policy.fail_mode = "closed") that flips the behavior at the daemon level for compliance customers who explicitly want the opposite. Strict mode at the SDK edge plus fail-closed at the engine. The two overrides compose: strict callers get hard refusals when the engine is unavailable, lax callers fail-open or fail-closed depending on the daemon-level switch.
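The way the two overrides compose fits in a four-row truth table. In this Python sketch, on_engine_unavailable and the "refuse"/"forward" labels are names chosen for the example; "strict", "open", and "closed" follow the wording above.

```python
def on_engine_unavailable(caller_mode: str, daemon_fail_mode: str) -> str:
    """What happens to a call when the policy engine can't answer.

    caller_mode:      "strict" or "lax" (chosen per call or per SDK).
    daemon_fail_mode: "open" or "closed" (the policy.fail_mode setting).
    """
    if caller_mode == "strict":
        return "refuse"  # strict callers never proceed without a decision
    return "refuse" if daemon_fail_mode == "closed" else "forward"
```

Only one of the four combinations forwards the call: a lax caller talking to a daemon left in the default open mode.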

Health and doctor

A /v1/health endpoint over UDS that probes the policy engine specifically (alive, but is the cap evaluator responsive?), and spendwall doctor calls it on top of the existing port + PID + UDS checks. The doctor report finally tells you the daemon is actually serving rather than just running. The number of times I've watched a process appear healthy via ps while the engine inside it was deadlocked is not zero.

Where this leaves things

v0.0.9 ships the M2 work I cared most about. Cloud sync (the M3 milestone) hasn't started, and won't until the build-in-public alpha has run for long enough to know what the cloud half should actually do. As far as the local tool goes: feature-complete for the use cases I had in mind. Per-app attribution, per-app caps, tag redaction, mid-flight warnings, configurable fail mode, real health reporting, signed-and-published-and-installable.

The next post is when somebody hits a class of problem I hadn't designed for.