#4 Before the agent gets memory, it needs accountability

Why the first store layer is about trust, not personalization.

Apr 10, 2026

Before any message flows through Intendant, I need to decide what it remembers.

Not in the product sense — contacts, preferences, history. That comes later. I mean the operational kind: what did the system do last night, what is it waiting for me to approve, what broke and when. The kind of memory that makes a system trustworthy rather than just functional.

This is the store layer. Two components. Simple in shape, load-bearing in practice.

Starting from the actual problem

My instinct was to start with the schema. Three tables — event log, confirmations, kv store — map to file, implement, done.

I stopped myself.

The schema is an answer. I hadn’t asked the right question yet.

The right question: what breaks first when the agent starts acting? What is the thing that, without it, makes the system impossible to trust?

Two things, in order.

First: the agent acts and I have no record of what it did. I wake up, it ran overnight, and I am flying blind. I can’t verify, I can’t debug, I can’t audit. That’s the observability problem.

Second: some actions need my approval before they happen. The agent wants to send an email. It has to wait for my tap. If the process restarts before I tap, the pending action is gone. That’s the confirmation durability problem.

The kv store (Gmail cursor, scheduler state) is a real need, just not a right-now need. The scheduler doesn’t exist yet. I’m not designing for it.

Two components. Everything else is deferred.

The event log

Simple in theory: every significant thing that happens gets written down. Append-only, never mutated, queryable later.

The interesting design question is what significant means. Left undefined, it drifts. Someone adds an event here, skips one there, and six months later the log is incomplete in ways you only discover during a 3am debugging session.

So I wrote a contract. An event must be emitted for every inbound message, every policy decision that blocks or gates an action, every step in a confirmation’s lifecycle, every tool call and its result, every outbound message, and every failure. If something isn’t in that list, it doesn’t need an event. If something in that list doesn’t produce one, that’s a bug.

Each event knows who did what: the actor (agent, me, a tool, the system), the action name, where it came from, and what it concerned. When it relates to a confirmation, it carries a reference back to the pending action. That thread is what lets you reconstruct a full decision chain later — draft by draft, note by note.
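A minimal sketch of that event shape, in Python. The field names, the actor labels, and the decision_chain helper are illustrative assumptions, not Intendant's actual schema.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical actor labels, taken from the list in the text.
ACTORS = {"agent", "me", "tool", "system"}


@dataclass(frozen=True)  # frozen: an event cannot be changed after creation
class Event:
    actor: str    # who did it
    action: str   # what happened, e.g. "action.created"
    source: str   # where it came from, e.g. "cli"
    subject: str  # what it concerned, e.g. "send_email"
    confirmation_id: Optional[str] = None  # link back to the pending action
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        if self.actor not in ACTORS:
            raise ValueError(f"unknown actor: {self.actor!r}")


def decision_chain(events: list[Event], confirmation_id: str) -> list[Event]:
    """Return every event tied to one pending action, sorted by timestamp."""
    linked = [e for e in events if e.confirmation_id == confirmation_id]
    return sorted(linked, key=lambda e: e.at)
```

The confirmation_id field is the thread: filtering on it is all it takes to pull one decision chain out of a mixed log.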

The implementation is SQLite today. I want something richer eventually, with a proper UI for browsing decisions. The interface is designed so that swap is one line in setup. The rest of the system never notices.

One thing I decided explicitly: the event log is for audit and debugging only. Not for state reconstruction. Not for replay. If the process restarts, the agent recovers from the confirmation store — not by replaying events. That’s a deliberate choice to keep things simple at this scale.
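One way the swappable interface and the append-only rule could look, sketched in Python with the standard sqlite3 module. The EventLog protocol, the table layout, and the triggers that reject UPDATE and DELETE are all assumptions made for illustration; the post only says the log is SQLite behind an interface.

```python
import json
import sqlite3
import time
from typing import Optional, Protocol


class EventLog(Protocol):
    """Hypothetical interface: callers would depend on this, not on SQLite."""

    def append(self, actor: str, action: str, detail: dict,
               confirmation_id: Optional[str] = None) -> int: ...

    def for_confirmation(self, confirmation_id: str) -> list[dict]: ...


class SqliteEventLog:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            """CREATE TABLE IF NOT EXISTS events (
                   seq INTEGER PRIMARY KEY AUTOINCREMENT,
                   at REAL NOT NULL,
                   actor TEXT NOT NULL,
                   action TEXT NOT NULL,
                   detail TEXT NOT NULL,
                   confirmation_id TEXT
               )"""
        )
        # Assumption: enforce append-only in the database, not just by convention.
        for op in ("UPDATE", "DELETE"):
            conn.execute(
                f"""CREATE TRIGGER IF NOT EXISTS events_no_{op.lower()}
                    BEFORE {op} ON events
                    BEGIN SELECT RAISE(ABORT, 'events are append-only'); END"""
            )
        conn.commit()

    def append(self, actor: str, action: str, detail: dict,
               confirmation_id: Optional[str] = None) -> int:
        cur = self.conn.execute(
            "INSERT INTO events (at, actor, action, detail, confirmation_id) "
            "VALUES (?, ?, ?, ?, ?)",
            (time.time(), actor, action, json.dumps(detail), confirmation_id),
        )
        self.conn.commit()
        return cur.lastrowid

    def for_confirmation(self, confirmation_id: str) -> list[dict]:
        rows = self.conn.execute(
            "SELECT seq, actor, action, detail FROM events "
            "WHERE confirmation_id = ? ORDER BY seq",
            (confirmation_id,),
        ).fetchall()
        return [
            {"seq": s, "actor": a, "action": n, "detail": json.loads(d)}
            for s, a, n, d in rows
        ]
```

Under this sketch, swapping the backend means constructing a different class that satisfies EventLog in setup. Nothing here reads the log to rebuild state, which matches the audit-only rule.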

The confirmation store

This one took more thinking.

The naive design is a gate: agent proposes, I tap yes or no, action executes or is dropped.

Binary. Simple. A dead end.

It’s a dead end because my actual use case is more nuanced than yes or no. I want to guide the agent toward how I would handle things. When it drafts an email I don’t like, I don’t want to reject it and start over. I want to send it back with a note. I want to stay in the loop until the draft is right.

So revision is first-class. My tap has three possible outcomes: approved, rejected, revised. The confirmation isn’t a gate. It’s a pending action with state.

The iteration model: agent proposes, I tap revise and add a note, agent re-proposes with a new draft, repeat until I approve or reject. No limit on revisions. The confirmation record always reflects the current state: the latest draft, the current status. The full history (what was tried, what notes I gave, what changed) lives in the event log, linked by the same id.

One record in the confirmation store. N events in the log. The store answers: what is pending right now. The log answers: what happened to get here.

One thing that isn’t obvious: revised is not a terminal state. The agent moves it back to awaiting once it has re-proposed. The terminal states are approved, rejected, and expired: once a confirmation is there, there’s no way out. This distinction matters for the tests, and it matters for the operational question of what the system is actually waiting on at any given moment.
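The state machine described above can be sketched as a transition table. The status names come from the text; the Python names and the exact set of moves out of revised are assumptions (here a stalled revision can still expire).

```python
from enum import Enum


class Status(str, Enum):
    AWAITING = "awaiting"
    REVISED = "revised"
    APPROVED = "approved"
    REJECTED = "rejected"
    EXPIRED = "expired"


# Allowed moves. Terminal states map to an empty set: no way out.
TRANSITIONS: dict[Status, set[Status]] = {
    Status.AWAITING: {Status.APPROVED, Status.REJECTED,
                      Status.REVISED, Status.EXPIRED},
    # Assumption: from revised, the agent re-proposes (back to awaiting)
    # or the action times out. The text only confirms the first move.
    Status.REVISED: {Status.AWAITING, Status.EXPIRED},
    Status.APPROVED: set(),
    Status.REJECTED: set(),
    Status.EXPIRED: set(),
}


class InvalidTransition(Exception):
    pass


def transition(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the move is not allowed."""
    if target not in TRANSITIONS[current]:
        raise InvalidTransition(f"{current.value} -> {target.value}")
    return target


def is_terminal(status: Status) -> bool:
    return not TRANSITIONS[status]
```

With the table in one place, "every valid transition, every invalid one" becomes a loop over pairs of statuses rather than a pile of hand-written cases.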

How the two components fit together

The two components are separate by design. The event log doesn’t know about the confirmation store. The confirmation store doesn’t know about the event log. That’s the right design.

But some operations have to write to both.

Creating a pending action and logging its creation must succeed or fail together. If one side writes and the other fails, history and current state diverge. In a human-facing approval flow, that mismatch becomes a trust failure — you approved something the system has no record of, or the system is waiting on something that no longer exists.

The fix is a small set of compound methods on the Store facade — one for creating a pending action, one for resolving it, one for expiring it — each wrapping both writes in a single transaction. The individual stores stay separate. The agent never calls them independently.
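A minimal sketch of one such compound method, assuming both tables live in the same SQLite file so a single transaction can cover them. The class layout, table columns, and method names are hypothetical. It relies on the sqlite3 connection's context manager, which commits if the block succeeds and rolls back if it raises.

```python
import json
import sqlite3
import time
import uuid


class Store:
    """Hypothetical facade: the only object the agent would talk to."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS confirmations ("
                "id TEXT PRIMARY KEY, kind TEXT NOT NULL, "
                "payload TEXT NOT NULL, status TEXT NOT NULL)"
            )
            conn.execute(
                "CREATE TABLE IF NOT EXISTS events ("
                "seq INTEGER PRIMARY KEY AUTOINCREMENT, at REAL NOT NULL, "
                "actor TEXT NOT NULL, action TEXT NOT NULL, "
                "confirmation_id TEXT)"
            )

    def create_pending(self, kind: str, payload: dict) -> str:
        cid = uuid.uuid4().hex
        # One transaction: either both rows exist afterwards, or neither does.
        with self.conn:
            self.conn.execute(
                "INSERT INTO confirmations (id, kind, payload, status) "
                "VALUES (?, ?, ?, 'awaiting')",
                (cid, kind, json.dumps(payload)),
            )
            self._log("agent", "action.created", cid)
        return cid

    def _log(self, actor: str, action: str, cid: str) -> None:
        # No commit here: the caller's transaction decides the outcome.
        self.conn.execute(
            "INSERT INTO events (at, actor, action, confirmation_id) "
            "VALUES (?, ?, ?, ?)",
            (time.time(), actor, action, cid),
        )
```

Resolving and expiring would follow the same pattern: update the confirmation row and insert the event inside one `with self.conn:` block.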

Testing before the agent exists

The agent doesn’t exist yet. The Telegram adapter doesn’t exist yet. That is exactly the right moment to test this layer.

At this point I wanted proof, not just diagrams. So I wired a small CLI harness to exercise the store layer directly.

you> /plan send_email {"to":"alice@company.com","subject":"Q3 report","body":"Hi Alice, please find the Q3 report attached."}
bot> Created pending action 70045045… (send_email)
bot> Expires in 1h. Use /approve or /revise.

you> /revise 70045045 Make it less formal, add a greeting
bot> Revised 70045045… — note: Make it less formal, add a greeting

you> /pending
bot> Revised:
  70045045… send_email — {"to": "alice@company.com", "subject": "Q3 report", "body": "Hi Alice, please find the Q3 report attached."}

you> /events
bot> [14:39:31] action.created by agent — send_email
bot> [14:39:45] action.revised by me — "Make it less formal, add a greeting"

That tiny transcript is enough to validate the shape of the design. Current state lives in the confirmation store. History lives in the event log. And both exist independently of any agent loop or model call.

In this harness, /plan stands in for the future agent proposal step. The important part at this stage is that the pending state and event history are both persisted correctly.

The unit tests cover the state machine — every valid transition, every invalid one, what happens when a write fails halfway through. The integration tests run against a real SQLite file, with direct calls replacing the agent: happy path approval, a full revision cycle, expiry, and process restart recovery.

That last one is the most important. A fresh Store against the same database file should recover all pending actions with no event replay. The agent recovers from current state, not from history. If that test passes, the recovery model is sound.
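The restart test could look roughly like this. ConfirmationStore here is a stripped-down stand-in written for the example, not the real store; the point is only the shape of the test: write, close, reopen the same file, and read pending actions straight from current state.

```python
import os
import sqlite3
import tempfile


class ConfirmationStore:
    """Stand-in store for the example; the real one has more fields."""

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        with self.conn:
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS confirmations ("
                "id TEXT PRIMARY KEY, kind TEXT NOT NULL, status TEXT NOT NULL)"
            )

    def create(self, cid: str, kind: str) -> None:
        with self.conn:
            self.conn.execute(
                "INSERT INTO confirmations VALUES (?, ?, 'awaiting')", (cid, kind)
            )

    def resolve(self, cid: str, status: str) -> None:
        with self.conn:
            self.conn.execute(
                "UPDATE confirmations SET status = ? WHERE id = ?", (status, cid)
            )

    def pending(self) -> list[str]:
        rows = self.conn.execute(
            "SELECT id FROM confirmations "
            "WHERE status IN ('awaiting', 'revised') ORDER BY id"
        )
        return [r[0] for r in rows]

    def close(self) -> None:
        self.conn.close()


def test_restart_recovers_pending() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "store.db")
        first = ConfirmationStore(path)
        first.create("a1", "send_email")
        first.create("a2", "send_email")
        first.resolve("a2", "approved")
        first.close()  # stand-in for the process dying

        second = ConfirmationStore(path)  # fresh instance, same file
        try:
            # No events are read: recovery comes from current state alone.
            assert second.pending() == ["a1"]
        finally:
            second.close()
```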

What’s next

Telegram adapter end to end. A deterministic stub agent — no LLM, fixed routing, real confirmation flows. One complete approval cycle in production shape before any model touches it.

Infrastructure before intelligence. Still.

— Louis

Louis de Vitry
