
I Built a Memory Layer for AI That Forces It to Be Honest


A conversation about memory, reasoning, and what happens when you refuse to accept the standard answer.


I didn't set out to build a memory system for AI.

I set out to understand something that was bothering me. And the only way I know how to understand something is to ask questions — not the obvious ones, but the ones that make people pause.

This is the story of those questions. And where they led.


Question 1: Why does context exist at all?

Not "how do we make context better." Not "how do we extend the context window." Why does it exist?

I knew the surface answer: transformers are stateless, so we paste history into the input. But I kept pulling on that thread. Why are transformers stateless? What would it mean if they weren't?

The answer, when I looked at it clearly, was uncomfortable.

Context isn't a feature. It's a workaround.

A transformer is a pure function. Input tokens in, output token out. Nothing persists between calls. The "memory" you experience in a conversation is an illusion we create by copying the past into the present on every single call. When that history grows too long, we cut the oldest parts. The model never actually remembers — it re-reads.

That realization changed how I saw every AI product I'd ever used.


Question 2: Isn't this just like a stateless server?

This one came out of nowhere and turned out to be the sharpest analogy I found.

A JWT-authenticated server has no session state. Every request carries everything the server needs in the token. The server reads it, acts on it, and immediately forgets. The client carries all the state. Sound familiar?

The LLM is a dummy JWT server. The context window is the token.

The difference: a JWT payload stays small by design. You only put claims in it. Nobody puts their entire chat history in a JWT. But that's exactly what we do with context — we keep appending to it until it explodes.

JWT "solved" statelessness by keeping the payload tiny, sending only what the server needs to decide. LLMs have no such discipline. They dump everything.

So: what would a JWT-style context look like? Send only what the model needs to reason, right now. Not the whole history. The relevant facts.

That question opened a door.


Question 3: Isn't this like the human mind in the middle of a task?

Think about driving.

You're not consciously thinking about where you learned to drive. You're not replaying every road trip you've ever taken. Your entire life history is not loaded into working memory. Only what matters right now is active: the car ahead is slowing, you need to turn left in 200 meters, the road is wet.

The rest of your knowledge? Somewhere deeper. Surfacing on demand when relevant.

The human brain doesn't paste its full history into every thought. It retrieves what's relevant.

Current AI does the opposite. It pastes everything, hoping the model figures out what matters. The model, to its credit, tries. But it's an expensive, noisy, bounded approach to a problem that biology solved differently.

What if we built AI memory the way the brain actually works — not as a growing document, but as a structured store that surfaces what's relevant, right now, for this question?
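"Surface what's relevant" can be made concrete with a toy retriever. Loci's actual retrieval scoring is richer (the paper covers it); word overlap here is just the simplest stand-in for relevance.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// relevant is a toy sketch: score each stored fact by word overlap
// with the current question and return only the top k matches.
// Real systems use embeddings, recency, and decay; the shape is the same.
func relevant(store []string, question string, k int) []string {
	qWords := map[string]bool{}
	for _, w := range strings.Fields(strings.ToLower(question)) {
		qWords[w] = true
	}
	type scored struct {
		fact  string
		score int
	}
	var hits []scored
	for _, f := range store {
		s := 0
		for _, w := range strings.Fields(strings.ToLower(f)) {
			if qWords[w] {
				s++
			}
		}
		if s > 0 {
			hits = append(hits, scored{f, s})
		}
	}
	sort.Slice(hits, func(i, j int) bool { return hits[i].score > hits[j].score })
	if len(hits) > k {
		hits = hits[:k]
	}
	out := make([]string, len(hits))
	for i, h := range hits {
		out[i] = h.fact
	}
	return out
}

func main() {
	store := []string{
		"the road is wet today",
		"you learned to drive in 2009",
		"the car ahead is slowing",
	}
	// Only driving-relevant facts surface; the life history stays deep.
	fmt.Println(relevant(store, "is the road wet", 2))
}
```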


Question 4: What if the LLM itself is just a SQLite file?

This was the question that cracked everything open.

Think about what a language model actually is, at the storage level. It's a file. A very large file of floating point numbers. And those numbers encode knowledge — billions of facts, patterns, relationships, compressed into weights you can query (inference) but cannot update, cannot inspect, and cannot surgically modify.

Compare that to SQLite: a file. You can SELECT, INSERT, UPDATE, DELETE. You can inspect every row. You can fix a wrong value. You can remove outdated data.

What if knowledge lived in a store like SQLite instead of dissolved into weights?

"Paris is the capital of France" — in a language model, that fact is smeared across billions of parameters, entangled with everything else, impossible to find or change without retraining. In a database, it's a row. One row. You can update it. You can delete it. You can query it directly.
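The row-versus-weights contrast can be shown with a toy in-memory store. Loci actually uses PostgreSQL; the map below only illustrates the property that matters, addressability.

```go
package main

import "fmt"

func main() {
	// A fact as an addressable row, not a smear across weights.
	facts := map[string]string{
		"capital_of_france": "Paris",
	}

	// SELECT: read the fact directly.
	fmt.Println(facts["capital_of_france"])

	// UPDATE: fix a value in place — no retraining.
	facts["capital_of_france"] = "Paris, France"

	// DELETE: remove outdated knowledge entirely.
	delete(facts, "capital_of_france")
	_, ok := facts["capital_of_france"]
	fmt.Println(ok) // the fact is simply gone
}
```

Every one of those operations is impossible to perform surgically on a weights file.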

The reason we can't do this with current models isn't that it's technically impossible. It's that nobody designed them that way. Knowledge and reasoning were baked together, inseparable, from the start.

What if you separated them?


Question 5: What is reasoning, actually?

Before building anything, I wanted to understand what I was trying to preserve when I separated "reasoning" from "knowledge."

Reasoning, stripped to its core, is this: the ability to apply learned structure to new content to produce conclusions that were never explicitly seen before.

Causality. Analogy. Deduction. Composition. These are content-free patterns. They work on any A and B. They don't require knowing the capital of France or the boiling point of water. They are the shape of thought, not the substance.

This matters because it means the reasoning engine — the model — doesn't need to be large. It doesn't need to encode encyclopedic content. It needs to be good at reasoning over whatever you hand it.

The knowledge lives in the store. The model provides only the reasoning mechanism.

A small model reasoning over a rich, specific, personally relevant store can outperform a massive general-purpose model that knows everything about the world but nothing about you.


Question 6: What if the model had no knowledge at all?

This was the final question. The one that built Loci.

Not "how do we give the model better memory." Not "how do we improve retrieval." What if we made the model completely ignorant — and forced it to reason exclusively from facts we provided?

If the store doesn't have a fact, the model must admit it doesn't know. Not hallucinate. Not guess. Refuse, with a structured explanation of what's missing.

If the store does have the facts, the model reasons over them precisely and honestly.

The store isn't a supplement to the model. It replaces what the model would otherwise contribute. The model is the arbiter of reasoning, not of truth. The store is the arbiter of truth.

That inversion — store as intelligence, model as mechanism — is what Loci is built on.


What emerged from the questions

A memory system with a few properties nobody else had combined:

A grounding protocol that forbids the model from using its own knowledge. Not as a suggestion — as a behavioral specification. If the model can't derive an answer from injected facts, it returns a typed insufficient_facts response. Not a hallucination. An honest acknowledgment.

A living store where facts aren't static rows. They decay when unused. They strengthen when used. They supersede each other when contradicted. Every night, a background process reads through recent interactions and distills them into durable knowledge — the same way sleep consolidates the day's experiences into memory.

A transparent layer that any AI model can use. The model doesn't change. Point any OpenAI-compatible client at Loci and it silently injects relevant facts, captures what's learned, and maintains a memory that persists across every session, every model, every platform.


The name

Loci comes from the Method of Loci — the ancient Greek memory technique of placing memories in specific locations and walking through them to recall. The oldest memory system humans ever devised.

The insight behind it: memory needs structure to be retrievable. Random storage isn't memory. Context — location, association, relevance — is what makes recall possible.

That's what Loci does for AI. Not random storage. Structured, retrievable, living knowledge.


Where it went

We built it. It's real. It runs.

One Docker command. Works with Ollama or any OpenAI-compatible endpoint. Pure Go, PostgreSQL, no GPU required.

If you want to dig into the details — the formal method, how the retrieval scoring works, the benchmark results — there's a full research paper.

The code: github.com/alash3al/loci

The paper: zenodo.org/records/19490263


The questions didn't start with "let me build a product."

They started with: why does this work the way it does? what would happen if it worked differently? what is the thing underneath the thing?

That's the only kind of thinking that builds something genuinely new.


Reasoning without memory isn't intelligence. It's improvisation.