Memory and context as product problems

What the system should remember

There is a question underneath the AI product conversation that almost nobody is asking clearly in April 2021. It is not which model is best. It is not which company will win. It is something more structural and more interesting: what should a system remember, and what should it forget?

This sounds like an engineering question. It is actually a product question, and under that, an organisational one.

Every AI system in use today has some relationship to context. The model receives a window of information, some text, some history, some instruction, and it produces a response based on that window. The window closes. The next interaction opens a new one. Whatever was not explicitly passed back in is gone.

This works well enough for isolated tasks. It breaks down quickly for ongoing work.

Suppose you are using a language model to help draft a client proposal, and you have spent three sessions refining the positioning, the tone, and the specific language the client responded well to. None of that is automatically available in session four. You either rebuild it manually each time, or the quality degrades, or you stop using the tool because the cognitive overhead is too high.

That degradation is not a model failure. It is a memory architecture failure. And it is the main reason AI tools that impress in demos underperform in daily operational use.
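The behaviour described above can be made concrete with a minimal sketch. It assumes nothing about any particular model or vendor API: `Session`, `carried_in` and `visible_context` are hypothetical names used only to illustrate that a new window starts empty unless earlier material is passed back in explicitly.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Session:
    """One interaction window. Only text passed in explicitly is visible."""
    carried_in: List[str] = field(default_factory=list)  # context re-supplied by hand
    turns: List[str] = field(default_factory=list)       # what is said in this session

    def add_turn(self, text: str) -> None:
        self.turns.append(text)

    def visible_context(self) -> List[str]:
        # Everything a (hypothetical) model could attend to in this session.
        return self.carried_in + self.turns


# Session three: the user refines tone and positioning.
s3 = Session()
s3.add_turn("Client prefers plain language over jargon.")
s3.add_turn("Lead with delivery speed, not price.")

# Session four opens a new window: nothing carries over by default.
s4_blank = Session()

# The manual workaround: the user re-supplies the earlier notes themselves.
s4_rebuilt = Session(carried_in=s3.visible_context())
```

In this sketch `s4_blank.visible_context()` is empty, while `s4_rebuilt` sees both earlier notes only because the user carried them across. That carrying step is the cognitive overhead the paragraph above describes.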

Context windows as economic constraints

The technical framing for this problem in early 2021 is the context window. Models can only attend to so much text at once. Make the window too small, and the model loses track of earlier work. Make it larger, and inference becomes slower and more expensive, because the cost of standard attention grows roughly quadratically with the length of the input.

This is a real constraint, and it shapes everything. But the economic argument about window size misses something important. Even if context windows grow significantly, and they will, the question of *what to put in them* remains. Longer windows do not eliminate the curation problem. They defer it.
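A rough sketch of the window as a budget may help here. It assumes a deliberately crude token count (one token per whitespace-separated word) and a naive "keep the most recent entries" policy. Both are illustrative assumptions, not how any specific model or product handles truncation, and `count_tokens` and `fit_most_recent` are hypothetical names.

```python
from typing import List


def count_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())


def fit_most_recent(history: List[str], budget: int) -> List[str]:
    """Keep the newest entries whose combined token count fits the budget."""
    kept: List[str] = []
    used = 0
    for entry in reversed(history):
        cost = count_tokens(entry)
        if used + cost > budget:
            break  # this entry and everything older is dropped
        kept.append(entry)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "Decision: target mid-market buyers first.",
    "Client asked for a shorter executive summary.",
    "Draft three approved with minor edits.",
]

small_window = fit_most_recent(history, budget=13)
large_window = fit_most_recent(history, budget=18)
```

With the smaller budget the earliest decision falls out of the window; with the larger one it stays. Neither budget says anything about whether that decision was the thing that mattered, which is the curation problem that a bigger window only defers.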

When a system can technically hold one hundred pages of context, the question becomes: whose hundred pages? Which conversations? Which decisions? Which background documents? Which institutional norms?

A longer window creates more surface area for noise. The signal question, what actually matters to this task, this person, this moment, does not get easier. It becomes harder to get away with a sloppy answer.

This is where the product design challenge becomes genuinely interesting. Memory is not just storage. It is selection. And selection requires a model of what the work actually is.

What organisations lose when systems forget

There is an organisational cost to memory failure that is rarely quantified. Every time a knowledge worker re-explains context to a tool, a colleague, or a system, time is spent re-establishing ground that was already covered. This is friction most people have learned to live with because they have never experienced the alternative.

The alternative is a system with genuine persistent understanding of the work. Not just keyword retrieval from a database, but something closer to how a trusted colleague operates. A colleague who knows your standards, your history with a client, your current priorities, the decisions you made last month and why.

Building that kind of memory into software is hard. It requires decisions about what to store explicitly versus what to infer. It requires decisions about when memory should update, when it should hold firm, and when it should surface something the user forgot they knew. It requires a position on privacy, on scope, on who controls what the system remembers.

None of these are purely technical decisions. They are product decisions with significant consequences.

Organisations that adopt AI tooling without addressing this layer will find themselves in an uncomfortable middle state: the tools are capable enough to raise expectations, used persistently enough to expose the absence of memory, and depended on enough that rolling back is not straightforward.

Explicit memory versus implicit context

There is a useful distinction to draw between what a system knows because it was told, and what it understands because of accumulated signal.

Explicit memory is declarative. A user writes a note. A document is uploaded. An instruction is set. The system holds it and refers to it. This is manageable, predictable, and easy to audit.

Implicit context is different. It is built from patterns across interactions. The system learns, or should learn, that this user prefers shorter outputs, that this type of task usually leads to a revision cycle, that this particular phrase has meant something specific in previous conversations. No single explicit instruction covers any of that. It emerges through use.

Both kinds of memory matter, but they operate differently, and they carry different risks.

Explicit memory can be wrong if the user set it incorrectly. It can be stale if the situation has changed and nobody updated it. But it is inspectable. You can look at what the system knows and correct it.

Implicit context is harder to audit. The system's behaviour shifts gradually as its sense of the context develops, but the mechanism is not always visible to the user. That opacity is a real product risk. Users who cannot understand why the system is behaving a certain way will eventually distrust it, even if the behaviour is technically good.

The right architecture probably needs both: explicit memory for things that should be stated clearly, and implicit context for things that emerge through use, with enough transparency that a user can interrogate either layer when something feels wrong.
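One way to picture the two layers is as a small data structure. This is only a sketch under stated assumptions: `ExplicitFact`, `ImplicitSignal`, `MemoryStore` and the rule that a stated fact takes precedence over an inferred one are all hypothetical choices made for illustration, not a description of any existing system. The point is that an inferred signal keeps its evidence, so a user can ask why the system believes something.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ExplicitFact:
    """Something the user stated directly; easy to inspect and correct."""
    key: str
    value: str


@dataclass
class ImplicitSignal:
    """Something inferred from use; keeps its evidence so it can be audited."""
    key: str
    value: str
    evidence: List[str]


class MemoryStore:
    def __init__(self) -> None:
        self.explicit: Dict[str, ExplicitFact] = {}
        self.implicit: Dict[str, ImplicitSignal] = {}

    def state(self, key: str, value: str) -> None:
        # Explicit memory: the user tells the system something.
        self.explicit[key] = ExplicitFact(key, value)

    def observe(self, key: str, value: str, evidence: str) -> None:
        # Implicit context: the system infers something from an interaction.
        signal = self.implicit.get(key)
        if signal is None or signal.value != value:
            # New or changed inference: start a fresh evidence trail.
            self.implicit[key] = ImplicitSignal(key, value, [evidence])
        else:
            signal.evidence.append(evidence)

    def explain(self, key: str) -> str:
        """Answer 'why is the system behaving this way?' for one key."""
        if key in self.explicit:  # assumption: stated facts win over inferences
            return f"{key}={self.explicit[key].value} (stated by user)"
        if key in self.implicit:
            s = self.implicit[key]
            return f"{key}={s.value} (inferred from {len(s.evidence)} observation(s))"
        return f"{key}: nothing remembered"


memory = MemoryStore()
memory.state("tone", "formal")
memory.observe("output_length", "short", "user trimmed the draft in session 1")
memory.observe("output_length", "short", "user trimmed the draft in session 2")
```

Calling `memory.explain("tone")` reports a stated fact, while `memory.explain("output_length")` reports an inference and how many observations support it. Keeping that trail is one possible answer to the opacity risk raised above.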

Where Orion enters the thinking

At MSG, this question has direct operational relevance. Orion is the intelligence layer underneath Orbit, and memory is central to what Orion is being built to do.

The commercial work Orbit supports is not a series of isolated tasks. It is a continuous thread: leads being tracked, proposals being developed, relationships being maintained, products being built and adjusted. The context of any given moment is shaped by everything that came before it. A system that cannot hold and use that context intelligently is not an operating system. It is a fancy search bar.

Getting memory right for Orion means making concrete decisions about the distinctions above. What should be stored explicitly and surfaced on demand? What should be inferred from patterns and applied quietly? Where should the system ask rather than assume? These are not solved problems. They are active research questions with real product consequences.

The thinking here draws on early work at Benediction Lab, where memory systems, agent behaviour and context handling are being studied not as theoretical questions but as practical ones grounded in what is actually needed to make AI useful in commercial operations.

The enterprise layer is arriving

April 2021 is also the month when the phrase "enterprise AI" is beginning to carry real weight rather than just aspiration. Large organisations are moving beyond pilots. They are starting to deploy AI-assisted tools into real workflows: sales, legal review, document summarisation, customer support.

The memory problem is acute at enterprise scale. Individual users can sometimes compensate for memory failures by carrying context manually. Teams cannot do this reliably. Organisations cannot do this at all without significant overhead.

Enterprise AI that lacks a coherent memory layer will fail in characteristic ways: inconsistent outputs across teams, loss of institutional knowledge between sessions, and an inability to personalise to the specific context of a department or function. Users will feel like they are constantly introducing themselves to a system that should know them by now.

The response from most AI vendors in this period is to frame this as a data integration problem. Connect more of your systems to the AI. Feed it more documents. Build better retrieval.

This is not wrong, but it is incomplete. Retrieval solves part of the problem. Selection, prioritisation and genuine contextual understanding are different challenges. A system that can retrieve anything but does not know what is relevant is not much better than one that retrieves nothing.

The working question

The question worth sitting with through 2021 is not how much context a system can hold but how well it uses what it has.

A small, well-curated context is often more useful than a large, undifferentiated one. The system that knows the five things that actually matter to this task, and knows why they matter, will outperform a system drowning in every document it has ever been given access to.

This implies that memory architecture is not just an infrastructure problem. It is a reasoning problem. The system needs some model of relevance, salience, and purpose. Without that, more memory is just more noise.

What makes this hard, and what makes it interesting, is that the model of relevance is not fixed. It changes with the work, with the user, with the moment. A useful memory layer has to be dynamic in how it surfaces things, not just comprehensive in what it stores.
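The idea that relevance depends on the task can be sketched in a few lines. The scoring below is a deliberately naive assumption (word overlap between the task and each remembered item), standing in for whatever model of relevance a real system would need. `relevance` and `select_context` are hypothetical names, and "Acme" is an invented client used only for the example.

```python
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    # Crude assumption: lowercase words with basic punctuation stripped.
    return {w.strip(".,:;!?").lower() for w in text.split()} - {""}


def relevance(item: str, task: str) -> float:
    """Hypothetical score: fraction of the task's words that appear in the item."""
    task_words = tokenize(task)
    if not task_words:
        return 0.0
    return len(task_words & tokenize(item)) / len(task_words)


def select_context(items: List[str], task: str, limit: int) -> List[str]:
    """Return up to `limit` items with non-zero relevance, most relevant first."""
    scored = [(relevance(item, task), item) for item in items]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [item for _, item in ranked[:limit]]


remembered = [
    "Pricing for the Acme proposal was capped at the Q1 rate.",
    "Office move scheduled for the summer.",
    "Acme contact prefers short proposal summaries.",
]

for_pricing = select_context(remembered, "Acme proposal pricing", limit=2)
for_logistics = select_context(remembered, "office move", limit=2)
```

The same store yields different context for different tasks: the pricing task surfaces the two Acme items and ignores the office move, while the logistics task surfaces only the office move. Word overlap is far too blunt for real work, but it shows the shape of the problem: what gets stored is fixed, and what gets surfaced has to change with the task.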

Building towards that is not a short-term project. But it is the right project. Systems that solve the memory problem, not as a retrieval feature but as a genuine operating capability, will be meaningfully more useful than those that treat each session as a blank slate. The gap between those two kinds of systems is already visible. It will widen considerably as AI tooling matures and user expectations rise.

The product question is how to close that gap systematically rather than session by session.