Research

The need for a research arm

Why product work needs research space

There is a particular kind of confusion that builds slowly in product organisations. It does not arrive as a crisis. It accumulates as a drift: a gradual narrowing of what the team allows itself to think about.

The immediate pressure of shipping, of customer conversations, of keeping things running, creates an implicit filter on what counts as a worthwhile question. Anything that cannot be resolved in a sprint tends to fall out of view. And for a while, this feels like discipline.

It is not discipline. It is a blind spot forming.

What commercial pressure actually does

Product work done well is genuinely demanding. Orbit has a commercial thesis: that the full lead-to-launched-product workflow can be operated more coherently from a single surface than across the fragmented constellation of tools most teams currently stitch together. That thesis has to be tested against real conditions: against how actual organisations handle handoffs, how people account for status, where information gets lost and decisions stall.

That kind of product work requires attention. It resists distraction. There is a real argument that focus is the resource, and that anything pulling away from near-term execution costs more than it returns.

The problem is that this logic, taken seriously, rules out exactly the questions that matter most in the medium term. When the environment is changing as quickly as it currently is (and in August 2019, the AI environment is changing quickly), a product organisation that only looks inward will be caught wrong-footed: not immediately, but consistently, and in ways that compound.

Commercial pressure does not produce bad instincts about what the product needs today. It produces reliably poor instincts about what the field is about to make possible or obsolete. These are different kinds of question, and they require a different kind of attention.

The gap between observing and evaluating

There is no shortage of observation. Research papers are published continuously. Benchmark results circulate. The language models available in 2019 are meaningfully more capable than those available two years ago, and the trajectory is legible to anyone paying attention.

But observation is not evaluation.

Watching what is announced is not the same as understanding what a given capability means for a specific system under real operating conditions. It is not the same as knowing how a new approach to memory management affects a context window in a product that handles long-running workflows. It is not the same as testing whether an agent architecture that looks compelling on paper actually degrades gracefully when inputs are messier than the benchmark assumes.

These are applied questions. They require the ability to build small things, run them, break them deliberately and form views. That kind of work does not fit inside a sprint. It does not produce shippable features on a two-week cadence. But it produces something else: an internal capability to evaluate external progress against the specific systems and bets MSG is actually making.

Without that capability, the organisation is in the position of reading the weather and then guessing whether to take an umbrella. With it, there is a genuine basis for judgment about where to move and when.

What research actually means here

This is worth being precise about, because the word "research" carries freight that does not apply here.

Academic research operates on a publication cycle. It builds on prior literature. It is accountable to peer review and cares deeply about establishing novelty. These are genuine goods, but they are not the goods MSG needs.

What is needed is something closer to a standing capacity for structured inquiry: a function that can spend time with a class of problem long enough to form real views, rather than cycling through the latest announcements and producing reactions. The questions this function works on are not general. They are specific to the systems MSG builds and the bets it has made.

How do memory systems behave when task context becomes extended and non-linear? What does useful agent autonomy actually look like inside a product surface where a human still needs to feel in control? Where does tool orchestration produce compounding capability, and where does it produce compounding fragility? How do coaching and accountability functions shift when a language model is in the loop? And what should Naira never do regardless of what it becomes capable of?

These are not questions the product team has time to sit with. But they are questions that will determine whether the product team's decisions, three months from now, are well-founded or not.

Benediction Lab

The concept forming around this problem is Benediction Lab.

The intention is a research function attached to MSG that is not accountable to the product roadmap but is accountable to the product's actual questions. It is not a skunkworks. It is not a separate company. It is a structured way of giving certain kinds of inquiry the time they require, whilst keeping the output genuinely connected to what MSG builds.

The Lab's initial focus sits across a few areas: how agents handle memory and context in non-trivial task sequences, how autonomous systems should be designed to interact with graphical interfaces, and what applied evaluation frameworks look like when the thing being evaluated is an AI-assisted product rather than a model in isolation.

These are not trendy research topics chosen for their press coverage. They are the specific technical and design questions that keep surfacing in the process of building Orbit, Orion and TUXX. The research function exists because those questions deserve time that the product function cannot give them.

The relationship between research and product

A research function that operates in isolation from product is an indulgence. A product function that operates without research input is operating with progressively degrading assumptions. The useful relationship is something else: research that is close enough to the real systems to produce applicable insight, and product work that is open enough to use it.

This requires organisational honesty. The research output is not always going to arrive on a useful timeline. Some threads will not resolve into usable conclusions for months. Some will not resolve at all, which is itself informative, but which can feel like poor return on investment from inside a commercial cadence.

The discipline required is to hold both timescales simultaneously. Not to collapse research into short-term feature work, and not to let it drift so far from the operational reality that it becomes irrelevant. The balance is uncomfortable, and there is no clean framework for achieving it. It requires judgment, and judgment requires practice.

Staying honest about the field

One other thing that a research function enables is institutional honesty about what is actually happening in the broader AI field, as distinct from what is being said about it.

In 2019, there is a significant gap between the two. The public conversation about artificial intelligence cycles through enthusiasms and panics that bear only a loose relationship to the state of the technology. Capabilities are both overstated and understated, often simultaneously, depending on who is speaking and what they need the claim to serve.

A product organisation without a research function tends to inherit these distortions uncritically, because the only signal it has is the external conversation. A product organisation with one can form its own views: more slowly, more carefully, but more usefully.

That is what this is for. Not to produce papers. Not to keep pace with every published result. To give MSG the internal standing to evaluate progress honestly, and to translate that evaluation into better decisions in the systems it actually builds.

The model layer is changing. What that means for specific products, specific users and specific operating conditions is a question worth spending real time on.

Benediction Lab is that time.