Key points
- AI trained only on successful experiments carries a built-in success bias that limits what it can learn and warn against
- Ungoverned data fed into an AI system does not get better; it produces ungoverned outputs faster
- Starting with administrative data in Microsoft 365 before tackling scientific data builds governance structures that scale
- AI readiness is not about the size of the budget; it is about the discipline of the foundation
Why the information most organizations discard is exactly what their AI needs most
Scientists have always known something that the rest of the business world is still learning: failure is data. The negative result, the experiment that did not replicate, and the batch that did not behave as expected are all part of the scientific record. In good science, these results are recorded and analyzed. Peer review demands it. Progress requires it.
Yet when organizations begin building AI systems, that discipline often disappears. The instinct is to clean the data first. Teams filter out the failures, standardize the successes, and present the model with a polished version of the truth. The result is an AI training data strategy built on half the picture. In pharma R&D, that missing half is where the most costly errors hide.
Dr. Marko Gentzsch, Team Lead of the Digital Office at Richter Biologics, calls this one of the most significant barriers to AI success. Richter is a mid-sized Contract Development and Manufacturing Organization (CDMO) specializing in proteins and vaccines, and Gentzsch has seen how the bias of success is baked in from the start. When organizations hide their mistakes from the model, the bias compounds over time.
Models that cannot learn from failure cannot lead you forward
The biggest breakthroughs in scientific research rarely come from repeating what worked. They come from finally understanding why something did not work. Scientists know this intuitively because it is how they are trained to think. Applying that same discipline to AI means capturing failed experiments with the same rigor as successful ones.
An AI that has only ever seen success cannot warn you away from an approach that the data shows has already failed. It cannot surface the conditions under which a process consistently underperforms. As a result, building an AI foundation only on success is building a foundation on bias. If the model only sees half the picture, it can only provide half the value.
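The effect of filtering out failures can be made concrete with a toy sketch. The records, field names, and pH values below are purely illustrative, not Richter's data; the point is that a "cleaned" dataset containing only successes has no trace of the condition that consistently fails, so nothing trained on it can warn against that condition.

```python
from collections import Counter

# Hypothetical experiment records: outcome plus the condition that drove it.
records = [
    {"condition": "pH 7.2", "outcome": "success"},
    {"condition": "pH 7.2", "outcome": "success"},
    {"condition": "pH 6.1", "outcome": "failure"},  # consistently underperforms
    {"condition": "pH 6.1", "outcome": "failure"},
    {"condition": "pH 7.4", "outcome": "success"},
]

def outcome_rates(data):
    """Per-condition outcome counts -- the signal a model could learn from."""
    rates = {}
    for r in data:
        rates.setdefault(r["condition"], Counter())[r["outcome"]] += 1
    return rates

full = outcome_rates(records)
cleaned = outcome_rates([r for r in records if r["outcome"] == "success"])

# The full record links pH 6.1 to failure; the "cleaned" one never sees it.
print(full.get("pH 6.1"))     # Counter({'failure': 2})
print(cleaned.get("pH 6.1"))  # None -- the failure mode is invisible
```

Whatever sits downstream of `cleaned`, however sophisticated, simply has no evidence that pH 6.1 is a condition to avoid.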
Faster chaos is not a strategy
The data problem runs deeper than what gets included in training. It starts with how data is governed and structured across the organization. When Gentzsch joined Richter in 2012, the situation was familiar. Data was scattered across LIMS and QMS systems and in unstructured PDFs. There was no consistent metadata and no shared taxonomy that allowed information from one department to mean the same thing in another.
Deploying AI on top of that environment does not solve the problem. It accelerates it. Ungoverned data fed into an AI system produces ungoverned outputs at high speed. The platform thinking that Gentzsch advocates starts from a different premise. Data governance for AI in life sciences means building a governed ecosystem in which AI can function across domains, not isolated tools operating in disconnected bubbles.
Build the foundation before the science
The practical question is where to begin. Richter’s starting point was Microsoft 365. Consolidating administrative and organizational data first was not the most scientifically ambitious choice, but it was the right one. The exercise forced the team to define naming conventions, assign data owners, and establish metadata standards. They built the access and lifecycle rules that any future scientific data integration would need to follow.
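What "naming conventions, data owners, and metadata standards" look like in practice can be sketched as a simple validation gate. The field names, naming pattern, and records below are assumptions for illustration only, not Richter's actual schema; the idea is that every record is checked against the standard before it enters the governed ecosystem.

```python
import re
from datetime import date

# Assumed minimal metadata standard -- fields and rules are illustrative.
REQUIRED_FIELDS = {"owner", "department", "created", "retention_years"}
NAMING_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")  # e.g. "batch-records-2024"

def validate_record(name: str, metadata: dict) -> list[str]:
    """Return a list of governance violations (empty means compliant)."""
    problems = []
    if not NAMING_PATTERN.match(name):
        problems.append(f"name '{name}' violates naming convention")
    for field in REQUIRED_FIELDS - metadata.keys():
        problems.append(f"missing required metadata field '{field}'")
    return problems

# A compliant record passes; an ungoverned one is flagged before ingestion.
ok = validate_record("batch-records-2024",
                     {"owner": "qa-team", "department": "QA",
                      "created": date(2024, 3, 1), "retention_years": 10})
bad = validate_record("Batch Records FINAL(2)",
                      {"owner": "qa-team"})
print(ok)        # [] -- compliant
print(len(bad))  # 4 -- bad name plus three missing fields
```

The same gate works regardless of what the data describes, which is why rules proven on administrative content can later be applied to scientific records unchanged.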
The governance structures built on lower-stakes administrative data are directly transferable when scientific data comes into scope. Moreover, the connectors and the data models carry forward. Gentzsch is direct about the alternative: attempting to unify all scientific data at once is too large an undertaking and too likely to fail. The goal is to start with what is manageable and build iteratively.
The advantage of not being able to afford failure
Mid-sized organizations carry a structural advantage that is easy to overlook. They cannot absorb big-bang failures the way larger enterprises can. This constraint means they are incentivized to get the foundations right before scaling. Tighter scope and faster feedback loops are not limitations. They are the conditions under which a genuinely AI-ready data foundation gets built properly.
Data architecture shapes everything that follows. The organizations that recognize that now, before the pressure to deploy becomes overwhelming, are the ones that will have something real to show for their AI investment. Richter’s experience proves that AI readiness is not about the size of the budget. It is about the discipline of the foundation.
Dr. Marko Gentzsch presented at Practical AI for Science Leaders, a joint Zifo and Sapio Sciences event held in Hamburg in April 2026, alongside Adam Paton of Zifo, Dr. Prashant Vaidyanathan of OXB, and Yuri de Lugt and Kelly Maddison of Sapio Sciences.