As pharma organizations race to achieve digital transformation, the obstacle that most often stops them in their tracks isn’t the cost of compute, security concerns over AI, or even a lack of expert talent. It’s that the data they’ve invested hundreds of thousands, or even millions, of dollars to generate isn’t actually ready for AI.
To infer the mechanics of complex biological phenomena, AI models need to understand how outcomes differ across thousands of experimental conditions. However, the vast majority of biological data has historically been captured for a single, immediate purpose, scoped to a single experiment or a small handful of experiments. Because the experimental conditions are complex, and factors that later prove important are often considered trivial at the time of design, data is generally recorded in ways that prevent comparison outside that handful of experiments.
There are exceptions to this, of course, particularly among more platform-focused biotechs that invest heavily in building multi-modal, longitudinal datasets. But most pharma organizations have to prioritize progress towards shorter-term goals. As a result, they often find that years of investment in experimental data have barely moved the needle when it comes to leveraging AI.
But it doesn’t have to be this way. It is possible to meet immediate needs while ensuring that the collected data can also power AI models in the longer term. It just requires breaking a three-part vicious cycle made up of old habits, negative feedback, and unsuitable software.
These old habits come from decades of biology in which most data was collected to answer a single, immediate question, then never used again. Because biology is driven more by exceptions than by rules, many scientists never even consider the possibility of using data from their highly specific, context-dependent experiments for other purposes. When combined with the high pressure and short deadlines of a modern pharma lab, it’s no wonder scientists prioritize answering urgent questions over building long-term data resources.
This is further reinforced by the contrast between past hype around data-driven biotech and pharma and the scarcity of examples where it has clearly worked. While examples are finally emerging, they have taken years longer to arrive than originally promised or implied. Moreover, most of these examples required years of sustained investment that is not available to most pharma companies.
Of course, success has been this slow partly because the experts who initially made these promises didn’t realize how bad the existing data was. And as long as bench scientists continue to prioritize answering immediate questions over building long-term data assets, the data won’t get any better.
In the middle of this vicious cycle, between bench scientists’ existing habits and the lack of examples to motivate them to change, is the one thing that organizations actually have direct control over: the software that bench scientists use to collect information about experiments, including conditions and other context.
This software, particularly Electronic Lab Notebooks (ELNs) and Laboratory Information Management Systems (LIMS), has primarily been designed around bench scientists’ preferred workflows and habits. So most ELNs and LIMS have been built to collect the context and conditions of experiments in a form that prioritizes answering immediate questions: They provide the flexibility to quickly capture data in a way that can be interpreted on its own or in conjunction with a handful of other experiments. But they don’t encourage the consistency and detail required to build long-term data assets.
Sapio, on the other hand, has built a unified platform that ensures consistent, AI-ready data while still fitting into bench scientists’ workflows and habits. Because Sapio started from a world-class data analysis platform, we designed ELN and LIMS components to capture AI-ready data from day one. In particular, Sapio’s Scientific Data Integration Solution, Jarvis, allows users to capture more detailed data, then use Sapio’s Science Aware AI, ELaiN, to leverage it through scientific AI agents, driving complex R&D workflows without increasing cognitive overhead.
It won’t be easy for most pharma companies to break the vicious cycle and create the organizational change needed to enable true digital transformation. But without the right software, it will be virtually impossible. These organizations need data platforms specifically designed for the unique needs of both biology and AI. And Sapio has been leading the way for years.