For any large pharmaceutical company or research organization, scientific data integration is a massive, ongoing challenge. From the basics of integrating instrument and lab informatics data to structuring it in a way that scientists can actually use, data consolidation isn’t a straightforward task. And it doesn’t come without risk – the wrong approach to tackling your scientific data can cost a lot of time and money without actually helping your scientists.

To get the lowdown on how our scientific data integration platform (Sapio Jarvis) addresses these challenges and actually works with scientists, we spoke to Rob Brown, Senior Director Product Marketing at Sapio Sciences. Read on to learn more about Jarvis and how it can help accelerate scientific discoveries.  

Lab instrument and scientific research data has long been, and still is, fragmented – why is this an issue?

A: Well, let’s take those two separately. If we look first at scientific research data, the issue is that scientists really need all of the relevant information about the project they’re working on if they’re going to make really well-informed decisions. But that data is typically siloed across various sources, including on-premises and cloud-based systems. It’s in multiple ELNs (Electronic Lab Notebooks) and LIMS (Laboratory Information Management Systems), which makes integrating it a significant challenge. So they really just don’t have a single place to go to find all that data.

Therefore, they have two choices: they either spend a lot of valuable time manually joining it together, probably introducing errors, or, because of the time pressure they’re under, they simply collate the data that’s easiest to reach and then make a decision. But that decision is not going to be fully informed.

And then if you look at the instrument side, instrument data is typically very fragmented. Many companies build bespoke connectors between each instrument output and the transactional systems that need those instrument results, creating a complex and costly web of data pipelines. This not only becomes a nightmare for data management but also limits scalability and makes data quality harder to maintain, underscoring the need for a unified data integration platform that can streamline workflows and ensure compatibility across various data formats and connectors.
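As a rough illustration of why that web grows so quickly (this sketch is not from the interview), the Python below counts pipelines under two assumed designs: one bespoke connector per instrument-system pair, versus one adapter per endpoint feeding a shared hub. The instrument and system names are invented, and the point-to-point count assumes every system needs every instrument’s results.

```python
from itertools import product


def point_to_point(instruments, systems):
    """One bespoke connector for every (instrument, system) pair."""
    return list(product(instruments, systems))


def hub_and_spoke(instruments, systems, hub="hub"):
    """One adapter per endpoint: instruments feed the hub, the hub feeds systems."""
    return [(i, hub) for i in instruments] + [(hub, s) for s in systems]


# Hypothetical lab: names are placeholders, not real products.
instruments = ["plate_reader", "hplc", "sequencer", "flow_cytometer"]
systems = ["eln", "lims", "warehouse"]

print(len(point_to_point(instruments, systems)))  # 12 connectors (4 x 3)
print(len(hub_and_spoke(instruments, systems)))   # 7 adapters (4 + 3)
```

Adding a fifth instrument costs three new connectors in the first design and only one adapter in the second, which is the scalability gap described above.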

There have been numerous attempts to solve scientific data fragmentation. What are they and why have they failed?

A: The first approach that was adopted by many companies was to try and build a data warehouse, aiming to consolidate all their research data in one place. But the main issue with data warehousing is its inflexibility; once you define the data types and schemas, it’s hard to adapt to new scientific discoveries. Research is always evolving, generating new data types and connections between those complex data types. As a result, the warehouse structure quickly becomes outdated. 

Organizations then looked towards data lakes for their flexibility and scalability, capable of handling life sciences big data from different sources without predefined schemas or rigorous data structure. However, data lakes require specific data science skills to retrieve data, which limits their accessibility to non-expert users and means they fail to provide the ease of use and self-service functionality that most scientists need. And finding scientific data within these lakes isn’t usually possible, as the lakes don’t know what a protein is, or what a DNA sequence is. So the bottom line is that most scientists can’t or won’t use a data lake.

This scenario highlights the growing demand for cloud-based scientific data integration solutions that offer drag-and-drop functionality and pre-built connectors for seamless application integration and data migration, facilitating a more user-friendly approach to data consolidation (Read our blog on this: Why Do I Need A Scientific Data Management System).

How do I know if I need to consolidate my lab data?

A: A better question would be ‘why would you ever not need to consolidate your data?’. Unless all research data across your entire organization is already on a single, unified, searchable data platform (such as Sapio LIMS or Sapio ELN), data fragmentation and silos are likely hindering your data quality and decision-making processes. In any company with a history of diverse informatics systems, the need for a centralized data integration solution becomes immediately compelling. 

This is where the adoption of SaaS-based data platforms, offering a range of data integration tools and APIs for connecting various data sources, becomes critical. Such platforms not only facilitate the extraction and loading of data from numerous sources and providers (lab instruments, LIMS, ELN) but also support the transformation and cleansing of data to ensure high data quality and governance, ultimately enabling scientists to make more informed decisions.

What is Sapio Jarvis? And why is it any different to other solutions on offer?

A: Put simply, Sapio Jarvis is a scientific data cloud solution that allows organizations to do two things. On the one hand, it lets them unify all of their data from all their ELNs, LIMS and other research informatics systems across the organization. And then on the other hand, it automatically captures and parses all of their instrument data. But then, more importantly, it also contextualizes all of that data into the samples and the experiments that came from the ELN and LIMS. 
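To give a feel for what “contextualizing” instrument data can mean in practice, here is a minimal Python sketch. It is not Jarvis code or its API: the export format, column names, sample IDs, and the `contextualize` helper are all assumptions made for illustration. It parses a small instrument export and attaches each reading to the sample and experiment it came from, flagging readings with no known sample.

```python
import csv
import io

# Hypothetical instrument export; the column names are assumptions.
RAW_EXPORT = """sample_id,well,absorbance
S-001,A1,0.412
S-002,A2,0.388
S-999,A3,0.105
"""

# Hypothetical sample records as they might exist in an ELN or LIMS.
SAMPLES = {
    "S-001": {"experiment": "EXP-17", "compound": "CMP-A"},
    "S-002": {"experiment": "EXP-17", "compound": "CMP-B"},
}


def contextualize(raw_export, samples):
    """Parse instrument rows and attach sample/experiment context to each."""
    matched, unmatched = [], []
    for row in csv.DictReader(io.StringIO(raw_export)):
        reading = {"well": row["well"], "absorbance": float(row["absorbance"])}
        context = samples.get(row["sample_id"])
        if context is None:
            # No known sample for this reading: flag it rather than guess.
            unmatched.append(row["sample_id"])
        else:
            matched.append({"sample_id": row["sample_id"], **context, **reading})
    return matched, unmatched


matched, unmatched = contextualize(RAW_EXPORT, SAMPLES)
```

Keeping unmatched readings in a separate list, rather than silently dropping them, is one simple way to make gaps in the sample records visible.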

Ultimately, it’s an application that’s designed for scientists. So having collected all that data, it’s a place that then allows them to explore, search, visualize, and analyze all of their data in a scientifically aware environment. And in terms of what makes Sapio Jarvis different, that last point is really the most important one. It’s designed for scientists rather than IT professionals or data scientists. 

There are two key points here. The first is that Jarvis allows scientists to easily browse and search all their data. And that’s all done via a graphical, ‘no-SQL’ / low-code interface, or even through an AI chat-powered interface (Sapio ELaiN). And then secondly, we bring all the scientific visualization and analysis to the scientist where the data already is, rather than making them download datasets and then jump off to multiple third-party applications like Excel and GraphPad or open-source tools such as PyMOL to analyze their data. This provides a huge increase in scientist satisfaction as well as efficiency.

What are the benefits of taking a science-aware approach to scientific data integration?

A: Fundamentally, any kind of data consolidation solution is not valuable to a scientific organization unless its scientists can and will use it. And that’s the biggest challenge with most data lake style solutions. They may allow data consolidation, but they really do nothing to give scientists a place they can and will go to understand all their project data, to analyze it, and then to make the best-informed project decisions.

Ultimately, many of these data consolidation solutions are really solutions for data scientists or IT professionals. What an end user scientist needs is a really simple place to go and find all of their data without having to know any SQL or any code. And then to be able to visualize and analyze that data in situ where the data already is.

How can a better approach to scientific data management support the use of AI and ML?

A: Effective data management is crucial for leveraging AI and ML in research. High-quality, well-curated data is essential for training accurate models. And don’t just take my word for it – the Pistoia Alliance carried out a survey last year where 58% of the 200 biopharma executives they surveyed said that ‘low-quality and poorly-curated data’ was the biggest impediment to implementing AI and ML at scale in a laboratory environment. 

And that’s exactly what Jarvis is built to support. It allows for the curation of the complete, contextualized data you need to train a machine-learning model that helps scientists make even better informed decisions.

What is the business case for a science-aware approach to scientific data integration?

A: The good news is companies are going to see benefits across multiple teams. And these are measurable business benefits. So just to give you a few examples, if you look first at research teams, they’re going to see hugely increased efficiency for their scientists. So the time they used to spend finding and analyzing the data will be time that they get back, which they can then spend in the lab, acting on decisions and driving their research forward. 

Secondly, if you look at automated labs and core facilities, they can really decrease the turnaround time, because they’re going to completely eliminate all the manual data collection and parsing. And not only that, as soon as you automate data processing, you remove the manual transcription errors that undermine the accuracy of their results.

And then finally, for research IT teams, they’re going to eliminate the entire cost of building and maintaining this complex web of connections between instruments and informatics applications. And again, once you eliminate all of that activity, they can spend time on much more valuable work that drives the company’s goals forward.

How can I get started with consolidating my scientific data?

A: It can seem overwhelming if you look across your entire company, thinking about how you’re going to make a change like this. So what we’ve seen to be really successful in a number of companies is identifying a lab with a specific use case where the scientists are being hindered by not being able to easily search and analyze data, and then starting work with us on that use case. That allows you to really see how Jarvis can be implemented on a larger scale, and understand the value that will come as this approach broadens out to the rest of the organization (For more information, check out the blog ‘A Comprehensive Guide To Lab Data Management’).

To get a demo of Jarvis and to understand how it could be used within your organization, get in touch with us here