In the last two decades, tools like electronic lab notebooks (ELNs) and laboratory information management systems (LIMS) have achieved what Gartner, Inc. calls “the Plateau of Productivity.” These transactional systems have proven so popular that multiple instances of LIMS and ELNs, often from different vendors, have proliferated across and within research organizations. Organizations routinely capture experiments and data with good compliance, but that data now resides in isolated silos that impede scientific decision-making.

Simultaneously, laboratory automation, the growing pace and scale of research, and clinical data science have put in-silico augmented research tantalizingly within reach. Organizations possess the data to explore complex biological mechanisms of action, accelerate the discovery of novel targets, rapidly discover and optimize therapeutics, and fuel artificial intelligence (AI) and machine learning (ML) algorithms. But analyzing that data is difficult because it isn’t findable, accessible, interoperable, and reusable (FAIR – see below) and, most of all, because it lacks the scientific context that gives raw data meaning. Hence the need for an SDMS – a Scientific Data Management System – to help scientists make the most of the data they’ve generated.

In this blog, we look at the challenges life sciences organizations face in empowering scientists to capitalize fully on data, R&D’s most precious resource. We also explore some of the ways you can break down data silos and get the most out of your clinical data, including with our lab data integration solution, Sapio Jarvis. You can also download the PDF white paper version of this blog, which includes a checklist for choosing the right scientific data management system for your needs.

Current Barriers to Scientific Data Synergy

Since 2003, when the completion of the Human Genome Project launched genomics and its related subdisciplines (proteomics, transcriptomics, epigenetics, etc.), there has been a revolution in drug discovery, development, and delivery. This revolution has completely reshaped the way scientists explore biology, study disease mechanisms, identify and pursue targets, and test the impact of their discoveries in the clinic. Personalized medicine is finally a reality, providing new, targeted treatments to patients and opening new lines of inquiry for future drug candidates.

This revolution has also presented research organizations with two broad big data management challenges. First, organizations now generate unprecedented quantities of data, in many types and from many sources. While this creates significant opportunities to inform all aspects of scientific and business decision-making, a critical prerequisite is collecting, parsing, and presenting that data to scientists in actionable ways.

Second, capitalizing on this revolution’s fruits requires data access, especially for scientists seeking to conduct in-silico augmented research such as model-informed drug development (MIDD), ML, and AI. There is no shortage of algorithms and models ready to accept data. The challenge for organizations is finding and preparing all their datasets for efficient and effective data mining by those algorithms and models. Most organizations don’t have a model problem preventing them from running advanced data analytics; they have a data problem.

What Are The FAIR Guiding Principles for Scientific Data?

Current scientific informatics landscapes keep biopharma, clinical research, and clinical labs from fully exploiting the data they possess. Many organizations currently rely on a patched-together assortment of tools and custom connectors to get raw data off instruments and into the disparate ELNs and LIMS that contain context about those datasets. They must then deploy another set of systems and one-to-one integrations to pull data into places where scientists can use it to assess scientific outcomes and support decision-making.

While many ELN and LIMS providers offer applications to facilitate data import and export, these integrations are limited. Organizations often try to build their own integrations, but these can be difficult to implement and expensive to maintain. As a result, crucial organizational assets remain fragmented and dispersed. Without a centralized reference point for easily discovering and utilizing it, data lacks vital scientific context and meaning. Most importantly, it isn’t FAIR: findable, accessible, interoperable, and reusable.
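To make those four letters concrete, here is a minimal sketch of what a FAIR-style metadata record for a single instrument result might look like. The field names, identifiers, and values below are illustrative assumptions, not a standard schema:

```python
# Illustrative FAIR-style metadata for one instrument result.
# All field names and values are hypothetical; a real deployment
# would follow a community-agreed, ontology-backed schema.
fair_record = {
    # Findable: a globally unique, persistent identifier plus rich metadata.
    "id": "https://data.example.org/results/2f9c1e7a",
    "title": "Plasma stability assay, compound ABC-123, run 42",
    # Accessible: retrievable over a standard protocol with clear access rules.
    "access_url": "https://data.example.org/api/results/2f9c1e7a",
    "access_policy": "internal-research",
    # Interoperable: open formats and shared vocabularies.
    "format": "text/csv",
    "assay_type": "plasma stability",  # ideally an ontology term, not free text
    # Reusable: provenance and usage terms travel with the data.
    "provenance": {
        "instrument": "LCMS-07",
        "experiment_id": "ELN-2024-0187",
        "sample_id": "S-99812",
        "created": "2024-05-14T09:32:00Z",
    },
    "license": "internal-use-only",
}
```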

Organizations have used various informatics solutions to bridge the gap between disparate transactional applications and to FAIRify their data. Scientific data management systems (SDMSs) capture and store instrument data. Data warehouses contain structured, filtered data processed so it can be queried en masse.

Because data warehouses can be rigid and hard to adapt to new data types, organizations have turned to data lakes, which store data in its native format, allowing greater flexibility in integrating it with various applications, including advanced analytics tools. Warehouses, lakes, and lakehouses, though, are merely repositories for data and aren’t equipped with interfaces and tools suitable for scientists. Extensive (and expensive) coding and spot integrations are necessary to get data out of these systems and into the hands of scientists to support their work.

Understanding the Scientific Data Management Maturity Model

When it comes to managing and capitalizing on scientific data, too many pharma and research organizations are stuck at the bottom of an adoption maturity model. They have acquired extensive infrastructure for organizing and managing data, yet they cannot consolidate and unify access to that data in a way that is actionable for scientists or for research and clinical project team leaders.

Without this access, scientists are unable to query all their organization’s data intelligently and thoughtfully, asking questions ranging from simple queries (“Which studies ran with which subjects?”) to complex interrogations crossing multiple siloed source systems (“What are all the experiments we’ve run involving samples of particular oligonucleotides?” or “What assays have been run on my project compounds and which ones show my desired selectivity and ADME profiles?”).
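To see why unified access matters, note how the oligonucleotide question above collapses into an ordinary query once data from the siloed systems is consolidated under a common schema. The sketch below assumes a hypothetical unified store; every table and column name is invented for illustration:

```python
import sqlite3

# Hypothetical unified schema: experiments and samples consolidated
# from multiple ELN/LIMS sources, each row tagged with its source system.
conn = sqlite3.connect("unified_research_data.db")

# "What are all the experiments we've run involving samples of a
# particular oligonucleotide?" becomes a simple join.
rows = conn.execute(
    """
    SELECT DISTINCT e.experiment_id, e.title, e.source_system
    FROM experiments e
    JOIN samples s ON s.experiment_id = e.experiment_id
    WHERE s.entity_type = 'oligonucleotide'
      AND s.entity_name = ?
    """,
    ("ASO-1234",),  # hypothetical oligo identifier
).fetchall()

for experiment_id, title, source_system in rows:
    print(experiment_id, title, source_system)
```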

Without a holistic view of their data, organizations struggle to use it in enterprise decision-making, including resource planning, utilization and management, compliance, and quality control.

Managers can struggle to answer simple questions such as “How many flow cytometry runs did we do last week across the whole company, and what was the average turnaround time?” And, at the highest level, data that isn’t FAIR can’t be quickly and effectively used to fuel advanced analytics.

Few solutions exist to address these issues and advance organizations along the adoption pathway. Some applications focus on gathering and collating data. Others serve as background infrastructure that passes data through to ELNs and LIMS rather than capturing it so it can be deployed to inform scientific and business decisions. Many systems fail to provide an intuitive place for scientists to view and interact with data, and they lack a scientifically aware analytics layer. Those that do supply analytics often call out to third-party tools, adding more data processing hurdles for users to jump through.


What Is A Scientific Data Cloud?

A truly scientific, science-aware data cloud does more than collect, store, and parse raw data. It automatically syncs data with the information stored in transactional applications such as ELNs and LIMS, connecting instrument results to the valuable context that helps drive decision-making.

A scientific data cloud displays scientific entities in a meaningful way. Compounds render as compounds, plasmids show their annotations, and proteins display as 3D objects that can be interrogated. Science-based searches should enable scientists to search by molecular substructure as well as by chemical properties or assay results. Most importantly, a scientific data cloud should erase boundaries between systems and present all relevant context on a scientific entity, including information about the samples, experiments, and projects in which data was obtained, used, and consumed. And that information and context should be accessible regardless of which LIMS, ELN, or instrument file format the data is stored in.
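As a concrete example of science-based search, substructure matching treats a compound as chemistry rather than text. Here is a minimal sketch using the open-source RDKit toolkit over an invented compound set; it illustrates the concept, not any particular vendor’s implementation:

```python
from rdkit import Chem

# A small, invented set of registered compounds (SMILES strings).
compounds = {
    "CMPD-001": "CC(=O)Oc1ccccc1C(=O)O",       # aspirin
    "CMPD-002": "CCN(CC)CCNC(=O)c1ccc(N)cc1",  # an amide, no acid group
    "CMPD-003": "c1ccccc1",                    # benzene
}

# Query: every compound containing a carboxylic acid substructure.
# A plain text search for "C(=O)O" would also hit the ester in aspirin;
# SMARTS matching understands the difference.
query = Chem.MolFromSmarts("C(=O)[OH]")

hits = [
    cmpd_id
    for cmpd_id, smiles in compounds.items()
    if Chem.MolFromSmiles(smiles).HasSubstructMatch(query)
]
print(hits)  # ['CMPD-001']
```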

With a scientific data cloud, it’s not about workflows and getting data in, which is the transactional focus of ELN and LIMS applications and many analytics tools. It’s about providing a single place to get insights out—capitalizing on current and historical data to empower scientists in making data-driven scientific and enterprise-wide decisions and supporting data sharing. Notably, a scientific data management system does this in a way that flexibly adapts to constantly evolving data as science progresses.

Introducing Jarvis℠ – The Next-Generation Scientific Data Management System

In the Marvel Cinematic Universe, Tony Stark, the industrialist inventor who becomes Iron Man, creates an AI that serves as his virtual assistant. Its primary duties are running systems across Stark’s business and controlling his Iron Man armor. He calls this AI J.A.R.V.I.S., said to stand for “Just A Rather Very Intelligent System.”

Like its cinematic namesake, Jarvis from Sapio Sciences helps scientists gain access to and control of the scientific data they need to make decisions. Built on Sapio’s low-code/no-code platform, Jarvis’s primary innovation is its ability not just to collect and parse data off instruments but also to sync that data automatically with vital contextual data on samples, specimens, experiments, and projects contained in disparate ELN and LIMS applications (any applications, not just those developed by Sapio). This is accomplished through no-code pipeline rules built into Jarvis, easily configured to detect new raw data and load it into Jarvis, where it is parsed and synced with the proper context about that data’s use in projects, studies, subjects, experiments, and samples.
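Conceptually, a pipeline rule of this kind pairs a trigger (new raw data appears) with parsing and a context lookup against the source ELN or LIMS. The sketch below illustrates that detect, parse, and sync pattern in generic Python; it is not Jarvis’s actual API, and every name in it is hypothetical:

```python
import csv
from pathlib import Path

WATCH_DIR = Path("/data/instruments/incoming")  # hypothetical drop folder

def parse_instrument_csv(path: Path) -> list[dict]:
    """Parse a raw instrument CSV file into structured result records."""
    with path.open(newline="") as fh:
        return list(csv.DictReader(fh))

def lookup_context(sample_id: str) -> dict:
    """Stand-in for a query against the ELN/LIMS that registered the sample;
    a real pipeline would call the source application's API here."""
    return {"project": "PRJ-42", "experiment": "EXP-0187", "sample": sample_id}

def run_pipeline_once() -> None:
    # Detect: find new raw files in the watched location.
    for raw_file in WATCH_DIR.glob("*.csv"):
        # Parse: turn the raw file into structured records.
        for record in parse_instrument_csv(raw_file):
            # Sync: attach scientific context from the source system,
            # then hand the enriched record off for indexing and search.
            enriched = {**record, **lookup_context(record["sample_id"])}
            print("indexed:", enriched)  # stand-in for storage/indexing

if __name__ == "__main__":
    run_pipeline_once()
```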

Data is not only usable within a scientific team’s target applications but also accessible within the Jarvis knowledge graph and through configurable dashboards. Tracking new data types is easy: Jarvis comes with over 200 parsers for common instrument files, and new parsers can be rapidly developed right in the Jarvis user interface. Using the same rules interface, contextualized instrument results can even be inserted back into the source ELN or LIMS, connected to the samples for which they were generated.

Importantly, Jarvis is designed to be used by all scientists, not just data scientists. In addition to capturing structured and unstructured text and numeric data and making it searchable within the Jarvis knowledge graph, Jarvis offers ways to visualize scientific data objects such as compounds (2D), biologics (3D), and annotated plasmids. Charts and tables are built into the system, and Jarvis provides meaningful statistical and scientific analytics and AI directly in the platform where the data resides, so there’s no more moving data in and out of data warehouses or data lakes. With Jarvis, scientists have all the data they need, all in one place. And it’s easy to provide access to new data and capabilities straight from a common, scientist-friendly user interface, with no coding needed. With built-in science tools for a range of analytics, organizations can empower their scientists to ask powerful questions of their data and, most crucially, find answers that may lead to the next research breakthrough.

If you want to understand how our scientific data management system, Jarvis, can help you make better use of your scientific data, arrange a demo call with us.