Developing a platform to integrate and process different types of data
During work packages 1 and 2, clinical and omics data will be collected from patients suffering from bladder cancer or multiple myeloma. In work package 3, a platform will be created to integrate these different types of data. The goal is a flexible system in which each task is executed by a dedicated functional module, while the modules together form a manageable and cohesive whole.
Combining clinical data and omics data
To obtain a truly integrated picture of the diseases under study, it is vital to incorporate as much information as possible on diagnostic accuracy, disease progression, treatment response, etc. To do so, we will collect data that can be roughly subdivided into two categories.
First, clinical data sources, also referred to as non-omics data, will offer insights into the patient's medical history as well as the clinical manifestation of disease progression and treatment efficacy. These sources include clinical data (typically captured in electronic health records, containing patient medical history, allergies, vaccination dates, treatment plans, lab test results, etc.), patient-reported data (containing information on patient-reported outcomes, as well as more general physical and social functioning, productivity and quality of life), and visual data (including CT scans, MRI images, etc.).
Secondly, omics data present the opportunity to frame the clinical and patient-reported data in a biological context. They will provide insights into the complexity of disease onset, progression and treatment response. The term omics refers to the scientific techniques that characterize large sets of biological molecules in order to relate specific observations, such as treatment response, to a patient’s biology (Conesa and Beck, 2019). The most important omics techniques are genomics (which describes the genetic information of the patient), transcriptomics (which represents the activity of genes in a given situation) and proteomics (which generates an overview of the proteins in a particular context).
The federated data platform will serve many functions
The data platform will provide several functions: data integration, storage, distribution, accessibility, processability and management. The technical partners in this project each bring ready-to-use solutions and expertise that will be incorporated into the platform. The modularity of the system will keep it flexible when extra functions need to be added, and will allow it to be scaled up and translated more easily to other application domains and institutions.
Integration of all data and software solutions is another key task of this work package. The general structure of the platform will become clear from the individual contributions of each partner. It will show how data will be gathered, standardized, processed and integrated in a data warehouse for each of the participating medical institutions. Incompatibilities between data sources will have to be overcome. In work package 4, the data platform will allow machine learning algorithms to find novel connections between variables and present improved pathways for patient care.
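To make the standardization step concrete, the following is a minimal sketch of how two incompatible sources could be mapped onto one common record before loading into a warehouse. It is an illustration only, not the ATHENA implementation: the field names (`pid`, `hb`, `patient`, `hemoglobin_g_l`), the `LabResult` schema and the choice of hemoglobin as the example test are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LabResult:
    """Hypothetical common schema: one lab value per patient, in g/L."""
    patient_id: str
    test: str
    value_g_per_l: float


def from_source_a(row: Dict) -> LabResult:
    # Assumption: source A stores hemoglobin in g/dL under the key "hb".
    # 1 g/dL equals 10 g/L, so the value is multiplied by 10.
    return LabResult(str(row["pid"]), "hemoglobin", float(row["hb"]) * 10.0)


def from_source_b(row: Dict) -> LabResult:
    # Assumption: source B already reports hemoglobin in g/L.
    return LabResult(str(row["patient"]), "hemoglobin", float(row["hemoglobin_g_l"]))


def integrate(rows_a: List[Dict], rows_b: List[Dict]) -> List[LabResult]:
    """Map both sources onto the common schema and concatenate them."""
    return [from_source_a(r) for r in rows_a] + [from_source_b(r) for r in rows_b]


if __name__ == "__main__":
    records = integrate(
        [{"pid": 1, "hb": 12.0}],
        [{"patient": "B7", "hemoglobin_g_l": 130}],
    )
    for record in records:
        print(record)
```

Keeping one small adapter per source means a new institution or data type only requires a new adapter, which fits the modular design described above.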
The data platform will be a federated system, meaning that it will enable collectively agreed analyses of data across multiple partners with differing internal structures. Such a federated approach is especially useful for complex statistical research on sensitive medical data, which must comply with privacy standards. As such, ATHENA ensures a secure system in which the medical institutions retain full ownership of their patients’ data and privacy is preserved, allowing only the analyses agreed upon with the project partners.
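The federated principle can be illustrated with a small sketch in which each institution computes a local summary and only that summary is shared for a pooled statistic. This is a simplified example under stated assumptions, not the ATHENA design: the allowlist `APPROVED_ANALYSES`, the analysis name `mean_age` and the functions `local_summary` and `federated_mean` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical allowlist of analyses agreed upon by all partners.
APPROVED_ANALYSES = {"mean_age"}


@dataclass(frozen=True)
class LocalSummary:
    """Aggregate that leaves a site; it contains no patient-level rows."""
    count: int
    total: float


def local_summary(analysis: str, values: Iterable[float]) -> LocalSummary:
    """Runs inside one institution, on data that never leaves it."""
    if analysis not in APPROVED_ANALYSES:
        raise PermissionError(f"analysis {analysis!r} has not been agreed upon")
    data = list(values)
    return LocalSummary(count=len(data), total=float(sum(data)))


def federated_mean(summaries: List[LocalSummary]) -> float:
    """Runs at the coordinator; combines per-site aggregates only."""
    n = sum(s.count for s in summaries)
    if n == 0:
        raise ValueError("no observations reported by any site")
    return sum(s.total for s in summaries) / n


if __name__ == "__main__":
    site_a = local_summary("mean_age", [60, 70])
    site_b = local_summary("mean_age", [80])
    print(federated_mean([site_a, site_b]))
```

In this sketch the coordinator sees only a count and a sum per site, and a request for an analysis outside the agreed list is rejected locally before any data are touched.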
Experienced partners guarantee a smooth data flow
Gathering and processing of clinical and patient-reported data will be performed by Janssen and Inovigate. Visual data will be managed by Robovision. Omics data will be governed by Janssen (subcontractor Illumina) and imec. The processed data will be integrated into a data platform for each of the participating medical institutions. In work package 4, machine learning algorithms will be used to convert these data into usable patient care recommendations.
Conesa A., Beck S. 2019. Making multi-omics data accessible to researchers. Sci Data 6, 251.