June 12th 2020
Topic: The PhenoMeNal cloud-aware e-Infrastructure
Dr. Pablo Moreno, Project Lead for Expression Atlas Data Production, EMBL-EBI, Cambridge, UK
Dr. Steffen Neumann, Head of Bioinformatics and Scientific Data, Leibniz Institute of Plant Biochemistry, Halle, Germany
The PhenoMeNal consortium.
Metabolomics, a high-throughput molecular phenotyping technique, is growing across all domains of the life sciences. Data processing and analysis are often performed with many programs on conventional computing solutions, with little standardisation to support interoperable and reproducible research. As data sizes increase, this becomes intractable on desktop computers. Cloud computing allows resources (virtual servers, networks, storage) to be instantiated on demand, and users pay only for the time the resources are used. Microservices can run in clouds that dynamically grow or shrink, enabling applications to be scaled. We developed a robust and performant data analysis infrastructure that integrates all necessary components. The software tools are encapsulated as Docker containers. To automate the instantiation of this cloud-portable, microservice-based system, the PhenoMeNal project developed a Virtual Research Environment (https://portal.phenomenal-h2020.eu/) that can be deployed on some of the largest public cloud providers, including Amazon Web Services, Microsoft Azure and Google Cloud Platform, as well as on OpenStack-based scientific and private clouds. Kubernetes (https://kubernetes.io/) is used for container orchestration in the cloud. Galaxy (https://galaxyproject.org/) serves as the interface to individual tools and lets users share workflows and analysis histories. Together, we achieved a complete integration of several major metabolomics software suites, resulting in a turn-key workflow for mass-spectrometry-based metabolomics. We will also discuss how the Galaxy-Kubernetes integration has evolved beyond the lifetime of PhenoMeNal through different projects and collaborations with the Galaxy community.