Community-Driven Data Science to Advance Microbiome Research

Author:
DOE/Lawrence Berkeley National Laboratory

Date:
08/13/2019

The National Microbiome Data Collaborative will develop an open-access framework for harnessing microbiome data to accelerate discoveries

The National Microbiome Data Collaborative (NMDC), a new initiative aimed at empowering microbiome research, is gearing up for its pilot phase after receiving $10 million from the U.S. Department of Energy (DOE) Office of Science. Spearheaded by Lawrence Berkeley National Laboratory (Berkeley Lab), in partnership with Los Alamos (LANL), Oak Ridge (ORNL), and Pacific Northwest (PNNL) national laboratories, the NMDC will leverage DOE's existing data-science resources and high-performance computing systems to develop a framework that facilitates more efficient use of microbiome data for applications in energy, environment, health, and agriculture.

Nearly every ecosystem and organism on Earth hosts a diverse community of microorganisms: its microbiome. Yet we know little about the functions of individual microbes, let alone how they interact with each other, their hosts, or their environments, and how their activity varies over time or in response to perturbations. The past decade has seen tremendous advances in genome and metagenome DNA-sequencing technologies, which have led to an unprecedented volume of microbiome data being generated. However, further progress in the field has been hindered by the lack of computational infrastructure for processing and performing integrative analyses of these and other microbiome-relevant data.

The NMDC, led by the DOE Joint Genome Institute (JGI)'s Emiley Eloe-Fadrosh, will tackle this data integration challenge by developing a community-centric framework based on large-scale, collaborative partnerships that draw on the capabilities, expertise, and resources of four DOE national laboratories. The guiding principles at the initiative's core are: making data findable, accessible, interoperable, and reusable (FAIR); connecting data and compute resources; and community engagement that supports open science and shared ownership.

"While this pilot project is led by DOE national labs, the data sets, resources, and community opportunities are open to all microbiome researchers, regardless of funding, institute, or domain," said NMDC Deputy Lead and JGI Director Nigel Mouncey.

The NMDC will enable capabilities not currently available to the microbiome research community, including:

- Accessing, analyzing, and integrating multi-omics data sets (metagenome, metatranscriptome, metaproteome, metabolome, and environmental data) to discover community dynamics, metabolic networks, and other microbe-microbe, microbe-host, and microbe-environment interactions.

- Accelerating search through linked data using existing and enhanced ways to describe microbiome data sets, diversifying the sample space and depth for new discoveries.

- Aggregating and viewing both taxonomic and functional profiles of unassembled and assembled metagenome sequence data to gain new insights into microbiome composition and function.

Background

In 2015, the White House Office of Science and Technology Policy (OSTP) solicited input from the microbiome research community on the key challenges facing the field and how best to address them. Berkeley Lab submitted a coordinated Lab-wide response, and a number of related papers were published thereafter, including a Policy Forum article in Science, on which Berkeley Lab's Paul Alivisatos, Eoin Brodie, and Mary Maxon were co-authors, and a Trends in Microbiology article by the JGI's Nikos Kyrpides, Natalia Ivanova, and Eloe-Fadrosh that introduced the notion of the collaborative and cited DOE's long history of jumpstarting innovative data projects.

The next year, the OSTP, in collaboration with federal agencies and private-sector stakeholders, launched the National Microbiome Initiative focused on three main priorities: supporting interdisciplinary research, developing platform technologies, and expanding the microbiome workforce. This prompted the formation of the Microbiome Interagency Working Group (MIWG). Co-chaired by the DOE, this consortium of representatives from 20-plus National Science and Technology Council (NSTC) departments and agencies was tasked with developing a Federal Strategic Plan for microbiome research.

The MIWG released its Interagency Strategic Plan for Microbiome Research in April 2018, outlining areas of focus for strategic investments over the next five years, which included the development of platform technologies that support open and transparent data through a user-friendly, robust, integrated system with expert curation.

Following a series of workshops, professional society meetings, online conferences, and visits to Washington, D.C., the FY19 Energy and Water Appropriations Bill included $10 million to "begin establishment of a national microbiome database." The NMDC was formally unveiled to the research community at a June 22 town hall held during the American Society for Microbiology's 2019 meeting in San Francisco. Funding for NMDC commenced July 1.

Phase One

The first phase of the project, a 27-month pilot, will focus on four aims: designing metadata standards; designing and deploying data-processing workflows; facilitating data integration and access; and delivering multiple opportunities for community engagement. Berkeley Lab houses several key resources for this pilot phase, most notably two data analysis platforms (the Integrated Microbial Genomes & Microbiomes and DOE Systems Biology Knowledgebase), data provided by the JGI, and data standards through participation in the Gene Ontology Consortium. Importantly, Berkeley Lab will lead the first phase of NMDC and execute all related activities according to our commitment to diversity, equity, inclusion, and accountability.

Aim 1 leads Alison Boyer (ORNL), Lee Ann McCue (PNNL), and Chris Mungall (Berkeley Lab) will oversee the application of existing ontology mapping tools and curation resources to automate annotation of metadata to comply with FAIR principles. Aim 2 leads Patrick Chain (LANL) and Shane Canon (Berkeley Lab) will guide the design of workflows that leverage high-performance computing systems to generate integrated, interoperable, and reusable microbiome data. Aim 3 lead Kjiersten Fagnan (Berkeley Lab) will spearhead the development of a scalable infrastructure and web-based graphical user interface to enable scientists to explore and interact with the NMDC data.

"The study of microbiomes is currently one of the most promising arenas for discoveries to advance human health and environmental science. We are just beginning to understand the implications of this new frontier," said FAIR strategic team lead Stanton Martin (ORNL), who will provide guidance and support across Aims 1-3. "I am excited to be part of the NMDC project, which will serve as an integral public resource for data relating to microbiomes."

Aim 4 lead Elisha Wood-Charlson (Berkeley Lab) is responsible for the NMDC's communication strategy for raising community awareness and engagement. Upcoming events include an October 2019 workshop on Merging Ontologies, a December 2019 American Geophysical Union (AGU) session on Creating Data Synchronicity Across Earth Microbiome Research (FAIR data), and a related session at the Ocean Sciences Meeting in February 2020.

EurekAlert!, the online, global news service operated by AAAS, the science society: https://www.eurekalert.org/pub_releases/2019-08/dbnl-acd081219.php
