W12 - Workshop on Recent Computational Advances in Metagenomics

The workshop will take place on September 7 at the PMC (Palais de la Musique et des Congrès, Strasbourg Convention Centre, place de Bordeaux) in the room "Salon Orangerie".

Preliminary program:

09:00 - 09:10 Welcome

09:10 - 10:10 Keynote Steven Kembel (University of Québec at Montréal): "Using metagenomics to model community assembly in the plant microbiome"

10:10 - 10:45 Pierre Péricard: "SortMeRNA 2: ribosomal RNA classification for taxonomic assignation" (see abstract and co-authors below)

10:45 - 11:15 Coffee break

11:15 - 11:50 Simon Foucart: "Quikr & WGSQuikr: Rapid Bacterial Community Reconstruction Via Compressive Sensing" (see abstract and co-authors below)

11:50 - 12:25 Kévin Vervier: "Towards Large-scale Machine Learning for Metagenomics Sequence Classification" (see abstract and co-authors below)

12:25 - 13:00 Frédéric Mahé: "Swarm: robust and fast clustering method for amplicon-based studies" (see abstract and co-authors below)

13:00 - 14:15 Lunch

14:15 - 15:15 Keynote Aaron Darling (University of Technology, Sydney): "Toward resolving the fine scale genetic structure of microbial populations: a metagenomic Hi-C approach"

15:15 - 15:50 Edi Prifti: "Quantitative metagenomics: from reads to biomarkers" (see abstract and co-authors below)

15:50 - 16:20 Coffee break

16:20 - 16:55 Frédéric Plewniak: "Metagenome-scale metabolic network reconstruction" (see abstract and co-authors below)

16:55 - 17:30 Clovis Galiez: "Identifying distant homologous viral sequences in metagenomes using protein structure information" (see abstract and co-authors below)

17:30 Closing remarks

 

 

Objectives:

This workshop aims to promote discussion and collaboration between biologists (modelers), computer scientists and applied mathematicians involved in metagenomics and/or metatranscriptomics studies, whether on the bioinformatics or the statistical side of such analyses.

Metagenomics studies refer to analyses based on high-throughput sequencing of environmental samples and microbial ecosystems. Both marker-gene (16S, 18S, ITS, ...) and whole-genome strategies will be addressed, covering a wide array of questions that range from quantifying microbial diversity in order to find structuring factors to assessing the functional role of microbial communities.

The workshop will provide an overview of the state-of-the-art methods currently used in metaomics, including comparative metagenomics and metatranscriptomics. At the other end of the spectrum, case studies will illustrate how these methods produce biological insight.

Scope:

The program of this workshop will consist mainly of presentations of refereed papers (30 minutes) and 2 invited talks (50 minutes). Contributions are welcome on all aspects of the bioinformatics and statistical analysis of metaomics datasets, including, but not limited to:

  • data reduction,
  • comparative metaomics,
  • models for community assembly,
  • informal methods (that could be candidates for formalization),
  • challenges for tackling the complexity of metaomics datasets.

Keynote speakers:

  • Steven Kembel (University of Québec at Montréal): "Using metagenomics to model community assembly in the plant microbiome"
  • Aaron Darling (University of Technology, Sydney): "Toward resolving the fine scale genetic structure of microbial populations: a metagenomic Hi-C approach"

Accepted contributions:

  • Quikr & WGSQuikr: Rapid Bacterial Community Reconstruction Via Compressive Sensing

D. Koslicki, S. Foucart and G. Rosen

Short abstract:

Many metagenomic studies compare hundreds to thousands of environmental and health-related samples by extracting and sequencing their DNA. However, one of the first steps, determining which bacteria are actually in the sample, can be computationally time-consuming, since most methods classify each individual read out of tens to hundreds of thousands of reads. We introduce Quikr: a QUadratic, K-mer based, Iterative, Reconstruction method which computes a vector of taxonomic assignments and their proportions in the sample using an optimization technique motivated by the mathematical theory of compressive sensing. On both simulated and actual biological data, we demonstrate that Quikr is typically more accurate, as well as typically orders of magnitude faster, than the most commonly used taxonomic assignment techniques for both whole-genome data (Metaphyler, Metaphlan) and 16S rRNA data (the Ribosomal Database Project's Naive Bayesian Classifier). We also show that, in general, nonnegative L1 minimization can be reduced to a simple nonnegative least squares problem.
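
As a rough illustration of the idea in this abstract (not the authors' implementation), the sketch below builds k-mer frequency vectors for a few toy reference sequences and estimates their proportions in a toy sample by nonnegative least squares. The function names, the toy sequences, and the multiplicative-update solver are assumptions made for this example; the sparsity-promoting L1 term mentioned in the abstract is omitted.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequences, k):
    """Return the normalized k-mer frequency vector of a set of DNA sequences."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total for km in kmers]


def _dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def nonnegative_least_squares(columns, b, iterations=2000):
    """Approximately minimize ||A x - b||^2 subject to x >= 0.

    `columns` holds the columns of A. Multiplicative updates keep x
    nonnegative; they assume A and b have nonnegative entries, which
    holds for k-mer frequencies. This is a simple stand-in solver,
    chosen for brevity rather than speed.
    """
    n = len(columns)
    gram = [[_dot(columns[i], columns[j]) for j in range(n)] for i in range(n)]
    target = [_dot(columns[i], b) for i in range(n)]
    x = [1.0 / n] * n
    for _ in range(iterations):
        denom = [_dot(gram[i], x) for i in range(n)]
        x = [x[i] * target[i] / denom[i] if denom[i] > 0 else 0.0
             for i in range(n)]
    return x


# Toy data (hypothetical): three "reference genomes" and a 3:1 mixture of reads.
references = ["AAAAAAAACC", "GGGGGGGGTT", "ACGTACGTAC"]
reads = ["AAAAAAAACC"] * 3 + ["GGGGGGGGTT"]

reference_columns = [kmer_frequencies([ref], 2) for ref in references]
sample_vector = kmer_frequencies(reads, 2)
proportions = nonnegative_least_squares(reference_columns, sample_vector)
print([round(p, 2) for p in proportions])
```

The point of the sketch is that the whole sample is summarized once as a single k-mer vector, so the cost of the estimate does not grow with a per-read classification step.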

  • Towards Large-scale Machine Learning for Metagenomics Sequence Classification

K. Vervier, J.-P. Vert, M. Tournoud, J.-B. Veyrieras and P. Mahé

Short abstract:

We investigated the potential of modern, large-scale SVM implementations for taxonomic binning and compared them to similarity-based approaches, like BLAST and TMAP. In practice, compositional approaches offer a significant gain in classification time over similarity-based approaches while achieving comparable classification accuracy. Compositional approaches must be trained on a set of sequences with known taxonomic labels, typically obtained by sampling fragments from reference genomes.

We considered a reference database with 356 complete genome sequences from 52 bacterial species to mimic the expected flora of a human respiratory sample and simulated test sets of around 130k Roche 454 and IonTorrent PGM reads. We observed that increasing the number of training fragments (up to five million) and using longer k-mers (up to k=15) improved the accuracy of SVM models to values similar to TMAP. In terms of speed, the best SVM was at least 17 times faster than TMAP and took no more than 3 minutes to classify the 130k test sequences on a single core. These first results demonstrate the potential of SVM-based methods with massive training sets for sequence classification in metagenomics.

  • Swarm: robust and fast clustering method for amplicon-based studies

F. Mahé, T. Rognes, C. Quince, C. de Vargas and M. Dunthorn

Short abstract:

Popular de novo amplicon clustering methods suffer from two fundamental flaws: arbitrary global clustering thresholds, and input-order dependency induced by centroid selection. Swarm was developed to address these issues by first clustering nearly identical amplicons iteratively using a local threshold, and then by using clusters' internal structure and amplicon abundances to refine its results. This fast, scalable, and input-order independent approach reduces the influence of clustering parameters and produces robust operational taxonomic units, improving the amount of meaningful biological information that can be extracted from amplicon-based studies.
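
The first phase described above, growing clusters iteratively with a local threshold, can be sketched as follows. This is a simplified illustration, not Swarm's actual code: the function names, the toy amplicons, and the use of plain Levenshtein distance are assumptions, and the refinement phase based on cluster structure and abundances is left out.

```python
from collections import deque


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def cluster_amplicons(abundances, d=1):
    """Group amplicons by iterative growth with a local threshold d.

    `abundances` maps each unique amplicon sequence to its read count.
    Seeds are taken by decreasing abundance (ties broken by sequence),
    so the result does not depend on the order of the input.
    """
    unassigned = sorted(abundances, key=lambda s: (-abundances[s], s))
    clusters = []
    while unassigned:
        seed = unassigned.pop(0)
        cluster, queue = [seed], deque([seed])
        while queue:
            member = queue.popleft()
            # Link every remaining amplicon within d differences of this member.
            linked = [s for s in unassigned if edit_distance(member, s) <= d]
            for s in linked:
                unassigned.remove(s)
                cluster.append(s)
                queue.append(s)
        clusters.append(cluster)
    return clusters


# Toy data (hypothetical sequences and read counts).
amplicons = {"ACGTACGT": 50, "ACGTACGA": 7, "ACGTACCA": 2,
             "TTTTGGGG": 30, "TTTTGGGC": 1}
clusters = cluster_amplicons(amplicons)
print(clusters)
```

In this toy input, "ACGTACCA" differs from the seed "ACGTACGT" at two positions but still joins its cluster through the intermediate "ACGTACGA", which is what distinguishes a local threshold from a global one.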

  • SortMeRNA 2: ribosomal RNA classification for taxonomic assignation

E. Kopylova, L. Noé, P. Pericard, M. Salson and H. Touzet

Short abstract:

We have developed SortMeRNA, a software tool designed to filter ribosomal reads from metatranscriptomic or metagenomic data. It can handle large data sets and sort out all fragments matching a database of annotated ribosomal RNA sequences from the three-domain system, with high sensitivity and a low running time.

We propose a new version, SortMeRNA2, with extended functionalities for improved data analysis. Most importantly, it can now perform sequence alignments to any ribosomal RNA database, which allows the user to study the taxonomic content of a microbial sample. To that end, we have developed an alignment strategy based on approximate seeds and seed extension using a variant of the Longest Increasing Subsequence. SortMeRNA2 also evaluates the statistical significance of each alignment using its E-value, which gives the program high accuracy.
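
The abstract does not detail which variant of the Longest Increasing Subsequence is used, so the sketch below only shows the textbook algorithm applied to seed chaining: given seed hits as (read position, reference position) pairs, it keeps a largest subset whose positions increase in both sequences. The function names and the toy hits are assumptions made for this example.

```python
from bisect import bisect_left


def longest_increasing_subsequence(values):
    """Return the indices of one longest strictly increasing subsequence."""
    tails = []        # tails[k]: index of the smallest tail of a run of length k + 1
    tail_values = []  # values at those indices, kept sorted for binary search
    parent = [None] * len(values)
    for i, v in enumerate(values):
        k = bisect_left(tail_values, v)
        if k == len(tails):
            tails.append(i)
            tail_values.append(v)
        else:
            tails[k] = i
            tail_values[k] = v
        parent[i] = tails[k - 1] if k > 0 else None
    # Walk the parent links back from the end of the longest run.
    indices = []
    i = tails[-1] if tails else None
    while i is not None:
        indices.append(i)
        i = parent[i]
    return indices[::-1]


def chain_seeds(seed_hits):
    """Keep a largest collinear subset of (read_pos, ref_pos) seed hits."""
    # Sort by read position; on ties, larger reference position first, so a
    # strictly increasing run can use at most one hit per read position.
    hits = sorted(seed_hits, key=lambda hit: (hit[0], -hit[1]))
    kept = longest_increasing_subsequence([ref for _, ref in hits])
    return [hits[i] for i in kept]


# Toy seed hits (hypothetical): two of them fall off the main diagonal.
seed_hits = [(0, 100), (5, 105), (10, 40), (15, 115), (20, 120), (25, 60)]
chain = chain_seeds(seed_hits)
print(chain)
```

The retained chain is the set of seeds that a subsequent extension step would try to join into a single alignment.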

  • Quantitative metagenomics: from reads to biomarkers

E. Prifti, E. Le Chatelier, N. Pons, M. Almeida, A. Ghozlane, F. Plaza, N. Aram Gaye, P. Leonard, J.-M. Batto and D. Erlich

Short abstract:

The study of complex microbial ecosystems has advanced considerably with the advent of metagenomics. The role of gut microbiota in human health and disease has received unprecedented attention over the past few years, and several complex chronic diseases have been associated with gut microbiota. Even though the number of metagenomics data-mining tools is growing, many issues concerning data processing and statistical analysis are still to be tackled. Here we discuss our experience with different data processing techniques and analytical approaches that have been proposed or adapted to explore quantitative metagenomics data in order to identify gut microbial biomarkers associated with complex diseases. We illustrate the application of these approaches, implemented in a suite of tools (Meteor Studio, MetaOMineR and Metaprof), to real gut microbiome data from different studies.

  • Identifying distant homologous viral sequences in metagenomes using protein structure information

M. Boccara, M. Carpentier, J. Chomilier, F. Coste, C. Galiez, J. Pothier and A. Veluchamy

Short abstract:

It is estimated that marine viruses kill about 20% of the ocean biomass every day. Identifying them in water samples is thus a biological issue of great importance. The metagenomic approach to virus identification is challenging: standard homology searches usually fail to identify viruses, mainly because of the large number of mutations in their sequences.
The PEPS VAG project aims at establishing a novel methodology that uses protein structures as additional information in order to annotate metagenomes without relying only on sequence homology.
In the context of the first experiments made on one metagenome of the TARA Ocean Project, we use the structures of capsid proteins to infer signatures in the amino-acid sequence that are characteristic of their fold. We will present here the methodology, the first experiments and the ongoing improvements.
 

  • Metagenome-scale metabolic network reconstruction

F. Guktnecht, G. Collet, J. Cambefort, J. Andres, S. Prigent, D. Eveillard, A. Siegel, F. Plewniak

Short abstract:

Genome-scale metabolic networks have been successfully used to understand the physiology of a wide range of organisms, from Escherichia coli to Homo sapiens. However, despite the interest of such models, individual genomes remain difficult to extract from their natural environment, which seriously limits our ability to model the metabolism and metabolic interactions within natural ecosystems. We present herein the reconstruction of a metabolic network model of the microbial community of Carnoulès Acid Mine Drainage. This community was shown to be intricate and dominated by seven bacterial strains and a photosynthetic protist, Euglena mutabilis.

The complete network was built from a combination of transcriptomic, metabolomic, metagenomic and metatranscriptomic data. All potential interactions between the partners can be investigated by introducing hypothetical exchange reactions that link the non-produced reactants of every partner to the corresponding metabolites produced by the others. Although more elaborate analyses remain to be done in order to identify in detail the actual exchanges at stake, preliminary observations suggest that metabolic networks reconstructed at the scale of a whole microbial community from metagenomic data may provide insights into the potential interactions between its members.
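
The linking step described in this abstract can be pictured with a small set-based sketch. It is only an illustration under simplifying assumptions: the data layout, the function name, the partner names and the toy reactions are all hypothetical, and a real reconstruction would work on curated, stoichiometric network models.

```python
def hypothetical_exchanges(networks):
    """List candidate exchanges as (producer, consumer, metabolite) triples.

    `networks` maps a partner name to its reactions, each given as a
    (reactants, products) pair of metabolite-name sets.
    """
    consumed, produced = {}, {}
    for partner, reactions in networks.items():
        consumed[partner] = set().union(*(r for r, _ in reactions))
        produced[partner] = set().union(*(p for _, p in reactions))

    exchanges = []
    for consumer in networks:
        # Reactants this partner uses but cannot produce on its own.
        missing = consumed[consumer] - produced[consumer]
        for metabolite in sorted(missing):
            for producer in networks:
                if producer != consumer and metabolite in produced[producer]:
                    exchanges.append((producer, consumer, metabolite))
    return exchanges


# Toy community (hypothetical partners and reactions).
networks = {
    "partner_A": [({"CO2"}, {"glucose"}), ({"glucose"}, {"pyruvate"})],
    "partner_B": [({"pyruvate", "NH3"}, {"alanine"})],
    "partner_C": [({"urea"}, {"NH3", "CO2"})],
}
exchanges = hypothetical_exchanges(networks)
for producer, consumer, metabolite in exchanges:
    print(f"{producer} -> {consumer}: {metabolite}")
```

Metabolites that no partner produces, such as "urea" in this toy community, yield no exchange and would have to come from the environment.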

 

Submission:

Please send the abstract to [email address protected from spambots].

All submitted abstracts will be peer-reviewed by the program committee.

Submitted abstracts should not exceed 2 pages, including the bibliography.

Abstracts must be written and presented in English. They may describe original work that has been published or that is simultaneously
submitted to a journal, conference, or workshop with refereed proceedings.

Program co-chairs:

  • Sophie Schbath (INRA/MIG, Jouy-en-Josas, France)
  • Valentin Loux (INRA/MIG, Jouy-en-Josas, France)
  • Mahendra Mariadassou (INRA/MIG, Jouy-en-Josas, France)

Sponsor:

  • Meta-omics of Microbial Ecosystems (M2E) metaprogramme from INRA, the French National Institute for Agricultural Research
