Journal of Systems Biology & Proteome Research

Commentary - Journal of Systems Biology & Proteome Research (2017) Volume 1, Issue 1

Searching nearest neighbours in metric microbiome spaces using phylogenetic distance measures

Andreas Henschel*

Masdar Institute, Khalifa University of Science and Technology, United Arab Emirates

Corresponding Author:
Andreas Henschel
Masdar Institute
Khalifa University of Science and Technology
United Arab Emirates
Tel: +971 2 810 9222
Email: [email protected]

Accepted Date: October 09, 2017

Citation: Henschel A. Searching nearest neighbours in metric microbiome spaces using phylogenetic distance measures. J Syst Biol Proteome Res. 2017;1(1):5-6



The large-scale deposition of metagenomic data holds great potential for understanding universal mechanisms and environmental factors of microbial community assembly. Notable efforts include the Human Microbiome Project, the Earth Microbiome Project, IMNGS and Qiita/QiimeDB [1-4]. We have thus entered a new era in which it is, in principle, possible to discover commonalities between microbial communities from entirely different ecosystems. For example, samples from below the ocean floor were surprisingly similar to non-marine communities due to methanogens [5]. However, with the current data deluge from Next Generation Sequencing projects, it is becoming increasingly difficult to search exhaustively by hand for the most similar microbial communities. Suitable search algorithms for microbiomes are required, and solutions for automated microbiome search are beginning to appear, e.g., Meta-Storms [6] and Visibiome [7].

A popular approach to describing the compositional features of microbial communities is marker gene sequencing, in particular of the hypervariable regions of the 16S rRNA gene. These sequences can then be clustered into Operational Taxonomic Units (OTUs). In turn, a sample is represented as a vector of relative OTU abundances, and these vectors span the microbiome search space. The total number of OTUs considered (either from a reference library or picked de novo) determines the dimensionality of this space. The advent of deep environmental sequencing takes microbiome search literally to another dimension: many low-abundance OTUs are now above the detection threshold. Moreover, the previously unappreciated diversity of novel bacterial OTUs in the environment [8] exacerbates this curse of dimensionality. Among the classical Nearest Neighbour similarity search algorithms, only a few, such as GNAT and AESA, can handle dimensionalities on the order of tens of thousands or more [9,10]. Their computational complexity is suitable for databases of the size mentioned above. Nearest Neighbour search commonly requires a metric distance measure (in particular, one that fulfils the triangle inequality). Weighted UniFrac [11], a popular tool for measuring distances between microbiomes, is indeed a metric, and it also accounts for phylogenetic relations between OTUs. Visibiome therefore deploys an Earth Mover's Distance-based implementation of weighted UniFrac [12], optimized for sparse vectors (since not every sample contains every OTU). The resulting search algorithm is capable of sublinear searches in high-dimensional metric spaces.
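The two ideas in this paragraph, a tree-based weighted UniFrac distance on sparse abundance vectors and the use of the triangle inequality to skip distance computations, can be illustrated with a minimal Python sketch. It is not Visibiome's implementation and it is neither GNAT nor AESA: it uses a single precomputed pivot, which is only the simplest form of the pruning those algorithms rely on. The toy tree, the sample names and the functions `normalise`, `weighted_unifrac` and `nearest` are all hypothetical and introduced here for illustration only.

```python
from typing import Dict, Tuple

# Hypothetical toy phylogeny: node -> (parent, length of the branch above the node).
# The root has no entry. A real tree would come from a reference phylogeny.
TREE: Dict[str, Tuple[str, float]] = {
    "otu1": ("n1", 1.0),
    "otu2": ("n1", 1.0),
    "n1": ("root", 0.5),
    "otu3": ("root", 2.0),
}

Sample = Dict[str, float]  # sparse vector: OTU id -> relative abundance


def normalise(counts: Dict[str, float]) -> Sample:
    """Turn raw OTU counts into relative abundances, dropping absent OTUs."""
    total = sum(counts.values())
    return {otu: c / total for otu, c in counts.items() if c > 0}


def weighted_unifrac(a: Sample, b: Sample) -> float:
    """Unnormalised weighted UniFrac: sum over branches of
    branch length * |difference in abundance below that branch|."""
    diff: Dict[str, float] = {}
    for otu in set(a) | set(b):  # only OTUs present in either sample are visited
        delta = a.get(otu, 0.0) - b.get(otu, 0.0)
        node = otu
        while node in TREE:  # push the difference up every branch towards the root
            diff[node] = diff.get(node, 0.0) + delta
            node = TREE[node][0]
    return sum(TREE[node][1] * abs(d) for node, d in diff.items())


def nearest(query: Sample, database: Dict[str, Sample],
            pivot_id: str, pivot_dists: Dict[str, float]):
    """Nearest-neighbour search with single-pivot triangle-inequality pruning."""
    d_qp = weighted_unifrac(query, database[pivot_id])
    best_id, best_d, evaluated = pivot_id, d_qp, 1
    for sid, sample in database.items():
        if sid == pivot_id:
            continue
        # Triangle inequality: d(query, x) >= |d(query, pivot) - d(pivot, x)|.
        # If this lower bound cannot beat the current best, skip the computation.
        if abs(d_qp - pivot_dists[sid]) >= best_d:
            continue
        d = weighted_unifrac(query, sample)
        evaluated += 1
        if d < best_d:
            best_id, best_d = sid, d
    return best_id, best_d, evaluated


# Hypothetical four-sample database and pivot choice.
DATABASE: Dict[str, Sample] = {
    "s_a": {"otu1": 1.0},
    "s_b": {"otu2": 1.0},
    "s_c": {"otu3": 1.0},
    "s_d": {"otu1": 0.5, "otu2": 0.5},
}
PIVOT = "s_a"
# Computed once, offline: distance from the pivot to every stored sample.
PIVOT_DISTS = {sid: weighted_unifrac(DATABASE[PIVOT], s) for sid, s in DATABASE.items()}

query = normalise({"otu1": 9, "otu2": 1})
best_id, best_d, evaluated = nearest(query, DATABASE, PIVOT, PIVOT_DISTS)
print(best_id, round(best_d, 3), evaluated)
```

In this toy setting the query lies close to the pivot, so the lower bound rules out every other sample and only one distance is actually computed. With a poorly chosen pivot the bound is loose and nothing is pruned, which is why practical metric indexes use many pivots or a hierarchy of them.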

To scale to high demand while providing user-friendly access to microbiome research, Visibiome leverages a scalable, modular and distributed architecture that combines web framework technology, task queuing and scheduling, cloud computing and a dedicated database server.

In analogy to sequence similarity search tools like BLAST [13] that facilitate annotation transfer, Visibiome matches novel microbial communities to other, well-annotated samples and can thus provide clues about the function of the community at hand. Extending the analogy, Visibiome’s equivalent of BLAST’s query-subject sequence alignment is a series of comparative stacked bar charts that show the corresponding abundances of constituent taxa in query and subject at various user-selected taxonomic levels.
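The data behind such a query-subject comparison can be sketched in a few lines of Python: OTU abundances are summed per taxon at the chosen rank, once for the query and once for the subject, and the paired values are what a stacked bar chart would display. This is an illustrative sketch only, not Visibiome's code; the lineage table and the names `RANKS`, `LINEAGE`, `collapse` and `compare` are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

RANKS = ["phylum", "class"]  # taxonomic levels covered by the toy lineages below

# Hypothetical lineage annotation per OTU, ordered as in RANKS.
LINEAGE: Dict[str, List[str]] = {
    "otu1": ["Euryarchaeota", "Methanobacteria"],
    "otu2": ["Euryarchaeota", "Methanomicrobia"],
    "otu3": ["Proteobacteria", "Gammaproteobacteria"],
}


def collapse(sample: Dict[str, float], rank: str) -> Dict[str, float]:
    """Sum relative OTU abundances per taxon at the requested rank."""
    level = RANKS.index(rank)
    totals: Dict[str, float] = defaultdict(float)
    for otu, abundance in sample.items():
        lineage = LINEAGE.get(otu, [])
        # OTUs without an annotation at this rank are pooled as "Unclassified".
        taxon = lineage[level] if level < len(lineage) else "Unclassified"
        totals[taxon] += abundance
    return dict(totals)


def compare(query: Dict[str, float], subject: Dict[str, float],
            rank: str) -> Dict[str, Tuple[float, float]]:
    """Pair query and subject abundances per taxon (one bar segment each)."""
    q, s = collapse(query, rank), collapse(subject, rank)
    return {taxon: (q.get(taxon, 0.0), s.get(taxon, 0.0))
            for taxon in sorted(set(q) | set(s))}


query = {"otu1": 0.5, "otu2": 0.25, "otu3": 0.25}
subject = {"otu1": 0.75, "otu3": 0.25}
for rank in RANKS:
    print(rank, compare(query, subject, rank))
```

In this example the two samples look identical at phylum level and differ only at class level, which is the kind of difference that viewing several taxonomic levels side by side is meant to expose.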

In conclusion, novel search engines for microbial communities are poised to cope with the demand created by the data deluge in microbiome research. Visibiome in particular is a convenient, scalable and efficient framework for searching microbiomes against a comprehensive database of environmental samples. It confirmed the atypical composition of the ocean floor sample mentioned above. The search engine leverages a phylogeny-based distance metric while providing advantages over existing tools.