Just one teaspoon of human stool can yield a terabyte of data, an amount of space that can hold over 500,000 photos from today’s typical phone camera. Within that rich data lies the potential for marvelous insights. As VP of Data Services for Tend, it’s my job to assemble the jigsaw puzzle of these large datasets, working with our science team to find those insights.

With a background in software engineering, I still have so much to learn about using what we know about the gut microbiome to improve human health. One of my many questions upon beginning my work with Tend was: how do we accurately identify and quantify the microorganisms in the gut? Knowing precisely what’s there, and how much, helps to answer questions further down the scientific pipeline, such as finding the differences between the microbiomes of healthy people and those of people with serious gut impairments.

Metagenomics is the process of identifying, all in one go and without culturing, the species of microbes present in a sample through DNA sequencing. At Tend, we use it to examine stool, but it is also applied to sewage plant samples as leading indicators of community antimicrobial resistance and infection; agricultural samples for detection of threats to the food supply; human and animal skin and oral samples for medical research and development of therapeutics; and much more.

What I have learned so far is that metagenomics is an imprecise science. From stool sample collection to algorithms, the process differs across laboratories and locations. And variations in each step can compound along the way.

So how is metagenomics done? Consider the steps below, along with their variations, to identify and count the microorganisms present in a single human gut:

Step 1. Sampling

What a person ate and drank in the days before producing a stool sample, whether they are healthy or ill, their lifestyle and genetics, and a number of other factors influence which species they are hosting in their gut. Conditions in the home, the time of day, and whether the sample was frozen or refrigerated before going to the clinic can influence the type and amount of microbiota in the sample. Furthermore, poop is not homogeneous, so two different scoops may contain different amounts of microbiota.

Step 2. Sample Preparation

A researcher or medical professional preps the sample for lab analysis. Without a standard protocol, prep is a matter of preference, and variation can occur in steps such as:

homogenization of the sample with various tools (blender, mortar and pestle, others)
addition of glycerol to keep the microbiota alive
use of preservatives to prevent the DNA from degrading
choice of storage - the sample may be sent to the lab right away or stored in a freezer for months.

Additionally, when a sample is prepared specifically for gut microbiome transplant (aka GMT or FMT), the sample may be encapsulated for delivery orally or rectally, or mixed with saline in a blender for administration via feeding tube, with some of the preparation set aside for metagenomic testing.

Freezing, the type of preservative, whether the sample is preserved, and even the introduction of air during blending may influence which DNA is present and how much is conserved in the sample. Some bugs are more likely to survive parts of this process than others: aeration during blending, for example, can kill off anaerobic species that live in our guts.

Image courtesy of Berkeley Labs via Flickr (license)

Step 3. Sequencing

After preparation, the sample goes to a sequencing lab. Various lab techniques exist, but I’ll focus on the current state of the art, shotgun metagenomics.

In shotgun metagenomics, the sample must undergo a few processing steps to isolate DNA and make it available for sequencing. This DNA is then amplified and sequenced using specialized machines that write the sequences to a text file. The text file lists each “read” together with information about the estimated quality of that read.
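
To make the idea of a file of reads concrete, here is a minimal Python sketch that parses such a file. It assumes the widely used FASTQ layout (the article does not name the file format), in which each read is a four-line record: an identifier line starting with "@", the DNA sequence, a "+" separator, and a quality string encoded as Phred+33 characters. The names Read, parse_fastq, and EXAMPLE_FASTQ are made up for illustration, and the two reads are invented.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class Read(NamedTuple):
    read_id: str
    sequence: str
    quality: List[int]  # one Phred score per base


def parse_fastq(handle: TextIO) -> Iterator[Read]:
    """Yield reads from a FASTQ stream (assumed: four lines per record)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line) ends the parse
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line carries no data we need
        quality_line = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality_line):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        # Assumed Phred+33 encoding: score = ASCII code of the character - 33
        quality = [ord(ch) - 33 for ch in quality_line]
        yield Read(header[1:], sequence, quality)


# Two invented reads, purely for illustration
EXAMPLE_FASTQ = "@read_1\nACGT\n+\nIIII\n@read_2\nGGCA\n+\n!!5I\n"

reads = list(parse_fastq(io.StringIO(EXAMPLE_FASTQ)))
for read in reads:
    mean_quality = sum(read.quality) / len(read.quality)
    print(read.read_id, read.sequence, f"mean quality {mean_quality:.1f}")
```

A real pipeline would typically rely on an established parser and handle compressed files; the point here is only that every read arrives with its own per-base quality estimate, which later steps can use to filter out unreliable reads.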

The choice of DNA isolation protocol and the type of sequencing machine can influence how many DNA strands are read successfully, and in what quantities.

Step 4. Bioinformatics

The file of “reads” is passed along to bioinformatics where algorithms are used to look up each read in a dictionary, or reference database, of the DNA of known taxa – i.e. kingdom, phylum, and so on down to species.

The result of the lookups is a table listing each taxonomic name and the number of reads that matched that taxon. While some reads can be linked to a species, others may only map to less specific taxonomic levels (such as genus or kingdom). Reads that could not be matched at all are listed in the table as unknowns.
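
As a rough sketch of how such a table comes together, the Python below tallies per-read assignments into counts. It assumes an upstream classifier has already produced one (rank, name) pair per read, or None when nothing matched; build_count_table and the example assignments are hypothetical and not taken from any particular pipeline.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# One entry per read: (rank, taxon name), or None if the read matched nothing
Assignment = Optional[Tuple[str, str]]


def build_count_table(assignments: Iterable[Assignment]) -> Counter:
    """Count how many reads were assigned to each (rank, name) pair."""
    table: Counter = Counter()
    for assignment in assignments:
        if assignment is None:
            # Unmatched reads are kept in the table rather than dropped
            table[("unclassified", "unknown")] += 1
        else:
            table[assignment] += 1
    return table


# Invented assignments for six reads, purely for illustration
example_assignments = [
    ("species", "Faecalibacterium prausnitzii"),
    ("species", "Faecalibacterium prausnitzii"),
    ("genus", "Bacteroides"),
    ("kingdom", "Bacteria"),
    None,
    None,
]

table = build_count_table(example_assignments)
for (rank, name), count in table.most_common():
    print(f"{rank:<13} {name:<30} {count}")
```

Keeping the rank alongside the name makes it visible that a count for a genus and a count for a species are not directly comparable: the genus row only holds reads that could not be resolved any further.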

The percentage of a particular taxon’s reads compared to the total reads in the file is also recorded. This number is the relative abundance of each organism and is one of the key metrics for scientific understanding of which organisms are present, as well as comparison to other microbiomes.
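
The calculation described above is simple enough to sketch in a few lines of Python. The function name relative_abundance and the read counts are hypothetical; the denominator is the total number of reads in the file, unknowns included, as the text describes.

```python
from typing import Dict


def relative_abundance(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-taxon read counts into percentages of all reads."""
    total_reads = sum(read_counts.values())
    if total_reads == 0:
        raise ValueError("no reads to normalize")
    return {
        taxon: 100.0 * count / total_reads
        for taxon, count in read_counts.items()
    }


# Invented phylum-level counts, purely for illustration
example_counts = {"Firmicutes": 600, "Bacteroidetes": 300, "unknown": 100}

abundances = relative_abundance(example_counts)
for taxon, percent in abundances.items():
    print(f"{taxon:<15} {percent:5.1f}%")
```

Because every value is a share of the same total, a change in one taxon's count shifts the percentages of all the others, which is worth keeping in mind when comparing profiles.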

Matching short DNA reads to a reference database is more complex than a simple dictionary lookup, and multiple competing algorithms have been developed. Additionally, there are multiple public and private reference databases in use. Often the reads can only be mapped to a family or phylum rather than an individual species. Each of these factors can influence the final counts.
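
One way to see why a read may only resolve to a broad level is the "lowest common ancestor" idea used by some classifiers: if a read matches several reference organisms equally well, it is assigned to the deepest taxon they all share. The Python sketch below illustrates that idea only; it is not the algorithm of any specific tool, the function name is hypothetical, and the lineages are illustrative (names and ranks differ between reference databases, which is itself one of the sources of variation mentioned above).

```python
from typing import Sequence, Tuple

# Assumed fixed rank order, matching the seven-entry lineages below
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> Tuple[str, str]:
    """Return (rank, name) of the deepest taxon shared by all lineages."""
    if not lineages:
        return ("unclassified", "unknown")
    shared_depth = 0
    # Walk down the ranks until the candidate organisms disagree
    for names_at_rank in zip(*lineages):
        if len(set(names_at_rank)) != 1:
            break
        shared_depth += 1
    if shared_depth == 0:
        return ("unclassified", "unknown")
    return (RANKS[shared_depth - 1], lineages[0][shared_depth - 1])


# Illustrative lineages; naming varies by database and version
B_FRAGILIS = ("Bacteria", "Bacteroidetes", "Bacteroidia", "Bacteroidales",
              "Bacteroidaceae", "Bacteroides", "Bacteroides fragilis")
B_THETA = ("Bacteria", "Bacteroidetes", "Bacteroidia", "Bacteroidales",
           "Bacteroidaceae", "Bacteroides", "Bacteroides thetaiotaomicron")
E_COLI = ("Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales",
          "Enterobacteriaceae", "Escherichia", "Escherichia coli")

# A read matching both Bacteroides species resolves only to the genus
print(lowest_common_ancestor([B_FRAGILIS, B_THETA]))
# A read matching very different organisms resolves only to the kingdom
print(lowest_common_ancestor([B_FRAGILIS, E_COLI]))
```

Under this scheme, adding or removing a genome from the reference database can change which level a read resolves to, so two pipelines can report different counts from the same file.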

An example:

Consider two profiles derived from a single stool sample. One part was homogenized in a blender and the other part was merely scooped and sent to the lab. The two parts were otherwise processed identically: sequenced in the same lab, on the same day, using the same isolation protocol, machine type, and bioinformatics pipeline. Even so, the results showed 14% more abundant Firmicutes in the blender sample than in the scoop sample.
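
When comparing two profiles like this, it helps to be explicit about how the difference is measured, since a gap can be stated in percentage points or as a relative change. The Python sketch below computes both. The abundance values are hypothetical and are not the measurements from this comparison; they were chosen only so that the relative change comes out to 14%, and the function name is made up for illustration.

```python
from typing import Tuple


def compare_abundance(reference: float, other: float) -> Tuple[float, float]:
    """Return (percentage-point difference, relative change in percent).

    Both inputs are relative abundances expressed as percentages.
    """
    if reference <= 0:
        raise ValueError("reference abundance must be positive")
    point_difference = other - reference
    relative_change = 100.0 * (other - reference) / reference
    return point_difference, relative_change


# Hypothetical Firmicutes abundances, NOT the values from the real samples
scoop_firmicutes = 50.0    # percent of reads in the scooped portion
blender_firmicutes = 57.0  # percent of reads in the blended portion

points, relative = compare_abundance(scoop_firmicutes, blender_firmicutes)
print(f"difference: {points:.1f} percentage points")
print(f"relative change: {relative:.1f}%")
```

With these made-up numbers the same gap reads as 7 percentage points or as a 14% relative increase, so a report should say which of the two it means.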

Varying any other part of the process could compound such a difference. Herein lies the extraordinary opportunity, and challenge, for the field of metagenomics. At Tend we’re working on solutions to reduce this variability and increase standardization across FMT treatment and microbiome data reporting. That means a sample collection and prep device that’s simple to operate consistently across many clinicians and researchers, and a data system that combines metagenomics, science, and data science to measure and account for the imprecisions of metagenomics, so that we can report on the gut microbiome more confidently.
