Ancient Metagenomics

The techniques in this section of the book can be used in a variety of stages of any ancient metagenomics projects, for screening for pathogens (what species should I target for downstream genomic mapping?), for differential abundance analysis (does the community make of this sample change between different cultural periods?), but also for reference-free assembly of genomes (can I recover the genome architecture of a variety of species in my sample?). It focuses on the concept of ‘many samples to many genomes’ using high-throughput techniques and algorithms and trying to analyse data at whole ‘community’ levels.

Taxonomic Profiling

One of the first questions when looking at metagenomics data is often Who is there?. But this question is actually not specific to (ancient) metagenomics, but in fact is a broader ecological question that has first been applied to macro ecological systems.

In this chapter, we will show you how to answer this question at the microbial scale using methods applied to ancient metagenomics data. We will talk about taxonomic profiling, ecological diversity metrics, how to compare different samples and see how they relate through their community composition. We will learn about taxonomic profiler tools, ways to analyze and visualize taxonomic profiles, libraries to manipulate taxonomic profile tables, and libraries to compute and visualize ecological diversity metrics.

De novo Assembly

De novo assembly of ancient metagenomic samples enables the recovery of the genetic information of organisms without requiring any prior knowledge about their genomes. Therefore, this approach is very well suited to study the biological diversity of species that have not been studied well or are simply not known yet.

In this chapter, we will show you how to prepare your sequencing data and subsequently de novo assemble them. Furthermore, we will then learn how we can actually evaluate what organisms we might have assembled and whether we obtained enough data to reconstruct a whole metagenome-assembled genome. We will particularly focus on the quality assessment of these reconstructed genomes and how we can ensure that we obtained high-quality genomes.

Authentication

When exploring composition of metagenomic samples, validation of the detected organisms represents a particular challenge, because it may happen that the detected organism may have have the following characteristics

  1. Was mis-identified (the DNA belongs to another organism than initially thought)
  2. Has a modern origin (for example, lab or sequencing contaminant)
  3. Is of exogenous origin (for example, an ancient microbe that entered the host post-mortem).

In this case, authentication analysis needs to be performed to demonstrate that the detected organism is truly present in the metagenomic sample and is of endogenous and ancient origin.

In this chapter, we explain how to recognise that a detected organism was mis-identified based on breadth / evenness of coverage, how to validate findings using alignments and assess multiple mapping quality metrics, how to detect modern contaminants via deamination profile, DNA fragmentation and post-mortem damage scores.

We will also introduce metaDMG, a tool currently under development, that enables fast and flexible taxonomic profiling as well as the assessment of post-mortem DNA damage in metagenomic datasets of ancient eukaryotic sequences. We will learn how to execute metaDMG via the command line and explore some primary statistics using R scripts to evaluate deamination patterns and mean fragment length on a temporal scale.

Contamination

When ancient metagenomic samples have been taxonomically profiled and organisms of interest have been identified, there is a risk that presence of unwanted exogenous or modern DNA can bias conclusions of the analysis. In this case, decontamination analysis needs to be performed for filtering out organisms that are not related to the metagenomic samples.

In this chapter, we explain decontamination analysis via comparison with negative controls (blanks), as well as similarity to expected microbiome source (microbial source tracking).