SPAAM3
Programme
Session Structure
Length: 3 days
All times CEST (see here for Time Zone conversion)
Day 1 - September 1 (15:00-18:55)
- 15:00-15:15 Welcome to SPAAM3
- 15:15-16:35 Session 1: Ethical considerations and best practices in ancient metagenomics research
- 15:15-15:25 “Respectfully moving forward: Discussing how the ancient metagenomic community will address ethical concerns” - Rita Austin
- 15:25-15:40 “Addressing the shifting legacies of US racism through science communication and dialogue” - Justin Lund
- 15:40-15:50 “Bolstering Ethics in aDNA: the Use of Ethics Review” - Jessica Hider
- 15:50-16:35 Joint discussion with presenters
- 16:35-16:45 Break
- 16:45-18:40 Session 2: Current challenges and biases in ancient patho(meta)genomics
- Part I: Looking for the invisible: lab and bioinformatic techniques to detect pathogens
- 16:45-16:55 “Methodology for rapid scanning of aDNA extracts based on real-time PCR melting curves” - Richell Maribet Ramírez Molina
- 16:55-17:05 “Few and far between? Detection and reconstruction of ancient pathogen genomes” - Meriam Guellil
- 17:05-17:15 “Quantitative Construction of Metagenomic Screening Databases” - Ian Light
- 17:15-17:45 Joint discussion with presenters
- 17:45-17:55 Break
- Part II: How to present and discuss your findings
- 17:55-18:10 “Paleopathology-informed sampling strategies for Mycobacterium tuberculosis complex aDNA recovery” - Kelly E. Blevins
- 18:10-18:40 Joint discussion with presenter
- 18:40-18:55 Open discussion and recap
- 18:55-?? Join us in Gathertown for brew O’Clock! Let’s get to know each other, chat or play games!
Day 2 - September 2 (15:00-18:00)
- 15:00-15:10 Welcome and Introductions
- 15:10-18:00 Session 3: Urgent challenges and future steps in ancient microbiome research
- Part I: Systemizing Pre-Laboratory Set Up: Optimizing and standardizing sample collection for reproducibility in ancient metagenomics
- 15:10-15:30 “The Importance of Documenting Oral Geography” - Abigail Gancz
- 15:30-15:50 Joint discussion with presenter
- 15:50-16:00 Break
- Part II: From bag to tube: Identifying the challenges and opportunities to advance laboratory practices
- 16:00-16:20 Joint discussion
- Part III: What is minimal and what is better? Best practices for authenticating and analyzing ancient microbiome datasets
- 16:20-16:40 “Impacts of agro-pastoral societies on biodiversity: palaeogenomic, palaeoecological and archaeological approaches” - Nathan Martin
- 16:40-17:00 “Archaeogenetic detection of eukaryotes in host-associated microbial communities: challenges and considerations” - Allison Mann
- 17:00-17:20 “The problem of false discoveries in ancient microbiome analysis” - Nikolay Oskolkov
- 17:20-17:40 “Study of biodiversity and the past environments by DNA in sediment records from Chalco Lake, Mexico” - Barbara Moguel
- 17:40-18:00 Joint discussion with presenters
- 18:00-?? Join us in Gathertown for brew O’Clock! Let’s get to know each other, chat or play games!
Day 3 - September 3 (15:00-17:15)
- 15:00-15:05 Hello and recap
- 15:05-17:00 Session 4: Tool up or die - gaps and solutions in ancient metagenomics analysis
- Part I: Let’s start from the beginning: Metagenomic classification
- 15:05-15:15 “Which taxonomic classifier should I use?” - Irina Velsko
- 15:15-15:25 “Computing lowest common ancestors on SAM files with sam2lca” - Maxime Borry
- 15:25-15:55 Joint discussion with speakers on metagenomic classification
- 15:55-16:10 Break
- Part II: Pipelines to analyse your data
- 16:10-16:20 “High-throughput and scalable ancient metagenomic analysis using nf-core/eager” - James Fellows Yates
- 16:20-16:30 “Phylogenomics of ancient and modern pathogens with AMPHY” - Frédéric Lemoine
- 16:30-17:00 Joint discussion with speakers on how to analyse your data
- 17:00-17:15 Final open discussion, choosing next SPAAM organising committee and closing of SPAAM3
- 17:15-?? Join us in Gathertown for brew O’Clock! Let’s celebrate SPAAM3!
Abstracts
Session 1
Ethical considerations and best practices in ancient metagenomics research
Respectfully moving forward: Discussing how the ancient metagenomic community will address ethical concerns
Rita Austin1
1 Frontiers in Evolutionary Zoology, Natural History Museum, University of Oslo, Oslo, 0318, Norway
The incredible developments in molecular technologies and techniques have made otherwise invisible narratives from and about the past accessible. However, the field’s scientific advancements have overshadowed ethical considerations and concerns. Because ethical necessities and frameworks have been overlooked, disengaged research strategies, practices, and interpretations have been implemented. Scientists conducting ancient metagenomics must reconceptualize their research processes and goals to respect the human remains being utilized, the ancient contexts they represent, and the descendant communities and communities of practice of the assessed individuals. The current drive for more ethical practices within ancient metagenomic research provides the opportunity for realistic and respectful standards to be developed and applied. However, collaborative discussions among practitioners, and assessments of what specific ethical best practices are, have not yet been carried out or synthesized. The SPAAM conference offers a unique opportunity to present, openly discuss, and begin to develop the field’s ethical research standards in a collaborative environment. In discussing ethical expectations, this research aims to synthesize how the ancient metagenomic community understands and wants to improve its research conduct.
Addressing the shifting legacies of US racism through science communication and dialogue
Justin Lund1
1 Laboratories of Molecular Anthropology and Microbiome Research (LMAMR), University of Oklahoma, Norman, Oklahoma, USA
Anthropology is well aware of the complex relationship between colonization and racism. For much of the public, however, these realities are only now becoming apparent. The pandemic and the events that led to the racial unrest of 2020 have highlighted for many people some of the more problematic aspects of our colonial history and how those past events continue to negatively impact our society. Unchecked, these histories will play out in our society every day, including within the sciences. In the past, even very recently, geneticists did not need to be aware of these histories. Here, I demonstrate the shifty nature of racialized politics, how they impact the field of ancient genomics, and why this is important. I share my perspective and concern as an Indigenous anthropologist and suggest a path forward to mend these complex relationships. As geneticists we create powerful narratives about the past, and as scientists we may be unequipped for the politics we find ourselves and our research within. By recognizing the political nature of science, we should also come to recognize our role as unwitting politicians. Future efforts to educate will move beyond the colonial academy and find direct avenues to foster discussions with those who need the power of knowledge most.
Bolstering Ethics in aDNA: the Use of Ethics Review
Jessica Hider1
1 Ancient DNA Center, Department of Anthropology, McMaster University, Hamilton, ON, Canada
We are working to improve ethical ancient DNA research. I have been working with the modern medical tissue ethics review board at McMaster University to create a form that is applicable to ancient and historical samples. Ancient DNA research at McMaster does not currently fall under the purview of this ethical board. We typically determine whether a project is appropriate and adds to our understanding of the past through discussions with collaborators and with culturally or biologically related (or potentially related) communities. We are hoping to improve the way we do research by also incorporating ethical review. We genuinely care about reducing the destruction of finite remains and reducing harm to individuals and groups who have been (or might be) negatively impacted by colonial and oppressive research. Many of the questions that are asked on a modern tissue ethics form do not apply to ancient DNA research. Many other questions need to be asked. We hope that thoughtful forms created with concern for supporting the rights and interests of Indigenous and descendent communities can support ethical research and also responsible treatment of remains. We hope to encourage more people (including ourselves) to consider ethical review. We also acknowledge that it is extremely difficult to create a form that might cover all circumstances experienced with ancient molecular research. We further acknowledge that this is one small step in the right direction and that there are many other facets to completing responsible, useful research.
Session 2
Current challenges and biases in ancient patho(meta)genomics
Methodology for rapid scanning of aDNA extracts based on real-time PCR melting curves
Richell Maribet Ramírez Molina1, Angélica González Oliver1
1 Laboratorio de Antropología Molecular. Departamento de Biología Celular. Facultad de Ciencias. Universidad Autónoma de México, México
This presentation is aimed at researchers who carry out paleopathology studies and are interested in learning about a simple strategy to identify the extracts with pathogenic DNA that should be analyzed. High-throughput sequencing is used to sequence aDNA, and DNA capture and enrichment strategies exist that increase research efficiency; however, they tend to be very expensive. A cheaper screening strategy works as follows. First, aDNA extracts are analyzed with real-time PCR. Second, the extracts that emit a positive signal, and so possibly contain the pathogenic DNA, are selected. Finally, the selected extracts are subjected to sequencing procedures. Real-time PCR is simple and inexpensive; however, one of its main problems is false positives caused by non-specific primers, which lead to an unnecessary search for bacterial DNA in extracts that do not contain the DNA of interest. I will present a methodology for rapid scanning of aDNA extracts based on melting curves, which are generated in real-time PCR. By analyzing the curve and the melting temperature, it is possible to select the extracts that contain amplicons with a size and GC content like those of the marker of interest. This method is therefore a simple strategy that decreases research time and costs.
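As a rough illustration of the melting-curve idea in the abstract above, the sketch below estimates a melting temperature (Tm) as the temperature at which fluorescence drops fastest, that is, where -dF/dT peaks, and then checks whether it falls near the Tm expected for the marker. This is not the authors' method: the function names, the synthetic curve, the expected Tm, and the tolerance are all assumptions made for the example, and real instrument software typically smooths the curve first.

```python
import math


def melting_temperature(temps, fluorescence):
    """Estimate Tm as the midpoint of the interval where -dF/dT is largest."""
    best_tm, best_slope = None, float("-inf")
    for i in range(len(temps) - 1):
        # Negative finite-difference slope: large when fluorescence drops quickly.
        slope = -(fluorescence[i + 1] - fluorescence[i]) / (temps[i + 1] - temps[i])
        if slope > best_slope:
            best_slope = slope
            best_tm = (temps[i] + temps[i + 1]) / 2
    return best_tm


def matches_marker(tm, expected_tm, tolerance=1.0):
    """Hypothetical acceptance rule: keep extracts whose Tm is close to the marker's."""
    return abs(tm - expected_tm) <= tolerance


# Synthetic melt curve (assumed data): a sigmoid that melts around 78.25 degrees C.
temps = [60 + 0.5 * i for i in range(61)]  # 60.0 to 90.0 in 0.5 degree steps
fluorescence = [1 / (1 + math.exp(t - 78.25)) for t in temps]

tm = melting_temperature(temps, fluorescence)
keep_extract = matches_marker(tm, expected_tm=78.0, tolerance=1.0)
```

An extract whose peak sits far from the expected Tm would be treated as a likely non-specific product and left out of sequencing; the tolerance would have to be calibrated against positive controls.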
Few and far between? Detection and reconstruction of ancient pathogen genomes
Meriam Guellil1
1 University of Tartu, Institute of Genomics, Estonia
The study of pathogenic bacterial and viral species in ancient DNA datasets is rapidly expanding our knowledge of the evolutionary history of infectious diseases. However, their identification is hampered by low DNA preservation, with target content often below 0.1% for bacterial species and only a few reads for viral species in shotgun datasets. Additionally, due to small fragment lengths in ancient DNA, only a fraction of the total DNA will be classifiable. To address this lack of analyzable data and eliminate false-positive results, a thorough screening workflow, knowledge of the genome structure of studied species, and awareness of sequence similarity across genera are indispensable. Furthermore, sampling locations and tissue types can lead to differential interpretations of hits across datasets. In this talk, I will address common issues and further expand on tools and techniques at our disposal to increase target DNA content. Emphasis will be on the use of KrakenUniq for ancient DNA data analysis. The talk is targeted to researchers working or starting out in the fields of pathogenomics and ancient metagenomics.
Quantitative Construction of Metagenomic Screening Databases
Ian Light1, Felix M. Key1
1 Max Planck Institute for Infection Biology, Berlin, Germany
The increasing sizes of reference genome databases over the past several years pose a challenge for whole-genome-based species identification in ancient metagenomic data. For some microbial species in particular, database entries of nearly identical reference-quality genomes number in the thousands; thus database reduction is an important step to reduce memory requirements and speed up processing of samples. Prior approaches include random sampling of taxon genomes, but this approach may not sample relevant intra-taxon diversity and may cause a loss in specificity and sensitivity. Alternative approaches using whole-database genome clustering are computationally intensive and not feasible for many databases and for labs without access to large computational resources. We will present some possible quantitative approaches to informed database size reduction that aim to preserve genetic diversity, reduce database size, and identify incorrectly labeled sequences. We hope that discussing our approach and gathering input from others can help bolster systematic approaches to database size reduction. This talk is intended for researchers who utilize whole-genome databases but want a way to reduce the number of sequences included for screening while retaining high sensitivity and specificity.
Paleopathology-informed sampling strategies for Mycobacterium tuberculosis complex aDNA recovery
Kelly E. Blevins1, Josefina Mansilla Lory2, Jane E. Buikstra3, and Anne C. Stone3
1 Department of Archaeology, Durham University, UK 2 Dirección de Antropología Física, Instituto Nacional de Antropología e Historia, Mexico 3 Center for Bioarchaeological Research, Arizona State University, USA
Tuberculosis (TB) is a global disease that remains a significant cause of morbidity and mortality, despite decades of vaccine development efforts, concerted public health campaigns, and improvements in detection and treatment. The origin and evolution of the causative agents of tuberculosis, members of the Mycobacterium tuberculosis complex (MTBC), can be analyzed directly using ancient genomes recovered from archaeological remains. Our ability to predict successful MTBC aDNA recovery, however, remains tenuous, resulting in destructive sampling that yields few successful results. To address the uncharacterized relationship between sampling strategy and MTBC DNA yield, we selected 56 elements representing 48 individuals with a spectrum of skeletal lesions associated with TB from Tlatelolco, a late postclassic Mesoamerican urban center (1300-1521 CE). We subsampled and extracted DNA from each element in various locations and across various pathological categories. DNA extracts were shotgun sequenced, and MTBC positive samples were identified using a taxonomic binning approach with a custom database of mycobacteria and other closely related genera. Of 56 elements, 13 representing 10 individuals were positive for MTBC DNA. Positive screening assignments were confirmed using in-solution hybridization capture. We compare differences in sampling element, sampling location, and pathological category between MTBC positive and negative samples and within MTBC positive samples. Additionally, we describe how we justified our large destructive sampling project. Our findings suggest that sampling element, location, pathological manifestation, and age of individual affect MTBC DNA recovery. We discuss the implications of these results for MTBC aDNA sampling and propose guidelines for justifying destructive sampling projects.
Session 3
Urgent challenges and future steps in ancient microbiome research
The Importance of Documenting Oral Geography
Abigail Gancz1
1 Penn State, USA
Dental calculus is a non-renewable biomolecular resource with immense potential for informing researchers about the diets, lifestyles, and health trends of past populations. With increasing calls for destructive analyses (e.g., paleogenomic, isotopic, proteomic, and phytolith approaches) it is imperative to evaluate and improve the ways in which the dental calculus research community documents and samples this material. To address this issue, we surveyed over 110 laboratories and researchers regarding how they sample dental calculus, what types of skeletal and dental metadata they record and utilize in their analyses, and how they specifically account for contamination in their procedures. Our results, derived from 64 unique respondents, indicate that practitioners have highly variable approaches to metadata collection and utilization, dental calculus sampling, and contamination controls. While some of these differences are doubtlessly linked to the rapid advancement of dental calculus research in recent decades, others are derived from intellectual silos between those excavating skeletal remains and those analyzing dental calculus in laboratory settings. This communication challenge has substantial implications not only for multidisciplinary cooperation but also for the capacity of future researchers to engage in population-level investigations. Without adequate skeletal (sex, age, pathologies, etc.) and dental (presence/absence of dental calculus, caries, etc.) metadata, paleoepidemiological inquiries that attempt to extend dietary and health questions to the population level are unable to account for and control biases in individuals and sampling. This impedes not only paleoepidemiologists, but all researchers interested in cross-cultural or temporal comparisons. Therefore, based on the findings of our survey, we conclude our paper by providing a list of guidelines regarding dental calculus recording and sampling as well as a freely downloadable sample recording sheet.
Impacts of agro-pastoral societies on biodiversity: palaeogenomic, palaeoecological and archaeological approaches
Nathan Martin1, Régis Debruyne2, Pierre Stephan3, Jose Utge-Buil4, Françoise Dessarps4, Gregor Marchand5, Dominique Marguerie1, Fréderique Barloy-Hubler6, Morgane Ollivier1
1 UMR 6553 ECOBIO, University of Rennes 1, Rennes, France 2 National Museum of Natural History, Paris, France 3 LETG - UMR6554, European University Institute of the Sea, Plouzané, France 4 National Museum of Natural History UMR 7206, Paris, France 5 UMR 6566 - CReAAH, CNRS - University of Rennes 1, Rennes, France 6 Institute of Genetics and Development of Rennes, UMR6290-CNRS, University of Rennes 1, Rennes, France
Paleo-environmental DNA (PalenvDNA), thanks to the development of high-throughput sequencing, has proved to be a valuable technique for retrieving the past presence of taxa in given environments. PalenvDNA can be obtained from sedimentary archives where it has been trapped and preserved. A promising method to investigate PalenvDNA is the targeted capture approach. Its use as a complement to archeological and geological data has delivered promising results on humans and certain mammals, and in specific environments. We have therefore designed capture probes targeting chloroplastic loci and mitochondrial genomes of several plant and mammal species of interest. We tracked their taxonomic presence in sediment coming from lakes and other challenging environments such as peat bogs and estuarine lagoons. To achieve reliable taxonomic assignment, complex downstream bioinformatic analyses are required. Despite rapid advances in aDNA techniques, improvements are still needed. We explore different assignment strategies by comparing mapping and k-mer approaches, and investigate the effects of database design. We shed light on bioinformatic pipelines that yield more reliable and authenticated data, and propose an improved approach. We also show that PalenvDNA can be successfully retrieved from sediments from various environment types thanks to the capture approach. Finally, we demonstrate positive and improved identification of extinct and extant taxa correlated to archeological context thanks to our bioinformatic approach.
Archaeogenetic detection of eukaryotes in host-associated microbial communities: challenges and considerations
Allison Mann1, James A. Fellows Yates2,3, Zandra Fagernäs2, Rita M. Austin4,5,6,7, Elizabeth A. Nelson2,8,9, Courtney A. Hofman5,6
1 Department of Biological Sciences, Clemson University, Clemson, SC, USA 2 Department of Archaeogenetics, Max Planck Institute for the Science of Human History, Jena, Germany 3 Institut für Vor- und Frühgeschichtliche Archäologie und Provinzialrömische Archäologie, Ludwig Maximilian University, München, Germany 4 Department of Anthropology, Smithsonian National Museum of Natural History, Washington, DC, USA 5 Department of Anthropology, University of Oklahoma, Norman, OK, USA 6 Laboratories of Molecular Anthropology and Microbiome Research, University of Oklahoma, Norman, OK, USA 7 Natural History Museum Oslo, University of Oslo, Oslo, Norway 8 Department of Anthropology, University of Texas at Arlington, Arlington, TX, USA 9 Institut für Naturwissenschaftliche Archäologie, Eberhard Karls Universität Tübingen, 72074, Tübingen, Germany
Archaeological dental calculus and other preserved microbiome substrates can serve as long-term reservoirs of biomolecules informative of human health, behavior, and evolution. Recently, dental calculus and other host-associated archaeological substrates have been explored as a source of information on non-host eukaryotes, including those that might be indicative of diet (e.g., plants and animals), using archaeogenetic techniques. The detection of eukaryotic species in ancient metagenomic datasets presents specific challenges that differ from other members of the microbial community (e.g., bacteria). In this study we demonstrate some of these challenges using both synthetic and previously published ancient microbiome datasets. We find that accurate detection of eukaryotes in archaeological microbiome substrates is hindered primarily by incomplete reference databases, microbial richness, and high genomic similarity between eukaryotic groups. Our work demonstrates that although archaeological microbiome substrates may serve as long-term sources of endogenous DNA, further development of authentication criteria and methods to detect eukaryotic taxa are needed.
The problem of false discoveries in ancient microbiome analysis
Nikolay Oskolkov1
1 Molecular Cell Biology, Lund University, Sweden
In this presentation, I will report results of testing our ancient-microbiome-specific analysis workflow on aDNA samples from humans, mammals, insects and sediments. In particular, I will discuss the problem of false discoveries in ancient microbiome analysis that we experienced due to incomplete and contaminated databases. As an example, screening microbial communities with a database enriched for microbial pathogens resulted in an increased false-discovery rate. In addition, our authentication and validation analysis was complicated by the fact that aDNA data represent a complex mixture of both ancient and modern sequences originating from multiple organisms that are sometimes difficult to disentangle. This can result in spurious signals coming from PCR reagent contaminants that might appear to have a decent deamination pattern due to non-trivial mixing with damaged sequences from truly ancient organisms. Finally, I will emphasize peculiarities of our ancient-microbiome-specific workflow compared to other aDNA pipelines established in the field.
Study of biodiversity and the past environments by DNA in sediment records from Chalco Lake, Mexico
Barbara Moguel1,2, Liseth Pérez3, Luis David Alcaraz4, Socorro Lozano-García5, Luis Herrera-Estrella6,7, María Ávila-Arcos1, Juan Pedro Laclette8, Israel Muñoz-Velasco9
1 Universidad Nacional Autónoma de México, International Laboratory of Human Genome Research, Population and Evolutionary Genomics, México 2 Tecnológico de Monterrey, Escuela de Ingeniería y Ciencias, Centro de Bioingeniería, Campus Querétaro, México 3 Institut für Geosysteme und Bioindikation, Technische Universität Braunschweig, Germany 4 Departamento de Biología Celular, Facultad de Ciencias, Universidad Nacional Autónoma de México, México 5 Institute of Geology, Universidad Nacional Autónoma de México, México 6 Laboratorio Nacional de Genómica para la Biodiversidad (LANGEBIO)/CINVESTAV, México 7 Institute of Functional Genomics for Abiotic Stress, Texas Tech University, USA 8 Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México, México 9 Departamento de Biología Evolutiva, Facultad de Ciencias, Universidad Nacional Autónoma de México, México
Lake Chalco is located in the center of the Trans-Mexican Volcanic Belt in Mexico, and it is considered one of the most ancient lakes in the transition zone between the Nearctic and Neotropical regions. Lake Chalco has been studied extensively using several proxies in its sedimentary records, but never using sed(a)DNA. In this study, we explore this new proxy, using DNA collected from the sediment records corresponding to the Holocene to describe the biodiversity and reconstruct the ancient landscapes by metagenomic analysis. Sequencing of these samples generated 1,421,823,631 total reads. The taxonomic annotation revealed 81% Bacteria, 15% Archaea, 3% Eukarya and 1% unclassified sequences. We described three zones related to the past environmental conditions of the lake, using traditional and metagenomic proxies. The early Holocene (>11,000 cal years BP) lake was characterized by cool, freshwater conditions, which later became warmer and hyposaline (11,000-6,000 cal years BP). In this hyposaline zone we found high abundances of cyanobacteria, and other Bacteria and Archaea, mainly anaerobes and extremophiles that are involved in the sulfur, nitrogen, and carbon cycles. Between the warmer and hyposaline conditions we observed a transition zone (around 6,000 cal years BP), associated with Ti/Fe and with a drastic landscape change. It is reported that the first human settlements in this area date to around 6,000 years BP. From this zone to the present day, we observed Bacteria and Archaea that were mainly aerobes, and some pathogens of humans, animals, and crops. We could not distinguish ancient from modern microorganisms in our samples, and this remains one of our main open questions. However, cyanobacteria and reads annotated to other photosynthetic bacteria in the deepest layers suggest ancient DNA trapped in sediments.
This study opens the opportunity to continue exploring the microorganisms over time and their relationship with the past environment in the Neotropical region.
Session 4
Tool up or die - gaps and solutions in ancient metagenomics analysis
Which taxonomic classifier should I use?
Irina Velsko1
1 Max Planck Institute for the Science of Human History, Jena, Germany
Ancient metagenomics offers a unique opportunity to reconstruct the evolutionary trajectories of microbial communities through time, as well as taxonomic profiles of total DNA content of a variety of historic and ancient samples. Ancient DNA damage patterns could potentially affect taxonomic classification and our ability to accurately reconstruct community assemblages. Based on an assessment of 5 popular metagenomic taxonomic classification programs, damage patterns had minimal impact on the taxonomic profiles produced by each program, while false-positive rates and biases were intrinsic to each program. The addition of ancient DNA damage resulted in minimal differences in species detection and relative abundance between simulated ancient and modern data sets for most programs. Therefore, the most appropriate classification program is one that minimizes the biases related to the questions being addressed. Overall, taxonomic profiling biases are program specific rather than damage dependent, and the choice of taxonomic classification program should be tailored to specific research questions.
Computing lowest common ancestors on SAM files with sam2lca
Maxime Borry1
1 Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
Many programs producing DNA single-reference sequence alignments exist; however, there are far fewer tools available to perform sequence alignments in the context of metagenomics, where each query sequence can potentially align to multiple references. While the seeding and alignment tasks don’t necessarily differ between single-reference alignment and alignment of metagenomic data, one extra step is necessary when performing sequence alignment with multiple references. Because a single query sequence can align to multiple reference sequences, it is necessary to assign the query sequence to the common ancestor of all reference sequences it aligned to, with an algorithm known as lowest common ancestor (LCA). While there are many alignment-free metagenomics classifiers available that can perform LCA, there is still a need for a program performing LCA from a standard SAM alignment file. Here, we introduce sam2lca, a program that performs an LCA from a SAM/BAM/CRAM file. Because sam2lca uses the standard SAM alignment file format as input, it is easy to use as an add-on to any single-reference alignment program to perform metagenomics alignments, and easy to combine with damage estimation programs for aDNA metagenomics data. Furthermore, its command-line and Python interfaces make it easy to integrate into a reproducible computational pipeline.
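The LCA step described in this abstract can be illustrated with a minimal sketch. It does not use sam2lca or reproduce its implementation: the taxonomy below is a deliberately simplified toy tree, and the function names and read identifiers are assumptions made for the example. Each read is assigned to the deepest taxon shared by all references it aligned to.

```python
# Toy taxonomy as child -> parent links (simplified and illustrative only).
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Yersinia": "Bacteria",
    "Yersinia pestis": "Yersinia",
    "Yersinia pseudotuberculosis": "Yersinia",
    "Escherichia": "Bacteria",
    "Escherichia coli": "Escherichia",
}


def lineage(taxon):
    """Return the path from a taxon up to the root, starting with the taxon itself."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def lowest_common_ancestor(taxa):
    """Return the deepest node shared by the lineages of all given taxa."""
    taxa = list(taxa)
    if not taxa:
        raise ValueError("need at least one taxon")
    # Start from the first lineage and keep only nodes found in every other lineage.
    common = lineage(taxa[0])
    for taxon in taxa[1:]:
        others = set(lineage(taxon))
        common = [node for node in common if node in others]
    # Lineages run from leaf to root, so the first surviving node is the lowest one.
    return common[0]


def assign_reads(hits):
    """Map each read to the LCA of the references it aligned to."""
    return {read: lowest_common_ancestor(refs) for read, refs in hits.items()}


# Hypothetical alignment hits: read name -> taxa of the references it aligned to.
hits = {
    "read_1": ["Yersinia pestis"],
    "read_2": ["Yersinia pestis", "Yersinia pseudotuberculosis"],
    "read_3": ["Yersinia pestis", "Escherichia coli"],
}
assignments = assign_reads(hits)
```

A read that aligns to a single species stays at species level, while a read from a conserved region shared across genera is pushed up to a higher rank, which is why short, conserved aDNA fragments often receive less specific assignments.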
High-throughput and scalable ancient metagenomic analysis using nf-core/eager
James A. Fellows Yates1,2,3
1 LMU München, München, Germany 2 Max Planck Institute for the Science of Human History, Jena, Germany 3 Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
As ancient metagenomics becomes more popular and techniques advance, the amount of data that needs to be processed will get larger and larger. Efficiently processing such data is not only an issue for the primary researcher, but also for reproducible science - allowing other researchers to check the results of previous publications as well as incorporate this data into their own research. nf-core/eager is a user-friendly and flexible dedicated ancient DNA (meta)genomics screening pipeline. It follows current established ‘best practices’ for processing aDNA data and employs Nextflow to allow portability across different computing infrastructures (from laptops to HPC clusters), facilitating reproducible analyses. In this talk I will demonstrate the metagenomic and microbial genomic capabilities of the pipeline. I will show the different functionalities of the pipeline that allow for taxonomic screening and authentication of multi-species communities such as microbiomes, and also modules that are particularly relevant for both phylogenetic and functional downstream genomics analyses of microbes such as pathogens.
Phylogenomics of ancient and modern pathogens with AMPHY
Frédéric Lemoine1, Nicolás Rascovan1
1 Institut Pasteur, Paris, France
Whole genome phylogenetics has become a common tool in ancient and modern pathogen genomics. Although different labs have established somewhat converging methodologies to perform this type of analysis, open-source automated pipelines are currently missing. To fill this gap, we developed AMPHY (Ancient and Modern PHYlogenomics), a Nextflow (nextflow.io) workflow that performs phylogenetic analyses of ancient and modern genomes from raw sequencing data. AMPHY can use local fastq files or automatically download raw sequencing data from public databases (e.g., ENA, SRA) and already assembled genomes (e.g., NCBI RefSeq), simulating reads from the latter to standardize input data. It then performs the classic steps from raw data to variant calling, including adapter removal and quality trimming, mapping to a reference genome, merging of bam files from the same sample, duplicate removal, ancient DNA damage correction and variant calling. AMPHY will then place single nucleotide variants (SNV) into the reference genome backbone, retrieve coding sequences, concatenate core genes, and build Maximum-Likelihood phylogenetic trees. Importantly, AMPHY is scalable to hundreds of genomes, as it includes optimized solutions depending on the number of genomes being analyzed, including tailored model selection and software parameters. AMPHY is currently under development, and new planned features include phylogenetic placement of very low-coverage genomes, automated phylogenetically-guided pathogen detection and phylogenetic-driven visualization of SNV. We will illustrate AMPHY functionalities on a dataset made of ancient and modern strains of Yersinia pestis, which retrace the history of plague epidemics.