Before you Start
The summer school course that this textbook is derived from was designed to be as practical as possible. This means that most chapters act as walkthroughs that guide you through generating and analysing data for each of the major stages of an ancient metagenomics project.
The summer school utilised cloud computing to provide a consistent computing platform for all participants. However, all tools and data demonstrated are open-source and publicly available. Here we describe how to approximately recreate the computing platform used during the summer schools.
Basic requirements
Bioinformatics often involves large computing resource requirements! While we aim to make the example data and processing as efficient as possible, we cannot guarantee that everything will run on a standard laptop or desktop computer, most likely because of memory (RAM) requirements. As a guide, the cloud nodes used during the summer school had 16 cores and 32 GB of RAM.
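To compare your own machine against the guide values above, a minimal sketch is given here. It assumes a Linux system (it relies on `nproc` and `/proc/meminfo`), and `check_resources` is a hypothetical helper name rather than part of any tool used in this book. The result is only a rough indication: many chapters may still run on smaller machines.

```shell
# Guide values taken from the summer school's cloud nodes.
REQUIRED_CORES=16
REQUIRED_RAM_GB=32

# check_resources CORES RAM_GB  (hypothetical helper, whole numbers only)
check_resources() {
    if [ "$1" -ge "$REQUIRED_CORES" ] && [ "$2" -ge "$REQUIRED_RAM_GB" ]; then
        echo "ok"
    else
        echo "below-guide"
    fi
}

# Linux-specific lookups; both fall back to 0 on other systems.
cores=$(nproc 2>/dev/null || echo 0)
ram_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null || true)
# Round up to whole GiB, as the kernel reports slightly less than the installed RAM.
ram_gb=$(( (${ram_kb:-0} + 1048575) / 1048576 ))

echo "cores=${cores} ram_gb=${ram_gb} status=$(check_resources "$cores" "$ram_gb")"
```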
To follow the practical chapters of this textbook, you will require:
- A Unix-based operating system (e.g., Linux, macOS, or possibly Windows with the Windows Subsystem for Linux, although the latter has not been tested)
- A corresponding Unix terminal
- An internet connection
- A web browser
- A conda installation with bioconda configured.
  - Conda is a very popular package manager for installing software in bioinformatics.
  - bioconda is currently the most popular distribution source of bioinformatics software for conda.
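A quick way to see which command-line requirements are already present is sketched here. The list of tools checked (`bash`, `wget`, `tar`, `conda`) is an assumption based on the commands used on this page, and `have_tool` is a hypothetical helper name.

```shell
# have_tool NAME: succeeds if NAME can be found on the current PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report each tool without stopping at the first missing one.
for tool in bash wget tar conda; do
    if have_tool "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
```

A "missing: conda" line simply means the conda installation steps on this page still need to be carried out.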
For each chapter that requires pre-prepared data, the top of the page will have a box called ‘Self guided: chapter environment setup’ that links to a .tar archive. This archive contains the raw data needed to follow the chapter. The same box also describes how to use the conda .yml file that specifies the software environment for that chapter, which you can then install.
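Unpacking a .tar archive can be sketched as follows. The example builds a small throwaway archive in a temporary directory so that it can be run anywhere; the names `chapter-demo` and `reads.txt` are placeholders, not the names of the real chapter archives.

```shell
# Build a tiny stand-in archive (placeholder names) in a temporary directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/chapter-demo"
echo "placeholder data" > "$workdir/chapter-demo/reads.txt"
tar -cf "$workdir/chapter-demo.tar" -C "$workdir" chapter-demo
rm -r "$workdir/chapter-demo"

# List the archive contents first, without extracting anything.
tar -tf "$workdir/chapter-demo.tar"

# Extract into the chosen directory (-C), then look at the result.
tar -xf "$workdir/chapter-demo.tar" -C "$workdir"
ls "$workdir/chapter-demo"
```

Listing with `tar -tf` before extracting is a cheap way to see which directory the archive will create.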
See the rest of this page for how to install conda (if it is not already available to you) and how to create conda software environments.
Software Environments
Before loading the environment for the exercises, the software environment needs to be created from the .yml file using the instructions below, and then activated. A list of the software in each chapter’s environment can be found in the Appendix.
If you’ve not yet installed conda, please follow the instructions in the box below.
These instructions have been tested on Ubuntu 22.04, but should apply to most Linux operating systems. For macOS (OSX) you may need to download a different installer file from the Miniconda installer repository (https://repo.anaconda.com/miniconda/).
1. Change directory to somewhere suitable for installing a few gigabytes of software, e.g.
   mkdir -p ~/bin/ && cd ~/bin/
2. Download miniconda
   wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
3. Run the install script
   bash Miniconda3-latest-Linux-x86_64.sh
   - Review license
   - Agree to license
   - Make sure to install miniconda to the correct directory! e.g. /home/<YOUR_USER>/bin/miniconda3
   - Yes to running conda init
   - Copy the conda config command
4. Close the terminal (e.g. with exit or ctrl + d)
5. Open the terminal again and run the command you copied (i.e., conda config --set auto_activate_base false)
6. Exit and open the terminal again
7. Type conda --version to check conda is installed and working
8. Set up bioconda
   conda config --add channels bioconda
   conda config --add channels conda-forge
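Because `conda config --add channels` puts each newly added channel at the top of the list, running the two commands in the order given should leave conda-forge listed above bioconda. The sketch below checks this from the output of `conda config --show channels`; `channels_in_expected_order` is a hypothetical helper name, and the check is skipped if conda is not installed.

```shell
# Reads `conda config --show channels` style text on stdin and succeeds
# only if conda-forge is listed above bioconda.
channels_in_expected_order() {
    awk '
        $1 == "-" && $2 == "conda-forge" && !cf { cf = NR }
        $1 == "-" && $2 == "bioconda"    && !bc { bc = NR }
        END { exit !(cf && bc && cf < bc) }
    '
}

# Only query conda when it is actually available.
if command -v conda >/dev/null 2>&1; then
    if conda config --show channels | channels_in_expected_order; then
        echo "channel order looks as expected"
    else
        echo "conda-forge is not listed above bioconda; re-run the two config commands"
    fi
fi
```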
Once conda is installed and bioconda configured, you will need to carry out the following steps at the beginning of each chapter to create the conda environment from the yml file.
1. Download and unpack the conda env file from the top of the chapter by right clicking on the link and pressing ‘save as’. Once uncompressed, change into the directory.
2. Then you can run the following conda command to install the software into its dedicated environment
   conda env create -f /<PATH>/<TO>/<DOWNLOADED_FILE>.yml
   You only have to run the environment creation once for each chapter/environment!
3. Follow the instructions as prompted. Once created, you can see a list of installed environments with
   conda env list
4. To load the relevant environment, you can run
   conda activate <NAME_OF_ENVIRONMENT>
5. Once finished with the chapter, you can deactivate the environment with
   conda deactivate
To reuse the environment, just repeat steps 4 and 5 (activating and deactivating the environment) as necessary.
To delete a conda software environment, run
conda remove --name <NAME_OF_ENV> --all -y
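Since each environment only needs to be created once, it can be convenient to check whether it already exists before creating it. The sketch below does this by reading the output of `conda env list`; `env_in_list`, `ENV_NAME` and `ENV_FILE` are hypothetical names, and the placeholder values must be replaced with those of your chapter.

```shell
# env_in_list NAME: reads `conda env list` output on stdin and succeeds
# if an environment called NAME appears in the first column.
env_in_list() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Placeholders: replace with the chapter's environment name and .yml path.
ENV_NAME="<NAME_OF_ENVIRONMENT>"
ENV_FILE="/<PATH>/<TO>/<DOWNLOADED_FILE>.yml"

# Only act when conda is installed and the .yml file really exists.
if command -v conda >/dev/null 2>&1 && [ -f "$ENV_FILE" ]; then
    if conda env list | env_in_list "$ENV_NAME"; then
        echo "environment already exists, skipping creation"
    else
        conda env create -f "$ENV_FILE"
    fi
fi
```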
If at any point you run out of space on your machine, you can first try running
conda clean --all
You can answer ‘yes’ to all the prompts to remove unused packages and caches.
If you are still having issues, you can remove the environments that you no longer need. First list all your existing conda environments with
conda env list
and then delete the directory of the unneeded environment with rm.
rm -r /<path>/<to>/<conda_install>/envs/<environment_name>
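To decide which environments are worth removing, it can help to see how much space each one occupies. The following sketch lists the sub-directories of an environment directory, largest first, using `du`; `list_env_sizes` is a hypothetical helper name, and the use of `conda info --base` to locate the installation assumes a standard conda layout with environments under `envs/`.

```shell
# list_env_sizes DIR: print the size (in KB) of each sub-directory of DIR,
# sorted from largest to smallest.
list_env_sizes() {
    for env_dir in "$1"/*/; do
        [ -d "$env_dir" ] || continue
        du -sk "$env_dir"
    done | sort -rn
}

# Only query conda when it is actually available.
if command -v conda >/dev/null 2>&1; then
    list_env_sizes "$(conda info --base)/envs"
fi
```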
Additional Software
For some chapters you may need the following software and/or data to be installed manually, as they are not available on bioconda:
Introduction to the command line
rename (if not already installed, e.g. on OSX)
sudo apt install rename
De novo assembly
- metaWRAP
conda create -n metawrap-env python=2.7
conda activate metawrap-env
conda install -c bioconda biopython=1.68 bwa=0.7.17 maxbin2=2.2.7 metabat2 samtools=1.9 checkm-genome=1.0.12
cd /<path>/<to>/denovo-assembly
git clone https://github.com/bxlab/metaWRAP.git
## don't forget to update path/to!
export PATH=$PATH:/<path>/<to>/metaWRAP/bin
If you close your terminal halfway through the chapter, when opening the terminal again to continue the chapter, you MUST re-export the path to the metaWRAP bin directory to have access to the software!
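Because the export has to be repeated in every new terminal, it is easy to end up adding the same directory to `PATH` several times. The sketch below adds a directory only if it is not already present; `add_to_path` is a hypothetical helper name, and the path shown is the same placeholder used above.

```shell
# add_to_path DIR: append DIR to PATH unless it is already listed.
add_to_path() {
    case ":${PATH}:" in
        *":$1:"*) ;;                          # already present, do nothing
        *) PATH="${PATH}:$1"; export PATH ;;  # otherwise append and export
    esac
}

# Placeholder path: replace with the real location of the metaWRAP bin directory.
add_to_path "/<path>/<to>/metaWRAP/bin"
```

The same helper could be used for the other manually installed tools on this page that require an `export PATH` step.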
Functional Profiling
HUMAnN3 UniRef database (run this once the functional profiling conda environment is already activated; see the Functional Profiling chapter for more details)
humann3_databases --download uniref uniref90_ec_filtered_diamond /<path>/<to>/functional-profiling/humann3_db
Authentication
- pip
- metaDMG
Make a conda environment file called metadmg.yml containing the following.
name: metaDMG
channels:
  - conda-forge
  - bioconda
dependencies:
  - conda-forge::python=3.9.15
  - bioconda::htslib=1.17
  - conda-forge::eigen=3.4.0
  - conda-forge::cxx-compiler=1.5.2
  - conda-forge::c-compiler=1.5.2
  - conda-forge::gsl=2.7
  - conda-forge::iminuit=2.17.0
  - conda-forge::numpyro=0.10.1
  - conda-forge::joblib=1.2.0
  - conda-forge::numba=0.56.2
  - conda-forge::flatbuffers=22.9.24
  - conda-forge::psutil=5.9.4
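When writing an environment file by hand, a common mistake is to lose a top-level key or its indentation while copying. A rough sanity check before creating the environment is sketched below; it only confirms that the `name`, `channels` and `dependencies` keys start at the beginning of a line and does not validate the YAML itself. `env_file_looks_valid` is a hypothetical helper name.

```shell
# env_file_looks_valid FILE: check that the three top-level keys are present.
env_file_looks_valid() {
    for key in name channels dependencies; do
        grep -q "^${key}:" "$1" || { echo "missing key: ${key}"; return 1; }
    done
    echo "ok"
}

# Run the check only if the file exists in the current directory.
if [ -f metadmg.yml ]; then
    env_file_looks_valid metadmg.yml || echo "fix metadmg.yml before creating the environment"
fi
```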
Create the environment with
conda env create -f metadmg.yml
Change to the chapter’s directory
cd /<path>/<to>/authentication
Activate the new environment
conda activate metaDMG
Clone the latest version of metaDMG, and compile
git clone https://github.com/metaDMG-dev/metaDMG-cpp.git
cd metaDMG-cpp
make clean && make CPPFLAGS="-L${CONDA_PREFIX}/lib -I${CONDA_PREFIX}/include" HTSSRC=systemwide -j 8
Install some patches, and some additional modules
pip install git+https://github.com/metaDMG-dev/metaDMG-core #@stopiferrors_branch
pip install metaDMG[viz]
Deactivate the conda environment
conda deactivate
Make the metaDMG executable available in the environment
export PATH="$PATH:/<path>/<to>/authentication/metaDMG-cpp/"
If you close your terminal halfway through the chapter, when opening the terminal again to continue the chapter, you MUST re-export the path to the metaDMG-cpp directory to have access to the software!
Phylogenomics
- TempEst (v1.5.3)
It is also recommended to assign the following bash variable so you can access the tool without the full path
cd /<path>/<to>/phylogenomics
tar -xvf TempEst_v1.5.3.tgz
cd TempEst_v1.5.3
export PATH="$PATH:/<path>/<to>/phylogenomics/TempEst_v1.5.3/bin/"
If you get an error like
Exception in thread "main" java.lang.UnsatisfiedLinkError: Can't load library: /usr/lib/jvm/java-11-openjdk-amd64/lib/libawt_xawt.so
make sure you have Java installed, e.g.
sudo apt install openjdk-11-jdk
- MEGAX (v11.0.11)
If you close your terminal halfway through the chapter, when opening the terminal again to continue the chapter, you MUST re-export the path to the TempEst bin directory to have access to the software!
Ancient metagenomic pipelines
Docker (installation method will vary depending on your OS)
Linux-nerd install
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  "$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
## May need to do a reboot or something here
sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker
sudo reboot ## will kick you out, but it'll be back in a minute or two
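After the reboot, it is worth confirming that your user really is a member of the docker group, as otherwise docker commands will need `sudo`. A minimal sketch of such a check is given here; `user_in_group` is a hypothetical helper name, and the check assumes group names without spaces.

```shell
# user_in_group GROUP: succeeds if the current user belongs to GROUP,
# according to the list printed by `id -nG`.
user_in_group() {
    id -nG | tr ' ' '\n' | grep -qx "$1"
}

if user_in_group docker; then
    echo "current user is in the docker group"
else
    echo "current user is NOT in the docker group yet (log out and in again, or reboot)"
fi
```

If the check passes, running `docker run hello-world` is a common way to confirm that the installation itself works.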
aMeta (make sure you’ve already downloaded the data directory as per the chapter instructions)
cd /<path>/<to>/ancient-metagenomic-pipelines/
git clone https://github.com/NBISweden/aMeta
cd aMeta
## We have to patch the environment to use an old version of Snakemake as aMeta is not compatible with the latest version
sed -i 's/snakemake-minimal>=5.18/snakemake <=6.3.0/' workflow/envs/environment.yaml
conda env create -f workflow/envs/environment.yaml
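The `sed` patch silently does nothing if the pattern is not found, so it can be useful to preview the substitution and to confirm afterwards that it was applied. The sketch below does both. The sample input line is an assumption about how the dependency is written in aMeta's environment file, and `patch_applied` is a hypothetical helper name. Note also that `sed -i` without a backup suffix is GNU syntax; the BSD `sed` shipped with macOS expects `sed -i ''`.

```shell
# The same substitution as used for the aMeta environment file.
SED_PATCH='s/snakemake-minimal>=5.18/snakemake <=6.3.0/'

# Preview on a sample line (assumed format) without touching any file.
printf '  - snakemake-minimal>=5.18\n' | sed "$SED_PATCH"

# patch_applied FILE: succeeds if FILE contains the patched requirement.
patch_applied() {
    grep -q 'snakemake <=6.3.0' "$1"
}

# Check the real file only when run from inside the cloned aMeta directory.
if [ -f workflow/envs/environment.yaml ]; then
    patch_applied workflow/envs/environment.yaml \
        && echo "patch applied" \
        || echo "patch NOT applied; check the file by hand"
fi
```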