6 Introduction to the Command Line
For this chapter’s exercises, if not already performed, you will need to download the chapter’s dataset, decompress the archive, and create and activate the conda environment.
To do this, use wget or right click and save to download this Zenodo archive: 10.5281/zenodo.13759270, and unpack it.
tar xvf bare-bones-bash.tar.gz
cd bare-bones-bash/
You can then create and subsequently activate the environment with
conda env create -f bare-bones-bash.yml
conda activate bare-bones-bash
6.1 Preamble
This chapter is a condensed version of both the ‘Basic’ and ‘Boosted’ Bare Bones Bash courses, which can be found on the Bare Bones Bash website (https://barebonesbash.github.io/). The material is displayed here under a CC-BY-SA 4.0 license.
The original Bare Bones Bash material was created by Aida Andrades Valtueña, James Fellows Yates, and Thiseas C. Lamnidis.
Below is a quick reference guide to the commands discussed in this tutorial (Table 6.1). To understand what each command actually does, carry on reading below! For a complete run of all these commands AND MORE(!!), consider following the full Bare Bones Bash walkthroughs here.
command | description | example | common flags or arguments |
---|---|---|---|
pwd | print working directory | pwd | |
ls | list contents of directory | ls | -l (long info) |
mkdir | make directory | mkdir pen | |
cd | change directory | cd ~/pen | ~ (home dir), - (previous dir) |
ssh | log into a remote server | ssh | -Y (allows graphical windows) |
mv | move something to a new location (& rename if needed) | mv pen pineapple | |
rmdir | remove a directory | rmdir pineapple | |
wget | download something from a URL | wget www.pineapple.com/pen.txt | -i (use input file) |
cat | print contents of a file to screen | cat pen.txt | |
gzip | a tool for dealing with gzip files | gzip pen.txt | -l (show info) |
zcat | print contents of a gzipped file to screen | zcat pen.txt.gz | |
whatis | get a short description of a program | whatis zcat | |
man | print the man(ual) page of a command | man zcat | |
head | print first X number of lines of a file to screen | head -n 20 pineapple.txt | -n (number of lines to show) |
\| | pipe, a way to pass output of one command to another | cat pineapple.txt \| head | |
tail | print last X number of lines of a file to screen | tail -n 20 pineapple.txt | -n (number of lines to show) |
less | print file to screen, but allow scrolling | less pineapple.txt | |
wc | tool to count words, lines or bytes of a file | wc -l pineapple.txt | -l (number of lines not words) |
grep | print to screen lines in a file matching a pattern | grep pen pineapple.txt | |
ln | make a (sym)link between a file and a new location | ln -s pineapple.txt pineapple_pen.txt | -s (make symbolic link) |
nano | user-friendly terminal-based text editor | nano pineapple_pen.txt | |
rm | more general ‘remove’ command, including files | rm pineapple_pen.txt | -r (to remove directories) |
$VAR | Dollar sign + text indicates the name of a variable | $PPAP | |
echo | prints string to screen | echo "$PPAP" | |
for | begins ‘for’ loop, requires ‘in’, ‘do’ and ‘done’ | for p in apple pineapple; do echo "$p$PPAP"; done (prints applePen and pineapplePen when PPAP=Pen) | |
find | search for files or directories | find -name 'pen' | -type f (search only for files) -name '*JPG' (search for file names matching the pattern) |
"$var" | use double quotes to use contents of variable | pen=apple && echo "$pen" | |
< > 2> | redirects the standard input/output/error stream respectively into a file | cat <file.txt >file_copy.txt 2>cat_file.err | |
6.2 Introduction
The aim of this tutorial is to make you familiar with using bash every day… for the rest of your life! More specifically, we want to do this in the context of bioinformatics. We will start with how to navigate around a filesystem in the terminal, download sequencing files, and then manipulate them. Within these sections we will also show you simple tips and tricks to make your life generally easier.
This tutorial is designed so you can follow along on any machine with a UNIX terminal (no warranty provided).
In this tutorial, you will learn:
- What a command prompt is
- How to navigate around the filesystem via the command line
- How to view the contents of a file
- How to remove files and directories
- What a datastream is, and how they can be redirected
- How to chain commands together
- What a variable is, how to assign them and how to unpack them
- How to construct a simple for loop
- How to google more efficiently
6.3 The 5 commandments of Bare Bones Bash
The Bare Bones Bash philosophy of learning to code follows five simple commandments:
1) Be lazy! | Desire for shortcuts motivates you to explore more! |
2) | 99% of the time, someone else has already had the same issue. |
3) Document everything you do! | Make future you happy! |
4) There will ALWAYS be a typo! | Don’t get disheartened, even the best programmers make mistakes! |
5) Don’t be afraid of your freedom! | Explore! Try out things! |
Remember: No one writes code that works first time, or without looking at StackOverflow sooner or later.
6.4 What is a terminal?
A terminal is simply a fancy window that allows us to access the command-line interface of a computer or server (Figure 6.3).
The command-line itself is how we can work on the computer with just text.
bash (Bourne Again SHell) is one of the most popular languages used in the terminal.
6.5 Understanding the command prompt
After opening the terminal what we will normally see is a blank screen with a ‘command prompt’, like the one shown above. This typically consists of our username, the device name, a colon, a directory path and ends with a dollar symbol. Like so.
<username>@<device_name>:~$
The command prompt is never involved in any command, it is just there to ensure we know who and where we are. When copying a command we should NOT copy the command prompt.
Oftentimes, when looking for commands online, the commands run will be prefaced with a $. This is a stand-in for the command prompt. When adding multi-line commands, it is also common to preface the additional lines with a >. When copying such commands it is therefore important to remove these characters from the start of each line (if present).
Finally, in this tutorial, the symbols <> are used to show things that will/should be replaced by another value. For example, in Thiseas’ command prompt <username> will be replaced by lamnidis, as that is his username.
Now back to our prompt: it tells us that we are in the directory ~. The directory ~ stands for our home directory. Note that this shorthand will point to a different place, depending on the machine and the user.
If we want to know what the shorthand means (here comes our first command!), we can type in pwd, which stands for “print working directory”.
Our “working directory” is whichever directory we are currently in.
pwd
/home/<YOUR_USERNAME>
In programming documentation, it is very common to use <ALL_CAPS> notation to indicate something that doesn’t exist, or will vary depending on the user.
Whenever you see such a notation (i.e., angle brackets and all caps), you must always replace that whole section (including the <>) with whatever is on your system! You should never copy and paste this blindly!
The pwd command prints the entire “filepath” of the directory, i.e. the route from the “root” (the top-level directory of the machine), through every subdirectory, leading to our particular working directory.
If you’re not in your home directory (i.e., the output isn’t exactly the same as the output above), please run the following command.
cd $HOME
You’ll learn about this command in a bit!
Given the following command prompt: bare@bones_bash:~$, what does “bare” mean?
That’s right, “bare” is the username that is logged into the machine.
6.6 Absolute vs Relative paths
Filepaths (a.k.a. “paths”), come in two flavours. Let’s talk a bit about them!
- An absolute path will start with the top-level (root) directory of the machine, shown as a /. Paths starting with ~ are also absolute paths, since ~ translates to an absolute path of our specific home directory. That is the directory path we see in the output of the pwd command we just ran.
- Alternatively, a relative path always begins from our working directory (i.e. our current directory). Often this type of path will begin with one (./) or two (../) dots followed by a forward slash, but not always. In the syntax of relative paths, . means “the current directory” and .. means “the parent directory” (or the ‘one above’).
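As a minimal sketch of the difference, the commands below build a small throwaway directory tree and move around it using both kinds of path. The sandbox location and the pen/pineapple directory names are made up for this illustration. A few things used here are not covered in this chapter: mktemp -d creates a temporary directory, mkdir -p creates nested directories in one go, basename prints the last part of a path, and $(...) captures the output of a command.

```bash
# Hypothetical sandbox so that nothing in our real directories is touched.
sandbox="$(mktemp -d)"
mkdir -p "$sandbox/pen/pineapple"

# Absolute path: starts from / and works no matter where we currently are.
cd "$sandbox/pen/pineapple"

# Relative path: '..' means the parent of our current directory.
cd ..
parent="$(basename "$(pwd)")"

# Relative path: './' means 'starting from the current directory'.
cd ./pineapple
child="$(basename "$(pwd)")"

echo "After 'cd ..' we were in: $parent"
echo "After 'cd ./pineapple' we were in: $child"
```

The two relative cd commands only work because of where we were standing when we ran them, whereas the absolute cd would have worked from any starting point.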
6.6.1 A real life analogy for paths
We have just arrived in Leipzig for a summer school that is taking place at MPI-EVA. After some questionable navigation, we find ourselves at the Bayerische Bahnhof. Tired and disheartened, we decide to ask for help.
We see a friendly-looking metalhead (Figure 6.4), and decide to ask them for directions!
I’m happy to help, but I only give directions in absolute paths!
From Leipzig Main Station, you should take Querstraße southward.
Continue straight and take Nürnberger Str. southward until you reach Str. des 18 Oktober.
Finally, take Str. des 18 Oktober, moving southeast until you reach MPI-EVA!
The directions above are equivalent to an absolute path, because they will ALWAYS take us to MPI-EVA, but we can only apply these directions if we start from Leipzig Main Station!
Examples of absolute paths:
/home/<PATH>/<TO>/
/Leipzig_Main_Station/Querstraße/Nürnberger_Str/Str_18_Oktober/Deutscher_Platz/MPI-EVA
Not sure how to get back to Leipzig Main Station to follow those directions, we decide to ask someone else for directions…
Lucky for us, a friendly looking local is passing by (Figure 6.5)!
You’re currently on Str. des 18 Oktober. Walk straight that way, past the tram tracks, and you will find Deutscher Platz. You will see MPI-EVA to your right!
These directions are equivalent to a relative path! They are easy to follow, but only work when we happen to start at the position we were in when we first got the directions!
Examples of relative paths:
./<PATH>/<TO>/my_file.txt
../Str_18_Oktober/Deutscher_Platz/MPI-EVA
What would be the relative path to the file “bones.txt” if you are in the folder “test”, given the following absolute path: “/home/sweet/home/lets/test/your/knowledge/to/the/bones.txt”?
The relative path from the “test” folder to the “bones.txt” is: “./your/knowledge/to/the/bones.txt”
6.7 Basic commands
We will now explore some basic commands that we will use to explore folders and interact with files:
list directory contents
ls
Output should look like the following.
Desktop Downloads Pictures Templates bin snap
Documents Music Public Videos cache thinclient_drives
We will use this format to show you commands and their corresponding output in the terminal (if any) for the rest of this chapter.
make a directory
## ⚠ This will be our working directory for the rest of this chapter! Do not move out of it unless asked to! ⚠
mkdir barebonesbash
move (or rename) files and directories
mv barebonesbash BareBonesBash
change directories
cd BareBonesBash
Download (www get) a remote file to our computer
wget git.io/Boosted-BBB-meta
copy a file or directory to a new location
cp Boosted-BBB-meta Boosted-BBB-meta.tsv
remove (delete) files
rm Boosted-BBB-meta
Concatenate file contents to screen
cat Boosted-BBB-meta.tsv
See only the first/last 10 lines of a file
head -n 10 Boosted-BBB-meta.tsv
tail -n 10 Boosted-BBB-meta.tsv
Note: This is because the start of a cat is its head and the end of the cat is its tail (the great humour of computer scientists).
Look at the contents of a file interactively (less than the complete file, press q to quit)
less Boosted-BBB-meta.tsv
word count the number of lines (-l) in a file
wc -l Boosted-BBB-meta.tsv
15 Boosted-BBB-meta.tsv
Which command can you use to print the last 6 lines of the file “Boosted-BBB-meta.tsv”?
For that you will need to use tail and modify the parameter given to the -n flag, like the following.
tail -n 6 Boosted-BBB-meta.tsv
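If you want to try the commands above without downloading anything, the following sketch runs the same sequence on a stand-in file. The sandbox directory and the file names numbers and numbers.txt are hypothetical, and seq (which prints a sequence of numbers) plus the > redirect (explained later in this chapter) are only used here to create a file with a known number of lines.

```bash
# Hypothetical sandbox; 'numbers' stands in for a downloaded file.
sandbox="$(mktemp -d)"
cd "$sandbox"

mkdir barebonesbash
mv barebonesbash BareBonesBash   # rename the directory
cd BareBonesBash

seq 1 15 > numbers               # write the numbers 1 to 15, one per line
cp numbers numbers.txt           # copy to a new name
rm numbers                       # remove the original

ls                               # only numbers.txt is left
head -n 3 numbers.txt            # first three lines: 1, 2, 3
tail -n 6 numbers.txt            # last six lines: 10 to 15
wc -l numbers.txt                # reports 15 lines
```

Because every line of the stand-in file is just its own line number, it is easy to check that head, tail and wc return what we expect.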
6.8 Datastreams, piping, and redirects
Each of the commands we learned above is a small program with a very specialised functionality. Programs come in many forms and can be written in various programming languages, but most of them share some features. Specifically, most programs take some data in and spit some data out! Here’s how that works, conceptually:
6.8.1 Datastreams
Computer programs can take in and spit out data from different streams (Figure 6.6). By default there are 3 such data streams.
- stdin: the standard input
- stdout: the standard output
- stderr: the standard error
Each program also has an ‘exit code’, which can tell you if execution completed with/without errors. You will rarely see these in the wild.
Typically, the stdin is where the input data comes in.
The stdout is the actual output of the command. In some cases this gets printed to the screen, but most often this is the information that needs to be saved in an output file.
The stderr is the datastream where errors and warnings go. This gets printed to our terminal to let us know when something is not going according to plan (Figure 6.7)!
6.8.2 Piping
A “pipe” (|) is a really useful feature of bash that lets us chain together multiple commands! When two commands are chained together with a pipe, the stdout of the first command becomes the stdin of the second (Figure 6.8)! The stderr is still printed on our screen, so we can always know when things fail.
Example
head -n 10 Boosted-BBB-meta.tsv | tail -n 1
netsukeJapan C Artwork
The above command will only show the 10th line of Boosted-BBB-meta.tsv. The way it works is that head will take the first 10 lines of the file. These lines are then passed on to tail, which will keep only the last of those lines.
How can we combine tail and head to print the 10th line starting from the end of the file “Boosted-BBB-meta.tsv”?
You will use the following command employing a | (pipe):
tail -n 10 Boosted-BBB-meta.tsv | head -n 1
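To convince ourselves that the order of head and tail matters, here is a small sketch on a stand-in file in which every line contains its own line number. The file numbers.txt and the sandbox directory are made up for this illustration, and seq and $(...) are helpers that are not covered in this chapter.

```bash
# Hypothetical 15-line file: line N contains the number N.
sandbox="$(mktemp -d)"
seq 1 15 > "$sandbox/numbers.txt"

# 10th line from the start: keep lines 1-10, then the last of those.
tenth_from_start="$(head -n 10 "$sandbox/numbers.txt" | tail -n 1)"

# 10th line from the end: keep lines 6-15, then the first of those.
tenth_from_end="$(tail -n 10 "$sandbox/numbers.txt" | head -n 1)"

echo "10th from start: $tenth_from_start"   # 10
echo "10th from end: $tenth_from_end"       # 6
```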
6.8.3 Redirects
Much like streams of water in the real world, datastreams can be redirected.
This way we can save the stdout of a program (or even the stderr) into a file for later!
- stdin can be redirected with <
  - An arrow pointing TO the program name!
- stdout can be redirected with >
  - An arrow pointing AWAY from the program name!
- stderr can be redirected with 2>
  - Because it is the secondary output stream.
It is also possible to combine streams, but we won’t get into that here.
Example:
head -n 10 Boosted-BBB-meta.tsv | tail -n1 > line10.txt
This will create a new file called line10.txt within our work directory. Using cat on this file will BLOW YOUR MIND - (GONE WRONG)!
(Don’t forget to like and subscribe!)
cat line10.txt
netsukeJapan C Artwork
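The example above only redirects stdout. As a rough sketch of all three redirects, the commands below use a made-up file called fruit.txt inside a throwaway sandbox directory. sort and printf are not covered in this chapter (they sort lines alphabetically and print formatted text, respectively), and the other file names are hypothetical too. The last command also shows an exit code via the special variable $?, whose exact value depends on the system.

```bash
# Hypothetical sandbox and input file.
sandbox="$(mktemp -d)"
cd "$sandbox"
printf 'pen\npineapple\napple\n' > fruit.txt

# stdin comes from fruit.txt, stdout goes into sorted.txt.
sort < fruit.txt > sorted.txt
cat sorted.txt

# Asking ls for a missing file produces an error message on stderr.
# stdout goes to listing.txt (stays empty), stderr goes to listing.err.
ls no_such_file.txt > listing.txt 2> listing.err || echo "ls failed with exit code $?"
cat listing.err
```

Nothing from ls appears on screen until we cat listing.err, because both of its output streams were sent to files.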
6.9 Help text
We don’t always have to google for documentation! Many programs come with in-built help text, or access to online manuals right from our terminal!
We can get a one-sentence summary of what a tool does with whatis.
whatis cat
cat(1) - concatenate files and print on the standard output
Meanwhile, man gives us access to online manuals for each tool (exit with q).
man cat
6.10 Variables
Variables are a central concept of all programming. In short, a variable is a named container whose contents we can expand at will or change.
We can assign variables (tell the computer what we want them to contain) with = and pull their contents with $.
The easiest way to see the contents of a variable is using echo!
echo "This is my home directory: $HOME"
This is my home directory: /home/ubuntu
$HOME is an environment variable. Environment variables are set the moment we open our terminal or log into a server. They ensure the system works as intended and should not be changed unless we are very sure of why.
Environment variables in bash are typically named in all capital letters. It is a good idea to avoid using only capital letters for your variable names, so you avoid accidentally overwriting any environment variables.
But as mentioned, we can store in a variable anything we want, so let’s see a few examples:
First, let’s try to store a number.
GreekFood=4 #Here, 'GreekFood' is a number.
echo "Greek food is $GreekFood people who want to know what heaven tastes like."
Greek food is 4 people who want to know what heaven tastes like.
The # is used to add comments to your code. Comments are annotations that you write in your code to understand what it is doing but that the computer does not run. Very useful for when your future self or another person looks at your code.
Now let’s store a word (“string”).
GreekFood=delicious #We overwrite that number with a word (i.e. a 'string').
echo "Everyone says that Greek food is $GreekFood."
Everyone says that Greek food is delicious.
We can also store more than a single word (that is still a “string”).
GreekFood="Greek wine" #We can overwrite 'GreekFood' again,
## but when there is a space in our string, we need quotations.
echo "The only thing better than Greek food is $GreekFood!"
The only thing better than Greek food is Greek wine!
Since variables can be reset to whatever we want, we can also store a number again.
GreekFood=7 #And, of course, we can overwrite with a number again too.
echo "I have been to Greece $GreekFood times already this year, for the food and wine!"
I have been to Greece 7 times already this year, for the food and wine!
In these examples we have seen how the same variable has been overwritten. This means that we can only access the last content that we stored in the variable. All the previous contents that a variable may have had are inaccessible as soon as the same variable is given a new value.
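Variables become especially handy when they hold something we need more than once, such as a file path. The sketch below is only an illustration: the variable name MyFile, the sandbox directory and the file names are all made up, and printf and $(...) are helpers that are not covered in this chapter.

```bash
# Hypothetical sandbox directory for this illustration.
sandbox="$(mktemp -d)"

MyFile="$sandbox/pineapple.txt"     # store a (hypothetical) file path
printf 'pen\napple\n' > "$MyFile"   # create the file the variable points to
first_line="$(head -n 1 "$MyFile")"
echo "First line of $MyFile: $first_line"

MyFile="$sandbox/pen.txt"           # overwrite: the old path is no longer stored
echo "MyFile now contains: $MyFile"
```

Note the mixed-case variable name, chosen to avoid clashing with all-capital environment variables.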
6.11 Quotes matter!
In bash, there is a big difference between a single quote ' and a double quote "!
- The contents of single quotes are passed on as they are
- Inside double quotes, contents are interpreted!
In some cases the difference doesn’t matter.
echo "I like Greek Food"
echo 'I like Greek Food'
I like Greek Food
I like Greek Food
In other cases it makes all the difference!
Arr="Banana"
echo 'Pirates say $Arr'
echo "Minions say $Arr"
Pirates say $Arr
Minions say Banana
Why does it make a difference in the second example?
This is because in the second example we are using a variable. We have assigned Banana to the variable $Arr. As mentioned above, when single (') quotes are used the computer just prints what it receives without caring that $Arr is a variable.
In the echo with the double (") quotes we are telling the computer to extract the value from the variable $Arr, and that is why we see the stored value (Banana) in the printed output in the terminal.
What is the correct notation to assign a value to a variable in bash?
A) $arr=2
B) arr$=2
C) arr=2
Option C is the correct one. To assign a variable we need to put the chosen name followed by = and finally the value (number, string) that we want to be stored. When we want to call the variable we need to add $ in front of the variable name.
6.12 Find
We can also ask our computer where we have put our files, in case we forgot. To do this we can use find! The find command has the following syntax.
## ⚠ Don't run! Fake example ⚠
find /<PATH>/<TO>/ -type f -name 'your_file.txt'
- First part of the find command: the place to look from
  - E.g. . to indicate ‘here’
  - Could also use ~/
  - Could use absolute path e.g. /home/james/
- Second part of the find command: what type of things to look for?
  - Use -type to define the filetype:
    - f for file
    - d for directory
- Third part of the find command: what to look in?
  - Use -name to say ‘look in names of things’
- Finally after -name we give the ‘strings’ to search for
  - Use wildcards (*) for maximum laziness!
Now let’s put into practice what we have learnt about find.
For that we will download a messy folder from a collaborator. Remember to check we are in the BareBonesBash folder!
wget git.io/Boosted-BBB-images -O Boosted-BBB.zip
We realise that this is a compressed file, and more precisely a zip file (extension .zip). In order to access its content we will need to “unzip” it first. For that we can use the command unzip.
unzip Boosted-BBB.zip
We know that our collaborator has shared with us some pictures of animals that we need to use for our research, and according to our collaborator they are marked with JPG. We first try to check the contents of the directory to find them quickly.
ls Boosted-BBB
And Digging Friday Leave Only Where Young
Anybody Everything Getting Looking Ooh With Youre
Dancing Feel Having Night Watch You
Wow, what a mess! How would we retrieve all the files? Thanks to our wonderful teachers we have learnt how to use find and can simply run the following.
find Boosted-BBB -type f -name '*JPG*'
Boosted-BBB/Having/the/time/of/your/life/bubobubo.JPG.MP3.TXT
Boosted-BBB/With/a/bit/of/rock/music/exhibitRoyal.JPG.MP3.TXT
Boosted-BBB/Friday/night/and/the/lights/are/low/fanta.JPG.MP3.TXT
Boosted-BBB/Everything/is/fine/nomnom.JPG.MP3.TXT
Boosted-BBB/Getting/in/the/swing/giacomo.JPG.MP3.TXT
Boosted-BBB/Youre/in/the/mood/for/a/dance/snore.JPG.MP3.TXT
Boosted-BBB/Digging/the/dancing/queen/excited.JPG.MP3.TXT
Boosted-BBB/Anybody/could/be/that/guy/alopochenaegyptiacaArnhem.JPG.MP3.TXT
Boosted-BBB/And/when/you/get/the/chance/stretch.JPG.MP3.TXT
Boosted-BBB/Looking/out/for/angry.JPG.MP3.TXT
Boosted-BBB/Feel/the/beat/from/the/tambourine/oh/yeah/netsukeJapan.JPG.MP3.TXT
Boosted-BBB/Watch/that/scene/licorne.JPG.MP3.TXT
Boosted-BBB/You/can/weimanarer.JPG.MP3.TXT
Boosted-BBB/Night/is/young/and/the/musics/high/bydgoszczForest.JPG.MP3.TXT
Boosted-BBB/Ooh/see/that/girl/pompeii.JPG.MP3.TXT
After -name we have written '*JPG*'. This tells find to search for any file that contains JPG in any part of its name, as indicated by the * on either side. The * characters are what are known as wildcards. To learn more on how to use them, please refer to the more complete material for this tutorial: https://barebonesbash.github.io/.
Now we have all the paths of the files that we will need!
How should you write a find command to search for all files ending with .TXT in the folder Boosted-BBB?
The correct find command will be the following
find Boosted-BBB -type f -name '*.TXT'
As mentioned, the first part is the folder Boosted-BBB, followed by the type, which is file, and finally the name of what we are looking for, in our case all the files ending with .TXT.
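For a version that can be run without the downloaded archive, the sketch below builds a tiny messy directory tree by hand and searches it. All directory and file names here are invented for the illustration, and touch (create an empty file), mkdir -p (create nested directories) and $(...) are not covered in this chapter.

```bash
# Hypothetical messy folder, built inside a throwaway sandbox.
sandbox="$(mktemp -d)"
mkdir -p "$sandbox/Dancing/queen" "$sandbox/Night/is/young"
touch "$sandbox/Dancing/queen/owl.JPG.TXT"
touch "$sandbox/Night/is/young/forest.JPG.TXT"
touch "$sandbox/Night/is/notes.md"

# Files whose names end in .TXT, anywhere below the sandbox.
find "$sandbox" -type f -name '*.TXT'

# Directories (-type d) whose names start with N.
find "$sandbox" -type d -name 'N*'

# Pipe the results into wc -l to count the matches instead of listing them.
txt_count="$(find "$sandbox" -type f -name '*.TXT' | wc -l)"
echo "Number of .TXT files: $txt_count"
```

Keeping the pattern in single quotes ensures the * reaches find untouched, rather than being expanded by bash first.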
6.13 For loops
Until now we have seen how to run single commands on a file. But, what about when we need to repeat a command multiple times on a list of things, for example a list of files?
To repeat an action (command) for a set of things (list, e.g. files) one needs to employ the concept of a loop. One of the most commonly used loops, is the for loop.
A for loop allows us to go through a list of things and perform some actions. Let’s see an example.
Variable=Yes
for i in Greece Spain Britain; do
echo "Does $i have lovely food? $Variable"
done
Does Greece have lovely food? Yes
Does Spain have lovely food? Yes
Does Britain have lovely food? Yes
The for loop went through the list Greece Spain Britain and printed a statement with each item in the list. What happens if we change the order of the list to Britain Greece Spain?
Variable=Yes
for i in Britain Greece Spain; do
echo "Does $i have lovely food? $Variable"
done
Does Britain have lovely food? Yes
Does Greece have lovely food? Yes
Does Spain have lovely food? Yes
We see that changing the order of the list affects the output. This is because the for loop goes through the list in a sequential manner.
We can also add more elements to the list, and the for loop will continue until it reaches the end of the list.
How many times will the following for loop run the command echo?
for i in Greece Spain Britain Italy Slovenia; do
echo "I would love to go to $i"
done
The for loop will run the echo command 5 times, once for every element in the list Greece Spain Britain Italy Slovenia.
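The examples above loop over words, but the same pattern applies to a list of files. The following is a minimal sketch, assuming two made-up files (apple.txt and pineapple.txt) created in a throwaway sandbox directory. It also redirects the stdout of the whole loop into a hypothetical report.txt by placing > after done.

```bash
# Hypothetical sandbox with two small files to loop over.
sandbox="$(mktemp -d)"
cd "$sandbox"
printf 'pen\n' > apple.txt
printf 'pen\npen\n' > pineapple.txt

# Run the same two commands once per file in the list.
for file in apple.txt pineapple.txt; do
    echo "Counting lines in $file"
    wc -l "$file"
done > report.txt

cat report.txt
```

Each pass through the loop writes two lines, so report.txt ends up with four lines in total.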
6.14 How to Google like a pro
One of the most important skills we develop when coding and/or using the command line is how to phrase our questions so we can get relevant answers out of our search engine.
As Deep Thought put it in the Hitchhiker’s Guide to the Galaxy:
Only when you know the question will you know what the answer means.
Here are some quick tips to get you started:
- ALWAYS include the name of the language in your query
- BAD: “How to cat” (Figure 6.10)
- GOOD: “How to cat bash” (Figure 6.11)
- BROADEN your question!
- BAD: “How to set X to 4 in bash?”
- GOOD: “How to set a variable to an integer in bash?”
- When you are more familiar, use fancy programmer lingo to make Google think you know what you are talking about
  - string and not text
  - float and not decimal
Note: some of these terms can be language specific.
6.15 (Optional) clean-up
It is extremely important to ALWAYS keep our directories clean from random clutter. This lowers the chances we will get lost in our directories, but also ensures we can stay lazy, since tab completion will not keep finding similarly named files. So let’s clean up our working directory by removing all the clutter we downloaded and worked with today. The command below will remove the /<PATH>/<TO>/BareBonesBash directory as well as all of its contents.
Always be VERY careful when using rm -r. Check 3x that the path we are specifying is exactly what we want to delete and nothing more before pressing ENTER!
cd ~ ## We shouldn't delete a directory while we are still in it. (It is possible though).
rm -r /<PATH>/<TO>/BareBonesBash*
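One way to practise the “check before you delete” habit safely is in a throwaway location. The sketch below is only an illustration: the sandbox directory and its contents are created on the spot (with mktemp -d, mkdir -p and touch, none of which are covered in this chapter) and stand in for the real BareBonesBash directory.

```bash
# Hypothetical stand-in for our working directory, inside a sandbox.
sandbox="$(mktemp -d)"
mkdir -p "$sandbox/BareBonesBash/Boosted-BBB"
touch "$sandbox/BareBonesBash/line10.txt"

cd "$sandbox"                    # step out of the directory we will delete
ls "$sandbox/BareBonesBash"      # look first: is this really what we want gone?
rm -r "$sandbox/BareBonesBash"   # delete the directory and everything in it
ls "$sandbox"                    # prints nothing: the sandbox is empty again
```

Listing the target with ls immediately before running rm -r on the very same path is a cheap way to catch a mistyped path.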
We can also get out of the conda environment with
conda deactivate
To delete the conda environment, run
conda remove --name bare-bones-bash --all -y
6.16 Summary
You should now know the basics of working on the command line, like:
- What a command prompt is
- How to navigate around the filesystem via the command line
- How to view the contents of a file
- How to remove files and directories
- What a datastream is, and how they can be redirected
- How to chain commands together
- What a variable is, how to assign them and how to unpack them
- How to construct a simple for loop
- How to google more efficiently
If you would like to know more about the magic of bash, you can find more commands as well as more advanced bash concepts on the BareBonesBash website (https://barebonesbash.github.io/).