6  Introduction to the Command Line

Author

Thiseas C. Lamnidis, Aida Andrades Valtueña

For this chapter’s exercises, if not already performed, you will need to download the chapter’s dataset, decompress the archive, and create and activate the conda environment.

To do this, use wget or right click and save to download this Zenodo archive: 10.5281/zenodo.13759270, and unpack it.

tar xvf bare-bones-bash.tar.gz 
cd bare-bones-bash/

You can then create and subsequently activate the environment with

conda env create -f bare-bones-bash.yml
conda activate bare-bones-bash

6.1 Preamble

This chapter is a condensed version of both the ‘Basic’ and ‘Boosted’ Bare Bones Bash courses, which can be found on the Bare Bones Bash website (https://barebonesbash.github.io/). The material is displayed here under a CC-BY-SA 4.0 license.

The original Bare Bones Bash material was created by Aida Andrades Valtueña, James Fellows Yates, and Thiseas C. Lamnidis.


Figure 6.2: Cartoony pictures of the three authors of the BareBonesBash course. Designed by Zandra Fagernäs.

Below is a quick reference guide to the commands discussed in this tutorial (Table 6.1). To understand what each command actually does, carry on reading below! For a complete run of all these commands AND MORE(!!), consider following the full Bare Bones Bash walkthroughs on the Bare Bones Bash website (https://barebonesbash.github.io/).

Table 6.1: Table with a quick summary of all the commands in this tutorial, some common arguments for them, and a short example command
| command | description | example | common flags or arguments |
|---|---|---|---|
| pwd | print working directory | pwd | |
| ls | list contents of directory | ls | -l (long info) |
| mkdir | make directory | mkdir pen | |
| cd | change directory | cd ~/pen | ~ (home dir), - (previous dir) |
| ssh | log into a remote server | ssh <username>@<server>.com | -Y (allows graphical windows) |
| mv | move something to a new location (& rename if needed) | mv pen pineapple | |
| rmdir | remove a directory | rmdir pineapple | |
| wget | download something from a URL | wget www.pineapple.com/pen.txt | -i (use input file) |
| cat | print contents of a file to screen | cat pen.txt | |
| gzip | a tool for dealing with gzip files | gzip pen.txt | -l (show info) |
| zcat | print contents of a gzipped file to screen | zcat pen.txt.gz | |
| whatis | get a short description of a program | whatis zcat | |
| man | print the man(ual) page of a command | man zcat | |
| head | print first X number of lines of a file to screen | head -n 20 pineapple.txt | -n (number of lines to show) |
| \| | pipe, a way to pass output of one command to another | cat pineapple.txt \| head | |
| tail | print last X number of lines of a file to screen | tail -n 20 pineapple.txt | -n (number of lines to show) |
| less | print file to screen, but allow scrolling | less pineapple.txt | |
| wc | tool to count words, lines or bytes of a file | wc -l pineapple.txt | -l (number of lines not words) |
| grep | print to screen lines in a file matching a pattern | cat pineapple.txt \| grep pen | |
| ln | make a (sym)link between a file and a new location | ln -s pineapple.txt pineapple_pen.txt | -s (make symbolic link) |
| nano | user-friendly terminal-based text editor | nano pineapple_pen.txt | |
| rm | more general ‘remove’ command, including files | rm pineapple_pen.txt | -r (to remove directories) |
| $VAR | Dollar sign + text indicates the name of a variable | $PPAP | |
| echo | prints string to screen | echo "$PPAP" | |
| for | begins ‘for’ loop, requires ‘in’, ‘do’ and ‘done’ | for p in apple pineapple; do echo "$p$PPAP"; done (prints applePen and pineapplePen when PPAP=Pen) | |
| find | search for files or directories | find -name 'pen' | -type f (search only for files), -name '*JPG' (search for file names matching the pattern) |
| "$var" | use double quotes to use contents of variable | pen=apple && echo "$pen" | |
| <, >, 2> | redirects the standard input/output/error stream, respectively, from/into a file | cat <file.txt >file_copy.txt 2>cat_file.err | |

6.2 Introduction

The aim of this tutorial is to make you familiar with using bash every day… for the rest of your life! More specifically, we want to do this in the context of bioinformatics. We will start with how to navigate around a filesystem in the terminal, download sequencing files, and then manipulate these. Within these sections we will also show you simple tips and tricks to make your life generally easier.

This tutorial is designed so that you can follow along on any machine with a UNIX terminal (no warranty provided).

In this tutorial, you will learn:

  • What a command prompt is
  • How to navigate around the filesystem via the command line
  • How to view the contents of a file
  • How to remove files and directories
  • What datastreams are, and how they can be redirected
  • How to chain commands together
  • What variables are, how to assign them and how to unpack them
  • How to construct a simple for loop
  • How to google more efficiently

6.3 The 5 commandments of Bare Bones Bash

The Bare Bones Bash philosophy of learning to code follows five simple commandments:

Table 6.2: The five commandments of BareBonesBash
1) Be lazy! Desire for shortcuts motivates you to explore more!
2) Google The Hive-Mind knows everything! 99% of the time, someone else has already had the same issue.
3) Document everything you do! Make future you happy!
4) There will ALWAYS be a typo! Don’t get disheartened, even the best programmers make mistakes!
5) Don’t be afraid of your freedom! Explore! Try out things!
Pro Tip

Remember: No one writes code that works first time, or without looking at StackOverflow sooner or later.

6.4 What is a terminal?

A terminal is simply a fancy window that allows us to access the command-line interface of a computer or server (Figure 6.3).

The command-line itself is how we can work on the computer with just text.

bash (Bourne Again SHell) is one of the most popular languages used in the terminal.

6.5 Understanding the command prompt

Figure 6.3: An example command prompt

After opening the terminal, what we will normally see is a blank screen with a ‘command prompt’, like the one shown above. This typically consists of our username, the device name, a colon, a directory path, and ends with a dollar symbol, like so:

<username>@<device_name>:~$

The command prompt is never part of any command; it is just there to ensure we know who and where we are. When copying a command we should NOT copy the command prompt.

Often, when looking for commands online, the commands shown will be prefaced with a $. This is a stand-in for the command prompt. When showing multi-line commands, it is also common to preface the additional lines with a >. When copying such commands it is therefore important to remove these characters from the start of each line (if present).

Finally, in this tutorial, the symbols <> are used to show things that will/should be replaced by another value. For example, in Thiseas’ command prompt <username> will be replaced by lamnidis, as that is his username.

Now back to our prompt: It tells us that we are in the directory ~. The directory ~ stands for our home directory. Note that this shorthand will point to a different place, depending on the machine and the user.

If we want to know what the shorthand means (here comes our first command!), we can type in pwd, which stands for “print working directory”.

Our “working directory” is whichever directory we are currently in.

pwd
/home/<YOUR_USERNAME>
Warning

In programming documentation, it is very common to use <ALL_CAPS> notation to indicate something that doesn’t exist, or will vary depending on the user.

Whenever you see such a notation (i.e., triangular brackets and all caps), you must always replace that whole section (including the <>) with whatever is on your system! You should never copy and paste this blindly!

This prints the entire “filepath” of the directory, i.e. the route from the “root” (the top-level directory of the machine, which contains everything else), through every subdirectory, leading to our particular working directory.

Note

If you’re not in your home directory (i.e., the output isn’t exactly the same as the output above), please run the following command.

cd $HOME

You’ll learn about this command in a bit!

Question

Given the following command prompt: bare@bones_bash:~$, what does “bare” mean?

That’s right, “bare” is the username of the user who is logged into the machine.

6.6 Absolute vs Relative paths

Filepaths (a.k.a. “paths”), come in two flavours. Let’s talk a bit about them!

  • An absolute path will start with the root directory of the machine, shown as a /. Paths starting with ~ are also absolute paths, since ~ translates to an absolute path of our specific home directory. That is the directory path we see in the output of the pwd command we just ran.
  • Alternatively, a relative path always begins from our working directory (i.e. our current directory). Often this type of path will begin with one (./) or two (../) dots followed by a forward slash, but not always. In the syntax of relative paths, . means “the current directory” and .. means “the parent directory” (or the ‘one above’).
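
To make the difference concrete, here is a minimal sketch that builds a small throwaway directory tree and reads the same file through an absolute and a relative path. The directory and file names are made up for illustration, and mktemp, the > redirect and the $(...) capture are helpers that have not been introduced at this point in the chapter.

```bash
# Build a small throwaway directory tree so nothing in our real home is touched.
# (All directory and file names below are made up for this illustration.)
sandbox=$(mktemp -d)
mkdir -p "$sandbox/lets/test/your/knowledge"
echo "bones" > "$sandbox/lets/test/your/knowledge/bones.txt"

# Absolute path: starts from / and works no matter where we currently are.
cat "$sandbox/lets/test/your/knowledge/bones.txt"

# Relative path: only works because we first move into 'test'.
cd "$sandbox/lets/test"
from_relative=$(cat ./your/knowledge/bones.txt)   # './' means "start from here"
echo "$from_relative"
cat your/knowledge/bones.txt                      # the leading './' is optional

# '..' climbs one level up: from 'knowledge' we look back into 'your'.
cd your/knowledge
ls ..

# Tidy up.
cd ~
rm -r "$sandbox"
```

Running the relative cat commands from any other directory would fail, which is exactly the limitation of relative paths described above.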

6.6.1 A real life analogy for paths

We have just arrived in Leipzig for a summer school that is taking place at MPI-EVA. After some questionable navigation, we find ourselves at the Bayerische Bahnhof. Tired and disheartened, we decide to ask for help.

We see a friendly-looking metalhead (Figure 6.4), and decide to ask them for directions!

Figure 6.4: A friendly-looking metalhead (actually James).

I’m happy to help, but I only give directions in absolute paths!
From Leipzig Main Station, you should take Querstraße southward.
Continue straight and take Nürnberger Str. southward until you reach Str. des 18 Oktober.
Finally take Str. des 18 Oktober, moving southeast until you reach MPI-EVA!

Absolute paths

The directions above are equivalent to an absolute path, because they will ALWAYS take us to MPI-EVA, but we can only apply these directions if we start from Leipzig Main Station!

Examples of absolute paths:
/home/<PATH>/<TO>/
/Leipzig_Main_Station/Querstraße/Nürnberger_Str/Str_18_Oktober/Deutscher_Platz/MPI-EVA

Not sure how to get back to Leipzig Main Station to follow those directions, we decide to ask someone else for directions…

Lucky for us, a friendly looking local is passing by (Figure 6.5)!

Figure 6.5: A friendly-looking local (actually Aida).

You’re currently on Str. des 18 Oktober. Walk straight that way, past the tram tracks, and you will find Deutscher Platz. You will see MPI-EVA to your right!

Relative paths

These directions are equivalent to a relative path! They are easy to follow, but only work when we happen to start at the position we were in when we first got the directions!

Examples of relative paths:
./<PATH>/<TO>/my_file.txt
../Str_18_Oktober/Deutscher_Platz/MPI-EVA

Question

What would be the relative path to the file “bones.txt” if you are in the folder “test”, given the following absolute path: “/home/sweet/home/lets/test/your/knowledge/to/the/bones.txt”?

The relative path from the “test” folder to the “bones.txt” is: “./your/knowledge/to/the/bones.txt”

6.7 Basic commands

We will now explore some basic commands that we will use to explore folders and interact with files:

  • list directory contents

    ls

Output should look like the following.

Desktop    Downloads  Pictures  Templates  bin    snap
Documents  Music      Public    Videos     cache  thinclient_drives
Note

We will use this format to show you commands and their corresponding output in the terminal (if any) for the rest of this chapter.

  • make a directory

    ## ⚠ this will be our working directory for the rest of this chapter! Do not move out of it unless asked to! ⚠
    mkdir barebonesbash
  • move (or rename) files and directories

    mv barebonesbash BareBonesBash
  • change directories

    cd BareBonesBash
  • Download (www get) a remote file to our computer

    wget git.io/Boosted-BBB-meta
  • copy a file or directory to a new location

    cp Boosted-BBB-meta Boosted-BBB-meta.tsv
  • remove (delete) files

    rm Boosted-BBB-meta
  • Concatenate file contents to screen

    cat Boosted-BBB-meta.tsv
  • See only the first/last 10 lines of a file

    head -n 10 Boosted-BBB-meta.tsv
    tail -n 10 Boosted-BBB-meta.tsv
    Note

    This is because the start of a cat is its head and the end of the cat is its tail (The great humour of computer scientists)

  • Look at the contents of a file interactively (less than the complete file, press q to quit)

    less Boosted-BBB-meta.tsv
  • count the number of lines (-l) in a file (wc stands for “word count”)

    wc -l Boosted-BBB-meta.tsv
    15 Boosted-BBB-meta.tsv
Question

Which command can you use to print the last 6 lines of the file “Boosted-BBB-meta.tsv”?

For that you will need to use tail and modify the parameter given to the -n flag, like the following.

tail -n 6 Boosted-BBB-meta.tsv
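
The commands from this section can also be tried out together without touching our BareBonesBash directory. The following is only a sketch: it works in a throwaway directory, all file and directory names are made up, and mktemp, seq, the > redirect and the $(...) capture are helpers that this section has not covered (redirects are explained in the next section).

```bash
# Work in a throwaway directory (all names here are made up for illustration).
sandbox=$(mktemp -d)
cd "$sandbox"

mkdir practice                 # make a directory
mv practice playground         # rename it
cd playground                  # move into it

# Create a 15-line file to play with: seq prints the numbers 1 to 15,
# one per line, and '>' saves that output into a file.
seq 1 15 > numbers.txt

cp numbers.txt copy.txt        # copy the file
rm numbers.txt                 # remove the original
ls                             # only copy.txt is left

head -n 3 copy.txt             # first three lines: 1, 2 and 3
tail -n 6 copy.txt             # last six lines: 10 up to 15
wc -l copy.txt                 # number of lines in the file

last_line=$(tail -n 1 copy.txt)
echo "The last line is: $last_line"

# Tidy up.
cd ~
rm -r "$sandbox"
```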

6.8 Datastreams, piping, and redirects

Each of the commands we learned above is a small program with a very specialised functionality. Programs come in many forms and can be written in various programming languages, but most of them share some features. Specifically, most programs take some data in and spit some data out! Here’s how that works, conceptually:

6.8.1 Datastreams

Computer programs can take in and spit out data from different streams (Figure 6.6). By default there are 3 such data streams.

  • stdin : the standard input
  • stdout: the standard output
  • stderr: the standard error
Figure 6.6: Diagram showing stdin going into the program, and two output streams from the program: stderr and stdout.
Pro Tip

Each program also has an ‘exit code’, which can tell you if execution completed with/without errors. You will rarely see these in the wild.

Typically, the stdin is where the input data comes in.
The stdout is the actual output of the command. In some cases this gets printed to the screen, but most often this is the information that needs to be saved in an output file.
The stderr is the datastream where errors and warnings go. This gets printed to our terminal to let us know when something is not going according to plan (Figure 6.7)!

Figure 6.7: This program (in the form of a bash script executed in the terminal) takes no input, and prints one line to the stdout and one line to the stderr.
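
We can see the two output streams and the exit code without writing a script of our own. The sketch below assumes that the path /this/path/does/not/exist really does not exist on our machine; $? is a special bash variable that holds the exit code of the most recent command.

```bash
# 'ls' is asked to list one directory that exists (/) and one that we assume
# does not exist. The listing of / is sent to stdout, while the complaint
# about the missing directory is sent to stderr. Both show up on the screen.
ls / /this/path/does/not/exist
failed_status=$?     # $? holds the exit code of the most recent command
echo "Exit code of the failing ls: $failed_status"

# By convention an exit code of 0 means success; anything else means trouble.
ls /
ok_status=$?
echo "Exit code of the successful ls: $ok_status"
```

On the screen the two streams look identical, which is why it is easy to forget that they are separate.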

6.8.2 Piping

A “pipe” (|) is a really useful feature of bash that lets us chain together multiple commands! When two commands are chained together with a pipe, the stdout of the first command becomes the stdin of the second (Figure 6.8)! The stderr is still printed on our screen, so we can always know when things fail.

Figure 6.8: Diagram showing stdin going into the program, and two output streams from the program: stderr and stdout, with the stdout becoming the stdin of a second program.

Example

head -n 10 Boosted-BBB-meta.tsv | tail -n 1
netsukeJapan    C       Artwork

The above command will only show the 10th line of Boosted-BBB-meta.tsv. The way it works is that head will take the first 10 lines of the file. These lines are then passed on to tail which will keep only the last of those lines.

Question

How can we combine tail and head to print the 10th line starting from the end of the file “Boosted-BBB-meta.tsv”?

You will use the following command employing a | (pipe):

tail -n 10 Boosted-BBB-meta.tsv | head -n 1
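
If the downloaded file is not at hand, the same logic can be checked with a small stand-in. In this sketch, seq 1 15 (not covered in this chapter) prints the numbers 1 to 15, one per line, so it behaves like a 15-line file where every line contains its own line number.

```bash
# 10th line counting from the start: head keeps lines 1-10, tail keeps the last of those.
seq 1 15 | head -n 10 | tail -n 1     # prints 10

# 10th line counting from the end: tail keeps lines 6-15, head keeps the first of those.
seq 1 15 | tail -n 10 | head -n 1     # prints 6

# Pipes can be chained as often as we like; here wc counts the lines that tail lets through.
seq 1 15 | tail -n 10 | wc -l
```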

6.8.3 Redirects

Much like streams of water in the real world, datastreams can be redirected.

This way we can save the stdout of a program (or even the stderr) into a file for later!

  • stdin can be redirected with <
    • An arrow pointing TO the program name!
  • stdout can be redirected with >
    • An arrow pointing AWAY from the program name!
  • stderr can be redirected with 2>
    • Because it is the second output stream (stream number 2; stdout is stream number 1).
Pro Tip

It is also possible to combine streams, but we won’t get into that here.

Example:

head -n 10 Boosted-BBB-meta.tsv | tail -n1 > line10.txt

This will create a new file called line10.txt within our working directory. Using cat on this file will BLOW YOUR MIND - (GONE WRONG)!

(Don’t forget to like and subscribe!)

cat line10.txt
netsukeJapan    C       Artwork
Figure 6.9: Here we can see an example of redirecting the output of the datastreams_demo.sh program from before. Redirecting the stdout with > only prints the stderr to the screen, and saves the stdout into output.txt. Additionally, we can redirect the stderr with 2> into runtime.log, and then nothing is printed onto the screen.
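
All three redirects can be tried out with commands we already know. This is a sketch under a few assumptions: the file names are made up, /this/path/does/not/exist is assumed not to exist, and mktemp and seq are helpers not covered in this chapter.

```bash
# Work in a throwaway directory so no real files are overwritten.
sandbox=$(mktemp -d)
cd "$sandbox"

# stdout is saved in listing.txt and stderr in errors.log,
# so nothing is printed to the screen.
ls / /this/path/does/not/exist > listing.txt 2> errors.log

cat listing.txt      # the listing of /
cat errors.log       # the complaint about the missing directory

# stdin can be redirected too: head reads numbers.txt through '<'
# instead of being given the file name as an argument.
seq 1 15 > numbers.txt
head -n 3 < numbers.txt

# Careful: '>' replaces the contents of an existing file without asking.
echo "only this line is left" > numbers.txt
remaining=$(cat numbers.txt)
echo "$remaining"

# Tidy up.
cd ~
rm -r "$sandbox"
```

The last step is worth remembering: redirecting into a file that already exists silently overwrites it.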

6.9 Help text

We don’t always have to google for documentation! Many programs come with in-built help text, or access to online manuals right from our terminal!

  • We can get a one sentence summary of what a tool does with whatis

    whatis cat
    cat(1)  - concatenate files and print on the standard output
  • While man gives us access to online manuals for each tool (exit with q)

    man cat

6.10 Variables

Variables are a central concept of all programming. In short, a variable is a named container whose contents we can expand at will or change.

We can assign variables (tell the computer what we want them to contain) with = and pull their contents with $.

The easiest way to see the contents of a variable is using echo!

echo "This is my home directory: $HOME"
This is my home directory: /home/ubuntu

$HOME is an environment variable. Environment variables are set the moment we open our terminal or log into a server. They ensure the system works as intended and should not be changed unless we are very sure of why.

Environment Variables

Environment variables in bash are typically named in all capital letters. It is a good idea to avoid using only capital letters for your variable names, so you avoid accidentally overwriting any environment variables.

But as mentioned, we can store in a variable anything we want, so let’s see a few examples:

First, let’s try to store a number.

GreekFood=4            #Here, 'GreekFood' is a number.
echo "Greek food is $GreekFood people who want to know what heaven tastes like."
Greek food is 4 people who want to know what heaven tastes like.
Note

The # is used to add comments to your code. Comments are annotations that you write in your code to explain what it is doing, but that the computer does not run. They are very useful for when your future self or another person looks at your code.

Now let’s store a word (“string”).

GreekFood=delicious   #We overwrite that number with a word (i.e. a 'string').
echo "Everyone says that Greek food is $GreekFood."
Everyone says that Greek food is delicious.

We can also store more than a single word (that is still a “string”).

GreekFood="Greek wine" #We can overwrite 'GreekFood' again, 
## but when there is a space in our string, we need quotations.
echo "The only thing better than Greek food is $GreekFood!"
The only thing better than Greek food is Greek wine!

Since variables can be reset to whatever we want, we can also store a number again.

GreekFood=7 #And, of course, we can overwrite with a number again too.
echo "I have been to Greece $GreekFood times already this year, for the food and wine!"
I have been to Greece 7 times already this year, for the food and wine!
Overwriting variables

In these examples we have seen how the same variable has been overwritten. This means that we can only access the last content that we stored in the variable. All the previous contents that a variable may have had are inaccessible as soon as the same variable is given a new value.
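
Two practical consequences are sketched below, using made-up variable names: bash does not allow spaces around the = sign, and a value we still need has to be copied into a second variable before the first one is overwritten.

```bash
# Correct: no spaces around '='.
food_count=4
echo "Value: $food_count"

# Incorrect (left commented out so this example still runs): with spaces,
# bash would treat 'food_count' as a command name and '=' and '4' as its arguments.
# food_count = 4

# Keep the old value by copying it into another variable before overwriting.
previous_count=$food_count
food_count=7
echo "Now: $food_count, before: $previous_count"
```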

6.11 Quotes matter!

In bash, there is a big difference between a single quote ' and a double quote "!

  • The contents of single quotes are passed on as they are
  • Inside double quotes, contents are interpreted!

In some cases the difference doesn’t matter.

echo "I like Greek Food"
echo 'I like Greek Food'
I like Greek Food
I like Greek Food

In other cases it makes all the difference!

Arr="Banana"
echo 'Pirates say $Arr'
echo "Minions say $Arr"
Pirates say $Arr
Minions say Banana

Why does it make a difference in the second example?

This is because in the second example we are using a variable. We have assigned Banana to the variable $Arr. As mentioned above, when single (') quotes are used the computer just prints what it receives without caring that $Arr is a variable.

In the echo with the double (") quotes we are telling the computer to extract the value from the variable $Arr, and that is why we see the stored value (Banana) in the printed output in the terminal.

Question

What is the correct notation to assign a value to a variable in bash?

  1. $arr=2
  2. arr$=2
  3. arr=2

Option 3 is the correct one. To assign a variable we need to put the chosen name followed by = and finally the value (number, string) that we want to be stored. When we want to call the variable we need to add $ in front of the variable name.

6.12 Find

We can also ask our computer where we have put our files, in case we forgot. To do this we can use find! The find command has the following syntax.

## ⚠ Don't run! Fake example ⚠
find /<PATH>/<TO>/ -type f -name 'your_file.txt'
  • First part of the find command: the place to look from
    • E.g. . to indicate ‘here’
    • Could also use ~/
    • Could use absolute path e.g. /home/james/
  • Second part of the find command: what type of things to look for?
    • Use -type to define the filetype:
      • f for a file
      • d for a directory
  • Third part of the find command: what to look in?
    • Use -name to say ‘look in names of things’
  • Finally, after -name we give the ‘strings’ to search for
    • Use wildcards (*) for maximum laziness!

Now let’s put into practice what we have learnt about find.

For that we will download a messy folder from a collaborator. Remember to check that we are in the BareBonesBash folder!

wget git.io/Boosted-BBB-images -O Boosted-BBB.zip

We realise that this is a compressed file, and more precisely is a zip file (extension .zip). In order to access its content we will need to “unzip” it first. For that we can use the command unzip.

unzip Boosted-BBB.zip

We know that our collaborator has shared with us some pictures of animals that we need to use for our research, and according to our collaborator they are marked with JPG. We first try to check the contents of the directory to find them quickly.

ls Boosted-BBB
And        Digging       Friday     Leave      Only     Where    Young
Anybody    Everything    Getting    Looking    Ooh      With     Youre
Dancing    Feel          Having     Night      Watch    You

Wow, what a mess! How would we retrieve all the files? Thanks to our wonderful teachers we have learnt how to use find and can simply run the following.

find Boosted-BBB -type f -name '*JPG*' 
Boosted-BBB/Having/the/time/of/your/life/bubobubo.JPG.MP3.TXT
Boosted-BBB/With/a/bit/of/rock/music/exhibitRoyal.JPG.MP3.TXT
Boosted-BBB/Friday/night/and/the/lights/are/low/fanta.JPG.MP3.TXT
Boosted-BBB/Everything/is/fine/nomnom.JPG.MP3.TXT
Boosted-BBB/Getting/in/the/swing/giacomo.JPG.MP3.TXT
Boosted-BBB/Youre/in/the/mood/for/a/dance/snore.JPG.MP3.TXT
Boosted-BBB/Digging/the/dancing/queen/excited.JPG.MP3.TXT
Boosted-BBB/Anybody/could/be/that/guy/alopochenaegyptiacaArnhem.JPG.MP3.TXT
Boosted-BBB/And/when/you/get/the/chance/stretch.JPG.MP3.TXT
Boosted-BBB/Looking/out/for/angry.JPG.MP3.TXT
Boosted-BBB/Feel/the/beat/from/the/tambourine/oh/yeah/netsukeJapan.JPG.MP3.TXT
Boosted-BBB/Watch/that/scene/licorne.JPG.MP3.TXT
Boosted-BBB/You/can/weimanarer.JPG.MP3.TXT
Boosted-BBB/Night/is/young/and/the/musics/high/bydgoszczForest.JPG.MP3.TXT
Boosted-BBB/Ooh/see/that/girl/pompeii.JPG.MP3.TXT

After -name we have written '*JPG*'. This tells find to search for any file that contains JPG in any part of its name, as indicated by the * on either side. The * characters are known as wildcards. To learn more on how to use them, please refer to the more complete material for this tutorial https://barebonesbash.github.io/.

Now we have all the paths of the files that we will need!

Question

How should you write a find command to search for all files ending with .TXT in the folder Boosted-BBB?

The correct find command will be the following:

find Boosted-BBB -type f -name '*.TXT'

As mentioned, the first part is the folder Boosted-BBB, followed by the type, which is file, and finally the name of what we are looking for, in our case all the files ending with .TXT.
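
So far we have only searched for files. The sketch below shows how the -type option changes what find reports, using a small made-up directory tree; mktemp and touch (which creates empty files) are helpers not covered in this chapter.

```bash
# Build a tiny made-up directory tree to search in.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/photos/JPG_backup" "$sandbox/notes"
touch "$sandbox/photos/owl.JPG" "$sandbox/photos/JPG_backup/goose.JPG" "$sandbox/notes/todo.TXT"

# Only files whose names end in .JPG (the directory JPG_backup is skipped).
find "$sandbox/photos" -type f -name '*.JPG'

# Only directories whose names contain JPG.
find "$sandbox/photos" -type d -name '*JPG*'

# Without -type, both files and directories are reported.
find "$sandbox/photos" -name '*JPG*'

# Pipes work here too: count how many matches each search gives.
n_files=$(find "$sandbox/photos" -type f -name '*.JPG' | wc -l)
n_dirs=$(find "$sandbox/photos" -type d -name '*JPG*' | wc -l)
n_both=$(find "$sandbox/photos" -name '*JPG*' | wc -l)
echo "files: $n_files, directories: $n_dirs, both: $n_both"

# Tidy up.
rm -r "$sandbox"
```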

6.13 For loops

Until now we have seen how to run single commands on a file. But, what about when we need to repeat a command multiple times on a list of things, for example a list of files?

To repeat an action (command) for a set of things (list, e.g. files) one needs to employ the concept of a loop. One of the most commonly used loops is the for loop.

A for loop allows us to go through a list of things and perform some actions. Let’s see an example.

Variable=Yes
for i in Greece Spain Britain; do
  echo "Does $i have lovely food? $Variable"
done
Does Greece have lovely food? Yes
Does Spain have lovely food? Yes
Does Britain have lovely food? Yes

The for loop went through the list Greece Spain Britain and printed a statement with each item in the list. What happens if we change the order of the list to Britain Greece Spain?

Variable=Yes
for i in Britain Greece Spain; do
  echo "Does $i have lovely food? $Variable"
done
Does Britain have lovely food? Yes
Does Greece have lovely food? Yes
Does Spain have lovely food? Yes

We see that changing the order of the list changes the order of the output. This is because the for loop goes through the list sequentially.

We can also add more elements to the list, and the for loop will continue until it reaches the end of the list.

Question

How many times will the following for loop run the command echo?

for i in Greece Spain Britain Italy Slovenia; do
  echo "I would love to go to $i"
done

The for loop will run the echo command 5 times, once for every element in the list Greece Spain Britain Italy Slovenia.
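
The lists above were typed out by hand, but a for loop can just as well go through a list of files. The following sketch uses made-up file names in a throwaway directory and relies on the wildcard *, which bash expands into the list of matching file names before the loop starts; mktemp is a helper not covered in this chapter.

```bash
# Work in a throwaway directory (all names here are made up for illustration).
sandbox=$(mktemp -d)
cd "$sandbox"

# First loop: create one small text file per animal.
for animal in owl goose dog; do
  echo "This file is about the $animal" > "$animal.txt"
done

# Second loop: *.txt is replaced by the list of matching files,
# and the commands between 'do' and 'done' run once for each of them.
for file in *.txt; do
  echo "Now reading $file:"
  cat "$file"
done

n_txt=$(ls | wc -l)
echo "Number of files created: $n_txt"

# Tidy up.
cd ~
rm -r "$sandbox"
```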

6.14 How to Google like a pro

One of the most important skills we develop when coding and/or using the command line is how to phrase our questions so we can get relevant answers out of our search engine.

As Deep Thought put it in the Hitchhiker’s Guide to the Galaxy:

Only when you know the question will you know what the answer means.

Here are some quick tips to get you started:

  • ALWAYS include the name of the language in your query
  • BROADEN your question!
    • BAD: “How to set X to 4 in bash?”
    • GOOD: “How to set a variable to an integer in bash?”
  • When you are more familiar, use fancy programmer lingo to make Google think you know what you are talking about
All the cool hackers say:
  • string and not text
  • float and not decimal

Note: some of these terms can be language specific.

Figure 6.10: How to cat the wrong way (i.e., don’t include the language in our Google search and get loads of cat pictures)
Figure 6.11: How to cat the BASH way (i.e., include the language in our Google search and we get lots of terminal pictures! Much better, right? …right?)

6.15 (Optional) clean-up

It is extremely important to ALWAYS keep our directories clean from random clutter. This lowers the chances we will get lost in our directories, but also ensures we can stay lazy, since tab completion will not keep finding similarly named files. So let’s clean up our working directory by removing all the clutter we downloaded and worked with today. The command below will remove the /<PATH>/<TO>/BareBonesBash directory as well as all of its contents.

Pro Tip

Always be VERY careful when using rm -r. Check 3x that the path we are specifying is exactly what we want to delete and nothing more before pressing ENTER!

cd ~     ## We shouldn't delete a directory while we are still in it. (It is possible though).
rm -r /<PATH>/<TO>/BareBonesBash*

We can also get out of the conda environment with

conda deactivate

To delete the conda environment, run

conda remove --name bare-bones-bash --all -y

6.16 Summary

You should now know the basics of working on the command line, like:

  • What a command prompt is
  • How to navigate around the filesystem via the command line
  • How to view the contents of a file
  • How to remove files and directories
  • What datastreams are, and how they can be redirected
  • How to chain commands together
  • What variables are, how to assign them and how to unpack them
  • How to construct a simple for loop
  • How to google more efficiently

If you would like to know more about the magic of bash, you can find more commands as well as more advanced bash concepts on the BareBonesBash website (https://barebonesbash.github.io/).