-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathintro_to_command_line_slides.qmd
More file actions
328 lines (180 loc) · 9.37 KB
/
intro_to_command_line_slides.qmd
File metadata and controls
328 lines (180 loc) · 9.37 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
---
title: "Command Line Workshop"
format: revealjs
smaller: true
scrollable: true
execute:
echo: true
output-location: fragment
---
## Welcome!
.png)
## Introductions
. . .
- Who am I?
. . .
- What is [DaSL](https://hutchdatascience.org/)?
. . .
- Who are you?
- Name, pronouns, group you work in
- What you want to get out of the workshop
## Goals of the workshop
. . .
- Understand the purposes of Command Line Interface vs. Graphical User Interface
. . .
- Be able to navigate a directory tree
. . .
- Be able to create, delete, copy, and move files
. . .
- Analyze the components of a shell command: what are the possible inputs, options, and outputs, and where to find documentation for help.
. . .
- Apply what we learned to a demo: bioinformatics tool
## Ways we can interact with a computer:
. . .
- Graphical User Interface (GUI): interaction between a pointer with windows and menus
. . .
- Command Line Interface (CLI): text-based interaction
. . .
When is one more effective than the other?
::: notes
Let's think about the ways we can interact with a computer: keyboard and mouse, hand gestures on a smartphone, voice commands, AR/VR, etc. Most of these interactions are related to a Graphical User Interface (GUI), which centers on the interaction between a pointer and colorful windows and menus.
However, the way a computer interprets and executes instructions are based on text commands. Even graphical information, such as where the mouse is when it clicked a button, is converted into numbers and characters. That means to be an effective programmer and data scientist, we also need to learn how to interact with our computers in a text-based way. This text-based interaction is called the Command Line Interface (CLI).
:::
## Common command line usage examples
. . .
- Scalable manipulation of text, files, and folders
. . .
- Use of programming languages and scientific software tools
. . .
- Use to high performance computing systems and the cloud
::: notes
- Scalable manipulation of text, files, and folders: if we want to move all files that have the words "tax returns" to a new folder, it would probably not scale easily via a mouse, but it could be done in one command in the Command Line.
- Use of programming languages and scientific software tools often require Command Line knowledge: running Python and R programs, using Git, alignment and variant calling bioinformatics software. Although there are nice graphical user interfaces such as RStudio, Juypter Notebooks, and Galaxy, to have full flexibility of these languages you need to control them from the Command Line.
- Use to high performance computing systems and the cloud all require Command Line knowledge as they do not typically offer GUIs.
:::
## Getting started
We will be using Posit Cloud, a online coding interface for R, but it has a built-in Command Line.
- Go to the project: <https://posit.cloud/spaces/622776/join?access_code=602ukbDJCZ0h366T6ktC3IDvbiWtZdmxKeSERdYq>
- Create an account or sign-in.
- Open "[Intro_to_Command_Line_practice](https://posit.cloud/spaces/622776/content/9888837)"
- Click on the "Terminal" tab.
## Interacting with the command line, a perspective
. . .
- Unlike a GUI, the CLI does not provide immediate options to you to interact with.
. . .
- Therefore, we need to keep a mental model of a task we want to complete.
. . .
- It is (mostly) forgiving, and encourages exploration and experimentation.
::: notes
Unlike a GUI, the CLI does not provide immediate options to you to interact with. We have to know a learn a handful of vocabulary to interact with it well. But besides the vocabulary, we need to keep a mental model of a task we want to complete. In GUIs that that mental model is shown to us visually, such as a file browser.
We organizes our seminar by constructing several mental models and learning relevant commands related to each model.
Lastly, the CLI is forgiving. It will tell you if you did something you did not intend to, with little consequences, which encourages exploration and experimentation. Consequential actions have security safeguards. With this mindset, we will explore the CLI openly.
:::
## Mental Model 1: Navigating a directory tree
. . .
On our computer, the **directory tree** organizes files and directories in an (upside down) tree-like structure. In each folder, there is a parental directory, and there can be files and directories within it.

## Basic directory tree navigation
. . .

. . .
- `pwd` prints out our current directory.
. . .
- `cd /` changes directory to the root directory.
. . .
- `cd /cloud/project` is where we started.
## Absolute vs. relative paths
{alt="Posit Cloud directory tree"}
Suppose you want to access the folder `input_files`.
. . .
- The **absolute directory path** specifies the directory from the root directory: `/cloud/project/input_files`
. . .
- The **relative directory path** is a path *relative to our current directory*: if we are currently at `/cloud/project`, the relative path is `input_files` .
. . .
- The symbol `..` specifies the parent directory. If we are currently at `/cloud/project/software`, the relative path is `../input_files`.
. . .
- `ls` lists all the files in the current directory.
## Exercise: explore the `project` folder
Use `cd` and `ls` to explore `/cloud/project`
. . .
Commands to look at text files:
- `cat [filename]` prints out the entire text file.
- `less [filename]` opens up the text file in a separate screen to explore. Press `q` to quit.
- `head [filename]` prints out the first few lines of the text file.
- `tail [filename]` prints out the last few lines of the text file.
. . .
Hit `Ctrl-C` to stop a running program.
## Mental Model 2: Treat commands as functions
A **function** is a tool that you can use over and over again:
- It can take in inputs
- Does something
- and can give an output.
. . .
The commands you have been using, `pwd`, `cd`, `ls`, and `cat` can be viewed as functions.
- It can take in **input arguments**, **options**
- Does something
- and can give an output
## `ls` example
. . .
A command's usage can have any of the following combination:
- Input argument
- `ls /cloud/project`
. . .
- Options
- `ls -l`
. . .
- Options with input argument
- `ls -l /cloud/project`
. . .
- Multiple input arguments
- `sh my_script.sh –i my_input.txt -o my_output.txt`
## What are the possible options and arguments for a command?
. . .
There is often `--help` or `-h` option that tells you how to use the software.
```
ls --help
```
. . .
Online resources: <https://explainshell.com/>
## Exercise: options for `ls`
In the `project` folder, try out a bunch of ways to list files and directories using various options of `ls`. Some questions to explore:
- Can you display all files? This would include files that start with `.`, which are considered "hidden".
- Can you sort by time (last modified)?
- Can you show the long format? What does it show?
- Can you print out the entire project directory tree by using a recursive option?
- Can you show the size of each file?
- How can you use multiple options at once?
## File manipulation
. . .
Here are some commands that allows you to create, move, copy, and delete files and folders. All of these commands have no return value.
- `cp [from] [to]` copies a file or folder from the `[from]` path to a `[to]` folder.
- `mv [from] [to]` moves a file or folder from the `[from]` path to a `[to]` folder.
- `mkdir [folderPath]` creates a new folder at the path specified by `[folderPath]`.
- `rm [path]` deletes a file at `[path]`. `rm -r [folder]` deletes a folder and its subcontents. Cannot be undone! 😱
## Demo
. . .
- `*` wildcard
## Running software in CLI
Putting it all together, let's run a (fake) software that aligns sequencing reads to the reference genome.
. . .
```
~/CommandLineDaSL/project$ cd software/
~/.../project/software$ python aligner.py --help
usage: aligner.py [-h] --reference REFERENCE --input INPUT
options:
-h, --help show this help message and exit
--reference REFERENCE, -r REFERENCE
Reference genome file
--input INPUT, -i INPUT
Input fastq file
```
You should use a fastq file from the `input_files` folder and the reference genome file from the `input_files/reference_genome` folder.
## Redirect and Pipes
The `python aligner.py` command seems pretty messy - it just dumps an aligned sequencing file in your command line console - what do you do with it?
. . .
- Redirect the output to a file: `python aligner.py --reference [reference genome fasta file] --input [unaligned sequences fastq file] > [output file]`
. . .
- Pipe the output into the next command: `python aligner.py --reference [reference genome fasta file] --input [unaligned sequences fastq file] | head`
## Accessing the Command Line on your computer
- Macs: `/Applications/Utilities/Terminal.app`
- Windows: Powershell or Windows Command Prompt gives a *different* flavor of the command line. To use the Unix command line, [install Windows Subsystem for Linux](https://learn.microsoft.com/en-us/windows/wsl/install).