This repository was archived by the owner on Mar 2, 2021. It is now read-only.

title Bash_extras
author Radhika Khetani
date 2017-07-06
duration 30

Overview


Setting up some aliases

On your local machine do the following:

$ cd

$ ls -l

$ ll

ll probably did not work for you, but it works on my computer. Why? I have set up an alias in my bash environment, using the alias command, so that bash runs ls -l whenever I type ll. Let's set it up for your environment.

$ alias ll='ls -l'

$ ll

This alias is only going to be available to you while that Terminal window is open. If you wanted to use that alias all the time, what would you do?

You would add it to ~/.bashrc or ~/.bash_profile!

Let's open either the ~/.bash_profile or the ~/.bashrc file on your laptop (not on O2), and add a few commands to it.

alias ll='ls -l'

alias o2='ssh <your_ecommons_ID>@o2.hms.harvard.edu'

Now, open a new Terminal window, and try these out! You will still need to enter your password. If you want to set up "ssh keys" so that you don't have to enter your password, you can find more information in the O2 documentation.
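If you would rather not open a new window, the startup file can also be reloaded in the current one. The sketch below appends the alias to a temporary stand-in file and then loads it. The name rc_file is a hypothetical placeholder, used here so that your real ~/.bashrc is left untouched.

```shell
# rc_file is a hypothetical stand-in for ~/.bashrc, so this sketch is safe to run
rc_file="$(mktemp)"

# Append the alias definition, as you would to your real startup file
echo "alias ll='ls -l'" >> "$rc_file"

# Load the file into the current shell ("source" does the same thing in bash)
. "$rc_file"

# Print the definition to confirm that the alias now exists
alias ll
```

With your real file, the equivalent would be to run source ~/.bashrc after editing it.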

You can now create an alias to run an interactive session on O2!

alias interactive='srun --pty -p interactive -t 0-12:00 --mem 1G /bin/bash'

# and/OR

alias interactive6='srun --pty -p interactive -t 0-12:00 -c 6 --mem 6G /bin/bash'

You can now try one of them out from the login node.

Similar to what we did above, you can put this (or a similar) command in the .bashrc or .bash_profile file on O2 so it is available the next time you log on.
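An alias cannot take arguments, so each combination of cores and memory needs its own alias. One possible alternative is a small shell function. The sketch below is an assumption, not part of the original lesson: the function name interactive_session and the DRY_RUN variable are hypothetical, and the srun options are the same ones used in the aliases above.

```shell
# Hypothetical helper: request an interactive session with a chosen
# number of cores (first argument) and amount of memory (second argument)
interactive_session() {
    local cores="${1:-1}"    # default to 1 core
    local mem="${2:-1G}"     # default to 1G of memory
    local cmd="srun --pty -p interactive -t 0-12:00 -c $cores --mem $mem /bin/bash"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only print the command, useful on machines without srun
        echo "$cmd"
    else
        # Run the command (this only works on the cluster)
        $cmd
    fi
}

# Print the command that would be run for 6 cores and 6G of memory
DRY_RUN=1 interactive_session 6 6G
```

On the login node you would call it without DRY_RUN, for example interactive_session 6 6G.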

Copying files to and from the cluster

scp

So far we have used FileZilla to copy files over from O2, but there are other ways to do so using the command line interface. Similar to the cp command for copying files locally, there is a command that allows you to securely copy files between computers. The command is called scp and allows files to be copied to, from, or between different hosts. It uses ssh for data transfer and provides the same authentication and the same level of security as ssh.

The first argument in the example below is the location on the remote server and the second argument is the destination on your local machine. You can also do this in the opposite direction by swapping the arguments.

$ scp username@transfer.rc.hms.harvard.edu:/path/to/file_on_O2 Path/to/directory/local_machine
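The opposite direction (from your local machine to O2) could look like the sketch below. The paths are placeholders in the same spirit as the example above, and the variable names are hypothetical. The command is only printed, not run, because the placeholder paths do not exist.

```shell
# Placeholders in the same style as the download example; replace before real use
remote_login="username@transfer.rc.hms.harvard.edu"
local_file="Path/to/file_on_local_machine"
remote_dir="/path/to/directory_on_O2"

# Local source first, remote destination second: this uploads instead of downloads
upload_cmd="scp $local_file $remote_login:$remote_dir"

# Print the command for inspection; run it yourself once the paths are real
echo "$upload_cmd"
```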

rsync

rsync is used to copy or synchronize data between directories. It has many advantages over cp, scp, etc. It works in a specific direction, i.e. from the first directory to the second directory, similar to cp.

Between directories on the same machine

#DO NOT RUN
$ rsync -av ~/large_dataset/. /n/groups/dir/groupdata/

Between different machines

When copying over large datasets to or from a remote machine, rsync works similarly to scp.

#DO NOT RUN
$ rsync -av -e ssh testfile <your_ecommons_ID>@transfer.o2.hms.harvard.edu:~/large_files/

Please do not use O2's login servers for heavy I/O jobs like rsync or sftp. When transferring large files to and from O2, use the transfer server transfer.o2.hms.harvard.edu.

Salient Features of rsync

  • If the command (or transfer) is interrupted, you can start it again and it will restart from where it was interrupted.
  • Once a folder has been synced between 2 locations, the next time you run rsync it will only update and not copy everything over again.
  • It runs a check to ensure that every file it is "syncing" over is the exact same in both locations. This check is run using a version of "checksum".
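The "only update" behaviour can be tried safely on your own machine. The sketch below is a minimal local demonstration, assuming rsync is installed. The directories are temporary and the file names are made up for illustration.

```shell
# Temporary source and destination directories (hypothetical example data)
src="$(mktemp -d)"
dest="$(mktemp -d)"
echo "sample A" > "$src/a.txt"
echo "sample B" > "$src/b.txt"

# First sync: both files are copied over
rsync -av "$src/." "$dest/"

# Add one new file to the source, then sync again
echo "sample C" > "$src/c.txt"

# Second sync: the file list should include c.txt but not a.txt or b.txt,
# because those two are already identical in size and timestamp
rsync -av "$src/." "$dest/"

# The destination now holds all three files
ls "$dest"
```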

You can run the checksum function yourself when transferring large datasets without rsync using one of the following commands (or similar): md5, md5sum.
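A manual check along these lines might look like the sketch below. It uses md5sum where available (typical on Linux) and falls back to md5 (typical on macOS). The helper name checksum and the temporary files are assumptions made for illustration.

```shell
# Pick whichever MD5 tool is installed; "checksum" is a hypothetical helper name
if command -v md5sum >/dev/null 2>&1; then
    checksum() { md5sum "$1" | awk '{print $1}'; }
else
    checksum() { md5 -q "$1"; }
fi

# Stand-ins for a file before and after a transfer
original="$(mktemp)"
copy="$(mktemp)"
echo "example data" > "$original"
cp "$original" "$copy"

# Compute the checksum on both sides and compare them
sum_original="$(checksum "$original")"
sum_copy="$(checksum "$copy")"

if [ "$sum_original" = "$sum_copy" ]; then
    echo "checksums match"
else
    echo "checksums differ: the copy is not identical"
fi
```

For a real transfer you would compute one checksum on each machine and compare the two strings.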

Symbolic Links or "sym links"

Symbolic links are like the shortcuts you may create on a Mac. Let's check out an example of a folder with lots of symlinks.

ls -l /n/app/bcbio/tools/bin/

Now, let's create a shortcut in our home directory that points to the folder with the scripts for the session IV homework.

$ cd

$ ln -s /n/groups/hbctraining/ngs-data-analysis-longcourse/sessionIV_hmwk/ rnaseq_homework_NGScourse

$ ls -l

We recommend that you create something like this for your raw data so it does not accidentally get corrupted or overwritten.
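To see how a symbolic link behaves without touching any real data, you can experiment in a temporary directory. The sketch below uses made-up directory and file names. It shows that the link points at the original folder and that removing the link leaves the data in place.

```shell
# Temporary working area with a hypothetical raw data folder
work="$(mktemp -d)"
mkdir "$work/raw_data"
echo "reads" > "$work/raw_data/sample.fastq"

# Create a symbolic link to the folder
ln -s "$work/raw_data" "$work/raw_link"

# The long listing shows the link with an arrow pointing at its target
ls -l "$work"

# readlink prints the target of the link
readlink "$work/raw_link"

# Files can be read through the link
cat "$work/raw_link/sample.fastq"

# Removing the link removes only the shortcut, not the data
rm "$work/raw_link"
ls "$work/raw_data"
```

Note that edits made to a file through the link still change the original file.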

Note: a “hard” link (just ln without the -s option) is very different. Always use “ln -s” unless you really know what you’re doing!

Additional topics

If you are interested in learning more about regular expressions (regex) and the tools awk and sed, you can find more information in the "extra_bash_tools" lesson.


This lesson has been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.