For biology students, I think learning a programming language is not a waste of time. However, there are many different languages [Wiki], and you have to choose one that strikes a good balance between criteria such as ease of learning, wealth of tutorials and examples, a large user community, etc. Nowadays, IMHO, two programming languages fulfill these criteria:
- Python
- JavaScript and all the web technologies (HTML5 and CSS3)
Note: If you need more sophisticated statistical functions, it is worth looking at the "R" language. It is more complex but more powerful in this field.
Variables are "boxes" containing one value. This value may be a:
- Number (integer, floating-point number, etc.)
- Boolean (True or False)
- String (aka Text)
- List
- Dictionary
- Object
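As a minimal sketch of the value types listed above (all variable names, and the small Gene class, are made up for illustration), the built-in type() function tells you which kind of value a variable currently holds:

```python
# One example of each value type (names are illustrative only)
codon_count = 5                                         # Number (integer)
gc_content = 0.53                                       # Number (floating-point)
is_coding = True                                        # Boolean
gene_name = 'HBB'                                       # String
bases = ['a', 'c', 'g', 't']                            # List
complement = {'a': 't', 'c': 'g', 'g': 'c', 't': 'a'}   # Dictionary


class Gene:
    """Hypothetical minimal class, only here to create an object."""

    def __init__(self, name):
        self.name = name


gene = Gene(gene_name)                                  # Object

# Display the type name of each value:
# int, float, bool, str, list, dict, Gene
for value in [codon_count, gc_content, is_coding, gene_name,
              bases, complement, gene]:
    print(type(value).__name__)
```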
Variable names are case-sensitive, cannot begin with a digit, and cannot be a keyword reserved by Python (like for).
myVar = 3
myVar = myVar + 1 # ← 4
0test = 5 # ← ERROR

if condition:
    instruction_1
    instruction_2
    instruction_3
elif other_condition:
    instruction_a
    instruction_b
else:
    instruction_01
    instruction_02
    instruction_03
    instruction_04

while exit_condition:
    instruction_1
    instruction_2
    instruction_3

For example, to display each character of the string 'python':
lang = 'python'
i = 0
while i < len(lang):
    print(lang[i])
    i = i + 1

And the result is...
p
y
t
h
o
n

The for loop requires a list and applies the block of instructions to each element, with the following syntax:
for obj in a_list:
    instruction_1
    instruction_2
    instruction_3

Here is a basic example...
lang = ['p','y','t','h','o','n']
for char in lang:
    print(char)

This type of loop is very often used with the range(...) function. The very convenient range(start, end, step) function creates a series of numbers running from start up to, but not including, end (a list in Python 2, a lazy range object in Python 3).
mySeries = range(0,10,2) # ← [0,2,4,6,8]
for i in mySeries:
    print(i)
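To make the three arguments of range(...) more concrete, here is a short sketch. It assumes Python 3, where range produces its numbers lazily, so list(...) is used to display the whole series at once; the variable names are only illustrative.

```python
# Only an end value: starts at 0, step of 1
first_five = list(range(5))
print(first_five)        # [0, 1, 2, 3, 4]

# Start and end: the end value itself is never included
two_to_five = list(range(2, 6))
print(two_to_five)       # [2, 3, 4, 5]

# Start, end and step
even_numbers = list(range(0, 10, 2))
print(even_numbers)      # [0, 2, 4, 6, 8]

# A negative step counts downwards
countdown = list(range(10, 0, -3))
print(countdown)         # [10, 7, 4, 1]
```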
movie = 'starwarsreturnofthejedi.'
for i in range(1, 11):
    poweroftwo = i ** 2 # square of i
    print(poweroftwo) # 1, 4, 9, 16, 25, ..., 100
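The if/elif/else and for syntax shown earlier can be combined. Here is a minimal sketch, with a made-up sequence and illustrative variable names, that labels each base of a short nucleic sequence:

```python
seq = 'acgtn'   # made-up sequence; 'n' stands for an unidentified base
labels = []

for base in seq:
    if base in 'ag':
        label = 'purine'
    elif base in 'ct':
        label = 'pyrimidine'
    else:
        # Anything that is not a, c, g or t ends up here
        label = 'unknown'
    labels.append(label)
    print(base, label)
```

Only one branch is executed for each base: Python tests the conditions from top to bottom and stops at the first one that is true.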
def function_name(arg1, arg2, ..., argn):
    instruction_1
    instruction_2
    instruction_3
    return result

For example, if you want to get the codons of a nucleic sequence...
def getCodons(seq):
    """ Extract codons from a nucleic sequence """
    codons_list = []
    for i in range(0,len(seq),3):
        codon = seq[i:i+3]
        codons_list.append(codon)
    return codons_list

# Test
mySeq='actgctgtcgaaccg'
myCodons = getCodons(mySeq) # ← ['act', 'gct', 'gtc', 'gaa', 'ccg']

Python Fiddle is a good starting environment to write your first scripts.
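As a possible first script to try, here is a minimal sketch that counts how often each codon occurs, using a dictionary. The countCodons helper and the test sequence are made up for this example; getCodons is repeated from above so that the script runs on its own.

```python
def getCodons(seq):
    """ Extract codons from a nucleic sequence (repeated from above) """
    codons_list = []
    for i in range(0, len(seq), 3):
        codons_list.append(seq[i:i+3])
    return codons_list


def countCodons(seq):
    """ Hypothetical helper: count the occurrences of each codon """
    counts = {}
    for codon in getCodons(seq):
        # get(codon, 0) returns 0 the first time a codon is seen
        counts[codon] = counts.get(codon, 0) + 1
    return counts


counts = countCodons('atggctgctatgtaa')
print(counts['atg'])   # 2
print(counts['gct'])   # 2
print(counts['taa'])   # 1
```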
When working in Python Fiddle, it is not possible AFAIK to import BioPython. Moreover, BioPython is a little bit too complex for biology students with no programming skills.
Instead, here is a simple Seq class storing information (title and sequence data) from a FASTA sequence.
class Seq:
    def __init__(self, seqdata):
        _tmp = seqdata.split('\n')
        # The first line is the description; drop the leading '>' if present
        self.description = _tmp[0][1:] if _tmp[0][0] == '>' else _tmp[0]
        self.data = ''.join(_tmp[1:]).strip()
        self.length = len(self.data)
        # Read title
        self.author1 = 'None'
        self.author2 = 'None'
        self.copy = 0
        self.db = 'None'
        self.id = 'None'
        self.db2 = 'None'
        self.acc = 'None'
        self.title = 'None'
        # Try to read information within the description
        sep = '|'
        _tmp = self.description.split(sep)
        self.db = _tmp[0]
        if self.db == 'xzy':
            # CrazyBio header: xzy|first author|copynumber|second author
            self.author1 = _tmp[1]
            self.author2 = _tmp[3]
            self.copy = int(_tmp[2])
        elif self.db == 'gi':
            # gi|gi number|gb|accession number|locus
            self.id = _tmp[1]
            self.db2 = _tmp[2]
            self.acc = _tmp[3]
            self.title = _tmp[4]
        elif self.db == 'sp':
            # sp|accession number|name
            self.db = _tmp[0]
            self.acc = _tmp[1]
            self.title = _tmp[2]

    def show(self):
        attrs = vars(self)
        return ', '.join("%s: %s" % item for item in attrs.items())

    def fasta(self):
        return '>{:s}\n{:s}'.format(self.description,self.data)

Usage
fasta = """>sp|P68871|HBB_HUMAN Hemoglobin subunit beta OS=Homo sapiens GN=HBB PE=1 SV=2
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPK
VKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFG
KEFTPPVQAAYQKVVAGVANALAHKYH"""
seq = Seq(fasta)
print(seq.acc) # ← P68871

EMBOSS (European Molecular Biology Open Software Suite) is a European package containing various bioinformatics programs, available as web-based or as local tools. The list of all the tools is available here and grouped by categories there, and I created my own version combining both there.