# Lecture 6: Molecular evolution

Three features of evolutionary processes: descent, variation, and selection.

Basic principles of sequence evolution: new genes are generated from old genes through point mutations and insertions/deletions.

Duplications, crossing over, horizontal gene transfer.

Not all variation is adaptive (suggested reading: "The Spandrels of San Marco and the Panglossian Paradigm: A Critique of the Adaptationist Programme" by Stephen Jay Gould and Richard C. Lewontin).

## Coding: loops and lists

### Importing functions from other files

A useful topic to cover briefly here is how to import functions, objects, etc. from other Python files. To do that, use the `import` statement. Today we'll use it to import from the `random` module and the `__future__` module, and then model the idea of genetic drift.

Start a python terminal and enter the following line:

```python
from __future__ import division
```

This performs floating point division when starting with integers, so `1/2` will equal `0.5` (a floating point value) rather than `0` (an integer value).

Warning

It's a good idea to always add `from __future__ import division` to the top of your Python 2 scripts. Python 3 uses true division by default, but in Python 2 omitting this import statement can lead to bugs. Note that Python 2's default integer division rounds the result down rather than to the nearest integer, so, for example, `11/4 == 2`, not `3` as you would expect if Python were rounding.
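To make the difference concrete, here is a small sketch comparing true division, floor division (the `//` operator), and rounding. The variable names are only illustrative. The `__future__` import has no effect on Python 3, so the snippet should behave the same way on either version.

```python
from __future__ import division, print_function

true_quotient = 11 / 4       # true division keeps the fractional part
floor_quotient = 11 // 4     # floor division always rounds down
negative_floor = -11 // 4    # rounding down means toward negative infinity
rounded = round(11 / 4)      # rounding goes to the nearest integer instead

print(true_quotient)    # 2.75
print(floor_quotient)   # 2
print(negative_floor)   # -3
print(rounded == 3)     # True
```

The `//` operator is the explicit way to ask for the old integer behavior when you actually want it.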

### Genetic drift programming example

This example illustrates the concept of genetic drift in a population of genotypes. Imagine we start with four genotypes, `A`, `B`, `C`, and `D` and 10,000 individuals. First, let’s define a starting population with the following genotype frequencies:

```python
genotype_frequencies = ['A'] * 5000 + ['B'] * 2500 + ['C'] * 1250 + ['D'] * 1250
```

This syntax may be new to you. Break this apart to individual addends to figure out what the full statement is doing.
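If you want to check your reasoning, the sketch below applies the same two list operations to a much smaller, made-up population: `*` repeats a list and `+` concatenates two lists. The names `a_individuals`, `b_individuals`, and `small_population` are only for illustration.

```python
from __future__ import print_function

a_individuals = ['A'] * 3    # repeat a one-element list three times
b_individuals = ['B'] * 2    # repeat a one-element list twice
small_population = a_individuals + b_individuals  # join the two lists end to end

print(a_individuals)          # ['A', 'A', 'A']
print(small_population)       # ['A', 'A', 'A', 'B', 'B']
print(len(small_population))  # 5
```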

Next, let’s define a function that will conveniently allow us to summarize this population:

```python
def summarize_composition(population):
    # Print the fraction of the population made up by each genotype.
    # Parenthesized print works in both Python 2 and Python 3.
    population_size = len(population)
    print('A: %0.4f' % (population.count('A') / population_size))
    print('B: %0.4f' % (population.count('B') / population_size))
    print('C: %0.4f' % (population.count('C') / population_size))
    print('D: %0.4f' % (population.count('D') / population_size))
```

Now run this function on the population that was just created:

```python
summarize_composition(genotype_frequencies)
```

You should get the following result – if not, you did something wrong so go back and figure out what it was.

```
A: 0.5000
B: 0.2500
C: 0.1250
D: 0.1250
```

Next we’re going to import the `sample` function from the `random` module. Given a list (`population`) and a number of elements (`k`) to select, `sample` randomly samples (without replacement) `k` elements from the list and returns those as a new list. So, if we sample the full population and summarize the genotype composition, we should get the same result – let’s test it out:

```python
from random import sample
new_genotype_frequencies = sample(genotype_frequencies, 10000)
summarize_composition(new_genotype_frequencies)
```
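The following sketch shows what "without replacement" means in practice, using a small made-up list (`toy_population` is an illustrative name). Sampling every element just reshuffles the list, and asking for more elements than the list contains raises a `ValueError`.

```python
from __future__ import print_function
from random import sample

toy_population = ['A'] * 4 + ['B'] * 2

# Sampling every element returns a reshuffled copy, so the counts are unchanged.
reshuffled = sample(toy_population, len(toy_population))
print(sorted(reshuffled) == sorted(toy_population))  # True

# Each individual can be drawn only once, so oversampling is an error.
try:
    sample(toy_population, len(toy_population) + 1)
    oversampling_failed = False
except ValueError:
    oversampling_failed = True
print(oversampling_failed)  # True
```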

Now, let's simulate genetic drift. Imagine we have a population of organisms with the genotype frequencies represented in our `genotype_frequencies` list. Regardless of which of these genotypes confers the greatest selective advantage, randomly removing a large part of the population can change the resulting genotypic composition.

Simulate an event that randomly kills off 10% of the total population, and look at the resulting genotype composition:

```python
new_genotype_frequencies = sample(genotype_frequencies, 9000)
summarize_composition(new_genotype_frequencies)
```

Do this a few times. You should notice that the frequencies don't change a lot. What happens if, instead of this relatively small die-off, there is a near-extinction event? Simulate an event that randomly kills off 99.9% of the population. What happens now? Run this simulation several times and explain the results of this experiment.
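One possible way to set up the repeated experiment is sketched below. It is not the only approach: the helper `bottleneck_frequencies`, the use of `collections.Counter`, and the fixed seed are all assumptions made for illustration. Killing 99.9% of 10,000 individuals leaves 10 survivors, so each trial samples 10 individuals.

```python
from __future__ import division, print_function
import random
from collections import Counter

def bottleneck_frequencies(population, survivors, rng=random):
    """Return the genotype frequencies among randomly chosen survivors."""
    counts = Counter(rng.sample(population, survivors))
    genotypes = sorted(set(population))
    # Genotypes that did not survive get a frequency of 0.
    return {genotype: counts[genotype] / survivors for genotype in genotypes}

population = ['A'] * 5000 + ['B'] * 2500 + ['C'] * 1250 + ['D'] * 1250

rng = random.Random(0)  # fixed seed so reruns give the same trials (an assumption)
trials = [bottleneck_frequencies(population, 10, rng) for _ in range(5)]
for trial in trials:
    print(trial)
```

Compare the spread of frequencies across the trials with what you saw when 9,000 individuals survived, and note whether any genotype disappears entirely in a given trial.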

### Working with loops: the for loop

In the next set of assignments you'll be introduced to loops. These allow you to perform an operation many times, using a slightly different input each time. To continue our sequence processing script, let's add a feature that makes use of a `for` loop: allowing a user to pass several comma-separated sequences on the command line.

```python
from sys import argv

# expects one argument: a comma-separated list of sequences
script_name, sequences = argv

sequences = sequences.split(',')

def reverse_complement(sequence):
    # replace with lowercase letters so later replacements don't undo earlier ones
    return sequence.replace('A','t').replace('T','a').replace('G','c').replace('C','g').upper()[::-1]

# iterate over the sequences and print the reverse complement
# of each to the command line
for sequence in sequences:
    print(reverse_complement(sequence))
```
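To try the loop without the command line, the sketch below replaces `argv` with a hard-coded string. The name `sequences_argument` and the two example sequences are made up for illustration, and the results are collected in a list so they can be inspected after the loop finishes.

```python
from __future__ import print_function

def reverse_complement(sequence):
    # replace with lowercase letters so later replacements don't undo earlier ones
    return sequence.replace('A','t').replace('T','a').replace('G','c').replace('C','g').upper()[::-1]

# hypothetical stand-in for the comma-separated command-line argument
sequences_argument = 'ATGC,AAGT'

results = []
for sequence in sequences_argument.split(','):
    # the loop body runs once per sequence, with `sequence` bound to each in turn
    results.append(reverse_complement(sequence))

print(results)  # ['GCAT', 'ACTT']
```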