Wednesday, March 23, 2016

Inv 2: Hardy-Weinberg

Abstract
By building our own Hardy-Weinberg equilibrium model, the lab group tried to better understand how evolution occurs. We used a spreadsheet program such as Microsoft Excel to model, in an extremely simple way, how the alleles of a single gene are passed on. After establishing a working model and testing it a few times, we added more complex evolutionary factors, such as a heterozygous advantage, to better understand how evolution works in the real world. I made two hypotheses for my own Hardy-Weinberg model. The first was that the model will work best with large populations, because the larger the population, the less random variation can occur between two successive generations. The second was that, with a heterozygous advantage, the frequencies of a gene's two alleles will approach an even split (i.e. 50-50), because heterozygous individuals carry one copy of each allele and are the ones most likely to survive and pass on their genes.

Procedure
To create the Hardy-Weinberg model, I used Microsoft Excel, a spreadsheet program that can also perform basic mathematical operations, which made it well suited to modeling evolution. I began by entering the initial frequencies of p (the dominant allele) and q (the recessive allele). The exact starting values do not matter much, because the same formulas work for any starting frequency. However, p and q both had to be between 0 and 1 and had to add up to 1, so q = 1 - p and p = 1 - q. Then I began creating my zygotes by randomly assigning each allele an A (for the dominant allele) or a B (for the recessive allele), using the program's RAND and IF functions.

The RAND function produces a random number between 0 and 1 (at least 0 and less than 1) in the cell where it is entered. The IF function checks whether a condition is true and displays one of two outcomes accordingly. IF functions can be nested inside one another to allow for more than two outputs. In the first allele cell, I typed =IF(RAND()<=D$2, "A","B"). This may seem like computer gibberish, but it is simply the IF and RAND functions combined. It says: "if the random number generated is less than or equal to D2 (the frequency of allele A in this generation), then the output is an A; if the random number is greater than D2, then the output is a B." The dollar sign in D$2 keeps the formula pointing at row 2 when it is copied down the column. Through this, I was able to build a system that randomly generates zygotes and then adds them up with a simple SUM function, which adds together the values of different cells on the spreadsheet. Here is a sample of the spreadsheet:



With simple copy-and-paste operations, I was able to extend the spreadsheet as far as I liked, giving me an almost unlimited population size with which to test hypotheses.
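For readers without Excel, the allele-drawing formula can be sketched in Python. This is only an analogue of the spreadsheet, not the original workbook. The names `draw_allele` and `make_generation` are hypothetical, and Python's `random.random()` stands in for RAND(). Each zygote gets two independently drawn alleles, one per "allele column". The genotypes are then tallied the way the spreadsheet's SUM cells are.

```python
import random
from collections import Counter


def draw_allele(p_a: float, rng: random.Random) -> str:
    """Python analogue of =IF(RAND()<=D$2, "A", "B")."""
    # rng.random() returns a float in [0, 1), like RAND(); the result is
    # "A" with probability p_a and "B" otherwise.
    return "A" if rng.random() <= p_a else "B"


def make_generation(n: int, p_a: float, rng: random.Random) -> Counter:
    """Create n zygotes from two random alleles each and tally the genotypes."""
    genotypes = Counter({"AA": 0, "AB": 0, "BB": 0})
    for _ in range(n):
        # Sorting the two letters puts "BA" and "AB" into the same "AB" category.
        genotype = "".join(sorted(draw_allele(p_a, rng) + draw_allele(p_a, rng)))
        genotypes[genotype] += 1
    return genotypes


if __name__ == "__main__":
    rng = random.Random(2016)  # fixed seed so a run can be repeated exactly
    print(make_generation(10, 0.5, rng))
```

Changing the first argument of `make_generation` plays the same role as copying the formula rows further down the sheet.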

For however many individuals I had in my population, I had to count up each genotype produced (AA, AB, or BB) and then calculate the frequency of A and B in that generation. That frequency was then used to produce the next generation, creating a chain of generations linked by "common ancestors". To find the total number of A alleles, I added 2 times the number of AA individuals (because they have two A alleles) and 1 times the number of AB individuals (because they have only one A allele). I then divided the total number of A alleles by the total number of alleles in the population, which is 2 times the population size because every individual carries two alleles. This gave me a number between 0 and 1 (sound familiar?), which I plugged in as the allele frequency for the next generation. A picture of how this looks is below.


By this method I obtained values for simple populations so that I could track allele frequencies and their changes due to randomness. There were no external variables such as natural selection or incomplete dominance accounted for in this simple spreadsheet.
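The allele-counting step and the chaining of generations can be sketched in Python as well. This assumes the same setup as the spreadsheet: each individual's two alleles are drawn independently using the previous generation's frequency of A, with no selection. The function names and the seed are hypothetical illustrations, not part of the original workbook. The frequency formula is (2 × AA + AB) / (2 × population size).

```python
import random


def next_allele_frequency(aa: int, ab: int, bb: int) -> float:
    """Frequency of A among all alleles: (2*AA + AB) / (2 * population size)."""
    # AA individuals carry two A alleles and AB individuals carry one;
    # every individual carries two alleles in total.
    return (2 * aa + ab) / (2 * (aa + ab + bb))


def simulate_generations(n: int, p0: float, generations: int, seed: int = 0) -> list:
    """A-allele frequency at the start and after each generation (no selection)."""
    rng = random.Random(seed)
    p = p0
    history = [p]
    for _ in range(generations):
        counts = {"AA": 0, "AB": 0, "BB": 0}
        for _ in range(n):
            # Two independent draws, each "A" with probability p; the sum of the
            # two booleans is the number of A alleles (0, 1 or 2).
            a_copies = (rng.random() <= p) + (rng.random() <= p)
            counts[("BB", "AB", "AA")[a_copies]] += 1
        # This generation's frequency seeds the next one.
        p = next_allele_frequency(counts["AA"], counts["AB"], counts["BB"])
        history.append(p)
    return history


print(next_allele_frequency(3, 4, 3))  # 0.5
for size in (10, 100, 1000):
    print(size, [round(f, 3) for f in simulate_generations(size, 0.5, 5, seed=1)])
```

Running the loop for 10, 100 and 1000 individuals is a quick way to reproduce the kind of comparison made in the first experiment. The exact numbers depend on the seed.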

After completing the simple model, I wanted a challenge, so I moved on to something more difficult: a heterozygous advantage, in which heterozygous individuals have a better survival rate than homozygous individuals. A real-life example is the gene responsible for sickle cell anemia. People who are homozygous for the sickle cell allele have sickle cell anemia, while people who are homozygous for the normal allele are more susceptible to malaria. Heterozygous individuals, however, do not have sickle cell anemia and are also partly protected against malaria.

For this experiment, I took the total number of individuals of each of the three genotypes and multiplied it by a percentage representing that genotype's survival rate. In my simple experiment, I used a survival rate of 75% for AA and BB and 100% for AB. The allele frequencies were then calculated in much the same way as before. The only difference was that I divided the number of A alleles among the survivors by twice the surviving population size, rather than by twice the original population size. This kept the frequencies of A and B adding up to 1.
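The survival step can be sketched in Python as a single function. It is a minimal sketch assuming the spreadsheet's approach of multiplying each genotype total by a survival rate, so survivor counts may be fractional. The function name and the example counts are hypothetical. The counts 810, 180 and 10 are simply the Hardy-Weinberg expectation for 1000 individuals at 90% A, not data from the trials.

```python
def frequency_after_selection(aa: float, ab: float, bb: float,
                              w_aa: float = 0.75, w_ab: float = 1.0,
                              w_bb: float = 0.75) -> float:
    """A-allele frequency among survivors when genotypes differ in survival rate."""
    # Multiply each genotype total by its survival rate, as in the spreadsheet;
    # the survivor counts may be fractional.
    s_aa, s_ab, s_bb = aa * w_aa, ab * w_ab, bb * w_bb
    surviving = s_aa + s_ab + s_bb
    if surviving == 0:
        raise ValueError("no survivors, so the allele frequency is undefined")
    # Divide by twice the SURVIVING population, not the original one.
    return (2 * s_aa + s_ab) / (2 * surviving)


print(frequency_after_selection(250, 500, 250))  # 0.5
print(frequency_after_selection(810, 180, 10))   # about 0.877
```

With equal survival rates for AA and BB, a 50-50 population stays at 50-50. A population with 90% A loses some A in a single round of selection.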

Results

First experiment: Changing population sizes
For the first experiment, which modeled populations of different sizes, I found that the Hardy-Weinberg equilibrium holds much better in large populations than in small ones. This supported my hypothesis that larger populations are more stable. But how did the model show me this?

Below are the graphs for 5 generations of a population of only 10 individuals. The starting allele frequencies were 50% for both A and B.





As you can see, the number of individuals with each genotype fluctuates wildly. In fact, the first generation has 0 individuals with the AA genotype, but by generation 5 there are 3. These wild shifts in genotypic ratios showed me that a small population gives unreliable results. I then thought a medium-sized population might work better, so I created a spreadsheet with 100 individuals per generation, again starting with allele frequencies of 50% for both A and B. The graphs are given below.





As seen in these graphs, the wild fluctuations seen in the population of 10 are gone. There are still some fluctuations, such as the jump in BB individuals between generations 2 and 3, but they are much milder than those in the population of 10. Since increasing the population size reduced the fluctuations, I increased it even further, to 1000 individuals. I again started with equal amounts of allele A and allele B and created graphs for those five generations as well. These graphs are shown below.





As seen in the graphs above, the genotype ratios are much more stable in a very large population. There is some slight fluctuation, but nothing major. Therefore, I concluded that the genotypic ratios predicted by Hardy-Weinberg hold best in a large population. Large random changes between generations are quite likely in small populations but much less likely in large ones. After this I moved on to my next experiment, the heterozygous advantage.
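The genotypic ratios referred to here are the Hardy-Weinberg proportions. As a worked check (not data from the spreadsheet), allele frequencies p and q give these expected genotype frequencies:

$$
(p + q)^2 = p^2 + 2pq + q^2 = 1
$$

With p = q = 0.5 and 1000 individuals, that means about 250 AA, 500 AB and 250 BB are expected. The advantage of larger populations can also be made explicit. Assume, as in the spreadsheet, that each of the 2N alleles in the next generation is drawn independently with probability p of being A. Then the number of A alleles is binomially distributed, and the next generation's frequency p' satisfies

$$
\operatorname{E}[p'] = p, \qquad \operatorname{Var}(p') = \frac{p(1-p)}{2N}
$$

For p = 0.5, the standard deviation of p' is about 0.112 for N = 10, 0.035 for N = 100 and 0.011 for N = 1000. Each tenfold increase in population size therefore shrinks the typical one-generation change by a factor of about 3.2 (the square root of 10).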

Second experiment: Heterozygous advantage
For this experiment, I created 3 sets of generations. The variable that I changed in this experiment was the ratio of A and B alleles in the starting population. I held my population size steady at 1000 and I gave the homozygous individuals a 75% chance of survival and the heterozygous individuals a survival rate of 100%. My first trial was done with allele frequencies of 50-50 to begin with. I thought this would be a good experiment because it could show that even though fluctuations may occur in the first few generations, there will be a trend of the frequencies approaching 50-50 again. Here is a graph of the allele frequencies over the 5 generations that I tested.


As I predicted, there was some initial fluctuation, and the A allele, represented by the blue line, became less common than the B allele. Over successive generations, however, the two lines move closer together, which means the two frequencies are moving back toward equal values. To test my hypothesis further, I changed the starting allele frequencies to 70% A and 30% B. Here is a graph of those results.


The graph shows that the A allele began the experiment with a much larger share of the gene pool than the B allele. Yet in every generation, the A allele lost some of that share and the B allele gained some. Therefore, the two alleles are once again approaching equal representation in the gene pool. Finally, to really test my hypothesis, I ran a trial in which the A allele made up 90% of the gene pool and the B allele a mere 10%. Here is a graph of that trial.


Once again, there is an obvious trend of the two allele frequencies getting closer together. The A allele frequency, which started out so high, consistently decreased, while the B frequency consistently increased. These trials supported my hypothesis, because all three trials are moving toward, and should eventually approach, a 50% allele frequency for both A and B. Because the model incorporates an element of randomness, the allele frequencies won't hit exactly 50-50 and stay there; they will continue to fluctuate. However, the two frequencies should remain close to one another and close to a 50-50 split.
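The pull toward 50-50 can also be derived without any randomness. The sketch below uses the standard one-locus selection model. It assumes random mating and treats the survival rates as fitnesses w, so it predicts the average trend rather than any single spreadsheet run. Writing w_AA = 1 - s, w_AB = 1 and w_BB = 1 - t:

$$
p' = \frac{w_{AA}\,p^2 + w_{AB}\,pq}{\bar{w}}, \qquad \bar{w} = w_{AA}\,p^2 + 2w_{AB}\,pq + w_{BB}\,q^2
$$

$$
\Delta p = p' - p = \frac{pq\,(tq - sp)}{\bar{w}}
$$

The change Δp is zero at p* = t/(s + t). It is positive below p* and negative above it, so any population containing both alleles moves toward p*. In this model s = t = 0.25, giving p* = 0.5. Starting from p = 0.9, one generation gives p' = 0.6975 / 0.795 ≈ 0.877. Iterating the same formula leaves the expected frequency of A at roughly 0.77 after five generations. This is why five generations are not enough to reach 50-50 from a 90-10 start, even though the trend points there.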

Conclusion
I think the experiment as a whole was a success. I successfully modeled Hardy-Weinberg equilibrium and found support for both of my hypotheses. Larger populations produce results closest to Hardy-Weinberg equilibrium. With a heterozygous advantage in which both homozygotes have the same survival rate, as in my model, the allele frequencies approach 50-50 no matter what the initial allele frequencies are (as long as both alleles are present). The one thing I wish I had done was test other evolutionary mechanisms, such as other forms of natural selection or a bottleneck effect. These would have been fun and interesting to do, but unfortunately other commitments forced me to limit myself to testing a single evolutionary mechanism.