Allele-Based Inference on Evolution and Extinction: A Genetic Drift Approach

In order to present a series of stochastic models from population dynamics capable of describing rudimentary aspects of genetic evolution, we studied the two-allele Wright–Fisher and Moran models for the evolution of the relative frequencies of two alleles at a diploid locus under random genetic drift in a population of fixed size, in the simplest form, with selection, and with random mutation. The principal results are presented in qualitative terms and illustrated by Monte Carlo simulations in R and at http://www.radford.edu/~rsheehy/Gen_flash/popgen. The Moran and Wright–Fisher models exhibited the same fixation probabilities, except that the Moran model runs twice as fast as the Wright–Fisher model. A clue to this result is provided by the variance in reproductive success in the two models. Genetic changes due to drift were neither directional nor predictable in any deterministic way. Nonetheless, genetic drift led to evolutionary change in the absence of mutation (P = 0.5), natural selection or gene flow. In general, alleles drift to fixation significantly faster in smaller populations. The probability of fixation of an allele A was approximately equal to the initial frequency of that allele. With selection included in the model, the probability of fixation of a favoured allele increased with its fitness advantage and with population size. Fixation takes much longer when there is no selective advantage than under mutation with a selective advantage.
DOI: 10.14302/issn.2572-3030.jcgb-19-2597
Corresponding author: S. Oluwafemi Oyamakin, Department of Statistics, University of Ibadan, Nigeria, Phone no: +2348066266535, Email: fm_oyamakin@yahoo.com


Introduction
This paper aims to present a series of stochastic models from population dynamics capable of describing rudimentary aspects of genetic evolution [11,12,13,14,15,16,17]. Focusing on the Wright-Fisher model and its variant [5,6,7,8,9,10], the Moran model [2], we describe a population of individuals (genes) of different types (alleles) organized into a finite population, in which the change in composition is caused by pure genetic drift, i.e. randomness with no underlying deterministic behaviour [19]. We demonstrate that stochastic computer simulation is an important method for comparing the evolutionary patterns [5,6] and processes associated with radically different intervals of time.

Methodology
The Wright-Fisher Model
We consider a finite population of 2N genes (or, alternatively, N diploid organisms), each carrying either allele A or allele a. The model assumes random reproduction and non-overlapping generations. Let $X_t$ be the number of copies of allele A at time t, taking values in the state space [5,6,7,8]
$$S_{2N} = \{0, 1, \ldots, 2N\} \qquad (1)$$
Let the initial generation contain i genes of allele A and 2N - i genes of allele a, where i is the initial number of copies of A. The probability of choosing an A allele for the next generation (success) in each Bernoulli trial is
$$p = \frac{i}{2N} \qquad (2)$$
and the probability of choosing a non-A allele (failure) is
$$q = 1 - p = \frac{2N - i}{2N} \qquad (3)$$
The next generation is formed by sampling with replacement, i.e. by 2N independent Bernoulli trials, so that $X_{t+1} = j$ is a binomial random variable determined by the genes of generation t. For any states $i, j, x_0, \ldots, x_{t-1}$ in the state space,
$$P(X_{t+1} = j \mid X_t = i, X_{t-1} = x_{t-1}, \ldots, X_0 = x_0) = P(X_{t+1} = j \mid X_t = i) \qquad (4)$$
That is, given the present, the future is conditionally independent of the past. This Markov property is the key to analysing the Wright-Fisher model, whose transition probabilities follow the binomial distribution:
$$P_{ij} = P(X_{t+1} = j \mid X_t = i) = \binom{2N}{j}\left(\frac{i}{2N}\right)^{j}\left(1 - \frac{i}{2N}\right)^{2N-j} = \binom{2N}{j} p^{j} q^{2N-j} \qquad (5)$$
We can use (5) to construct a transition probability matrix for the Wright-Fisher model, which gives the probability of going from any state i to any state j in one generation [1].
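As an illustration of (5), the following Python sketch builds the one-step Wright-Fisher matrix row by row from the binomial formula. It is not the authors' code (their simulations were run in R); the function name `wright_fisher_matrix` and the choice 2N = 4 are illustrative assumptions. Rows are indexed by the current state i and columns by the next state j.

```python
from math import comb


def wright_fisher_matrix(n_genes):
    """Return the one-step Wright-Fisher matrix for n_genes (= 2N) gene copies.

    Entry [i][j] is P(X_{t+1} = j | X_t = i) = C(2N, j) p^j q^(2N - j),
    with p = i / 2N and q = 1 - p, as in equation (5).
    """
    matrix = []
    for i in range(n_genes + 1):
        p = i / n_genes  # chance that one Bernoulli trial draws an A gene
        q = 1.0 - p      # chance that it draws an a gene
        row = [comb(n_genes, j) * p**j * q**(n_genes - j)
               for j in range(n_genes + 1)]
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    P = wright_fisher_matrix(4)  # hypothetical example: 2N = 4 genes (N = 2 diploids)
    for i, row in enumerate(P):
        print(i, [round(x, 4) for x in row], "row sum:", round(sum(row), 12))
```

States 0 and 2N come out as absorbing rows (a single 1 on the diagonal), because with p = 0 or p = 1 every trial yields the same allele.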
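We represent the initial state of the system by the column vector of state probabilities
$$p_0 = \big(P(X_0 = 0), P(X_0 = 1), \ldots, P(X_0 = 2N)\big)^{T}$$
and collect the one-step probabilities (5) in the transition matrix
$$\rho = \begin{pmatrix} P_{0,0} & P_{1,0} & \cdots & P_{2N,0} \\ P_{0,1} & P_{1,1} & \cdots & P_{2N,1} \\ \vdots & \vdots & \ddots & \vdots \\ P_{0,2N} & P_{1,2N} & \cdots & P_{2N,2N} \end{pmatrix} \qquad (10)$$
whose column i lists the probabilities of moving from state i to each state j. Each column sums to one because a population that starts with i copies of the allele must have some number between 0 and 2N copies in the next generation. The transition of the system is then given by the matrix equation
$$p_{t+1} = \rho\, p_t \qquad (11)$$
Freely Available Online www.openaccesspub.org | JCGB CC-license DOI : 10
The elements $P_{ij}$ in (10) are called the one-step transition probabilities. More generally, the n-step transition probabilities are
$$p_{ij}(n) = P(X_n = j \mid X_0 = i) \qquad (12)$$
and, by the Chapman–Kolmogorov equation,
$$p_{ij}(m + n) = \sum_{k=0}^{2N} p_{ik}(m)\, p_{kj}(n)$$
By extension, the n-step probabilities are obtained from n-fold products of the one-step matrix, so that $p_n = \rho^{n} p_0$.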
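The iteration in (11) and the Chapman–Kolmogorov equation can be checked numerically. The Python sketch below, an illustration rather than the authors' implementation, stores the one-step probabilities with rows indexed by the current state (so it holds the transpose of ρ in (10)), forms n-step matrices by repeated multiplication, and verifies that $p_{ij}(m+n) = \sum_k p_{ik}(m)\,p_{kj}(n)$. All function names and the population size are hypothetical.

```python
from math import comb


def wright_fisher_matrix(n_genes):
    """Row-stochastic one-step matrix: entry [i][j] = P(X_{t+1} = j | X_t = i)."""
    rows = []
    for i in range(n_genes + 1):
        p = i / n_genes
        rows.append([comb(n_genes, j) * p**j * (1 - p)**(n_genes - j)
                     for j in range(n_genes + 1)])
    return rows


def mat_mul(a, b):
    """Plain matrix product of two square lists of lists."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def n_step(P, n):
    """n-step matrix: entry [i][j] = p_ij(n) = P(X_n = j | X_0 = i)."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result


def step_distribution(dist, P):
    """One generation of (11) in row form: new[j] = sum_i dist[i] * P[i][j]."""
    size = len(P)
    return [sum(dist[i] * P[i][j] for i in range(size)) for j in range(size)]


if __name__ == "__main__":
    P = wright_fisher_matrix(6)  # hypothetical: 2N = 6 genes
    lhs = n_step(P, 5)
    rhs = mat_mul(n_step(P, 2), n_step(P, 3))
    gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(7) for j in range(7))
    print("largest Chapman-Kolmogorov discrepancy:", gap)
```

Storing rows by current state keeps the code close to (5); the column form of (11) is recovered by transposing.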
The advantage of writing (10) in matrix form is that it can be iterated using the rules of matrix multiplication.
The Moran Model
This model, due to [2], although less popular than the WF model among biologists, is a mathematically attractive alternative. It is also known as a birth-and-death model. Let X be a random variable representing the number of alleles of type A in the population. At each step one individual (gene) is chosen at random to die; to replace it, we choose an individual at random from the population (including the one that dies) to be the parent of the new individual. The model therefore allows only one-step transitions, in which X either decreases or increases by one, and both transitions occur at the same rate. Thus, at step t + 1, the number of A alleles is either j = i - 1, j = i + 1 or j = i.
The system goes from i to i + 1 if an A is chosen to reproduce and an a is chosen to die:
$$P_{i,i+1} = \frac{i}{2N}\cdot\frac{2N - i}{2N} = pq$$
Similarly, if an A is chosen to die and an a is chosen to reproduce, the system goes from i to i - 1:
$$P_{i,i-1} = \frac{2N - i}{2N}\cdot\frac{i}{2N} = qp$$
For the system to stay at i, the reproducing and the dying individuals must be of the same type, both A or both a:
$$P_{i,i} = p^{2} + q^{2} \qquad (17)$$
where $p = i/2N$ and $q = 1 - p$. The transition probabilities of the implied Markov chain for the Moran model are therefore
$$P_{ij} = \begin{cases} pq, & j = i + 1 \\ pq, & j = i - 1 \\ p^{2} + q^{2}, & j = i \\ 0, & \text{otherwise} \end{cases}$$
The probability that A eventually reaches fixation is called the fixation probability. In any neutral model of pure random drift (no mutation and no selection) in an unstructured population, the process eventually stops changing: the population becomes composed only of A genes ($X_t = 2N$) or only of a genes ($X_t = 0$). That is, with probability one, one of the absorbing states, 0 or 2N, is eventually entered.
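The three one-step moves can be assembled into a tridiagonal transition matrix. This Python sketch (hypothetical function name, not taken from the paper) assumes, as in the description of the model, that the dying gene and the reproducing gene are drawn independently and uniformly from all 2N genes, so the same gene may be chosen for both roles.

```python
def moran_matrix(n_genes):
    """One-step Moran matrix for n_genes (= 2N) genes; entry [i][j] = P(i -> j).

    With p = i / 2N and q = 1 - p:
      P(i -> i+1) = p * q      (an A reproduces, an a dies)
      P(i -> i-1) = q * p      (an a reproduces, an A dies)
      P(i -> i)   = p^2 + q^2  (reproducing and dying genes have the same type)
    """
    size = n_genes + 1
    P = [[0.0] * size for _ in range(size)]
    for i in range(size):
        p = i / n_genes
        q = 1.0 - p
        if i < n_genes:
            P[i][i + 1] = p * q
        if i > 0:
            P[i][i - 1] = q * p
        P[i][i] = p * p + q * q
    return P


if __name__ == "__main__":
    for row in moran_matrix(4):  # hypothetical example: 2N = 4 genes
        print([round(x, 4) for x in row])
```

Compared with the Wright-Fisher matrix, each row has at most three non-zero entries, which is why the Moran chain moves by at most one copy per step.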
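Thus, for 0 < j < 2N,
$$\lim_{t \to \infty} P(X_t = j \mid X_0 = i) = 0$$
The probability of extinction, given that the process started with i copies, is
$$\lim_{t \to \infty} P(X_t = 0 \mid X_0 = i) = 1 - \frac{i}{2N}$$
and the probability of fixation, given that it started with i copies, is
$$\lim_{t \to \infty} P(X_t = 2N \mid X_0 = i) = \frac{i}{2N}$$
These values follow from the martingale property (i.e. the process is a random walk without bias): the conditional expectation does not change from one time step to the next,
$$E[X_{t+1} \mid X_t] = X_t, \qquad \text{so} \qquad E[X_t] = X_0 = i \ \text{for all } t.$$
The expected allele frequency is therefore constant; [3] called this property the constancy of expectation, and yet variability must eventually be lost through chance [4]. Let
$$u_i = P(X_t = 2N \text{ for some } t \mid X_0 = i)$$
be the probability that A is eventually fixed in a population of size 2N that initially contains i copies of A. Because the chain is eventually absorbed at 2N (with probability $u_i$) or at 0, letting $t \to \infty$ in $E[X_t] = i$ gives $2N u_i = i$, that is, $u_i = i/2N$. In an identical manner, we can express the probability that A is eventually lost from the population (extinction at 0). Let
$$v_i = P(X_t = 0 \text{ for some } t \mid X_0 = i) = 1 - u_i = \frac{2N - i}{2N}$$
be the probability of extinction. One similarity between the Moran model and the WF model is that both have the same fixation probabilities. The only difference is that the Moran model runs twice as fast as the WF model, a result we will show in the next section.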
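The fixation result can be checked numerically for both models. In the Python sketch below, which is illustrative only (function names and 2N = 6 are assumptions), the martingale property is confirmed by computing $\sum_j j P_{ij}$ for every i, and the fixation probability is approximated by the probability mass at state 2N after many steps. A Moran step replaces a single gene, so the Moran chain is run for more steps.

```python
from math import comb


def wright_fisher_matrix(n_genes):
    """Entry [i][j] = C(2N, j) p^j q^(2N - j) with p = i / 2N."""
    rows = []
    for i in range(n_genes + 1):
        p = i / n_genes
        rows.append([comb(n_genes, j) * p**j * (1 - p)**(n_genes - j)
                     for j in range(n_genes + 1)])
    return rows


def moran_matrix(n_genes):
    """Tridiagonal Moran matrix: up and down moves p*q, staying p^2 + q^2."""
    size = n_genes + 1
    P = [[0.0] * size for _ in range(size)]
    for i in range(size):
        p = i / n_genes
        q = 1.0 - p
        if i < n_genes:
            P[i][i + 1] = p * q
        if i > 0:
            P[i][i - 1] = p * q
        P[i][i] = p * p + q * q
    return P


def conditional_mean(P, i):
    """E[X_{t+1} | X_t = i]; equals i when the chain is a martingale."""
    return sum(j * prob for j, prob in enumerate(P[i]))


def fixation_probability(P, i, steps):
    """Probability of being at state 2N after `steps` steps, starting from X_0 = i."""
    size = len(P)
    dist = [0.0] * size
    dist[i] = 1.0
    for _ in range(steps):
        dist = [sum(dist[k] * P[k][j] for k in range(size)) for j in range(size)]
    return dist[-1]


if __name__ == "__main__":
    two_n = 6
    wf, moran = wright_fisher_matrix(two_n), moran_matrix(two_n)
    for i in range(two_n + 1):
        print(i, round(fixation_probability(wf, i, 500), 6),
              round(fixation_probability(moran, i, 3000), 6), round(i / two_n, 6))
```

Both columns of output should agree with i/2N up to rounding, which is the sense in which the two models share the same fixation probabilities.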

The Monte Carlo Experiment
The Monte Carlo model [18] simulates genetic drift by using a random number generator to sample genes from a small parental population and pass them on to the offspring. Population size is assumed to be constant from generation to generation, so changes in gene frequency result only from the random sampling process. 2N individuals are simulated in the population, and in each generation each individual reproduces randomly and independently [19]. The results for each generation can be stored in a data frame and then plotted in a graph.
• Record the number of loci fixed for A and for a, as well as the number of loci that remain polymorphic.
• Approximate the number of generations until fixation and extinction for each population explored.
Simulations were repeated 20, 50 and 100 times, with a total of 5 replicates for each population size.
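A minimal Monte Carlo sketch of this experiment is given below in Python. The original simulations were run in R, so this is a re-implementation under assumptions, not the authors' code. Each locus evolves independently under Wright-Fisher sampling: every generation, each of the 2N offspring genes copies an A parent with probability equal to the current frequency of A. The function names, the seed and the parameter values are hypothetical, and plain lists stand in for the data frame.

```python
import random


def simulate_locus(n_genes, start_copies, generations, rng):
    """Trajectory of A-copy counts at one locus under Wright-Fisher drift."""
    x = start_copies
    trajectory = [x]
    for _ in range(generations):
        p = x / n_genes
        # binomial sampling: each offspring gene is A with probability p
        x = sum(1 for _ in range(n_genes) if rng.random() < p)
        trajectory.append(x)
    return trajectory


def run_experiment(n_genes, start_copies, n_loci, generations, seed=1):
    """Count loci fixed for A, fixed for a, or still polymorphic at the end."""
    rng = random.Random(seed)
    counts = {"fixed_A": 0, "fixed_a": 0, "polymorphic": 0}
    absorption_times = []  # first generation at which a locus hit 0 or 2N
    for _ in range(n_loci):
        trajectory = simulate_locus(n_genes, start_copies, generations, rng)
        final = trajectory[-1]
        if final == n_genes:
            counts["fixed_A"] += 1
        elif final == 0:
            counts["fixed_a"] += 1
        else:
            counts["polymorphic"] += 1
        hit = next((t for t, x in enumerate(trajectory) if x in (0, n_genes)), None)
        if hit is not None:
            absorption_times.append(hit)
    mean_time = (sum(absorption_times) / len(absorption_times)
                 if absorption_times else None)
    return counts, mean_time


if __name__ == "__main__":
    # hypothetical settings: 2N = 20 genes, A starts at frequency 0.5, 100 loci
    counts, mean_time = run_experiment(20, 10, n_loci=100, generations=100)
    print(counts, "mean generations to fixation or loss:", mean_time)
```

Running it for several values of 2N should typically show loci reaching fixation (for A or for a) sooner in the smaller populations; fixing the seed plays the role of reusing the same series of random numbers.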

Experiment 2: Fixation
To explore the number of generations it takes for one type to either reach fixation or go extinct, we run unlinked loci simultaneously (collectively), each with an initial population size of 10, for 100 generations. To explore the fixation of an allele, we run this program for different population sizes and record the total number of loci fixed for allele A in each run. For all experiments, iterations were made 20, 50 and 100 times using the same series of random numbers. When reproductive success is more variable, stochasticity (here, random genetic drift) plays a stronger role, and polymorphism is lost by chance more rapidly.
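The link between more variable reproductive success and faster loss of polymorphism can be made concrete with a short exact calculation. Taking E[X(2N - X)] (proportional to the expected heterozygosity) as a measure of remaining variation, the Python sketch below propagates the exact distribution through one Wright-Fisher generation and through 2N Moran birth-death events, which are treated here, as an assumption, as one Moran generation. With 2N genes, the Wright-Fisher loss per generation is 1/2N, while the Moran loss is close to 2/2N, in line with the statement that the Moran model runs twice as fast. Function names and parameter values are hypothetical.

```python
from math import comb


def wf_step(dist, m):
    """Exact distribution of A-copy counts after one Wright-Fisher generation."""
    new = [0.0] * (m + 1)
    for i, mass in enumerate(dist):
        p = i / m
        for j in range(m + 1):
            new[j] += mass * comb(m, j) * p**j * (1 - p)**(m - j)
    return new


def moran_step(dist, m):
    """Exact distribution after a single Moran birth-death event."""
    new = [0.0] * (m + 1)
    for i, mass in enumerate(dist):
        p = i / m
        q = 1.0 - p
        if i < m:
            new[i + 1] += mass * p * q
        if i > 0:
            new[i - 1] += mass * p * q
        new[i] += mass * (p * p + q * q)
    return new


def diversity(dist, m):
    """E[X (m - X)], proportional to the expected heterozygosity 2pq."""
    return sum(mass * i * (m - i) for i, mass in enumerate(dist))


def retained_per_generation(m, i):
    """Fraction of diversity kept after one generation in each model."""
    start = [0.0] * (m + 1)
    start[i] = 1.0
    h0 = diversity(start, m)
    wf = diversity(wf_step(start, m), m) / h0
    dist = start
    for _ in range(m):  # one Moran generation = m single-gene replacements
        dist = moran_step(dist, m)
    return wf, diversity(dist, m) / h0


if __name__ == "__main__":
    m = 100  # hypothetical: 2N = 100 genes, starting from i = 50
    wf, moran = retained_per_generation(m, 50)
    print("WF loss:", 1 - wf, "Moran loss:", 1 - moran,
          "ratio:", (1 - moran) / (1 - wf))
```

The retained fractions equal $1 - 1/2N$ and $(1 - 2/(2N)^2)^{2N}$ exactly, so the ratio of losses tends to 2 as the population grows.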

Results and Discussions
For our genetic model, we can also describe Figure 2 with a transition probability matrix for the Wright-Fisher model, which gives the probability of going from any state i to any state j in one generation. Because we could have anywhere from 0, 1, 2, … to N copies of type A, this matrix has N + 1 rows and columns. In our population of size five, the rows of the transition probability matrix are indexed by the current state i ("from i") and the columns by the next state j ("to j"). Each row sums to one because a population that starts with i copies of the allele must have some number between 0 and N copies in the next generation. The first and last rows are particularly simple because there is no mutation: if nobody is type A (i = 0; first row) or if everybody is type A (i = N; last row), no further changes are possible.
The matrix $P = [p_{ij}]$ can be iterated using the rules of matrix multiplication. $P^2$ gives the probability that there are j copies at time t + 2 given that there were i copies at time t. In general, $P^t$ gives the probability that there are j copies at time t given that there were i copies at time 0. For example, calculating $P^{100}$ (using a mathematical software package) gives a matrix whose mass lies in the first and last columns; the zeros in the middle of this matrix are not exactly zero, but they are less than $10^{-156}$.
The initial state of the system is represented by a vector. Since our population initially had two copies of the allele, $P(X(0) = 2) = 1$ and all other entries of this vector are zero.
Multiplying this initial row vector on the right by $P^{100}$ gives the distribution after 100 generations. The resulting vector indicates that there is approximately a 50% chance that type A will be lost (j = 0) after 100 generations and a 50% chance that type A will be fixed (j = 5).
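Multiplying an initial vector by a matrix power is the same as stepping the distribution forward one generation at a time. The Python sketch below does this for a Wright-Fisher population; for illustration it assumes five gene copies (states 0 to 5) with two copies of A at time 0, and its function names are hypothetical. Whatever the starting state i, the masses at j = 0 and j = N approach 1 - i/N and i/N, the classical result that the fixation probability equals the initial frequency.

```python
from math import comb


def wright_fisher_matrix(n_genes):
    """Entry [i][j] = P(X_{t+1} = j | X_t = i) for n_genes gene copies."""
    rows = []
    for i in range(n_genes + 1):
        p = i / n_genes
        rows.append([comb(n_genes, j) * p**j * (1 - p)**(n_genes - j)
                     for j in range(n_genes + 1)])
    return rows


def distribution_after(P, initial, generations):
    """Row vector initial * P^generations, computed one generation at a time."""
    dist = list(initial)
    size = len(P)
    for _ in range(generations):
        dist = [sum(dist[i] * P[i][j] for i in range(size)) for j in range(size)]
    return dist


if __name__ == "__main__":
    n_genes = 5                    # assumed: states 0, 1, ..., 5
    P = wright_fisher_matrix(n_genes)
    initial = [0.0] * (n_genes + 1)
    initial[2] = 1.0               # P(X(0) = 2) = 1
    final = distribution_after(P, initial, 100)
    print("P(lost) =", round(final[0], 6), " P(fixed) =", round(final[-1], 6))
```

Stepping the vector avoids forming $P^{100}$ explicitly, which is cheaper when only one starting state is of interest.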
These results suggest that if we start with i copies of type A, then type A will eventually be lost with … are selected for or against [20].
Over Dominance (Aa has the Highest Fitness)
Overdominance: the heterozygote is the most fit.
Surprising things happen when the heterozygote is most fit.
In … Table 6
When the heterozygote has the lowest fitness (underdominance), the system is unstable: allele frequencies move until either A or a is fixed. The unstable equilibrium occurs at a frequency of 50% for each allele.
When P(A) is above the equilibrium, A will be fixed.
When P(A) is below the equilibrium, a will be fixed.
An example of underdominance is the African butterfly Pseudacraea eurytus: the orange and blue homozygotes each resemble a local toxic species, but the heterozygote resembles nothing in particular and is attractive to predators.
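The behaviour described for overdominance and underdominance follows from the standard one-locus selection recursion under random mating in a very large population (no drift), $p' = (p^2 w_{AA} + pq\,w_{Aa})/\bar{w}$ with $\bar{w} = p^2 w_{AA} + 2pq\,w_{Aa} + q^2 w_{aa}$. The Python sketch below iterates it; the fitness values and function names are hypothetical, with the underdominant values chosen symmetric so that the equilibrium falls at 50%.

```python
def next_frequency(p, w_AA, w_Aa, w_aa):
    """One generation of viability selection on the frequency p of allele A."""
    q = 1.0 - p
    mean_fitness = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return (p * p * w_AA + p * q * w_Aa) / mean_fitness


def iterate(p, w_AA, w_Aa, w_aa, generations):
    """Apply the selection recursion for the given number of generations."""
    for _ in range(generations):
        p = next_frequency(p, w_AA, w_Aa, w_aa)
    return p


def interior_equilibrium(w_AA, w_Aa, w_aa):
    """Frequency of A at which selection leaves p unchanged (0 < p* < 1)."""
    return (w_aa - w_Aa) / (w_AA - 2 * w_Aa + w_aa)


if __name__ == "__main__":
    under = (1.0, 0.8, 1.0)   # hypothetical: heterozygote least fit
    over = (0.9, 1.0, 0.8)    # hypothetical: heterozygote most fit
    print("underdominance p* =", round(interior_equilibrium(*under), 6))
    print("start 0.55 ->", round(iterate(0.55, *under, 500), 6))
    print("start 0.45 ->", round(iterate(0.45, *under, 500), 6))
    print("overdominance p* =", round(interior_equilibrium(*over), 6))
    print("start 0.10 ->", round(iterate(0.10, *over, 500), 6))
```

With the heterozygote least fit, starting just above or below p* = 0.5 drives A to fixation or to loss; with the heterozygote most fit, the frequency settles at the interior equilibrium (2/3 for these values), so both alleles are maintained.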
Although mutation is sometimes considered the raw material of evolution, it is a very weak force in changing allele frequency. As shown in Table 12 … Table 7 …
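A rough calculation shows how weak mutation is on its own. Assuming one-way mutation from A to a at a hypothetical rate $\mu = 10^{-5}$ per generation and no other forces, the frequency of A follows $p_t = p_0(1-\mu)^t$. The Python sketch below evaluates this recursion; the rate and the function names are illustrative assumptions.

```python
import math


def frequency_after_mutation(p0, mu, generations):
    """Frequency of A after one-way mutation A -> a at rate mu per generation."""
    p = p0
    for _ in range(generations):
        p *= 1.0 - mu  # a fraction mu of the remaining A alleles becomes a
    return p


def generations_to_halve(mu):
    """Smallest t with (1 - mu)**t <= 1/2."""
    return math.ceil(math.log(0.5) / math.log(1.0 - mu))


if __name__ == "__main__":
    mu = 1e-5  # hypothetical mutation rate
    print("p after 1000 generations:", frequency_after_mutation(1.0, mu, 1000))
    print("generations for p to halve:", generations_to_halve(mu))
```

Even after 1000 generations the frequency of A falls by only about one percentage point, and roughly 69,000 generations are needed to halve it.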

Conclusions and Recommendation
The procedure of stochastic modeling is not, of course, restricted to statistics. In particular, stochastic models have played a pivotal role in understanding the dynamics of evolutionary systems, and as such one sees many similarities between the approaches and methods used there and those employed by statisticians. This paper has been a review of the ideas and formalism used to model stochastic processes in fields with which statistical physicists are not typically acquainted, specifically population genetics, ecology and linguistics.
As a consequence, some parts of the discussion will seem familiar, while others will not. We have tried, and we hope succeeded, to explain the background ideas and motivation, since these will be the greatest obstacle to understanding among a readership of statistical physicists. On the other hand, the degree of mathematical sophistication assumed is greater than would be typical outside physics or mathematical biology.
In our discussions of the mathematical models … In genetics, ecology or language, just as in physics, reality cannot be described by ideal models; there will be a multitude of ways in which real systems deviate from the ideal models created by scientists when they first enter a field. One of the methods devised by population geneticists to deal with this will be very familiar to physicists: characterizing a non-ideal system by a few parameters which, if chosen correctly, will hopefully capture the essence of the system. It may be that a simple model can then be … The aim of this paper is to give an account of useful analytical results in population genetics, together with their proofs.