Modeling DNA Sequence Evolution (continued)

Among Site Rate Variation

Some sites within a sequence may be more likely to undergo change than are other sites

    In other words, the underlying rate of sequence evolution may vary

    Also called "site to site rate variation"

    For our discussion here, each site is a single character

Invariant Sites Model

    Takes into account sites that can never vary

    This is sometimes confused with sites that are not observed to vary. In this case the concern is sites that cannot possibly vary.

    Assumes characters fall into two rate categories

        Not variable

            These positions are assumed to never vary under any circumstances

            Sites under very strong selection, where any mutation is lethal, would never be observed to vary

        Variable

            All other sites are assumed to be evolving at the same rate

    Realistic, in that there probably are some sites that are effectively invariant, but unrealistic in that the distinction between variable and non-variable is probably not a Boolean variable.
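
    As a minimal sketch of the two-category assumption, the Python snippet below gives each site a rate of either 0 (not variable) or one shared rate (variable). The function name `invariant_sites_rates` and the parameter `p_inv` (the assumed proportion of sites that cannot vary) are illustrative choices, not taken from any particular program.

```python
import random

def invariant_sites_rates(n_sites, p_inv, variable_rate=1.0, seed=None):
    """Assign each site a relative rate under a two-category model.

    p_inv is the assumed proportion of sites that cannot vary (rate 0).
    Every other site shares the same rate, variable_rate.
    """
    if not 0.0 <= p_inv <= 1.0:
        raise ValueError("p_inv must be between 0 and 1")
    rng = random.Random(seed)
    return [0.0 if rng.random() < p_inv else variable_rate
            for _ in range(n_sites)]

rates = invariant_sites_rates(n_sites=1000, p_inv=0.3, seed=1)
print(sorted(set(rates)))               # the distinct rates that occur
print(rates.count(0.0) / len(rates))    # should be close to p_inv
```

    Only two distinct rates can occur, which is exactly the all-or-nothing simplification criticized above.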

Gamma Rate Distribution

    Convenient, because it can model a wide range of plausible distributions of rates

    Two parameters, alpha and beta

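    Alpha is the shape parameter and beta is a scale parameter. If beta = 1/alpha, then the mean rate (alpha x beta) is 1 and the variance is 1/alpha, so alpha alone determines the shape of the distribution.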

    If alpha is large (> 200), then all sites evolve at roughly the same rate (i.e., the rates cluster around 1).

    If alpha is small (< 1), then many sites evolve slowly, but a few evolve rapidly. This approximates an invariant sites model.

    With intermediate values of alpha, there is a broad distribution of rates from slow to fast.

    A similar method has been proposed by Van de Peer.
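
    The effect of alpha described above can be checked numerically. The sketch below draws relative rates from a gamma distribution with shape alpha and scale 1/alpha using Python's standard library. The function name `gamma_site_rates`, the sample size, and the 0.1 cutoff for a "slow" site are arbitrary illustrative choices.

```python
import random
import statistics

def gamma_site_rates(n_sites, alpha, seed=None):
    """Draw relative rates from a gamma distribution with mean 1.

    Shape is alpha and scale is beta = 1/alpha, so the mean (alpha * beta)
    is 1 and the variance (alpha * beta**2) is 1/alpha.
    """
    rng = random.Random(seed)
    beta = 1.0 / alpha  # scale parameter chosen so the mean rate is 1
    return [rng.gammavariate(alpha, beta) for _ in range(n_sites)]

for alpha in (0.5, 2.0, 200.0):
    rates = gamma_site_rates(20000, alpha, seed=42)
    slow = sum(r < 0.1 for r in rates) / len(rates)
    print(f"alpha={alpha:>5}: mean={statistics.mean(rates):.2f} "
          f"variance={statistics.variance(rates):.3f} "
          f"fraction with rate < 0.1: {slow:.2f}")
```

    With small alpha a sizeable fraction of sites have rates near zero, while with large alpha the variance shrinks and nearly all rates sit close to 1.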

Invariant Sites plus Gamma Distribution

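    The two models can be combined. The gamma shape parameter alpha will usually take a substantially different value in a combined model, because the invariant category absorbs many of the slowest sites that the gamma distribution would otherwise have to accommodate.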
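
    A combined model can be sketched as a mixture: a site is invariant with some probability, and otherwise its rate is drawn from a gamma distribution. The function name `invariant_plus_gamma_rates` and the parameter `p_inv` are hypothetical names for illustration. For simplicity the rates are left unscaled, so the mean over all sites is 1 - p_inv rather than 1.

```python
import random

def invariant_plus_gamma_rates(n_sites, p_inv, alpha, seed=None):
    """Draw site rates from an invariant-sites plus gamma mixture.

    With probability p_inv a site is invariant (rate 0). Otherwise its
    rate comes from a gamma distribution with shape alpha and scale
    1/alpha, which has mean 1 among the variable sites.
    """
    rng = random.Random(seed)
    beta = 1.0 / alpha  # scale giving mean 1 among variable sites
    rates = []
    for _ in range(n_sites):
        if rng.random() < p_inv:
            rates.append(0.0)                            # invariant site
        else:
            rates.append(rng.gammavariate(alpha, beta))  # variable site
    return rates

rates = invariant_plus_gamma_rates(20000, p_inv=0.25, alpha=1.0, seed=7)
print(sum(r == 0.0 for r in rates) / len(rates))  # close to p_inv
print(sum(rates) / len(rates))                    # close to 1 - p_inv
```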

Relative Rates

DNArates

    Use maximum likelihood to estimate individual rate for each site

LogDet

Models of Amino Acid Substitution

Dayhoff Matrices (Dayhoff, 1979)

    Point Accepted Mutation (PAM) matrices

    Based on empirical observations of amino acid substitutions

    Dayhoff's calculations were based on a very limited number of protein comparisons by today's standards, but more modern calculations of PAM matrices are available (e.g., Jones et al., 1992).

Codon Models

Chemical Property Models