Analytical methods: Assessing Confidence
What gives confidence?
- Most phylogenetic methods will always find at least one tree
- Consider how the analyses are done:
- Choose an optimality criterion
- Test a number of alternative topologies against that criterion
- Even random data will have some 'best' tree
- The importance of independent lines of evidence
- Do morphological and molecular data agree?
- Do two genes agree?
- Most methods of phylogenetic analysis share many assumptions
- Therefore, a different analytical method is not an independent line of
evidence!
- Because it is difficult to get independent lines of evidence, we want to
assess confidence in the tree that we have got.
- Tree length distributions
- Equally parsimonious trees
- Relaxation of optimality criterion
- Bremer Index/Decay Analysis
Random Permutation Methods
Nonparametric Bootstrapping (often just called 'Bootstrapping')
- Build a new data matrix by randomly sampling characters with replacement
- Take the original data matrix
- Sample the matrix, randomly copying one character (column) from the
original matrix
- Do not delete the character after copying it
- Each taxon's character state for the sampled character remains as
it was in the original matrix
- Add the selected character to a new data matrix
- Repeat sampling until the new data matrix has as many characters as
the original
- Some characters will be sampled more than once, others not at all
- The new dataset (a pseudosample) contains the same number
of characters as the original data set, and the taxa included are
unchanged.
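The resampling steps above can be sketched in a few lines of Python. This is only an illustration, not the procedure of any particular phylogenetics package: the function name `bootstrap_matrix`, the dictionary layout (taxon name mapped to a string of character states), and the toy sequences are all assumptions made for the example.

```python
import random

def bootstrap_matrix(matrix, rng):
    """Return a pseudosample: columns drawn with replacement from `matrix`.

    `matrix` maps taxon name -> string of character states, one state
    per character (column). This layout is an assumption for the sketch.
    """
    n_chars = len(next(iter(matrix.values())))
    # Draw as many column indices as the original has, with replacement,
    # so some columns repeat and others are left out.
    columns = [rng.randrange(n_chars) for _ in range(n_chars)]
    # Copy each sampled column intact, so every taxon keeps the state it
    # had for that character in the original matrix.
    return {taxon: "".join(states[c] for c in columns)
            for taxon, states in matrix.items()}

# Hypothetical toy data: 4 taxa, 6 characters.
original = {
    "taxon_A": "ACGTAC",
    "taxon_B": "ACGTTC",
    "taxon_C": "ATGTTG",
    "taxon_D": "TTGATG",
}

pseudosample = bootstrap_matrix(original, random.Random(1))
for taxon, states in pseudosample.items():
    print(taxon, states)
```

Sampling column indices, rather than individual cells, is what keeps each taxon's state tied to the character it came from.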
- Perform full phylogenetic analysis
- Repeat many times
- The higher the number of replicates, the more precise the bootstrap
values will be
- But remember the difference between accuracy and precision
- Calculate frequency with which taxon bipartitions (branches) appear
in the new analyses (these frequencies are often reported as percentages)
- Any tree can be thought of as a set of bipartitions
- Calculate frequency for each taxon bipartition that is found during
replication
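A minimal sketch of the bipartition tally follows, assuming each replicate tree is already available as a nested tuple of taxon names. The tree encoding, the helper name `bipartitions`, and the four replicate trees are hypothetical; real analyses would read the trees produced by the phylogenetic software.

```python
from collections import Counter

def bipartitions(tree, all_taxa):
    """Return the non-trivial bipartitions of a nested-tuple tree.

    Each bipartition is stored as the frozenset of taxa on the side that
    does NOT contain a fixed reference taxon, so the same split compares
    equal however the tree happens to be rooted or drawn.
    """
    reference = min(all_taxa)
    splits = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        side = below if reference not in below else all_taxa - below
        # Skip trivial splits (a single taxon versus all the rest).
        if 1 < len(side) < len(all_taxa) - 1:
            splits.add(side)
        return below

    leaves(tree)
    return splits

taxa = frozenset("ABCD")
# Hypothetical trees from four bootstrap replicates.
replicate_trees = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    ("A", ("B", ("C", "D"))),
]

counts = Counter()
for tree in replicate_trees:
    counts.update(bipartitions(tree, taxa))

# Express each frequency as a percentage of replicates.
support = {split: 100.0 * n / len(replicate_trees)
           for split, n in counts.items()}
for split, pct in sorted(support.items(), key=lambda kv: -kv[1]):
    other = taxa - split
    print(f"{''.join(sorted(other))}|{''.join(sorted(split))}: {pct:.0f}%")
```

Here the split AB|CD appears in three of the four replicates and AC|BD in one, which is the kind of count that is reported as a bootstrap value on a branch.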
- What bootstrap values mean
- Bootstrapping measures how consistently the data support given
taxon bipartitions
- High bootstrap values (close to 100%) mean uniform support
- i.e., if the bootstrap value for a certain clade is close to 100%, nearly
all of the characters informative for this group agree that it is a group.
- Pitfalls:
- Does not indicate whether or not the tree is 'correct'
- Will be misled by 'long branch attraction'
- Slow, especially with messy data
- Low bootstrap values (below 50%) are essentially meaningless
- Every pseudosample's analysis must be performed correctly
- In big analyses, it may not be practical to find the best tree for each
pseudosample
Jackknifing
- Randomly delete characters until a given fraction (usually half) has been
removed
- Advantages
- No character is represented more than once
- Disadvantages
- The resampled data matrix is smaller than the original
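For comparison with bootstrapping, the jackknife deletion step can be sketched as below. The function name `jackknife_matrix`, the `delete_fraction` parameter, and the toy matrix are assumptions for illustration; the data layout (taxon name mapped to a string of character states) is likewise hypothetical.

```python
import random

def jackknife_matrix(matrix, rng, delete_fraction=0.5):
    """Delete a random fraction of characters (columns) from `matrix`.

    Returns the indices of the retained columns and the reduced matrix.
    """
    n_chars = len(next(iter(matrix.values())))
    n_keep = n_chars - int(n_chars * delete_fraction)
    # sample() draws without replacement, so no character appears twice.
    kept = sorted(rng.sample(range(n_chars), n_keep))
    reduced = {taxon: "".join(states[c] for c in kept)
               for taxon, states in matrix.items()}
    return kept, reduced

# Hypothetical toy data: 4 taxa, 6 characters.
original = {
    "taxon_A": "ACGTAC",
    "taxon_B": "ACGTTC",
    "taxon_C": "ATGTTG",
    "taxon_D": "TTGATG",
}

kept, reduced = jackknife_matrix(original, random.Random(1))
print("retained columns:", kept)
for taxon, states in reduced.items():
    print(taxon, states)
```

The reduced matrix has half as many characters as the original, which illustrates both the advantage (no duplicated characters) and the disadvantage (a different matrix size) listed above.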
Hypothesis Testing
- Anecdotal
- Kishino-Hasegawa test (p. 505)
- g1 statistics
- Likelihood ratio tests
- d = 2(lnL1 - lnL0), where lnL1 and lnL0 are the maximized log-likelihoods
of the alternative and null hypotheses
- Applicable to tree topology only under limited conditions, e.g., when
one topology is a subset of the other, or perhaps when they differ only
by the placement of a single branch.
- Simulation methods
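As a worked illustration of the likelihood ratio statistic d = 2(lnL1 - lnL0) listed above, the sketch below computes d and an approximate p-value. It rests on assumptions not stated in the outline: the two hypotheses are nested, they differ by exactly one free parameter, and d is compared against a chi-square distribution with one degree of freedom. The log-likelihood values are made up for the example.

```python
import math

def likelihood_ratio_statistic(lnL0, lnL1):
    """d = 2(lnL1 - lnL0) for null (lnL0) and alternative (lnL1)."""
    return 2.0 * (lnL1 - lnL0)

def chi2_sf_1df(d):
    """Upper-tail probability of a chi-square variable with 1 d.f.

    Uses the identity P(X > d) = erfc(sqrt(d / 2)), which holds only
    for one degree of freedom; d must be non-negative.
    """
    return math.erfc(math.sqrt(d / 2.0))

# Hypothetical maximized log-likelihoods for two nested hypotheses.
lnL0 = -2345.67   # null (simpler) hypothesis
lnL1 = -2342.15   # alternative (more general) hypothesis

d = likelihood_ratio_statistic(lnL0, lnL1)
p = chi2_sf_1df(d)
print(f"d = {d:.2f}, approximate p = {p:.4f}")
```

With more than one extra parameter the degrees of freedom change and a general chi-square routine would be needed. When the hypotheses are not nested, as is usually the case for competing topologies, the chi-square approximation should not be relied on.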