Estimating Confidence

Analytical methods: Assessing Confidence

Most phylogenetic methods will always find at least one tree
- Consider how the analyses are done:
  - Choose an optimality criterion
  - Test a number of alternative topologies against that criterion
  - Even random data will have some 'best' tree
The importance of independent lines of evidence
- Do morphological and molecular data agree?
- Do two genes agree?
Most methods of phylogenetic analysis share many assumptions
- Therefore, using a different analytical method does not provide an independent line of evidence!
Because independent lines of evidence are difficult to obtain, we want to assess confidence in the tree we have found.
- Tree length distributions
- Equally parsimonious trees
- Relaxation of optimality criterion
  - Bremer Index/Decay Analysis
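
A minimal sketch of the decay analysis named above: Bremer support for a clade is the number of extra steps needed before the clade is lost, i.e. the length of the shortest tree lacking the clade minus the length of the most parsimonious tree. The function name, the clade labels, and the tree lengths are hypothetical and chosen only for illustration; in practice the lengths would come from constrained searches in a parsimony program.

```python
def bremer_support(optimal_length, shortest_without_clade):
    """Return Bremer (decay) support for each clade.

    optimal_length: length (in steps) of the most parsimonious tree.
    shortest_without_clade: maps a clade label to the length of the
        shortest tree in which that clade does NOT appear.
    """
    return {
        clade: length - optimal_length
        for clade, length in shortest_without_clade.items()
    }


# Hypothetical tree lengths, for illustration only.
support = bremer_support(
    optimal_length=120,
    shortest_without_clade={"(A,B)": 123, "(C,D)": 121},
)
print(support)
```

A clade that disappears in trees only one step longer than the optimum has weak support; a clade that survives until trees are several steps longer has stronger support.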

Bootstrapping: build a new data matrix by randomly sampling characters with replacement
1. Take the original data matrix
2. Sample the matrix, randomly copying one character (column) from the original matrix
  - Do not delete the character after copying it
  - Each taxon's character state for the sampled character remains as it was in the original matrix
3. Add the selected character to a new data matrix
4. Repeat sampling until the new data matrix has as many characters as the original
  - Some characters will be sampled more than once, others not at all
  - The new dataset (a pseudosample) contains the same number of characters as the original data set, and the taxa included are unchanged.
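
The resampling steps above can be sketched as follows. This is an illustration, not a reference implementation: the matrix layout (a dict mapping each taxon to a string of character states), the function name, and the toy data are all assumptions.

```python
import random


def bootstrap_pseudosample(matrix, rng=None):
    """Build one pseudosample by sampling characters (columns) with replacement.

    matrix: dict mapping taxon name -> sequence of character states,
        all of the same length (an assumed layout).
    """
    rng = rng or random.Random()
    n_chars = len(next(iter(matrix.values())))
    # Draw as many column indices as the original has characters.
    # Sampling is with replacement, so an index may repeat or never appear.
    columns = [rng.randrange(n_chars) for _ in range(n_chars)]
    # Copy each sampled column whole, so every taxon keeps its original state.
    return {
        taxon: [states[c] for c in columns]
        for taxon, states in matrix.items()
    }


# Toy matrix: 4 taxa, 5 binary characters (made up for illustration).
matrix = {
    "A": "00110",
    "B": "01110",
    "C": "11001",
    "D": "11011",
}
pseudo = bootstrap_pseudosample(matrix, random.Random(1))
for taxon, states in pseudo.items():
    print(taxon, "".join(states))
```

Passing a seeded `random.Random` makes a replicate reproducible; each pseudosample would then go through the full phylogenetic analysis.
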
Perform full phylogenetic analysis
Repeat many times
- The higher the number of replicates, the more precise the bootstrap values will be
- But remember the difference between accuracy and precision
Calculate frequency with which taxon bipartitions (branches) appear in the new analyses (these frequencies are often reported as percentages)
- Any tree can be thought of as a set of bipartitions
- Calculate frequency for each taxon bipartition that is found during replication
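
The bipartition count can be sketched as follows, assuming each replicate tree is written as nested tuples of taxon names. That representation, the function names, and the four toy replicates are assumptions made for illustration.

```python
from collections import Counter


def leaves(tree):
    """Return the set of taxon names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial taxon bipartitions implied by a tree."""
    all_taxa = leaves(tree)
    reference = min(all_taxa)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = leaves(node)
        # Canonical form: always store the side lacking the reference taxon,
        # so the same split is recognised however the tree is drawn.
        if reference in side:
            side = all_taxa - side
        # Ignore trivial splits (one taxon versus the rest, or the root).
        if 1 < len(side) < len(all_taxa) - 1:
            found.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return found


def bipartition_frequencies(replicate_trees):
    """Percentage of replicate trees containing each bipartition."""
    counts = Counter()
    for tree in replicate_trees:
        counts.update(bipartitions(tree))
    n = len(replicate_trees)
    return {split: 100.0 * k / n for split, k in counts.items()}


# Four made-up replicate trees: three agree, one differs.
replicates = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
]
support = bipartition_frequencies(replicates)
for split, pct in support.items():
    print(sorted(split), pct)
```

Here the split AB|CD appears in three of four replicates and AC|BD in one. A real analysis would use many more replicates.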

What bootstrap values mean
- Bootstrapping measures how consistently the data support given taxon bipartitions
- High bootstrap values (close to 100%) mean uniform support
- i.e., if the bootstrap value for a certain clade is close to 100%, nearly all of the characters informative for this group agree that it is a group.
Pitfalls:
- Does not indicate whether or not the tree is 'correct'
- Will be misled by 'long branch attraction'
- Slow, especially with messy data
- Low bootstrap values (below 50%) are essentially meaningless
- Every pseudosample's analysis must be performed correctly
- In big analyses, it may not be practical to find the best tree for each pseudosample

Jackknifing: randomly delete characters until a given fraction (usually half) has been removed
Advantages
- No character is represented more than once
Disadvantages
- The resampled data matrix is smaller than the original
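
A minimal sketch of the deletion procedure described above, again assuming a dict that maps each taxon to a string of character states. The function name, the `fraction_deleted` parameter, and the toy matrix are hypothetical.

```python
import random


def delete_fraction_pseudosample(matrix, fraction_deleted=0.5, rng=None):
    """Build one pseudosample by deleting a fraction of the characters.

    Characters are chosen without replacement, so no character appears
    more than once and the result has fewer columns than the original.
    """
    rng = rng or random.Random()
    n_chars = len(next(iter(matrix.values())))
    n_delete = int(n_chars * fraction_deleted)
    # Pick the surviving column indices and keep them in original order.
    kept = sorted(rng.sample(range(n_chars), n_chars - n_delete))
    return {
        taxon: [states[c] for c in kept]
        for taxon, states in matrix.items()
    }


# Toy matrix: 4 taxa, 6 binary characters (made up for illustration).
matrix = {
    "A": "001101",
    "B": "011100",
    "C": "110010",
    "D": "110100",
}
reduced = delete_fraction_pseudosample(matrix, 0.5, random.Random(1))
for taxon, states in reduced.items():
    print(taxon, "".join(states))
```

With six characters and half deleted, each pseudosample keeps three distinct columns of the original matrix.
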

Anecdotal
Kishino-Hasegawa test (p. 505)
g1 statistic (skewness of the tree length distribution)
Likelihood ratio tests
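- d = 2(lnL1 - lnL0), where lnL0 and lnL1 are the maximized log-likelihoods under the null and alternative hypotheses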
- Applicable to tree topology only under limited conditions, e.g., when one topology is a subset of the other, or perhaps when they differ only by the placement of a single branch.
Simulation methods
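
The likelihood ratio statistic listed above can be computed as in the sketch below. It assumes the two hypotheses are nested and differ by one free parameter, so that d is compared with a chi-square distribution with 1 degree of freedom. That approximation, the function names, and the log-likelihood values are assumptions for illustration, and the approximation does not hold for arbitrary pairs of topologies.

```python
import math


def lrt_statistic(lnL0, lnL1):
    """d = 2(lnL1 - lnL0), with lnL0 from the simpler (null) hypothesis."""
    return 2.0 * (lnL1 - lnL0)


def chi2_pvalue_df1(d):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # For 1 degree of freedom, P(X > d) = erfc(sqrt(d / 2)).
    return math.erfc(math.sqrt(max(d, 0.0) / 2.0))


# Hypothetical log-likelihoods, for illustration only.
d = lrt_statistic(lnL0=-1234.5, lnL1=-1230.0)
p = chi2_pvalue_df1(d)
print(f"d = {d:.2f}, p = {p:.4f}")
```

A small p suggests that the more general hypothesis fits significantly better than the null. With more than one extra parameter, a chi-square distribution with the matching degrees of freedom would be needed instead.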