Analytical methods: Assessing Confidence
What gives confidence? 
	
  - Most phylogenetic methods will always find at least one tree
      - Consider how the analyses are done:
          - Choose an optimality criterion
          - Test a number of alternative topologies against that criterion
          - Even random data will have some 'best' tree
 
 
  - The importance of independent lines of evidence
      - Do morphological and molecular data agree?
      - Do two genes agree?
 
  - Most methods of phylogenetic analysis share many assumptions
      - Therefore, a different analytical method is not an independent line of
        evidence!
 
  - Because it is difficult to get independent lines of evidence, we want to
    assess confidence in the tree that we have got.
      - Tree length distributions
      - Equally parsimonious trees
      - Relaxation of optimality criterion
          - Bremer Index/Decay Analysis
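The point that even random data will have some 'best' tree can be illustrated with a small sketch. It assumes parsimony as the optimality criterion, four taxa (so only three unrooted topologies exist), and random binary characters; the function names and the taxon labels A to D are made up for this illustration.

```python
import random

def fitch(set1, set2):
    """One Fitch step: return (state set, cost) for the parent of two nodes."""
    common = set1 & set2
    if common:
        return common, 0
    return set1 | set2, 1

def quartet_length(matrix, split):
    """Parsimony length of the four-taxon tree that pairs the taxa as in split."""
    (a, b), (c, d) = split
    total = 0
    for i in range(len(matrix[a])):
        left, cost_left = fitch({matrix[a][i]}, {matrix[b][i]})
        right, cost_right = fitch({matrix[c][i]}, {matrix[d][i]})
        _, cost_join = fitch(left, right)
        total += cost_left + cost_right + cost_join
    return total

# The three possible unrooted topologies for taxa A, B, C, D.
TOPOLOGIES = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]

# Purely random data: 50 binary characters with no phylogenetic signal.
rng = random.Random(1)
matrix = {taxon: [rng.choice("01") for _ in range(50)] for taxon in "ABCD"}

# Score every topology against the criterion and pick the shortest.
lengths = {split: quartet_length(matrix, split) for split in TOPOLOGIES}
best = min(lengths, key=lengths.get)
print(lengths, best)
```

Whatever the random seed, `min` always returns some topology, so finding a 'best' tree says nothing by itself about how well the data support it.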
 
 
Random Permutation Methods 
Nonparametric Bootstrapping (often just called 'Bootstrapping') 
  - Build a new data matrix by randomly sampling characters with replacement
      - Take the original data matrix
      - Sample the matrix, randomly copying one character (column) from the
        original matrix
          - Do not delete the character after copying it
          - Each taxon's character state for the sampled character remains as
            it was in the original matrix
      - Add the selected character to a new data matrix
      - Repeat sampling until the new data matrix has as many characters as
        the original
          - Some characters will be sampled more than once, others not at all
          - The new dataset (a pseudosample) contains the same number
            of characters as the original data set, and the taxa included are
            unchanged.
 
 
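  - Perform full phylogenetic analysis
  - Repeat many times
      - The higher the number of replicates, the more precise the bootstrap
        values will be
      - But remember the difference between accuracy and precision: more
        replicates reduce sampling noise in the bootstrap values, but they do
        not make the tree any more likely to be correct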
 
  - Calculate frequency with which taxon bipartitions (branches) appear
    in the new analyses (these frequencies are often reported as percentages)
      - Any tree can be thought of as a set of bipartitions
      - Calculate frequency for each taxon bipartition that is found during
        replication
 
  - What bootstrap values mean
      - Bootstrapping measures how consistently the data support given
        taxon bipartitions
      - High bootstrap values (close to 100%) mean uniform support
      - i.e., if the bootstrap value for a certain clade is close to 100%, nearly
        all of the characters informative for this group agree that it is a group.
 
  - Pitfalls:
      - Does not indicate whether or not the tree is 'correct'
      - Can be misled by 'long branch attraction'
      - Slow, especially with messy data
      - Low bootstrap values (below 50%) are essentially meaningless
      - Every pseudosample's analysis must be performed correctly
      - In big analyses, it may not be practical to find the best tree for
        each pseudosample
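The bootstrap procedure described above (resample characters with replacement, analyse each pseudosample, count bipartitions) can be sketched as follows. This is a minimal illustration, not a full implementation: the data matrix is assumed to be a dict mapping taxon names to lists of character states, and `infer_splits` is a hypothetical placeholder for whatever full phylogenetic analysis is used.

```python
import random
from collections import Counter

def bootstrap_matrix(matrix, rng):
    """Resample characters (columns) with replacement; taxa are unchanged."""
    n_chars = len(next(iter(matrix.values())))
    # Draw as many column indices as the original has; repeats are allowed.
    cols = [rng.randrange(n_chars) for _ in range(n_chars)]
    # The same columns are used for every taxon, so each sampled character
    # keeps the states it had in the original matrix.
    return {taxon: [states[c] for c in cols] for taxon, states in matrix.items()}

def bootstrap_support(matrix, infer_splits, n_reps=100, seed=0):
    """Percentage of pseudosamples in which each bipartition is recovered.

    infer_splits is a hypothetical callable standing in for the full
    phylogenetic analysis. It is assumed to return the bipartitions of the
    tree it finds, each as a frozenset of the taxa on one (consistently
    chosen) side of a branch.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_reps):
        pseudosample = bootstrap_matrix(matrix, rng)
        counts.update(set(infer_splits(pseudosample)))
    return {split: 100.0 * n / n_reps for split, n in counts.items()}
```

In use, `bootstrap_support(matrix, my_analysis, n_reps=1000)` would return one percentage per bipartition found in any replicate, where `my_analysis` is again only a stand-in name.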
 
Jackknifing 
  - Randomly delete characters until a given fraction (usually half) have been
    removed
  - Advantages
      - No character is represented more than once
  - Disadvantages
      - The resampled data matrix is smaller than the original
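For comparison with bootstrapping, jackknife resampling might be sketched as below. The matrix layout (a dict of taxon name to list of character states) and the function name are assumptions made for this example only.

```python
import random

def jackknife_matrix(matrix, rng, fraction_deleted=0.5):
    """Delete a random fraction of characters (columns) without replacement."""
    n_chars = len(next(iter(matrix.values())))
    n_keep = n_chars - int(n_chars * fraction_deleted)
    # rng.sample draws without replacement, so no character appears twice.
    kept = sorted(rng.sample(range(n_chars), n_keep))
    return {taxon: [states[c] for c in kept] for taxon, states in matrix.items()}
```

The result has fewer columns than the original, which is the disadvantage noted above; the rest of the procedure (analyse each resampled matrix, tally bipartitions) is the same as for the bootstrap.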
 
Hypothesis Testing 
  - Anecdotal
  - Kishino-Hasegawa test (p. 505)
  - g1 statistics
  - Likelihood ratio tests
      - d = 2(lnL1 - lnL0), where lnL0 and lnL1 are the log-likelihoods under
        the null and alternative hypotheses
      - Applicable to tree topology only under limited conditions, e.g., when
        one topology is a subset of the other, or perhaps when they differ only
        by the placement of a single branch.
  - Simulation methods
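The likelihood ratio statistic listed above can be computed as in the following sketch. It assumes the usual setting for nested hypotheses, in which d is compared with a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters; as noted above, that assumption holds for tree topologies only under limited conditions. The function names are illustrative.

```python
import math

def chi2_sf(d, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if d <= 0:
        return 1.0
    x = d / 2.0
    # Closed-form starting values for df = 2 (even) or df = 1 (odd).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))
    # Recurrence Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1).
    while a < df / 2.0:
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q

def likelihood_ratio_test(lnL0, lnL1, df):
    """Return (d, p) for d = 2(lnL1 - lnL0) under the chi-square assumption."""
    d = 2.0 * (lnL1 - lnL0)
    return d, chi2_sf(d, df)

# Made-up log-likelihoods, for illustration only.
d, p = likelihood_ratio_test(lnL0=-1005.0, lnL1=-1002.0, df=1)
print(d, p)
```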