Analytical methods: tree searching
- Given a data set, the goal is to find the tree that best explains those
data.
- Optimality criteria allow one to choose between two trees.
- How does one find the trees to compare?
A. Exhaustive search
- Evaluate length of every possible tree; follow a search tree
- Three taxa - one possible tree
- Four taxa - three possible trees
- Five taxa - 15 possible trees
- 21 taxa - 8.2x1021 possible trees
- 50 taxa - 2.75x1076 possible trees
- The observable universe has about 8.8x1077 atoms
- Taxon addition sequence is important only in that the algorithm needs to
remember where it is
- Every possible tree is examined; the shortest tree will always be found
- Search will also generate a list of the lenths of all possible trees, which
can be plotted as a histogram
B. Branch and Bound search
- Hendy and Penny, 1982.
- Much faster, but still guaranteed to find the best tree
- Determine an upper bound for the length of the shortest tree
- Any guess will do, but the more stringent the guess, the faster the
search will be
- Use the length of a random tree, or the length of the shortest tree
known
- Follow a predictable search path through possible tree topologies, similar
to an exhaustive search
- Abandon any fork of the search tree when the upper bound is exceeded before
the last taxon is added
- Does not calculate the length of every tree, but always finds the best one
C. Heuristic search
- Search trees by branch swapping.
- Very fast compared with exhaustive or branch and bound searches
- Not guaranteed to find the best tree!
- Like a blind person climbing up a hill
- Find a starting tree
- Stepwise addition
- Use addition sequence similar to that for an exhaustive search,
but at each addition, determine the shortest tree, and add the next
taxon to that tree.
- Addition sequence will affect what tree topology is found!
- Star decomposition
- Start with all taxa in an unresolved (star) tree.
- Form pairs of taxa, and determine length of tree with paired taxa.
- Rearrange tree looking for better ones
- NNI - Nearest Neighbor Interchange
- Identify an interior branch. It is flanked by four subtrees
- Swap two of the subtrees on opposite ends of the branch
- Two rearrangements are possible
- SPR - Subtree Pruning and Regrafting
- Identify and remove a subtree
- reattach to each possible branch of the remaining tree
- NNI is a subset of SPR
- TBR - Tree Bisection and Reconnection
- Divide tree into two parts.
- Reconnect by a pair of branches, attempting every possible pair
of branches to rejoin
- NNI and SPR are subsets of TBR
Search Strategies
- In cases where it is difficult to find the optimal tree (i.e., large trees, or those with many nearly equally good solutions) there are a variety of modified search strategies that can be helpful.
- Iterative searches
- Parsimony ratchet
(this helps the search avoid getting stuck on local optima)
- Randomly reweight a subset of the data
- Search for a new tree
- Return to the original weighting
- Search again
- Genetic algorithms and monte carlo
- Simulated annealing
- Gradually increase the stringency of the optimality criterion
- in parsimony, this approach is often limited by the need to save many trees
- Divide and conquor
- Break the problem into smaller components
- Quartet puzzling
Supertrees
When it is impractical to find a single tree, either because the dataset is very large, or because the data are heterogeneous or missing for groups of taxa, it is possible to find several trees that span the data and then add these trees together.
A supertree is not the same thing as a consensus tree.
Consensus trees summarize topological variation among trees with the same taxa. Supertrees represent a summary phylogeny of two or more trees with partially overlapping taxa.
Supertree analyses are very useful, but create a number of poorly understood complications.
Bininda-Emonds, O. R. P., J. L. Gittleman, and M. A. Steel. 2002. The (Super)tree of life: Procedures, problems, and prospects. Annual Review of Ecology and Systematics 33:265-289.
Goloboff, P. A., and D. Pol. 2002. Semi-strict supertrees. Cladistics 18:514-525.
Nixon, K. 1999. The parsimony ratchet, a new method for rapid parsimony analysis. Cladistics 15:407-414.
Sanderson, M. J., A. Purvis, and C. Henze. 1998. Phylogenetic supertrees: assembling the trees of life. Trends in Ecology and Evolution 13:105-109.
Felsenstein, Chapters 4 and 5.
Tree Searching: HMM pp. 478-485.