The 232-Taxon Data Set
This page stems from discussions at the 1998 Princeton meeting concerning a 232-taxon benchmark data set.
The Challenges
The 232-taxon benchmark data set available below poses a challenge to theorists, programmers, and empiricists associated with the GPPRCG (Green Plant Phylogeny Research Coordination Group). Two sets of preliminary results are provided: the first (4025 steps) from a constrained parsimony analysis and the second (4016 steps) from an unconstrained analysis. Neither result comes from a complete search, so the hunt is still on! The challenge to everyone associated with the GPPRCG is to find shorter trees.
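The scores above (4025 and 4016 steps) are parsimony tree lengths: the minimum number of character-state changes a given topology requires. The sketch below shows one way to compute such a length with Fitch's algorithm. It assumes unordered, equally weighted characters, a rooted binary tree written as nested tuples, and "?" and "-" treated as missing data; the function names and the toy matrix are hypothetical. It is not the procedure used to produce the preliminary results, and it ignores the excluded sites and taxsets described in the data file header.

```python
from typing import Dict, FrozenSet, Tuple, Union

# Hypothetical representation: a rooted binary tree as nested tuples of
# taxon names, e.g. (("A", "B"), ("C", "D")).
Tree = Union[str, Tuple["Tree", "Tree"]]

DNA = frozenset("ACGT")


def state_set(char: str) -> FrozenSet[str]:
    """Observed state as a set; '?' and '-' are assumed to mean missing data."""
    return DNA if char in "?-" else frozenset(char)


def fitch_site(tree: Tree, site: Dict[str, str]) -> Tuple[FrozenSet[str], int]:
    """Fitch's bottom-up pass for one site: returns (state set, steps)."""
    if isinstance(tree, str):
        return state_set(site[tree]), 0
    left, right = tree
    left_set, left_steps = fitch_site(left, site)
    right_set, right_steps = fitch_site(right, site)
    common = left_set & right_set
    if common:
        # Children share a state: no extra change needed at this node.
        return common, left_steps + right_steps
    # Empty intersection: take the union and count one extra change.
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree: Tree, matrix: Dict[str, str]) -> int:
    """Sum Fitch steps over all aligned sites (unordered, equal weights)."""
    n_sites = len(next(iter(matrix.values())))
    return sum(
        fitch_site(tree, {taxon: seq[i] for taxon, seq in matrix.items()})[1]
        for i in range(n_sites)
    )


# Toy example (hypothetical data, not taken from the 232-taxon matrix).
matrix = {"A": "ACGT", "B": "ACGA", "C": "TCGA", "D": "TCCA"}
tree = (("A", "B"), ("C", "D"))
print(tree_length(tree, matrix))  # prints 3
```

The alternative topology (("A", "C"), ("B", "D")) scores 4 steps on the same toy matrix, so the first tree is preferred under parsimony. A tree search repeats this kind of scoring over a very large number of candidate topologies, which is what makes a complete search of 232 taxa so demanding.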
Everyone is invited to send in their favorite trees, along with comments on both their trees and methods. (See below for submission details.) We hope this electronic approach will provide an excellent forum for researchers to apply their favorite approach or algorithm to a consistent, challengingly large data set. Comments and results will be posted below.
Here are some of the specific challenges:
- Who can find the shortest/best tree or trees in the shortest CPU time?
- What happens if you break this into sub-analyses?
- Can a selected set of exemplars give the same topology?
- Are there other good short cuts?
The 232-Taxon Data Set
The 232-taxon data set is available here for downloading. It contains a header with information on excluded sites and taxsets. The file also includes the constraint tree used in the preliminary analyses.
Posting Comments and Results
Researchers are invited to send trees and comments for posting to Charles Delwiche (delwiche@umd5.umd.edu). Depending on the response, the comments and discussion will be posted either below or on a separate page linked directly from the GPPRCG homepage.
The Results
Results from the preliminary constrained and unconstrained analyses, provided by Brent Mishler, are given below:
Constrained Analyses
Unconstrained Analyses