Using PAUP*
PAUP* is a major analytical tool in phylogenetic analysis. It makes available a very wide variety of analytical methods in a single environment, and can be operated via window/mouse, command-line, or scripts.
The PAUP* web site has considerable useful information.
Unfortunately, PAUP* is a commercial program (available from Sinauer Associates). Although it is quite a good value, if you wish to use software that is available free, you will have to use another package. A very flexible alternative to PAUP* is PHYLIP, which also includes a wide range of analytical methods. The PHYLIP web site also has an excellent summary of phylogenetic software available elsewhere on the web; consult it if you are looking for a specific analytical capability.
Select File -> Execute <filename>
To perform analyses with a NEXUS file, PAUP* must first load the data matrix and assumptions into memory in a format it can work with.
Note the keyboard shortcut: command-R
If there are any error messages during execution, you have probably modified the file since it was imported as an MSF file. Fix any syntax errors and re-execute the file.
Start a parsimony search
Parsimony is the default optimality criterion, so you should see it "checked" on the Analysis menu.
Inspect the settings in the parsimony options dialog box
Select Analysis -> Parsimony Options
Dialog boxes are used to control the behavior of the search
Notice that this dialog box has five sub-screens, which may be stepped through by clicking on the tiny arrow on the upper left side of the dialog box, or selected directly by clicking in the text-box to the right of the double arrows.
Contemplate the settings in this dialog box, but do not change any of the settings.
Notice the Defaults button in the lower left corner of the dialog box. If you lose control of what settings you have made, this feature can be used to return the settings to the 'factory' settings.
Remember! It is most unwise to change any settings you don't understand.
Exclude characters that are ambiguously aligned, or for which there are only minimal data available
Select Data -> Include-Exclude Characters
Highlight the characters you wish to exclude, and click on >> Exclude >> to move them to the excluded column.
If you want to re-include some characters, select them in the excluded column and click on the << Include << button.
Close the Include-Exclude dialog box
Select Data -> Show Character Status to review the character status
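The same steps can also be carried out with typed commands at the command line interface (described later in this document). The following is a minimal sketch; the character numbers are hypothetical and should be replaced with the ambiguously aligned columns of your own alignment.

```nexus
[ Character numbers are hypothetical placeholders ]
exclude 120-145 310-322;   [ move two regions to the excluded set ]
include 310-322;           [ re-include one of them ]
cstatus;                   [ review the character status ]
```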
Optimality Methods
To set the optimality criterion, select the appropriate method under "Analysis". Each method has its own settings dialog box.
Parsimony
Distance
Remember that distance methods consist of two distinct steps:
- Calculating a pairwise distance matrix
- Finding the tree that best corresponds to these distances
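At the command line interface (described later in this document), the two steps might be sketched as follows. The choice of Jukes-Cantor distances and of neighbor joining is an illustrative assumption, not a recommendation; neighbor joining is only one of several ways to obtain a tree from the distances.

```nexus
set criterion=distance;
dset distance=jc;   [ assumed distance measure: Jukes-Cantor ]
showdist;           [ step 1: display the pairwise distance matrix ]
nj;                 [ step 2: build a tree from the distances by neighbor joining ]
```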
Likelihood
Likelihood uses an explicit model to relate the data to the tree.
Two approaches to likelihood are now in widespread use in phylogenetic analysis:
- Maximum likelihood, which is an optimality method
- Bayesian analysis, which is a Monte Carlo method
At present PAUP* supports only maximum likelihood.
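As a rough sketch, a maximum likelihood search could be set up from the command line interface (described later in this document) along these lines. The model settings shown are assumptions chosen only for illustration; an appropriate model depends on your data.

```nexus
set criterion=likelihood;
[ Model settings are an illustrative assumption: two substitution types
  (transitions vs. transversions), ratio estimated, empirical base frequencies ]
lset nst=2 tratio=estimate basefreq=empirical;
hsearch;
```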
Tree Searches
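Reduce your dataset to about ten taxa; somewhere between 8 and 12 taxa should be sufficient. We would like to have this analysis require 2-5 minutes of computer time. The number of possible trees grows very rapidly with the number of taxa (there are 2,027,025 distinct unrooted bifurcating trees for 10 taxa), so an exhaustive search is practical only for small datasets.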
Perform an exhaustive search
Select the Exhaustive Search dialog box (Analysis -> Exhaustive Search).
Enjoy the options, then press the Search button.
Note how long the search takes to finish
Inspect the tree-length histogram
Inspect the tree found with the exhaustive search
Note the length of the tree.
Select Trees -> Show Trees
Repeat the same analysis with a Branch-and-bound search.
How did the time required for the search compare with the exhaustive search? B&B searches are guaranteed to find the best possible tree, but they do not examine every tree and cannot output an exact distribution of tree lengths.
Now perform a heuristic search with simple stepwise addition.
How long did the search take? How does this compare with the previous searches?
Heuristic methods are not guaranteed to find the best possible solution. Consequently, the precise conditions under which the search is run can be important.
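For reference, the three searches above have command-line counterparts (the command line interface is described later in this document). This sketch assumes a data file has already been executed; the taxon numbers in the delete command are hypothetical and depend on how many taxa your file contains.

```nexus
delete 11-20;            [ hypothetical: drop taxa 11-20 to leave ten ]
alltrees;                [ exhaustive search ]
bandb;                   [ branch-and-bound search ]
hsearch addseq=simple;   [ heuristic search, simple stepwise addition ]
showtrees all;           [ display the trees currently in memory ]
```

The output of each search should include the tree length and the time used, which is what you need for the comparisons asked for above.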
Bootstrapping
Bootstrapping is a statistical method that uses
The Command Line Interface
Getting Help
At the command line interface, type a question mark (?) and press return
paup> ?
This will show you the commands that PAUP understands.
Now select a command and view the options for that command:
paup> hsearch ?
This will show you the syntax for the keyword, followed by a list of available options.
The options are shown in three columns:
- Keyword
- Possible settings
- Current setting
Remember that the general syntax for commands is
keyword option1=setting option2=setting;
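As a concrete instance of this pattern, a bootstrap analysis in which each replicate is analyzed by heuristic search could be requested as shown below. The replicate count is an arbitrary illustrative value.

```nexus
[ keyword = bootstrap, followed by two option=setting pairs and a closing semicolon ]
bootstrap nreps=100 search=heuristic;   [ nreps value is illustrative ]
```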
Use the command line interface to explore PAUP
Suggestions:
Using an alignment of your choice, compare the search speeds for heuristic searches using TBR, SPR, and NNI branch swapping
Compare the results of heuristic searches using a simple addition sequence vs. random addition sequences.
Compare the trees found with Parsimony, Maximum Likelihood, and Minimum Evolution
Compare the trees found with Neighbor Joining and Minimum Evolution
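A possible starting point for the first two suggestions is sketched below. It assumes an alignment has already been executed and that parsimony is the current criterion; the number of random-addition replicates is an arbitrary choice.

```nexus
[ Branch-swapping comparison, same addition sequence each time ]
hsearch addseq=simple swap=tbr;
hsearch addseq=simple swap=spr;
hsearch addseq=simple swap=nni;

[ Addition-sequence comparison, same swapping method each time ]
hsearch addseq=simple swap=tbr;
hsearch addseq=random nreps=10 swap=tbr;   [ nreps value is illustrative ]
```

Compare the tree lengths and the times reported after each search.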
We have previously used PAUP blocks to automatically configure frequently used features of an analysis, such as automatically excluding suspect characters.
Entire analyses can also be automated with a PAUP block
The syntax is the same as is used with the command line interface
If you can figure out the sequence of commands that you would issue to run an analysis from the command line interface, you can put this same sequence of commands into a PAUP block and cause the analysis to run automatically
Commands issued from the command line interface or PAUP blocks are often more convenient if you have an assumptions block with taxon sets, character sets, and other assumptions defined.
This is very useful!
Complex analyses can be performed unattended
Repetitive analyses can be performed with a minimum of human error
Long runs can be restarted by a person unfamiliar with the analysis, and analyses can be precisely reproduced
Place the commands, in proper sequence, within a valid PAUP block (remember the NEXUS file format). Be sure to end each command with a semicolon, and remember that a command may span more than one line of the file: the semicolon, not the line break, marks where the command ends.
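A minimal sketch of such a block is shown below. The file names, the excluded character range, and the search settings are all hypothetical; substitute the sequence of commands you would have typed for your own analysis. The block is placed in the NEXUS file after the data it operates on.

```nexus
begin paup;
  [ File names and character numbers below are hypothetical ]
  log file=analysis.log replace;     [ keep a record of the output ]
  set autoclose=yes warnreset=no;    [ avoid prompts that would stall an unattended run ]
  exclude 120-145;                   [ exclude suspect characters ]
  set criterion=parsimony;
  hsearch addseq=random nreps=10
          swap=tbr;                  [ one command spread over two lines ]
  savetrees file=analysis.tre replace;
  log stop;
end;
```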
Using PAUP on UNIX computers
Now that you know how to run PAUP from the command line interface, you are free of the Macintosh window interface!
PAUP is available for a wide variety of platforms, including some machines that are much faster and more stable than desktop machines.
UNIXprompt> paup inputfile
You will be confronted with the command line interface. Use it in the same way as you learned on the Mac.
PAUP blocks are particularly useful when using the command line interface
You can edit the file with vi or any other UNIX editor, or you can edit it on a desktop computer and FTP the file to the UNIX computer.
PAUP is also able to accept command-line redirection, so it is possible to store the data in one NEXUS file and keep a standard series of PAUP commands in additional files. The syntax for this is:
UNIXprompt> paup datafile < commandfile
It may take some experience to become comfortable with this approach, but it can be very powerful.
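As a sketch, a command file used with redirection might contain nothing but the commands you would otherwise type at the prompt, ending with a command to quit. This assumes that redirected input is read exactly as if it had been typed; the tree file name is hypothetical.

```nexus
[ Contents of commandfile; the file name run1.tre is hypothetical ]
hsearch addseq=simple swap=tbr;
savetrees file=run1.tre replace;
quit;
```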
If you automate analyses with any random component, it is very important that you provide a unique random number seed for each distinct random analysis
True random number generators are rare
Most programs use "pseudo-random number generators"
Pseudo-random number generators produce a sequence of numbers that appears to be random, and is effectively random for our purposes, but that is completely determined by the seed number that was used.
It does not matter what seed number you use in PAUP (in PHYLIP, the number must be odd), but it must be a different number for each run in which you want a different random sequence.
If you want to exactly repeat a random run, you can do so if you know the random number seed you used, so keep the seed recorded in your notes on your analyses.
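For a heuristic search with random addition sequences, the seed can be given explicitly as an option of the search command. The seed values and replicate count below are arbitrary examples.

```nexus
[ Seed values are arbitrary examples; record whichever ones you actually use ]
hsearch addseq=random nreps=10 rseed=12345;   [ run 1 ]
hsearch addseq=random nreps=10 rseed=67890;   [ run 2: different seed, different random sequence ]
hsearch addseq=random nreps=10 rseed=12345;   [ same seed as run 1, so it should repeat run 1 ]
```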