PAUP* Practice

Using PAUP*

PAUP* is a major analytical tool in phylogenetic analysis. It makes available a very wide variety of analytical methods in a single environment, and can be operated via window/mouse, command-line, or scripts.

The PAUP* web site has considerable useful information.

Unfortunately PAUP* is a commercial program (available from Sinauer Associates). Although it is good value, if you want to use software that is available free of charge, you will have to use another package. A very flexible alternative to PAUP* is PHYLIP, which also includes a wide range of analytical methods. The PHYLIP web site also has an excellent summary of phylogenetic software available elsewhere on the web; consult it if you are looking for a specific analytical capability.

Select File -> Execute <filename>

To perform analyses with a NEXUS file, PAUP* must first load the data matrix and assumptions into memory in a format it can work with.

Note the keyboard shortcut: command-R

If there are any error messages during execution, you have probably modified the file since it was imported as an MSF file. Fix any syntax errors and re-execute the file.
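The menu command has a command-line equivalent (the command line interface is introduced later in this handout). A minimal sketch, assuming your file is called mydata.nex; that name is hypothetical, so substitute your own:

```nexus
paup> execute mydata.nex;  [mydata.nex is a hypothetical file name]
```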

Start a parsimony search

Parsimony is the default optimality criterion, so you should see it "checked" on the Analysis menu.

Inspect the settings in the parsimony options dialog box

Select Analysis -> Parsimony Options

Dialog boxes are used to control the behavior of the search

Notice that this dialog box has five sub-screens, which may be stepped through by clicking on the tiny arrow on the upper left side of the dialog box, or selected directly by clicking in the text-box to the right of the double arrows.

Contemplate the settings in this dialog box, but do not change any of the settings.

Notice the Defaults button in the lower left corner of the dialog box. If you lose control of what settings you have made, this feature can be used to return the settings to the 'factory' settings.

Remember! It is most unwise to change any settings you don't understand.
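You can also inspect these settings from the command line without changing anything. In the sketch below, set criterion=parsimony selects parsimony explicitly, and pset ? lists each parsimony option with its possible and current settings (the same ? help described later under "Getting Help"):

```nexus
paup> set criterion=parsimony;
paup> pset ?
```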

Exclude characters that are ambiguously aligned, or for which there are only minimal data available

Select Data -> Include-Exclude Characters

Highlight the characters you wish to exclude, and click on >> Exclude >> to move them to the excluded column.

If you want to re-include some characters, select them in the excluded column and click on the << Include << button.

Close the Include-Exclude dialog box

Select Data -> Show Character Status to review the character status
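The same operations can be done by character number at the command line. A sketch, assuming (hypothetically) that characters 1-25 and 310-342 are the ambiguously aligned regions in your alignment:

```nexus
paup> exclude 1-25 310-342;  [hypothetical ranges of ambiguously aligned characters]
paup> include 20-25;         [re-include part of an excluded range]
paup> cstatus;               [review the character status]
```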

Optimality Methods

To set the optimality criterion, select the appropriate method under "Analysis". Each method has its own settings dialog box.

Parsimony

Distance

Remember that distance methods consist of two distinct steps

Calculating a pairwise distance matrix

Finding the tree that best corresponds to these distances
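At the command line the two steps map onto separate commands. A sketch, assuming Jukes-Cantor distances (chosen only as an illustration, not a recommendation for your data); check the available distance measures with dset ? on your copy of PAUP*:

```nexus
paup> set criterion=distance;
paup> dset distance=jc;  [Jukes-Cantor distances, an illustrative choice]
paup> showdist;          [step 1: display the pairwise distance matrix]
paup> nj;                [step 2: build a neighbor-joining tree from those distances]
```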

Likelihood

Likelihood uses an explicit model to relate the data to the tree.

Two approaches to likelihood are now in widespread use in phylogenetic analysis

Maximum Likelihood, which is an optimality method

Bayesian analysis, which is a Markov chain Monte Carlo (MCMC) method

At present PAUP* supports only maximum likelihood
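To select maximum likelihood from the command line and review the model settings (base frequencies, substitution types, rate variation, and so on) before running a search:

```nexus
paup> set criterion=likelihood;
paup> lset ?
```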

Tree Searches

Reduce your dataset to about ten taxa; somewhere between 8 and 12 taxa should be sufficient. We would like this analysis to require 2-5 minutes of computer time.
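One way to do this without editing the data file is the delete command, which removes taxa from the current analysis while leaving them in the file. A sketch, assuming (hypothetically) that taxa 11 through 30 are the ones you want to drop; see delete ? for the full syntax:

```nexus
paup> delete 11-30;  [hypothetical: ignore taxa 11 through 30 in this analysis]
```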

Perform an exhaustive search

Select the Exhaustive Search dialog box (Analysis -> Exhaustive Search).

Enjoy the options, then press the Search button.

Note how long the search takes to finish

Inspect the tree-length histogram

Inspect the tree found with the exhaustive search

Note the length of the tree.

Select Trees -> Show Trees
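The exhaustive search and the tree display can also be run as commands:

```nexus
paup> alltrees;   [exhaustive search: evaluates every possible tree]
paup> showtrees;  [display the tree(s) found]
```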

Repeat the same analysis with a branch-and-bound search.

How did the time required for the search compare with the exhaustive search? B&B searches are guaranteed to find the best possible tree, but they do not examine every tree and cannot output an exact distribution of tree lengths.
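The command-line equivalent of the branch-and-bound search is:

```nexus
paup> bandb;      [branch-and-bound search]
paup> showtrees;  [display the tree(s) found]
```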

Now perform a heuristic search with simple stepwise addition.

How long did the search take? How does this compare with the previous searches?

Heuristic methods are not guaranteed to find the best possible solution. Consequently, the precise conditions under which the search is run (for example, the addition sequence and the branch-swapping algorithm) can strongly affect the result.
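Because these conditions matter, it is good practice to state them explicitly rather than rely on defaults. A sketch of the simple stepwise-addition search above, with TBR branch swapping named explicitly:

```nexus
paup> hsearch addseq=simple swap=tbr;  [simple addition sequence, TBR branch swapping]
paup> showtrees;
```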

Bootstrapping

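Bootstrapping is a statistical method that uses resampling of the characters, with replacement, to assess how strongly the data support each group in the tree.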
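In PAUP* a bootstrap analysis is started with the bootstrap command. Each replicate resamples the characters with replacement and runs a tree search, and PAUP* summarizes the groups found across replicates in a majority-rule consensus tree. A sketch, assuming 100 replicates with heuristic searches (an illustrative number only):

```nexus
paup> bootstrap nreps=100 search=heuristic;  [100 replicates, an illustrative number]
```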

The Command Line Interface

Getting Help

At the command line interface, type a question mark (?) and press return

paup> ?

This will show you the commands that PAUP understands.

Now select a command and view the options for that command:

paup> hsearch ?

This will show you the syntax for the command, followed by a list of its available options.

The options are shown in three columns

Keyword

Possible settings

Current setting

Remember that the general syntax for commands is

command option1=setting option2=setting;

Use the command line interface to explore PAUP

Suggestions:

Using an alignment of your choice, compare the search speeds for heuristic searches using TBR, SPR, and NNI branch swapping

Compare the results of heuristic searches using a simple addition sequence vs. random addition sequences.

Compare the trees found with Parsimony, Maximum Likelihood, and Minimum Evolution

Compare the trees found with Neighbor Joining and Minimum Evolution
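A possible starting point for these comparisons is sketched below; the seed is arbitrary, and you should confirm the option names against hsearch ? and dset ? on your copy of PAUP*. The first three searches differ only in branch swapping; the fourth uses random addition sequences; the last lines build a neighbor-joining tree and then search under the minimum evolution objective:

```nexus
paup> hsearch addseq=simple swap=nni;
paup> hsearch addseq=simple swap=spr;
paup> hsearch addseq=simple swap=tbr;
paup> hsearch addseq=random nreps=10 rseed=4321;  [random addition; arbitrary seed]
paup> nj;
paup> set criterion=distance;
paup> dset objective=me;  [minimum evolution objective]
paup> hsearch;
```

Note the time and tree length that PAUP* reports after each search.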

Using a PAUP block to automate analyses

We have previously used PAUP blocks to automatically configure frequently used features of an analysis, such as automatically excluding suspect characters.

Entire analyses can also be automated with a PAUP block

The syntax is the same as is used with the command line interface

If you can figure out the sequence of commands that you would issue to run an analysis from the command line interface, you can put this same sequence of commands into a PAUP block and cause the analysis to run automatically

Commands issued from the command line interface or PAUP blocks are often more convenient if you have an assumptions block with taxon sets, character sets, and other assumptions defined.

This is very useful!

Complex analyses can be performed unattended

Repetitive analyses can be performed with a minimum of human error

Long runs can be restarted by a person unfamiliar with the analysis, and analyses can be precisely reproduced

Place the commands, in proper sequence, within a valid PAUP block (remember the NEXUS file format). Be sure to end each command with a semicolon, and remember that a logical line (one command) need not correspond to a single line of text: PAUP treats everything up to the next semicolon as one command.
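A minimal sketch of an automated parsimony analysis, assuming these blocks are appended after the DATA block of your NEXUS file. The character-set name ambiguous, its ranges, the log and tree file names, and the seed are all hypothetical placeholders. The hsearch command is deliberately split over two lines to show that the semicolon, not the line break, ends a command:

```nexus
begin assumptions;
    [hypothetical character set covering ambiguously aligned regions]
    charset ambiguous = 1-25 310-342;
end;

begin paup;
    log start file=run1.log replace;  [record everything PAUP prints]
    set criterion=parsimony;
    exclude ambiguous;
    [one logical command spread over two lines]
    hsearch addseq=random nreps=10 rseed=12345
            swap=tbr;
    savetrees file=run1.tre brlens=yes;  [hypothetical output file name]
    log stop;
    quit;  [exit PAUP so the run can finish unattended]
end;
```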

Using PAUP on UNIX computers

Now that you know how to run PAUP from the command line interface, you are free of the Macintosh window interface!

PAUP* is available for a wide variety of platforms, including some machines that are much faster and more stable than desktop machines.

UNIXprompt> paup inputfile

You will be confronted with the command line interface. Use it in the same way as you learned on the Mac.

PAUP blocks are particularly useful when using the command line interface

You can edit the file with vi or any other UNIX editor, or you can edit it on a desktop computer and FTP the file to the UNIX computer.

PAUP also accepts input redirection, so it is possible to store the data in one NEXUS file and keep a standard series of PAUP commands in separate files. The syntax for this is:

UNIXprompt> paup datafile < commandfile

It may take some experience to become comfortable with this approach, but it can be very powerful.
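A sketch of what such a command file might contain, assuming (hypothetically) that it is the file named commandfile in the example above. Its contents are read as if typed at the paup> prompt, so the final quit returns you to the UNIX shell; the seed and output file name are placeholders:

```nexus
[commandfile: hypothetical name; contents are read as if typed at the paup> prompt]
set criterion=parsimony;
hsearch addseq=random nreps=10 rseed=20231;
savetrees file=results.tre brlens=yes;
quit;
```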

Pseudo-random number generators

If you automate analyses with any random component, it is very important that you provide a unique random number seed for each distinct random analysis

True random number generators are rare

Most programs use "pseudo-random number generators"

Pseudo-random number generators produce a sequence of numbers that seem random, and are effectively random for our purposes, but the whole sequence is predictable if you know the seed number that was used to start it.

It does not matter which seed number you use in PAUP (in PHYLIP, the number must be odd), but each run that should follow a different random sequence needs a different seed.

You can exactly repeat a random run if you know the seed you used, so record the seed in your notes on each analysis.
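A sketch of how seeds control reproducibility in a heuristic search with random addition sequences; the seed values are arbitrary examples. With the same data and settings, runs 1 and 3 should give identical results, while run 2 follows a different random sequence:

```nexus
paup> hsearch addseq=random nreps=10 rseed=1776;   [run 1]
paup> hsearch addseq=random nreps=10 rseed=90210;  [run 2: different seed, different random sequence]
paup> hsearch addseq=random nreps=10 rseed=1776;   [run 3: same seed as run 1, repeats it exactly]
```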