Using PAUP*

PAUP* is a major analytical tool in phylogenetic analysis. It makes available a very wide variety of analytical methods in a single environment, and can be operated via window/mouse, command-line, or scripts.

The PAUP* web site has considerable useful information.

Unfortunately, PAUP* is a commercial program (available from Sinauer Associates). It is quite a good value, but if you want software that is available free, you will have to use another package. A very flexible alternative to PAUP* is PHYLIP, which also includes a wide range of analytical methods. The PHYLIP web site also has an excellent summary of phylogenetic software available elsewhere on the web; consult it if you are looking for a specific analytical capability.

Select File -> Execute <filename>

Start a parsimony search

Optimality Methods

To set the optimality criterion, select the appropriate method under "Analysis". Each method has its own settings dialog box.

Parsimony

Distance

Remember that distance methods consist of two distinct steps:

  1. Calculating a pairwise distance matrix
  2. Finding the tree that best corresponds to these distances
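As a rough illustration of step 1, the following Python sketch builds a pairwise distance matrix from a toy alignment. The alignment, the function names, and the choice of the Jukes-Cantor correction are assumptions made for this example; PAUP* offers many other distance measures. Step 2 (the tree search itself) is not shown, apart from picking out the closest pair, which is where clustering methods start.

```python
import math
from itertools import combinations

def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)

def jc_distance(p):
    """Jukes-Cantor corrected distance for an observed proportion p < 0.75."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def distance_matrix(alignment):
    """Map each unordered pair of taxa to its corrected distance."""
    return {
        (s, t): jc_distance(p_distance(alignment[s], alignment[t]))
        for s, t in combinations(sorted(alignment), 2)
    }

# Hypothetical toy alignment, for illustration only.
alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "ACGAACGGAT",
}

matrix = distance_matrix(alignment)
for pair, d in sorted(matrix.items()):
    print(pair, round(d, 4))

# The closest pair is the natural starting point for clustering methods.
closest = min(matrix, key=matrix.get)
print("closest pair:", closest)
```

The corrected distances are always at least as large as the raw proportions, because the correction accounts for multiple substitutions at the same site.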

Likelihood

Likelihood uses an explicit model to relate the data to the tree.
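To make the idea of an explicit model concrete, here is a minimal Python sketch for the simplest possible case: two aligned sequences joined by a single branch, under the Jukes-Cantor model. The model choice, the site counts, and the function names are all assumptions for illustration; this is not how PAUP* computes likelihoods internally. The sketch evaluates the log likelihood over a grid of branch lengths and picks the best one.

```python
import math

def jc_log_likelihood(d, n_same, n_diff):
    """Log likelihood of a two-sequence alignment under Jukes-Cantor.

    d is the branch length in expected substitutions per site; n_same and
    n_diff are the numbers of identical and differing sites.
    """
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * e)   # probability of one specific identical pattern
    p_diff = 0.25 * (0.25 - 0.25 * e)   # probability of one specific differing pattern
    return n_same * math.log(p_same) + n_diff * math.log(p_diff)

# Hypothetical data: 100 sites, 10 of which differ.
n_same, n_diff = 90, 10

# Crude grid search over branch lengths 0.001, 0.002, ..., 2.000.
grid = [i / 1000.0 for i in range(1, 2001)]
best_d = max(grid, key=lambda d: jc_log_likelihood(d, n_same, n_diff))

# For this simple case the maximum is known in closed form.
p = n_diff / (n_same + n_diff)
analytic_d = -0.75 * math.log(1.0 - 4.0 * p / 3.0)

print("grid estimate:    ", best_d)
print("analytic estimate:", round(analytic_d, 4))
```

With a full tree, the same principle applies, but the likelihood must be optimized over every branch length and model parameter for each candidate tree, which is why likelihood searches are so much slower than parsimony searches.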

Two approaches to likelihood are now in widespread use in phylogenetic analysis:

Maximum Likelihood, which is an optimality method

Bayesian Analysis, which is a Monte Carlo method

At present PAUP* supports only maximum likelihood.

Tree Searches

Reduce your dataset until it has about ten taxa. We would like this analysis to require 2-5 minutes of computer time; somewhere between 8 and 12 taxa should be sufficient.
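The reason for keeping the dataset this small is the number of trees an exhaustive search must evaluate. The short Python sketch below uses the standard result that n taxa have (2n-5)!! distinct unrooted bifurcating trees (a formula not stated above, supplied here for illustration) to show how quickly the count grows.

```python
def count_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating trees for n_taxa labelled taxa: (2n-5)!!"""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count

for n in range(4, 13):
    print(n, "taxa:", count_unrooted_trees(n), "trees")
```

Each added taxon multiplies the work by a steadily larger factor, so a dataset that finishes in minutes at 10 taxa can become impractical only a few taxa later.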

Perform an exhaustive search

    Select the Exhaustive Search dialog box (Analysis -> Exhaustive Search).

    Explore the options, then press the Search button.

    Note how long the search takes to finish.

    Inspect the tree-length histogram.

Inspect the tree found with the exhaustive search

    Note the length of the tree.

    Select Trees -> Show Trees
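For readers who want to see what the tree length reported by a parsimony search means, here is a small Python sketch of the Fitch algorithm for unordered characters, applied to all three possible trees for four taxa. The toy alignment, the nested-tuple tree representation, and the function names are assumptions for this example, not PAUP* internals.

```python
def fitch(tree, states):
    """Return (possible state set, number of changes) for one character.

    tree is a taxon name (leaf) or a pair (left, right) of subtrees.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_changes + right_changes
    # No state in common: one extra change is required at this node.
    return left_set | right_set, left_changes + right_changes + 1

def tree_length(tree, alignment):
    """Sum the minimum number of changes over all sites."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        states = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch(tree, states)[1]
    return total

# Hypothetical toy alignment, for illustration only.
alignment = {"A": "AAGT", "B": "AAGC", "C": "ACTC", "D": "ACTT"}

# The three possible unrooted trees for four taxa, written with an arbitrary root.
trees = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]

lengths = [tree_length(t, alignment) for t in trees]
for t, length in zip(trees, lengths):
    print(t, "length", length)
print("shortest tree:", trees[lengths.index(min(lengths))])
```

Scoring every tree and keeping the shortest is exactly what an exhaustive search does; the list of all lengths is the raw material for the tree-length histogram.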

Repeat the same analysis with a Branch-and-bound search.

    How did the time required for the search compare with the exhaustive search? B&B searches are guaranteed to find the best possible tree, but they do not examine every tree and cannot output an exact distribution of tree lengths.

Now perform a heuristic search with simple stepwise addition.

    How long did the search take? How does this compare with the previous searches?

    Heuristic methods are not guaranteed to find the best possible solution. Consequently, the precise conditions under which the search is run can be important.

Bootstrapping

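Bootstrapping is a statistical method that uses resampling of the data with replacement to estimate how much a result would vary between samples. In phylogenetics, the characters (alignment columns) are resampled to build pseudoreplicate datasets, a tree is estimated from each, and the support for a group is the proportion of replicates in which it appears.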
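The resampling step of a phylogenetic bootstrap can be sketched in a few lines of Python. In this minimal example, which assumes a toy alignment and hypothetical function names, alignment columns are drawn with replacement to build one pseudoreplicate dataset of the same size as the original. Estimating a tree from each replicate is not shown.

```python
import random

def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping taxa intact."""
    n_sites = len(next(iter(alignment.values())))
    # Draw n_sites column indices; some columns repeat, others are left out.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        taxon: "".join(seq[i] for i in columns)
        for taxon, seq in alignment.items()
    }

# Hypothetical toy alignment, for illustration only.
alignment = {"A": "ACGT", "B": "ACGA", "C": "TCGA"}

rng = random.Random(20240101)  # explicit seed so the run can be repeated
for replicate_number in range(3):
    print(replicate_number, bootstrap_replicate(alignment, rng))
```

Because whole columns are resampled, each taxon keeps its own state at every sampled site; only the weighting of the characters changes from replicate to replicate.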

The Command Line Interface

Getting Help

    At the command line interface, type a question mark (?) and press return:

    paup> ?

    This will show you the commands that PAUP understands.

    Now select a command and view the options for that command:

    paup> hsearch ?

    This will show you the syntax for the keyword, followed by a list of available options.

    The options are shown in three columns:

    Keyword

    Possible settings

    Current setting

    Remember that the general syntax for commands is:

    keyword option1=setting option2=setting;
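To make the structure of a command explicit, the following Python sketch splits a command of this general form into its keyword and its option/setting pairs. It is only an illustration of the syntax, under the simplifying assumption that every option is written as option=setting with no spaces or quotes inside a setting; PAUP's own parser accepts more than this.

```python
def parse_command(line):
    """Split 'keyword option1=setting option2=setting;' into its parts."""
    text = line.strip()
    if not text.endswith(";"):
        raise ValueError("a command must end with a semicolon")
    tokens = text[:-1].split()
    if not tokens:
        raise ValueError("empty command")
    keyword = tokens[0]
    options = {}
    for token in tokens[1:]:
        # Everything before the first '=' is the option name.
        name, _, setting = token.partition("=")
        options[name] = setting
    return keyword, options

keyword, options = parse_command("hsearch addseq=random nreps=10 swap=tbr;")
print(keyword)
print(options)
```

Here hsearch is the keyword, and addseq, nreps, and swap are options; use hsearch ? to see the settings each option accepts.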

Use the command line interface to explore PAUP

    Suggestions:

    Using an alignment of your choice, compare the search speeds for heuristic searches using TBR, SPR, and NNI branch swapping.

    Compare the results of heuristic searches using a simple addition sequence vs. random addition sequences.

    Compare the trees found with Parsimony, Maximum Likelihood, and Minimum Evolution.

    Compare the trees found with Neighbor Joining and Minimum Evolution.

Using a PAUP block to automate analyses

We have previously used PAUP blocks to automatically configure frequently used features of an analysis, such as automatically excluding suspect characters.

Entire analyses can also be automated with a PAUP block.

    The syntax is the same as is used with the command line interface.

    If you can figure out the sequence of commands that you would issue to run an analysis from the command line interface, you can put this same sequence of commands into a PAUP block and cause the analysis to run automatically.

    Commands issued from the command line interface or PAUP blocks are often more convenient if you have an assumptions block with taxon sets, character sets, and other assumptions defined.

This is very useful!

    Complex analyses can be performed unattended

    Repetitive analyses can be performed with a minimum of human error

    Long runs can be restarted by a person unfamiliar with the analysis, and analyses can be precisely reproduced

Place the commands, in proper sequence, within a valid PAUP block (remember the NEXUS file format). Be sure to end each command with a semicolon, and remember that a command (a logical line) need not fit on a single displayed line.
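If you generate many similar analyses, it can help to assemble the PAUP block with a small script. The Python sketch below is one possible approach, not part of PAUP* itself: the helper name, the example commands, and the file name example.tre are all hypothetical, and you should confirm each command's options with the ? help before relying on them. The helper adds any missing semicolons and wraps the commands in begin paup; and end; lines.

```python
def make_paup_block(commands):
    """Wrap a sequence of PAUP commands in a PAUP block."""
    lines = ["begin paup;"]
    for command in commands:
        command = command.strip()
        # Every command must be terminated with a semicolon.
        if not command.endswith(";"):
            command += ";"
        lines.append("    " + command)
    lines.append("end;")
    return "\n".join(lines) + "\n"

# Example commands, in the order they would be typed at the command line.
block = make_paup_block([
    "set criterion=parsimony",
    "hsearch addseq=random nreps=10 swap=tbr",
    "savetrees file=example.tre",
])
print(block)
```

The resulting text can be pasted after the data block of a NEXUS file, so that the commands run in sequence when the file is executed.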

Using PAUP on UNIX computers

Now that you know how to run PAUP from the command line interface, you are free of the Macintosh window interface!

    PAUP* is available for a wide variety of platforms, including some machines that are much faster and more stable than desktop machines.

    UNIXprompt> paup inputfile

    You will be confronted with the command line interface. Use it in the same way as you learned on the Mac.

    PAUP blocks are particularly useful when using the command line interface.

    You can edit the file with vi or any other UNIX editor, or you can edit it on a desktop computer and FTP the file to the UNIX computer.

PAUP* is also able to accept command-line redirection, so it is possible to store the data in one NEXUS file and keep a standard series of PAUP commands in additional files. The syntax for this is:

    UNIXprompt> paup datafile < commandfile

    It may take some experience to become comfortable with this approach, but it can be very powerful.
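The same redirection can be driven from a script, which is convenient when many data files share one command file. The Python sketch below mirrors paup datafile < commandfile using the standard subprocess module. The function name is hypothetical, and the executable name paup is taken from the example above; it may differ on your installation.

```python
import shutil
import subprocess

def run_paup(datafile, commandfile, executable="paup"):
    """Run 'executable datafile < commandfile' and return the completed process.

    Returns None if the executable cannot be found on the search path.
    """
    path = shutil.which(executable)
    if path is None:
        return None
    # Feeding the command file on standard input is the equivalent of '<'.
    with open(commandfile) as commands:
        return subprocess.run(
            [path, datafile],
            stdin=commands,
            capture_output=True,
            text=True,
        )

# Hypothetical file names, for illustration only.
result = run_paup("data.nex", "commands.txt")
if result is None:
    print("paup executable not found on this machine")
else:
    print(result.stdout)
```

Capturing the output in this way also makes it easy to save a log of each run alongside the results.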

Pseudo-random number generators

If you automate analyses with any random component, it is very important that you provide a unique random number seed for each distinct random analysis.

    True random number generators are rare.

    Most programs use "pseudo-random number generators".

Pseudo-random number generators produce a sequence of numbers that seems random, and is effectively random for our purposes, but that is entirely predictable if you know the seed number that was used.

    It does not matter what seed number you use in PAUP (in PHYLIP, the number must be odd), but you must supply a different seed for each run in which you want a different random sequence.

If you want to repeat a random run exactly, you can do so as long as you know the random number seed you used, so record the seed in your notes on your analyses.
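The behavior of seeds is easy to demonstrate outside of PAUP. The Python sketch below uses Python's own pseudo-random number generator (not the one in PAUP or PHYLIP, so the actual numbers will differ) to show that the same seed reproduces the same sequence while a different seed gives a different one. The seed values and the function name are arbitrary choices for this example.

```python
import random

def draw(seed, n=10):
    """Return n pseudo-random integers generated from the given seed."""
    rng = random.Random(seed)
    return [rng.randrange(1000) for _ in range(n)]

first_run = draw(12345)
repeat_run = draw(12345)   # same seed: identical sequence
other_run = draw(54321)    # different seed: different sequence

print("seed 12345:", first_run)
print("seed 12345:", repeat_run)
print("seed 54321:", other_run)
print("repeat matches first run:", first_run == repeat_run)
```

Writing the seed into your notes, or into the PAUP block itself, is what makes a randomized analysis repeatable later.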