PAUP* is a major analytical tool in phylogenetic analysis. It makes
available a very wide variety of analytical methods in a single
environment, and can be operated via window/mouse, command-line, or
scripts.

When starting to work with PAUP, one will first need to execute a Nexus
file, which is yet another sequence file format that carries metadata;
this time its data segment contains a multiple sequence alignment. A
Nexus file looks something like this:

    #NEXUS
    BEGIN DATA;
    dimensions ntax=26 nchar=1303;
    format missing=? symbols="ABCDEFGHIKLMNPQRSTUVWXYZ" interleave datatype=DNA gap= -;
    matrix
    Zamia --------------------------------------------------
    Cycas ------------------------------------------CGTGTTTA

The above segment shows the gaps padding the beginning of a multiple
sequence alignment. Readseq (installed on both the owl cluster and
locus) is the most common utility for translating from one format into
another, including Nexus.

Before using PAUP for phylogenetic inference, create or acquire an
alignment. I suggest going to the protein family database (Pfam):
http://pfam.wustl.edu and browsing for a protein family of interest.
Take a moment to notice the % identity, the structures provided, the
descriptions of the protein families, and the information provided
regarding the creation of any hidden Markov models for a family of
interest. When downloading the alignment of a given family, keep in mind
that our class is only three hours and phylogenetic inference of even a
score of taxa can take multiple days or indeed months; so you may want
to download only the seed alignment and even cut taxa out of it before
loading it into PAUP.

When working with PAUP, it is important to recall that there are three
main optimality criteria used for phylogenetic analysis, each with
strengths and weaknesses: Parsimony, Distance, and Likelihood. I suggest
spending a moment considering these strengths and weaknesses before
continuing further; if you are uncertain about them, ask your neighbors
and/or take a moment to look in the text or online.
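Because the alignment usually has to be trimmed and converted before
PAUP can execute it, here is a minimal Python sketch of cutting an
alignment down to a few taxa and writing a bare-bones Nexus DATA block.
The function name to_nexus, the toy sequences, and the taxon Ginkgo are
made up for illustration; this is not Readseq's output format in detail,
only a small non-interleaved file of the same general shape as the
example above.

```python
def to_nexus(alignment, keep, datatype="protein"):
    """Return a minimal NEXUS DATA block for the taxa named in `keep`.

    `alignment` maps taxon name -> aligned sequence (gaps written as '-').
    All names and sequences used with this helper are illustrative.
    """
    rows = [(name, alignment[name]) for name in keep]
    if not rows:
        raise ValueError("keep at least one taxon")
    lengths = {len(seq) for _, seq in rows}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    nchar = lengths.pop()
    width = max(len(name) for name, _ in rows) + 2  # pad names into a column
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  dimensions ntax={len(rows)} nchar={nchar};",
        f"  format datatype={datatype} missing=? gap=-;",
        "  matrix",
    ]
    lines += [f"  {name.ljust(width)}{seq}" for name, seq in rows]
    lines += ["  ;", "END;"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy DNA alignment; only two of the three taxa are kept.
    toy = {"Zamia": "--------", "Cycas": "--CGTGTT", "Ginkgo": "ACCGTGTA"}
    print(to_nexus(toy, ["Zamia", "Cycas"], datatype="DNA"))
```

Dropping taxa this way keeps nchar unchanged, so columns that become
all-gap after trimming are still present in the file.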
In order to compare different methodologies for searching tree space, we
will hold the optimality criterion fixed and change only the search
type, trying Exhaustive, Branch-and-Bound, and Heuristic searches in
turn in order to get a sense of their relative speed.
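To get a feel for why the search type matters so much, it helps to count
how many trees an exhaustive search would have to score. The sketch
below uses the standard count of unrooted, fully bifurcating trees on n
labelled taxa, (2n-5)!!, that is, the product 3 x 5 x ... x (2n-5). The
function name is made up for illustration and nothing here is part of
PAUP.

```python
def unrooted_tree_count(ntax):
    """Number of distinct unrooted, fully bifurcating trees on `ntax` taxa."""
    if ntax < 3:
        return 1  # one or two taxa can only be joined in a single way
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2*ntax - 5.
    for k in range(3, 2 * ntax - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 26):
        print(n, unrooted_tree_count(n))
```

The count grows faster than exponentially with the number of taxa, which
is why an exhaustive search is only practical for small data sets and
why trimming taxa before class is a good idea.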
As the name "Phylogenetic Analysis Using Parsimony" suggests, PAUP's
default analytical method is a parsimony search. If you are
using the graphical user interface for PAUP, examine the parsimony
options, noting that there are five subscreens of options which one may
change. Consider each of these; if you change them, take note of the
'defaults' button in PAUP, which allows one to return to the default
settings.
Distance methods boil down the data of the multiple sequence alignment
into a set of pairwise distances between taxa and then attempt to find
the tree that best fits these distances. Thus they may at times provide
a good middle ground between correctness and requisite speed.
Change the optimality criterion to distance in the Analysis tab,
and recall that distance methods require two steps: first computing the
pairwise distances from the alignment, and then fitting a tree to those
distances.
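As a rough illustration of the first step of a distance analysis,
reducing the alignment to pairwise distances, here is a small Python
sketch that computes the uncorrected p-distance (the fraction of
compared sites that differ) between every pair of taxa. This is only one
of the simplest distance measures and is not a reproduction of what PAUP
computes; the function names and the toy sequences are made up for
illustration.

```python
def p_distance(seq_a, seq_b, gap="-", missing="?"):
    """Fraction of differing sites, skipping sites with a gap or missing data."""
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a in (gap, missing) or b in (gap, missing):
            continue  # site cannot be compared for this pair
        compared += 1
        if a.upper() != b.upper():
            differing += 1
    if compared == 0:
        return float("nan")  # no comparable sites at all
    return differing / compared


def distance_matrix(alignment):
    """Map each (taxon, taxon) pair to its p-distance."""
    names = list(alignment)
    return {
        (a, b): p_distance(alignment[a], alignment[b])
        for a in names
        for b in names
    }


if __name__ == "__main__":
    toy = {"Zamia": "ACGTACGT", "Cycas": "ACGTACGA", "Ginkgo": "AC-TTCGA"}
    for pair, d in distance_matrix(toy).items():
        print(pair, round(d, 3))
```

The second step, fitting a tree to this matrix, is the part that the
tree-building method handles.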
Likelihood methods often prove both the most accurate and the most
difficult to complete. The two primary likelihood-based approaches are
Bayesian inference using a Monte Carlo search and maximum likelihood
inference (strangely, I do not believe any maximum likelihood methods
use a Monte Carlo search). PAUP implements only likelihood inference,
selectable once again in the Analysis tab.

A bootstrap provides a measure of confidence in the inferences drawn
from a given multiple sequence alignment and phylogenetic tree. As the
name suggests, a bootstrap uses only the already existing information in
order to provide this metric; in the case of PAUP, the bootstrap is
nonparametric and is created by repeated random sampling, with
replacement, of characters from the multiple sequence alignment. Each
randomly generated set of characters undergoes a new phylogenetic
inference, and the result is compared to the original inference for as
many iterations as possible in order to generate a percentage of
bootstrap support for the phylogenetic inference.

Perform a bootstrap of your maximum parsimony search using the three
types of branch swapping: TBR (tree bisection and reconnection), SPR
(subtree pruning and regrafting), and NNI (nearest-neighbor
interchange), in order to get a sense of their varying support for your
initial tree and their varying run times.
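The resampling step of a nonparametric bootstrap can be sketched in a
few lines of Python. The code below only builds the pseudo-replicate
alignments by drawing columns (characters) at random with replacement;
the tree search on each replicate and the comparison with the original
tree are left out. The function names and the explicit seed argument are
assumptions made for this illustration and do not correspond to PAUP's
internals.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the length fixed."""
    names = list(alignment)
    nchar = len(alignment[names[0]])
    # Draw nchar column indices; the same column may be drawn repeatedly.
    columns = [rng.randrange(nchar) for _ in range(nchar)]
    return {
        name: "".join(alignment[name][i] for i in columns)
        for name in names
    }


def bootstrap_replicates(alignment, nreps, seed):
    """Build `nreps` pseudo-replicate alignments from one seeded generator."""
    rng = random.Random(seed)
    return [bootstrap_replicate(alignment, rng) for _ in range(nreps)]


if __name__ == "__main__":
    toy = {"Zamia": "ACGTACGT", "Cycas": "ACGTACGA"}
    for rep in bootstrap_replicates(toy, nreps=3, seed=2004):
        print(rep)
```

Every taxon is resampled with the same column indices, so each replicate
is still a valid alignment of the original length.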
If you automate analyses with any random component, it is very important
that you provide a unique random number seed for each distinct random
analysis.

Created: Wed Sep 15 00:58:22 EDT 2004 by Charles F. Delwiche