Lab for November 13, 2006
Introduction to phylogenetics -- practice with Phylip
Phylip is an open-source phylogenetics package. The hardest part about using Phylip is that its programs write to the same default file names ("outfile" and "outtree"), so you must rename files at every step. If my instructions are not enough, refer to the pdf file on this web page.
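The renaming step can be scripted. The following is a minimal Python sketch, not part of Phylip; the function name keep_output and the file name in the usage line are hypothetical. It assumes the Phylip program has already finished and left its default output file in the working directory.

```python
import shutil
from pathlib import Path


def keep_output(workdir, default_name, new_name):
    """Rename a default output file (e.g. "outfile") so the next run cannot overwrite it."""
    src = Path(workdir) / default_name
    dst = Path(workdir) / new_name
    if not src.exists():
        raise FileNotFoundError(f"{src} not found; did the program finish?")
    if dst.exists():
        # Refuse to clobber an earlier result that was already renamed.
        raise FileExistsError(f"{dst} already exists; choose another name")
    shutil.move(str(src), str(dst))
    return dst
```

For example, keep_output(".", "outfile", "myosin_protdist.txt") could be called after protdist finishes.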
When you do your multiple alignment with ClustalW, you must save the output file in Phylip format. At the multiple alignment menu choose 9. At the format menu choose 1 to turn off clustal format and 4 to turn on Phylip format. The output file should end with a ".phy" extension.
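To check that the saved file really is in Phylip format, it can help to read it back. The sketch below is an illustration only, with a hypothetical function name. It assumes the interleaved layout: a first line giving the number of sequences and the number of sites, then one line per sequence whose first 10 characters are the name, then further blocks containing sequence only.

```python
def read_interleaved_phylip(text):
    """Parse interleaved Phylip text into a {name: sequence} dictionary."""
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    # Header line: number of sequences, then number of sites.
    ntax, nsites = (int(x) for x in lines[0].split()[:2])
    names, seqs = [], []
    for i, ln in enumerate(lines[1:]):
        if i < ntax:
            # First block: 10-character name field followed by sequence.
            names.append(ln[:10].strip())
            seqs.append(ln[10:].replace(" ", ""))
        else:
            # Later blocks: sequence only, in the same order as the names.
            seqs[i % ntax] += ln.replace(" ", "")
    for name, seq in zip(names, seqs):
        if len(seq) != nsites:
            raise ValueError(f"{name}: expected {nsites} sites, found {len(seq)}")
    return dict(zip(names, seqs))
```

If the function raises an error, the file is probably not in the format Phylip expects.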
To make a distance tree, use protdist from the Phylip package with the default settings. Rename "outfile" to something you will recognize; this file is your distance matrix. Use neighbor, with the distance matrix as its input, to derive a phylogeny. Use neighbor-joining and the defaults initially. If you have time, try UPGMA. Rename "outfile" and "outtree" after each run.
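To see what a distance method does with the matrix, here is a rough Python sketch of UPGMA clustering. It is not the code used by neighbor, it ignores branch lengths, and the function name and nested-tuple tree representation are assumptions made for illustration. At each step the two closest clusters are joined, and distances to the new cluster are size-weighted averages.

```python
def upgma(names, dist):
    """Cluster a symmetric distance matrix with UPGMA; return a nested-tuple tree."""
    # Each cluster id maps to (subtree, number of leaves in it).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        a, b = min(d, key=d.get)  # closest pair of clusters
        tree_a, na = clusters.pop(a)
        tree_b, nb = clusters.pop(b)
        new_d = {}
        for c in clusters:
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            # Average over all leaf pairs, weighted by cluster size.
            new_d[(c, next_id)] = (na * dac + nb * dbc) / (na + nb)
        # Drop distances that involve the two merged clusters.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        d.update(new_d)
        clusters[next_id] = ((tree_a, tree_b), na + nb)
        next_id += 1
    return next(iter(clusters.values()))[0]
```

Neighbor-joining differs in that it corrects each distance for how far the two clusters are from everything else before choosing the pair to join, so it does not assume that all lineages evolve at the same rate.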
Parsimony and maximum likelihood are character-based techniques. This means they use comparisons of individual residue positions to find phylogenies.
Parsimony tries to find, from a group of candidate phylogenetic trees, the tree that requires the fewest changes. Build a parsimony tree using protpars with the default settings. You should be able to put this output directly into a tree drawing program.
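Counting the changes a tree requires can be sketched with Fitch's algorithm. This is a simplification and not what protpars does internally, since protpars uses its own model of amino acid change; here every substitution simply costs one change. The function names and the nested-tuple tree representation are assumptions made for illustration.

```python
def fitch(tree, states):
    """Return (possible ancestral states, changes needed) for one alignment column."""
    if isinstance(tree, str):
        # A leaf: its state is the observed residue, no changes needed.
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    common = left_set & right_set
    if common:
        # The two subtrees can agree on a state: no extra change here.
        return common, left_changes + right_changes
    # No shared state: one change is required on this node's branches.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over every column of the alignment."""
    nsites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(nsites)
    )
```

Scoring several candidate trees with parsimony_score and keeping the one with the lowest total is the basic idea behind a parsimony search.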
Maximum likelihood tries to find, from a group of generated trees, the tree under which the observed sequences are most probable. If you have time, use "promlk" to create a maximum likelihood tree.
To draw a phylogenetic tree, try both "drawgram" and "drawtree." Choose "P" at the menu to change the output file type. If you are going to want to view your tree on a PC, choose "W" for "MS-Windows Bitmap." Use a resolution of at least 1280 x 800. Click on "plot" to save the tree. If you saved as a Windows bitmap, the file should end in bmp and can be opened with a program like Adobe Illustrator or Photoshop. RENAME PLOTFILE PRIOR TO DOING ANOTHER PLOT!
Bootstrapping is a way of testing the reliability of a phylogeny. To bootstrap, run "seqboot" on the original data file that you used as input for tree building. Use the defaults, but pay special attention to how many replicates you are using. Use the output of seqboot as the input of protpars (or "protdist" or "promlk" if you want). Use "consense" to build the consensus tree.
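The idea behind bootstrapping can be sketched in Python. This is an illustration only, not the code of seqboot or consense, and all function names are hypothetical. Each replicate is made by drawing alignment columns at random with replacement; once a tree has been built from every replicate, the support for a group is the percentage of replicate trees that contain it. Trees are written as nested tuples, and groups are read from rooted trees for simplicity.

```python
import random


def bootstrap_replicates(alignment, n, seed=1):
    """Return n resampled alignments, each drawn column by column with replacement."""
    rng = random.Random(seed)
    nsites = len(next(iter(alignment.values())))
    reps = []
    for _ in range(n):
        cols = [rng.randrange(nsites) for _ in range(nsites)]
        reps.append({name: "".join(seq[c] for c in cols)
                     for name, seq in alignment.items()})
    return reps


def groups(tree):
    """Return (leaves under this node, set of leaf groups at internal nodes)."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves, found = frozenset(), set()
    for child in tree:
        child_leaves, child_found = groups(child)
        leaves |= child_leaves
        found |= child_found
    found.add(leaves)
    return leaves, found


def support(group, replicate_trees):
    """Percentage of replicate trees in which the given group appears."""
    group = frozenset(group)
    hits = sum(group in groups(t)[1] for t in replicate_trees)
    return 100.0 * hits / len(replicate_trees)
```

A group that appears in nearly every replicate tree does not depend on a few particular columns of the alignment, which is why high values indicate good support.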
Questions
What is the major difference between the output of "drawgram" and "drawtree?"
Print a tree for each method used. Label it with the organism names before or after printing. Do the sequences group together as you would expect based on organism? How can you explain this?
Do the trees from different methods look essentially the same, or are some proteins grouped differently?
Does your bootstrapping analysis support your parsimony tree? Values close to 100 are good. The lower they get, the less support there is for that branch.
Myosin Sequences (fasta format)