Using cut and paste to download genbank files and converting these files to GCG format
What other specialty software is available on UMBI?
Importing PHYLIP trees into PAUP*
I have found a bunch of sequences in genbank and I want to get them into NEXUS format. How do I do that?
I have found a file of interest by searching genbank with a web browser, but when I try to use fetch to get the sequence, it indicates the accession number can't be found. Is there another way to convert a genbank file to GCG format?
This is probably because the sequences are in "Genbank Updates" rather than "Genbank", and therefore may not be in the version of genbank that is installed on the UMBI computer. What you can do is download the sequences directly from genbank, either a) by cutting and pasting -- do this only if the sequences are short -- or b) by using netscape to save them to a file, and then using "fetch" or some other FTP tool to move them to the UMBI box. Note that there are two programs called 'fetch'. One is a Macintosh program that is used for transferring files via FTP. The second is part of the GCG package, and is used to retrieve files from genbank.
Cutting and pasting is a simple method of moving a genbank file from a web 
  browser to GCG when you can't use the GCG tool fetch, but you will 
  have to master some more unix commands. If you issue the following command: 
cat > newfile
when you press carriage return the computer will start accepting input from 
  the keyboard, so if you then paste, the unix box will put the characters you 
  paste into the file 'newfile'. When you are done with that, press 
  ^d (control-d, which is the unix end-of-file character). At that point you should 
  get the prompt back. Alternatively, download the file to your mac or PC, and 
  then use FTP to move the file to the GCG unix host.
The sequences will then be in Genbank format, *not* GCG format. You can convert 
  them from genbank to GCG with a GCG tool called "fromgenbank". 
It should look like this (use 'more' to look at the contents of 
  the file you have created): 
***********
prompt> cat > newfile
PASTE THE GENBANK FILE IN HERE (IN THIS CASE LOCUS MICSPCOX)
^d
prompt> more newfile
CONTENTS OF THE GENBANK FILE (LOCUS MICSPCOX)
prompt> fromgenbank

FromGenBank reformats one or more sequences in the flat file format of the
GenBank database into individual sequence files in GCG format.

Reformat what GenBank data file?  newfile

     micspcox.seq   381 bp.

  reformatted: temp
  total files: 1
  total bases: 381

prompt> mv micspcox.seq Coleo-mt-coxII.seq
**********
Note that fromgenbank does not convert 'newfile' in place: GCG writes a new file named after the locus, in this case micspcox.seq, and 'newfile' is left unchanged. You will probably want to rename the file created by fromgenbank, which is what the last command shown above (mv) does.
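If you have pasted several sequences into one file, it can help to know in advance which file names to expect. Here is a rough sketch in Bourne-shell syntax. The function name predict_gcg_names is made up for this example, and the naming rule (lower-cased LOCUS name plus ".seq") is only an assumption based on the single example above, so check it against what fromgenbank really writes.

```shell
# List the file names fromgenbank would be expected to create, ASSUMING the
# rule "lower-cased LOCUS name plus .seq" suggested by the example above.
predict_gcg_names() {
    awk '$1 == "LOCUS" { print tolower($2) ".seq" }' "$1"
}

# Demo on a tiny stand-in file (only the LOCUS line matters here).
demo=$(mktemp)
printf 'LOCUS       MICSPCOX     381 bp\nORIGIN\n//\n' > "$demo"
predict_gcg_names "$demo"
rm -f "$demo"
```

On a real file you would simply type: predict_gcg_names newfile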
The cut and paste approach will only work with relatively small files. If you decide to download a huge file (e.g., the complete yeast genome) use FTP to move the file around. Reformatting the file with fromgenbank is the same in both cases.
CFD 1.18.98
I have installed Clustalw (another alignment program) on the UNIX cluster and Malign (a simultaneous alignment and tree building program) on UMBI. If you are logged in as pbio699k, these programs will be available with the commands clustalw and malign, respectively. Documentation is in the "bin" directory:
~pbio699k/bin/clustalwdir/clustalw1.7/clustalw.doc
~pbio699k/bin/maligndir/MALIGN.TXT
If you perform *any* analyses that take more than a few minutes to execute, be sure to run them under "nice":
nice clustalw 
For longer runs, where you will need to log out before the run is finished, use "nohup":
 (nice nohup malign PARAMFILE < INPUTFILE > OUTFILE ) >& STDERR.OUT & 
We will discuss what that complex command line means in a few weeks.
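As a preview, the pieces of that command line can be tried out safely on a small stand-in job. The sketch below is written in Bourne-shell syntax (the line above looks like csh syntax, which is my assumption), and it uses 'sort' in place of malign purely for illustration, so there is no PARAMFILE.

```shell
# Stand-in for a long analysis: 'sort' plays the role of malign here
# (an assumption for illustration only).
work=$(mktemp -d)
printf 'taxon3\ntaxon1\ntaxon2\n' > "$work/INPUTFILE"

# nice    : run at lower priority so other users are not slowed down
# nohup   : keep the job running after you log out
# < file  : read standard input from INPUTFILE
# > file  : write standard output to OUTFILE
# 2> file : write error messages to STDERR.OUT
#           (csh has no "2>", hence the "( ... ) >& file" form)
# &       : run in the background and give the prompt back
nice nohup sort < "$work/INPUTFILE" > "$work/OUTFILE" 2> "$work/STDERR.OUT" &

wait    # only for this demo; in real use you would log out instead
result=$(cat "$work/OUTFILE")
echo "$result"
rm -rf "$work"
```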
CFD 3.2.98
Trees generated by phylip (and related programs, such as fastDNAml and PROTML) can be imported into paup if the taxon names are identical in both the paup* (NEXUS) data file and the phylip tree file. If the taxon names are not identical, you will have to use search and replace to change the taxon names.
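The search and replace can be done in any editor, but for many names a small shell function may be less error-prone. This is only a sketch: rename_taxon is a made-up name, and it assumes that the tree has branch lengths (so every taxon name sits right after "(" or "," and right before ":") and that names contain only letters, digits and underscores.

```shell
# rename_taxon OLD NEW < treefile > newtreefile
# Replaces OLD only where it is a whole taxon name, i.e. directly after
# "(" or "," and directly before ":". Partial matches are left alone.
rename_taxon() {
    sed "s/\([(,]\)$1:/\1$2:/g"
}

# taxon1 is renamed, taxon10 is not touched.
echo '((taxon1: 0.2,taxon10: 0.1): 0.06,taxon2: 0.3);' | rename_taxon taxon1 taxonA
```

Run it once per name that differs, each time reading the output of the previous run.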
Like PAUP, phylip stores trees as sets of nested parentheses, with taxa separated by commas. A branch length follows each OTU (and each closing parenthesis), separated from it by a colon. The tree ends with a semicolon:
((taxon1: 0.209428,taxon2: 0.060451): 0.064360,(taxon3: 0.085505,(taxon4: 
  0.099318,taxon5: 0.038405): 0.013970):0.0); 
Because the basic structure of the tree file is the same in both programs, a "wrapper" can be added to the tree file to make it acceptable to paup*:
#nexus
begin trees;
utree example=((taxon1: 0.209428,taxon2: 0.060451): 0.064360,(taxon3: 0.085505,(taxon4: 0.099318,taxon5: 0.038405): 0.013970):0.0);
end;
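If you have to do this often, the wrapper can be added by a small shell function instead of by hand. The sketch below makes some assumptions: phylip_to_nexus is a made-up name, the tree file holds exactly one tree, and "example" is used as the tree name unless you give another. It also refuses to write anything if the parentheses do not balance, since paup* will not read such a tree.

```shell
# phylip_to_nexus TREEFILE [NAME]
# Writes a NEXUS trees block to standard output. Line breaks inside the
# tree are removed so the whole tree sits on one "utree" line.
phylip_to_nexus() {
    opens=$(tr -cd '(' < "$1" | wc -c)
    closes=$(tr -cd ')' < "$1" | wc -c)
    if [ $opens -ne $closes ]; then
        echo "unbalanced parentheses in $1" >&2
        return 1
    fi
    echo '#nexus'
    echo 'begin trees;'
    printf 'utree %s=%s\n' "${2:-example}" "$(tr -d '\n' < "$1")"
    echo 'end;'
}

# Demo on a small made-up tree that is split over two lines.
tree=$(mktemp)
printf '(taxon1:0.1,(taxon2:0.2,\ntaxon3:0.3):0.05);\n' > "$tree"
phylip_to_nexus "$tree" example
rm -f "$tree"
```

In real use you would redirect the output to a file, for example: phylip_to_nexus treefile > treefile.nex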