When these exercises are complete, one should be able to configure a plotting device for use with GCG and make use of some of the graphical tools available to a GCG user. We will use dotplot to examine areas of similarity between two sequences. GCG provides several methods for its graphical programs to deliver output to the user. In the case of dotplot, the output is a set of plotted points that correspond to areas of significant similarity between the two sequences. Unfortunately, the Unix command shell is not by itself capable of displaying such a plot, so we must do a little work first.
Log on to your GCG account and start GCG.
To use dotplot you must use a telnet emulator with TEK emulation (e.g.,
"Better Telnet"), or else use the GCG X-windows
interface. If you use Better Telnet, be sure that TEK emulation is turned
on under the menu "favorites/edit favorites." Use
the "terminal" tab to set TEK emulation to 4105.
GCG must be told where you want plots to be sent. First copy
the file ".gcgmydevices" into your home directory. This only
needs to be done the first time you use your account to plot.
You will be presented with a menu, with the option "plot-telne" highlighted. If it is not highlighted, use the cursor keys to highlight it, then press "return" to accept this setting. If you prefer, you can use setplot to send graphical output to a PostScript file (simply use the arrow keys to highlight that option and press return). The PostScript file can be sent to a PostScript printer, either directly from the GCG platform or indirectly by transferring the file (using sftp) to your workstation. Add a setplot command to your .login file on locus to set up your preferred plotter each time you log in.
In the above example, I used compare to make a dotplot of two sequences; this creates the description file for the dotplot, called comparison.pnt. dotplot then took that file as input and plotted it to my current plotter, which is the PostScript file graphics.ps. Finally, I ran gv to view the output and optionally print it; gv is one of the many available PostScript viewers. If you later wish to plot to a terminal, you will need to run setplot again.
Create a working directory called "plots" and use your Unix skills to enter this new directory. Perform the next operations in this directory.
Download the files containing the sequences you want to compare. In the following example both of the sequences originally came from Swiss-Prot. They are the same sequences that are used in the figure showing dotplot in your textbook. You can also use fetch to retrieve GenBank files. The computer locus has a local copy of GenBank installed on it, and the sequences are retrieved from this database. Because the UMBI database is not updated as often as the NCBI database, a sequence you find through the NCBI web interface may not be in the local copy; in that case you will need to copy the file in another way.
These files will be copied into your current working directory, and will be given the extension .swissprot when acquired with fetch and .rsf when acquired with netfetch. If one wants to use the GenBank output from NCBI, one will have to modify it to work properly with GCG. One alternative is to use the 'fixnetfetch' program you have already written; another is to manually edit the GenBank output to remove the header and add a '..' after the sequence stanza.
Given a sequence from netfetch which looks like this:
    !!RICH_SEQUENCE 1.0
    NETFETCH of: query October 12, 2004 07:10
    from server: www.ncbi.nlm.nih.gov
    1 Sequences Requested
    1 Sequences Returned
    Sequences Requested
    -----
    p00748
    ..
    {
    name FA12_HUMAN
One will need to remove everything from the line after !!RICH up to the line starting with 'name'. Then add the GCG separator '..' so that the output changes from:
    /note="Y -> H.
    /FTId=VAR_014338."
    sequence
    mrallllgfllvslestlsippweapkehkykaeehtvvltvtgepchfpfqyhrqlyhk
to:
    /note="Y -> H.
    /FTId=VAR_014338."
    sequence ..
    mrallllgfllvslestlsippweapkehkykaeehtvvltvtgepchfpfqyhrqlyhk
Next, remove the final '}' character at the end of the file. If you intend to use the GenBank output, I suggest spending a few minutes modifying your Perl translator to do this automatically. Finally, use reformat to add the time stamp and checksum to the sequence (unless your Perl program does it for you).
Use compare to generate a datafile (give the filename an extension ".dat"). There are two ways to do this:
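Returning to the manual netfetch clean-up described above, the three edits (drop the header, add '..' after the sequence keyword, remove the final '}') can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course's fixnetfetch Perl program: the function name fix_netfetch_rsf is hypothetical, the 'name' line itself is assumed to be kept, and 'sequence' and the closing '}' are each assumed to sit on a line of their own.

```python
def fix_netfetch_rsf(text):
    """Apply the three manual edits to netfetch RSF text (illustrative only)."""
    kept = []
    in_header = False     # True while skipping lines after !!RICH...
    header_done = False   # only the first !!RICH line starts a header skip
    for line in text.splitlines():
        word = line.strip()
        if not header_done and word.startswith("!!RICH"):
            kept.append(line)      # keep the !!RICH_SEQUENCE line itself
            in_header = True
            continue
        if in_header:
            if not word.startswith("name"):
                continue           # drop header lines, including '..' and '{'
            in_header = False      # assumption: the 'name' line is kept
            header_done = True
        if word == "sequence":
            # Assumption: the keyword is alone on its line; add the separator.
            line = line.rstrip() + " .."
        kept.append(line)
    # Remove trailing blank lines, then the final '}' if it closes the file.
    while kept and not kept[-1].strip():
        kept.pop()
    if kept and kept[-1].strip() == "}":
        kept.pop()
    return "\n".join(kept) + "\n"
```

To try it, read the .rsf file into a string, pass it to the function, and write the result to a new file. The result still needs to go through reformat, as described above.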
Now that you have looked over a dotplot of the similarity between two sequences, use gap to perform a Needleman-Wunsch alignment and bestfit to perform a Smith-Waterman alignment. The use of gap and bestfit is an exercise left to the reader.
Created by Charles F. Delwiche