BSCI 380 Laboratory

Continuing GCG

Prerequisites

Familiarity with the Unix family of operating systems.
The ability to work with the BLAST tools.
Facility with a Unix editor.
An introduction to GCG.

Objectives

When these exercises are complete, you should be able to configure a plotting device for use with GCG and use some of the graphical tools available to a GCG user. We will use dotplot to examine areas of similarity between two sequences.

Setting up the plotter

GCG provides multiple methods for the graphical programs to print output for the user. In the case of dotplot, the output is a set of plotted points which correspond to areas of significant similarity between two sequences. Unfortunately the Unix command shell is not by itself capable of displaying such a plot, and so we must do a little work first.
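To make the idea of a dotplot concrete, here is a minimal sketch in shell (using awk) that prints a text-only dot matrix for two short strings. It marks every cell where the two characters are identical. It is only an illustration and is not meant to reproduce how GCG scores similarity; the function name text_dotplot is hypothetical and is not a GCG command.

```shell
# text_dotplot A B: print one row per character of A and one column per
# character of B, with "*" where the characters match and "." elsewhere.
# Hypothetical helper for illustration only; not part of GCG.
text_dotplot() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        for (i = 1; i <= length(a); i++) {
            row = ""
            for (j = 1; j <= length(b); j++)
                row = row (substr(a, i, 1) == substr(b, j, 1) ? "*" : ".")
            print row
        }
    }'
}

# Usage: text_dotplot ACGT ACGT
```

For two identical strings the stars fall on the main diagonal, which is the pattern to look for in a real dotplot: runs of points along a diagonal indicate a stretch of similarity.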

Log on to your GCG account and start GCG

To use dotplot you must use either a telnet emulator with TEK emulation (e.g., "Better Telnet") or the GCG X-windows interface. If you use Better Telnet, be sure that TEK emulation is turned on under the menu "favorites/edit favorites." Use the "terminal" tab to set TEK emulation to 4105.
Another way to get a Tek plotter is to run xterm from locus, hold Control while pressing the middle mouse button, and click on 'start Tek emulation'. (To use the middle mouse button on the Macs in the lab, hold the 'option' key.) Alternatively, just run 'xterm -t'.

GCG must be told where to send its plots. First copy the file ".gcgmydevices" into your home directory. This only needs to be done the first time you use your account to plot.
Download this and put it in your home directory as .gcgmydevices.

Logging into Locus

$  setplot

This has to be done at the beginning of every GCG session, but only once per session.

You will be presented with a menu, with the option "plot-telne" highlighted. If it is not highlighted, use the cursor keys to highlight it, then press "return" to accept this setting.

If you prefer, you can use setplot to send graphical output to a postscript file (simply use the arrow keys to highlight that option and press return). The postscript file can be sent to a postscript printer, either directly from the GCG platform, or indirectly by transferring the file (using sftp) to your workstation.

Add a setplot command to your locus .login to set up your preferred plotter when you login.
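As a minimal sketch, the line can be appended from the shell instead of with an editor. The helper name add_setplot is hypothetical, and the default device plot-posts is simply the one used in this exercise; substitute the device you prefer. The function skips the append when an identical line is already present, so running it twice is harmless.

```shell
# add_setplot FILE [DEVICE]: append "setplot DEVICE" to FILE unless that exact
# line is already there. Hypothetical helper; DEVICE defaults to plot-posts.
add_setplot() {
    login_file="$1"
    device="${2:-plot-posts}"
    line="setplot $device"
    touch "$login_file"
    grep -qxF "$line" "$login_file" || printf '%s\n' "$line" >> "$login_file"
}

# Usage (on locus): add_setplot "$HOME/.login"
```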


$  genhelp dotplot
Before using any command for the first time in GCG, make sure to check out the documentation!

$  setplot plot-posts
Set the plotter to use postscript as output.

$  compare sequence1.gbk sequence2.gbk
compare creates a description file called comparison.pnt, which contains the positions of each plotted point of the dotplot.

$  dotplot
The dotplot command reads comparison.pnt and does the work of plotting the points corresponding to the comparison of the two sequences. dotplot thus requires that the plotter be set as we did above.

$  gv graphics.ps
gv is a program which can read postscript files and show the user what the printed page will look like when sent to a printer. The dotplot program provides the output file 'graphics.ps'.

In the above example, I used compare to compare two sequences; this creates the description file for the dotplot, called comparison.pnt. dotplot then took that file as input and plotted it to my current plotter, which is the postscript file graphics.ps. Finally I ran gv to view the output and optionally print it. gv is one of many available postscript viewers. If you later wish to plot to a terminal you will need to run setplot again.

Create a working directory called "plots" and use your Unix skills to enter this new directory. Perform the next operations in this directory.
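One way to do this, as a minimal sketch (the -p option only keeps mkdir from complaining if the directory already exists):

```shell
# Create the working directory if needed, move into it, and confirm where we are.
mkdir -p plots
cd plots
pwd
```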

One more GCG downloading example.

Download the files containing the sequences you want to compare.

In the following example both of the sequences originally came from swissprot. They are the same sequences that are used in the figure showing dotplot in your textbook. You can also use fetch to retrieve genbank files. The computer locus has a local copy of genbank installed on it, and the sequences are retrieved from this database. Because the UMBI database is not updated as often as the NCBI database, if you use the web interface to find files at NCBI, you may need to copy the files in another way.


$  fetch p00748
p00748.swissprot
$  fetch p00750
p00750.swissprot
$  netfetch p00748
p00748.rsf
$  netfetch p00750
p00750.rsf
Now you have two versions of each of the two sequence files in your current working directory.

Fixing the output from netfetch to use it with GCG

These files will be copied into your current working directory, with the extension .swissprot when acquired with fetch and .rsf when acquired with netfetch. If you want to use the output that netfetch retrieves from NCBI, you will have to modify it to work properly with GCG. You can either use the 'fixnetfetch' program you have already written, or manually edit the output to remove the header and add a '..' after the sequence stanza:

Given a sequence from netfetch which looks like this:

!!RICH_SEQUENCE 1.0

NETFETCH of: query  October 12, 2004 07:10

from server: www.ncbi.nlm.nih.gov

 1      Sequences Requested
 1      Sequences Returned

Sequences Requested
-----
p00748

..
{
name  FA12_HUMAN

You will need to remove everything from the line after !!RICH up to the line starting with 'name'. Then add the GCG separator '..' so that the output changes from:

                       /note="Y -> H. /FTId=VAR_014338."
sequence
  mrallllgfllvslestlsippweapkehkykaeehtvvltvtgepchfpfqyhrqlyhk

to:

                       /note="Y -> H. /FTId=VAR_014338."
sequence
 ..
  mrallllgfllvslestlsippweapkehkykaeehtvvltvtgepchfpfqyhrqlyhk

Next, remove the final '}' character at the end of the file. If you intend to use the genbank output I suggest spending a few minutes and modifying your Perl translator to do this automatically. Finally use reformat to add the time stamp checksum to the sequence (unless your Perl program does it for you).
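The manual edits described above can also be sketched as a small shell function built on awk. This is only a sketch under stated assumptions: the function name fix_netfetch is hypothetical (it is not the 'fixnetfetch' program from the earlier homework), and it assumes a file holding a single record laid out like the example shown here, with 'sequence' alone on its line and the closing '}' alone on its line. It does not add the checksum, so reformat is still needed afterwards.

```shell
# fix_netfetch FILE: print a cleaned-up copy of a single-record netfetch file.
# Hypothetical helper; assumes the layout shown in this exercise.
fix_netfetch() {
    awk '
        /^!!RICH/           { print; skipping = 1; next }  # keep this line, then start dropping the header
        skipping && /^name/ { skipping = 0 }               # the record proper starts at the name line
        skipping            { next }                       # still inside the header: drop the line
        /^[}][ \t]*$/       { next }                       # drop the closing brace
        { print }
        /^sequence[ \t]*$/  { print " .." }                # add the GCG separator after the sequence keyword
    ' "$1"
}

# Usage: fix_netfetch p00748.rsf > p00748.fixed
```

The output goes to standard output so that the original file is left untouched; the name p00748.fixed in the usage line is just a placeholder.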

Performing some comparisons

Use compare to generate a datafile (give the filename the extension ".pnt", as in the command below). There are two ways to do this:


$  compare p00748.swissprot p00750.swissprot -out=comparison.pnt -default
This performs the comparison and places the results in comparison.pnt.

$  dotplot comparison.pnt -default
If your plotter is set to the TEK interface, a TEK window should appear on your screen and create a dotplot that resembles the one in your textbook. This window can be resized by using your mouse to drag the lower right corner of the TEK window.

Now that you have looked over a dotplot of the similarity between two sequences, use gap to perform a Needleman-Wunsch alignment and bestfit to perform a Smith-Waterman alignment. The use of gap and bestfit is an exercise left to the reader.

Links for more reading.

Dotplot used for visualizing comparisons among a variety of things: http://www.research.att.com/~jon/dotplot/
Dottyplot, a stand-alone program (Mac OS) for dot-plotting, is available at the IUBIO archive: http://iubio.bio.indiana.edu/soft/molbio/mac
Text: Baxevanis and Ouellette (1998). Bioinformatics, Wiley Interscience, page 149.

Created by Charles F. Delwiche
Last modified: Mon Nov 8 15:49:44 EST 2004 by Ashton Trey Belew.