For today's lab, your objectives are to 1) familiarize yourself with contig assembly software, 2) look at actual sequence traces and understand the relationship between traces and base-called data, and 3) explore the consequences of different parameter settings in contig assembly. We will use "Sequencher," a user-friendly commercial package. The version you will be using is fully functional, but you will not be able to save your results.

Exercise:

Dr. J. Scientist is excited to have found that the following sequences form a single contig when using Sequencher with the parameters set at minimum match 70%, minimum overlap 15 base pairs, and the "dirty data" setting. Please check the work and determine whether you agree with this conclusion.

Be sure to:

1) Run Sequencher with the parameters used by J. Scientist (set the parameters as specified, then click on "auto align").

2) Repeat the analysis with stricter parameters (i.e., set the parameters to the defaults).

3) Use the Sequencher function "interactive assembly."

4) Go to the NCBI Trace Archive and look at some chromatogram traces. These sequences are from Macaca mulatta. If you have trouble finding the Macaca mulatta traces on your own, follow this link.
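
To build intuition for the two numeric parameters in the exercise (minimum match percentage and minimum overlap), the following is a minimal Python sketch. It is not Sequencher's algorithm: it assumes a simple ungapped comparison between the end of one read and the start of another. The function names and the example reads are hypothetical, chosen only for illustration.

```python
def overlap_identity(left, right, length):
    """Percent identity between the last `length` bases of `left` and the
    first `length` bases of `right`, compared position by position (no gaps).
    Assumes 1 <= length <= min(len(left), len(right))."""
    suffix, prefix = left[-length:], right[:length]
    matches = sum(a == b for a, b in zip(suffix, prefix))
    return 100.0 * matches / length


def find_joins(left, right, min_overlap, min_match):
    """Return every (overlap_length, percent_identity) pair that satisfies
    both thresholds. An empty list means the two reads would not be joined
    under this simplified model."""
    joins = []
    for length in range(min_overlap, min(len(left), len(right)) + 1):
        identity = overlap_identity(left, right, length)
        if identity >= min_match:
            joins.append((length, identity))
    return joins


# Hypothetical reads: 20 shared bases, flanked by unrelated sequence.
shared = "ACGTTGCAAGCTTAGGCTAC"
noisy = "ACCTTGGAAGCTAAGGCTTC"  # same 20 bases with 4 substitutions

read_a = "GGGGGGGGGG" + shared
read_b = shared + "TTTTTTTTTT"
read_c = noisy + "TTTTTTTTTT"

print(find_joins(read_a, read_b, min_overlap=15, min_match=100.0))
print(overlap_identity(read_a, read_c, 20))
```

In this toy model, the noisy 20-base overlap between read_a and read_c clears a 70% threshold but fails an 85% one. Permissive settings can therefore join reads that stricter settings keep apart.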

When you look at the sequences manually after replicating J. Scientist's work, what characteristics make you suspicious of J. Scientist's work? Why?

How many contigs do you think are formed?

How many sequences below are not included in a contig?

What chromatogram characteristics give you more confidence in some base calls than others?

How do you explain the discrepancy between J. Scientist's observations and your own?

Because Sequencher is expensive and best suited to interactive use, you may also wish to explore other contig assembly software. However, you should understand that most other contig assembly packages, particularly if you want to incorporate them into an automated sequence analysis pipeline, will require at least some basic knowledge of scripting and compilation. Use Google to explore the diversity of contig assembly programs that are currently available. Two packages you should probably be familiar with are phred/phrap/consed and the AMOS Assembler, the latter of which is affiliated with the University of Maryland Center for Bioinformatics and Computational Biology (CBCB).
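
As a small taste of the scripting such pipelines involve, the sketch below works with phred-style quality scores, which relate a score Q to an estimated base-calling error probability P by Q = -10 log10(P). The function names, the example scores, and the cutoff of 20 are illustrative assumptions, not part of phred itself or of this lab's data.

```python
import math


def phred_to_error_prob(q):
    """Convert a phred-style quality score to an estimated error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an estimated error probability to a phred-style quality score."""
    return -10 * math.log10(p)


def low_confidence_positions(quals, threshold=20):
    """Return 0-based positions whose quality falls below `threshold`.
    The default of 20 (1 error in 100) is an illustrative cutoff only."""
    return [i for i, q in enumerate(quals) if q < threshold]


# Hypothetical per-base quality scores for a short read.
example_quals = [8, 12, 25, 40, 38, 19, 30]
print(low_confidence_positions(example_quals))
print(phred_to_error_prob(30))
```

Under this scale, a score of 20 corresponds to an estimated 1-in-100 chance that the base call is wrong, and a score of 30 to 1 in 1,000.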

The sequences are available here in plain text format.