BSCI 348s Comparative Bioinformatics

Homework 2003


Assignment 1 - 25 points

Use the Smith-Waterman algorithm to align the following two amino acid sequences. Do the calculations by hand and show your work. Remember to include the sequence alignment itself! Use the BLOSUM 62 scoring matrix and a gap creation penalty of 8. Do not use a separate gap extension penalty.

Sequence 1: SRYVRLALKEEDL

Sequence 2: RLDQEAL

Note: it is acceptable to check your work using a software implementation of the S.-W. algorithm. On the computers in the owl lab in Symons Hall you can use ssearch33, and in GCG, gap.
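
For checking a hand-built matrix, the Smith-Waterman recurrence can be sketched in Python. This is a minimal sketch and not a reference implementation for the course. The names smith_waterman and toy_score, and the +5/-4 match/mismatch values, are assumptions made for illustration only; the assignment itself calls for BLOSUM 62 scores, which would have to be supplied in place of toy_score. The gap penalty is treated as linear (8 per gapped position), which is one reading of "no separate gap extension penalty".

```python
def toy_score(a, b):
    """Hypothetical scoring function: NOT BLOSUM 62, illustration only."""
    return 5 if a == b else -4


def smith_waterman(seq1, seq2, score=toy_score, gap=8):
    """Local alignment with a linear gap penalty (each gapped position costs `gap`).

    Returns (best score, aligned seq1 fragment, aligned seq2 fragment).
    """
    rows, cols = len(seq1) + 1, len(seq2) + 1
    # H[i][j] = best score of a local alignment ending at seq1[i-1], seq2[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + score(seq1[i - 1], seq2[j - 1])
            up = H[i - 1][j] - gap      # gap in seq2
            left = H[i][j - 1] - gap    # gap in seq1
            H[i][j] = max(0, diag, up, left)  # the 0 makes the alignment local
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a cell with score 0.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(seq1[i - 1], seq2[j - 1]):
            top.append(seq1[i - 1])
            bottom.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            top.append(seq1[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(seq2[j - 1])
            j -= 1

    return best, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(smith_waterman("XXHELLOYY", "ZZHELLOWW"))
```

When several cells tie, this sketch reports only the first maximum it encounters and prefers the diagonal move during traceback, so a hand solution may legitimately show a different but equally good alignment.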


Assignment 2 - Due 10/21/2003 - 25 points

Consider the sequence hwk2003.1.txt, which is also available on locus at /fs/bsci348s/bioinf/hwk03/hwk2003.1.txt. What can you determine about this sequence? Prepare a brief (ca. 2-3 page) report on the sequence, including a feature map, descriptions of key features, information on the probable origin and identity of the sequence, and any other interesting tidbits you can discern from analysis of the sequence. Your report should be interpretive and well written, and should NOT include printouts of all analyses you performed. Rather it should document the logical process of hypothesis development and testing that you used to study the sequence, present printouts from analyses that provided key insights, and explain your rationale for your analyses and conclusions. The homework will be graded on the basis of accuracy, analytical rationale, and clarity of presentation.


Assignment 3 - Due 11/13/03 - 25 points

Text mutation. Consider the results from our text transcription exercise. The results are available as the original transcriptions, and in a FASTA format file with the whitespace removed and with a major text transposition aligned and coded as an additional character (o/i at the beginning of the matrix). Your assignment is to determine the "phylogenetic" tree that underlies these texts, using parsimony as an optimality criterion.

To do this you will probably want to do the following:

Import the FASTA file into clustalw. Clustalw will assume that these are amino acid data and will consequently strip out all of the punctuation and every letter "o". Set the protein weight matrix to "identity" and the output format to "nexus". Perform a multiple sequence alignment.

Open the nexus-format sequence alignment in PAUP* (or with another phylogenetic analysis program, e.g., phylip's SEQPARS). Set the optimality criterion to parsimony and perform a search. Turn in at least one of the most parsimonious trees; this may be printed, submitted electronically (via email), or if necessary hand-drawn.
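
To make the idea of tree "length" under parsimony concrete, the count for a single tree can be sketched with Fitch's algorithm. This is a minimal sketch under stated assumptions and not a description of what PAUP* does internally: characters are taken to be unordered and equally weighted, the tree is rooted and binary, gaps would be counted as just another state, and the taxon labels A to D and the function names fitch_changes and tree_length are hypothetical.

```python
def fitch_changes(tree, states):
    """Minimum number of changes for ONE character on a rooted binary tree.

    tree   : nested 2-tuples whose leaves are taxon names (strings)
    states : dict mapping taxon name -> character state at this position
    """
    def visit(node):
        if isinstance(node, str):            # leaf: its observed state, no cost
            return {states[node]}, 0
        left, right = node
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        shared = left_set & right_set
        if shared:                           # children agree: no extra change
            return shared, left_cost + right_cost
        # children disagree: take the union and count one change
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def tree_length(tree, alignment):
    """Sum the Fitch change counts over every column of an alignment."""
    n_columns = len(next(iter(alignment.values())))
    return sum(
        fitch_changes(tree, {name: seq[k] for name, seq in alignment.items()})
        for k in range(n_columns)
    )


if __name__ == "__main__":
    # Hypothetical four-taxon "text" alignment, for illustration only.
    alignment = {"A": "cat", "B": "cat", "C": "cot", "D": "cot"}
    print(tree_length((("A", "B"), ("C", "D")), alignment))
    print(tree_length((("A", "C"), ("B", "D")), alignment))
```

A parsimony search evaluates this kind of length for many candidate trees and keeps the shortest ones, which is why more than one "best" tree can be returned.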

In addition to submitting the most parsimonious tree, please answer the following questions:

How many "best" trees did you find?

What was their length?

If you found more than one equally parsimonious tree, how did the trees differ?

With what texts does the text transcribed by Thomas share the most recent common ancestry?

Which pair of texts would you expect to have fewer differences, Arya/WoeiJyh or Julia/Teresa?


Assignment 4 - Due 11/18/03 - 25 points

[See Handout]


Assignment 5 (term project) - Due 12/9/03 - 100 points

Consider these sequences. They were determined by a graduate student studying insects. The student designed primers intended to amplify the gene ddc, which encodes the protein dopa decarboxylase. These primers were used to perform RT-PCR (reverse transcriptase PCR, which starts with an RNA template) on RNA extracted from four insects (Alpha, Beta, Epsilon, and Pi), as indicated in the sequence file. The RT-PCR product from each insect was cloned, and several clones were sequenced from each clone library. In some cases more than one sequence was found. When only one sequence was found from a given insect, only one sequence is reported. When more than one sequence was found, each sequence is provided.

Prepare a detailed analysis of the sequences. Be sure to apply skills that you have learned throughout the semester to help you interpret the sequences. Your grade will be based on the overall quality of your report. It will take into account whether the analyses performed were thorough, thoughtful, and appropriate; whether your interpretation and conclusions were justified given the data and your analyses; and the quality of presentation, including clarity of prose, grammar, and spelling.

Your report should be roughly 10 (and must not exceed 15) pages of double-spaced 10-point type with 1-inch margins. Appropriate literature must be cited. The "literature cited" section and figures may be excluded from the page count, but figure captions should be included. The report should be in standard scientific format, with distinct introduction, materials and methods, results, and discussion sections. The manuscript should be assembled in the following order:

Introduction

Materials and Methods

Results

Discussion

Literature Cited

Figure Captions

Figures

 
