Bioinformatics Homework, Fall 2000


Homework #1 (20 points), due 9/15/2000.

Consider Genbank accession number: L13613

1) Is this a protein-coding sequence? If so, what protein does the sequence encode?

2) From what organism was the sequence isolated?

3) Perform a BLASTN search with this sequence. How many sequences did you retrieve? According to the annotations, what is the nature of those sequences (i.e., what *different* kinds of sequences did BLAST retrieve)?

4) Perform a BLASTX search with the sequence. Compare your results to the results of your BLASTN search. How do the results differ? Why?

5) Perform a PSIBLAST search starting with the sequence. How did the results of your PSIBLAST search differ from those of your "basic" BLAST searches?

6) Under what circumstances would you want to use each of these three different search algorithms?

You should be able to write satisfactory answers to these questions in one or two pages of typewritten text. Short, direct answers are preferred, and assignments that take more than three pages (total) of 10-point type will not be accepted.


Homework #2 (20 points), due Monday Oct. 2, 2000

Read the article listed below and write a review of the article. You should pretend that you have been given the paper by the editor of the journal in which it was published, but for the purposes of this class you do not need to be overly concerned with matters of style. Your central goal is to discuss the scientific merit of the paper; do this to the best of your ability given your current understanding of the subject. Your review should be less than three pages of double-spaced 12 point text.

Brenner, S.E., C. Chothia, and T.J.P. Hubbard. 1998. Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. Proc. Natl. Acad. Sci. USA 95:6073-6078. Available online at http://www.pnas.org/cgi/content/full/95/11/6073

For information on how to write a review, see http://www.life.umd.edu/MSyst/misc/review.html


Homework #3 (20 points), due Monday October 16, 2000

Use the Smith-Waterman algorithm to perform a local alignment of the sequences:

Sequence 1: FPGSEIPFV

Sequence 2: EFPGDIPVI

Use a BLOSUM62 scoring matrix. Be sure to show the basic formulae that you used, and show your calculations to receive partial credit.
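
As a way to check hand calculations, the Smith-Waterman recurrence can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: it uses a linear gap penalty (the assignment does not specify one), and the toy_score function (+2 match, -1 mismatch) is a hypothetical stand-in, NOT BLOSUM62. To use it for this problem you would need to supply a score function that looks up real BLOSUM62 values. Each cell is H(i,j) = max(0, H(i-1,j-1) + s(a_i, b_j), H(i-1,j) + gap, H(i,j-1) + gap), and the alignment is traced back from the highest-scoring cell until a zero is reached.

```python
def smith_waterman(seq1, seq2, score, gap=-2):
    """Local alignment with a linear gap penalty (an assumption of this sketch).

    score(a, b) returns the substitution score for residues a and b.
    Returns (best_score, aligned_seq1, aligned_seq2).
    """
    rows, cols = len(seq1) + 1, len(seq2) + 1
    # H[i][j] = best score of a local alignment ending at seq1[i-1], seq2[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + score(seq1[i - 1], seq2[j - 1])
            up = H[i - 1][j] + gap      # gap in seq2
            left = H[i][j - 1] + gap    # gap in seq1
            H[i][j] = max(0, diag, up, left)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    a1, a2 = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(seq1[i - 1], seq2[j - 1]):
            a1.append(seq1[i - 1])
            a2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            a1.append(seq1[i - 1])
            a2.append("-")
            i -= 1
        else:
            a1.append("-")
            a2.append(seq2[j - 1])
            j -= 1
    return best, "".join(reversed(a1)), "".join(reversed(a2))


def toy_score(a, b):
    """Hypothetical placeholder scoring: +2 match, -1 mismatch (not BLOSUM62)."""
    return 2 if a == b else -1


if __name__ == "__main__":
    print(smith_waterman("XXACGTYY", "ZZACGTWW", toy_score))
```

Passing the score function as an argument keeps the recurrence separate from the scoring matrix, so the same code works once a BLOSUM62 lookup is substituted for toy_score.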


Homework #4 (30 points), due Friday, November 10, 2000

A graduate student friend of yours has terrible habits in the maintenance of their laboratory notebook. The student has found the following sequence (which is also available in GCG format on UMBI, at /users/bsci348s/unknown.seq) among their records, but has no idea what the sequence is, or where it came from. What information can you provide that would help this student determine the nature and identity of this sequence? You should use GCG to analyze the sequence, but don't forget about the other skills you have developed in class (hint, hint). GCG programs that you might find particularly useful would include: MAP, CODONPREFERENCE, and FRAMES.

>unknown.seq

CCTTGATAAGTGCGTACGNCNAGGTTTTCCNATTCANANGTTNTAAAANG ACGGCCAGTGAATNGTAATACGACTCACTATAGGGCGAATTGGGTACCGG GCCCCCCCTCGAGGTCGACGGTATCGATAAGCTTGATatgAGTTCCAATC TAAAAAATAATGAATATAAAGAAGGATGTTATCCTTTGTTCTTTTTTGAA AATTTCTACGTAAAAGTCTCTATTAATACTGCTTATTATATACTTAAGAC AGACAAAAGAAAAAAAGATAAACAAATAAAATCCAATTTATTAAAAAAAA ACAATCAAATGATACCTTTCATTTTCTACTTATTAAACGATTAATTCGTA AGATAAGGCAACAAAATTTTAACTGGAGTGAATCATCTAGATTGTATGAT TTTTCTAATAAAATAGAACCTAATTATAAATATGAATATAATAGAATTAA GTTATTTTATATTTTATTAATAGAGAATTTGATATTTTTAGTATTACGAT TCTTATGGGAACAAAAACAAGAGAAAAAGAATGATTTTTCTCTTTTCATT AAAAAATCTATTCAATTTGCTTTTCCTTTTTTAGAACATAAAATGAGTAA TTCTGCTTCAATAATAGAAGGACAACTTTGTTTTTCTTATACAACTAGAA AGCTTAATTTTTTGCTTTTTTTTCTTTACAAGAGAATCCGCGATACTGTC TTTATAAATTTACTAAAAAAAATATTCAAGTTTAATAAATTACTTTTAAG GAAAGAGAATTATTTCAATGTAAATTATTTCAATGTAATGTCTAAAATCA GATTATTGGACTTATTAGCAAACTTATATGGAAATGAATTTGATTCTTTT TTTGTTTACAATATTTTAAAAATACATAACTTAAATTGTCTTTTTTTGCC ATATAAATCTATAGAAGATTATTCTTTACTACAAAAACACAATATTATTA TTAATAGTAATAGTTATAAAAATCAAATAAATATATCTTCTTTTTCTTGG TTAATTATCAATTTTATATATTCCATATACGGACACATTTTCTATATACG CCGTGGCATTTCATTTCTAATAATCCTTAAACTAGGACGAGGTTTCTCTC GATTTTGGAAATTTAATTGTGTCAAATTTATACAATTGAAATTAGAATCT AATCGTTCTTTTTATTTAATACAGTCACGGTTTGTTTTACGTCAAAGTTC GTTATTCTTAGGGTATAAAATTATAAATAGGTTTTGGCAAAAAAAACTAA AAATTAAAGCATCTTCTTGGTCTTTTTTTGTTTTTTTAAAAGATCGAAAA ATATCTTCAGAAATACCAATTGATAATCTTATTACTAATTTAACTGTAAT TAATTTATGTAATAAAAAAGGTTATCCAATTCATAAAGCTTCGTGGTCTA CATTTAGTGATCAACAAATTATAAAAATTTATAATAAAGTGTGGAATGAA TTATTTTTGTATTATTGTGGATCTTCGAATCGTTCTATTTTAACTCAAAT TCAGTATATTTTAGAATTTTCATGTATTAAAACTTTAGCTTTTAAACATA AATCTAATATTAGATTGGCATGGGAGCAATACAGAAAAGATGTGTCATTA TCCAACTTAGAAAGCGATATAGATTATTTTGGTAAAATCTCATATAATTT TCCTTCTTTATTTCAAAAAAAAAACTTTTTTTGGCTTTTAGGAATTTCTA GAATTGATCATCCAAATTCTTTTATTATTGAGTCATATTCAAGAATACAT GAGGAAAGCCGCTTGCATtgaATCGAATTCCTGCAGCCCGGGGGATCCAC TAGTTCTAGAGCGGCCGCCACCGCGGTGGAGCTCCAGCTTTTGTTCCCTT TAGTGAGGGTTAATTTCGAGCTTGGCGTAATCATGGTCATAGCTGTTTCC TGTGTGAAATTGTTATCCGCTCACAATTCCACACAACATACGAGCCGGAA CATAA
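
For readers without access to GCG, the kind of open-reading-frame scan that FRAMES performs can be approximated with a short Python sketch. The function name find_orfs and the min_codons threshold are hypothetical choices made for illustration, and the sketch assumes the standard stop codons (TAA, TAG, TGA) and an ATG start, which may not be appropriate for every genome. It is not a reimplementation of any GCG program.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code (an assumption)


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=50):
    """Scan all six reading frames for ATG...stop open reading frames.

    Returns a list of (strand, frame, start, end, n_codons) tuples.
    Coordinates are 0-based on the strand that was scanned; end is
    exclusive and includes the stop codon; n_codons excludes the stop.
    """
    seq = "".join(seq.split()).upper()  # drop whitespace, normalize case
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(s) - 2, 3):
                codon = s[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos
                elif start is not None and codon in STOPS:
                    n_codons = (pos - start) // 3
                    if n_codons >= min_codons:
                        results.append((strand, frame, start, pos + 3, n_codons))
                    start = None
    return results


if __name__ == "__main__":
    print(find_orfs("CCATGAAACCCTAGGG", min_codons=1))
```

Codons containing N never match ATG or a stop codon here, so ambiguous bases are simply read through; a more careful analysis would treat them explicitly.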


Homework #5 (30 points), due Monday, November 27, 2000

Review the following paper:

Ng, W.V. et al. 2000. Genome Sequence of Halobacterium species NRC-1. Proc. Natl. Acad. Sci. USA 97:12176-12181.

This article is available online at http://www.pnas.org/all.shtml from the University of Maryland.

Some additional guidance:

Your review should be less than three pages of text. It will be graded on how well you identify the important issues raised by the paper and how clearly you address them. This means that you will have to determine what is most important about the paper and discuss those issues efficiently. You should consider the entire paper, but concentrate most heavily on issues that are distinctive to this paper; unless you are addressing a specific problem, do not use a lot of text to describe previous research or the materials and methods. Do not get bogged down in irrelevant detail! Try to understand the overall structure of the paper and how the authors present information, since this will indicate what topics they consider to be most important. Do you agree with them? Why, or why not? What important topics did they leave out? Can you guess why they left out any topics? Your review does not need to cover every detail of the paper, but should reflect an understanding of the structure of the paper and the relative importance of different subjects. It is reasonable to assume that a complete microbial genome is of sufficient inherent importance to justify publication, but you should address whether or not you feel this paper does a good job of describing the content and significance of the genome.


Homework #6 (30 points), due Wednesday, December 6, 2000

Our disorganized graduate student has been at it again. The student has found the following four sequences among their records. What can you tell them about each of the sequences? Note that there are FOUR sequences, which are named "sequence 2" through "sequence 5." GCG format files are available in /users/bsci348s, with the names unknown2.seq, etc. Remember to read the GCG help files, and remember to think your way through this.

>UNKNOWN2 unknown sequence #2

TNNNNNCTTTTGAAACCNTTTTNNAAANACGTGACACTATNGAATACTCAAGCTATGCAT CCAACGCGTTGGGAGCTCTCCCATATGGTCGACCTGCAGGCGGCCGCACTAGTGATTGAT GACCTGCATGAAACATAGAACGATTCGATAGCTGTCTTGCATACATGTGCTGGTAAAATA TTAGGTAGTGTGAAGATACGCTAACAAGGTTATAGGACAAAAAGACCCTGTGAAGCTTTA TAGGGATAAAATTTAGTACAGGTTCTATATCAATATATCGAATGGGTGAAAAGCTATAAC CATTCATTTCAAATACAATATCTCCATTTGAATAAAAAGCATTCTTATGCTGTAAATTTA TAAATTTAGACTCATTAATCTGTGCAGTAGCATAGTTTTATCTGATGTCTTATCTTTGAG ATGTATGATGGGCTCTTTAATTGGGGCGATAACCTTCTTTAATCAAACCAAAGGNGAACA AAGCTATTCTTTTGGCCTTATTCAAGGTGTTTTTATATCATGAGATTTATATTTCATGTT TCAGCCCTACAAAGATAGATGTGTGGCTTAATGGTCAAGCTTTGTTTGATCCAGTATTTA AATGGTATCTTAATGACCTTAAAATGAATCTACCAAATAAAAGCTACTCAAGGGATAACA GGCTTAATAAGCATTATAGCTCACATTAAATGAAGGCTTTGGCACCTCGATGTCGGCTTA TNGAAACCCAATTTTAGGAATACCAATATTTGGTGGGAGTGCTCATCCCATAAGAAAGGT CCGTGAGCTGGGCC

 

>UNKNOWN3 unknown sequence #3

CCCCCCCCCCCGCCCATGCAGTCCAAACTAGTATGTGTCATYMWRYTTTTTTTTTCACCC CCTCATGCAGTCCAAACTTTTTTTTTTTTCCACCCCCTCATGCAGTCCAAACTTTTTTTT TTYYWYTTTTTTGTGTGTCTGTGTTTGACCACACCCCCTCATGCAGTCCAAAMTAGTATG TCTCCTTTTTTTTTTTGKTTTTTTTTTTTGTGTGTGTTTGTTTGACGCCCCTTCCCCCCA TGCAGTCCAAACTAGTATGTGTCATTCTGTTTTTTTTTTTTTGTCTGTCTGTGTTTGTTT GACGCCCCTTCCCCCTCATGCAGTCCAAACTAGTATGTGTCATTCTGTTTTCTTTTTTTT TTTGTCTGTCTGTGTTTGAACCCCCCCCCCCGCCCATGCAGTCCAAACTAGTATGTGTCA TTCTGTTTTTTTTTTCACCCCCTCATGCAGTCCAAACTTTTTTTTTTTCCACCCCCTCAT GCAGTCCAAACTTTTTTTTTTTTTCACCCCCTCATGCAGTCCAAACTTTTTTTTTTTTTC ACCCCCTCATGCAGTCCAAACTTTTTTTTTTYTTCACCCCCTCATGCAGTCCAAACTKTT TTTTTTTYTTTTCACCCCCTCATGCAGTCCAAACTTTTTTTTTTYTCACCCCCTCATGCA GTCCAAACTTTTTTTTTTTTTRKSAKTTTKTTTKWYTYYYTTTYYYYYYWYKYWKTCYWW WKTWKTGTGTGTGWKTYTGTTTGACNCCCCTTCGCCCCATGCAGTCCAAACTAGTATGNG TCATTCTGTTTTTTTTTTTTTGNCTGTCTGNGTTTGTTTGACNCCCC

 

>UNKNOWN4 Unknown sequence #4

CCTGCTCGAANTAACCCTCACTAAAGGGAACAAAAGCTGGAGCTCCACCGCGGTGGCGGC CGCTCTAGAACTAGTGGATCCCCCGGGCTGCAGGAATTCTCTTTCATACCCATTAAGAAA ACCAAATTCTCCTTTCATAAATTTACAAAATCCAAGAATTACAGCTTTAGGCAAGCGAGA AATCATGAAGAGCGAGAGCTCTTTTCATAATCCAAACATGGATCTCGATCAGAAAACTAT GTCTCAAGCGAAGGCTCAAGCCTTAGCTTAAGAACAATTTTGGAGAAATTCTAAGCGAGA CCAAAGCTCAGCGCTTGGCGTCGCTCAAGCCTTGGATAAGCGAGGCTCAAGAAATATTCA GCTTCAGGCTCGATCCAGAGCCTTGGATAAGCGAGGCTCTGGCTCCATCCAGCTCAAAAA TATTGAGCCGAGGGCTCAAAAAACCTCTCCTTATTATAGTAAATTTCAATCTCTAGGAAG TGCCCAGCGTTTGCCTCAAACTCCTCAGAATTTGCGAGCAAAGTCGCAGCGTTTTTGAGT GAGAAATTGGAGAAATCTTCAAATCCAGAAATTTCTCAACCCATCCCAAAAATAAATGCT TCTTATATTAGTAAATTTCAATCTCTAGGAAGTGCCCAGCGTTTGCCTCAAACTCCTCAG AATTTGCGAGCAAAGTCGCAGCGTTTTTGAGTGAGAAATTGGAGAAATCTAAGCCAGACA AATTCTGGTGGGCCAGGGAGCCAAGCCCAAGCTCAGCGGAGCCAAGCCCTGACGGGCAAA ACTGAGCTGAGAAAATAAATAACTTCC

 

>UNKNOWN5 unknown sequence #5

AAAGATTAAGCCATGCATGTCTAAGTATAAGCAATTTATACAGTGAAACTGCGAATGGCT CATTAAATCAGTTATCGTTTATTTGATAGTTCCTTTACTACATGGTATAACTGTGGTAAT TCTAGAGCTAATACATGCTTAAAATCTCGACCCTTTGGAAGAGATGTATTTATTAGATAA AAAATCAATGTCTTCGGACTCTTTGATGATTCATAATAACTTTTCGAATCGCATGGCCTT GTGCTGGCGATGGTTCATTCAAATTTCTGCCCTATCAACTTTCGATGGTAGGATAGTGGC CTACCATGGTTTCAACGGGTAACGGGGAATAAGGGTTCGATTCCGGAGAGGGAGCCTGAG AAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAATTACCCAATCCTAATTCAGG GAGGTAGTGACAATAAATAACGATACAGGGCCCATTCGGGTCTTGTAATTGGAATGAGTA CAATGTAAATACCTTAACGAGGAACAATTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGT AATTCCAGCTCCAATAGCGTATATTAAAGTTGTTGCAGTTAAAAAGCTCGTAGTTGAACT TTGGGCCCGGTTGGCCGGTCCGATTTTTTCGTGTACTGGATTTCCAACGGGGCCTTTCCT TCTGGCTAACCTTGAGTCCTTGTGGCTCTTGGCGAACCAGGACTTTTACTTTGAAAAAAT TAGAGTGTTCAAAGCAGGCGTATTGCTCGAATATATTAGCATGGAATAATAGAATAGGAC GTTTGGTTCTATTTTGTTGGTTTCTAGGACCATCGTAATGATTAATAGGGACGGTCGGGG GCATCAGTATTCAATTGTCAGAGGTGAAATTCTTGGATTTATTGAAGACTAACTACTGCG AAAGCATTTGCCAAGGACGTTTTCATTAATCAAGAACGAAAGTTAGGGGATCGAAGATGA TCAGATACCGTCGTAGTCTTAACCATAAACTATGCCGACTAGGGATCGGGTGGTGTTTTT TTAATGACCCACTCGGCACCTTACGAGAAATCAAAGTCTTTGGGTTCTGGGGGGAGTATG GTCGCAAGGCTGAAACTTAAAGGAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCG GCTTAATTTGACTCAACACGGGGAAACTCACCAGGTCCAGACACAATAAGGATTGACAGA TTGAGAGCTCTTTCTTGATTTTGTGGGTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGT GATTTGTCTGCTTAATTGCGATAACGAACGAGACCTTAACCTACTAAATAGTGGTGCTAG CATTTGCTGGTTATCCACTTCTTAGAGGGACTATCGGTTTCAAGCCGATGGAAGTTTGAG GCAATAACAGGTCTGTGATGCCCTTAGACGTTCTGGGCCGCACGCGCGCTACACTGACGG AGCCAGCGAGTCTAACCTTGGCCGAGAGGTCTTGGTAATCTTGTGAAACTCCGTCGTGCT GGGGATAGAGCATTGTAATTATTGCTCTTCAACGAGGAATTCCTAGTAAGCGCAAGTCAT CAGCTTGCGTTGATTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTAGTACCGATTG AATGGCTTAGTGAGGCCTCAGGATCTGCTTAGAGAAGGGGGCAACTCCATCTCAGAGCGG AGAATTTGGACAAACTTGGTCATTTAGAGGAACTAAAAGTCGTAACAAGGTTTCCGTAGG TGAACCTGCGGAAGGATCATTA
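
Before running any searches, it can help to get a quick overview of each unknown sequence: its length, base composition, and how many positions are ambiguous (N or other IUPAC codes). The following is a minimal Python sketch of such a summary; the function name summarize and the returned dictionary keys are hypothetical, chosen only for illustration.

```python
from collections import Counter


def summarize(seq):
    """Return length, GC fraction, and ambiguous-base count for a DNA string.

    Whitespace is ignored and case is normalized. The GC fraction is
    computed over unambiguous bases (A, C, G, T) only.
    """
    seq = "".join(seq.split()).upper()
    counts = Counter(seq)
    unambiguous = sum(counts[base] for base in "ACGT")
    gc = counts["G"] + counts["C"]
    return {
        "length": len(seq),
        "gc_fraction": gc / unambiguous if unambiguous else 0.0,
        "ambiguous": len(seq) - unambiguous,  # N and other IUPAC codes
        "counts": dict(counts),
    }


if __name__ == "__main__":
    print(summarize("ACGT NNRY\nGGCC"))
```

A large number of ambiguous positions or a strongly skewed composition is worth noting in your answer, since both affect how database search results should be interpreted.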