This week we will write another Perl program. This time we will implement the hash table of the highest-scoring words, as described for BLAST in class. Your program should take a string of amino acids as input and output every position of every three-character word that has one of the n highest scores (3).
I wrote a small Perl library, SubstMatrix.pm, which handles the work with substitution matrices. Its source contains perldoc documentation as well as some accessory code that should demonstrate everything you need to know to complete the homework. While you are at it, check out my implementation of the dynamic programming algorithm.
As an example, given the sequence 'ABCDEF', your program will need to be able to figure out that:

'ABC' = 17
'BCD' = 19
'CDE' = 20
'DEF' = 17
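
The example scores above can be reproduced with a short sliding-window sketch. This is only an illustration under stated assumptions, not the course solution: the %self_score table is a hypothetical stand-in for a real substitution matrix lookup (it does not use SubstMatrix.pm, whose interface is not shown here), and it assumes each word is scored against itself by summing per-residue self-match scores, which is consistent with the numbers listed. The names word_score and words_in are made up for this sketch.

```perl
use strict;
use warnings;

# ASSUMPTION: per-residue self-match scores, standing in for a real
# substitution matrix lookup. Only the letters of the toy example are covered.
my %self_score = (A => 4, B => 4, C => 9, D => 6, E => 5, F => 6);

# Score a word against itself: the sum of each residue's self-match score.
sub word_score {
    my ($word) = @_;
    my $total = 0;
    $total += $self_score{$_} for split //, $word;
    return $total;
}

# Return [word, position] pairs for every overlapping word of length $len.
# Positions are 1-based, matching the data structure on this page.
sub words_in {
    my ($seq, $len) = @_;
    my @words;
    for my $i (0 .. length($seq) - $len) {
        push @words, [ substr($seq, $i, $len), $i + 1 ];
    }
    return @words;
}

for my $pair (words_in('ABCDEF', 3)) {
    my ($word, $pos) = @$pair;
    printf "'%s' = %d (position %d)\n", $word, word_score($word), $pos;
}
```

A sequence of length L yields L - 2 overlapping three-character words, so 'ABCDEF' gives the four words listed above.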
My implementation builds a data structure that looks like this:
%hsw = (
  ABC => { 
    score => 17,
    positions => [1], },
  BCD => {
    score => 19,
    positions => [2], },
  CDE => {
    score => 20,
    positions => [3], },
  DEF => {
    score => 17,
    positions => [4], },
);
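
One way such a structure might be built and queried is sketched below. It is a hedged illustration rather than the actual implementation: %self_score is again a hypothetical stand-in for a substitution matrix lookup, build_hsw and top_words are names invented for this sketch, and "the n highest scores" is read as the n highest distinct score values (so ties are all reported), which is an assumption about the assignment.

```perl
use strict;
use warnings;

# ASSUMPTION: per-residue self-match scores, standing in for a real
# substitution matrix lookup.
my %self_score = (A => 4, B => 4, C => 9, D => 6, E => 5, F => 6);

# Build word => { score => ..., positions => [...] } for every
# overlapping word of length $len in $seq.
sub build_hsw {
    my ($seq, $len) = @_;
    my %hsw;
    for my $i (0 .. length($seq) - $len) {
        my $word = substr($seq, $i, $len);
        if (!exists $hsw{$word}) {
            my $score = 0;
            $score += $self_score{$_} for split //, $word;
            $hsw{$word} = { score => $score, positions => [] };
        }
        push @{ $hsw{$word}{positions} }, $i + 1;    # 1-based position
    }
    return %hsw;
}

# Return, in alphabetical order, the words whose score is among the
# $n highest distinct scores found in the table.
sub top_words {
    my ($n, %hsw) = @_;
    my %seen;
    my @scores = sort { $b <=> $a }
                 grep { !$seen{$_}++ }
                 map  { $_->{score} } values %hsw;
    return () if $n < 1 || !@scores;
    my $last   = $n < @scores ? $n : scalar @scores;
    my $cutoff = $scores[ $last - 1 ];
    my @top = grep { $hsw{$_}{score} >= $cutoff } keys %hsw;
    @top = sort @top;
    return @top;
}

my %hsw = build_hsw('ABCDEFABC', 3);
for my $word (top_words(2, %hsw)) {
    printf "%s score=%d positions=%s\n", $word, $hsw{$word}{score},
        join(',', @{ $hsw{$word}{positions} });
}
```

Storing positions as an array reference lets a word that occurs more than once keep all of its locations; in 'ABCDEFABC' the word ABC is recorded at positions 1 and 7.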

Given the following sequence:

> The N terminus of the CBT gene
WHANTHATAPRILLEWITHHERSHRESSTETHEDRGHTEFMARCHEHATHPERCEDTTHERTE
(10 points) Create a hash table, like the one described in class, of all words and their positions. (I would suggest two tables: one with all possible words and their scores, and another with all words in the amino acid sequence and their positions.)
(10 points) Find the location and score of the three-character words ILL and CED.
(5 points) Find the position of the highest-scoring three-character word.

Created: Mon Oct 4 02:23:45 EDT 2004 by Ashton Trey Belew
Last modified: Wed Oct 6 16:56:52 EDT 2004 by Ashton Trey Belew.