Data Sources: DNA, RNA, peptide sequences 
Sequence level data 
  - Advantages 
    
      - homology is (relatively) easy to determine 
- evolution is well characterized (ML models) 
- Give lots of characters although one gene alone is still not generally 
        enough 
        
          - ~1-10 kb of reasonably variable sequence for a big phylogeny 
 
- Methods are easy 
 
- Disadvantages 
    
      - Do not sample entire genome 
        
          - consequently get single gene phylogeny; this may be an advantage 
            or disadvantage
- always try to remember the constraints of the system you are working 
            with 
 
- Relatively expensive
 
- Organellar genes 
    
      - Remember that inheritance is typically uniparental
- Usually single copy -- this is nice, no confusion with paralogy 
- However, in some cases the cell may have a population of different organellar 
        genomes 
        
          - "Calibrating the Mitochondrial Clock"; Science 279:28 
            (1998)
 
 
- Nuclear genes 
    
      - Plants and animals are diploid or polyploid 
        
          -  Protists may be haploid, diploid, or polyploid
 
- Nuclear genes are often in gene families 
- With sexual reproduction, things get complicated 
 
Nucleic Acid Sequences
  - Draw nucleic acid sequences with 5' end on left, 3' end on right
- Draw amino acid sequences with amino terminus to left, carboxyl terminus 
    to right
- First codon in coding sequence is (usually) Methionine (= start codon) 
- Stop codon is somewhat variable between taxa, but is often TAA
- Several genes are often encoded in an operon -- transcribed together
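
As a minimal sketch of the conventions above (sequence written 5' to 3', ATG as the usual start codon, and TAA, TAG, or TGA as stops under the standard code), the hypothetical helper `first_orf` below scans one strand for the first open reading frame. The function name and the example sequence are illustrative assumptions, not part of these notes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard-code stops; other codes differ


def first_orf(dna):
    """Return the first ATG...stop reading frame in a 5'->3' DNA string, or None."""
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        # Walk codon by codon, in frame with this start codon.
        for i in range(start, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return dna[start:i + 3]
        # No in-frame stop found; try the next ATG.
        start = dna.find("ATG", start + 1)
    return None


print(first_orf("CCATGAAATTTTAAGG"))
```

Only the given strand is searched; a fuller search would also scan the reverse complement.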
Transcription & Translation
  - Transcription 
    
      - DNA is read by RNA polymerase; mRNA produced 
- RNA is then processed in various ways ("post-transcriptional" 
        modification) 
        
          - Introns are removed (or self-splice) 
- capping of 5' end
- poly-A tail added 
            
              - nuclear genome only 
-  export and stabilization 
 
 
 
- Translation 
    
      - mRNA is read by ribosome; polypeptide (protein) produced 
- Polypeptides can also be modified ("post-translational modification") 
        
          - Modification of some amino acids 
- 'peptide introns' 
- activation
- transit peptides removed 
 
- Chaperonins help fold peptides 
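
The two steps above can be sketched in a few lines, assuming the standard genetic code and ignoring all processing (splicing, capping, post-translational modification). The names `transcribe` and `translate` are illustrative; the amino acid string is the standard code listed in T, C, A, G codon order.

```python
from itertools import product

BASES = "TCAG"
# Amino acids of the standard code, in TCAG x TCAG x TCAG codon order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon).replace("T", "U"): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand):
    """mRNA has the same sequence as the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Translate from the first AUG to the first stop codon (one-letter codes)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon: release the polypeptide
            break
        peptide.append(aa)
    return "".join(peptide)


print(translate(transcribe("ATGGCCTAA")))
```

Ambiguous bases (e.g. N) are not handled in this sketch and would raise a KeyError.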
 
RNA Sequences 
  - Direct sequencing of rRNA 
    
      - Lots of ribosomal RNA in cell
- This can be sequenced directly 
- Error prone 
        
          -  Reverse transcriptase makes relatively frequent mistakes
-  2° structure knocks off polymerase 
 
- No longer widely used, but many direct rRNA sequences are in GenBank
 
- cDNA 
    
      - Make DNA from RNA sequence 
        
          - Use reverse transcriptase to make a complementary DNA strand 
- Digest RNA with RNase 
- Fill out strand with DNA polymerase 
- Amplify, clone, whatever 
- Yields processed sequences 
 
- DNA is more stable than RNA
- Introns will have been removed (although of course this depends upon 
        the kinetics of RNA processing)
- In some organisms alternate splicing of RNA can be important in gene 
        expression. 
- Verifies that gene is actually expressed.
-  Often important if PCR produces a 'weird' sequence.
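
The strand relationships in cDNA synthesis can be illustrated in silico. This is a sketch under the assumption of a fully processed mRNA and error-free enzymes; `first_strand_cdna` and `second_strand` are hypothetical names for the reverse transcriptase and DNA polymerase steps.

```python
RNA_TO_DNA = str.maketrans("ACGU", "TGCA")  # RNA base -> complementary DNA base
DNA_TO_DNA = str.maketrans("ACGT", "TGCA")  # DNA base -> complementary DNA base


def first_strand_cdna(mrna):
    """Return the first-strand cDNA (written 5'->3') complementary to an mRNA."""
    return mrna.upper().translate(RNA_TO_DNA)[::-1]


def second_strand(cdna):
    """Fill out the second strand; it matches the mRNA with T in place of U."""
    return cdna.upper().translate(DNA_TO_DNA)[::-1]


first = first_strand_cdna("AUGGCC")
print(first, second_strand(first))
```

Both results are written 5' to 3', which is why each complement is reversed.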
 
DNA Sequences 
PCR -- The polymerase chain reaction 
  - Put in a test tube: 
    
      - Primers that match a known region of template DNA 
        
          - Degenerate primers have broader specificity 
- Use a molar excess of primers (Michaelis-Menten kinetics) 
- one primer at 5' end, one at 3' end 
-  most critical bases are those at 3' end of primer 
- DNA polymerases cannot start a new strand on bare single-stranded DNA; they 
            only extend an existing 3' end, so a primer is needed to initiate polymerization
 
- A thermostable DNA polymerase 
        
          - Several enzymes, inc. Taq polymerase (from Thermus aquaticus) 
          
- Taq is tolerant, but error prone 
- Enzymes with higher fidelity are also available, e.g., Pfu polymerase 
            (Pyrococcus furiosus) 
 
- A reaction buffer suitable for the enzyme
- Magnesium - influences stringency of reaction
- the four deoxynucleotides (ACGT), in approximately equal abundance
- Template DNA
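
To make "degenerate primers have broader specificity" concrete, the sketch below expands a degenerate primer into the pool of sequences it represents, using the IUPAC nucleotide ambiguity codes. The helper names `expand_primer` and `degeneracy` and the example primers are illustrative assumptions.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer):
    """List every concrete sequence represented by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(seq) for seq in product(*choices)]


def degeneracy(primer):
    """Number of distinct sequences in the degenerate primer pool."""
    n = 1
    for base in primer.upper():
        n *= len(IUPAC[base])
    return n


print(expand_primer("ATGR"), degeneracy("ACNGGY"))
```

The pool size grows multiplicatively with each ambiguous position, so the concentration of any one exact-match primer falls accordingly.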
 
- Temperature cycle 
    
      - Melt - denature the DNA (ca. 94°C)
- Anneal - high annealing temperature for high stringency, low annealing 
        temperature for low stringency (ca. 55°C)
- Extend - at optimal temperature for polymerase activity (72°C for 
        Taq polymerase)
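
A common rule of thumb for choosing an annealing temperature, not given in these notes and added here only as an assumption, is the Wallace rule for short oligonucleotides: Tm is roughly 2 °C per A or T plus 4 °C per G or C. The sketch below applies it; the 5 °C offset below Tm is likewise a conventional starting point rather than a fixed rule, and the function names are hypothetical.

```python
def wallace_tm(primer):
    """Rough melting temperature (deg C) of a short oligo: 2(A+T) + 4(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def suggested_annealing(forward, reverse, offset=5):
    """Start a few degrees below the lower of the two primer Tm estimates."""
    return min(wallace_tm(forward), wallace_tm(reverse)) - offset


print(wallace_tm("ATGCATGCATGCATGCAT"))
```

Raising the annealing temperature toward Tm increases stringency; lowering it tolerates more primer-template mismatch.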
 
- Can greatly amplify a chosen DNA sequence.
- Things to know about 
    
      - Exonuclease activity of polymerases influences effective primer sensitivity
- Risk of contaminants! PCR may amplify gene from DNA other than the intended 
        target. 
- May produce a mix of different products! 
 
- Environmental DNA 
    - The ultimate in PCR of contaminants 
- Allows analysis of taxa that cannot be cultured 
- Revolutionizing microbial ecology 
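
Some simple arithmetic shows why PCR amplifies so strongly and why contaminants matter: under ideal conditions each cycle doubles the number of copies, so even a single stray template molecule is amplified about a billion-fold in 30 cycles. The function below is a sketch; the `efficiency` parameter (1.0 meaning perfect doubling) is an assumed simplification, since real reactions plateau.

```python
def copies_after(cycles, start_copies=1, efficiency=1.0):
    """Expected copies after PCR; efficiency=1.0 means perfect doubling per cycle."""
    return start_copies * (1 + efficiency) ** cycles


print(copies_after(30))
```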
Cloning 
  - An alternative way to generate large quantities of a sequence of interest 
    
      - Use engineered bacterial plasmid (or other vector) 
- Cut template DNA with restriction enzyme(s) 
- Cut plasmid with the same enzyme 
- Mix the two, allow annealing
- Ligate with DNA Ligase 
- Transform a bacterial cell with the new (chimaeric) plasmid 
- Grow up bacterium 
        
          - Plasmid is engineered to make it easy to work with, e.g., 
            
              - vector confers antibiotic resistance
- color change if insert is present 
 
 
- (Perhaps) use phage properties to generate single-stranded DNA 
        
          - Single stranded DNA is easy to sequence
 
 
- Get lots of DNA, even by PCR standards 
- Clones are easy to store & relatively stable 
- Each clone is unique, i.e., derived from a single DNA molecule even if a 
    population of sequences were present in the original template. 
    
      - Screen clones for desired properties 
- Cloning of PCR product can be used to study complex PCR product 
 
- More complex than PCR, slower, more specialized facilities
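
The restriction step can be sketched as a string operation. The example assumes EcoRI, which recognizes GAATTC and cuts after the G (G^AATTC); the function name `digest` and the test sequence are illustrative, and only the top strand of a linear molecule is modeled.

```python
def digest(dna, site="GAATTC", cut_offset=1):
    """Cut a linear DNA string at every occurrence of a recognition site.

    cut_offset is the position within the site where the top strand is cut;
    the defaults model EcoRI (G^AATTC).
    """
    dna = dna.upper()
    fragments, last = [], 0
    pos = dna.find(site)
    while pos != -1:
        fragments.append(dna[last:pos + cut_offset])
        last = pos + cut_offset
        pos = dna.find(site, pos + 1)
    fragments.append(dna[last:])  # whatever follows the final cut
    return fragments


print(digest("AAGAATTCCCGAATTCTT"))
```

Because template and plasmid are cut with the same enzyme, every cut end carries the same AATT overhang, which is what lets insert and vector anneal before ligation.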
OK, so you've got lots of DNA... 
Sanger Dideoxy Sequencing 
  - Melt template 
- Anneal a sequencing primer 
- Nondegenerate if possible 
- Only one direction 
- Label with radioisotopically or fluorescently labeled nucleotides 
- Synthesize DNA in four vials, each with a small fraction of one dideoxynucleotide 
    (ddA, ddC, ddG, ddT) 
- dideoxynucleotide will terminate DNA polymerization upon incorporation 
- run out on acrylamide gel. 
- read like climbing a ladder 
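
The logic of the four-lane ladder can be simulated in a few lines. This is a sketch assuming ideal chemistry, in which every possible termination product is present and resolved; `sanger_lanes` and `read_gel` are hypothetical names.

```python
def sanger_lanes(synthesized):
    """Map each base to the fragment lengths that end in its dideoxy lane.

    `synthesized` is the newly made strand, 5'->3', starting after the primer.
    """
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(synthesized.upper(), start=1):
        # A chain terminated by ddN at this position has this length.
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read the ladder from the shortest fragment (bottom of the gel) upward."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = sanger_lanes("GATTC")
print(lanes, read_gel(lanes))
```

The sequence read this way is that of the synthesized strand, i.e., the complement of the template.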
Automated sequencing 
Peptide Sequences 
  - Draw with amino terminus on L, carboxy terminus on R 
- Edman degradation -- classical method, often still used 
- Label amino terminus with PITC 
- Release terminal AA by cyclizing at different pH 
- Repeat 
- Lots of related methods 
- Commercially available -- just 'send it out' 
- Requires a fair bit of the peptide 
- Can only sequence relatively short piece 
- Requires fairly pure protein 
- Mass Spectrometry -- can also do DNA sequencing 
- Not widely used 
- Allows sequence from small quantity of polypeptide 
- Moderate mixtures are no problem 
- Requires expertise and expensive instrumentation
-  It is often easiest to determine DNA sequence first, then translate (electronically) 
  
- But nothing can actually substitute for a genuine peptide sequence when 
    needed. 
- Not all compartments use the same genetic code 
- Protistologists are advised to verify use of the 'universal' genetic code. 
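
As an illustration of why the code in use matters, the sketch below translates the same DNA with the standard code and with the vertebrate mitochondrial code, which differs at four codons (TGA = Trp, ATA = Met, AGA and AGG = stop). The vertebrate mitochondrial code is chosen here only as a well-known example; the helper names and test sequence are assumptions.

```python
from itertools import product

BASES = "TCAG"
# Standard code amino acids in TCAG x TCAG x TCAG codon order ('*' = stop).
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_table(amino_acids, overrides=None):
    """Build a codon -> amino acid dict, optionally overriding some codons."""
    table = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), amino_acids)}
    table.update(overrides or {})
    return table


STANDARD_TABLE = build_table(STANDARD)
# Vertebrate mitochondrial code, expressed as differences from the standard code.
VERT_MITO_TABLE = build_table(
    STANDARD, {"TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}
)


def translate(dna, table):
    """Translate codon by codon from position 0; '*' marks a stop codon."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(table[c] for c in codons)


seq = "ATGATATGAAGA"
print(translate(seq, STANDARD_TABLE), translate(seq, VERT_MITO_TABLE))
```

A sequence that appears to contain a premature stop under one code may be a perfectly good reading frame under another.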
  
Molecular Biological Codes and Abbreviations