Data Sources: DNA, RNA, peptide sequences

Sequence level data

  1. Advantages
    1. Homology is (relatively) easy to determine
    2. Sequence evolution is well characterized (explicit models, e.g. for maximum likelihood)
    3. Gives lots of characters, although one gene alone is generally still not enough
      1. ~1-10 kb of reasonably variable sequence for a big phylogeny
    4. Methods are easy
  2. Disadvantages
    1. Do not sample entire genome
      1. consequently get single gene phylogeny; this may be an advantage or disadvantage
      2. always try to remember the constraints of the system you are working with
    2. Relatively expensive
  3. Organellar genes
    1. Remember that inheritance is typically uniparental
    2. Usually single copy -- this is nice, no confusion with paralogy
    3. However, in some cases the cell may have a population of different organellar genomes
      • "Calibrating the Mitochondrial Clock"; Science 279:28 (1998)
  4. Nuclear genes
    1. Plants and animals are diploid or polyploid
      • Protists may be haploid, diploid, or polyploid
    2. Nuclear genes are often in gene families
    3. With sexual reproduction, things get complicated

Nucleic Acid Sequences

Transcription & Translation
  1. Transcription
    1. DNA is read by RNA polymerase; mRNA produced
    2. RNA is then processed in various ways ("post-transcriptional" modification)
      1. Introns are removed (or self-splice)
      2. 5' end is capped and a poly-A tail is added to the 3' end
        • nuclear genome only
        • export and stabilization
  2. Translation
    1. mRNA is read by ribosome; polypeptide (protein) produced
    2. Polypeptides can also be modified ("post-translational modification")
      • Modification of some amino acids
      • 'peptide introns'
      • activation
      • transit peptides removed
    3. Chaperonins help fold peptides
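
The transcription, splicing, and translation steps above can be sketched on a toy sequence. This is a minimal illustration, not a gene-prediction tool: the gene, the intron coordinates, and the function names are all made up for the example, the standard genetic code is assumed, and capping, polyadenylation, and post-translational modification are ignored.

```python
# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_strand):
    """Return the RNA that matches the DNA coding strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def splice(pre_mrna, introns):
    """Remove introns given as 0-based, half-open (start, end) intervals."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])
        position = end
    exons.append(pre_mrna[position:])
    return "".join(exons)


def translate(mrna):
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3].replace("U", "T")]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# Toy gene: two exons separated by a hypothetical 12-base intron.
gene = "ATGGCC" + "GTAAGTTTTCAG" + "AAATGGTAA"
pre_mrna = transcribe(gene)
mrna = splice(pre_mrna, [(6, 18)])  # intron coordinates are assumed known
print(mrna)             # AUGGCCAAAUGGUAA
print(translate(mrna))  # MAKW
```

In practice the intron boundaries are not known in advance, which is one reason cDNA is useful.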
RNA Sequences
  1. Direct sequencing of rRNA
    1. Lots of ribosomal RNA in cell
    2. This can be sequenced directly
    3. Error prone
      1. Reverse transcriptase makes relatively frequent mistakes
      2. Secondary (2°) structure in the rRNA knocks off the polymerase
    4. No longer widely used, but many direct rRNA sequences are in GenBank
  2. cDNA
    1. Make DNA from RNA sequence
      1. Use reverse transcriptase to make a complementary DNA strand
      2. Digest RNA with RNase
      3. Fill out strand with DNA polymerase
      4. Amplify, clone, whatever
      5. Yields processed sequences
    2. DNA is more stable than RNA
    3. Introns will have been removed (although of course this depends upon the kinetics of RNA processing)
    4. In some organisms alternative splicing of RNA can be important in gene expression.
    5. Verifies that gene is actually expressed.
    6. Often important if PCR produces a 'weird' sequence.
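
The strand relationships in cDNA synthesis can be sketched as follows. This is only a sequence-level illustration under simple assumptions: the mRNA is a made-up toy sequence, the function names are hypothetical, and priming, enzyme errors, and incomplete extension are ignored.

```python
# Base-pairing rules for copying RNA into DNA and DNA into DNA.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna):
    """DNA strand complementary to the mRNA, written 5'->3' (reverse transcriptase step)."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna.upper()))


def second_strand(first_strand):
    """DNA strand complementary to the first strand, written 5'->3' (DNA polymerase step)."""
    return "".join(DNA_TO_DNA[base] for base in reversed(first_strand.upper()))


mrna = "AUGGCCAAAUGGUAA"          # toy processed (intron-free) message
first = first_strand_cdna(mrna)
second = second_strand(first)
print(first)   # TTACCATTTGGCCAT
print(second)  # ATGGCCAAATGGTAA
```

The second strand carries the same sequence as the processed mRNA with T in place of U, which is why cDNA yields intron-free sequence.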
DNA Sequences
PCR - The polymerase chain reaction
  1. Put in a test tube:
    1. Primers that match a known region of template DNA
      1. Degenerate primers have broader specificity
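      2. Use a molar excess of primers (Michaelis-Menten kinetics)
      3. One primer at each end of the target region, on opposite strands, with their 3' ends pointing toward each other
      4. Most critical bases are those at the 3' end of the primer
      5. DNA polymerases cannot start a new strand on a bare single-stranded template; they only extend an existing 3' end, so a primer is needed to initiate polymerization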
    2. A thermostable DNA polymerase
      1. Several enzymes are available, including Taq polymerase (from Thermus aquaticus)
      2. Taq is tolerant, but error prone
      3. Enzymes with higher fidelity are also available, e.g., Pfu polymerase (from Pyrococcus furiosus)
    3. A reaction buffer suitable for the enzyme
    4. Magnesium - influences stringency of reaction
    5. The four deoxynucleotide triphosphates (dATP, dCTP, dGTP, dTTP), in approximately equal abundance
    6. Template DNA
  2. Temperature cycle
    1. Melt - denature the DNA (ca. 94°C)
    2. Anneal - high annealing temperature for high stringency, low annealing temperature for low stringency (ca. 55°C)
    3. Extend - at optimal temperature for polymerase activity (72°C for Taq polymerase)
  3. Can greatly amplify a chosen DNA sequence.
  4. Things to know about
    1. 3'→5' exonuclease (proofreading) activity of some polymerases can trim the 3' end of primers, which influences effective primer specificity
    2. Risk of contaminants! PCR may amplify gene from DNA other than the intended target.
    3. May produce a mix of different products!
  5. Environmental DNA
    1. The ultimate in PCR of contaminants
    2. Allows analysis of taxa that cannot be cultured
    3. Revolutionizing microbial ecology
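
Two ideas from the PCR notes above lend themselves to a short sketch: how many distinct oligonucleotides a degenerate primer actually represents, and how quickly an idealized reaction amplifies its target. The primer below is a made-up example and the function names are hypothetical. The copy-number formula assumes a constant per-cycle efficiency, which real reactions do not maintain once primers and nucleotides are depleted.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


def degeneracy(primer):
    """Number of distinct sequences in the primer pool."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def ideal_copies(template_copies, cycles, efficiency=1.0):
    """Idealized yield: each cycle multiplies the target by (1 + efficiency)."""
    return template_copies * (1 + efficiency) ** cycles


primer = "ATGGCNYT"                   # hypothetical degenerate primer
print(degeneracy(primer))             # 8
print(expand_degenerate(primer)[:3])  # first few members of the pool
print(ideal_copies(1, 30))            # 2**30 copies from one template at 100% efficiency
```

Higher degeneracy broadens specificity but dilutes each individual sequence in the primer pool.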
Cloning
  1. An alternative way to generate large quantities of a sequence of interest
    1. Use engineered bacterial plasmid (or other vector)
    2. Cut template DNA with restriction enzyme(s)
    3. Cut plasmid with the same enzyme
    4. Mix the two, allow annealing
    5. Ligate with DNA Ligase
    6. Transform a bacterial cell with the new (chimaeric) plasmid
    7. Grow up bacterium
      1. Plasmid is engineered to make it easy to work with, e.g.,
        1. vector confers antibiotic resistance
        2. color change if insert is present
    8. (Perhaps) use phage properties to generate single-stranded DNA
      1. Single stranded DNA is easy to sequence
  2. Get lots of DNA, even by PCR standards
  3. Clones are easy to store & relatively stable
  4. Each clone is unique, i.e., derived from a single DNA molecule even if a population of sequences was present in the original template.
    1. Screen clones for desired properties
    2. Cloning a PCR product can be used to separate and study the components of a mixed PCR product
  5. More complex than PCR, slower, and requires more specialized facilities
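
The cut-and-ligate logic of cloning can be sketched on the top strand alone. This assumes EcoRI (recognition site GAATTC, cut after the first G) purely as a familiar example; the template and vector sequences are made up, the function name is hypothetical, and real digests also involve the bottom strand, insert orientation, and incomplete cutting.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Cut a linear top strand at every occurrence of the site; return the fragments."""
    sequence = sequence.upper()
    fragments = []
    start = 0
    position = sequence.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        position = sequence.find(site, position + 1)
    fragments.append(sequence[start:])
    return fragments


# Toy template with the region of interest flanked by two EcoRI sites.
template = "CCGGAATTCATGGCCAAATGGTAAGAATTCGG"
left_end, insert, right_end = digest(template)

# Toy linearized vector with a single EcoRI site.
vector_left, vector_right = digest("TTTTGAATTCAAAA")

# Compatible ends anneal and ligase seals them; both sites are regenerated.
recombinant = vector_left + insert + vector_right
print(insert)       # AATTCATGGCCAAATGGTAAG
print(recombinant)  # TTTTGAATTCATGGCCAAATGGTAAGAATTCAAAA
```

Because both vector and template were cut with the same enzyme, the recombinant still carries a site on each side of the insert, so the insert can be cut back out later.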

OK, so you've got lots of DNA...

Sanger Dideoxy Sequencing
  1. Melt template
  2. Anneal a sequencing primer
    1. Nondegenerate if possible
    2. Only one direction
  3. Label with radioisotopically or fluorescently labeled nucleotides
  4. Synthesize DNA in four vials, each with a small fraction of one dideoxynucleotide (ddA, ddC, ddG, ddT)
    1. A dideoxynucleotide terminates DNA polymerization upon incorporation
  5. Run out on acrylamide gel
  6. Read like climbing a ladder
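
The logic of reading a dideoxy gel can be sketched as below. This is an idealized model, not a simulation of the chemistry: it assumes every possible termination product is present and resolved, measures fragment length from the end of the primer, and uses a made-up sequence and hypothetical function names.

```python
def sanger_ladder(new_strand):
    """Map each dideoxy lane (A, C, G, T) to the fragment lengths it would contain.

    A fragment of length n appears in lane X when the n-th base added
    after the primer is X, because ddX terminates synthesis there.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand.upper(), start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read the ladder from the shortest fragment to the longest."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = sanger_ladder("GATTACA")
for base in "ACGT":
    print(base, lanes[base])
print(read_gel(lanes))  # GATTACA
```

Each band length occurs in exactly one lane, which is what makes the ladder readable one rung at a time.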
Automated sequencing

Peptide Sequences

  1. Draw with amino terminus on L, carboxy terminus on R
  2. Edman degradation - classical method, often still used
    1. Label amino terminus with PITC
    2. Release terminal AA by cyclizing at different pH
    3. Repeat
    4. Lots of related methods
    5. Commercially available - just 'send it out'
    6. Requires a fair bit of the peptide
    7. Can only sequence relatively short piece
    8. Requires fairly pure protein
  3. Mass Spectrometry -- can also do DNA sequencing
    1. Not widely used
    2. Allows sequence from small quantity of polypeptide
    3. Moderate mixtures are no problem
    4. Requires expertise and expensive instrumentation
  4. It is often easiest to determine DNA sequence first, then translate (electronically)
    1. But nothing can actually substitute for a genuine peptide sequence when needed.
    2. Not all compartments use the same genetic code
    3. Protistologists are advised to verify use of the 'universal' genetic code.
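
The warning about genetic codes can be made concrete with a small sketch of electronic translation under three codes. The open reading frame is a made-up toy sequence and the function names are hypothetical. The variant codes are expressed as overrides of the standard code, following the commonly tabulated vertebrate mitochondrial and ciliate nuclear codes; other lineages and compartments use further variants not shown here.

```python
BASES = "TCAG"
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def make_code(overrides=None):
    """Build a codon table from the standard code plus optional reassignments."""
    code = {
        a + b + c: STANDARD_AAS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)
    }
    code.update(overrides or {})
    return code


STANDARD = make_code()
# Vertebrate mitochondrial code: AGA/AGG are stops, ATA is Met, TGA is Trp.
VERTEBRATE_MITO = make_code({"AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"})
# Ciliate nuclear code: TAA/TAG encode Gln instead of stop.
CILIATE_NUCLEAR = make_code({"TAA": "Q", "TAG": "Q"})


def translate(dna, code):
    """Translate every complete codon in frame 1; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(code[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


orf = "ATGATATGATAAAGA"  # toy sequence: ATG ATA TGA TAA AGA
print(translate(orf, STANDARD))         # MI**R
print(translate(orf, VERTEBRATE_MITO))  # MMW**
print(translate(orf, CILIATE_NUCLEAR))  # MI*QR
```

The same DNA gives three different peptides, including different stop positions, so the choice of code has to be checked before trusting a conceptual translation.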

Molecular Biological Codes and Abbreviations