Data Sources: DNA, RNA, peptide sequences
Sequence-level data
- Advantages
- Homology is (relatively) easy to determine
- Sequence evolution is well characterized (maximum-likelihood, ML, models)
- Give lots of characters although one gene alone is still not generally
enough
- ~1-10 kb of reasonably variable sequence for a big phylogeny
- Methods are easy
- Disadvantages
- Do not sample entire genome
- consequently get single gene phylogeny; this may be an advantage
or disadvantage
- always try to remember the constraints of the system you are working
with
- Relatively expensive
- Organellar genes
- Remember that inheritance is typically uniparental
- Usually single copy -- this is nice, no confusion with paralogy
- However, in some cases the cell may have a population of different organellar
genomes (heteroplasmy)
- "Calibrating the Mitochondrial Clock"; Science 279:28
(1998)
- Nuclear genes
- Plants and animals are diploid or polyploid
- Protists may be haploid, diploid, or polyploid
- Nuclear genes are often in gene families
- With sexual reproduction, things get complicated
Nucleic Acid Sequences
- Draw nucleic acid sequences with 5' end on left, 3' end on right
- Draw amino acid sequences with amino terminus to left, carboxyl terminus
to right
- First codon in a coding sequence (the start codon) is usually ATG, encoding methionine
- Stop codon is somewhat variable between taxa, but is often TAA
- Several genes are often encoded in an operon -- transcribed together (typical of
prokaryotes and organelles)
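The conventions above can be sketched in a few lines of Python. This is a minimal illustration, not a gene finder: the function names are made up for these notes, only unambiguous bases (A, C, G, T) are handled, and the stop codons assumed are those of the standard code (TAA, TAG, TGA).

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Assumption: the three stop codons of the standard ("universal") code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the opposite strand, also written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def first_orf(seq):
    """Return the first ATG-initiated reading frame, through its stop codon.

    Returns None if no ATG is followed by an in-frame stop codon.
    """
    seq = seq.upper()
    start = seq.find("ATG")
    while start != -1:
        # Step through codons in the frame set by this ATG.
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return seq[start:i + 3]
        start = seq.find("ATG", start + 1)
    return None


print(reverse_complement("ATGC"))
print(first_orf("CCATGGCTTAAGG"))
```

To search the other strand, pass `reverse_complement(seq)` to `first_orf`; both strands are always read 5' to 3'.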
Transcription & Translation
- Transcription
- DNA is read by RNA polymerase; mRNA produced
- RNA is then processed in various ways ("post-transcriptional"
modification)
- Introns are removed (or self-splice)
- Capping of 5' end
- Poly-A tail added
- nuclear genome only
- export and stabilization
- Translation
- mRNA is read by ribosome; polypeptide (protein) produced
- Polypeptides can also be modified ("post-translational modification")
- Modification of some amino acids
- 'Peptide introns'
- activation
- transit peptides removed
- Chaperonins help fold peptides
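As a rough sketch of the transcription and intron-removal steps above: the mRNA matches the coding strand with U in place of T, and splicing joins the exons. The toy gene and its intron coordinates below are invented for illustration, and capping, polyadenylation and all regulation are ignored.

```python
def transcribe(coding_strand):
    """mRNA has the sequence of the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def splice(pre_mrna, introns):
    """Remove introns given as (start, end) pairs: 0-based, end-exclusive.

    Assumes the introns do not overlap.
    """
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])
        position = end
    exons.append(pre_mrna[position:])
    return "".join(exons)


# Hypothetical gene: exon 1, a GT...AG intron, exon 2.
gene = "ATGGCC" + "GTAAGTTTTCAG" + "AAATAA"
pre_mrna = transcribe(gene)
mature = splice(pre_mrna, [(6, 18)])
print(pre_mrna)
print(mature)
```

A cDNA made from the mature message would correspond to `mature` rather than to `gene`, which is why cDNA sequences lack introns.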
RNA Sequences
- Direct sequencing of rRNA
- Lots of ribosomal RNA in cell
- This can be sequenced directly
- Error prone
- Reverse transcriptase makes relatively frequent mistakes
- 2° (secondary) structure knocks the polymerase off the template
- No longer widely used, but many direct rRNA sequences are in GenBank
- cDNA
- Make DNA from RNA sequence
- Use reverse transcriptase to make a complementary DNA strand
- Digest RNA with RNase
- Fill out strand with DNA polymerase
- Amplify, clone, whatever
- Yields processed sequences
- DNA is more stable than RNA
- Introns will have been removed (although of course this depends upon
the kinetics of RNA processing)
- In some organisms alternate splicing of RNA can be important in gene
expression.
- Verifies that gene is actually expressed.
- Often important if PCR produces a 'weird' sequence.
DNA Sequences
PCR -- The polymerase chain reaction
- Put in a test tube:
- Primers that match a known region of template DNA
- Degenerate primers have broader specificity
- Use a molar excess of primers (Michaelis-Menten kinetics)
- one primer at 5' end, one at 3' end
- most critical bases are those at 3' end of primer
- DNA polymerases cannot start a new strand on a bare single-stranded template;
they only extend an existing 3' end, so a primer is needed to initiate polymerization
- A thermostable DNA polymerase
- Several enzymes, incl. Taq polymerase (from Thermus aquaticus)
- Taq is tolerant, but error prone
- Enzymes with higher fidelity are also available, e.g., Pfu polymerase
(from Pyrococcus furiosus)
- A reaction buffer suitable for the enzyme
- Magnesium - influences stringency of reaction
- The four deoxynucleotides (dATP, dCTP, dGTP, dTTP), in approximately equal abundance
- Template DNA
- Temperature cycle
- Melt - denature the DNA (ca. 94°C)
- Anneal - high annealing temperature for high stringency, low annealing
temperature for low stringency (ca. 55°C)
- Extend - at optimal temperature for polymerase activity (72°C for
Taq polymerase)
- Can greatly amplify a chosen DNA sequence.
- Things to know about
- Exonuclease (proofreading) activity of polymerases influences effective primer specificity
- Risk of contaminants! PCR may amplify gene from DNA other than the intended
target.
- May produce a mix of different products!
- Environmental DNA
- The ultimate in PCR of contaminants
- Allows analysis of taxa that cannot be cultured
- Revolutionizing microbial ecology
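Degenerate primers, mentioned above, are usually written with IUPAC ambiguity codes. The sketch below counts and lists the individual sequences that such a primer stands for. The function names and the example primer are invented for illustration; the code table is the standard IUPAC nucleotide set.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every unambiguous sequence represented by the primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical primer with three ambiguous positions (Y, R, N).
print(degeneracy("ATGGAYAARN"))
print(expand("AYG"))
```

Higher degeneracy broadens specificity but dilutes each individual primer sequence in the mix, which is one reason to keep ambiguous positions away from the critical 3' end where possible.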
Cloning
- An alternative way to generate large quantities of a sequence of interest
- Use engineered bacterial plasmid (or other vector)
- Cut template DNA with restriction enzyme(s)
- Cut plasmid with the same enzyme
- Mix the two, allow annealing
- Ligate with DNA Ligase
- Transform a bacterial cell with the new (chimaeric) plasmid
- Grow up bacterium
- Plasmid is engineered to make it easy to work with, e.g.,
- vector confers antibiotic resistance
- color change if insert is present
- (Perhaps) use phage properties to generate single-stranded DNA
- Single stranded DNA is easy to sequence
- Get lots of DNA, even by PCR standards
- Clones are easy to store & relatively stable
- Each clone is unique, i.e., derived from a single DNA molecule even if a
population of sequences was present in the original template.
- Screen clones for desired properties
- Cloning of PCR product can be used to study complex PCR product
- More complex than PCR, slower, more specialized facilities
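The restriction-digest step can be sketched as a string operation. The example assumes EcoRI, which recognizes GAATTC and cuts after the G; the function name and the test sequence are invented, and only the top strand of a linear molecule is modeled.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Cut a linear top strand at every occurrence of a recognition site.

    cut_offset is the number of bases into the site at which the enzyme
    cuts (1 for EcoRI: G^AATTC).
    """
    seq = seq.upper()
    fragments = []
    previous_cut = 0
    position = seq.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(seq[previous_cut:cut])
        previous_cut = cut
        position = seq.find(site, position + 1)
    fragments.append(seq[previous_cut:])
    return fragments


# Hypothetical template with two EcoRI sites.
print(digest("AAGAATTCTTTTGAATTCCC"))
```

Cutting the plasmid with the same enzyme leaves matching AATT overhangs, which is what lets insert and vector anneal before ligation.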
OK, so you've got lots of DNA...
Sanger Dideoxy Sequencing
- Melt template
- Anneal a sequencing primer
- Nondegenerate if possible
- Only one direction
- Label with radioisotopically or fluorescently labeled nucleotides
- Synthesize DNA in four vials, each with a small fraction of one dideoxynucleotide
(ddA, ddC, ddG, ddT)
- dideoxynucleotide will terminate DNA polymerization upon incorporation
- Run out on acrylamide gel.
- Read like climbing a ladder, from the shortest fragments (bottom of gel) upward
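The logic of reading a dideoxy gel can be mimicked in a toy simulation. It assumes an idealized reaction in which every position of the newly synthesized strand yields a terminated fragment, and it counts fragment lengths from the first base added after the primer. The function names are invented for illustration.

```python
def sanger_lanes(new_strand):
    """Return, for each ddNTP reaction, the lengths of terminated fragments.

    new_strand is the strand synthesized from the primer, written 5' to 3'.
    A fragment of length n ends in the lane of the n-th base.
    """
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(new_strand.upper(), start=1):
        lanes[base].append(length)
    return lanes


def read_ladder(lanes):
    """Read the gel from the shortest band to the longest."""
    bands = sorted(
        (length, base)
        for base, lengths in lanes.items()
        for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = sanger_lanes("GATTACA")
print(lanes)
print(read_ladder(lanes))
```

The sequence read this way is that of the newly synthesized strand, i.e. the complement of the template being sequenced.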
Automated sequencing
Peptide Sequences
- Draw with amino terminus on left, carboxyl terminus on right
- Edman degradation -- classical method, often still used
- Label amino terminus with PITC (phenyl isothiocyanate)
- Release terminal AA by cyclizing at different pH
- Repeat
- Lots of related methods
- Commercially available -- just 'send it out'
- Requires a fair bit of the peptide
- Can only sequence relatively short piece
- Requires fairly pure protein
- Mass Spectrometry -- can also do DNA sequencing
- Not widely used
- Allows sequence from small quantity of polypeptide
- Moderate mixtures are no problem
- Requires expertise and expensive instrumentation
- It is often easiest to determine DNA sequence first, then translate (electronically)
- But nothing can actually substitute for a genuine peptide sequence when
needed.
- Not all compartments use the same genetic code
- Protistologists are advised to verify use of the 'universal' genetic code.
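A short sketch shows why the choice of genetic code matters for electronic translation. The standard table is built from the usual compact 64-letter string (base order T, C, A, G). The two variant tables list only the differences assumed here: vertebrate mitochondria (TGA = Trp, ATA = Met, AGA/AGG = stop) and ciliate nuclear genes (TAA/TAG = Gln). The names and test sequences are invented for illustration.

```python
from itertools import product

BASES = "TCAG"
STANDARD_AAS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Standard ("universal") code: codon -> one-letter amino acid, "*" = stop.
STANDARD = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
}

# Variant codes, expressed as differences from the standard code.
VERTEBRATE_MITO = {**STANDARD, "TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}
CILIATE_NUCLEAR = {**STANDARD, "TAA": "Q", "TAG": "Q"}


def translate(dna, code=STANDARD):
    """Translate from the first base, stopping at the first stop codon."""
    dna = dna.upper()
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = code[dna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


print(translate("ATGTAAGGCTGA"))
print(translate("ATGTAAGGCTGA", CILIATE_NUCLEAR))
print(translate("ATGTGAATATAA", VERTEBRATE_MITO))
```

The same DNA gives a truncated peptide under one code and a longer one under another, so the table has to match the organism and the compartment.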
Molecular Biological Codes and Abbreviations