Genomic analysis of two bacterial pathogens
Mycoplasma genitalium
- Bacterium
- Phylogenetically among the Gram-positive bacteria
- However, it lacks a cell wall, so it does not stain with Gram stain
- Class Mollicutes
- low G+C group
- Parasite
- Strain studied was isolated from a patient with non-gonococcal urethritis
- Genome
- Sequenced at TIGR in the mid-1990s
- Among the first complete genomes sequenced
- Very small, thought to be reduced in size from a larger ancestral genome
- 580 kb
- Roughly 470 open reading frames
- Seen as model for a minimal functional gene set
Minimal functional gene set
- One key question in genomics is the search for the minimal gene set
- What is the minimal complement of genes necessary to make a living
organism?
- This is a complex question, and depends upon assumptions of what functions
define a living organism
Sequencing strategy
- Construct two random libraries from DNA sheared to appropriate size
- Large -- 15-20 kb
- Small -- ca 2 kb
- Plate Library & verify its quality
- Sequence
- High throughput DNA sequencing using dye primer chemistry
- Sequence each clone from both ends, without attempting to sequence
the whole clone
- 9846 sequencing reactions run in 8 weeks by 5 people running 8 ABI 373
sequencers
- Got 8472 high quality sequences, combined these with 299 random marker
sequences from the literature
- Mean sequence coverage was 6.5x
- 99% of sequence was sequenced with better than single-stranded coverage
- Assemble
- TIGR ASSEMBLER generated 39 contigs
- Ranged from 606-73,351 bp
- A total of 3,806,280 bp of primary DNA sequence data
- ASM_ALIGN links contigs on basis of orientation of reads from each end
of a single clone
- All 39 gaps were covered by at least one clone from the small-size library
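The coverage figure quoted above can be checked from the numbers in these notes. The following is a minimal Python sketch, not part of the original analysis: it divides the total primary sequence by the finished genome size (580,070 bp, from the annotation step). The Poisson estimate of the uncovered fraction is an added assumption (idealized random shotgun sampling), included only to show why gaps remain at this depth.

```python
import math

# Figures taken from the notes.
GENOME_BP = 580_070      # finished circular chromosome
PRIMARY_BP = 3_806_280   # total primary sequence data in the assembly

# Mean coverage = total bases sequenced / genome length.
coverage = PRIMARY_BP / GENOME_BP

# ASSUMPTION (not from the notes): if reads land uniformly at random,
# the chance that a given base is hit by no read is about exp(-coverage).
uncovered_fraction = math.exp(-coverage)
expected_uncovered_bp = uncovered_fraction * GENOME_BP

print(f"mean coverage: {coverage:.2f}x")
print(f"idealized uncovered fraction: {uncovered_fraction:.4%}")
print(f"idealized uncovered bases: {expected_uncovered_bp:.0f} bp")
```

The computed depth agrees with the roughly 6.5x stated in the notes. Real libraries are not perfectly random, so the Poisson figure should be read as a rough lower bound on what is left to close.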
- Verification of assembly
- Checked location of marker sequences on known map
- Used GRASTA to look for small overlaps that would have been missed
by TIGR ASSEMBLER
- This reduced gaps to 28
- Close gaps
- Physical Gaps
- In this case no physical gaps were present
- Sequence Gaps
- Selected clones that spanned gaps, and selectively sequenced these
clones
- Edit
- Manually inspected sequencer output traces in alignment for ambiguities
that could be resolved
- 53 ambiguities and 25 possible frameshifts were found
- These regions were re-sequenced with dye terminator chemistry
- Annotate
- Organization
- Circular chromosome of 580,070 bp
- G+C content 32% overall, with lower G+C regions flanking the origin
of replication
- Ribosomal RNA and tRNA genes had higher G+C content, presumably
because of functional constraints
- 74 EcoRI fragments
- Generally consistent with map; one discrepancy was resolved
in favor of the sequence
- Precise origin of replication was not identified, but a 4 kb region
probably containing it was identified
- This lies between dnaA and dnaN
- An untranscribed region between these was selected as the origin
for numbering.
- There is a polarity to transcription: genes to the right are preferentially
transcribed on the plus strand, and those to the left on the minus
strand, with the distinction extending roughly half way around the
chromosome
- Predicted Coding Regions
- Initial search for ORFs larger than 100 bp
- Translations were made assuming UGA encodes tryptophan (as is
known to be the case in some Mycoplasmas)
- Predicted proteins were searched against a non-redundant bacterial
protein database (NRBP)
- Sequences that were similar to a protein in NRBP were assigned that
name
- Matches were aligned with PRAZE (a modified Smith-Waterman algorithm)
- GeneMark was trained with 308 M. genitalium sequences and used to
evaluate 170 unidentified ORFs
- Peptide sequences from other genomes were also used to search all
six reading-frames of the genome
- Sequence similarity searches support a close relationship between Mycoplasma
spp. and Gram-positive bacteria
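The ORF search outlined above (all six reading frames, a minimum length, and UGA read as tryptophan) can be illustrated with a short Python sketch. This is not the TIGR pipeline: the function names `make_table` and `find_orfs`, the ATG-to-stop ORF definition, and the toy sequence are all assumptions made for illustration. The 100-nt default mirrors the cutoff in the notes and here counts the stop codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

def make_table(tga_is_trp):
    """Codon table; optionally reassign TGA (UGA) from stop to Trp."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    table = dict(zip(codons, STANDARD_AA))
    if tga_is_trp:
        table["TGA"] = "W"
    return table

def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_orfs(seq, table, min_nt=100):
    """Return (strand, frame, start, end, protein) for ATG..stop ORFs.

    Coordinates are 0-based, end-exclusive, on the scanned strand.
    ORFs that run off the end without a stop are not reported.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start, protein = None, []
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                aa = table.get(codon, "X")  # X for ambiguous codons
                if start is None:
                    if codon == "ATG":
                        start, protein = i, ["M"]
                elif aa == "*":
                    if i + 3 - start >= min_nt:
                        orfs.append((strand, frame, start, i + 3,
                                     "".join(protein)))
                    start = None
                else:
                    protein.append(aa)
    return orfs

# Toy sequence (hypothetical): ATG AAA TGA CCC TAA
toy = "ATGAAATGACCCTAA"
standard_orfs = find_orfs(toy, make_table(tga_is_trp=False), min_nt=0)
mycoplasma_orfs = find_orfs(toy, make_table(tga_is_trp=True), min_nt=0)
print(standard_orfs)
print(mycoplasma_orfs)
```

With the standard code the toy ORF ends at the internal TGA; with TGA read as tryptophan it continues to the TAA. This is why the choice of translation table matters when predicting coding regions in Mycoplasma.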
The genome of Mycoplasma genitalium compared with that of Haemophilus
influenzae
- At the time the Mycoplasma genitalium sequence was determined, very
few complete sequences were available, but that of Haemophilus influenzae
had already been determined.
- Both are human pathogens. H. influenzae causes a form of meningitis.
- The two bacteria are not closely related
- Mycoplasma genitalium is in a group of obligate parasites embedded
within the Gram-positive bacteria
- Haemophilus influenzae is a gamma proteobacterium (i.e., relatively
closely related to Escherichia coli)
- M. genitalium has a more highly reduced genome
| Organism      | Genome size  | Complexity                    |
| M. genitalium | 580,070 bp   | 470 predicted coding regions  |
| H. influenzae | 1,830,137 bp | 1743 predicted coding regions |
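The two rows tabulated above can be compared with a little arithmetic. The Python sketch below uses only the sizes and counts from the notes. The helper name `bp_per_gene` is hypothetical, and the measure is deliberately crude: it ignores RNA genes and intergenic structure.

```python
# (genome size in bp, predicted coding regions), from the notes
genomes = {
    "M. genitalium": (580_070, 470),
    "H. influenzae": (1_830_137, 1_743),
}

def bp_per_gene(size_bp, n_genes):
    """Average genome length per predicted coding region."""
    return size_bp / n_genes

for name, (size_bp, n_genes) in genomes.items():
    print(f"{name}: {bp_per_gene(size_bp, n_genes):.0f} bp "
          "per predicted coding region")

mg_size, mg_genes = genomes["M. genitalium"]
hi_size, hi_genes = genomes["H. influenzae"]
size_ratio = hi_size / mg_size     # how much larger the genome is
gene_ratio = hi_genes / mg_genes   # how many more coding regions
print(f"size ratio: {size_ratio:.2f}, gene ratio: {gene_ratio:.2f}")
```

On this measure both genomes carry roughly one predicted coding region per kilobase, so the smaller M. genitalium genome reflects fewer genes rather than much tighter packing.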
- M. genitalium is missing many metabolic pathways necessary for independent
growth.
- Biosynthesis of amino acids, cofactors, and cell wall are all reduced,
and there are relatively few regulatory factors.
- The category of "unassigned" genes is also greatly reduced.
- Thus a relatively large fraction of its genome is devoted to DNA replication
and gene expression.
- 90 genes not found in H. influenzae; most of these (60%) resemble genes
found in other Gram-positive bacteria
- Relatively few unique genes in M. genitalium -- this is a function
of the reduced genome
- Both have mechanisms to promote antigenic variation
- M. genitalium
- Stalked bacterium with an adhesin protein (MgPa) on the tip
- The adhesin elicits a strong immune response
- This protein is encoded in an operon along with two other open reading
frames
- The arrangement is 29 kd ORF -- 6 nt spacer -- MgPa (160 kd) -- 1
nt spacer -- 114 kd ORF
- Several copies of the MgPa gene were known to be scattered around
the genome
- The complete genome included the complete MgPa operon and nine partial
repeats, constituting 4.7% of the entire genome
- Sequence identity of the repeats to the intact MgPa operon ranges
from 78 - 90%
- Recombination is thought to occur among the members of the gene
family, resulting in increased antigenic variation
- H. influenzae
- Antigenic response is primarily to adhesins and lipo-oligosaccharides
- A key antigenic locus (lic-1) had been identified, which contains
4 genes
- The first gene in the lic-1 operon contains tandem tetramer repeats
(CAAT)
- The number of these repeats varies, which shifts the genes in or
out of frame
- This constitutes a translational switch
- Promotes antigenic variation
- Determination of the H. influenzae genome made it practical
to identify all such repeats in the genome
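The translational switch described above follows from simple arithmetic: each CAAT unit is 4 nt, so adding or removing one copy moves the downstream reading frame by one nucleotide. The Python sketch below scans a sequence for tandem tetranucleotide repeats and reports the frame offset that each run introduces. The function name `tandem_tetramers`, the `min_copies` threshold, and the test sequences are assumptions for illustration, not the method used in the cited papers.

```python
import re

def tandem_tetramers(seq, min_copies=4):
    """Find runs of a 4-nt unit repeated in tandem at least min_copies times.

    Returns (start, unit, copies, frame_offset) tuples, where frame_offset
    is the run length modulo 3, i.e. the shift imposed on downstream codons.
    Note: the reported unit is whichever phase the scan meets first, and
    homopolymer or dinucleotide runs also match this simple pattern.
    """
    pattern = re.compile(r"([ACGT]{4})\1{%d,}" % (min_copies - 1))
    hits = []
    for m in pattern.finditer(seq):
        copies = (m.end() - m.start()) // 4
        hits.append((m.start(), m.group(1), copies, (4 * copies) % 3))
    return hits

# Hypothetical test sequences differing only in CAAT copy number.
for n in (4, 5, 6):
    seq = "GG" + "CAAT" * n + "GG"
    print(n, tandem_tetramers(seq))
```

Copy numbers that differ by one give different frame offsets, and only every third copy number restores the original frame. Run over a complete genome sequence, a scan of this kind would list candidate loci of the lic-1 type.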
Fraser, C.M. (and 29 others). 1995. The Minimal Gene Complement of Mycoplasma
genitalium. Science 270:397-403.
Weiser, J.N. et al. 1989. The molecular mechanism of phase variation of H.
influenzae lipopolysaccharide. Cell 59:657-665.