BSCI 348S - Bioinformatics
Review Guide, Fall 2000
General concepts
Listed below are some of the basic subject areas we have discussed in class.
Each of these areas corresponds to readings from the textbook and to selected
readings assigned in class. This list is not comprehensive. It is intended
to help you study, not to replace your class notes or assigned
reading.
You should also review all of the posted lecture notes.
- Relevant molecular biology and genetics
- Approaches to genome sequencing
  - Map-based approach
  - Random approach
  - Combined approaches
- Pairwise sequence alignment - you should be familiar with each of the following
  approaches to pairwise sequence alignment, and have a sense of the different
  conditions under which each of them is useful.
  - Needleman-Wunsch algorithm
  - Smith-Waterman algorithm
  - BLAST
- Available sequence databases and their different properties and uses
  - How to read and interpret a GenBank "flat" format file
  - The different missions of the major databases (GenBank, PIR, structure
    databases, etc.)
- Multiple sequence alignment
- Phylogenetic Analysis
  - Basic methods
    - Parsimony
    - Distance methods
    - Likelihood
  - Considerations when choosing a phylogenetic method
  - How to tell when a molecular phylogeny is reliable and when it is not
  - Current evidence for relationships among major taxa
- Models of DNA sequence evolution
  - Jukes-Cantor
  - Kimura two-parameter
  - F84/HKY85
  - GTR
- Point accepted mutation (PAM) matrices, BLOSUM matrices
  - How the matrices are constructed
  - How they are used in alignment and sequence comparison
  - The difference between similarity and identity in sequence comparison
  - The qualitative significance of different values in the matrix
- Functional prediction from sequences
  - Similarity searches
  - Sequence motifs
- Genomes
  - The common characteristics of the genomes of free-living and parasitic
    prokaryotes
  - Key differences between prokaryotic and eukaryotic genomes
  - Relationships among fully sequenced eukaryotic genomes, and patterns
    of similarity and difference among these genomes
  - Automated methods for gene identification
  - What horizontal gene transfer is, and what evidence can be used to study
    it
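As a study aid, the Jukes-Cantor model named in the list above can be worked through numerically. The sketch below assumes two DNA sequences that are already aligned to the same length. The function names `p_distance` and `jukes_cantor_distance` are hypothetical, chosen for illustration, and are not taken from any course software. It applies the standard correction d = -(3/4) ln(1 - (4/3) p), where p is the observed proportion of differing sites.

```python
import math


def p_distance(seq_a, seq_b):
    """Observed proportion of differing sites, ignoring columns with a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor corrected distance in expected substitutions per site."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        # The logarithm is undefined here: the sequences look saturated.
        raise ValueError("p >= 0.75, Jukes-Cantor distance is undefined")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)
```

For example, `jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAT")` has p = 0.1 and gives a corrected distance of about 0.107. The corrected value is slightly larger than p because the model allows for multiple substitutions at the same site.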
Terms
- Bioinformatics
- Genomics
- Genetics
- Open reading frame (ORF)
- Minimal genome
- Rooted and unrooted trees, branch length
- Model
- Model parameter
- Algorithm
- Optimality criterion (both in the context of phylogenetic analysis, and
in the broader context of bioinformatics)
- Contig assembly
- Bacteria, Archaea, Eukarya
- In eukaryotes: Animals, fungi, plants, heterokonts, alveolates, microsporidia,
slime molds
- Homology, homolog
- Orthology, ortholog
- Paralogy, paralog
- Horizontal gene transfer
- Plasmid, cosmid, BAC, YAC
- Intron, exon
- Clade
- Bootstrapping in phylogenetic analysis
  - Random sampling with replacement
  - How to interpret bootstrap values
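The bootstrapping terms above can be illustrated with a short sketch. It assumes the alignment is a list of equal-length strings and that each replicate tree is represented as a set of clades, with each clade a `frozenset` of taxon names. Both representations and the function names are assumptions made for this example. Building a tree from each pseudoreplicate is left out, since that step depends on the phylogenetic method chosen.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one pseudoreplicate by sampling columns with replacement."""
    n_sites = len(alignment[0])
    if any(len(row) != n_sites for row in alignment):
        raise ValueError("all sequences must have the same aligned length")
    # Draw n_sites column indices; the same column may be drawn repeatedly.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(row[i] for i in columns) for row in alignment]


def clade_support(clade, replicate_trees):
    """Percentage of replicate trees that contain the given clade."""
    hits = sum(1 for tree in replicate_trees if clade in tree)
    return 100.0 * hits / len(replicate_trees)


rng = random.Random(0)  # fixed seed so the resampling is repeatable
replicate = bootstrap_alignment(["ACGTAC", "ACGTTC", "TCGTAC"], rng)
```

A bootstrap value is then the percentage of replicate trees in which a clade is recovered. It measures how consistently the data support that clade under resampling, not the probability that the clade is correct.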
Algorithms
- Needleman-Wunsch algorithm for pairwise sequence alignment
- Smith-Waterman algorithm for pairwise sequence alignment
- BLAST
- Multiple sequence alignment by clustering
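To compare the first two algorithms listed above, the following sketch fills the dynamic programming matrix for either a global (Needleman-Wunsch) or a local (Smith-Waterman) alignment and returns only the optimal score. The scoring values (match +1, mismatch -1, linear gap penalty -2) and the function name `alignment_score` are assumptions chosen for illustration. Real analyses typically use a substitution matrix such as PAM or BLOSUM and affine gap penalties, and they also trace back through the matrix to recover the alignment itself.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Optimal pairwise alignment score with a linear gap penalty.

    local=False gives the global (Needleman-Wunsch) score;
    local=True gives the local (Smith-Waterman) score.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment: leading gaps are penalized.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in sequence b
            left = score[i][j - 1] + gap    # gap in sequence a
            cell = max(diag, up, left)
            if local:
                # Local alignment: never let a cell drop below zero.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell

    # Global score is the bottom-right cell; local score is the matrix maximum.
    return best if local else score[-1][-1]
```

The only differences between the two modes are the initialization of the first row and column, the floor at zero, and where the final score is read. This is why the global method suits sequences that are similar along their whole length, while the local method finds a shared region inside otherwise unrelated sequences.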
Genomes we have discussed in some detail
Practical aspects of sequence analysis
Note that as we used different software tools, we also discussed the ideas involved
in using these tools and interpreting their output. It is important that you study
not only the mechanical aspects of using the software, but also the assumptions
that underlie the analysis, how to interpret the results, the limitations of each
tool, and when it is and is not appropriate to use it.
- How to submit and interpret BLAST searches, including variants such as PSI-BLAST
- The structure of the GenBank database and the GenBank flat file format
- Other internet facilities for sequence analysis
- Basic GCG commands and programs, including but not limited to:
  - netblast
  - dotplot
  - gap
  - bestfit
  - map
  - pileup
  - frames
  - codonpreference
- clustalx (know how clustalx and pileup differ from each other)
Sample Exams
Final Exam 1999
Sample Midterm Questions