Explore the integrated databases provided at NCBI with the "Entrez" search tool in order to discover and learn how to navigate the various types of information available.
Three primary consortia provide public access to a tremendous cache of
current scientific knowledge, including such disparate topics as sequence
data, protein structure inferences, phylogenetic information, archived
literature, and a myriad of accessible tools.

We have used BLAST to find matches to a query sequence in the GenBank database, which is maintained by NCBI. In this exercise we will use Entrez to explore GenBank and look at other resources provided by NCBI. Entrez is an important integrated database search tool provided by NCBI.
NCBI maintains extensive documentation regarding Entrez.

Find a gene of interest.
If there are very few similar sequences, this is very easy. For example, it would seem that hardly anyone really cares about the (very pretty) green alga Draparnaldia. Go to the NCBI web site and use Entrez to search GenBank with the name Draparnaldia.
To do this you can just go to NCBI, type draparnaldia in the search box,
be sure that GenBank is selected in the pop-up menu, and click on
go.
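
The same search can also be written as a request URL for NCBI's E-utilities "esearch" service, which this exercise does not otherwise cover. The sketch below only builds the URL and never contacts NCBI. The function name build_esearch_url is hypothetical, and the database name "nucleotide" is assumed to correspond to the GenBank choice in the pop-up menu.

```python
from urllib.parse import urlencode

# Base address of the E-utilities search service. Using it here is an
# assumption of this sketch; the exercise itself uses only the web interface.
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="nucleotide", retmax=20):
    """Return an esearch URL for `term` in database `db` (no network access)."""
    params = {"db": db, "term": term, "retmax": retmax}
    return ESEARCH_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    # Print the URL that would run the Draparnaldia search.
    print(build_esearch_url("Draparnaldia"))
```

Pasting the printed URL into a browser should return a list of matching record identifiers rather than the hyperlinked pages described here, so the web form remains the better place to explore.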
What have you found?
Follow the link and examine the GenBank file. This is a hyperlinked
version of the GenBank flat file format. Explore the links
provided, paying attention to the Taxon browser so that you might
discover other related organisms. Clicking on the Medline and PubMed
links takes you to the paper(s) relating to the sequence(s) under
examination. Continue exploring the information provided.

It is possible to expand the range of sequences found by adding a wildcard (a 'glob') to the end of a search term: compare a search for Draparnaldia to one for "Draparn*". Note, though, that a leading wildcard such as *naldia is not supported.
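
To make the wildcard rule concrete, here is a small local sketch. It does not query Entrez and is not how Entrez is implemented; it only mimics the rule that a "*" is accepted at the end of a term and rejected elsewhere. The function name expand_trailing_wildcard and the short list of names are illustrative assumptions.

```python
def expand_trailing_wildcard(term, vocabulary):
    """Mimic an end-of-term wildcard: 'Draparn*' matches names starting with 'draparn'.

    Raises ValueError for a '*' anywhere except the end, since patterns
    such as '*naldia' are not supported.
    """
    if "*" in term[:-1]:
        raise ValueError("wildcard is only allowed at the end of the term")
    if term.endswith("*"):
        prefix = term[:-1].lower()
        return [w for w in vocabulary if w.lower().startswith(prefix)]
    # No wildcard: exact, case-insensitive match only.
    return [w for w in vocabulary if w.lower() == term.lower()]


if __name__ == "__main__":
    names = ["Draparnaldia", "Draparnaldiopsis", "Lycopersicon"]
    print(expand_trailing_wildcard("Draparnaldia", names))
    print(expand_trailing_wildcard("Draparn*", names))
```

The wildcard search returns every name that shares the prefix, which is why "Draparn*" can find more records than the full genus name.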
NCBI has information on the flat file format and other formats used by
GenBank. A great deal of additional information is available on the NCBI
website. Examine some of the other formats provided by clicking upon
the drop down box beside 'Display.'
Follow the link in the Medline field. What does it show you?
Return to the flat file.
Unfortunately the size of the database makes most such searches much more
complex than finding Draparnaldia sequences. Try searching for
genes from the plant Lycopersicon. Notice that you are now
using Entrez set to search the nucleotide database; if you want to
explicitly limit your search to the nucleotide database from the outset,
you can select Entrez from the menu bar on the NCBI home page,
and then select nucleotide from the Entrez menu bar.
How did your search compare with the search on Draparnaldia?

This is not as easy as it sounds. Try searching on lycopersicon AND tufA. What did you find? Why? When you are performing an unrestricted search, Entrez will find a record if it has the matched words anywhere in the record. Consequently, successful searches need to be carefully constructed to find the features that you really want.
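
The difference between matching words anywhere in a record and matching them in a particular field can be sketched locally. The two records below are invented placeholders, not real GenBank entries, and the function names search_anywhere and search_fields are hypothetical; the point is only to show why an unrestricted search can return records you did not intend.

```python
# Two invented records; they are placeholders, not real GenBank entries.
RECORDS = [
    {"id": "rec1", "organism": "Lycopersicon sp.",
     "definition": "hypothetical tufA sequence"},
    {"id": "rec2", "organism": "Hypothetical alga",
     "definition": "tufA gene; note mentions Lycopersicon"},
]


def search_anywhere(records, words):
    """Return ids of records containing every word in any field."""
    hits = []
    for rec in records:
        text = " ".join(rec.values()).lower()
        if all(w.lower() in text for w in words):
            hits.append(rec["id"])
    return hits


def search_fields(records, constraints):
    """Return ids of records where each word occurs in its named field."""
    hits = []
    for rec in records:
        if all(word.lower() in rec.get(field, "").lower()
               for field, word in constraints.items()):
            hits.append(rec["id"])
    return hits


if __name__ == "__main__":
    print(search_anywhere(RECORDS, ["lycopersicon", "tufA"]))
    print(search_fields(RECORDS, {"organism": "lycopersicon",
                                  "definition": "tufA"}))
```

In this toy data the unrestricted search returns both records, because the second one merely mentions Lycopersicon, while the field-restricted search returns only the first.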
You can limit your search to specific fields by attaching a field qualifier to the search term.

Entrez permits the use of boolean operators: AND, OR, and NOT are all supported. The boolean operator must be entered in capital letters, but the search terms themselves are not case sensitive. For more information on how Entrez handles boolean operators, you can click on Entrez nucleotide help, and look for the section on refining your search.
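
A query that combines field limits with boolean operators can be assembled as a plain string. The sketch below assumes the Entrez convention of writing a field restriction as term[Field]; the field names used ("Organism" and "Gene Name") are assumptions that should be checked against the Entrez help pages, and the helper names tag and combine are hypothetical.

```python
BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}


def tag(term, field=None):
    """Return 'term[field]' when a field is given, otherwise the bare term."""
    return f"{term}[{field}]" if field else term


def combine(operator, *terms):
    """Join terms with a boolean operator, forcing it to upper case."""
    op = operator.upper()
    if op not in BOOLEAN_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    return f" {op} ".join(terms)


if __name__ == "__main__":
    # Field names are assumptions; verify them in the Entrez help pages.
    query = combine("and",
                    tag("lycopersicon", "Organism"),
                    tag("tufA", "Gene Name"))
    print(query)
```

The helper upper-cases the operator because a lower-case "and" would not be entered as an operator, whereas the search terms themselves can be left in any case.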
What other elongation factors are available in GenBank? Search on
elongation factor. This search will show you several qualitatively different types of sequence. Identify some of the categories of sequence you have found.

You can limit your search to specific types of sequences. Click on the button that says limits, look at what your options are, click on the check box that says all of the above, and then repeat the search.
How has this affected the results of your search?
The gene encoding the protein triose-phosphate isomerase (TPI) has been
used in several studies of intron evolution. The distribution of introns
in mosquito TPI genes has been of particular interest. We would like to
identify which TPI genes have been determined from the insect order
Diptera (flies).

We will spend a lot of time examining sequences in the GenBank flat file format; thus we will take a moment to examine it.
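
As a rough aid to reading the format, the sketch below splits a flat file record into its top-level keywords. It relies on three layout features of the format: keywords such as LOCUS and DEFINITION start in the first column, continuation and sub-keyword lines are indented, and a line beginning with "//" ends the record. The record text is an invented toy, not a real GenBank entry, and the function name parse_top_level is hypothetical.

```python
# Invented toy record for illustration only; not a real GenBank entry.
TOY_RECORD = """\
LOCUS       TOY00001     24 bp    DNA     linear   PLN 01-JAN-2000
DEFINITION  Invented example record, not a real GenBank
            entry.
ACCESSION   TOY00001
SOURCE      Draparnaldia sp.
  ORGANISM  Draparnaldia sp.
ORIGIN
        1 atgcatgcat gcatgcatgc atgc
//
"""


def parse_top_level(flat_text):
    """Group the lines of one record under their top-level keywords."""
    fields = {}
    current = None
    for line in flat_text.splitlines():
        if line.startswith("//"):      # end-of-record marker
            break
        if not line.strip():           # skip blank lines
            continue
        if line[0].isspace():          # indented: belongs to current keyword
            if current is not None:
                fields[current].append(line.strip())
            continue
        keyword, _, rest = line.partition(" ")
        current = keyword
        fields[current] = [rest.strip()] if rest.strip() else []
    return {key: " ".join(parts) for key, parts in fields.items()}


if __name__ == "__main__":
    for key, value in parse_top_level(TOY_RECORD).items():
        print(key, "->", value)
```

This deliberately ignores the internal structure of the FEATURES table and of sub-keywords such as ORGANISM, which are simply folded into their parent keyword.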
Gibas and Jambeck, Chapter 6.
Baxevanis & Ouellette, Chapter 5.
http://www.ncbi.nlm.nih.gov:80/entrez/query/static/help/helpdoc.html

Created: Wed Sep 15 00:58:22 EDT 2004 by Charles F. Delwiche