Hints for sequence analysis

Hints for analysis of an unknown sequence.

First, remember the points made by Platt in his article on strong inference. It can help a lot to stop for a moment and think about what kinds of analysis will give you the most information for the least work.

Have you done blastn?

Have you done blastx?

Have you done blastp?

Do the results correspond? If not, why?

The NCBI databases include some sequences that are specifically intended to help you recognize things that might confuse you. For example, each of the commonly used cloning vectors has a clearly annotated entry. Does your sequence include any cloning vector? Where in the sequence would fragments of cloning vector typically occur?

Do you get hits for different parts of the same gene?

Are the different parts in the same reading frame? How can you most easily find this out?
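One way to check is to work out the frame from the query coordinates reported for each hit. The sketch below is only an illustration: the function name reading_frame is made up for this example, and it assumes 1-based coordinates on the nucleotide query, with reverse-strand hits reported as start greater than end and numbered in the usual blastx style (-1 begins at the last base of the query).

```python
def reading_frame(query_start, query_end, query_length):
    """Return a blastx-style frame (+1..+3 or -1..-3) for one hit.

    Assumptions (not from the original notes): coordinates are 1-based
    positions on the nucleotide query, and a reverse-strand hit is
    given with query_start > query_end.
    """
    if query_start <= query_end:
        # Forward strand: frame depends on the offset from base 1.
        return (query_start - 1) % 3 + 1
    # Reverse strand: frame depends on the offset from the last base.
    return -((query_length - query_start) % 3 + 1)


# Two hypothetical hits on a 900-base query.
hit_a = reading_frame(1, 300, 900)
hit_b = reading_frame(302, 601, 900)
print(hit_a, hit_b, "same frame" if hit_a == hit_b else "different frames")
```

If two hits to the same gene fall in different frames, that is worth explaining: a sequencing error that adds or drops a base, or an intron, are two possibilities to consider.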

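Although BLAST is helpful, remember that it is a heuristic. Needleman-Wunsch (global alignment) and Smith-Waterman (local alignment) find the optimal alignment under a given scoring scheme, so they can give more definitive results for a pair of sequences.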
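To make the idea concrete, here is a minimal sketch of the Smith-Waterman recurrence that returns only the best local alignment score. The function name and the scoring values (match 2, mismatch -1, linear gap -2) are assumptions chosen for illustration; real analyses normally use a substitution matrix and affine gap penalties, and an established implementation rather than this one.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b.

    Uses a simple match/mismatch score and a linear gap penalty
    (illustrative values, not a recommended scoring scheme).
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets a local alignment start fresh at any cell.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Dropping the 0 from the max, initializing the first row and column with gap penalties, and reading the score from the last cell turns this into the Needleman-Wunsch global score.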

Looking for a distant homolog? Try PSI-BLAST.

Getting too many hits for the same protein? Use a smaller database.

Have you tried PSI-BLAST against RefSeq?