Searching GenBank with your Accession Number


 


by Linda Jonas

Administrator, U5 mtDNA Project

 

Many list members have recently submitted their Mega mtDNA results to GenBank. Here is a description of how to search GenBank with your Accession Number. If your sequence has not yet appeared in GenBank, you can use the example sequence to practice until your own sequence is loaded.

 

1. To search, go to

http://www.ncbi.nlm.nih.gov/BLAST/

 

2. In the Nucleotide section, click on "Quickly search for highly similar sequences (megablast)."

 

3. The next page is the 'megablast BLAST' page. Enter your GenBank Accession Number into the Search box. You can use Accession Number DQ661681 as an example. In the same section under "Choose database," click 'Others' and choose "nr" from the dropdown menu.

 

Now scroll down the page to the Format section, and in the "Alignment View" box select "flat query-anchored with identities."

Go to the bottom of the page and click the blue "Blast!" button.

 

4. On the 'formatting BLAST' page, wait about ten seconds before clicking the blue "Format!" button.

 

5. You will now see the 'results of BLAST' page. Scroll down the page. Your sequence will be at the top of the results list, with your closest matches below. Scroll to the bottom of the matches. Below the list of matches is an alignment of the sequences. Wherever a sequence differs from yours, you will see a letter instead of a dot at the position where the difference occurs. If you are using the example sequence DQ661681, you should see a lot of Ts at position 146 [on line 121-180]. The sequence DQ661681 has 146C, and most of the others have 146T.
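The dot notation used in this alignment view can be imitated in a few lines of Python. This is only a sketch, assuming the sequences are already aligned and of equal length. The function name and the short toy sequences are made up for illustration and are not real GenBank data.

```python
def anchored_view(query, subject):
    """Return the subject with '.' wherever it is identical to the query."""
    if len(query) != len(subject):
        raise ValueError("sequences must be aligned to the same length")
    return "".join("." if q == s else s for q, s in zip(query, subject))


# Hypothetical aligned fragments (not real data).
query = "ACGTCCA"
others = ["ACGTTCA", "ACGTCCA"]

print(query)  # the query itself is shown in full on the first line
for seq in others:
    print(anchored_view(query, seq))
```

A row of dots means the sequence is identical to yours over that stretch; each letter marks a difference.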

 

6. Scroll back up the page, and look under the box with all of the red lines. You will see a link for 'Distance Tree of Results.' Click on the link. You will next see one of two trees. One is the Fast-minimum evolution tree, and the other is the Neighbor Joining tree. You can select either one in the "Tree method" box. Try both. In the 'Sequence label' box, select "Sequence Title (if available)." When either tree is displayed, your sequence will be highlighted in yellow. It will often be located near the bottom of the tree, so scroll down. You can see exactly where you fit in the tree and who is closest to you. Sequences labeled "Homo sapiens haplotype . . ." are often from FTDNA customers who submitted their own sequence. Most of the non-FTDNA sequences are labeled "Homo sapiens isolate . . . ".

 

7. On the displayed tree, see who you most closely match. Click the dot to the left of your closest match to find out more about it. You can even show several sequences together (a branch of the tree) by clicking on the dot that includes the branch you want. Choose 'Show Alignment.' You will next see a page where the sequence you clicked is aligned with your sequence. Whenever there is a mismatch between the two you will see, instead of a dot, the letter of the mismatch on the second line. If you chose a branch of the tree, all of the sequences in the branch will be aligned with your sequence. Scroll down the page to see where the mismatches occur.
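As a rough way to think about "closest match," you can count the mismatches between your sequence and each of the others. The sketch below does that in Python. It assumes the sequences are already aligned to the same length, and it is not the method NCBI uses to build its trees; the labels and sequences are invented for illustration.

```python
def mismatch_count(a, b):
    """Count the positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def rank_by_similarity(query, candidates):
    """Return candidate labels ordered from fewest to most mismatches."""
    return sorted(candidates, key=lambda label: mismatch_count(query, candidates[label]))


# Hypothetical aligned fragments (not real data).
query = "ACGTCCA"
candidates = {
    "isolate A": "ACGTTCA",
    "isolate B": "ACGTCCA",
    "isolate C": "TCGTTCA",
}

for label in rank_by_similarity(query, candidates):
    print(label, mismatch_count(query, candidates[label]))
```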

 

8. Click on a sequence's GenBank Accession number (any of the blue links on the left other than your own Accession Number). You will now see the details for that sequence.

 

In the description you will see TITLE. The 'title' is the name of the journal article where the sequence was published. The journal citation is immediately below it. There may be a PUBMED link below the name of the journal. Click the PUBMED link to see a description of the article. Sometimes there is even a free full-text link to the article. Your own sequence will have 'Direct Submission' as the title, so there will be no PUBMED link.

 

Slightly below the TITLE is a section for FEATURES. It may have descriptions such as /isolate="F127", /haplotype="U5a", or /country="Finland". The 'isolate' is how the sequence is usually designated in the journal article. For example, in the journal article you may find a phylogenetic tree with each sequence labeled. In this example, you would look for F127. 'Haplotype' is the haplogroup assigned to this sequence, and 'country' is the country where the person lived. Sometimes the country is not listed even though it may appear in the journal article.
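If you save a record as text, the FEATURES descriptions can be pulled out with a short Python script. The sketch below assumes each description has the form /name="value" on a single line. The excerpt is a hypothetical fragment modeled on the examples above, not a copy of a real record, and the function name is made up for illustration.

```python
import re

# Hypothetical excerpt modeled on the examples in the text.
excerpt = '''
     source          1..16569
                     /organism="Homo sapiens"
                     /isolate="F127"
                     /haplotype="U5a"
                     /country="Finland"
'''


def parse_qualifiers(text):
    """Collect /name="value" pairs from a FEATURES excerpt into a dict."""
    return dict(re.findall(r'/(\w+)="([^"]*)"', text))


info = parse_qualifiers(excerpt)
print(info.get("isolate"))
print(info.get("country", "not listed"))
```

Using `get` with a default covers records where the country is not listed.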

 

You can't easily determine the list of mutations for the sequence, so near the top of the page find 'Display: GenBank.' Change this to 'Display: FASTA.' Now you can either copy the sequence to your clipboard or save it as a file by making a selection in the "Send to" box. You will next take that FASTA file to another site that will convert it to a list of mutations.
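A FASTA file is simply a header line beginning with ">" followed by lines of sequence. If you want to check the file you saved, a minimal Python sketch like the following reads the first record. The example record is invented, and the function name is hypothetical.

```python
def read_fasta(text):
    """Return (header, sequence) for the first record in FASTA text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; stop reading
            header = line[1:]
        elif header is not None:
            chunks.append(line)
    if header is None:
        raise ValueError("no FASTA header line found")
    return header, "".join(chunks).upper()


# Hypothetical record (not real GenBank data).
example = """>EXAMPLE1 hypothetical record
acgtacgtac
gtacgt
"""

header, sequence = read_fasta(example)
print(header)
print(len(sequence), sequence)
```

A complete mtDNA sequence should come out at roughly 16,569 letters, so the length is a quick check that nothing was lost when copying.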

 

9. You will use a great tool from Argus Biosciences. Go to http://www.argusbio.com/sooryakiran/gensnip/gensnip.php

Leave the left box alone, and paste the FASTA sequence into the box on the right. Then click the yellow 'Run GEN-SNiP' button. It may seem to freeze or stop working, but just wait. It occasionally takes a few minutes. The sequence's list of mutations will be returned on the next page. You can compare that list with your own list of mutations.
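Comparing two lists of mutations by eye is error-prone, so here is a small Python sketch that sorts them into shared and unshared mutations. It assumes each mutation is written as a position followed by a base, such as 146C. Both lists are made up for illustration, and the function names are hypothetical; this has nothing to do with how GEN-SNiP itself works.

```python
import re


def position(mutation):
    """Return the numeric position at the start of a mutation such as '146C'."""
    match = re.match(r"\d+", mutation)
    if match is None:
        raise ValueError("mutation does not start with a position: " + mutation)
    return int(match.group())


def compare_mutations(mine, theirs):
    """Split two mutation lists into shared and unshared, sorted by position."""
    mine, theirs = set(mine), set(theirs)

    def ordered(items):
        return sorted(items, key=lambda m: (position(m), m))

    return {
        "shared": ordered(mine & theirs),
        "only_mine": ordered(mine - theirs),
        "only_theirs": ordered(theirs - mine),
    }


# Hypothetical mutation lists (not real results).
my_list = ["73G", "146C", "263G"]
their_list = ["263G", "73G", "150T"]

for group, mutations in compare_mutations(my_list, their_list).items():
    print(group, mutations)
```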

 

Play around with the NCBI blast to see what else you can do. It takes a lot of getting used to, but the important thing is that you're now in there! Periodically search again to see what new tools are available and if you have any new matches.
