Searching GenBank for mtDNA Sequences


by Linda Jonas

Administrator, U5 mtDNA Project


There is no easy way to search GenBank for HVR1 and HVR2 mutations. I never do it. I only search for full-sequence results, because HVR1 and HVR2 results, as I stated earlier, are not terribly relevant.


For the U5 Project, I have a file of all GenBank U5 full-sequences. I posted all of the HVR1 and HVR2 mutations to the Results page of the U5 Project website.


You can create a file for yourself or for your mtDNA project by searching through the sequences posted by Ian Logan at http://www.ianlogan.co.uk/. Click on your haplogroup, copy any sequences similar to yours, and paste them into your file.


You can also search GenBank for your HVR1 mutations. There are many ways to do this. None of them is fun. Here's one way:


Log into www.familytreedna.com with your kit number and password. Click the orange mtDNA Results tab. At the bottom of the page is a section that looks like this:


HVR1 Reference Sequence (Starts At: 16001)


16010 etc.




It has several lines ending with something similar to





Now copy all of those letters and paste them into a text file. Next, each of the red letters has to be changed. For example, if there is a red letter T under position 16189, look at the top of your results page to your list of mutations. Find 16189 and see the letter after it. It would probably say 16189C. Now change the red letter T to a C. Do this for each red letter. Then remove all of the spaces until your file looks like this:










Save your file so you don't have to do that again! Now, go to http://www.ncbi.nlm.nih.gov/BLAST/
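If you would rather not change the red letters by hand, the editing step described above can be scripted before you move on to BLAST. What follows is only a minimal sketch in Python, not a tool provided by Family Tree DNA or GenBank. The function name apply_mutations and the toy sequence are made up for illustration. It assumes your mutations are simple substitutions written like 16189C, and that the pasted reference text starts at position 16001. Insertions and deletions are not handled.

```python
import re


def apply_mutations(reference, mutations, start=16001):
    """Apply substitutions such as '16189C' to a pasted reference sequence.

    reference: the letters copied from the results page (spaces, line
               breaks and digits are discarded)
    mutations: strings made of a position followed by the new base
    start:     the position of the first letter (assumed to be 16001 for HVR1)
    """
    # Keep only base letters, which also removes all of the spaces.
    bases = list(re.sub(r"[^ACGTN]", "", reference.upper()))

    for mutation in mutations:
        match = re.fullmatch(r"(\d+)([ACGT])", mutation.strip().upper())
        if match is None:
            # Insertions and deletions are outside the scope of this sketch.
            raise ValueError(f"unsupported mutation: {mutation!r}")
        index = int(match.group(1)) - start
        if not 0 <= index < len(bases):
            raise ValueError(f"position out of range: {mutation!r}")
        bases[index] = match.group(2)

    return "".join(bases)


# Toy example only: ten invented letters, with a change at the third position.
toy_reference = "ATTCT AATTT"
print(apply_mutations(toy_reference, ["16003C"]))
```

The result is a single unbroken string of letters, which is the form the search box expects.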


Click "Search for short, nearly exact matches." Copy the edited sequence from your file, and paste it into the search box. Use the example above as a guide. Underneath the search box at "Choose Database", click "Others (nr,etc.)". Make sure the dropdown menu says "nr". Scroll down to the Format section of the page, and in the "Alignment view" dropdown menu, choose "flat query-anchored with identities." Click the blue "Blast!" button.


On the next page, wait several seconds, then click the blue "Format!" button. The closest match will be at the top of the list of results. If you used the example above, sequence DQ661681 should be at the top of the list. Below the list of matches, you will find the sequences aligned. Any mismatches will be noted with a letter. For example, look at position 16174. Sequence DQ661681 has a T there, but the rest of the sequences have a C.


You might want to create FASTA files for the first few matches on the list. Click on the Accession number of the sequence you want. At the top of the next page, find the box that says Display: GenBank. Instead of "GenBank" choose "FASTA" from the dropdown menu. The FASTA file will appear on the next page. You can copy and paste it, or, even easier, select an option in the "Send to" box (across from the Display FASTA box).


Then run your FASTA files through the GEN-SNiP program at Argus Biosciences to get a list of mutations: http://www.argusbio.com/sooryakiran/gensnip/gensnip.php
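For more than a handful of matches, saving each FASTA file through the web page gets tedious. The sketch below shows one possible alternative, which is an assumption on my part and not part of the steps above: it builds a download address for NCBI's E-utilities efetch service and splits a FASTA record into its header line and its sequence. The function names fasta_url and parse_fasta are made up for illustration.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fasta_url(accession):
    """Build an address that returns one nucleotide record as FASTA text."""
    params = {
        "db": "nuccore",      # the nucleotide database
        "id": accession,      # a GenBank Accession number
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def parse_fasta(text):
    """Split a single FASTA record into (header, sequence)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    header = lines[0][1:]                   # drop the leading '>'
    sequence = "".join(lines[1:]).upper()   # join the wrapped lines
    return header, sequence


# Downloading needs a network connection, so it is left as a comment:
#   from urllib.request import urlopen
#   text = urlopen(fasta_url("DQ661681")).read().decode()
#   header, sequence = parse_fasta(text)
print(fasta_url("DQ661681"))
```

The downloaded text is an ordinary FASTA file, so you can save it and use it in the same way as one copied from the web page.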


Of course, the easiest way to search GenBank is to get a full mitochondrial DNA sequence and submit it to GenBank. Contact me offlist to find out how to do this. This is not a commercial announcement on my part; I do not charge anybody.


Because many of the people on this list have now submitted their Mega mtDNA results to GenBank, I will post a message showing how to search with your GenBank Accession Number.



