BCB 590

Lab 1

NCBI & Biological Databases: Genomes, OMIM

 

Instructions:

  1. Complete the lab exercise
  2. Answer the questions in red
  3. Email your answers to terrible@iastate.edu

 

 

Key points:

  1. There is a whole lot of information out there
  2. The first key skill is being able to find what you need quickly
  3. The second key skill is being able to learn how to use new (unfamiliar) databases and tools

 

Introduction

Bioinformatics means different things to different people.  The one constant is that the main function of bioinformatics is to make use of the large volumes of biological data available.  In today’s lab, we will explore some of the many biological databases available and learn how to find information quickly.  There are many resources available, but we will focus on the databases and tools at NCBI.  We will explore the NCBI databases and then take a quick tour of topics we will not spend time on in this course.

 

Lab Exercise

 

1.      The National Center for Biotechnology Information (NCBI) is a great resource and starting point for all things related to bioinformatics.  For our first stop, take a look at the site map for NCBI.  You can explore some of these tools and databases if you wish, but for now we just want a quick view of all the resources available.

2.      The main search portal at NCBI is the Entrez Query page.  Entrez allows you to search all of the databases at NCBI at the same time.  We will begin with a general interest and proceed to finding detailed information. 

a.       First, we will search for information about a disease.  Enter type I diabetes in the search box at the top of the page and click GO.  When the search is complete, numbers will appear next to each database showing the number of hits our search returned from each.

b.      A good database to start in is OMIM (Online Mendelian Inheritance in Man).  OMIM contains information about inherited traits, especially diseases.  Click on OMIM to view the search results for type I diabetes.  The first search result is the one we want, so click on the top hit – the identifier should be %222100. 

c.       Browse through the page to get an idea of what information OMIM provides.

d.      As you can see, OMIM provides a huge amount of information, from clinical symptoms to specific genes associated with the disease.  One gene that is involved in type I diabetes is called HLA-DQB1.
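The search in step 2 can also be written as a web address, which is handy once you want to repeat a query without clicking through the pages.  The sketch below assumes NCBI's E-utilities esearch service and its db, term, and retmax parameters.  The function name build_esearch_url is made up for this example, and the code only builds the address; it does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the E-utilities esearch service (assumed here; not part of the lab pages).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(database, term, max_results=20):
    """Return an esearch address for `term` in a single Entrez database.

    build_esearch_url is a hypothetical helper for this sketch. It only
    assembles the address and never sends a request.
    """
    params = {"db": database, "term": term, "retmax": max_results}
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # The same query we typed into the search box, limited to OMIM.
    print(build_esearch_url("omim", "type I diabetes"))
```

Pasting the printed address into a browser should return the matching record identifiers rather than the formatted page we viewed in step 2b.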

3.      To find information about the HLA-DQB1 gene, we will go to the NCBI Entrez Gene database.  Entrez Gene is the central database for all information that is known about any particular gene.  Enter HLA-DQB1 in the query box at the top of the page and click Go.

a.       Click on the top hit, which should be the human HLA-DQB1 gene.

b.      Browse through the page to see what information is available and answer the following questions:

 

  1. What is the official full name for the HLA-DQB1 gene?
  2. What chromosome is HLA-DQB1 located on?
  3. What diseases besides type I diabetes is HLA-DQB1 involved in?
  4. What are the RefSeq accession numbers for the mRNA and protein sequences for the HLA-DQB1 gene?
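A hint for question 4: the first three characters of a RefSeq accession number tell you what kind of sequence it identifies.  The sketch below lists a few common prefixes.  The function name describe_refseq and the accession numbers in the example are placeholders made up for illustration; they are not the answers for HLA-DQB1.

```python
# A few common RefSeq accession prefixes and the kind of record each marks.
REFSEQ_PREFIXES = {
    "NM_": "mRNA",
    "NP_": "protein",
    "NR_": "non-coding RNA",
    "NC_": "complete genomic molecule (for example, a chromosome)",
    "XM_": "predicted mRNA",
    "XP_": "predicted protein",
}


def describe_refseq(accession):
    """Return the record type implied by a RefSeq accession prefix.

    describe_refseq is a hypothetical helper for this sketch.
    """
    prefix = accession[:3].upper()
    return REFSEQ_PREFIXES.get(prefix, "unknown prefix")


if __name__ == "__main__":
    # Placeholder accession numbers, not real answers to question 4.
    for acc in ["NM_000000.1", "NP_000000.1"]:
        print(acc, "->", describe_refseq(acc))
```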

 

4.      Go to the protein RefSeq page for the HLA-DQB1 gene by clicking on the link in the RefSeq section of the page.

a.       The amino acid sequence is shown at the bottom of the page.  To change the format of the sequence, click on the Display box near the top of the page and change the view from the default GenPept to FASTA.

b.      FASTA format is a simple sequence format that consists of an identifier line that begins with a > and then the sequence on the following lines.  The only reason to point this out is that many bioinformatics programs use FASTA format as input and output.  It is very useful to know that you can get the FASTA formatted sequence quickly and easily on the NCBI pages.

c.       The most useful information on this page besides the sequence itself is hidden behind a few links on the top right: BLink, Conserved Domains, and Links.
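To make the FASTA description in step 4b concrete, here is a minimal sketch of a reader for FASTA-formatted text.  The function name parse_fasta is made up for this example, and the two sequences are artificial strings, not real proteins.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (identifier line, sequence) pairs."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new identifier line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are joined into one string per record.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Artificial example data, made up for this sketch.
EXAMPLE = """>example_1 artificial sequence
ACDEFGHIKL
MNPQRSTVWY
>example_2 another artificial sequence
MKVLAT
"""

if __name__ == "__main__":
    for identifier, sequence in parse_fasta(EXAMPLE):
        print(identifier, len(sequence))
```

For real work you would normally use an existing library rather than your own parser, but the format is simple enough that a few lines cover the basic case.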

5.      Click on Links – a box should pop up listing all of the NCBI databases you can access for more information about this protein.  All of the NCBI databases are linked together this way.  Clicking on the Gene link will take you back to the Gene page for HLA-DQB1 that we were just on.  Clicking on Nucleotide will take you to the nucleotide sequence for this protein.  Go ahead and choose one of the linked pages and see what information is there.

6.      Go back to the Protein page for the HLA-DQB1 gene (here is the link in case you have gotten lost).  Click on the Conserved Domains link at the top right.

a.       This page shows the conserved domains in the HLA-DQB1 protein.  There are two conserved domains; holding your mouse over the blue and red boxes will pop up a window with a short description of each one.  This page gives you a quick idea of what is known about the function of each domain in the protein.

7.      Go back to the Protein page for the HLA-DQB1 gene (here is that link again).  Click on BLink at the top right of the page.

a.       BLink stands for BLAST link, a pre-computed BLAST search for similar proteins.  BLink lets you quickly view similar sequences in the NCBI databases without having to perform a BLAST search yourself.  It is a good idea to check the BLink page for any gene or protein you are interested in before spending the time on your own BLAST search, because some of the information on the BLink page is not available from a typical BLAST search.  The main items a typical BLAST search does not give you are the Taxonomy report and the Multiple Sequence Alignment.  I think the multiple sequence alignment is particularly cool because it can take a long time to generate one on your own, and it shows the consensus sequence for the protein right at the top.
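If you are wondering what a consensus sequence is, the sketch below shows one simple way to compute it: take the most common residue in each column of the alignment.  This majority rule is an assumption made for illustration and is not necessarily how the BLink page builds its consensus.  The function name consensus and the toy alignment are made up for this example.

```python
from collections import Counter


def consensus(aligned):
    """Return a majority-rule consensus of equal-length aligned sequences.

    Gap characters ("-") are ignored when counting, so a column gets a
    residue whenever at least one sequence has one there.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    result = []
    for column in zip(*aligned):
        counts = Counter(res for res in column if res != "-")
        if not counts:
            # Every sequence has a gap in this column.
            result.append("-")
            continue
        # Most frequent residue; ties go to the residue seen first.
        result.append(counts.most_common(1)[0][0])
    return "".join(result)


# Toy alignment, made up for this sketch (not real HLA-DQB1 data).
TOY_ALIGNMENT = ["MKV-LA", "MKVTLA", "MRV-LG"]

if __name__ == "__main__":
    print(consensus(TOY_ALIGNMENT))
```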

8.      Go back to the Protein page for the HLA-DQB1 gene (here is that link again).  Click on Links at the top right, then Related Structure.

a.       This page shows all of the known structures for proteins that are very similar to the HLA-DQB1 protein.  Click on the top hit, 1UVQ_B.

b.      This brings up a page with a summary of the structure including an image of the protein. 

9.      Feel free to browse around some more to find more information about the HLA-DQB1 gene, or start the search all over again with a disease, gene, or protein that you are interested in.  The main point of this section of the lab was to show the wealth of information available, how it is all linked together, and how easily you can find what you need.  We went from a specific gene to knowing a set of similar sequences (including a consensus sequence) and knowing the protein’s function and even its structure in a matter of minutes.  If you clicked on the other Links, you would also be able to see information about SNPs, its location in the genome including nearby genes and markers, and even how to order a cDNA clone of this gene (from the UniGene page if you are interested in seeing that).  Pretty cool, right?

 

 

10.  We have only scratched the surface of what NCBI has to offer in this example.  For a brief description of all of the NCBI databases go to: http://www.ncbi.nlm.nih.gov/About/tools/restable_mol.html

 

  5. Choose two of the databases not mentioned above and give their descriptions and why you might be interested in them.

 

 

11.  There are lots of other types of databases out there besides the ones at NCBI.  Some are big, some small, some general, some for very special purposes.  The journal Nucleic Acids Research publishes a special issue on databases once a year and keeps a list of all of the databases they have published.  Go to the list of databases from NAR - http://www3.oup.co.uk/nar/database/a/  and look through them to see what else is available. 

 

  6. Choose a database that interests you, visit the database, see what is available, and describe what information it contains and why it interests you.

 

 

12.  There are many important topics in bioinformatics that we won’t be covering in this course.  To get an overview of them, visit the NCBI introductory pages for each of these topics:

a.       Bioinformatics - http://www.ncbi.nlm.nih.gov/About/primer/bioinformatics.html

b.      Single Nucleotide Polymorphisms (SNPs) - http://www.ncbi.nlm.nih.gov/About/primer/snps.html

c.       Expressed Sequence Tags (ESTs) - http://www.ncbi.nlm.nih.gov/About/primer/est.html

d.      Microarrays - http://www.ncbi.nlm.nih.gov/About/primer/microarrays.html

e.       Phylogenetics - http://www.ncbi.nlm.nih.gov/About/primer/phylo.html

f.        Genome mapping - http://www.ncbi.nlm.nih.gov/About/primer/mapping.html

g.       How to use the map viewer - http://www.ncbi.nlm.nih.gov/About/outreach/gettingstarted/mapviewer/index.html

 

 

  7. Which of these topics interests you the most?  Or, if none of these really excite you, what topic in bioinformatics would you like to learn about?

 

Answer the questions in red and email them to terrible@iastate.edu