BCB 590

Lab 7

RNA Secondary Structure Prediction, Protein Interactions

 

Instructions:

  1. Complete the lab exercise
  2. Answer the questions in red
  3. Email your answers to terrible@iastate.edu

Objectives

 

  1. Learn about the resources available for RNA secondary structure prediction
  2. Practice using RNA secondary structure prediction software
  3. Compare the results of RNA secondary structure predictions
  4. Explore some of the available resources for protein interactions and protein interaction networks.

 

 

Introduction

 

Most models for the function of molecules and experimental observations make more sense if we know the structures of the molecules involved.  For RNA, it is often important to know if the bases we have determined to be crucial for function are in a helical region, a loop, or a bulge.  Having an accurate secondary structure prediction for RNA can aid in designing and interpreting experiments and developing functional models.

 

We will also spend some time in this lab looking at some of the available resources for protein interactions and protein interaction networks.  Interactions are at the heart of all of molecular biology.  Whether the interaction is an enzyme interacting with its substrate or a transcription factor binding to DNA, no biological function can happen without interactions.  Interaction networks are also being used as one of the main data sources for systems biology.  This field is still in its infancy, so there is not yet a standard set of tools and databases.  I expect dramatic changes and improvements in many areas of protein interactions and interaction networks over the next several years.

 

Exercises

 

Required questions are in red.  

 

Part I

 

RNA Secondary Structure

 

A. The first exercise is taken from Baxevanis and Ouellette’s Bioinformatics:  A Practical Guide to the Analysis of Genes and Proteins.

 

To demonstrate the utility of color annotation on the mfold server, predict the secondary structure for the Drosophila sucinea R2 3’ UTR, as shown here:

 

 

Figure 6.2 from Baxevanis and Ouellette

 

R2 elements are a class of retrotransposons that are found in most arthropods (Eickbush, 2002).  During retrotransposition, the 3’ UTR of the messenger RNA is specifically recognized by the reverse transcriptase during target-primed reverse transcription (Luan & Eickbush, 1995; Luan et al., 1993).  The secondary structure of the 3’ UTR was predicted for Drosophila by comparative sequence analysis of 10 sequences (Mathews et al., 1997).  The sequence of the R2 element from D. sucinea, which can adopt the comparative analysis structure, was determined later (Lathe & Eickbush, 1997).  This sequence was chosen for this example because it has a known secondary structure and because the prediction of that structure by free energy minimization is less accurate than average, which makes the usefulness of color annotation easy to see (Zuker & Jacobson, 1995; Zuker & Jacobson, 1998).
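To get a feel for how a folding program searches over possible structures, here is a minimal Python sketch of the classic Nussinov recursion.  Please note that this is not what mfold or RNAfold compute: those servers minimize free energy using measured thermodynamic parameters, while this toy version only maximizes the number of nested base pairs.  The function name nussinov_pairs and the min_loop parameter are my own choices for illustration.

```python
def nussinov_pairs(seq, min_loop=3):
    """Return the maximum number of nested base pairs for an RNA sequence.

    Toy Nussinov recursion: counts pairs only, with no energy model.
    min_loop is the minimum number of unpaired bases inside a hairpin.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within seq[i..j] (inclusive)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]          # case 1: base j is unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if (seq[k], seq[j]) in allowed:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    # A short hairpin: three G-C pairs closing a loop of three A's
    print(nussinov_pairs("GGGAAACCC"))
```

The table is filled from short subsequences to long ones, so every lookup refers to a value that has already been computed.  Free energy minimization uses the same dynamic programming idea with a much richer scoring scheme.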

 

Here is the R2 3’ UTR sequence:

 

UGAUCUCUGUAUUUGUUUCUAUUUUGAACAUUUGCCUGCUACCUUGGCAUAACAUCAAUAAGGUACAAACAUCGCAAAAAGUCAUCAUAAGGUGGGUUUUAGUACGUAGGCGCUGUAGAACUUAAUUGUUCUGAUAGAGCAGCGAGUCGUGCAUGCUAGUCUAGCAUUUCUUGCUACCUAGUAUCUUUAGAAGAUUUCCCUCCCUUAGCGGUCAAA
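Before pasting a sequence into a web form, it can be worth checking that it was copied cleanly.  The following is a small optional Python sketch, not part of the original exercise, that removes whitespace, reports the length and base composition, and flags any characters other than A, C, G, and U.  The function name summarize_rna is hypothetical.

```python
from collections import Counter


def summarize_rna(seq):
    """Summarize an RNA sequence: length, base counts, odd characters, GC fraction."""
    # Remove spaces and line breaks that can sneak in when copying from a web page
    cleaned = "".join(seq.split()).upper()
    counts = Counter(cleaned)
    # Anything other than A, C, G, U is reported (for example T from a DNA sequence)
    unexpected = sorted(set(counts) - set("ACGU"))
    gc_fraction = (counts["G"] + counts["C"]) / len(cleaned) if cleaned else 0.0
    return {
        "length": len(cleaned),
        "counts": dict(counts),
        "unexpected": unexpected,
        "gc_fraction": gc_fraction,
    }


if __name__ == "__main__":
    # First 15 bases of the R2 3' UTR sequence; replace with the full sequence
    print(summarize_rna("UGAUCUCUGUAUUUG"))
```

An empty "unexpected" list means the sequence contains only the four RNA bases.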

 

Access the mfold Web server and paste the D. sucinea R2 element sequence into the large input sequence field.  Scroll to the bottom of the Web page, to the section marked “Choose structure annotation.”  Select the button after “p-num” to choose a color annotation that reflects how well determined the base pairs are.  Keep the default settings for all other fields.  Note, however, that there are links to a help page with an explanation of each user-definable setting.

 

Click the “Fold RNA” button at the bottom of the form.  This sequence is short enough that the default immediate job can be performed, so the Web browser will move quickly to the results page.  The results remain available on the server for 24 hours.  Note that the energy dot plot can be viewed by following a hyperlink at the top of the page.  Furthermore, a zip or tar file can be downloaded that contains all the predicted structures.  On the results page, view the first individual structure by clicking jpg under Structure 1.

 

1. In the color coding scheme, which color means that the base pair has the highest probability?  Which color corresponds to the lowest probability?

 

Go to the RNAfold server and paste the D. sucinea R2 element sequence into the input box.  Scroll to the bottom and click on Fold it to generate the prediction.

 

2. Are there similarities between the structures predicted by mfold and RNAfold?

3. How do the predicted structures compare to the structure shown above?
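If you want to go beyond comparing the pictures by eye, one option is to compare the predictions as sets of base pairs.  The sketch below assumes that both predictions are available in dot-bracket notation (matching parentheses for paired bases, dots for unpaired bases) and that they are for the same sequence.  The function names are hypothetical, and the two structures in the example are made-up toy strings, not output from either server.

```python
def dot_bracket_pairs(structure):
    """Convert a dot-bracket string into a set of (i, j) base-pair positions."""
    stack = []
    pairs = set()
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError("unmatched ')' at position %d" % pos)
            pairs.add((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError("unexpected symbol %r at position %d" % (symbol, pos))
    if stack:
        raise ValueError("unmatched '(' at position %d" % stack[-1])
    return pairs


def compare_structures(first, second):
    """Count base pairs shared by two structures and pairs unique to each."""
    if len(first) != len(second):
        raise ValueError("structures must describe the same sequence length")
    pairs_first = dot_bracket_pairs(first)
    pairs_second = dot_bracket_pairs(second)
    return {
        "shared": len(pairs_first & pairs_second),
        "only_first": len(pairs_first - pairs_second),
        "only_second": len(pairs_second - pairs_first),
    }


if __name__ == "__main__":
    # Toy structures for illustration only
    print(compare_structures("((..))", "(....)"))
```

A large "shared" count relative to the other two numbers means the two programs agree on most of the helices.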

 

References cited in this section:

Eickbush, TH (2002).  In Mobile DNA II (Craig, NL, Craigie, R, Gellert, M, and Lambowitz, AM eds).

Lathe, WC and Eickbush, TH (1997).  Mol. Biol. Evol. 14, 1232-1241.

Luan, DD and Eickbush, TH (1995).  Mol. Cell. Biol. 15, 3882-3891.

Luan, DD et al. (1993).  Cell 72, 595-605.

Mathews, DH et al. (1997).  RNA 3, 1-16.

Zuker, M, and Jacobson, AB (1995).  Nucl. Acids Res. 23, 2791-2798.

Zuker, M, and Jacobson, AB (1998).  RNA 4, 669-679.

 

 

B. Go through the exercise at:

 

 http://cnx.rice.edu/content/m11065/latest/

 

NOTE:  The link on that page to the RNA free energy web site (used in problems 2 through 8) is incorrect.  The correct link is:

 

http://mfold.bioinfo.rpi.edu/cgi-bin/efn-form1.cgi

 

 

4. There are 12 questions in the exercise.  Submit answers to all 12.

 

Part II

 

Protein interactions and protein interaction networks

 

There are several protein interaction databases available online.  Each database contains different information and has different standards for including an interaction.  Some databases aim to provide all possible protein interactions by accepting any evidence, no matter how vague.  Other databases rely on scientists entering each protein interaction only after having verified the interaction from the literature.  In other words, the data varies widely in quality and quantity between databases.  We will visit a few of the more popular databases, but please note that this field is still evolving rapidly and there is not a standard tool or best place to go for this information.
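One way to make the differences between databases concrete is to treat each database's interaction list as a set of unordered protein pairs and look at the overlap.  The following Python sketch assumes you have somehow obtained a list of interacting pairs from each database.  The protein names P1 through P5 are placeholders and the two lists are invented for illustration; they are not real data from any database.

```python
def normalize(interactions):
    """Turn a list of (protein_a, protein_b) pairs into a set of unordered pairs."""
    # frozenset makes ("P1", "P2") and ("P2", "P1") count as the same interaction
    return {frozenset(pair) for pair in interactions}


def compare_databases(first, second):
    """Report how many interactions two lists share and how many are unique to each."""
    set_first = normalize(first)
    set_second = normalize(second)
    return {
        "shared": len(set_first & set_second),
        "only_first": len(set_first - set_second),
        "only_second": len(set_second - set_first),
    }


if __name__ == "__main__":
    # Placeholder interaction lists, not real database content
    database_one = [("P1", "P2"), ("P2", "P3"), ("P1", "P4")]
    database_two = [("P2", "P1"), ("P4", "P5")]
    print(compare_databases(database_one, database_two))
```

In practice the hard part is matching protein identifiers between databases, since each one may use its own naming scheme.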

 

The protein we will use as an example for most of this section is human p53.  If you are not familiar with p53, check out this Wikipedia page.  We use p53 as the example because it is well characterized and is known to interact with many other proteins.

 

Go to the Database of Interacting Proteins (DIP).  First, go to the Help page to get an idea of what information is included in this database and how to use it.  Then go to the Search page and choose to search for a node.  Enter p53 in the NodeID box at the top of the search page and click the Query DIP button.  The node we are interested in is DIP:368N, which is the human p53 protein.

From the search results page, you can click on the node identifier to see more information about the protein.  The window that opens with the information about p53 provides links to a number of other databases for more information about the sequence, structure, and conserved domains within p53.  My favorite link in the node information window is the graph link at the top right.  Click on graph to see a visual representation of the protein interaction network for p53.  The graph may not make much sense at first, so click on LEGEND in the bottom right corner for a description of what the graph is showing.  You can also click on the other nodes in the graph to see more information about the proteins p53 interacts with.

After playing with the graph, go back to the search results page and click on the bullet under Links to see the details about the interactions.  (Here is a direct link to this page:  http://dip.doe-mbi.ucla.edu/dip/Browse.cgi?PK=368&D=1).  This page lists all of the interactions in DIP involving p53.  You can click on the various interactions for more information, including what type of experimental evidence there is for the interaction and a link to the journal article describing the interaction.  Click on a few of the interactions to see the variety of experimental evidence types used by DIP.

 

 

Next, go to BioGRID - http://www.thebiogrid.org/.  Click on about us to find out about this database.  After reading this page, return to the main page, enter p53, and click on Submit your search.  Explore the results a bit to get a good idea of what information is there, keeping in mind what you have already seen at DIP so that you can compare the two databases later.

 

Our final protein interaction database is MIPS - http://mips.gsf.de/proj/ppi/.  Read through the information on the main page to learn about this database and related resources.  After reading the page, go back to the top and click on Protein search.  Enter p53 in the protein name box and click search.  Click on the full link for Homo sapiens p53 to see the interactions and look through the results.

 

5. Why are there so few interactions in MIPS when DIP and BioGRID have so many?

 

 

6. What did you like and dislike about each of the three databases?  Which did you prefer and why?

 

 

Part III

 

Pathways

 

Our final stop will be at a database for looking at biological pathways. 

 

Go to KEGG - http://www.genome.ad.jp/kegg/.  Read the first paragraph, then go to the Introduction and Overview pages to see what KEGG is trying to do.  After that, I’m leaving it up to you to decide what to look at here.  KEGG has wonderful graphical representations of metabolic pathways and disease pathways.  You can zoom in or out to see more or less detail, and each node in the pathways can be clicked on to see a large amount of information about individual proteins.  Have some fun and see what is available here.