BCB 590 Lab 7
RNA Secondary Structure Prediction, Protein Interactions
Instructions:
Objectives
Introduction
Most models for the function of molecules and experimental
observations make more sense if we know the structures of the molecules
involved. For RNA, it is often important to know if the bases we have
determined to be crucial for function are in a helical region, a loop, or a
bulge. Having an accurate secondary structure prediction for RNA can aid
in designing and interpreting experiments and developing functional models.
We will also spend some time in this lab looking at some of
the available resources for protein interactions and protein interaction
networks. Interactions are at the heart of all of molecular biology.
Whether the interaction is an enzyme acting on its substrate or a transcription
factor binding to DNA, no biological function can happen without
interactions. Interaction networks are also being used as one of the main
data sources for systems biology. This field is still in its infancy, so
there is not yet a standard set of tools and databases. I expect dramatic
changes and improvements in many areas of protein interactions and
interaction networks over the next several years.
Exercises
Required questions are in red.
Part I
RNA Secondary Structure
A. The first exercise is taken from Baxevanis and Ouellette’s Bioinformatics: A Practical
Guide to the Analysis of Genes and Proteins.
To demonstrate the utility of color annotation on the mfold server, predict the secondary structure for the Drosophila
sucinea R2 3’ UTR, as shown here:

Figure 6.2 from Baxevanis and Ouellette
R2 elements are a class of retrotransposons
that are found in most arthropods (Eickbush,
2002). During retrotransposition, the 3’ UTR of
the message RNA is specifically recognized by the reverse transcriptase during
target-primed reverse transcription (Luan & Eickbush,
1995; Luan et al., 1993). The secondary structure of the 3’ UTR was
predicted for Drosophila with comparative sequence analysis of 10
sequences (Mathews et al., 1997). The sequence of the R2 element from D. sucinea, which can adopt the comparative analysis
structure, was later determined (Lathe & Eickbush,
1997). This sequence has been chosen for this example because it has a
known secondary structure and the prediction of this secondary structure by
free energy minimization is less accurate than average, so that the usefulness
of color annotation is demonstrated (Zuker &
Jacobson, 1995; Zuker & Jacobson, 1998).
Here is the R2 3’ UTR sequence:
UGAUCUCUGUAUUUGUUUCUAUUUUGAACAUUUGCCUGCUACCUUGGCAUAACAUCAAUAAGGUACAAACAUCGCAAAAAGUCAUCAUAAGGUGGGUUUUAGUACGUAGGCGCUGUAGAACUUAAUUGUUCUGAUAGAGCAGCGAGUCGUGCAUGCUAGUCUAGCAUUUCUUGCUACCUAGUAUCUUUAGAAGAUUUCCCUCCCUUAGCGGUCAAA
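Before pasting a sequence into a folding server, it can help to confirm that it contains only RNA bases and to check its length. The following is a minimal Python sketch of such a check; the function name `summarize_rna` is hypothetical and is not part of mfold or RNAfold. The example uses only the first 20 bases of the sequence above.

```python
from collections import Counter


def summarize_rna(seq):
    """Return (length, base counts) for an RNA sequence.

    Whitespace is ignored and case is normalized. A ValueError is raised
    if any character other than A, C, G, or U is present.
    """
    cleaned = "".join(seq.split()).upper()
    unexpected = set(cleaned) - set("ACGU")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return len(cleaned), Counter(cleaned)


# First 20 bases of the R2 3' UTR sequence given above.
prefix = "UGAUCUCUGUAUUUGUUUCU"
length, counts = summarize_rna(prefix)
print(length, dict(counts))
```

A check like this catches a common copy-and-paste problem, such as a DNA sequence (with T instead of U) being submitted by mistake.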
Access the mfold Web server and paste the D. sucinea
R2 element sequence into the large field on the server Web site for the input
sequence. Scroll to the bottom of the Web page, to the section marked
“Choose structure annotation.” Select the button after “p-num” to choose
a color annotation that reflects how well determined base pairs are. Keep
the default settings for all other fields. Note, however, that there are
links to a help page with an explanation of each user-definable setting.
Click the “Fold RNA” button at the bottom of the form.
This sequence is short enough that the default immediate job can be performed,
so the Web browser will move quickly to the results page. The results
remain available on the server for 24 hours. Note that the energy dot
plot can be viewed by following a hyperlink at the top of the page.
Furthermore, a zip or tar file can be downloaded that contains all the
predicted structures. On the results page, view the first individual
structure by clicking jpg under Structure 1.
1. In the color coding scheme, which
color means that the base-pair has the highest probability? Which color
corresponds to the lowest probability?
Go to the RNAfold server and paste the D. sucinea R2 element
sequence in the input box. Scroll to the
bottom and click on Fold it to generate the prediction.
2. Are there similarities between
the structures predicted by mfold and RNAfold?
3. How do the predicted structures
compare to the structure shown above?
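One way to make the comparison in questions 2 and 3 more concrete is to count the base pairs that two predictions share. The sketch below assumes both structures are available in dot-bracket notation (RNAfold reports this format; mfold output may need to be converted first). The function names and the two toy structures are made up for illustration and are not real server output.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            pairs.add((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError("unbalanced '('")
    return pairs


def shared_pair_fraction(struct_a, struct_b):
    """Fraction of base pairs common to both structures (shared / union)."""
    pairs_a, pairs_b = base_pairs(struct_a), base_pairs(struct_b)
    union = pairs_a | pairs_b
    return len(pairs_a & pairs_b) / len(union) if union else 1.0


# Toy structures for an 11-base sequence (illustrative only).
s1 = "((((...))))"
s2 = "(((.....)))"
print(shared_pair_fraction(s1, s2))  # 0.75: three of four distinct pairs are shared
```

A value near 1 means the two predictions agree on almost every pair; a low value points to regions worth inspecting by eye in the structure drawings.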
References cited in this section:
Eickbush, TH (2002). In Mobile
DNA II (Craig, NL, Craigie, R, Gellert, M, and Lambowitz, AM, eds).
Lathe, WC and Eickbush, TH (1997). Mol. Biol. Evol. 14, 1232-1241.
Luan, DD and Eickbush, TH (1995). Mol. Cell. Biol. 15, 3882-3891.
Luan, DD et al. (1993). Cell 72, 595-605.
Mathews, DH et al. (1997). RNA 3, 1-16.
Zuker, M, and Jacobson, AB (1995).
Zuker, M, and Jacobson, AB (1998). RNA 4, 669-679.
B. Go through the exercise at:
http://cnx.rice.edu/content/m11065/latest/
NOTE: The link on this page for the RNA free energy (problems 2 through 8) web site is incorrect. The correct link is:
http://mfold.bioinfo.rpi.edu/cgi-bin/efn-form1.cgi
4. There are 12 questions in the
exercise. Submit answers to all 12.
Part II
Protein interactions and protein interaction networks
There are several protein interaction databases available online. Each database contains different information and has different standards for including an interaction. Some databases aim to provide all possible protein interactions by accepting any evidence, no matter how vague. Other databases rely on scientists entering each protein interaction only after having verified the interaction from the literature. In other words, the data varies widely in quality and quantity between databases. We will visit a few of the more popular databases, but please note that this field is still evolving rapidly and there is not a standard tool or best place to go for this information.
The protein we will use as an example for most of this section is human p53. If you are not familiar with p53, check out this Wikipedia page. p53 makes a good example because it is well characterized and is known to interact with many other proteins.
Go to the Database of
Interacting Proteins (DIP). First,
go to the Help page to get an idea of what information is included in this
database and how to use it. Then go to
the Search page and choose to search for a node. Enter p53 in the NodeID
box at the top of the search page and click the Query DIP button. The node we are interested in is DIP:368N, which is the human p53 protein. From the search results page, you can click
on the node identifier to see more information about the protein. The window that opens with the information
about p53 provides links to a number of other databases for more information
about the sequence, structure, and conserved domains within p53. My favorite link in the node information
window is the graph link at the top right.
Click on graph to see a visual representation of the protein interaction
network for p53. The graph may not make
much sense at first, so click on LEGEND in the bottom right corner for a
description of what the graph is showing.
You can also click on the other nodes in the graph to see more
information about the proteins p53 interacts with. After playing with the graph, go back to the
search results page and click on the bullet under Links to see the details
about the interactions. (Here is a
direct link to this page: http://dip.doe-mbi.ucla.edu/dip/Browse.cgi?PK=368&D=1). This page lists all of the interactions in
DIP involving p53. You can click on the
various interactions for more information, including what type of experimental
evidence there is for the interaction and a link to the journal article
describing the interaction. Click on a
few of the interactions to see the variety of experimental evidence types used
by DIP.
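The graph view in DIP draws proteins as nodes and interactions as edges. As a rough illustration of that idea, here is a minimal Python sketch that stores an undirected interaction network as a dictionary of neighbor sets. The edge list is illustrative only and is not an export from DIP: the p53 and MDM2 pair is a well-known interaction, while `partner_X` and `partner_Y` are placeholder names.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected network: protein -> set of interaction partners."""
    network = defaultdict(set)
    for a, b in interactions:
        network[a].add(b)
        network[b].add(a)
    return network


# Illustrative edge list (placeholders, not real DIP data).
edges = [("p53", "MDM2"), ("p53", "partner_X"), ("MDM2", "partner_Y")]
net = build_network(edges)

print(sorted(net["p53"]))  # partners of p53
print(len(net["p53"]))     # degree of the p53 node
```

The number of partners of a node (its degree) is one simple way to see why a protein such as p53 stands out as a hub in an interaction graph.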
Next, go to the BioGRID - http://www.thebiogrid.org/. Click on about us to find out about this
database. After reading this page,
return to the main page, enter p53, and click on Submit your search. Explore the results a bit to get a good idea
of what information is there, keeping in mind what you have already seen at DIP
so that you can compare the two databases later.
Our final protein interaction database is MIPS - http://mips.gsf.de/proj/ppi/. Read through the information on the main page
to learn about this database and related resources. After reading the page, go back to the top
and click on Protein search. Enter p53
in the protein name box and click search.
Click on the full link for Homo sapiens p53 to
see the interactions and look through the results.
5. Why are there so few interactions in MIPS
when DIP and BioGRID had so many?
6. What did you like and dislike
about each of the three databases? Which
did you prefer and why?
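If you save the p53 interaction partners from two databases as lists of protein pairs, the overlap can be tallied with a few set operations. This is only a sketch under that assumption: the function names are hypothetical, the partner names are placeholders, and it ignores the real problem that different databases may use different identifiers for the same protein.

```python
def normalize(pairs):
    """Treat each interaction as unordered, so (A, B) equals (B, A)."""
    return {frozenset(pair) for pair in pairs}


def compare_databases(pairs_a, pairs_b):
    """Count interactions shared by, and unique to, two lists of pairs."""
    a, b = normalize(pairs_a), normalize(pairs_b)
    return {
        "shared": len(a & b),
        "only_a": len(a - b),
        "only_b": len(b - a),
    }


# Placeholder data, not taken from DIP, BioGRID, or MIPS.
database_a = [("p53", "partner_X"), ("p53", "partner_Y")]
database_b = [("partner_X", "p53"), ("p53", "partner_Z")]
print(compare_databases(database_a, database_b))
```

Using `frozenset` for each pair means the order in which a database lists the two proteins does not affect the comparison.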
Part III
Pathways
Our final stop will be at a database for looking at biological pathways.
Go to KEGG - http://www.genome.ad.jp/kegg/. Read the first paragraph, then
go to the Introduction and Overview pages to see what KEGG is trying to
do. After that, I’m leaving it up to you
to decide what you want to look at here. KEGG
has wonderful graphical representations of metabolic pathways and disease
pathways. You can zoom in or out to see
more or less detail and each node in the pathways can be clicked on to see a
large amount of information about individual proteins. Have some fun and see what is available here.