BCB 590 Lab 5
Name _____________________________
Protein Structure Prediction
Objectives
Introduction
Protein structure is essential to understanding the intricacies of protein function. However, determining structure by X-ray crystallography, NMR, or other physical means is a long and labor-intensive process. Because a protein's sequence determines its structure, we should be able to use sequence information to get an idea of the structure of a protein.
In this lab we will use protein sequences to predict secondary structure, and we will use several different methods to predict three-dimensional structures.
Exercises
Required questions are in red.
Note: If you were not able to attend the regularly scheduled lab section, it may help to review the background lecture slides, which can be downloaded from the course webpage. Please feel free to ask a TA if you have any questions about these slides.
Exercise 1: Secondary Structure Prediction
Protein secondary structure falls into three categories: (H) Helix, (E) Extended (usually beta sheet), and (C) Coil. (If you are unfamiliar with secondary structure, please explore it further at Wikipedia.) While there are subclasses of each type, it is common to use the three-state H/E/C code to describe secondary structure. Currently, the best secondary structure prediction methods assign the three-state code with around 80% per-residue accuracy. There is a large number of secondary structure prediction programs on the web, and they work with varying degrees of success. We will use three different methods to predict secondary structure: CDM, PSIPRED, and Proteus. CDM is a secondary structure prediction server that combines the results of the GOR V method and a fragment data mining (FDM) prediction. FDM searches the PDB for short fragments whose sequences closely match segments of the query and uses the known structures of those fragments. The GOR V, FDM, and CDM methods were developed by Robert Jernigan’s group here at ISU. Proteus is a meta-server that uses several other servers, including PSIPRED, to predict secondary structure.
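The three-state reduction and the ~80% accuracy figure can be made concrete with a short Python sketch. It assumes one common convention for collapsing the eight DSSP states into H/E/C (H, G, I to H; E, B to E; everything else to C); other reductions exist, for example treating G and I as coil. The accuracy measure shown is the fraction of residues whose predicted state matches the observed state, often called Q3. The function names are illustrative and are not part of any server's interface.

```python
# Sketch: collapse DSSP 8-state strings to H/E/C and score a prediction.
# Assumption: H, G, I -> H; E, B -> E; all other states (T, S, '-', etc.) -> C.
# Other published reductions differ, so treat this mapping as one choice.
DSSP_TO_HEC = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def to_three_state(dssp: str) -> str:
    """Map each DSSP state to H, E, or C (unlisted states become C)."""
    return "".join(DSSP_TO_HEC.get(state, "C") for state in dssp)


def q3(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted H/E/C state equals the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings covering the same residues")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


if __name__ == "__main__":
    print(to_three_state("HHHGGTTEEBS-"))  # HHHHHCCEEECC
    print(q3("HHHHCCEEEC", "HHHCCCEEEE"))  # 0.8
```

A Q3 of 0.8 means 8 of 10 residues were assigned the correct state, which is the sense in which the best methods reach about 80%.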
Use this sequence for the following exercise:
>gi|9629359|ref|NP_057854.1| Rev [Human immunodeficiency virus 1]
MAGRSGDSDEELIRTVRLIKLLYQSNPPPNPEGTRQARRNRRRRWRERQRQIHSISERILGTYLGRSAEP
VPLQLPPLERLTLDCNEDCGTSGTQGVGSPQILVESPTVLESGTKE
Go to CDM and enter the sequence for prediction. Leave out the FASTA comment line (the header line beginning with “>”); this server requires the bare sequence. CDM returns three different predictions, one from each of the GOR V, FDM, and CDM methods.
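Because CDM needs the sequence without its comment line, it can help to strip the header before pasting. The following minimal Python sketch assumes the input holds a single FASTA record; it drops any line starting with “>” and joins the remaining lines into one bare sequence. The function name is hypothetical.

```python
def fasta_to_bare_sequence(text: str) -> str:
    """Return the residues of a single FASTA record with the '>' header removed.

    Assumes one record: header lines are dropped and the remaining lines
    are joined with surrounding whitespace stripped.
    """
    residues = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith(">"):
            residues.append(line)
    return "".join(residues)


if __name__ == "__main__":
    rev = """>gi|9629359|ref|NP_057854.1| Rev [Human immunodeficiency virus 1]
MAGRSGDSDEELIRTVRLIKLLYQSNPPPNPEGTRQARRNRRRRWRERQRQIHSISERILGTYLGRSAEP
VPLQLPPLERLTLDCNEDCGTSGTQGVGSPQILVESPTVLESGTKE"""
    seq = fasta_to_bare_sequence(rev)
    print(len(seq))  # 116
    print(seq)
```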
Go to PSIPRED and enter the sequence in the “Input Sequence”
box. Select the “Predict Secondary Structure” radio button. Leave
the filtering options alone. Enter your email address and click Submit.
Go to Proteus and submit the sequence for prediction. You can choose to view
the results in your browser or receive your results via email.
1.) Are the predictions from the different methods exactly the same? Are they even similar?
2.) Do any of the predictions closely match the actual secondary structure observed in PDB entry 1ETF? Describe.
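For questions 1 and 2, lining the predictions up residue by residue makes differences easy to see. The sketch below is an illustration and is not part of any of these servers: it takes several H/E/C strings of equal length, builds a simple majority-vote consensus (positions with a tied top count are marked “?”), and prints a marker line with “|” wherever every method agrees. The example strings are made up; you would paste in the actual outputs from CDM, PSIPRED, and Proteus, reduced to H/E/C letters.

```python
from collections import Counter


def consensus(predictions: list[str]) -> str:
    """Majority-vote H/E/C string; positions with a tied top count become '?'."""
    if not predictions or len({len(p) for p in predictions}) != 1:
        raise ValueError("need one or more predictions of equal length")
    result = []
    for column in zip(*predictions):
        ranked = Counter(column).most_common()
        top_state, top_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        result.append("?" if tied else top_state)
    return "".join(result)


def agreement_track(predictions: list[str]) -> str:
    """'|' where all predictions assign the same state, ' ' elsewhere.

    Assumes equal-length inputs (zip stops at the shortest string).
    """
    return "".join("|" if len(set(col)) == 1 else " " for col in zip(*predictions))


if __name__ == "__main__":
    preds = {
        "method A": "HHHCCEEC",  # made-up example strings
        "method B": "HHCCCEEE",
        "method C": "HHHCCECC",
    }
    for name, pred in preds.items():
        print(f"{name:10s}{pred}")
    print(f"{'agree':10s}{agreement_track(list(preds.values()))}")
    print(f"{'consensus':10s}{consensus(list(preds.values()))}")
```

Majority voting is only one simple way to combine predictions; meta-servers such as Proteus may combine their inputs differently.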
Exercise 2: Homology modeling with SWISS-MODEL
Use the following sequence for this exercise:
>MYB305
MDKKPCNSQDVEVRKGPWTMEEDLILINYIANHGEGVWNSLAKSAGLKRTGKSCRLRWLNYLRPDVRRGNITPEEQLLIMELHAKWGNRWSKIAKHLPGRTDNEIKNYWRTRIQKHIKQAENMNGQAANSEQNDHQEGSSSHMSSAGPTETYSPTSYSANIDTTFQGPFLTETNDNIWSMEDIWSMQLLNGD
Go to the SWISS-MODEL website (http://swissmodel.expasy.org/SWISS-MODEL.html) and click “First Approach mode” in the left menu. Submit the provided sequence. Note that there is an option near the bottom of the page to specify a particular PDB entry to use as a template. You will most likely have to complete question 5 later, as results take a while to arrive.
Let’s see what happens if we instead choose our own template for SWISS-MODEL to use. Go to the PDB (http://pdb.org) and use the advanced search to run a BLAST search of the PDB with the sequence provided above. In this particular instance, the researcher who provided the sequence was interested in identifying residues that may be involved in binding DNA.
3.) Which structure(s) might be useful in making such a prediction?
4.) What if we specifically wanted a model of the unbound protein structure?
5.) In light of the answers to these questions and the results from SWISS-MODEL, discuss a strategy for how you might use SWISS-MODEL in the future to obtain a model for a protein sequence you are interested in.
Exercise 3: Using the BioInfoBank metaserver to run multiple methods and obtain multiple models
If a BLAST search of the PDB does not turn up any structures with significant sequence homology, you may need to use a threading method such as FUGUE to identify suitable templates for comparative modeling. Even better than using just one method, however, is using a metaserver to submit your sequence to several different webservers simultaneously. As an added bonus
Go to http://meta.bioinfo.pl. (If interested, you can read more about the servers it uses at http://meta.bioinfo.pl/servers.pl.)
Complete results sometimes require several days to a week, so I have provided links to a set of results. In this case, the sequence provided to us was:
>Him-8
MNGNSLNSFVNIRIYGKTLENLKHYGLIEYLNEFAGGSSFVSLTESTSISSINTATVDTPRFSTPIVPNV
GLYQKFTLNLSEKISEIGPNDENEDLKESYDQEPEEELNSSHESNNSVEKVMDMIIEDVVSNHTTNIADG
DDINSPIVSSGQSEFLQDGVDNDGNIDDYEEYQSLPPNDDVIMNETELMDVDRTTVMTPLRSPTFFDYHN
ESGDEDQLNENEMKSPDSKNDEINKDEIHNIQCHFPNCNRAIAWKRKYGKLRLIDHALVHCDKNFLKCKK
CKHTCHTIRQMRYHYRIFHSTSKMEGFGVSGLPTKNKGFQKIMNACFADQLVEMNKRKNPPKSQNGSRRS
RVKSKSKRSGI
Results obtained when submitting the entire sequence:
Him-8 http://meta.bioinfo.pl/3djury.pl?meta=v2&id=11775
One caveat listed on the submission page is that protein domains should be submitted individually. The results obtained after dividing the sequence into two sequences are listed below:
Him-8 c-term: http://meta.bioinfo.pl/3djury.pl?meta=v2&id=11810
Him-8 n-term: http://meta.bioinfo.pl/3djury.pl?meta=v2&id=11776
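If you want to try a domain split yourself, the sketch below writes two FASTA records from one sequence. The boundary residue is an assumption you must supply (for example, from a domain prediction); this lab does not state where the provided Him-8 results were split, so the toy sequence and boundary in the example are purely illustrative. Function and parameter names are hypothetical.

```python
import textwrap


def split_into_domains(name: str, seq: str, boundary: int, width: int = 70) -> list[str]:
    """Split `seq` after residue `boundary` (1-based) into two FASTA records.

    The N-terminal record holds residues 1..boundary and the C-terminal record
    holds residues boundary+1..end; each body is wrapped at `width` letters.
    """
    if not 0 < boundary < len(seq):
        raise ValueError("boundary must leave at least one residue on each side")
    pieces = [
        ("n-term", 1, seq[:boundary]),
        ("c-term", boundary + 1, seq[boundary:]),
    ]
    records = []
    for label, start, part in pieces:
        end = start + len(part) - 1
        body = "\n".join(textwrap.wrap(part, width))  # breaks long runs at `width`
        records.append(f">{name} {label} residues {start}-{end}\n{body}")
    return records


if __name__ == "__main__":
    # Toy sequence and an arbitrary (hypothetical) boundary, for illustration only.
    for record in split_into_domains("toy", "MNGNSLNSFV", boundary=4, width=3):
        print(record)
```

Recording the residue range in each header makes it easier to map the models returned for each piece back onto the full-length sequence.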
6.) Compare the results obtained when submitting the entire sequence vs. breaking it into domains.
7.) Which portions of these predictions do you think are likely to be decent models? Why?
8.) Which portions will likely require more extensive modeling methods? Why?
Other resources:
Critical Assessment of protein Structure Prediction (CASP): http://predictioncenter.gc.ucdavis.edu/
Statistics for CASP8 (http://www.predictioncenter.org/casp8/numbers.cgi): ~130 servers!
I-TASSER server (top server from CASP7): http://zhang.bioinformatics.ku.edu/I-TASSER/ (results take 2-3 weeks)
ROBETTA server (from David Baker’s lab): http://robetta.bakerlab.org/ (results take 3-4 months!)
MolIDE homology modeling software (from Roland Dunbrack’s lab): http://dunbrack.fccc.edu/molide/ (available for Windows and Linux)
Freely available molecular dynamics software (not necessarily user-friendly):
GROMACS: www.gromacs.org/
NAMD: www.ks.uiuc.edu/Research/namd/
Please email your completed assignment to Peter.
Email: petez
Domain: iastate.edu