BCB 590                                                                       

Lab 5                                                                                 Name _____________________________

Protein Structure Prediction

 

Objectives

  1. Understand the differences among the three types of protein structure prediction
  2. Learn how to use several free webservers to predict the structure of a protein from its primary sequence

 

Introduction

Protein structure is essential to understanding the intricacies of protein function.  However, determining structure by X-ray crystallography, NMR, or other physical means is a long and labor-intensive process.  Because we know that sequence determines structure, we should be able to use sequence information to get an idea of the structure of a protein.  In this lab we will use protein sequences to predict secondary structure, and then use several different methods to predict three-dimensional structures.

 

 

Exercises

 

Required questions are in red.  

 

Note: If you were not able to attend the regularly scheduled lab section, it may help to review the background lecture slides, which can be downloaded from the course webpage. Please feel free to ask a TA if you have any questions regarding these slides.

 

Exercise 1:  Secondary Structure Prediction 

Protein secondary structure falls into three categories: (H) Helix, (E) Extended (usually Beta Sheet), and (C) Coil.  (If you are unfamiliar with secondary structure, please explore it further at Wikipedia.)  While there are subclasses of each type, it is common to describe secondary structure with the three-letter code H, E, C.  Currently, the best secondary structure prediction classifiers can predict the three-letter code with around 80% accuracy.  There are a large number of secondary structure prediction programs on the web, and they work with varying degrees of success.

We will use three different methods to predict secondary structure: CDM, PSIPRED, and Proteus.  CDM is a secondary structure prediction server which combines the results from the GOR V method and a fragment data mining (FDM) prediction.  FDM involves searching for short fragments that have highly similar structures available in the PDB.  The GOR V, FDM, and CDM methods were developed by Robert Jernigan’s group here at ISU.  Proteus is a meta-server which uses several other servers, including PSIPRED, to predict secondary structure.
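The accuracy figure quoted above is usually a per-residue score, often called Q3: the fraction of residues whose predicted state (H, E, or C) matches the observed state.  The short Python sketch below shows one way to compute it.  The function name and the two toy strings are made up for illustration; they are not output from any of the servers used in this lab.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Return the fraction of positions where predicted and observed states agree.

    Both arguments are strings over the three-letter code H, E, C,
    aligned residue by residue (same length).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not predicted:
        raise ValueError("cannot score an empty prediction")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


# Toy example (hypothetical strings, not real server output).
toy_predicted = "CCHHHHHHCCEEEECC"
toy_observed = "CCHHHHHCCCEEEEEC"
print(f"Q3 = {q3_accuracy(toy_predicted, toy_observed):.3f}")
```

The same function can be used to compare two predictions with each other, which gives a quick number for how similar two servers' results are.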

 

Use this sequence for the following exercise:


>gi|9629359|ref|NP_057854.1| Rev [Human immunodeficiency virus 1]

MAGRSGDSDEELIRTVRLIKLLYQSNPPPNPEGTRQARRNRRRRWRERQRQIHSISERILGTYLGRSAEP

VPLQLPPLERLTLDCNEDCGTSGTQGVGSPQILVESPTVLESGTKE

 

Go to CDM and enter the sequence for prediction.  This server does not accept the comment line (the line beginning with >), so leave it out.  The server returns three different predictions, one each from the GOR V, FDM, and CDM methods.
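If you prefer not to remove the comment line by hand, the Python sketch below is one way to turn a FASTA record into a bare sequence for pasting into a server.  The function name is hypothetical, and the sketch assumes a single record whose comment line starts with >.

```python
def fasta_to_raw_sequence(fasta_text: str) -> str:
    """Drop the comment line(s) and blank lines, and join the sequence lines."""
    lines = [line.strip() for line in fasta_text.splitlines()]
    return "".join(line for line in lines if line and not line.startswith(">"))


# The Rev sequence given above for this exercise.
rev_fasta = """>gi|9629359|ref|NP_057854.1| Rev [Human immunodeficiency virus 1]
MAGRSGDSDEELIRTVRLIKLLYQSNPPPNPEGTRQARRNRRRRWRERQRQIHSISERILGTYLGRSAEP
VPLQLPPLERLTLDCNEDCGTSGTQGVGSPQILVESPTVLESGTKE
"""

raw_sequence = fasta_to_raw_sequence(rev_fasta)
print(raw_sequence)
print(len(raw_sequence), "residues")
```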

 

Go to PSIPRED and enter the sequence in the “Input Sequence” box.  Select the “Predict Secondary Structure” radio button.  Leave the filtering options alone.  Enter your email address and click Submit.

 

Go to Proteus and submit the sequence for prediction.  You can choose to view the results in your browser or receive your results via email.
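To get a feel for how results from several predictors might be combined, the Python sketch below takes a per-residue majority vote over aligned H/E/C strings.  This is only an illustration of the general consensus idea; it is not the algorithm used by Proteus or CDM, and the function name, the tie-breaking rule (ties become coil), and the toy strings are all assumptions.

```python
from collections import Counter


def majority_consensus(predictions: list[str], tie_state: str = "C") -> str:
    """Per-residue majority vote over equal-length H/E/C prediction strings.

    When no single state has the most votes at a position, `tie_state`
    is used (an arbitrary choice made for this sketch).
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("need at least one prediction, all of the same length")
    consensus = []
    for column in zip(*predictions):
        counts = Counter(column).most_common()
        top_state, top_count = counts[0]
        tied = len(counts) > 1 and counts[1][1] == top_count
        consensus.append(tie_state if tied else top_state)
    return "".join(consensus)


# Toy example (hypothetical strings, not real server output).
toy_predictions = ["CHHHHEEC", "CHHHCEEC", "CCHHHEEE"]
print(majority_consensus(toy_predictions))
```

Positions where the consensus differs from an individual prediction are a good place to start when answering the questions below.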

 

1.) Are the predictions from the three methods exactly the same?  If not, are they at least similar?

 

 

2.) Do any of the predictions closely match the actual secondary structure observed in the 1ETF structure in PDB? Describe.

 

Exercise 2: Homology modeling with SWISS-MODEL

 

Use the following sequence for this exercise:

 

>MYB305

MDKKPCNSQDVEVRKGPWTMEEDLILINYIANHGEGVWNSLAKSAGLKRTGKSCRLRWLNYLRPDVRRGNITPEEQLLIMELHAKWGNRWSKIAKHLPGRTDNEIKNYWRTRIQKHIKQAENMNGQAANSEQNDHQEGSSSHMSSAGPTETYSPTSYSANIDTTFQGPFLTETNDNIWSMEDIWSMQLLNGD

 

Go to the SWISS-MODEL website: http://swissmodel.expasy.org/SWISS-MODEL.html and click “First Approach mode” in the left menu. Submit the provided sequence. Note that there is an option near the bottom of the screen to specify a particular PDB entry to use as a template. You will most likely have to complete question 5 later, as results take a while to arrive.

 

Let’s see what happens if we instead choose our own template for SWISS-MODEL to use. Go to the PDB (http://pdb.org) and use the advanced search to BLAST the PDB with the sequence provided above. In this particular instance, the researcher who provided the sequence was interested in identifying residues that may be involved in binding DNA.

 

3.) Which structure(s) might be useful in making such a prediction?

 

4.) What if we specifically wanted a model of the unbound protein structure?

 

5.) In light of the answers to these questions, and the results from SWISS-MODEL, discuss a strategy for how you might use SWISS-MODEL in the future to obtain a model for a protein sequence you are interested in.

 

Exercise 3: Using the BioInfoBank metaserver to run multiple methods and obtain multiple models

If a BLAST search of PDB does not turn up any structures with significant sequence homology, you may need to use a threading method such as FUGUE to identify suitable templates you can use for comparative modeling. Even better than using just one method, however, is using a metaserver to submit your sequence simultaneously to several different webservers. As an added bonus

 

Go to http://meta.bioinfo.pl  (If interested, you can read more about the servers utilized here: http://meta.bioinfo.pl/servers.pl)

 

Complete results sometimes require several days to a week, so I have provided links to a set of results. In this case, the sequence provided to us was:

 

>Him-8

MNGNSLNSFVNIRIYGKTLENLKHYGLIEYLNEFAGGSSFVSLTESTSISSINTATVDTPRFSTPIVPNV

GLYQKFTLNLSEKISEIGPNDENEDLKESYDQEPEEELNSSHESNNSVEKVMDMIIEDVVSNHTTNIADG

DDINSPIVSSGQSEFLQDGVDNDGNIDDYEEYQSLPPNDDVIMNETELMDVDRTTVMTPLRSPTFFDYHN

ESGDEDQLNENEMKSPDSKNDEINKDEIHNIQCHFPNCNRAIAWKRKYGKLRLIDHALVHCDKNFLKCKK

CKHTCHTIRQMRYHYRIFHSTSKMEGFGVSGLPTKNKGFQKIMNACFADQLVEMNKRKNPPKSQNGSRRS

RVKSKSKRSGI

 

Results obtained when submitting the entire sequence:

Him-8 http://meta.bioinfo.pl/3djury.pl?meta=v2&id=11775

 

One caveat listed on the submission page is that protein domains should be submitted individually. The results obtained after dividing the sequence into two parts are listed below:

Him-8 c-term: http://meta.bioinfo.pl/3djury.pl?meta=v2&id=11810

Him-8 n-term: http://meta.bioinfo.pl/3djury.pl?meta=v2&id=11776  
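If you want to submit domains separately for a sequence of your own, the Python sketch below shows one way to cut a sequence in two and write each part as a FASTA record.  The function names, the placeholder sequence, and the boundary position are all hypothetical; the handout does not state where the Him-8 sequence was divided, so choose a boundary based on your own domain analysis.

```python
def split_sequence(sequence: str, boundary: int) -> tuple[str, str]:
    """Split a sequence into N-terminal and C-terminal parts after `boundary` residues."""
    if not 0 < boundary < len(sequence):
        raise ValueError("boundary must fall strictly inside the sequence")
    return sequence[:boundary], sequence[boundary:]


def to_fasta(name: str, sequence: str, width: int = 70) -> str:
    """Format a sequence as a FASTA record with lines of at most `width` residues."""
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([f">{name}"] + chunks)


# Placeholder sequence and boundary, for illustration only.
toy_sequence = "ACDEFGHIKLMNPQRSTVWY" * 5  # 100 residues
toy_boundary = 40                           # hypothetical domain boundary

n_term, c_term = split_sequence(toy_sequence, toy_boundary)
print(to_fasta("toy_n-term", n_term))
print(to_fasta("toy_c-term", c_term))
```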

 

6.) Compare the results obtained when submitting the entire sequence vs. breaking it into domains.

 

7.) Which portions of these predictions do you think are likely to be decent models? Why?

 

8.) Which portions will likely require more extensive modeling methods? Why?

 

Other resources:

Critical assessment of protein structure prediction methods (CASP): http://predictioncenter.gc.ucdavis.edu/

Statistics for CASP8 (http://www.predictioncenter.org/casp8/numbers.cgi) ~130 servers!

 

I-TASSER server (top server from CASP7): http://zhang.bioinformatics.ku.edu/I-TASSER/ (results take 2-3 weeks)

ROBETTA server (from David Baker’s lab): http://robetta.bakerlab.org/ (results take 3-4 months!)

MolIDE homology modeling software (from Roland Dunbrack’s lab): http://dunbrack.fccc.edu/molide/ (available for Windows and Linux)

 

Freely available molecular dynamics software (not necessarily user friendly):

         GROMACS: www.gromacs.org/

         NAMD: www.ks.uiuc.edu/Research/namd/

 

 

Please email your completed assignment to Peter.

Email: petez

Domain: iastate.edu