BCB 590                                                                       

Lab 4                                                                                 Name _____________________________

Macromolecular Structure Visualization

 

Objectives

  1. Learn about the protein structure resources available at the PDB and NCBI
  2. Understand the portions of a PDB formatted structure file relevant to structure visualization
  3. Learn to use some useful features of the structural visualization programs, PyMOL and Cn3D

 

Introduction

“The Protein Data Bank (PDB) is the single worldwide depository of information about the three-dimensional structures of large biological molecules, including proteins and nucleic acids. These are the molecules of life that are found in all organisms including bacteria, yeast, plants, flies, and mice, and in healthy as well as diseased humans. Understanding the shape of a molecule helps to understand how it works.”

This introduction from the PDB provides the motivation for today’s exercises. In order to better understand a particular protein, it is important to be able to retrieve structures of interest from the PDB, and to be able to manipulate how these structures are displayed in order to highlight regions of interest.

 

Exercises

 

Required questions are in red.  

 

Note: If you were not able to attend the regularly scheduled lab section, it may help to review the background lecture slides, which can be downloaded from the course webpage. Please feel free to ask a TA if you have any questions regarding these slides.

 

*Certain features of PDB require popups, so you may need to turn off popup blocking for the time being. Please ask a TA if you need help with this.

 

1)   Querying the Protein Data Bank (PDB, http://www.pdb.org). To familiarize yourself with the PDB website, start by viewing the PDB tutorial. It can be found in the left navigation menu by clicking on “Site Tutorials” and then “Tutorial About This Site”. As you can see, there are several more tutorials in this sub-menu that you may find useful later. We will now practice querying the PDB using the dystrophin protein, a key protein involved in muscular dystrophy.

a.    Open your web browser, and enter http://pdb.org in the address bar (or click the provided link).

b.    We will start with a simple keyword query. In the search bar, type dystrophin, then hit enter. How many structure hits were found by this query? To narrow down the list of structures, we will utilize the advanced search function of the PDB.

c.    In the left menu, click on the Search tab, then click Search Database, then click Advanced Search. Click on the “Choose a Query Type:” dropdown menu to bring up the list of available queries. We will not be using most of these, but it is good to be aware of what is available, so take a second to scroll down the list. Go back near the top of the list and select “Molecule Name” under the “Structure Summary--” subheading. Type dystrophin in the box, then click Evaluate Subquery. This feature will tell us how many structures would be returned by this particular query. How many structures were found by this query?

d.    Click Evaluate Query to view these results. We now see that the first result is the N-terminal domain of dystrophin, with id: 1DXX. We will return to this structure in a moment, using the Queries tab in the left menu to return to this query. For now we will perform a new query that provides a better example of how you might use advanced search to obtain a set of proteins with a desired property, rather than searching for one specific protein. In this example, let’s say we want to find all structures of nucleic acid-binding proteins in which the protein is not currently bound to nucleic acid.

e.    Go to the advanced search page as before and click the Clear All box to reset the form. Select “Molecular Function” under the “Biology & Chemistry--” subheading near the bottom of the menu. In the window that pops up, click on the triangle next to the word “binding”. Once the types of binding have loaded, click on the words “nucleic acid binding”. How many structures are found by this query?

f.     Since we wish to narrow this query to only those structures of proteins not bound to any nucleic acid, we need to add a query to further limit our search by clicking on the plus box to the right of the query. Select “Molecule / Chain Type” under “Structure Summary--”. From the “Contains Protein” menu, select yes. Select no from the other two menus. How many structures would be returned using only this subquery?

g.    Now evaluate the entire query. Next to the word “Results” in the left menu you can see the total number of structures that were found. How many structures were found?

h.   For some types of analysis, it is important to only use a set of proteins that does not have high sequence similarity between any two members in the data set. From the left menu, click Narrow Query, then click Remove Similar Structures, then click 50% Sequence Identity. How many structures are left?
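
For the curious, the idea behind removing similar structures in step h can be sketched in a few lines of Python. This is only a toy illustration and not the method the PDB uses: it assumes the sequences are already aligned to the same length, and the names percent_identity and remove_similar, as well as the sequences themselves, are made up for this example.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions with identical residues.

    ASSUMPTION: both sequences are pre-aligned and have the same length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


def remove_similar(sequences, cutoff=50.0):
    """Greedily keep a sequence only if it is below the cutoff
    against every sequence kept so far."""
    kept = []
    for name, seq in sequences:
        if all(percent_identity(seq, kept_seq) < cutoff for _, kept_seq in kept):
            kept.append((name, seq))
    return kept


# Made-up toy sequences, not real PDB entries.
toy = [
    ("toy1", "AAAAAAAAAA"),
    ("toy2", "AAAAAACCCC"),  # 60% identical to toy1, so it is dropped
    ("toy3", "CCCCCCCCCC"),  # 0% identical to toy1, so it is kept
]
representatives = remove_similar(toy)
print([name for name, _ in representatives])
```

Note that the result of a greedy filter like this depends on the order of the input, which is one reason real tools cluster the sequences first and then pick one representative per cluster.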

 

2)   Working with PDB results. We will now return to our previous query and examine the 1DXX structure in more detail.

a.    Click the Queries tab in the left menu. This brings up the Query History, where you can retrieve previously obtained search results. Click on the “View Results” button for the molecule name query for dystrophin we entered earlier. Now click on the text 1DXX, or the image under the text. This brings up the Structure Summary page, which contains some interesting information about the protein. For our purposes, the most interesting information is the derived data on the lower portion of the page. SCOP and CATH are two different methods of categorizing proteins by structure. From here, we can find other proteins with similar structure by clicking on any of the links in these two sections. We can also search for proteins with similar function by clicking on the links in the GO Terms section. How many other proteins are defined as having the molecular function, “actin binding”?

b.    Another interesting summary about a structure can be found by clicking on the Sequence Details tab at the top of the page. Here we see a nice graphical representation of the secondary structure of dystrophin. What secondary structure is most prevalent in dystrophin?

c.    From here we can also download the sequence of our protein chain (or multiple chains if we were working with a complex of proteins with different sequences). Click on FASTA Sequence from the left menu, under “Download Files”, to download the sequence of dystrophin. Open the file, then copy and paste the FASTA sequence for chain B only, including the comment line, into your lab exercise document.
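
If you ever need to pull one chain out of a multi-record FASTA file many times over, the copy-and-paste in step c can be scripted. The sketch below is a minimal example under one loud assumption: that each comment line begins with an identifier of the form >ID:CHAIN followed by a | character. Check your downloaded file before relying on this. The function names and the placeholder records are made up for illustration.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (comment_line, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def record_for_chain(records, chain_id):
    """Return the first record whose comment line names the given chain.

    ASSUMPTION: comment lines look like '>ID:CHAIN|...'.
    """
    for header, sequence in records:
        label = header[1:].split("|")[0]  # e.g. 'XXXX:B'
        if ":" in label and label.split(":")[1] == chain_id:
            return header, sequence
    return None


# Placeholder records with made-up sequences, not the real 1DXX file.
example = """>XXXX:A|PLACEHOLDER
MKT
AIL
>XXXX:B|PLACEHOLDER
MKTG
"""
header, sequence = record_for_chain(read_fasta(example), "B")
print(header)
print(sequence)
```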

 

3)   Displaying structures in PyMOL (the fun part)

 

Key lessons:

1)   Using the GUI and menu options

2)   Saving a session (no undo function!)

3)   Basic selections

4)   Commands and command logging

5)   Scripts

6)   Ray-tracing and saving an image

 

Controls using the graphical user interface (GUI)

a.    You don’t need to read it now, but for future reference, the user guide can be found here: http://pymol.sourceforge.net/newman/user/toc.html

b.    To download the PDB file for 1DXX, click Download Files in the left menu, then click PDB text. Save this file where you can find it. This file contains the raw information about the protein structure, including the 3-D coordinates of nearly all of the atoms in the protein. Double click on the file to display our molecule in PyMOL (or right click and select Open With… MacPyMOL if double clicking only opens a text file).

c.    Take a moment to familiarize yourself with the mouse controls. Holding down the left mouse button while moving the mouse rotates the molecule. Holding down the right mouse button while moving the mouse up and down zooms in and out on the molecule.

d.    To display the sequence of the protein, go to the Display menu at the top of the page and select Sequence On.

e.    In the right menu we can manipulate how the molecule is displayed.

                                              i.     The A stands for Actions we can perform on the protein, most of which we won’t be using today. Click the A next to 1DXX and select “remove waters”.

                                             ii.     We now wish to Show the cartoon representation of this protein, which highlights the secondary structure. Click on the S and select “cartoon”.

                                           iii.      It may be simpler to get a sense of the overall structure of the protein by momentarily Hiding the side chains of the amino acid residues, which are displayed as lines by default. Click on the H and select “lines”.

                                           iv.     We know from looking at the PDB summary of 1DXX that this structure comprises 4 chains, but it is hard to discern which parts of the structure belong to which chain, since they are all the same color. We can Color the chains different colors by clicking on the C, scrolling down to “by chain”, then selecting the 3rd option down.

                                            v.     Once we have the molecule displayed in a manner we may wish to show someone else, we can save the image as a .png graphic file. Under the File menu at the top of the page, select “Save Image”, and include this file as an attachment with your assignment.
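
Step b above mentions that the PDB file holds the raw 3-D coordinates of the atoms. As a rough illustration of what a viewer such as PyMOL reads from that file, the sketch below parses the fixed-width ATOM and HETATM records using the standard PDB column positions. The function name parse_atom_line and the three sample lines (with made-up coordinates) are assumptions for this example and are not taken from 1DXX.

```python
def parse_atom_line(line):
    """Parse one ATOM/HETATM record using the fixed PDB column layout.

    Returns None for any other record type.
    """
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        return None
    return {
        "record": record,
        "atom_name": line[12:16].strip(),  # columns 13-16
        "res_name": line[17:20].strip(),   # columns 18-20
        "chain": line[21],                 # column 22
        "res_num": int(line[22:26]),       # columns 23-26
        "xyz": (
            float(line[30:38]),            # x, columns 31-38
            float(line[38:46]),            # y, columns 39-46
            float(line[46:54]),            # z, columns 47-54
        ),
    }


# Made-up sample records for illustration only.
sample = [
    "HEADER    PLACEHOLDER",
    "ATOM      1  N   MET A   1      27.340  24.430   2.614",
    "ATOM      2  CA  MET B   1      28.000  25.000   3.000",
    "HETATM  500  O   HOH A 301      10.000  11.000  12.000",
]

atoms = [a for a in (parse_atom_line(line) for line in sample) if a is not None]
waters = [a for a in atoms if a["res_name"] == "HOH"]
chains = sorted({a["chain"] for a in atoms})
print(len(atoms), "atoms;", len(waters), "water;", "chains:", chains)
```

The res_name, chain, and res_num fields are the same pieces of information that selections by residue name, chain, and residue number rely on.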

 

Basic Selections

f.     One of the strengths of PyMOL is the ability to select only certain portions of a structure to manipulate. In the literature describing this molecule, the authors indicate that the structure of interest is actually a dimer, two protein chains in a quaternary complex. The file we have downloaded, however, has 4 chains (a tetramer). For now, don’t worry about why there are two extra chains; we’re just going to get rid of them. We want to select all of the atoms in chains C & D, and then remove them. One of the ways atoms can be selected is by selecting amino acids from the sequence at the top of the display window.

                                              i.     Drag the scroll bar located under the sequence until you get to chain C, indicated by /C/ in the top line containing the numbered sequence information. Click once on the leftmost letter, then use the scrollbar to scroll all the way to the right. Hold down the shift key and click on the last letter to select all of the residues of chains C and D (with most applications, clicking and dragging over the text will automatically scroll when you get to the right edge of the screen, but this doesn’t work in PyMOL for some reason). You should now see a new entry titled “sele” underneath 1DXX in the right menu.

                                             ii.     Under Actions, select remove atoms. We are now looking at the molecule of interest.

 

Commands and command logging

g.    Another strength of PyMOL is the ability to use typed commands to select and manipulate the display of atoms. What is especially helpful about this is the ability to create simple scripts to execute a series of commands to display a molecule as we please. We will do the same thing we just did, but using text commands instead. Quit PyMOL, then reopen 1DXX.pdb. Use the display menu to once again show the sequence. Since we may later want to create a script from the commands we are about to enter, select Log… under the File menu to save a log of all commands typed.

h.   The syntax for selecting a group of atoms and assigning a name to them is: select <name>, <selection>. Here, <name> represents what you want to name a group of atoms, and <selection> is an expression defining which atoms should be selected. Atoms can be selected in a variety of ways. For a more complete listing of selection expressions than will be covered in this exercise, see: http://pymol.sourceforge.net/newman/user/S0210start_cmds.html#6_3

                                              i.     One of the ways we can select atoms is by residue name. In .pdb files, water molecules are assigned a name of hoh, so we can create a selection called water by typing: select water, resn hoh. Do this now. Note that in this example we are selecting for atoms whose resname (resn for short) is hoh. To remove the waters, type remove water.

                                             ii.     To do the equivalent of the Show commands we used earlier, we use the command show <display type>, where <display type> is one of cartoon, spheres, sticks, or lines. In this case, we want to show cartoon for the selection 1DXX. We can similarly hide lines by typing hide lines. Selections can be made using Boolean operators as well. For example, you can select chains C & D by typing select <name>, chain C OR chain D. This selects all atoms that are either in chain C OR in chain D. (Stop for a moment to think about why we need to use OR, rather than AND to select both chains. Ask now if you don’t understand this, or you will probably miss a later question). Now type remove <name you gave to that selection> to remove these atoms.

i.      In the article in which the structure for 1DXX was published, the authors mentioned that some of the residues have been experimentally determined to bind actin. A region denoted ABS1 comprises residues 17–26, ABS2 comprises residues 88–116, and ABS3 comprises residues 131–148. Residues can be selected by residue number by using the selector resi, just as we used the selector resn previously. Make three separate selections using the select command. The color command can be used to color a selection using the syntax color <color>, <selection name>. Color each selection a different color.

j.     Selections can also be composed by combining previously defined selections. Make a new selection called ABS, by using Boolean operators and your previously defined selections. Use this selection to display all known actin-binding residues as spheres.
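
The Boolean logic used in the selections above can be mimicked with ordinary Python sets, which may help if the difference between OR and AND is still unclear. This is only an analogy, not how PyMOL is implemented, and the toy atoms below are made up.

```python
# Toy atoms as (chain, residue_number) pairs; the values are made up.
atoms = [("A", 1), ("A", 2), ("B", 1), ("C", 1), ("C", 2), ("D", 1)]

in_chain_c = {atom for atom in atoms if atom[0] == "C"}
in_chain_d = {atom for atom in atoms if atom[0] == "D"}

# OR keeps atoms that satisfy at least one condition (set union).
either = in_chain_c | in_chain_d

# AND keeps atoms that satisfy both conditions at once (set intersection).
both = in_chain_c & in_chain_d

print(sorted(either))
print(sorted(both))
```

Each atom belongs to exactly one chain, so no atom can be in chain C and in chain D at the same time, and the AND version comes back empty. Combining previously named selections works the same way: the combined selection is built from the atoms of the named ones.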

 

Running PyMOL scripts

k.    While the selections we have defined so far have been specific to dystrophin, you may at some point have a more general scheme for manipulating the display of any arbitrary molecule (e.g. finding and highlighting interface residues). To do this, we can create a PyMOL script file containing a list of commands to be executed. Open the log you saved at the start of this portion of the exercise, and copy the commands entered into a new plain text document, with one command on each line. You should only copy commands you have typed, and only those that resulted in the desired action. Save this file as dystrophin.pml and submit it along with your assignment. To demonstrate how to execute an existing script file, please quit and once again relaunch PyMOL with 1DXX.pdb. Once the molecule is loaded, load the script using the command @<path to file>. For example, if you saved the .pml file to the desktop, the command would be @~/Desktop/dystrophin.pml. This can also be accomplished by going under the “File” menu and selecting “Run…”, and then navigating to the .pml file using the graphical menu.

l.      As mentioned above, this script is not generally useful, and so perhaps does not fully highlight the power of creating scripts. In the Protein Structure and Function lab we will do a bit more work in PyMOL covering more powerful selection techniques, which we will combine to create a more useful script for quickly visualizing macromolecular interfaces in PyMOL.
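
A .pml script is just a plain text file with one command per line, so it can also be written by a program. The sketch below is a minimal example: the helper name write_pml and the file name example.pml are made up, and the commands are the ones typed earlier in this exercise. It writes to a temporary folder so it will not overwrite your own dystrophin.pml.

```python
import os
import tempfile


def write_pml(commands, path):
    """Write one command per line, skipping blank entries and stray spaces."""
    cleaned = [command.strip() for command in commands if command.strip()]
    with open(path, "w") as handle:
        handle.write("\n".join(cleaned) + "\n")
    return cleaned


# Commands taken from the earlier steps of this exercise.
typed = [
    "select water, resn hoh",
    "remove water",
    "",                 # blank entries are dropped
    "  show cartoon ",  # surrounding spaces are trimmed
    "hide lines",
]

path = os.path.join(tempfile.mkdtemp(), "example.pml")
written = write_pml(typed, path)
print(path)
print(written)
```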

 

Ray-tracing and saving an image

m. Prior to ray-tracing, you can use the command bg_color <color> to change the background color. You can also set the background to be transparent by typing the command set ray_opaque_background, off.

n.   Once you have the molecule posed in a position for which you wish to save an image, you can use PyMOL’s built-in ray-tracing function to create a high-quality image by typing the command “ray”, or by clicking the ray-tracing button towards the top of the window. To create a higher resolution image, you can type “ray 2000, 2000”, which will result in an image that is 2000 pixels by 2000 pixels. To save the image you can use the Save as… command under the File menu, or type png <filename>. By default, this will create a 2000x2000 pixel image at 72 DPI. To save the image at a higher DPI, use the png command as follows: png <filename>, dpi=<desired-dpi>. The pixel dimensions do not change, so the image will print at a smaller physical size at the specified resolution. NOTE: If you execute any more commands before saving the image, you will need to redo the ray-trace command.
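
The relationship between pixel dimensions, DPI, and printed size in step n is simple arithmetic: printed size in inches equals pixels divided by DPI. The short sketch below works this out for a 2000x2000 pixel image; the function name print_size_inches is made up for this example.

```python
def print_size_inches(width_px, height_px, dpi):
    """Printed size in inches for an image of the given pixel dimensions."""
    return width_px / dpi, height_px / dpi


for dpi in (72, 300):
    width_in, height_in = print_size_inches(2000, 2000, dpi)
    print(f"{dpi} DPI: {width_in:.2f} x {height_in:.2f} inches")
```

The same 2000x2000 pixels cover about 27.78 inches per side at 72 DPI but only about 6.67 inches per side at 300 DPI. If you need a particular printed size at a particular DPI, multiply the two to find how many pixels to request from the ray command.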

 

 

4) Cn3D

a.    Though you aren’t required to submit anything for this part of the lab, you are strongly encouraged to investigate NCBI’s structure visualization tool Cn3D (http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml), which can be particularly useful when viewing structures for which there are multiple sequence variants available and for viewing portions of the structure for which there are existing sequence annotations.

 

Please email your completed assignment to Peter.

Email: petez

Domain: iastate.edu