BCB 590 Lab 4
Name _____________________________
Macromolecular Structure Visualization
Objectives
Introduction
“The
Protein Data Bank (PDB) is the single worldwide depository of information about
the three-dimensional structures of large biological molecules, including
proteins and nucleic acids. These are the molecules of life that are found in
all organisms including bacteria, yeast, plants, flies, and mice, and in
healthy as well as diseased humans. Understanding the shape of a molecule helps
to understand how it works.”
This
introduction from the PDB provides the motivation for today’s exercises. In
order to better understand a particular protein, it is important to be able to
retrieve structures of interest from the PDB, and to be able to manipulate how
these structures are displayed in order to highlight regions of interest.
Exercises
Required questions are in red.
Note: If you were not
able to attend the regularly scheduled lab section, it may help to review the
background lecture slides, which can be downloaded from the course webpage.
Please feel free to ask a TA if you have any questions regarding these slides.
*Certain
features of PDB require popups, so you may need to turn off popup blocking for
the time being. Please ask a TA if you need help with this.
1) Querying the Protein Data Bank (PDB, http://www.pdb.org)
To familiarize yourself with the PDB website, start by viewing the PDB
tutorial. It can be found in the left navigation menu by clicking on “Site
Tutorials” and then “Tutorial About This Site”. As you can see, there are
several more tutorials in this sub-menu that you may find useful later. We will
now practice querying the PDB using the dystrophin protein, a key protein involved
in muscular dystrophy.
a.
Open your web
browser, and enter http://pdb.org in the
address bar (or click the provided link)
b.
We will start with a
simple keyword query. In the search bar, type dystrophin, then hit enter. How many
structure hits were found by this query? To
narrow down the list of structures, we will utilize the advanced search
function of the PDB.
c.
In the
left menu, click on the Search tab, then click Search Database,
then click Advanced Search. Click on the “Choose a Query Type:” dropdown
menu to bring up the list of available queries. We will not be using most of
these, but it is good to be aware of what is available, so take a second to
scroll down the list. Go back near the top of the list and select “Molecule
Name” under the “Structure Summary--” subheading. Type dystrophin in the box, then click Evaluate
Subquery. This feature will tell us how many structures would be returned
by this particular query. How
many structures were found by this query?
d.
Click Evaluate
Query to view these results. We now see that the first result is the
N-terminal domain of dystrophin, with id: 1DXX. We will return to this
structure in a moment, using the Queries tab in the left menu to return
to this query. For now we will perform a new query that provides a better
example of how you might use advanced search to obtain a set of proteins with a
desired property, rather than searching for one specific protein. In this
example, let’s say we want to find all structures of nucleic acid-binding
proteins in which the protein is not currently bound to nucleic acid.
e.
Go to the advanced
search page as before and click the Clear All box to reset the form.
Select “Molecular Function” under the “Biology & Chemistry--” subheading
near the bottom of the menu. In the window that pops up, click on the triangle
next to the word “binding”. Once the types of binding have loaded, click on the
words “nucleic acid binding”. How many structures are
found by this query?
f.
Since we wish to
narrow this query to only those structures of proteins not bound to any nucleic acid, we need to add a query
to further limit our search by clicking on the plus box to the right of the
query. Select “Molecule / Chain Type” under “Structure Summary--”. From the “Contains
Protein” menu, select yes.
Select no from the other two
menus. How many structures would be returned using only this subquery?
g.
Now
evaluate the entire query. Next to the word “Results” in the left menu you can
see the total number of structures that were found. How many structures were found?
h.
For some types of
analysis, it is important to only use a set of proteins that does not have high
sequence similarity between any two members in the data set. From the left
menu, click Narrow Query, then click Remove Similar Structures,
then click 50% Sequence Identity. How many
structures are left?
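To make the idea of a 50% sequence identity cutoff concrete, here is a minimal Python sketch. It assumes the sequences are already aligned to the same length and uses a simple greedy filter. This is only an illustration, not the method the PDB uses to remove similar structures. The function names and the toy sequences are made up for this example.

```python
def percent_identity(seq_a, seq_b):
    """Percent of identical positions in two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return 100.0 * matches / len(seq_a)


def remove_similar(sequences, cutoff=50.0):
    """Greedy filter: keep a sequence only if it is below the cutoff
    against every sequence that has already been kept."""
    kept = []
    for name, seq in sequences:
        if all(percent_identity(seq, kept_seq) < cutoff for _, kept_seq in kept):
            kept.append((name, seq))
    return kept


# Toy, made-up aligned sequences (not real PDB entries).
toy = [
    ("protA", "MKTAYIAKQR"),
    ("protB", "MKTAYIAKQK"),  # differs from protA at one position
    ("protC", "GSHWLPEDVC"),  # shares no positions with protA
]
representatives = [name for name, _ in remove_similar(toy)]
print(representatives)
```

Here protB is dropped because it is too similar to protA, while protC is kept. A greedy filter depends on the order of the input, which is one reason real tools use more careful clustering.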
2) Working with PDB results
We will now examine the 1DXX structure from our earlier query in more detail.
a.
We will now return to
our previous query by clicking the Queries tab in the left menu. This
brings up the Query History, where you can retrieve previously obtained search
results. Click on the “View Results” button for the molecule name query for
dystrophin we entered earlier. Now click on the text 1DXX, or the image under
the text. This brings up the Structure Summary page, which contains some
interesting information about the protein. For our purposes, the most
interesting information is the derived data on the lower portion of the page.
SCOP and CATH are two different methods of categorizing proteins by structure.
From here, we can find other proteins with similar structure by clicking on any
of the links in these two sections. We can also search for proteins with
similar function by clicking on the links in the GO Terms section. How many other proteins are defined as having the molecular
function, “actin binding”?
b.
Another
interesting summary about a structure can be found by clicking on the Sequence
Details tab at the top of the page. Here we see a nice graphical representation
of the secondary structure of dystrophin. What secondary structure is most prevalent in dystrophin?
c.
From here
we can also download the sequence of our protein chain (or multiple chains if
we were working with a complex of proteins with different sequences). Click on FASTA
Sequence from the left menu, under “Download Files” to download the
sequence of dystrophin. Open the file, then copy and paste the FASTA sequence for chain B only,
including the comment line, into your lab exercise document.
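If you would rather pull out one chain programmatically, the following Python sketch shows one way to do it. It assumes only that each record begins with a comment line starting with ">" and that the comment line names the chain (written here as 1DXX:B). The header layout and the sequences in the example text are invented placeholders, so check them against the file you actually downloaded.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (comment_line, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def pick_chain(records, tag):
    """Return the first record whose comment line contains the tag, else None."""
    for header, seq in records:
        if tag in header:
            return header, seq
    return None


# Placeholder text: the header layout and sequences are invented for illustration.
example = """>1DXX:A|EXAMPLE
AAAAACCCCC
DDDDD
>1DXX:B|EXAMPLE
EEEEEFFFFF
GGGGG
"""
header, sequence = pick_chain(read_fasta(example), "1DXX:B")
print(header)
print(sequence)
```

The comment line is returned along with the sequence because the exercise asks for both.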
3) Displaying structures in PyMOL (the fun part)
Key lessons:
1) Using the GUI and menu options
2) Saving a session (no undo function!)
3) Basic selections
4) Commands and command logging
5) Scripts
6) Ray-tracing and saving an image
Controls using the graphical user interface (GUI)
a.
You don’t need to
read it now, but for future reference, the user guide can be found here: http://pymol.sourceforge.net/newman/user/toc.html
b.
To download the PDB
file for 1DXX, click Download Files in the left menu, then click PDB text. Save
this file where you can find it. This file contains the raw information about
the protein structure, including the 3-D coordinates of nearly all of the atoms
in the protein. Double click on the file to display our molecule in PyMOL (or
right click and select Open With… MacPyMOL if double clicking only opens a text
file).
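To see what "raw information" means here, the Python sketch below reads the main fields of a coordinate line by column position, following the fixed-width layout of ATOM and HETATM records in the PDB file format. The helper make_atom_line and all coordinates are made up for the demonstration and are not taken from 1DXX. To try this on the real file, you would read its lines instead.

```python
from collections import Counter


def make_atom_line(serial, name, resn, chain, resi, x, y, z, record="ATOM"):
    """Build a fixed-width coordinate line (helper for the made-up demo data)."""
    return "{:<6}{:>5} {:<4} {:>3} {:1}{:>4}    {:8.3f}{:8.3f}{:8.3f}".format(
        record, serial, name, resn, chain, resi, x, y, z)


def parse_atom_line(line):
    """Read the main fields of an ATOM/HETATM line by column position."""
    return {
        "record": line[0:6].strip(),
        "name": line[12:16].strip(),   # atom name, e.g. CA
        "resn": line[17:20].strip(),   # residue name, e.g. MET or HOH
        "chain": line[21],             # chain identifier
        "resi": int(line[22:26]),      # residue number
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


# Made-up coordinates, not taken from 1DXX.
demo_lines = [
    make_atom_line(1, " N", "MET", "A", 1, 11.104, 13.207, 2.100),
    make_atom_line(2, " CA", "MET", "A", 1, 12.560, 13.329, 2.282),
    make_atom_line(3, " N", "LEU", "B", 1, -4.250, 7.001, 19.840),
    make_atom_line(4, " O", "HOH", "A", 301, 0.500, -1.250, 3.000, record="HETATM"),
]
atoms = [parse_atom_line(line) for line in demo_lines
         if line.startswith(("ATOM", "HETATM"))]
atoms_per_chain = Counter(atom["chain"] for atom in atoms)
waters = [atom for atom in atoms if atom["resn"] == "HOH"]
print(dict(atoms_per_chain), len(waters))
```

The residue name, chain, and residue number fields read here are the same properties that PyMOL's resn, chain, and resi selections use later in this lab.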
c.
Take a moment to
familiarize yourself with the mouse controls. Holding down the left mouse button
while moving the mouse rotates the molecule, while holding the right mouse
button while moving the mouse up and down zooms in and out on the molecule.
d.
To display the
sequence of the protein go to the Display menu at the top of the page and
select Sequence On.
e.
In the right menu we
can manipulate how the molecule is displayed.
i. The A stands for Actions we can perform on
the protein, most of which we won’t be using today. Click the A next to 1DXX
and select “remove waters”.
ii. We now wish to Show the cartoon
representation of this protein, which highlights the secondary structure. Click
on the S and select “cartoon”.
iii. It may
be simpler to get a sense of the overall structure of the protein by
momentarily Hiding the side chains of the amino acid residues, which are
displayed as lines by default. Click on the H and select “lines”.
iv. We know from looking at the PDB summary of 1DXX
that this structure comprises 4 chains, but it is hard to discern which parts
of the structure belong to which chain, since they are all the same color. We
can Color the chains different colors by clicking on the C, scrolling
down to “by chain”, then selecting the 3rd option down.
v. Once we have the molecule displayed in a manner we
may wish to show someone else, we can save the image as a .png graphic file. Under the File menu at the top of the page, select “Save
Image”, and include this file as an attachment with your assignment.
Basic Selections
f.
One of the strengths
of PyMOL is the ability to select only certain portions of a structure to
manipulate. In the literature describing this molecule, the authors indicate that the
structure of interest is actually a dimer: two protein chains in a quaternary
complex. The file we have downloaded, however, has 4 chains (a tetramer). For
now, don’t worry about why there are two extra chains; we’re just going to get
rid of them. We want to select all of the atoms in chains C & D, and then
remove them. One of the ways atoms can be selected is by selecting amino acids
from the sequence at the top of the display window.
i. Drag the scroll bar located under the sequence
until you get to chain C, indicated by /C/ in the top line containing the
numbered sequence information. Click once on the leftmost letter, then use the
scrollbar to scroll all the way to the right. Hold down the shift key and click
on the last letter to select all of the residues of chains C and D (with most
applications, clicking and dragging over the text will automatically scroll
when you get to the right edge of the screen, but this doesn’t work in PyMOL
for some reason). You should now see a new entry titled “sele” underneath 1DXX
in the right menu.
ii. Under Actions, select remove atoms. We are
now looking at the molecule of interest.
Commands and command logging
g.
Another strength of PyMOL
is the ability to use typed commands to select and manipulate the display of
atoms. What is especially helpful about this is the ability to create simple
scripts to execute a series of commands to display a molecule as we please. We
will do the same thing we just did, but using text commands instead. Quit PyMOL,
then reopen 1DXX.pdb. Use the display menu to once again show the sequence.
Since we may later want to create a script from the commands we are about to
enter, select Log… under the File menu to save a log of all commands typed.
h.
The syntax for
selecting a group of atoms and assigning a name to them is: select
<name>, <selection>.
Here, <name> represents what you want to name a group of atoms, and
<selection> is an expression defining which atoms should be selected.
Atoms can be selected in a variety of ways. For a more complete listing of
selection expressions than will be covered in this exercise, see: http://pymol.sourceforge.net/newman/user/S0210start_cmds.html#6_3
i. One of the ways we can select atoms is by residue
name. In .pdb files, water molecules are assigned a name of hoh, so we can
create a selection called water by typing: select water, resn hoh. Do this now. Note that in this example we are
selecting for atoms whose resname (resn for short) is hoh.
To remove the waters, type remove water.
ii. To do the equivalent of the Show commands we used
earlier, we use the command show <display type>, <selection>, where <display type> is one of cartoon, spheres, sticks, or
lines. In this case, we want to show cartoon for the selection 1DXX, so type show cartoon, 1DXX. We can similarly hide
lines by typing hide lines.
iii. Selections can be made using Boolean operators as well. For example, you can
select chains C & D by typing select <name>, chain C OR chain D. This selects all atoms that are either in chain C
OR in chain D. (Stop for a moment to think about why we need to use OR, rather
than AND, to select both chains. Ask now if you don’t understand this, or you
will probably miss a later question.) Now type remove <name you gave to
that selection> to remove these atoms.
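If the difference between OR and AND is still unclear, it can help to think of each selection as a set of atoms. The Python sketch below models this with made-up atom numbers. It illustrates the logic only and is not PyMOL code.

```python
# Each atom is reduced to (atom_id, chain); the values are made up.
atoms = [(1, "A"), (2, "A"), (3, "B"), (4, "C"), (5, "C"), (6, "D")]

in_chain_c = {atom_id for atom_id, chain in atoms if chain == "C"}
in_chain_d = {atom_id for atom_id, chain in atoms if chain == "D"}

either = in_chain_c | in_chain_d  # like "chain C or chain D"
both = in_chain_c & in_chain_d    # like "chain C and chain D"

print(sorted(either))
print(sorted(both))
```

The first set contains every atom from chains C and D. The second is empty, because no single atom belongs to two chains at once.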
i.
In the article in
which the structure for 1DXX was published, the authors mentioned that some of
the residues have been experimentally determined to bind actin. A region
denoted ABS1 comprises residues 17–26, ABS2 comprises residues 88–116,
and ABS3 comprises residues 131–148. Residues can be selected by residue
number using the selection keyword resi, just as we used the keyword resn previously. Make three separate selections using the select command.
The color command can be used
to color a selection using the syntax color <color>, <selection name>. Color each selection a
different color.
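When there are several regions to define, it can be convenient to generate the command text rather than type each line. The Python sketch below builds select and color command strings from the residue ranges given above. The selection names follow the region names, and the colors are arbitrary choices made for this example. Note that PyMOL expects a plain hyphen in a residue range.

```python
# Residue ranges are from the lab text; the colors are arbitrary choices.
regions = {"ABS1": (17, 26), "ABS2": (88, 116), "ABS3": (131, 148)}
colors = {"ABS1": "red", "ABS2": "yellow", "ABS3": "blue"}

commands = []
for name, (first, last) in regions.items():
    commands.append(f"select {name}, resi {first}-{last}")
    commands.append(f"color {colors[name]}, {name}")

# resi ranges are inclusive, so each region spans last - first + 1 residues.
sizes = {name: last - first + 1 for name, (first, last) in regions.items()}

print("\n".join(commands))
print(sizes)
```

Each printed line can be typed into PyMOL as is. Because resi alone does not specify a chain, each selection will include the matching residues in every chain that is still loaded.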
j.
Selections can also
be composed by combining previously defined selections. Make a new selection
called ABS, by using Boolean operators and your previously defined selections.
Use this selection to display all known actin-binding residues as spheres.
Running PyMOL scripts
k.
While the selections
we have defined so far have been specific to dystrophin, you may at some
point have a more general scheme for manipulating the display of any arbitrary
molecule (e.g. finding and highlighting interface residues). To do this, we can
create a PyMOL script file containing a list of commands to be executed. Open
the log you saved at the start of this portion of the exercise, and copy the
commands entered into a new plain text document, with one command on each line.
You should only copy commands you have typed, and only those that resulted
in the desired action. Save this file as dystrophin.pml and submit it along with
your assignment. To demonstrate how to execute an existing script file,
please quit and once again relaunch PyMOL with 1DXX.pdb. Once the molecule is
loaded, load the script using the command @<path to file>. For example, if you saved the .pml file to the
desktop, the command would be @~/Desktop/dystrophin.pml. This can also be
accomplished by going under the “File” menu and selecting “Run…”, and then
navigating to the .pml file using the graphical menu.
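A .pml file is just plain text with one command per line, so it can be written with any tool. As a rough sketch, the Python snippet below writes a few of the commands from this exercise to a file named dystrophin.pml. The selection name "extra" is an arbitrary choice, and a temporary folder is used only so that the example does not overwrite your own file. Your own script should contain the commands from your log.

```python
import os
import tempfile

# Commands drawn from this exercise; the selection name "extra" is arbitrary.
commands = [
    "select water, resn hoh",
    "remove water",
    "show cartoon, 1DXX",
    "hide lines",
    "select extra, chain C or chain D",
    "remove extra",
]

# A temporary folder is used here so the sketch does not overwrite real work.
script_path = os.path.join(tempfile.mkdtemp(), "dystrophin.pml")
with open(script_path, "w") as handle:
    handle.write("\n".join(commands) + "\n")

# Read the file back to confirm it holds one command per line.
with open(script_path) as handle:
    written = handle.read().splitlines()

print("@" + script_path)  # the line to type in PyMOL to run the script
```

The order of the lines matters, since PyMOL runs them from top to bottom. For example, a selection must be defined before the command that removes it.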
l.
As mentioned above,
this script is not generally useful, and so perhaps does not fully highlight
the power of creating scripts. In the Protein Structure and Function lab we
will do a bit more work in PyMOL covering more powerful selection techniques, which
we will combine to create a more useful script for quickly visualizing
macromolecular interfaces in PyMOL.
Ray-tracing and saving an image
m. Prior to ray-tracing, you can use the command bgcolor
<color> to change the
background color. You can also set the
background to be transparent by typing the command set
ray_opaque_background, off.
n.
Once you have the
molecule posed in a position for which you wish to save an image, you can use
PyMOL’s built-in ray-tracing function to create a high-quality image by typing
the command “ray”, or by clicking the button labeled towards the top of the
window. To create a higher resolution image, you can type “ray 2000, 2000”,
which will result in an image that is 2000 pixels by 2000 pixels. To save the
image you can use the Save as… command under the file menu, or type png <filename>. By default, this will create a 2000x2000 pixel
image at 72 DPI. To save an image at a higher DPI, you need to use the png
command as follows: png <filename>, dpi=<desired-dpi>. The image keeps the same pixel dimensions, so it will print at a smaller physical size at the
specified resolution. NOTE: If you execute any more commands before saving the
image, you will need to redo the ray-trace command.
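The relationship between pixels, DPI, and printed size is simple division, as the short Python sketch below shows. The value of 300 DPI is just an example of a common print setting and is not a requirement of this lab.

```python
def printed_size_inches(pixels, dpi):
    """Physical length of one image edge when printed at the given DPI."""
    return pixels / dpi


for dpi in (72, 300):
    side = printed_size_inches(2000, dpi)
    print(f"2000 px at {dpi} DPI -> {side:.2f} in per side")
```

A 2000 pixel edge is about 27.78 inches at 72 DPI but only about 6.67 inches at 300 DPI. If you need a specific printed size at a given DPI, multiply the two to find how many pixels to request from ray.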
4) Cn3D
a.
Though you aren’t
required to submit anything for this part of the lab, you are strongly
encouraged to investigate NCBI’s structure visualization tool Cn3D (http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml),
which can be particularly useful when viewing structures for which there are
multiple sequence variants available and for viewing portions of the structure
for which there are existing sequence annotations.
Please email your completed assignment to Peter.
Email: petez
Domain: iastate.edu