### from a lecture given in PR 613: Protein Structure and Function

### on solving the Patterson function,

### October 9, 1998, at Thomas Jefferson University

### © Charles Brenner, Ph.D.

for comments, clarifications, revisions or to help with illustrations, please email charles-brenner@uiowa.edu

## Preliminary comments

0. Today, we will not solve an entire macromolecular structure, but we will learn how to determine the coordinates of a single heavy atom (or a few), and we will begin to appreciate how knowing those coordinates can bring us closer to an interpretable electron density map of a macromolecule. Our treatment of the difference Patterson function is based upon a course on protein crystallography given by Gregory A. Petsko and on a variety of textbooks. To follow this tutorial, you should have learned from previous lectures that crystals are composed of unit cells with internal symmetry operations; these concepts will be reviewed here only in the briefest possible way. Read through the tutorial as slowly and carefully as necessary to grasp the material, consulting other sources, working through the problems as you go, and making notes and questions in the margins. For help with other aspects of crystallography, explore the growing collection of crystallographic educational materials on the web. An answer page has been posted. Treat it like a safety net, and do not visit it until you have worked through the problems.

1. It is amazing to consider that in 1933 it was possible to publish a paper in Science "demonstrating" that, because crystalline pepsin did not diffract X-rays, proteins must be amorphous, merely inert carriers of enzyme activity whose physical basis could not yet be studied. Bernal and Crowfoot's Nature paper of the following year, demonstrating diffraction by crystalline pepsin, set the stage, just 63 years ago, for accounting for biology's most fascinating chemistry with physics.

(How were Bernal and Crowfoot able to observe diffraction when others had failed? They recognized that pepsin crystals lose their optical birefringence when dried, so they mounted their crystals in sealed capillary tubes for data collection. How do the large unit cells of protein crystals and their hydration make protein diffraction patterns different from small-molecule diffraction patterns? Are there any additional differences between proteins and small molecules that would alter their respective diffraction patterns? See the answer page.)

2. Physicists no longer consider the physics of X-ray diffraction to be interesting. As you have learned, space group theory was so firmly established by 1935 that all possible space groups were canonized in the International Tables. It would be unreasonable to expect biologists to keep up with the physical theories currently in vogue, just as it would be hard for a working physicist to keep up with the latest function of p53. However, the amount of physics you must learn to appreciate X-ray diffraction is roughly equivalent to the basic biology of the double helix that you would expect a physicist to grasp. It will "not escape your notice" that much of our basic biology is indebted to physical methods. Let's review a few additional facts of physics so that we can proceed to new material.

### You must remember this:

X-rays are high-energy electromagnetic radiation with wavelengths similar to interatomic distances. Their short wavelengths make them suitable probes for "materials science," but they cannot be focused the way visible light is for image reconstruction. When crystals diffract X-rays, it is straightforward to measure the locations and intensities of the reflections and to index them to the reciprocal lattice of the crystal. However, because the diffracted rays arrive at the X-ray detector at the speed of light, it is not possible to measure the phase of each reflection. The detector records only intensities, which are proportional to the squared amplitudes.

### And repeat after me:

#### The Fourier transform of a set of phased reflection amplitudes, $(h, k, l, F, \alpha)$, is an electron density map

#### and

#### the inverse Fourier transform of (the scatter of) structural coordinates is the corresponding set of phased reflection amplitudes.

### It follows that

Fourier transformations are not a problem in either direction. So if you had the phases, you would have an electron density map that you could use to get coordinates, and if you had coordinates, you would have the phases. The problem is that, at the start of a de novo structure determination, you have neither phases nor coordinates.
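To make the two statements above concrete, here is a minimal one-dimensional Python toy. It is not a real crystallographic program. The atom positions, the scattering powers and the resolution cutoff `H_MAX` are made-up assumptions.

The toy uses the common crystallographic sign convention, F(h) = Σ f_j exp(+2πi h x_j) and ρ(x) = Σ F(h) exp(-2πi h x), up to a scale factor; some texts swap the signs. It does three things:

- computes phased reflections from coordinates;
- transforms them back into density;
- shows what happens when the phases are thrown away, as a detector effectively does.

```python
import cmath
import math

# Toy 1-D "unit cell": point scatterers at fractional positions x_j with
# scattering power f_j (roughly electron counts). Hypothetical values.
ATOMS = [(0.10, 80.0), (0.35, 6.0), (0.60, 8.0)]
H_MAX = 20  # reflections h = -H_MAX..H_MAX (an assumed resolution cutoff)


def structure_factors(atoms, h_max):
    """Coordinates -> phased reflections: F(h) = sum_j f_j exp(2*pi*i*h*x_j)."""
    return {h: sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)
            for h in range(-h_max, h_max + 1)}


def density(F, x):
    """Phased reflections -> density: rho(x) = Re sum_h F(h) exp(-2*pi*i*h*x)."""
    return sum(Fh * cmath.exp(-2j * math.pi * h * x) for h, Fh in F.items()).real


F = structure_factors(ATOMS, H_MAX)
grid = [i / 100 for i in range(100)]
rho = [density(F, x) for x in grid]
print(grid[rho.index(max(rho))])  # -> 0.1, the highest peak sits on the heavy atom

# Discard the phases (keep only |F|, as the measurement does) and rebuild.
F_no_phase = {h: abs(Fh) for h, Fh in F.items()}
rho_bad = [density(F_no_phase, x) for x in grid]
print(grid[rho_bad.index(max(rho_bad))])  # -> 0.0, the peak collapses onto the origin
```

With all phases set to zero, every term peaks at x = 0. This is one way to see that the amplitudes alone cannot place the atoms.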

Heavy atom methods make it possible to get out of this conundrum. If you can attach a heavy atom to a unique location or locations on your macromolecule of interest without changing the structure or symmetry of your macromolecule and without destroying the ability of the crystal to diffract, you can use the Patterson function to solve the position of the heavy atom.

### Lessons taken at home

At this point in your de novo X-ray structure determination project, your significant other (YSO) may be clear on the concept that if you had the phases, you would have the map, and that if you had coordinates that fit the map, you would have the phases. YSO will now ask why you are bothering to determine the coordinates of a heavy atom that is not even in the macromolecule you are trying to solve. **If what you really need are the coordinates of several thousand nitrogen, carbon, oxygen and sulfur atoms, why would you go out and contaminate your protein with mercury and give yourself another atom to solve?**

If you understand the answers to this outstanding question, you will become a good student of X-ray crystallography.

#### First answer: because you can

First, you may obtain an "isomorphous derivative" crystal, i.e., a crystal whose symmetry, dimensions and contents (apart from the added heavy atom) are minimally changed. The Patterson map computed from the squared differences between derivative (Fph) and native (Fp) reflection amplitudes will then be dominated by the vectors between the heavy atoms, which may allow you to solve the coordinates of the heavy atoms. This will be the subject of today's lecture.

#### Second answer: because heavy atom phases get you closer to protein phases

You have measured native and derivative diffraction amplitudes (Fp and Fph). You can approximate the magnitude of the heavy atom contribution (Fh) as Fph - Fp. If you can solve the position of the heavy atom in the unit cell, you can calculate the heavy atom's phase ($\alpha_h$). For the time being, ignore the errors in the observed native (Fp) and derivative (Fph) amplitudes. The calculated heavy atom contribution (amplitude and phase $\alpha_h$) can then be shown to allow only two possible native phases that "close a triangle" between the known native (Fp) and derivative (Fph) amplitudes.

With a single solved heavy atom derivative, the problem of the protein's phases is still "underdetermined." Of the 360 degrees of possible phases, two remain for each reflection, and the two are equally likely. Additional phase information, provided for example by a second derivative, can break this phase ambiguity. That puts you in the business of interpreting an electron density map rather than trying to get phases to generate an initial map. Precisely how the observed native and derivative amplitudes (Fp and Fph) and the calculated heavy atom phases ($\alpha_h$) are used to calculate the two possible native phases ($\alpha_p$) is not prohibitively difficult to understand, but it is not the subject of today's lecture.
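The full treatment is outside today's scope, but the triangle-closing step itself is short enough to sketch. Writing Fph = Fp + Fh as complex numbers and applying the law of cosines gives $\cos(\alpha_p - \alpha_h) = (F_{ph}^2 - F_p^2 - F_h^2)/(2 F_p F_h)$. The two arccosine branches of this expression are the two candidate native phases.

The Python below is a minimal illustration under the error-free assumption stated above. The function name and the test numbers are hypothetical.

```python
import cmath
import math


def native_phase_candidates(fp, fph, fh_calc):
    """Two native phases (degrees) that close the triangle Fph = Fp + Fh.

    fp, fph : measured native and derivative amplitudes |Fp|, |Fph|
    fh_calc : complex heavy-atom structure factor calculated from the
              solved heavy-atom position (amplitude |Fh| and phase alpha_h)
    """
    fh = abs(fh_calc)
    alpha_h = cmath.phase(fh_calc)
    # |Fp exp(i alpha_p) + Fh|^2 = |Fph|^2 gives
    # cos(alpha_p - alpha_h) = (|Fph|^2 - |Fp|^2 - |Fh|^2) / (2 |Fp| |Fh|)
    c = (fph**2 - fp**2 - fh**2) / (2 * fp * fh)
    c = max(-1.0, min(1.0, c))  # clamp: real, error-laden data can overshoot
    delta = math.acos(c)
    return tuple(math.degrees(alpha_h + s * delta) % 360 for s in (1, -1))


# Self-check with invented numbers: a "true" native reflection with phase 40 deg
fp_true = cmath.rect(100.0, math.radians(40.0))
fh = cmath.rect(30.0, math.radians(100.0))
fph = abs(fp_true + fh)  # what the derivative experiment would measure

print([round(a, 1) for a in native_phase_candidates(abs(fp_true), fph, fh)])
# -> [160.0, 40.0]: the true phase and its mirror image about alpha_h
```

Nothing in a single derivative distinguishes the two outputs. That is the phase ambiguity described above.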

### The Patterson function

Patterson introduced a function in 1934 that is a Fourier transform of the set of squared, unphased reflection amplitudes, $(h, k, l, F^2)$. This function does not produce an electron density map of the contents of the unit cell. It produces a density map of the **vectors** between scattering objects in the cell.

The height of each Patterson peak goes as the **product** of the numbers of electrons in the two atoms it connects (the **square**, for a pair of identical atoms). As a result, Patterson maps of crystals that contain heavy atoms are dominated by the vectors between heavy atoms. However, a cell containing N atoms gives N(N - 1) non-origin Patterson peaks, so the number of peaks grows roughly as the square of the number of atoms. For that reason, protein Patterson maps are rarely interpretable.
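The counting argument can be checked with a few lines of Python. This is a simplified sketch that treats the Patterson map as a list of point peaks. It makes one peak per ordered pair of atoms, with height taken as proportional to Z_i Z_j. The atom list and the 2,500-atom protein size are invented for illustration.

```python
from itertools import permutations


def patterson_peaks(atoms):
    """Non-origin Patterson peaks for point atoms in a P1 cell.

    atoms: list of (label, (x, y, z) in fractional units, electrons Z).
    Each ordered pair (i, j) gives a peak at r_j - r_i (wrapped into [0, 1))
    whose height is taken to be proportional to Z_i * Z_j.
    """
    peaks = []
    for (la, ra, za), (lb, rb, zb) in permutations(atoms, 2):
        uvw = tuple(round((b - a) % 1.0, 4) for a, b in zip(ra, rb))
        peaks.append((f"{lb}-{la}", uvw, za * zb))
    return peaks


# Hypothetical cell: one mercury (Z = 80) and three light atoms.
atoms = [
    ("Hg", (0.10, 0.24, 0.05), 80),
    ("C", (0.30, 0.60, 0.70), 6),
    ("N", (0.55, 0.15, 0.40), 7),
    ("O", (0.80, 0.85, 0.20), 8),
]
peaks = patterson_peaks(atoms)
print(len(peaks))  # -> 12, i.e. N(N - 1) = 4 * 3 non-origin peaks
for name, uvw, height in sorted(peaks, key=lambda p: -p[2])[:6]:
    print(name, uvw, height)  # the six Hg-containing peaks are the tallest

# Peak count for a protein-sized cell (an assumed atom count)
n = 2500
print(n * (n - 1))  # -> 6247500
```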

In 1954, Perutz and co-workers calculated a **difference Patterson**, with coefficients $(F_{ph} - F_p)^2$. They used the amplitudes of a mercury-labeled hemoglobin crystal and those of an isomorphous native hemoglobin crystal. Here, the scatter of the light atoms is largely removed mathematically (leaving noise, of course), so the difference Patterson map ought to show simply the vectors between heavy atoms.
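For reference, both functions can be written out explicitly. This assumes the usual textbook normalization, where V is the unit-cell volume and Friedel's law allows the sum to be written with cosines. The Patterson function and the difference Patterson then differ only in their Fourier coefficients:

$$P(u,v,w) = \frac{1}{V}\sum_{h,k,l} |F(hkl)|^2 \cos\left[2\pi(hu + kv + lw)\right]$$

$$\Delta P(u,v,w) = \frac{1}{V}\sum_{h,k,l} \left(|F_{ph}(hkl)| - |F_{p}(hkl)|\right)^2 \cos\left[2\pi(hu + kv + lw)\right]$$

A peak at (u, v, w) in either map means that some pair of atoms in the cell is separated by the vector (u, v, w).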

#### Cells, coordinates and vectors

Recall that crystals contain repeating unit cells. Within the unit cells, what has crystallized may be present in multiple copies owing to crystallographic or noncrystallographic symmetry. The coordinates of atoms within unit cells are expressed as (x, y, z) values. Here x, y and z are either orthogonal Angstroms from the origin or fractions of the unit cell from the origin.

For reasons that will become apparent, heavy atom positions are almost always expressed in fractional units, e.g., (.10, .24, .05) or (.35, .02, .44). In the latter example, our heavy atom is 35% of the way from the origin along x, 2% along y, and 44% along z. Patterson-space unit cells have the same dimensions as real-space cells. To avoid confusion, the x, y and z dimensions of Patterson vector space are called (u, v, w).

#### Your first Patterson solution

Suppose we had a crystal with no internal symmetry, i.e., a P1 crystal, that contained heavy atoms at the two positions above. The difference Patterson map would contain a peak at the origin (0, 0, 0) and peaks corresponding to the vectors between the two heavy atoms, (.25, -.22, .39) and (-.25, .22, -.39). Because Patterson unit cells have the same dimensions as normal unit cells and repeat forever, the fractional coordinate v = -.22 is the same as v = .78. By convention, we express fractional coordinates with values from 0 up to, but not including, 1. We would therefore do better to express the two Patterson peaks as (.25, .78, .39) and (.75, .22, .61).
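The wrapping convention is easy to mechanize. This short Python sketch subtracts fractional coordinates, folds the result into [0, 1), and reproduces the two P1 Patterson peaks above. The helper names are illustrative and do not come from any crystallographic package.

```python
def wrap(frac):
    """Map a fractional coordinate into the conventional range [0, 1)."""
    # The final % 1.0 guards against round() turning 0.99999... into 1.0.
    return round(frac % 1.0, 4) % 1.0


def vector(r_from, r_to):
    """Patterson vector r_to - r_from, wrapped into the unit cell."""
    return tuple(wrap(b - a) for a, b in zip(r_from, r_to))


A = (0.10, 0.24, 0.05)
B = (0.35, 0.02, 0.44)

print(vector(A, B))  # B - A -> (0.25, 0.78, 0.39)
print(vector(B, A))  # A - B -> (0.75, 0.22, 0.61)
```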

The Patterson map contains equal peaks corresponding to the vector from atom A to atom B and the vector from atom B to atom A. You should be able to appreciate how this fact makes Patterson maps centrosymmetric. You can also see that the vector from atom A in one cell to atom A in neighboring (or any other) cells will always fall on the origin.

If you were presented with a Patterson map that had peaks at (.25, .78, .39) and (.75, .22, .61), you would report that one heavy atom was at (.25, .78, .39) with respect to the other. The conventional way to do this would be to assign one heavy atom to the origin (0, 0, 0) and the other to (.25, .78, .39). Take our first example at face value: for two heavy atoms to have positions of (.10, .24, .05) and (.35, .02, .44) in P1, some work was likely done already to assign a different atom to position (0, 0, 0).

Crystallographers reading this tutorial will recognize the self-restraint being exercised in not launching into a discussion of difference Fourier methods!

#### Your second Patterson solution

The example above generates two non-origin Patterson peaks by virtue of having two heavy atom sites in a P1 cell. Let's now take a P2(1) crystal with a heavy atom at (.10, .24, .05). The International Tables tell us that two symmetry operations generate P2(1) symmetry. Every atom located at (x, y, z) within the unit cell must also be located at (-x, y + 1/2, -z).

For instance, if a helix runs from (.20, .15, .09) to (.30, .20, .24) in one asymmetric unit of the crystal, then the crystallographically related helix must run from (.80, .65, .91) to (.70, .70, .76). Our heavy atom at (.10, .24, .05) must be duplicated at (.90, .74, .95). The Patterson map will have the familiar origin peak and peaks at (.80, .50, .90) and (.20, .50, .10).

| atom ID | operator | position | vector | Patterson site |
|---|---|---|---|---|
| A | 1 | (.10, .24, .05) | A2-A1 | (.80, .50, .90) |
| A | 2 | (.90, .74, .95) | A1-A2 | (.20, .50, .10) |

| symbol | expression |
|---|---|
| operator 1 | (x, y, z) |
| operator 2 | (-x, y + 1/2, -z) |
| vector 1-2 | (2x, 1/2, 2z) |
| vector 2-1 | (-2x, 1/2, -2z) |
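The operator table translates directly into code. The following minimal Python sketch applies the second P2(1) operator to atom A and recovers the two Harker peaks. It assumes the standard setting with the 2(1) screw axis along y, as in the operators above. The function names are hypothetical.

```python
def wrap(frac):
    """Fold a fractional coordinate into [0, 1)."""
    return round(frac % 1.0, 4) % 1.0


def p21_mate(r):
    """Apply P2(1) operator 2, (-x, y + 1/2, -z), with the 2(1) axis along y."""
    x, y, z = r
    return (wrap(-x), wrap(y + 0.5), wrap(-z))


def vector(r_from, r_to):
    """Patterson vector r_to - r_from, wrapped into the unit cell."""
    return tuple(wrap(b - a) for a, b in zip(r_from, r_to))


A1 = (0.10, 0.24, 0.05)
A2 = p21_mate(A1)
print(A2)              # -> (0.9, 0.74, 0.95)
print(vector(A1, A2))  # A2-A1 -> (0.8, 0.5, 0.9)
print(vector(A2, A1))  # A1-A2 -> (0.2, 0.5, 0.1)
```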

The symmetry operator takes every value of y and translates it half a cell along y. Consequently, in a P2(1) Patterson map, every non-origin peak between an atom and its own symmetry mate falls on the v = .50 section. These "Harker" sections are always the first sections examined in Patterson solutions.

We now take the symmetry operations of P2(1) from the International Tables, (x, y, z) and (-x, y + 1/2, -z), and derive general algebraic expressions for the Patterson vectors. Subtracting operator 2 from operator 1 shows that in Patterson space there will be a peak at (u, v, w) = (2x, .50, 2z). Subtraction in the opposite direction yields the centrosymmetric peak, (u, v, w) = (-2x, .50, -2z). Because 2x = +/- .20, x could equal .10, .90, .40 or .60. Initially, the z solution is also 4-fold degenerate, at z = .05, .95, .45 or .55.
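The degeneracy count can be checked by brute force. This Python sketch solves 2x ≡ u (mod 1) for both sign choices of the Harker vector. It then lists every candidate x and z for the peak at (.20, .50, .10). The helper names are hypothetical.

```python
def halve_mod1(t):
    """Both solutions of 2x = t (mod 1) in [0, 1)."""
    return sorted(round((t / 2 + k / 2) % 1.0, 4) for k in (0, 1))


def harker_candidates(u, w):
    """Candidate x and z values from a P2(1) Harker peak at (u, 1/2, w),
    allowing the peak to be either (2x, 1/2, 2z) or its mate (-2x, 1/2, -2z)."""
    xs = sorted(set(halve_mod1(u) + halve_mod1(-u % 1.0)))
    zs = sorted(set(halve_mod1(w) + halve_mod1(-w % 1.0)))
    return xs, zs


print(harker_candidates(0.20, 0.10))
# -> ([0.1, 0.4, 0.6, 0.9], [0.05, 0.45, 0.55, 0.95])
```

The sketch lists x and z independently. It does not pair them up; the discussion that follows explains how choosing x restricts z.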

It should not surprise you that x could equal .10 or .90. In fact, if a heavy atom in P2(1) falls on x = .10, symmetry generates the atom's mate at .90. If the heavy atom falls at x = .10, can z still equal .05, .95, .45 or .55? The answer is no. If you take x to be .10, you have assigned a particular vector equation [either (2x, 1/2, 2z) or (-2x, 1/2, -2z)] to a particular peak.

Suppose you assign the first equation to the peak at (.20, .50, .10). You will conclude that x must equal .10 or .60, because 2x = .20. You have now "taken the position" that 2z (and not -2z) must equal .10. You will therefore find that only two z solutions, .05 and .55, are compatible with 2z = .10.

The positions of the peaks on the v = 1/2 Harker section do not depend on y, so the Harker section places no constraint on y, which can take any value. Can you convince yourself that heavy atom A at (.10, .01, .05) or at (.10, .33, .05) would generate the same Patterson peaks as atom A at (.10, .24, .05)? See the answer page.
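Here is a quick numerical check of the claim, offered as a sketch. It computes the two Harker self-vectors of atom A for several arbitrary y values and compares them. The helper name is hypothetical.

```python
def wrap(frac):
    """Fold a fractional coordinate into [0, 1)."""
    return round(frac % 1.0, 4) % 1.0


def harker_peaks(r):
    """Self-vectors between an atom at r and its P2(1) mate (-x, y + 1/2, -z)."""
    x, y, z = r
    mate = (-x, y + 0.5, -z)
    forward = tuple(wrap(b - a) for a, b in zip(r, mate))
    backward = tuple(wrap(a - b) for a, b in zip(r, mate))
    return sorted([forward, backward])


for y in (0.01, 0.24, 0.33):
    print(y, harker_peaks((0.10, y, 0.05)))
# every line shows [(0.2, 0.5, 0.1), (0.8, 0.5, 0.9)]
```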

Use the four tables printed below as sets of alternative two-dimensional unit cells to account for the P2(1) Patterson results. Label the horizontal axes **x** and the vertical axes **z** (both ascending from origins in the lower left corners). Mark the heavy atoms A in each of the twelve unit cells of each table, as follows:

- In the top table, mark A atoms at x = .10, z = .05 and x = .90, z = .95.
- In the second table, mark A atoms at x = .10, z = .55 and x = .90, z = .45.
- In the third table, mark A atoms at x = .40, z = .45 and x = .60, z = .55.
- In the last table, mark the last remaining correct solution.

Can you see that each of these representations is an equivalent description of the same physical structure and produces the same pair of interatomic vectors? Now mark one of the sets of unit cells in another color with an incorrect solution, such as x = .10, z = .45. Can you see how this is not the same physical arrangement? See the answer page.

The choice of one correct unit cell over the others is arbitrary and simply fixes the origin of the unit cell with respect to a heavy atom. We began this discussion by stating that we had a heavy atom at (.10, .24, .05), so we will stick with those coordinates. Three other x, z combinations would be acceptable, however, and fixing y at 0 would be more conventional.

**NOTE: These four tables should appear as 3 by 4 arrays of identically sized unit cells.**
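After you have marked the tables by hand, a sketch like the following can check your answers. For each candidate (x, z), it compares the (u, w) coordinates of the predicted Harker self-vectors with the observed peaks. y is dropped because it does not affect the Harker section, as noted above. The helper name is hypothetical.

```python
def wrap(frac):
    """Fold a fractional coordinate into [0, 1)."""
    return round(frac % 1.0, 4) % 1.0


def harker_uw(x, z):
    """(u, w) of the two P2(1) Harker self-vectors for an atom at (x, y, z)."""
    return sorted([(wrap(2 * x), wrap(2 * z)), (wrap(-2 * x), wrap(-2 * z))])


observed = sorted([(0.80, 0.90), (0.20, 0.10)])  # from the v = 1/2 section

candidates = [(0.10, 0.05), (0.10, 0.55), (0.40, 0.45), (0.60, 0.05), (0.10, 0.45)]
for x, z in candidates:
    print((x, z), harker_uw(x, z) == observed)
# the first four pairs print True; x = .10, z = .45 prints False
```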

#### Your third Patterson solution

We will now attempt to generate and deconvolute the difference Patterson map for a P2(1) crystal with the addition of heavy atom B at (.35, .02, .44). A second heavy atom in each asymmetric unit of the crystal makes the Patterson map more interesting. Crystallographic symmetry generates two more peaks on the Harker section, at (.30, .50, .12) and (.70, .50, .88). These correspond to the vectors between heavy atom B at (.35, .02, .44) and its mate at (.65, .52, .56). However, we also have the centrosymmetric "cross peaks" between each copy of heavy atom A and each copy of heavy atom B.

| atom ID | operator | position |
|---|---|---|
| A | 1 | (.10, .24, .05) |
| B | 1 | (.35, .02, .44) |
| A | 2 | (.90, .74, .95) |
| B | 2 | (.65, .52, .56) |

| symbol | expression |
|---|---|
| operator 1 | (x, y, z) |
| operator 2 | (-x, y + 1/2, -z) |
| vector 1-2 | (2x, 1/2, 2z) |
| vector 2-1 | (-2x, 1/2, -2z) |

| vector | Patterson site |
|---|---|
| B1-A1 | (.25, .78, .39) |
| A2-A1 | (.80, .50, .90) |
| B2-A1 | (.55, .28, .51) |
| A2-B1 | (.55, .72, .51) |
| B2-B1 | (.30, .50, .12) |
| B2-A2 | (.75, .78, .61) |
| A1-B1 | (.75, .22, .61) |
| A1-A2 | (.20, .50, .10) |
| A1-B2 | (.45, .72, .49) |
| B1-A2 | (.45, .28, .49) |
| B1-B2 | (.70, .50, .88) |
| A2-B2 | (.25, .22, .39) |
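The twelve entries can be regenerated mechanically, which is a useful check on hand arithmetic. The sketch below builds the four sites and takes every ordered difference. It assumes the standard P2(1) setting, and the helper names are hypothetical.

```python
from itertools import permutations


def wrap(frac):
    """Fold a fractional coordinate into [0, 1)."""
    return round(frac % 1.0, 4) % 1.0


def p21_mate(r):
    """Apply P2(1) operator 2, (-x, y + 1/2, -z)."""
    x, y, z = r
    return (wrap(-x), wrap(y + 0.5), wrap(-z))


sites = {"A1": (0.10, 0.24, 0.05), "B1": (0.35, 0.02, 0.44)}
sites["A2"] = p21_mate(sites["A1"])
sites["B2"] = p21_mate(sites["B1"])

# One Patterson peak per ordered pair of distinct sites, labeled "to-from".
peaks = {
    f"{j}-{i}": tuple(wrap(b - a) for a, b in zip(sites[i], sites[j]))
    for i, j in permutations(sites, 2)
}
print(len(peaks))      # -> 12 non-origin peaks
print(peaks["B2-A1"])  # -> (0.55, 0.28, 0.51)
print(peaks["A1-B2"])  # -> (0.45, 0.72, 0.49)
```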

The unique solutions for x and z from the B-derived Harker peaks at (.30, .50, .12) and (.70, .50, .88) are x = .15 with z = .06 or .56, and x = .35 with z = .44 or .94. If there were no heavy atom A, any of these solutions would be equivalent and would serve to fix the origin as above. However, the assignment of heavy atom A to (.10, .24, .05) should make some of these solutions for B correct and others incorrect.

Let's examine the consequences of hypothesizing that heavy atom B is at x = .15, to see how far it will take us in explaining all of the Patterson peaks. If B has an x value of .15, its symmetry mate will be located at x = .85. We know that the A atoms fall at x = .10 and .90. It is therefore satisfying to find Patterson cross peaks at u = .25 and .75, because these could represent vectors of (.10 - .85) and (.85 - .10). However, the absence of Patterson sites at u = .05 and .95, which would represent vectors such as (.15 - .10) and (.10 - .15), is troubling.

Can you provide the set of Patterson cross peaks for heavy atom A in its correct positions and heavy atom B at x = .15, z = .06 and x = .85, z = .94? Does that account for the Patterson map? See the answer page.
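Hypothesis testing of this kind is easy to automate once you have tried it by hand. The Python sketch below works only with x and z, because the y position of B is not yet known. It asks whether a proposed (x, z) for B, together with A and the symmetry mates, reproduces the (u, w) coordinates of every observed peak. The helper names are hypothetical. Other candidates from the Harker analysis can be passed to the same function.

```python
from itertools import permutations


def wrap(frac):
    """Fold a fractional coordinate into [0, 1)."""
    return round(frac % 1.0, 4) % 1.0


def p21_xz(xz):
    """x, z part of the P2(1) operator (-x, y + 1/2, -z)."""
    return (wrap(-xz[0]), wrap(-xz[1]))


def uw_peaks(sites_xz):
    """(u, w) of every vector between distinct sites, ignoring v (y unknown)."""
    return {(wrap(b[0] - a[0]), wrap(b[1] - a[1])) for a, b in permutations(sites_xz, 2)}


A_xz = (0.10, 0.05)
# (u, w) of the twelve observed non-origin peaks (duplicates collapse in the set)
observed_uw = {
    (0.25, 0.39), (0.80, 0.90), (0.55, 0.51), (0.30, 0.12),
    (0.75, 0.61), (0.20, 0.10), (0.45, 0.49), (0.70, 0.88),
}


def accounts_for_map(b_xz):
    """Does heavy atom B at this (x, z), plus A, reproduce the observed (u, w)?"""
    sites = [A_xz, p21_xz(A_xz), b_xz, p21_xz(b_xz)]
    return uw_peaks(sites) == observed_uw


print(accounts_for_map((0.15, 0.06)))  # -> False: predicts peaks at u = .05, .95
print(accounts_for_map((0.15, 0.56)))  # -> False
```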

If heavy atom B is not at x = .15, it must be at x = .35. Can it be at both x = .35, z = .44 and x = .35, z = .94, or must it be at one or the other? See the answer page. Given that the y value of atom A was assigned to .24 in the solution (.10, .24, .05), can you now derive the y position of your B atoms? Is this a unique solution? See the answer page.
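To check your derivation, the following Python sketch scans for the y position of B. It works as follows:

- it fixes A at (.10, .24, .05) and places B at x = .35, z = .44;
- it tries y values for B on a 0.01 grid;
- it keeps every y for which the full set of predicted Patterson vectors matches the twelve observed peaks.

The helper names are hypothetical, and the 0.01 grid is an assumption chosen to suit the two-decimal coordinates used here.

```python
from itertools import permutations


def wrap(frac):
    """Fold a fractional coordinate into [0, 1)."""
    return round(frac % 1.0, 4) % 1.0


def p21_mate(r):
    """Apply P2(1) operator 2, (-x, y + 1/2, -z)."""
    x, y, z = r
    return (wrap(-x), wrap(y + 0.5), wrap(-z))


def peak_set(sites):
    """All non-origin Patterson vectors between distinct sites (fractional)."""
    return {
        tuple(wrap(b - a) for a, b in zip(r_from, r_to))
        for r_from, r_to in permutations(sites, 2)
    }


A1 = (0.10, 0.24, 0.05)
observed = {
    (0.25, 0.78, 0.39), (0.80, 0.50, 0.90), (0.55, 0.28, 0.51), (0.55, 0.72, 0.51),
    (0.30, 0.50, 0.12), (0.75, 0.78, 0.61), (0.75, 0.22, 0.61), (0.20, 0.50, 0.10),
    (0.45, 0.72, 0.49), (0.45, 0.28, 0.49), (0.70, 0.50, 0.88), (0.25, 0.22, 0.39),
}

# Scan trial y values for B at (.35, y, .44) on a 0.01 grid.
matches = []
for i in range(100):
    B1 = (0.35, i / 100, 0.44)
    sites = [A1, p21_mate(A1), B1, p21_mate(B1)]
    if peak_set(sites) == observed:
        matches.append(i / 100)
print(matches)
```

Running it is one way to check your answer to the uniqueness question. If more than one y survives, recall that a structure and its inversion through the origin produce identical Patterson maps.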