‘DockEM’ is a software package to quantitatively dock, or fit, a crystal structure into an EM map of a macromolecular complex that contains that structure. A density map is calculated from the domain to form a density search object. This search object is then correlated locally (in real space) throughout the asymmetric unit of the map in order to find the position where the density of the search object and the map match best. The positions corresponding to the top 10,000 normalised correlation coefficients are available for inspection.

The algorithm and a validation example are presented in Roseman (2000) Acta Cryst. D56, 1332-1340. An example of an application is published in Roseman et al. (2001).

__Requirements__

- An EM map and one or more atomic structures of domains or fragments of a complex are required.
- The software and a Unix Alpha or SGI workstation.
- ‘O’ or another program that can display maps and manipulate coordinates.
- The program MAPMAN (from Uppsala Software Factory), or an equivalent.
- Access to the MRC, SPIDER (which we don't have at UCSF) and CCP4 software packages will also be useful.
- You should know the sampling in Å/pixel of the EM map, and the resolution to which it is valid.

__Some notes on the algorithm__

The domain to be docked is treated as a rigid body. If you suspect it has two parts joined by a flexible region then the best approach is to break it in two pieces and dock them separately. If the two solutions are compatible this will provide some degree of validation. Otherwise a larger domain, or one with more distinctive features for docking, could provide constraints for another.

The real space local correlation coefficient is computed between the search object and the EM map at every point in the 6 dimensional search space (3 translational degrees of freedom and 3 in orientation). Sample points from the search object are compared with corresponding points from the map that lie under the ‘footprint’ of the search object at the current position. Therefore no part of the map beyond the boundary of the search object affects the correlation coefficient, though there is no explicit masking of the map density.

Therefore the algorithm can deal with molecules in tight complexes and at interfaces. Search object density that extends into weaker density of the map, or outside the molecule, will penalise the score.
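The footprint idea can be illustrated with a minimal sketch. It is not the DockEM implementation (see Roseman, 2000 for the actual formulation); it only assumes that the footprint is the set of search object voxels above a threshold, and that the score is a mean-subtracted normalised correlation over those voxels. The function name `local_correlation` and the toy data are hypothetical.

```python
import numpy as np

def local_correlation(search_obj, em_map, offset, threshold):
    """Normalised correlation between the search-object voxels above
    `threshold` and the map voxels lying under them at shift `offset`."""
    footprint = search_obj > threshold                 # voxels defining the footprint
    idx = np.argwhere(footprint) + np.asarray(offset)  # same voxels, in map coordinates
    # This simple sketch requires the whole footprint to lie inside the map.
    if (idx < 0).any() or (idx >= np.array(em_map.shape)).any():
        raise ValueError("footprint falls outside the map")
    s = search_obj[footprint]
    m = em_map[tuple(idx.T)]
    s = s - s.mean()
    m = m - m.mean()
    denom = np.sqrt((s * s).sum() * (m * m).sum())
    return float((s * m).sum() / denom) if denom > 0 else 0.0

# Toy data (invented): a small object embedded in a noisy map at shift (10, 10, 10).
rng = np.random.default_rng(0)
obj = np.zeros((8, 8, 8))
obj[2:6, 2:6, 2:6] = rng.random((4, 4, 4)) + 1.0
em = rng.normal(0.0, 0.1, (24, 24, 24))
em[10:18, 10:18, 10:18] += obj

cc_true = local_correlation(obj, em, (10, 10, 10), 0.5)  # object sits here
cc_off = local_correlation(obj, em, (3, 3, 3), 0.5)      # noise only under the footprint
print(cc_true, cc_off)
```

Only map voxels under the footprint enter the sum, so density elsewhere in the map cannot change the score, which is the property described above.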

__The procedure__

The EM map of the complex should be band pass filtered. The low pass Fourier filter should be set at the maximum resolution to which the EM map is reliable. I have been using the FSC = 0.5 limit (see Frank, 1996). The high pass filter should be set at a resolution a bit larger than the maximum dimension of the domain to be docked.

The program ‘Mapfilter’ (included with DockEM) will Fourier filter the map, and set up the unit cell dimensions. The standard cell is defined as P1, with the origin (0,0,0) at the first pixel. Maps are assumed to be cube shaped, i.e. the X,Y,Z dimensions are equal to one another.
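For readers who want to see what a band pass filter does to a cubic map, here is a minimal sketch using NumPy. It is not Mapfilter: the hard-edged cutoff and the names `band_pass`, `low_pass_res` and `high_pass_res` are assumptions made for illustration, and Mapfilter's actual filter shape may differ.

```python
import numpy as np

def band_pass(volume, sampling, low_pass_res, high_pass_res):
    """Keep Fourier terms whose resolution lies between low_pass_res and
    high_pass_res (both in Å). `sampling` is in Å/pixel; hard-edged filter."""
    n = volume.shape[0]
    if volume.shape != (n, n, n):
        raise ValueError("cubic volume expected")
    f = np.fft.fftfreq(n, d=sampling)                  # spatial frequency in 1/Å
    fx, fy, fz = np.meshgrid(f, f, f, indexing="ij")
    radius = np.sqrt(fx**2 + fy**2 + fz**2)
    keep = (radius <= 1.0 / low_pass_res) & (radius >= 1.0 / high_pass_res)
    return np.real(np.fft.ifftn(np.fft.fftn(volume) * keep))

# Toy volume (invented): a 16 Å wave along X plus a constant offset, at 2 Å/pixel.
n, sampling = 32, 2.0
x = np.arange(n)
wave = np.sin(2.0 * np.pi * x / 8.0)[:, None, None] * np.ones((n, n, n))
filtered = band_pass(wave + 5.0, sampling, low_pass_res=10.0, high_pass_res=40.0)
```

In this toy case the constant offset lies below the high pass limit and is removed, while the 16 Å wave lies inside the pass band and is kept.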

Then an initial placement of the coordinates into the map should be made, using ‘O’ or possibly another program. This needs only to be approximate and defines the origin of the search. The domain should at least be placed somewhere inside the map. Save the coordinates at their new position. A density model can be made from the coordinates using ‘Makedensity’. This must be Fourier filtered at the same resolution as the map, using ‘Mapfilter’.

The density matching search is executed within the program ‘DockEM’. A file describing the angular range to search is required. This is in the same format as used for SPIDER angles files and can be generated with the *SPIDER* VO EA command. The ‘DockEM’ program could run for a long time, depending on the size of the search object and the sampling, step size and angular range to be searched. Progress can be monitored and a run time estimated from the standard output of the program, as the progress through the angles list is reported. If the search region can be restricted, the run will be quicker. In some cases the sampling of the EM map and search object can be reduced to be compatible with the resolution, i.e. a sampling (in Å/pixel) of 1/3 of the maximum resolution is adequate. Finer sampling will make the computation unnecessarily longer.
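The sampling rule of thumb can be written out as a small calculation. The function names and the example numbers below are invented for illustration; only the 1/3 factor comes from the text.

```python
def target_sampling(resolution, fraction=1.0 / 3.0):
    """Coarsest adequate sampling (Å/pixel) for a map valid to `resolution` (Å)."""
    return resolution * fraction

def voxel_reduction(current_sampling, new_sampling):
    """Factor by which the number of voxels in a volume drops on resampling."""
    return (new_sampling / current_sampling) ** 3

# Example: a map valid to 15 Å that is currently sampled at 2.5 Å/pixel.
new_sampling = target_sampling(15.0)               # one third of the resolution
reduction = voxel_reduction(2.5, new_sampling)     # fewer voxels to search
print(new_sampling, reduction)
```

Fewer voxels means fewer translational positions to test and fewer sample points per correlation, which is why resampling before the search is worthwhile.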

Several files are output on completion of the search. A three digit run code ‘00n’ is associated with each DockEM search run, and is part of the output filenames.

- searchdoc00n.dkm, list of search positions and normalised correlation coefficients. More detail on output format later.
- histogram00n.dkm, histogram of all the correlation coefficients calculated in the search.
- searchmap00n.mrc, map of the highest correlation coefficient, over all angles, at each spatial coordinate (X,Y,Z).

Atomic models for any of the scores listed in the file searchdoc00n.dkm can be generated using the program ‘DockXsoln’. The same PDB file and sampling as used for the search should be input. Obviously the first solution to inspect is the top score. PDB files with the name pattern tophit00n.00I.pdb are generated, where n is the run code, and I is the rank in the scores. Many of the solutions listed will be neighbours or shoulders to a local maximum. It is possible to detect clusters of solutions based on the X,Y,Z shift. Clusters in angular space can also be detected, but sometimes it is not obvious that two sets of Euler angles describe a very similar rotation.
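One way to check whether two sets of Euler angles describe nearly the same rotation is to convert both to rotation matrices and measure the angle of the rotation that takes one to the other. The sketch below assumes a ZYZ composition (phi about Z, then theta about Y, then psi about Z); check this against the convention actually used by your angles files before relying on it. The function names are hypothetical.

```python
import numpy as np

def rot_z(deg):
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_y(deg):
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def euler_zyz(psi, theta, phi):
    # Assumed composition: phi about Z, then theta about Y, then psi about Z.
    return rot_z(psi) @ rot_y(theta) @ rot_z(phi)

def rotation_difference(e1, e2):
    """Angle in degrees of the rotation relating two (psi, theta, phi) triples."""
    r = euler_zyz(*e1).T @ euler_zyz(*e2)
    cos_angle = np.clip((np.trace(r) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_angle)))

# With theta = 0 only psi + phi matters, so these two triples are the same rotation.
same = rotation_difference((30.0, 0.0, 60.0), (90.0, 0.0, 0.0))
tilted = rotation_difference((0.0, 0.0, 0.0), (0.0, 90.0, 0.0))
print(same, tilted)
```

Solutions whose X,Y,Z shifts are close and whose rotation difference is small can then be treated as members of one cluster.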

__Stepwise Summary:__

5 main steps:

1. Set up the cell dimensions and filter the EM map.

2. Define starting position by manual placement into the map, using ‘O’.

3. Convert the coordinates to a density, and filter.

4. Run DockEM to do the local correlation search.

5. Extract the solutions as coordinates and examine in O.

Plus 2 others:

6. Refinement.

7. Validation.

____________________________________________________________________

0. Convert the map to mrc format.

1. Set up the cell dimensions and filter the EM map

Run Mapfilter, to filter the EM map and set up the cell parameters.

The cell angles are all 90 degrees (90, 90, 90 for the P1 cell). Each cell dimension is C = (N-1) x D, where D is the sampling (Å/pixel) and N is the dimension of the EM map in voxels.
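As a worked example of the cell dimension formula, the short sketch below evaluates C = (N-1) x D for an invented map size and sampling. The function name is hypothetical.

```python
def cell_dimension(n_voxels, sampling):
    """Cell edge C in Å for a cubic map of n_voxels per side at `sampling` Å/pixel."""
    return (n_voxels - 1) * sampling

# Example values (invented): a 128-voxel cube sampled at 2.5 Å/pixel.
c = cell_dimension(128, 2.5)
print(c)
```

The edge is (N-1) x D rather than N x D because the cell runs from the first voxel, at the origin, to the last voxel.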

Mapfilter

Input EM map filename.

Enter sampling in Å/pixel.

Enter low pass (high resolution cutoff) and high pass (low resolution cutoff) filters.

Enter filename for output map.

2. Define starting position by manual placement into the map, using ‘O’

a) Convert mrc format EM map to BRIX format for display in O. Use “mrc2ccp” to convert from mrc to CCP4 format. MAPMAN will read the CCP4 file and convert it to BRIX format.

Starting with ‘somefile.map’, in mrc format.

mrc2ccp somefile

(output: somefile_cnv.map)

MAPMAN

Read

M1

somefile_cnv.map (CCP4 filename)

CCP4

Mappage

M1

filename.brix

quit

b) Run ‘O’

Read in the map, display.

Read in the coordinates, display them.

Move them around.

Write them out at the new position.

O manuals are at their website.

3. Converting to density

Makedensity

Enter coordinates filename.

Enter a map file name to take header information from. This would be the MRC format version of the EM map you made the placement into.

Enter the sampling of the EM map in Å/pixel.

Enter an output filename for the search object density.

Filter this search object map using Mapfilter.

Use the filters as for the EM map.

Choose a threshold of density to include in the search object (after filtering) using a visualisation program such as Web (part of SPIDER) or O. The threshold value should define an isosurface that includes approximately the expected volume of the domain, or slightly less in order to pick the strongest features, with a higher signal to noise ratio. The ‘DockEM’ search algorithm is not very sensitive to the exact threshold and I have found it adequate to find the threshold by ‘eye-ball’ using Web.
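If you prefer a numerical starting point to choosing the threshold by eye, the sketch below picks the density value whose isosurface encloses a given volume. It is not part of DockEM; the function name, the use of NumPy, and the convention that voxels at or above the threshold count as inside are all assumptions.

```python
import numpy as np

def threshold_for_volume(density, sampling, target_volume):
    """Density value such that the voxels at or above it occupy roughly
    `target_volume` (Å^3), for a map sampled at `sampling` Å/pixel."""
    voxel_volume = sampling ** 3
    n_keep = int(round(target_volume / voxel_volume))
    n_keep = max(1, min(n_keep, density.size))      # stay within the array
    values = np.sort(density, axis=None)[::-1]      # densities, highest first
    return float(values[n_keep - 1])

# Toy density (invented): 1000 distinct values on a 10 x 10 x 10 grid at 2 Å/pixel.
density = np.arange(1000, dtype=float).reshape(10, 10, 10)
thr = threshold_for_volume(density, sampling=2.0, target_volume=800.0)
n_inside = int((density >= thr).sum())
print(thr, n_inside)
```

Passing a slightly smaller target volume than the expected domain volume gives the "slightly less" threshold suggested above.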

4. Run DockEM

DockEM

Enter the search object density (created at step 3).

Enter the sampling of the EM map in Å/pixel.

Enter the density threshold for the search object.

Enter the name of the pdb file. (This is used to define the origin of the transformations.)

Enter a three digit run code for the output files.

Enter the angles file filename.

Enter the angular sampling (in degrees). The same as used to create the angles file.

Enter the target EM map filename to dock into (filtered in step 1).

Enter a search radius in voxels.

Enter a step size for the search, in voxels.

5. Examine output files

Inspect the searchdoc files.

Make coordinates for the solutions.

DockXsoln

Enter the coordinates file (pdb)

Enter the three digit run code

Enter the sampling (Å/pixel)

Enter the solution range required. (e.g. 1,10 for the top ten solutions).

6. Refinement

Local refinement using finer steps and angular sampling can be done, starting from the best solution, or set of solutions, from a previous run.

7. Validation, or additional constraints

Check symmetries, or filter by them.

Check compatibility of different domain solutions.

Check possible connectivity.

Check point mutation and biochemical data.

__Other notes/conventions__

Angles files are in Euler angles convention, in spider document format.

Coordinates files are in pdb format.

Maps are in mrc/ccp4 format (but must be converted to brix to display in O).

Use CCP4 programs to apply symmetries or other manipulations, and to detect clashes automatically.

The P1 unit cell goes 0 -> (N-1)*D.

Resample the EM map, at ~ resolution/3, to get optimum speed.

Run time is linear with the x, y, z range, because it’s a real space search.

The search does every position in the range per angle, therefore the angles are in the outermost loop.

A run code is associated with every output file, and links it to a specific DockEM run.

Columns in the searchdoc00n.dkm file, in order from left to right, are:

- Score or key.
- X shift, in voxels.
- Y shift, in voxels.
- Z shift, in voxels.
- Entry in angles file.
- Omega angle step. (i.e. the actual Psi interval used by the programs, input number 7 to DockEM).
- Psi, Euler angle.
- Theta, Euler angle.
- Phi, Euler angle.
- Significance, the number of standard deviations of the correlation coefficient above the mean of all correlation coefficients searched.
- Normalised correlation coefficient.
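The column order above can be turned into named fields when reading the results in a script. The sketch below assumes that each data line holds exactly these eleven values separated by whitespace; the exact layout of the file is not specified here, so check a real file first. The names `Solution` and `parse_solution` and the sample line are invented for illustration.

```python
from collections import namedtuple

# Field names follow the column list above, left to right.
Solution = namedtuple(
    "Solution",
    "key x y z angle_entry omega_step psi theta phi significance ncc",
)

def parse_solution(line):
    """Split one whitespace-separated data line into a Solution of floats."""
    fields = line.split()
    if len(fields) != len(Solution._fields):
        raise ValueError("expected %d columns, got %d"
                         % (len(Solution._fields), len(fields)))
    return Solution(*(float(f) for f in fields))

# Made-up line, for illustration only (not real DockEM output).
sol = parse_solution("1 4 -2 7 153 8 40 24 112 9.6 0.71")
print(sol.x, sol.y, sol.z, sol.ncc)
```

With solutions held as named fields it is straightforward to group entries whose X,Y,Z shifts lie within a few voxels of each other.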

Some example angles files are included.

These sample all orientations at 8,4,2, or 1 degree intervals respectively:

angles8.spi

angles4.spi

angles2.spi

angles1.spi

These sample up to 30 degrees tilt (range of Euler angle theta = 0,30) at 8, 4, 2 or 1 degree intervals respectively:

angles8L30.spi

angles4L30.spi

angles2L30.spi

angles1L30.spi

To set up your own angles file use the *SPIDER* VO EA command.

Inputs:

- Delta theta – step size of angular sampling in degrees.
- Range of theta – amount of tilt allowed of the object, defines the limit of a conical search region (in angular space) around the origin. 0-180 is all possible orientations.
- Range of phi – this should be 0-180 to cover all orientations.

Only change delta theta and range of theta to specify restricted searches.

The program searches another degree of freedom, a rotation of -180 to +180 degrees about the ‘old Z axis’ after the rotation defined by theta and phi. The angular sampling, the 7th input to the program ‘DockEM’, defines the step size for this.
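To get a feel for how the angular sampling affects the size of a search, the sketch below counts the orientations tested: each entry in the angles file is combined with every step of the extra rotation. It assumes the -180 to +180 range is covered in 360/step distinct steps (whether DockEM samples both end points is not stated here). The function names and example numbers are invented.

```python
def in_plane_steps(omega_step):
    """Number of distinct steps covering 360 degrees at `omega_step` degrees."""
    return int(round(360.0 / omega_step))

def total_orientations(n_angle_entries, omega_step):
    """Orientations tested: angles file entries times in-plane rotation steps."""
    return n_angle_entries * in_plane_steps(omega_step)

# Example (invented): an angles file with 100 entries.
coarse = total_orientations(100, 8.0)
fine = total_orientations(100, 4.0)
print(coarse, fine)
```

Halving this step doubles the number of in-plane rotations alone; a finer angles file adds entries on top of that, so it pays to start coarse and refine around the best solutions.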

__Bibliography__

Roseman, A. M. (2000). *Docking structures of domains into maps from cryo-electron microscopy using local correlation*. *Acta Cryst.* D**56**, 1332-1340.

Roseman et al. (2001). Journal of Structural Biology, accepted.

Frank (1996) Three-Dimensional Electron Microscopy of Macromolecular Assemblies. Academic Press, San Diego.

__WWW sites__

O

CCP4: http://www.ccp4.ac.uk/main.html

Uppsala Software Factory: http://xray.bmc.uu.se/usf/

SPIDER: http://www.wadsworth.org/spider_doc/spider/docs/spider.html