**13. Structure Solution by Direct Methods**

**13.1 Routine structure solution**

Usually direct methods will be initiated with the
single SHELXS command TREF; for large structures brute force (e.g. TREF
5000) may prove necessary. In fact there are a large number of parameters
which can be varied, though the program is based on experience of many
thousands of structures and can usually be relied upon to choose sensible
default values. A summary of these parameters appears after the data reduction
output, and should be consulted before attempting any direct methods options
other than 'TREF n'.

**13.2 Facilities for difficult structures**

The phase refinement of multiple random starting
phase sets takes place in three stages, controlled by the INIT, PHAN and
TREF instructions. The 'best' solution is then expanded further by tangent
expansion and *E*-Fourier recycling (see the section on partial structure
expansion).

**INIT nn [#] nf [#] s+ [0.8] s- [0.2] wr [0.2]**

The first stage involves five cycles of weighted
tangent formula refinement (based on triplet phase relations only) starting
from nn reflections with random phases and weights of 1. Single phase seminvariants
which have Σ_{1}-formula P_{+} values less than s- or greater than s+ are included with
their predicted phases and unit weights. All these reflections are held
fixed during the INIT stage but refined freely in the subsequent stages.
The remaining reflections also start from random phases with initial weights
wr, but both the phases and the weights are allowed to vary.

If nf is non-zero, the nf 'best' (based on the negative quartet and triplet consistency) phase sets are retained and the process repeated for (npp-nf) parallel phase sets, where npp is the previous number of phase sets processed in parallel (often 128). This is repeated for nf fewer phase sets each time until only a quarter of the original number are processed in parallel. This rather involved algorithm is required to make efficient use of available computer memory. Typically nf should be 8 or 16 for 128 parallel permutations.
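The shrinking number of parallel phase sets described above can be illustrated with a short Python sketch. It only reproduces the counting, not any SHELXS code; the function name `init_parallel_schedule` is hypothetical, and the sketch assumes that the last pass is the first one that reaches a quarter (or less) of the original number.

```python
def init_parallel_schedule(npp: int, nf: int) -> list[int]:
    """Number of phase sets refined in parallel on each pass of the INIT stage.

    Assumption (not stated explicitly in the text): the pass that first
    reaches npp/4 or fewer phase sets is the last one.
    """
    if npp <= 0 or nf < 0:
        raise ValueError("npp must be positive and nf non-negative")
    if nf == 0:
        # nf = 0: no 'best' sets are retained, so there is a single pass.
        return [npp]
    schedule = [npp]
    n = npp
    # Each pass retains the nf best sets, so the next pass handles nf fewer.
    while n > npp / 4 and n - nf > 0:
        n -= nf
        schedule.append(n)
    return schedule


print(init_parallel_schedule(128, 16))
```

For 128 parallel permutations, nf = 16 gives seven passes and nf = 8 gives thirteen, each ending at 32 phase sets.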

The purpose of the INIT stage is to feed the phase annealing stage with relatively self-consistent phase sets, which turns out to be more efficient than starting the phase annealing from purely random phases. If TREF 0 is used to generate partial structure phases for all reflections, the INIT stage is skipped. To save time, only ns reflections and the strongest mtpr triplets for each reflection (or less, if not so many can be found) are used in the INIT stage; these numbers are given on the PHAN instruction.

**PHAN steps [10] cool [0.9] Boltz [#] ns [#] mtpr [40] mnqr [10]**

The second stage of phase refinement is based on 'phase annealing' (Sheldrick, 1990). This has proved to be an efficient search method for large structures, and possesses a number of beneficial side-effects. It is based on steps cycles of tangent formula refinement (one cycle is a pass through all ns phases), in which a correction is applied to the tangent formula phase. The phase annealing algorithm gives the magnitude of the correction (it is larger when the 'temperature' is higher; this corresponds to a larger value of Boltz), and the sign is chosen to give the best agreement with the negative quartets (if there are no negative quartets involving the reflection in question, a random sign is used instead). After each cycle through all ns phases, a new value for Boltz is obtained by multiplying the old value by cool; this corresponds to a reduction in the 'temperature'. To save time, only ns reflections are refined using the strongest mtpr triplets and mnqr quartets for each reflection (or less, if not so many phase relations can be found). The phase annealing parameters chosen by the program will rarely need to be altered; however if poor convergence is observed, the Boltz value should be reduced; it should usually be in the range 0.2 to 0.5. When the 'TEXP 0 / TREF' method of multisolution partial structure refinement is employed, Boltz should be set at a somewhat higher value (0.4 to 0.7) so that not too many solutions are duplicated.
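The 'cooling' of Boltz can be made concrete with a minimal Python sketch: each cycle uses the current value and then multiplies it by cool. The function name `boltz_schedule` is hypothetical, and the starting value of 0.4 is chosen only for illustration from the range quoted above; the program normally picks its own value.

```python
def boltz_schedule(boltz: float, cool: float = 0.9, steps: int = 10) -> list[float]:
    """Boltz value in force during each of the `steps` phase annealing cycles."""
    values = []
    for _ in range(steps):
        values.append(boltz)  # value used for this pass through all ns phases
        boltz *= cool         # lower the 'temperature' for the next cycle
    return values


schedule = boltz_schedule(0.4)
for cycle, value in enumerate(schedule, start=1):
    print(f"cycle {cycle:2d}: Boltz = {value:.4f}")
```

With the default steps of 10 and cool of 0.9, the value used in the last cycle is the starting value multiplied by 0.9 nine times, i.e. a little under 40% of the starting value.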

**TREF np [100] nE [#] kapscal [#] ntan [#] wn [#]**

np is the number of direct methods attempts; if negative,
only the solution with code number |np| is generated (the code number is
in fact a random number seed). Since the random number generation is very
machine dependent, this can only be relied upon to generate the same results
when run on the same model of computer. This facility is used to generate
*E*-maps for solutions which do not have the 'best' combined figure
of merit. No other parameter may be changed if it is desired to repeat
a solution in this way. For difficult structures, it may well be necessary
to increase np (e.g. TREF 5000) and of course the computer time allocated
for the job.

nE reflections are employed in the full tangent formula phase refinement. Values of nE that give fewer than 20 unique phase relations per reflection for the full phase refinement are not recommended.

kapscal multiplies the products of the three *E*-values
used in triplet phase relations; it may be regarded as a fudge factor to
allow for experimental errors and also to discourage overconsistent (uranium
atom) solutions in symmorphic space groups. If it is negative, the cross-term
criteria for the negative quartets are relaxed (but all three cross-term
reflections must still be measured), and more negative quartets are used
in the phase refinement, which is also useful for symmorphic space groups.

ntan is the number of cycles of full tangent formula
refinement, which follows the phase annealing stage and involves all nE
reflections; it may be increased (at the cost of CPU time) if there is
evidence that the refinement is not converging well. The tangent formula
is modified to avoid overconsistency by applying a correction to the resulting
phase of cos^{-1}(<α>/α) when <α> is less than α;
the sign of the correction is chosen to give the best agreement with the
negative quartets (a random sign is used if there are no negative quartets
involving the phase in question). This tends to drive the figures of merit
*R*_{α} and N

wn is a parameter used in calculating the combined
figure of merit CFOM: CFOM = *R*_{α} (NQUAL < wn) or

Only the TREF instruction is essential to specify
direct methods; appropriate INIT, PHAN, FMAP, GRID and PLAN instructions
are then generated automatically if not given.

**13.3
What to do when direct methods fail**

If direct methods fail to give a clearly correct answer, the diagnostic information printed out during the data reduction at the start of the name.lst file should first be carefully reexamined.

After reading the SFAC and UNIT instructions the
program uses the unit-cell contents and volume to calculate the volume
per non-hydrogen atom, which is usually about 18 Å^{3} for typical organic structures.
Condensed aromatic systems can reduce this value (to about 13 in extreme
cases) and higher values (20-30) are observed for structures containing
heavier elements. The estimated maximum single weight Patterson vector
may be useful (in comparison with the Patterson peak-list) in deciding
whether the expected heavy atoms are in fact present. However in general
the program is rather insensitive to the given unit-cell contents; the
assignment of atom types in the *E*-Fourier recycling (after direct
methods when heavier atoms are present) and in the Patterson interpretation
do however assume that the elements actually present are those named on
the SFAC instructions.
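The volume per non-hydrogen atom can be checked by hand with a short Python sketch. It uses the standard triclinic cell volume formula; the function names and the example cell and contents are hypothetical, and the dictionary of element counts is assumed to hold the number of atoms of each type in the whole unit cell.

```python
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit-cell volume in Å^3 from cell lengths (Å) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)


def volume_per_non_h_atom(cell, unit_contents):
    """Cell volume divided by the number of non-hydrogen atoms in the cell.

    `unit_contents` maps element symbols to atoms per unit cell
    (hypothetical input format chosen for this sketch).
    """
    n_heavy = sum(n for element, n in unit_contents.items() if element.upper() != "H")
    if n_heavy <= 0:
        raise ValueError("no non-hydrogen atoms given")
    return cell_volume(*cell) / n_heavy


# Made-up example: orthorhombic cell with 100 non-hydrogen atoms.
example = volume_per_non_h_atom(
    (10.0, 12.0, 15.0, 90.0, 90.0, 90.0),
    {"C": 80, "H": 96, "O": 20},
)
print(f"{example:.1f} Å^3 per non-hydrogen atom")
```

A result far outside the ranges quoted above suggests that the cell contents or the cell itself have been entered incorrectly.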

Particularly useful checks are the values of 2θ(max)
and the maximum values of the (unsigned) reflection indices *h*,
*k* and *l*; for typical small-molecule data the latter should
be a little greater than the corresponding unit-cell dimensions (in Å). If not,
or if 2θ(max) does not correspond
to the value used in the data collection, there must be an error in the
CELL or HKLF instructions, or possibly in the reflection data.
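The relation between the maximum index and the cell dimension follows from Bragg's law: the limiting spacing is d(min) = λ / (2 sin θ(max)), and an index along an axis of length a cannot exceed a / d(min). The following Python sketch assumes Mo Kα radiation (λ = 0.71073 Å), which is not stated in the text; the function name and the example values are hypothetical.

```python
import math

MO_K_ALPHA = 0.71073  # wavelength in Å; assumed here, not taken from the text


def max_index_estimate(axis_length, two_theta_max_deg, wavelength=MO_K_ALPHA):
    """Upper bound on |h| for a cell axis of the given length (Å)."""
    theta_max = math.radians(two_theta_max_deg / 2.0)
    d_min = wavelength / (2.0 * math.sin(theta_max))  # Bragg's law
    return math.floor(axis_length / d_min)


# Illustrative values: a = 10 Å, data collected to 2θ(max) = 50°.
print(max_index_estimate(10.0, 50.0))
```

For these values the bound is 11, a little greater than the 10 Å axis length, which is the behaviour the check above relies on.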

The *R*_{int} value may be used as a
test of the Laue group provided that appropriate equivalent reflections
have been measured. Generally *R*_{int} should be below 0.1
for the correct assignment. *R*_{sigma} is simply the sum
of σ(*F*^{2})
divided by the sum of *F*^{2}; a value above 0.1 indicates
that the data are very weak or that they have been incorrectly processed.
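The definition of *R*_{sigma} translates directly into a few lines of Python. This is a sketch only; the function name `r_sigma` and the three made-up reflections are hypothetical.

```python
def r_sigma(f_squared, sigma_f_squared):
    """Sum of sigma(F^2) divided by the sum of F^2."""
    if len(f_squared) != len(sigma_f_squared):
        raise ValueError("need one sigma per intensity")
    total = sum(f_squared)
    if total <= 0:
        raise ValueError("sum of F^2 must be positive")
    return sum(sigma_f_squared) / total


# Three made-up reflections: F^2 values and their standard deviations.
value = r_sigma([100.0, 50.0, 10.0], [5.0, 5.0, 5.0])
print(f"R(sigma) = {value:.4f}")
if value > 0.1:
    print("data are very weak or incorrectly processed")
```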

The mean values of |*E*^{2}-1| show
whether the *E*-value distribution for the full data and for the 0*kl*,
*h*0*l* and *hk*0 projections are centric or acentric; this
provides a check on the space group assignment, but such statistics may
be unreliable if heavy atoms are present (especially when they lie on special
positions) or if there are very few reflections in one of these three projections.
Twinned structures may give an acentric distribution even when the true
space group is centrosymmetric, or a mean |*E*^{2}-1| value
less than 0.7 for non-centrosymmetric structures. These numbers may also
show up typing errors in the LATT and SYMM instructions; although the program
checks the LATT and SYMM instructions for internal consistency, it is not
possible to detect all possible errors in this way.
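The statistic itself is simple to compute from a list of normalized structure factors, as in the Python sketch below. The function name is hypothetical. For comparison, the commonly quoted theoretical values (not given in the text above) are about 0.968 for a centric and about 0.736 for an acentric distribution.

```python
def mean_abs_e2_minus_1(e_values):
    """Mean value of |E^2 - 1| over a list of normalized structure factors."""
    if not e_values:
        raise ValueError("no E-values given")
    return sum(abs(e * e - 1.0) for e in e_values) / len(e_values)


# Made-up E-values; real use would take all reflections or one projection.
print(round(mean_abs_e2_minus_1([0.2, 0.9, 1.4, 2.1, 0.5]), 3))
```

The same function can be applied separately to the 0*kl*, *h*0*l* and *hk*0 reflections, bearing in mind that the result is unreliable when a projection contains very few reflections.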

Direct methods are based on the assumption of 'equal
resolved atoms'. If the data do not suffice to 'resolve' the atoms from
each other, direct methods are doomed to failure. A good empirical test
of resolution is to compare the number of reflections 'observed' in the
1.1 to 1.2 Å range with the number theoretically possible (assuming
that OMIT is at its default setting of 4) as printed out by the program.
If this ratio is less than one half, it is unlikely that the structure
will ever be solved by direct methods. This criterion may be relaxed
somewhat for centrosymmetric structures and those containing heavy atoms.
It also does not apply to the location of heavy atoms from macromolecular Δ*F*
data because the distances between the 'atoms' are much larger. If the
required resolution has not been reached, there is little point in pursuing
direct methods further; the only hope is to recollect the data with a larger
crystal, stronger radiation source, longer measurement times, area detector,
real-time profile fitting and lower temperature, or at least as many of
these as are simultaneously practicable.

If the data reduction diagnostics give no grounds for suspicion and no direct methods solution gives good figures of merit, a brute force approach should be applied. This takes the form of TREF followed by a large number (e.g. TREF 5000); it may also be necessary to set a larger value for TIME. If either of the methods for interrupting a running job is available (see above), an effectively infinite value may be used (TREF 999999). Any change in this number of phase permutations will also change the random number sequence employed for the starting phases. It may also be worth increasing the second TREF parameter (nE) in steps of, say, 10%.

If more than one solution has good *R*_{α} and N

In cases of pseudosymmetry it may be necessary to modify the

When direct methods only reveal a fragment of the structure, it may well be correctly oriented but incorrectly translated relative to the origin. In such cases a non-centrosymmetric triclinic expansion with 'ESEL -1' may enable the symmetry elements and hence the correct translation (and perhaps the correct space group) to be identified.

Finally, if any heavier (than say sodium) elements are present, automatic Patterson interpretation should be tried.
