ESCET Reference Manual (vers. 0.6b)


ESCET is a script driven program to analyze and compare three-dimensional protein structures. This file contains online-help information and is best viewed with an HTML browser.

This document is not the manual itself; the manual can be found under the names escet_user.ps or escet_user.pdf in the ESCET distribution directory.

This is a beta-test version. Please report all bugs, glitches and unclear passages to trs@shelx.uni-ac.gwdg.de.

The program and its documentation are copyright Thomas R. Schneider (2000/01/02/03).

Table of Contents

Online-Help
How to run ESCET
ESCET - The Name
Command Line Parameters
Commands related to set of atoms
Difference Distance Matrices
Graphics
Selecting Atoms



Online-Help

Currently, the keyword 'help' will simply dump all available information to the screen and to a file.


Available topics: help
help
If given without any arguments, 'help' prints all stored help information to standard output. At the same time, an HTML file 'escet_doc.html' is written to disk. This HTML file reflects the current version of the program.



How to run ESCET

Normally, the program is run via a command-script, e.g. myscript.inp. On a Unix-machine, such a script can be executed by typing 'escet < myscript.inp' on the command line.

Alternatively, the program can be run interactively by typing 'escet' at a UNIX shell prompt. ESCET scripts have the following syntax:

<escet-cmd>;
[<escet-cmd>;]
stop;
Generally, <escet-cmd> is of the following form:

<cmd-name> 
   [([<kwd> = <arg>|<kwd>])] ;

The name of the command is given first. To use non-default parameters, keywords with arguments are given in parentheses, separated by commas. Some parameters can be changed by supplying the keyword alone, without an argument. The keyword list is terminated by a ')'. A ';' after a command starts its execution.
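As an illustration of this syntax, the following Python sketch assembles ESCET command lines and a complete script ending in 'stop;'. The helper names escet_command and escet_script are hypothetical and not part of ESCET; the sketch also assumes that arguments can be written unquoted, which may not hold for every string argument.

```python
def escet_command(name, *flags, **params):
    """Format one ESCET command: <cmd-name> [(<kwd> = <arg>, <kwd>, ...)];

    Hypothetical helper: keyword/argument pairs come first, followed by
    keywords given without an argument. Arguments are written unquoted.
    """
    items = [f"{key} = {value}" for key, value in params.items()]
    items.extend(flags)  # keywords supplied on their own
    if not items:
        return f"{name};"
    return f"{name} ({', '.join(items)});"


def escet_script(*commands):
    """Join formatted commands into a script terminated by 'stop;'."""
    return "\n".join([*commands, "stop;"]) + "\n"


if __name__ == "__main__":
    script = escet_script(
        escet_command("aset_read", ifile="input.pdb", tset=0),
        escet_command("aset_info", sset=-1),
    )
    print(script, end="")
```

The resulting text could be saved as myscript.inp and run with 'escet < myscript.inp'.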



ESCET - The Name

ESCET is an acronym for Error-inclusive Structure Comparison and Evaluation Tool.
ESCET was also chosen to commemorate the German letter escet (the one that looks like a beta and stands for a sharp 's') which became almost extinct after a completely unnecessary but swift reformation of the German language by a commission of whose existence nobody took notice until their suggestions were found to have become legally binding ...



Command Line Parameters

-h dumps the reference manual to a file and exits



Commands related to set of atoms

ESCET stores atoms, their coordinates and other atomic properties in atom sets. Sets are identified by integers, starting from 0. Atom sets can be read from and written to files in a variety of formats using the commands aset_read and aset_write. In most commands, target and source atom sets are specified with the tset and sset keywords. Atoms can be selected from one atom set and passed on to another atom set using the aset_select command.

Depending on the format, the following information is extracted from the input file:

PDB: coordinates, information about alpha helices (HELIX records) and beta sheets (SHEET records)
SHELXL lst: not documented yet.

Available topics: aset_read, aset_write, aset_info, aset_stat, aset_select, aset_egen, aset_vali, aset_amod, aset_iact, aset_agen, aset_comp, aset_vect
aset_read
Reads coordinates from files of different formats and stores them in the target set given by tset. The atoms are stored in exactly the same order as in the file.
If an lst file contains HFIX statements, the program stops. Such statements cause the model to contain more atoms at the end of the corresponding refinement job than at the beginning, which causes problems when the file is read. For final full-matrix inversions this should not happen anyway.
ifile <string> ["input.pdb"]
Name of input file.
itype <lst|pdb|res> [filename extension of ifile]
Type of coordinate file to read. If not given, this defaults to the extension of the filename given for <ifile>.
tset <integer> [0]
Number of atom set in which to put coordinates.
esd_fac <real> [-1.0]
Factor to apply to positional esd's read from a SHELXL lst file. The factor is only applied if esd_fac is positive. It can be used to re-normalize a positional error to a radial error by multiplying it by 1/sqrt(3) ≈ 0.577.
sum_disorder <boolean> [FALSE]
If true and the input file is a .lst file, print a summary of the disordered atoms found in it. This is useful for checking and analyzing disorder in proteins refined at atomic resolution.
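The esd_fac rule of aset_read can be sketched in Python as follows: the factor is used only when it is positive, and 1/sqrt(3) is the re-normalization value quoted above. The function name scale_lst_esds is hypothetical; ESCET applies the factor internally while reading the lst file.

```python
import math

# Re-normalization factor quoted for esd_fac: 1/sqrt(3), about 0.577
RENORM = 1.0 / math.sqrt(3.0)


def scale_lst_esds(esds, esd_fac=-1.0):
    """Return esd's scaled by esd_fac; a non-positive factor (the default
    -1.0) leaves them unchanged, mirroring the aset_read rule."""
    if esd_fac > 0.0:
        return [esd * esd_fac for esd in esds]
    return list(esds)


if __name__ == "__main__":
    print(scale_lst_esds([0.12, 0.30]))           # unchanged
    print(scale_lst_esds([0.12, 0.30], RENORM))   # scaled by about 0.577
```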
aset_write
Write coordinates from atom sets into files of different formats.
ofile <string> ["output.pdb"]
Name of output file.
otype <pdb|res|ins> [pdb]
Type of output file. In the current version, only pdb and ins/res files can be written.
sset <integer> [0]
Number of atom set to write.
dmin <real> [0.0]
If otype is res or ins, ESCET tries to write a res file whose parametrization depends on the resolution limit. Currently, only dmin >= 2.8 Å gives sensible results.
aset_info
Gives information on atom sets. This includes some general information about the stored data. In addition, the center of mass (CMS) is calculated.
sset <integer> [0]
Number of atom set to give information on. If sset is -1, overview information about all available atom sets is given.
aset_stat
Statistics on atomic properties.
sset <integer> [0]
Number of atom set to use.
prop <bfac|esd> [bfac]
Property for which to display statistics: B factors (bfac) or e.s.d.'s (esd).
sele <sel-string> [all]
Restricts the statistics to the atoms selected via <sel-string>.
See the section 'Selecting Atoms' for details on how to select atoms.
label <string> ['']
Label to use for table on output.
list <bool> [off]
Whether or not to print a list of the individual values.
aset_select
Select atoms using different criteria.
sset <integer> [0]
Number of source set
tset <integer> [0]
Number of target set
ttitle <string> ['']
Title to be given to the target set. If no title is given, the program tries to make up a sensible one.
sele <sel-string> [all]
Which atoms to select. See the section 'Selecting Atoms' for details on how to select atoms.
autoselect <bool> [off]
Toggles autoselect mode, i.e. divides the contents of sset into several target sets. A new target set is started for every chain ID found. Filling of target sets starts with the set given by the tset keyword.
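The grouping performed by autoselect can be pictured with the following Python sketch. The atom layout (dicts with a 'chainid' key) and the function name are hypothetical; the sketch also assumes that atoms of a chain ID seen earlier are appended to that chain's existing set, which the description above does not specify.

```python
def autoselect(atoms, tset=0):
    """Distribute atoms over target sets, one set per chain ID.

    Sets are numbered consecutively starting at tset, in the order in
    which the chain IDs are first encountered. Returns {set_number: atoms}.
    """
    set_of_chain = {}
    target_sets = {}
    for atom in atoms:
        chain = atom["chainid"]
        if chain not in set_of_chain:
            set_of_chain[chain] = tset + len(set_of_chain)
            target_sets[set_of_chain[chain]] = []
        target_sets[set_of_chain[chain]].append(atom)
    return target_sets


if __name__ == "__main__":
    atoms = [
        {"name": "CA", "resi": 1, "chainid": "A"},
        {"name": "CA", "resi": 1, "chainid": "B"},
        {"name": "CA", "resi": 2, "chainid": "A"},
    ]
    for number, members in autoselect(atoms, tset=1).items():
        print(number, [(a["chainid"], a["resi"]) for a in members])
```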
aset_egen
If no experimental esd's (i.e. esd's derived from inversion of the least-squares matrix) are available, approximate esd's can be generated using a variety of models. The simplest method is to use a constant value for all esd's (esd_model = const), for example a value calculated by SIGMAA.

Another very rough approximation is to simply use the equivalent isotropic B factor converted into a mean-square displacement (esd_model = ueq).

More sophisticated models are various approximations based on Cruickshank's DPI-formula (Cruickshank, Acta Cryst. (1999). D55, 583-601) (esd_model = <dpi|dpiu|dpiu2>).

When NMR models are compared, the rmsd relative to a mean model may be a quantity correlated with the coordinate uncertainty. Either the rmsd or its square can be used.
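The models that do not involve the DPI can be written down directly. The Python sketch below is a minimal, hypothetical per-atom version (esd_for_atom is not an ESCET function): 'const' returns esd_const, 'ueq' returns B / 8π², and the rmsd models scale the rmsd or its square by esd_fac, following the descriptions of the esd_model and esd_fac keywords below. The DPI-based models are left out here.

```python
import math


def esd_for_atom(model, bfac=0.0, rmsd=0.0, esd_const=0.1, esd_fac=1.0):
    """Approximate esd for one atom using the simple aset_egen models."""
    if model == "const":
        return esd_const
    if model == "ueq":
        # Ueq = B / (8 pi^2), used directly as the esd
        return bfac / (8.0 * math.pi ** 2)
    if model == "rmsd":
        return esd_fac * rmsd
    if model == "rmsd2":
        return esd_fac * rmsd ** 2
    raise ValueError(f"model not covered by this sketch: {model}")


if __name__ == "__main__":
    print(esd_for_atom("ueq", bfac=20.0))             # about 0.253
    print(esd_for_atom("rmsd2", rmsd=0.5, esd_fac=2.0))
```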
sset <integer> [0]
Number of source set
esd_model <const|ueq|dpi|dpiu|dpiu2|rmsd|rmsd2> [const]
Model used to generate esd's:

const: use constant number given in <esd_const>.
ueq: use Ueq (B / 8 pi^2) as esd.

The following models relate to Cruickshank's DPI. To calculate the DPI, a number of parameters have to be provided: the completeness of the data (cpl), the number of atoms refined (ni), the number of observables used in refinement (nobs), Rfree or Rall at the end of refinement (rfree) and the maximum resolution of the data (dmin). The corresponding keywords are described below.
If the number of parameters is smaller than the number of observables and the overall R factor R_all has been provided, eq. 26 is used; otherwise eq. 27 is used (note that eq. 27 requires R_free). Based on the DPI, the following options are available:

dpi: use Cruickshank's DPI (this gives the same constant esd for every atom in the structure).
dpiu: use Cruickshank's DPI plus linear B factor scaling as described in Schneider2000.
dpiu2: use Cruickshank's DPI plus quadratic B factor scaling (experimental).

For NMR models, rmsd's can be used. Two models are available:
rmsd: simply use the rmsd as the coordinate uncertainty.
rmsd2: use the square of the rmsd as the coordinate uncertainty.
In both cases the uncertainty is multiplied by the number given for esd_fac.
esd_const <real> [0.1]
Number to use as constant esd's for model esd_model = const.
esd_fac <real> [1.0]
Constant factor that will be applied to all uncertainties. Currently only applied if rmsd-based models are used.
cpl <real> [90.0]
Completeness of data in percent.
ni <integer> [number of atoms in <sset>]
Number of atoms in the final model, more precisely the number of fully occupied sites.
npar <integer> [0]
Number of parameters used in refinement. If this number is given, equation 26 from Cruickshank1999 is used, otherwise eq. 27.
nobs <integer> [0]
Number of reflections used in refinement. In most cases Rfree is available and eq. 27 is used for error estimation; in this equation nobs is the number of reflections in the work set. If eq. 26 is used (i.e. no Rfree is available), the R value for all data should be given as rfree, and nobs is the number of unique reflections used in refinement.
rfree <real> [30.0]
Rfree at the end of refinement, if available. If no Rfree is available the 'normal' R-value for all data should be given here AND nobs and npar have to be filled in order to use eq. 26 from Cruickshank's paper.
dmin <real> [3.0]
maximum resolution used in refinement
list <boolean> [OFF]
Whether or not to print a list of all atoms and their esd's at the end of aset_egen
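For orientation, the two DPI expressions as they are usually quoted from Cruickshank (1999) are σ(r, B_avg) = (N_i/p)^(1/2) C^(-1/3) R d_min with p = nobs - npar (eq. 26, R for all data), and the same expression with p replaced by nobs and R by R_free (eq. 27). The Python sketch below evaluates them from the aset_egen keywords; it assumes that cpl and rfree are given in percent (as their defaults suggest) and omits the B-factor scaling of dpiu/dpiu2. The function name is hypothetical; check the exact form against the paper before relying on the numbers.

```python
import math


def cruickshank_dpi(ni, nobs, cpl, rfree, dmin, npar=0):
    """Sketch of Cruickshank's DPI from the aset_egen parameters.

    ni    number of fully occupied sites
    nobs  number of reflections (work set for eq. 27)
    cpl   completeness in percent
    rfree Rfree (eq. 27) or R for all data (eq. 26), in percent
    dmin  maximum resolution in Angstrom
    npar  number of refined parameters; if 0 < npar < nobs, eq. 26 is used
    """
    if 0 < npar < nobs:
        p = nobs - npar          # eq. 26: observations minus parameters
    else:
        p = nobs                 # eq. 27: Rfree variant
    if p <= 0:
        raise ValueError("nobs must be positive")
    completeness = cpl / 100.0
    r_value = rfree / 100.0
    return math.sqrt(ni / p) * completeness ** (-1.0 / 3.0) * r_value * dmin


if __name__ == "__main__":
    # hypothetical numbers, for illustration only
    print(round(cruickshank_dpi(ni=2000, nobs=20000, cpl=95.0,
                                rfree=22.0, dmin=1.8), 3))
```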
aset_vali
Command to run some checks on the structure to find missing parts, residues inconsistent with the sequence, etc.
aset_amod
Modify properties of atoms in a given atom set.
set <integer> [0]
identifier of atom set
sset <integer> [0]
identifier of an atom set taken as a source of quantities
rigid <integer> [0]
assignment of a rigid body number
sele <sel-string> [all]
atoms of which property should be modified
ssid <char> ['']
assign secondary structure to selected atoms. Allowed are the IDs used in PROCHECK (Kabsch and Sander).
ss_alpha <sel-string> ['']
assign secondary structure 'alpha helix' to selected atoms
ss_beta <sel-string> ['']
assign secondary structure 'beta sheet' to selected atoms
ss_procheck <string> ['']
assign secondary structure from a PROCHECK output file. The argument is the name of this file. WARNING: This option may behave a bit strangely, as it has not been tested very well.
chainid <string> ['']
change chainid of selected atoms
rnoffset <integer> [0]
modify residue numbers by given offset <rnoffset>
b <real> [0.0]
set all B factors to the given value. If the value is -1, take the values found in the atom set specified with the sset keyword. WARNING: This option is very unstable if one of the atom lists is inconsistent.
occ <real> [0.0]
set all occupancies to the value given.
aset_iact
Print a list of all atom pairs between two atom sets that are closer than a certain distance. Can be used to scan for all atom pairs in an intermolecular surface or for all atoms possibly interacting with a ligand.
seta <integer> [1]
identifier of first atom set
setb <integer> [2]
identifier of second atom set
dist <real> [4.0]
distance cutoff: atom pairs closer than this distance are listed
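The pair search done by aset_iact amounts to comparing every atom of one set with every atom of the other. A brute-force Python sketch (hypothetical function name, coordinates as (x, y, z) tuples in Å) is shown below; for large sets a grid or cell list would be faster, and ESCET's own search strategy is not described here.

```python
import math


def close_pairs(set_a, set_b, dist=4.0):
    """Return (i, j, d) for every pair with distance d < dist.

    set_a, set_b: lists of (x, y, z) coordinates; i and j are the atom
    numbers within each set (starting at 0). Brute force, O(N * M).
    """
    pairs = []
    for i, a in enumerate(set_a):
        for j, b in enumerate(set_b):
            d = math.dist(a, b)  # Euclidean distance, Python 3.8+
            if d < dist:
                pairs.append((i, j, d))
    return pairs


if __name__ == "__main__":
    protein = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    ligand = [(3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
    for i, j, d in close_pairs(protein, ligand):
        print(f"atom {i} - atom {j}: {d:.2f} A")
```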
aset_agen
Generate a list of atoms. This is mostly for development purposes and simply generates a list of atoms with all coordinates 0.000 etc. into a given atom set tset.
tset <integer> [0]
identifier of atom set to use
natom <integer> [0]
number of atoms to generate
aset_comp
Compares various aspects of two or more atom sets, selected via seta and setb. For a detailed comparison of atom properties such as B factors between atom sets with a defined mapping of atom pairs, do NOT use this command; use ap2_comp instead.
seta <integer> [0]
first atom set
setb <integer> [1]
second atom set
distmin <real> [0.1]
Minimum distance between corresponding atoms to trigger the generation of a note in the log-file
dbfacmin <real> [1.0]
Minimum difference in B-factors between corresponding atoms to trigger the generation of a note in the log-file
aset_vect
Geometrical analysis of groups of atoms. For one set, the center of mass is calculated. For two sets, the vector between the centers of mass and its length in Å are calculated. For four sets, the angle between the vectors connecting the centers of mass of set1/set2 and set3/set4 is calculated. Useful for describing oligomers. Macros are allowed in atom selections.
sset <integer> [0]
Atom set from which selection will be made.
sel1,sel2,sel3,sel4 <sel-string> ['']
selection for first to fourth atom set.
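The quantities reported by aset_vect can be sketched in Python as follows. Function names are hypothetical; the sketch weights atoms by the masses passed in and, when none are given, falls back to equal weights (a geometric centre), which is an assumption about how masses are handled.

```python
import math


def centre(coords, masses=None):
    """Mass-weighted centre of (x, y, z) coordinates; equal weights if
    masses is None."""
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    return tuple(sum(m * c[k] for m, c in zip(masses, coords)) / total
                 for k in range(3))


def vector(c1, c2):
    """Vector pointing from centre c1 to centre c2."""
    return tuple(b - a for a, b in zip(c1, c2))


def angle_deg(u, v):
    """Angle between two vectors in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    cosang = dot / (math.hypot(*u) * math.hypot(*v))
    cosang = max(-1.0, min(1.0, cosang))  # guard against rounding
    return math.degrees(math.acos(cosang))


if __name__ == "__main__":
    c1 = centre([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
    c2 = centre([(1.0, 2.0, 0.0)])
    v12 = vector(c1, c2)
    print(v12, math.hypot(*v12), angle_deg(v12, (1.0, 0.0, 0.0)))
```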



Difference Distance Matrices

Calculation and display of difference distance matrices. The command tries its best to find sensible defaults by itself. The elements of the distance matrix are displayed with (1,1) in the upper left corner and (N,N) in the lower right corner. This is more intuitive than having (1,1) in the lower left and (N,N) in the upper right, as sequences are normally discussed proceeding from the N- to the C-terminus.
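A difference distance matrix is the element-wise difference of the intra-set distance matrices of two atom sets with a one-to-one atom correspondence. The Python sketch below is illustrative only (function names hypothetical, sign convention setb minus seta assumed); row i, column j corresponds to atoms i and j, so printing the rows in order puts (1,1) in the upper left corner.

```python
import math


def distance_matrix(coords):
    """All pairwise distances within one set of (x, y, z) coordinates."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)]
            for i in range(n)]


def difference_distance_matrix(coords_a, coords_b):
    """DD[i][j] = d_b(i, j) - d_a(i, j) for corresponding atoms."""
    if len(coords_a) != len(coords_b):
        raise ValueError("atom sets must contain corresponding atoms")
    da = distance_matrix(coords_a)
    db = distance_matrix(coords_b)
    n = len(coords_a)
    return [[db[i][j] - da[i][j] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    b = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    for row in difference_distance_matrix(a, b):
        print(" ".join(f"{x:6.2f}" for x in row))
```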

Available topics: ddm
ddm
Calculate and plot difference distance matrices. Currently everything is designed for CA-CA difference distance matrices.

Consistency checks: Before any calculation is done, the two atom sets are checked for consistency. Normally the consistency check is very strict, but sometimes it may be necessary to relax it (keyword check) or to handle sequences with an offset in their residue numbers (keyword setb_rn_offset).

Ticking: For a normal difference distance matrix, the ticks are centered on the corresponding column/row of the matrix. Sometimes it is necessary to use individual atoms rather than CA atoms for a difference distance matrix. Then a different ticking scheme, based on the number of the atoms in the atom set, can be used (see keyword tickstype in section Graphics). If atoms are used, the program prints a list connecting atoms and their numbers to the log file.

Binning: If a matrix is too big for display, it is binned before being displayed. This complicates the placement of tickmarks. For a binned matrix, the lowest and highest ticks are moved to the respective extremes of the plot to indicate that binning was performed. The other ticks appear where they would in a normal matrix, scaled by the binning factor. This also means that most ticks will not be centered on the corresponding rows/columns, but only give a rough guide to where each residue can be found.
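A possible binning scheme is sketched below in Python. The text above does not say how matrix elements are combined into a bin or exactly how the binning factor follows from the limit keyword; this sketch assumes the smallest integer factor that brings the matrix to at most limit elements per direction and averages each block. Function names are hypothetical.

```python
import math


def bin_factor(n, limit):
    """Smallest integer factor that reduces n rows to at most limit."""
    return max(1, math.ceil(n / limit))


def bin_matrix(m, factor):
    """Average factor x factor blocks of a square matrix; edge blocks
    may be smaller than factor x factor."""
    n = len(m)
    nb = math.ceil(n / factor)
    out = []
    for bi in range(nb):
        row = []
        for bj in range(nb):
            block = [m[i][j]
                     for i in range(bi * factor, min((bi + 1) * factor, n))
                     for j in range(bj * factor, min((bj + 1) * factor, n))]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def binned_tick(index, factor):
    """Position of an unbinned row/column index in the binned plot."""
    return index / factor


if __name__ == "__main__":
    m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(bin_matrix(m, bin_factor(3, 2)))
```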

Treatment of sequence gaps: For CA-CA difference distance matrices there may be parts of the sequence where coordinates are not available. To avoid excessive problems with binned matrices etc. (e.g. when a gap is starting and ending in a block that is binned), gaps are completely ignored when plotting the matrix. They are marked with small arrows on the x-axis of the plot. If the matrix is binned, the position of the arrows is moved in the same way as the ticks.

Legend: The default position of the legend depends on the type of plot (upper, lower, both, diag; see keyword type below). For 'lower' the legend is aligned with the left edge of the viewport, for 'upper' with its right edge. For 'both', the legend is centered relative to the viewport. To position the legend manually, use the legend* keywords after the type keyword.
seta <integer> [0]
First coordinate set
setb <integer> [0]
Second coordinate set
setl <intlist> [0]
List of integers identifying atom sets. DD matrices are plotted for all possible combinations of these atom sets.
lolim <real> [0.0]
lower limit of values to display in the matrix. If esd_scaled is on, this is in units of sigma, otherwise in Å.
hilim <real> [0.0]
upper limit of values to display in the matrix. If esd_scaled is on, this is in units of sigma, otherwise in Å.
limit
maximum number of matrix elements in one direction. This determines the binning scheme used: if the matrix dimension is larger than limit, the matrix is binned accordingly.
esd_scaled <boolean> [off]
Switch esd-scaling of matrices on/off
type <lower|upper|both|diag> [both]
As difference distance matrices are symmetric, they contain redundant information. This allows different parts of the matrix to be used for different purposes. 'lower' and 'upper' draw the non-redundant information in the lower and upper triangle, respectively. 'both' draws the entire matrix. 'diag' only draws a black line along the diagonal (purely cosmetic). If type is 'both', a new page is started automatically after one matrix has been plotted. For type 'lower', 'upper', or 'diag', plotting continues on the same page.
check <auto|auto_ignore_rt|strict|loose> [strict]
Type of procedure used for checking the consistency of the atom sets supplied.
For 'auto', the program tries to find the best consistent set of atoms. To be consistent, atoms must have the same residue number, residue type, atom name, and part number.

For 'auto_ignore_rt', the same as 'auto', but residue types are ignored. Useful for point mutants or for models that contain ALA for residues without side-chain density.

For 'strict', all atom names, residue numbers and part IDs have to be identical. The only exception is that part ' ' and part 'A' are considered equivalent, so that a first conformer can be treated as equivalent to a unique position. The criterion on residue numbers can be loosened by using an offset for the residue numbers of atoms in setb (see keyword setb_rn_offset below). This is useful for comparing different chains from a SHELXL .lst file, as these do not have chain IDs (an offset in residue numbers is used instead). The residue type is not checked, which allows more flexibility, for example when CA atoms of proteins with a point mutation are compared.

For 'loose', only the number of atoms is checked. If setb_rn_offset is not supplied, the offset between residue numbers is determined automatically from the first pair of atoms. If setb_rn_offset is not 0, the value provided is used.
minfraglen <real> [auto]
Length of shortest allowed fragment of a rigid body in percent. If a fragment is shorter than this it is converted to 'flexible'. By default this value is set such that fragments shorter than 5 are not allowed. Set this value to 0.0 if you want to switch off this part of the polishing of the solution.
patchlen <0|1> [1]
Whether or not to patch single residues that are not marked as rigid. Set this value to zero to switch off the patching.
setb_rn_offset <integer> [0]
Offset to be used for residue numbers of setb in difference distance matrix calculations. Can be positive or negative. If kept at 0, the program tries to determine a sensible offset itself (see keyword check above). If not 0, the given value is used.
rb_plot <boolean> [off]
Whether or not to plot rigid bodies.
rb_find <boolean> [off]
Whether or not to find rigid bodies using a genetic algorithm.
dm_print <boolean> [off]
Whether or not to print intermediate distance matrices. This is mostly for debugging purposes.
dm_dim <integer> [20]
maximum dimension of distance matrices to be printed.
dm_limit <real> [10.0]
Highest absolute value to plot for distance matrices. Values greater than or equal to this number are replaced by LLL. The chosen value also influences the format used for output.
dm_format <string> ['']
format to use for printing elements of distance matrix.
dd_plot <boolean> [on]
Whether or not to plot difference distance matrices to a PostScript file.
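The 'strict' and 'loose' consistency checks described under the check keyword can be sketched in Python as below. The atom layout, the pairwise comparison in file order and the sign convention of setb_rn_offset (added to the residue numbers of setb) are assumptions made for this illustration; function names are hypothetical.

```python
def _norm_part(part):
    """Treat a unique position (part ' ') like the first conformer ('A')."""
    return "A" if part == " " else part


def check_strict(set_a, set_b, setb_rn_offset=0):
    """True if atoms match pairwise in name, residue number and part.

    Residue types are deliberately not compared (e.g. point mutants).
    Atoms are dicts with 'name', 'resi' and 'part' keys.
    """
    if len(set_a) != len(set_b):
        return False
    for a, b in zip(set_a, set_b):
        if a["name"] != b["name"]:
            return False
        if a["resi"] != b["resi"] + setb_rn_offset:
            return False
        if _norm_part(a["part"]) != _norm_part(b["part"]):
            return False
    return True


def check_loose(set_a, set_b, setb_rn_offset=0):
    """Only compare atom counts; return (ok, offset used).

    With setb_rn_offset == 0 the offset is taken from the first atom pair.
    """
    if len(set_a) != len(set_b):
        return False, setb_rn_offset
    if setb_rn_offset == 0 and set_a:
        setb_rn_offset = set_a[0]["resi"] - set_b[0]["resi"]
    return True, setb_rn_offset


if __name__ == "__main__":
    a = [{"name": "CA", "resi": 10, "part": " "}]
    b = [{"name": "CA", "resi": 110, "part": "A"}]
    print(check_strict(a, b), check_loose(a, b))
```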



Graphics

All graphics are written to PostScript files. The relevant parameters can be adjusted in all commands that produce graphical output. The graphics state is initialized with default values for all parameters at program start-up and can be changed using the keywords described here. Any changes persist.

The graphics model is based on the concept of a viewport. The viewport is the area of the sheet where plots are made. Tickmarks can be inside or outside the viewport. The coordinates of the viewport are given in absolute PostScript coordinates, where (0,0) is the lower left corner of the sheet and 1 unit corresponds to one typographical point (1/72 inch ≈ 0.353 mm). If the viewport is chosen automatically, its dimensions are related to the 'frame'. The frame is the part of the paper that can be used for printing; it is normally set to something a bit smaller than A4, leaving a left margin for punching holes.
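Converting between PostScript points and millimetres is simple arithmetic: 72 points = 1 inch = 25.4 mm, so 1 point ≈ 0.353 mm. The Python sketch below (helper names hypothetical) converts viewport coordinates and gives the A4 sheet size in points for comparison with vx1, vy1, vx2, vy2.

```python
MM_PER_INCH = 25.4
PT_PER_INCH = 72.0


def pt_to_mm(pt):
    """Typographical points to millimetres."""
    return pt * MM_PER_INCH / PT_PER_INCH


def mm_to_pt(mm):
    """Millimetres to typographical points."""
    return mm * PT_PER_INCH / MM_PER_INCH


# A4 sheet (210 mm x 297 mm) in PostScript points, about (595.3, 841.9)
A4_PT = (mm_to_pt(210.0), mm_to_pt(297.0))

if __name__ == "__main__":
    print(round(pt_to_mm(1.0), 4), [round(x, 1) for x in A4_PT])
```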

Available topics: update_graphics_state
update_graphics_state
The graphics state can be updated from within all commands that make use of graphics.
plotauto <BOOL> [on]
Whether or not to determine various plotting parameters (range, ticks etc.) automatically. Normally this is switched on for the first plot and then switched off to avoid interference with user-defined settings. This is not very well tested; if in doubt, switch it off and set things by hand.
pstype <ps|eps> [ps]
What kind of PostScript file should be produced, i.e. 'ps' (normal PostScript) or 'eps' (encapsulated PostScript). Normally ps files are made. For publication it is often useful to produce an eps version of the same plot.
psfname <STRING> ['escet_*.[e]ps']
Name of postscript file.
title <STRING> ['NULL']
Title to put on top of the plot.
frame <BOOL> [on]
Whether or not to put a frame around the plot.
frametitle <STRING> ['something invented by the program']
Title to put in the header of the frame.
linewidth <REAL> [1.0]
Width of lines in units of 1/72 inches (1pt). Default: 1.0.
dotrad <REAL> [3.0]
Radius of dots used in plots in units of 1/72 inches (1pt). Default: 3.0.
zmin,zmax <REAL> [XXX,XXX]
Minimum and maximum in the third dimension, although there are no three-dimensional plots at the moment. These numbers are used, for example, to store the range parameters for difference distance matrices.
color <COLOR> [black]
Current line color. Currently supported colors (name and RGB code) are: black 0.0 0.0 0.0; white 1.0 1.0 1.0; red 1.0 0.0 0.0; pale_red 1.0 0.5 0.5; green 0.0 1.0 0.0; blue 0.0 0.0 1.0; yellow 1.0 1.0 0.0; magenta 1.0 0.0 1.0; cyan 0.0 1.0 1.0.
r <REAL> [0.0] g <REAL> [0.0] b <REAL> [0.0]
Rgb-code for current linecolor
dotr <REAL> [0.0] dotg <REAL> [0.0] dotb <REAL> [0.0]
Rgb-code for current dotcolor
ticks <off|on|auto> [auto]
If 'off' no ticks are drawn, if 'on', tick parameters as given in xtmin, xtmax, xtint and the same for yt* are used. if auto, these parameters are chosen automatically. Any other specifications of these parameters in the command will then be ignored.
tickstype <anum|rnum> [rnum]
Normally residue numbers are used for tickmarks on plots (tickstype = rnum). Sometimes it is necessary to make plots for individual atoms; then tickstype = anum can be used and atom numbers will be used.
ticksfontsize <REAL> [11.0]
Font size used for normal ticks, e.g. numbers.
tickssmallfontsize <REAL> [4.0]
Font size used for crowded ticks, i.e. for tickstype = aid.
xmin <REAL> [XXX] xmax <REAL> [XXX]
Minimum and maximum value to display on horizontal axis
xtmin, xtmax, xtint <REAL> [XXX]
Minimum, maximum and interval for ticks along the horizontal axis
xtdel1, xtdel2 <INT> [0]
Two tickmarks that will be omitted from the plot
xticks <off|bottom|top|both> [bottom]
Whether and where to put ticks in x-direction.
ymin <REAL> [XXX] ymax <REAL> [XXX]
Minimum and maximum value to display on vertical axis
ytmin, ytmax, ytint <REAL>
Minimum, maximum and interval for ticks along the vertical axis
yticks <off|left|right|both> [left]
Whether and where to put ticks in y-direction.
xlabel,ylabel <STRING> ['']
Labels to be used for x- and y-axes.
xtformat,ytformat <STRING> ['.2f']
C-style format strings for formatting of tick-numbers along x and y.
legend <BOOL> [on]
Whether or not to draw a legend.
legendx, legendy, legendh, legendw <REAL> [0.0]
x and y coordinates, height and width (all in units of typographical points) of the legend
comments <BOOL> [on]
whether or not to print comments.
vx1, vy1, vx2, vy2 <REAL> [something sensible]
Lower left (vx1,vy1) and upper right (vx2,vy2) corners of the current viewport (all in units of typographical points).
bx1, by1, bx2, by2 <REAL> [something sensible]
Lower left (bx1,by1) and upper right (bx2,by2) corners of the BoundingBox used for encapsulated PostScript (all in units of typographical points).
ssplot <BOOL> [off]
whether or not to put secondary structure plot into residue property or DD-matrix plots
ssheight <BOOL> [off]
height of ss-plot in pixels
ssperc <BOOL> [off]
vertical percentage of plot used for secondary structure plot.



Selecting Atoms

A number of commands operate on a subset of the atoms present in the current atom set. Subsets are selected using a so-called boolean expression that defines the atoms to be used. A <sel-string> consists of a number of conditions connected by AND, OR and NOT. Complicated conditions can be built using parentheses.

Available topics: <sel-string>
<sel-string>
A <sel-string> describes a condition or a combination of conditions that an atom has to fulfill to be included in a selected subset. A typical condition would be that the atom should be a C-alpha atom; this condition can be written as (name == CA). Here '==' is the comparison operator, and 'name' and 'CA' are the left and right operands.

Allowed comparison operators are: '>' (greater), '<' (less), '>=' (greater or equal), '<=' (less or equal), '==' (equal), and '<>' (not equal).

The left operand in a condition may be one of the following atom properties: 'bfac' (B factor), 'occ' (occupancy), 'resn' (residue name or residue type), 'chainid' (chain ID), 'part' (part identifier), 'name' (atom name), 'element' (chemical element), 'anum' (number of the atom; the first atom has number 0!).

Conditions have to be surrounded by parentheses and can be combined using the following boolean operators: 'not', 'or', and 'and'.
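How a single condition is evaluated can be sketched in Python: the six comparison operators map onto functions of the standard operator module, and conditions can then be combined with Python's own not, and, or. The numeric/string split of properties and the dict layout of an atom are assumptions of this sketch, which does not parse a full <sel-string>; the names are hypothetical.

```python
import operator

# Comparison operators allowed in a <sel-string> condition
OPS = {
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
    "==": operator.eq, "<>": operator.ne,
}

# Properties compared as numbers; all others are compared as strings
NUMERIC = {"bfac", "occ", "anum"}


def eval_condition(atom, prop, op, value):
    """Evaluate one condition such as (bfac < 80.0) for an atom dict."""
    right = float(value) if prop in NUMERIC else value
    return OPS[op](atom[prop], right)


if __name__ == "__main__":
    atom = {"name": "CA", "bfac": 35.2, "chainid": "A", "anum": 0}
    # ((chainid == A) and (not (bfac > 80.0)))
    print(eval_condition(atom, "chainid", "==", "A")
          and not eval_condition(atom, "bfac", ">", "80.0"))
```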

A slightly different mechanism is used to select atoms based on their number (i.e. their position in the atom set, 'anum') or residues based on their residue number ('resi', without the chain ID). Here the comparison operator is 'in' and the numbers have to be given as lists. A list of numbers has the form {n1-n2:n3:n4-n5} to select numbers n1 to n2, number n3 and numbers n4 to n5.
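The list syntax for the 'in' operator can be parsed as in the Python sketch below (function name hypothetical). Ranges are taken as inclusive, and the sketch assumes non-negative numbers, since a leading minus sign would be confused with the range separator.

```python
def parse_number_list(text):
    """Parse a list such as {19-25:36-42:45} into a set of integers."""
    body = text.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("number list must be enclosed in { }")
    numbers = set()
    for item in body[1:-1].split(":"):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            start, end = (int(x) for x in item.split("-", 1))
            numbers.update(range(start, end + 1))  # inclusive range
        else:
            numbers.add(int(item))
    return numbers


if __name__ == "__main__":
    print(sorted(parse_number_list("{3-21:27}")))
```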

Following are some examples.

select all CA atoms:

sele = (name == CA)

select all atoms with B factor < 80 A^2:

sele = (bfac < 80.0)

select all backbone atoms of chain A:

sele = ((chainid == A) and ((name == N) or (name == CA) or (name == C)))

select atoms number 3 to 21 and atom 27:

sele = (anum in {3-21:27})

select residues 19 to 25, 36 to 42, 45 and 56 to 58:

sele = (resi in {19-25:36-42:45:56-58})

Most problems with selection strings are caused by too few or too many parentheses.

A number of macros can be used to shorten sel-expressions:
protein = 'standard amino-acids'
backbone = (protein and ((name == N) or (name == CA) or (name == C)))
peptide = ((name == N) or (name == CA) or (name == C) or (name == O))
sidechain = (not ((name == N) or (name == CA) or (name == C) or (name == O)))