Q.814 - How can I compare multiple sequences?

Category: S.814 - sequence

Use SOD. From the manual:

SOD is a program which helps you create O datablocks and macros based on (aligned) sequences in one-letter code.

At present, SOD can be used to perform the following TASKs:

  1. MULT - analyse multiple aligned sequences
  2. INIT - create residue-type datablocks
  3. PAIR - do pair-wise comparisons of aligned sequences

Data can be read in the following FORMats:

  1. MEGA - multiple aligned sequences in MegAlign format
  2. EMBL - multiple aligned sequences in the format returned by PredictProtein from EMBL/Heidelberg
  3. PIR - multiple sequences, PIR format, read one at a time
  4. EXPL - explicit format, read one at a time

SOD is a non-interactive program: you feed it an input file (and, sometimes, a library file), and it writes an output file in O datablock format. That file contains one or more O datablocks and, sometimes, one or more O macros.

			 (1) MULT
			 ========

This will produce datablocks and macros to help you analyse multiple aligned sequences. If your molecule name is 'M1' then the following datablocks are created:

All you have to do now is to start O, read the SOD output file and execute the macro (or edit it first, if you like).

			 (2) INIT
			 ========

At last, there is a quick way to generate your XXX_RESIDUE_TYPE datablock from scratch. You need this datablock to create space for a new molecule prior to building it (sam_init_db in O).
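
To illustrate the idea behind INIT, here is a minimal Python sketch that turns a one-letter sequence into a list of three-letter residue types. It assumes that the residue-type datablock holds one three-letter residue name per residue. The function name residue_types and the "UNK" placeholder are hypothetical, and the sketch does not reproduce the O datablock file layout that SOD actually writes.

```python
# Standard one-letter to three-letter codes for the 20 common amino acids.
ONE_TO_THREE = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def residue_types(sequence, unknown="UNK"):
    """Return one three-letter residue name per letter in `sequence`.

    Whitespace is skipped. Letters that are not in the table map to
    `unknown`, which is a placeholder chosen for this sketch only.
    """
    return [
        ONE_TO_THREE.get(letter.upper(), unknown)
        for letter in sequence
        if not letter.isspace()
    ]


if __name__ == "__main__":
    print(residue_types("MKT AYLG"))
```

The resulting list has one entry per residue, which is the information a residue-type datablock needs before a molecule can be built.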

			 (3) PAIR
			 ========

This option compares your REFErence sequence with each of the others in turn and produces one datablock for each comparison. This datablock contains an integer code for each residue:

 0 = identical residues
 1 = mutation
 2 = insertion in other sequence
 3 = deletion in other sequence
 4 = outside other sequence

You can use this datablock to colour your molecule, e.g. using the paint_case command; if you then make a CA-trace, the colours show where in your protein mutations, insertions and deletions occur.
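
The coding scheme above can be sketched in Python as follows. This is an illustration of the five codes, not SOD's own algorithm. It assumes that both sequences are given as aligned strings of equal length with '-' as the gap character, and that an insertion in the other sequence is flagged on the reference residue immediately preceding it (the manual excerpt does not say where SOD puts it). The function name pair_codes is hypothetical.

```python
# Integer codes as listed for the PAIR task.
IDENTICAL, MUTATION, INSERTION, DELETION, OUTSIDE = 0, 1, 2, 3, 4


def pair_codes(ref, other, gap="-"):
    """Return one code per residue of `ref` for two aligned strings."""
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")

    # Alignment columns covered by the other sequence (first to last residue).
    occupied = [i for i, c in enumerate(other) if c != gap]
    first, last = (occupied[0], occupied[-1]) if occupied else (len(other), -1)

    codes = []
    for i, (r, o) in enumerate(zip(ref, other)):
        if r == gap:
            # The reference has a gap here, so the other sequence has an
            # insertion. Assumption: flag the preceding reference residue,
            # overwriting whatever code it had.
            if o != gap and codes:
                codes[-1] = INSERTION
            continue
        if o == gap:
            # A gap inside the other sequence is a deletion; a gap before
            # its first or after its last residue is "outside".
            codes.append(DELETION if first < i < last else OUTSIDE)
        elif r.upper() == o.upper():
            codes.append(IDENTICAL)
        else:
            codes.append(MUTATION)
    return codes


if __name__ == "__main__":
    print(pair_codes("MKTA-YLG", "-KSAG-L-"))  # [4, 0, 1, 2, 3, 0, 4]
```

The output has one integer per reference residue, which is the shape needed to colour the reference molecule residue by residue.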