MAGOS - Help page
MAGOS automates the modelling of your query protein and couples it to
the creation of a hierarchical and annotated Multiple Alignment of
Complete Sequences (MACS).
You submit a single query protein in FASTA format and, optionally, a
template code; MAGOS then creates the associated MACSIMS and homology
model for this query.
MAGOS results are available through a user-friendly interface that
interconnects the generated model and the computed MACS.
The MAGOS web server thus lets you explore structural information
interactively, in the context of the evolutionary relevance of the
mined and predicted sequence information.
Flowchart of the MAGOS interconnected processes

Multiple Alignment of Complete Sequences (MACS) computation with
PipeAlign
A tuned version of PipeAlign is used to compute a high-quality MACS.
The PipeAlign process can be divided into several steps:
Ballast (homology search):
Starting from the query sequence, Ballast first runs BlastP to search
for homologues in the UniProt and PDB databases or in the non-redundant
UniRef90 database. It then post-processes the Blast results to identify
small conserved segments (LMSs) that may characterise the protein
family of the query.
Closest PDB homologue:
To define the closest PDB homologue, a BlastP search of the PDB
database is performed with the query sequence (maximal E-value set to
10). The PDB entry with the best score is defined as the closest PDB
homologue. If it is not already among the selected sequences, it is
added to the pool of proteins to be aligned.
DbClustal (sequence alignment):
DbClustal uses the Ballast LMSs as soft anchors to guide the
construction of the multiple alignment. The aligned sequences are those
detected with an E-value < 10^-3 during the BlastP search, up to the
maximal number of sequences set by the user (default = 100).
Rascal (alignment refinement):
Rascal scans the MACS to identify misaligned residues and
blocks. It then realigns them in order to improve the MACS quality.
NorMD (objective function):
The quality of both the DbClustal and Rascal alignments is evaluated
with NorMD, an objective function that assigns a score to an alignment.
Leon (unrelated sequence removal):
The highest-scoring MACS is processed by Leon, which removes
potentially weakly related or highly fragmented sequences to generate
the final MACS, which is also scored by NorMD.
DPC/Secator (clustering step):
Subfamilies are then automatically defined using the clustering
programs DPC or Secator.
In parallel, the alignment of the query and the closest PDB homologue
is extracted from the final alignment and used to compute a model with
Geno3D.
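The sequence selection described in the steps above can be sketched as
follows. This is a minimal illustration, not the PipeAlign code: the Hit
record, the select_for_alignment function and the sequence identifiers
are hypothetical. It assumes that the BlastP hits are already available
as one list, that the cap keeps the hits with the lowest E-values, and
that the closest PDB homologue is added on top of the cap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One BlastP hit; the field names are illustrative, not a MAGOS format."""
    seq_id: str
    evalue: float
    score: float
    in_pdb: bool = False


def select_for_alignment(hits, max_evalue=1e-3, max_sequences=100,
                         pdb_max_evalue=10.0):
    """Return (hits to align, closest PDB hit or None)."""
    # Keep hits below the E-value limit, lowest E-value first, then cap.
    kept = sorted((h for h in hits if h.evalue < max_evalue),
                  key=lambda h: h.evalue)[:max_sequences]

    # Closest PDB homologue: best-scoring PDB hit with E-value <= 10.
    pdb_hits = [h for h in hits if h.in_pdb and h.evalue <= pdb_max_evalue]
    closest = max(pdb_hits, key=lambda h: h.score, default=None)

    # Add it to the pool if the E-value filter or the cap left it out.
    if closest is not None and closest not in kept:
        kept.append(closest)
    return kept, closest


# Made-up hits, for illustration only.
hits = [
    Hit("seqA", 1e-120, 900.0),
    Hit("seqB", 1e-80, 650.0),
    Hit("pdbX_A", 0.5, 40.0, in_pdb=True),
    Hit("seqC", 2.0, 20.0),
]
selected, closest = select_for_alignment(hits, max_sequences=2)
print([h.seq_id for h in selected], closest.seq_id)
# prints: ['seqA', 'seqB', 'pdbX_A'] pdbX_A
```

In this example the PDB hit fails the 10^-3 limit but is still added,
because a template is needed for the modelling step.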
MACSIMS annotation of MACS
MACSIMS (MACS Information Management System) automatically annotates a
MACS. It integrates structural and functional information mined from
external databases such as Pfam, PROSITE, InterPro and OMIM, as well as
various ab initio predictions (global and subfamily residue
conservation, transmembrane segments, coiled-coil regions,
low-complexity segments with biased amino acid composition, ...). The
retrieved information can be propagated through the whole alignment,
including the query, according to conservation criteria.
Model construction with Geno3D
Geno3D processes the alignment of the query and the template PDB
sequences. It extracts geometrical restraints (dihedral angles and
distances) for corresponding atoms between the query and the template
and builds the 3D structure of the protein using a geometrical
approach. It uses CNS as its molecular building and analysis engine.
Good results are obtained with an input alignment in which the two
sequences share at least 30% identity and 40 common residues.
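A quick way to check a query/template alignment against this guideline
is sketched below. It is not part of Geno3D: the function names are
hypothetical, "common residues" is read as identical aligned residues,
and the percent identity is computed over the columns where neither
sequence has a gap, which is only one of several possible conventions.

```python
def identity_stats(query_aln, template_aln, gap_chars="-."):
    """Return (identical residue count, percent identity) for two aligned rows."""
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    aligned = identical = 0
    for q, t in zip(query_aln.upper(), template_aln.upper()):
        if q in gap_chars or t in gap_chars:
            continue  # skip columns with a gap in either sequence
        aligned += 1
        if q == t:
            identical += 1
    percent = 100.0 * identical / aligned if aligned else 0.0
    return identical, percent


def meets_guideline(query_aln, template_aln, min_identity=30.0,
                    min_identical=40):
    """True if both thresholds quoted in the text are reached."""
    identical, percent = identity_stats(query_aln, template_aln)
    return percent >= min_identity and identical >= min_identical


# Toy alignment: 7 gap-free columns, 6 of them identical.
identical, percent = identity_stats("MKT-AILV", "MKTGAVLV")
print(identical, round(percent, 1))  # prints: 6 85.7
print(meets_guideline("MKT-AILV", "MKTGAVLV"))  # prints: False
```

The toy alignment has a high identity but far fewer than 40 identical
residues, so it fails the second criterion.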
After modelling, the generated models are validated by structural
alignment with the template (using CE). The model with the minimal
energy is retained.
The resulting structure is a model and must be interpreted with care:
it is not an experimental 3D structure!
Results
The final step of the MAGOS server is to gather all computed results
and interconnect them through a user-friendly web interface (Jmol
page).
How to submit a query?

Paste your query sequence in FASTA format in the dedicated area.
Query sequence example in FASTA format:
>query
MQRAVSVVARLGFRLQAFPPALCRPLSCAQEVLRRTPLYDFHLAHGGKMVAFAGWSLPVQYRDS
HTDSHLHTRQHCSLFDVSHMLQTKILGSDRVKLMESLVVGDIAELRPNQGTLSLFTNEAGGILD
DLIVTNTSEGHLYVVSNAGCWEKDLALMQDKVRELQNQGRDVGLEVLDNALLALQGPTAAQVLQ
AGVADDLRKLPFMTSAVMEVFGVSGCRVTRCGYTGEDGVEISVPVAGAVHLATAILKNPEVKLA
GLAARDSLRLEAGLCLYGNDIDEHTTPVEGSLSWTLGKRRRAAMDFPGAKVIVPQLKGRVQRRR
VGLMCEGAPMRAHSPILNMEGTKIGTVTSGCPSPSLKKNVAMGYVPCEYSRPGTMLLVEVRRKQ
QMAVVSKMPFVPTNYYTLK
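Before submitting, the query can be checked locally with a short script
such as the one below. This is only a sketch: the function name is
hypothetical, and the set of accepted residue letters (the 20 standard
amino acids plus X) is an assumption, not a documented MAGOS rule.

```python
# Assumption: the 20 standard amino acids plus X (unknown) are accepted.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYX")


def read_single_fasta(text):
    """Parse a single-record FASTA string and return (name, sequence)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA input must start with a '>' header line")
    if any(ln.startswith(">") for ln in lines[1:]):
        raise ValueError("only one query sequence is expected")
    name = lines[0][1:].strip()
    # Sequence lines may be wrapped; join them into one string.
    sequence = "".join(lines[1:]).upper()
    if not sequence:
        raise ValueError("the record contains no residues")
    unexpected = sorted(set(sequence) - VALID_RESIDUES)
    if unexpected:
        raise ValueError("unexpected characters: " + "".join(unexpected))
    return name, sequence


# First residues of the example above, wrapped over two lines.
name, sequence = read_single_fasta(">query\nMQRAVSVVAR\nLGFRLQAFPP\n")
print(name, len(sequence))  # prints: query 20
```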
Several options are available:
- the maximal number of sequences to be aligned in the multiple
alignment of complete sequences (default = 100)
- the E-value limit for the sequences to be aligned (default = 0.001)
- the database for the BlastP search (default = UniProt + PDB)
- the BlastP low complexity filter option (default = True)
- the BlastP gapped option (default = True)
- the PDB structure to be used as template for the query modelling:
fill in the "Template PDB ID code" area (4-character PDB identifier,
e.g. 1wsv) and specify the chain if necessary in the "chain" field
(e.g. B)
- the maximal number of distance restraints for the model generation
(default = 20000); this should be modified with care, by expert users
only
- the superposition of the three generated models (the one with the
best RMSD with the template and the minimal energy value is selected as
the best)
Then click on the "Submit" button.
An identifier will be associated with your job.
Please note this identifier for later reference.
The job will run on our server. The time required to process a request
depends on the length of the submitted protein. With long proteins, it
can take up to 3 hours.
If you have filled in the e-mail address field, you will receive an
e-mail when the job is completed, with a link to the result page.
You can also access your results by pasting your job identifier in
the dedicated area on the Check job page. Note that results are kept on
our server for about 2 weeks.
To view your results after that, save the intermediary result files and
reload them later (How to reload previous results).
How to check your results?
If you have provided your e-mail address, you will
get an e-mail at the end of the job, with a link to the result page.
If not, you can access your results by pasting your job identifier in
the dedicated area on the Check job page.
Note that results are kept on our server for about 2 weeks.
The check results page gives the state of a running job, with all the
steps that have been completed.
If an essential step has failed, an error message is displayed in red.
If a non-essential step has failed, a message is displayed in orange.
If a step has run successfully, "Done" is displayed in green.
What are the results?
Main results page
The main results page allows the downloading of all generated files:
- Alignment file (MACS) in MSF format.
- MACSIMS (Multiple Alignment of Complete Sequence
Information Management System) output in XML format (compulsory for later use)
- DSSP output in text format.
- BlastP output in text format.
- Model generated in PDB file format (compulsory for
later use)
- Disease file in text format. This file can be
empty if no information
about human diseases has been found.
The Jmol page, which displays the generated model interconnected with
the generated MACSIMS (annotated MACS), can be accessed via the "Jmol
display" link.
An image representing the modelled area compared to the full-length
query sequence is also available.

Jmol page
You must have the latest Java plugin installed in order to use the Jmol
applet.
You also need to enable JavaScript in your web browser. You can
check your browser here.
Jmol functions have been tested on:
- IE >6.0, Firefox >0.8, Mozilla >1.5 on Windows
- Mozilla >1.5, Firefox >0.8 on Linux
With Opera >7.5 and Safari, the distance and log functions do not work.
With Konqueror >3.4, the Jmol applet does not work and stays in the
loading state.
MacOS X <10.3 does not support LiveConnect between Java and JavaScript,
so the Jmol tools will not work with any browser. MACSIMS visualisation
is still available.
It is strongly recommended to use the latest version of your browser.
Jmol right frame
The right frame is used to display and access information from the
MACSIMS file (the annotated MACS).
In this frame, note that:
- the top sequence corresponds to the query sequence (QUERY),
- the underlined part of the query sequence is the modelled part, e.g.
CAQE VLRRTPLYD...
- a popup window containing general information about the protein is
linked to the protein names,
- a colour code is associated with the protein names:
in blue: the template protein used to compute the model, e.g. 1wsv_B.
in red: the proteins involved in human diseases, e.g. P48728.
in pink: the proteins homologous to a protein involved in human
diseases, e.g. Q9TSZ7.
in black: the other proteins.
- the amino acid sequences are coloured according to their cluster
(sub-family).
By default, only the first sequence of each cluster is displayed.
Sub-families can easily be collapsed or uncollapsed for better
visualisation and analysis by using the + or - image or the "Uncollapse
all" or "Collapse all" button.
A cross image means that there is only one sequence in the cluster.
The quality index (NorMD score) associated with the alignment is shown
at the bottom of the frame.
The alignment annotations are accessible through the combo box just
below.
Detailed features.
The selection of a given feature maps it onto the MACS. For the
secondary STRUCTure feature, alpha-helices are in pink and beta-sheets
in yellow.
If the modelled sequence carries the feature, it is also mapped onto
the 3D structure model in the left frame.
Selecting a residue in the modelled part of the query sequence
(underlined part of the sequence) locates this residue within the model
(left frame). The selected residue appears in green on the model.
Jmol left frame
The left frame (applet frame) is used for 3D model rendering. It is
divided into two parts: the visualization part above, and the control
panel below.
The control panel provides several display options. Basic options allow
the selection of the applet background colour, zoom
level and reset command (to set labels off).
In the "Select" area, you can change the rendering and colouring types
for the structure.
Click on the "go!" button to validate changes.
The colour options are simultaneously mapped on the 3D model and on the
MACSIMS top query sequence (right panel).
The "Link Parameters" area allows linking the model (left frame) with
the MACSIMS (default) or with the alignment of the modelled sequence
with the template PDB sequence in the right frame.

The "Console" area is by default in "Selection mode". After the
selection of two residues on the 3D model (in blue), the "Distance
mode" computes the distance between the two residues.
The two selected residues are simultaneously
highlighted in the MACSIMS (right) frame.
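To reproduce such a measurement outside the browser, for instance on
the downloaded PDB file of the model, a sketch like the one below can be
used. It is not the Jmol code: the function names are hypothetical, and
it assumes the distance is taken between the C-alpha atoms of the two
residues, whereas the interface may measure between other atoms. The
two ATOM records in the demo are made up.

```python
import math


def ca_coordinates(pdb_text):
    """Map (chain, residue number) to the C-alpha coordinates of that residue."""
    coords = {}
    for line in pdb_text.splitlines():
        # Fixed-column ATOM record: atom name in columns 13-16, chain in 22,
        # residue number in 23-26, x/y/z in 31-38, 39-46 and 47-54.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = (line[21], int(line[22:26]))
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            coords.setdefault(key, xyz)  # keep the first record per residue
    return coords


def residue_distance(pdb_text, residue_a, residue_b):
    """Distance in angstroms between the C-alpha atoms of two residues."""
    coords = ca_coordinates(pdb_text)
    return math.dist(coords[residue_a], coords[residue_b])


def atom_line(serial, chain, res_seq, x, y, z):
    """Build a made-up, fixed-column C-alpha ATOM record for the demo."""
    return (f"ATOM  {serial:5d}  CA  ALA {chain}{res_seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


demo = "\n".join([
    atom_line(1, "A", 1, 0.0, 0.0, 0.0),
    atom_line(2, "A", 2, 3.0, 4.0, 0.0),
])
print(residue_distance(demo, ("A", 1), ("A", 2)))  # prints: 5.0
```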

Selection, distance and user commands are displayed in the
console. Jmol/Rasmol commands can also be launched in the "Jmol/Rasmol
command" area (advanced users).
MACSIMS features
Feature types:
ANCHOR : Predicted ANCHORs (Ballast)
BLOCK : Predicted conservation BLOCKs (Rascal)
COIL : Predicted COIL (ncoils)
LOWCOMP : Predicted LOW COMPlexity region
MOD_RES : Annotated MODified RESidues
PFAM A : Annotated PFAM A region
PFAM B : Annotated PFAM B region
PROSITE : Annotated PROSITE domain
REGION : Predicted REGION (associated to Rascal's blocks)
REPEAT : Annotated REPEAT
SEQERR : Predicted block of SEQuencing ERRor
SIGNAL : Annotated SIGNAL
SITE : Annotated functional SITE
STRUCT : Annotated secondary STRUCTures.
/!\ For the query sequence, secondary structures are deduced with the
DSSP program from the PDB file.
SWDOMAIN : Annotated SWissprot DOMAIN
TRANSMEM : Predicted TRANSMEMbrane region
VARSPLIC : Annotated VARiants produced by alternative SPLICing
VARIANT : Annotated VARIANT (mutation)
(Annotations are mined from public databases such as SwissProt, PDB
and PFAM, or predicted and propagated by MACSIMS.)
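For scripts that post-process the MACSIMS output, the feature types
listed above can be kept in a small lookup table. The sketch below only
transcribes that list; the variable and function names are hypothetical
and nothing here is read from an actual MACSIMS file.

```python
# (origin, description) for each feature type, transcribed from the list above.
FEATURE_TYPES = {
    "ANCHOR": ("predicted", "anchors (Ballast)"),
    "BLOCK": ("predicted", "conservation blocks (Rascal)"),
    "COIL": ("predicted", "coil (ncoils)"),
    "LOWCOMP": ("predicted", "low complexity region"),
    "MOD_RES": ("annotated", "modified residues"),
    "PFAM A": ("annotated", "PFAM A region"),
    "PFAM B": ("annotated", "PFAM B region"),
    "PROSITE": ("annotated", "PROSITE domain"),
    "REGION": ("predicted", "region (associated to Rascal's blocks)"),
    "REPEAT": ("annotated", "repeat"),
    "SEQERR": ("predicted", "block of sequencing error"),
    "SIGNAL": ("annotated", "signal"),
    "SITE": ("annotated", "functional site"),
    "STRUCT": ("annotated", "secondary structures"),
    "SWDOMAIN": ("annotated", "Swissprot domain"),
    "TRANSMEM": ("predicted", "transmembrane region"),
    "VARSPLIC": ("annotated", "variants produced by alternative splicing"),
    "VARIANT": ("annotated", "variant (mutation)"),
}


def features_by_origin(origin):
    """Return the sorted feature names whose origin matches."""
    return sorted(name for name, (kind, _) in FEATURE_TYPES.items()
                  if kind == origin)


print(features_by_origin("predicted"))
# prints: ['ANCHOR', 'BLOCK', 'COIL', 'LOWCOMP', 'REGION', 'SEQERR', 'TRANSMEM']
```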
Feature descriptions (available by hovering the mouse over the
feature):
Several suffixes can appear as a complementary description:
PRED : for a predicted feature.
PROP : for a propagated feature.
WARN : for data mined from databases that appear doubtful.
ERROR : for data mined from databases that are wrong (cross-validation
step).
How to reload previous results?
You can reload previous jobs at the Reload job page if you have
previously saved the compulsory intermediary result files from the main
results page.
To access the Jmol interface, which displays the generated
model interconnected with the generated MACSIMS (annotated MACS), you
have to upload the MACSIMS file in XML format as well as the generated
PDB file of the model. You can also optionally upload the disease file
(if you do not, all the sequence names will be in black, except the
template sequence in blue).

FAQ
What is a MACSIMS?
MACSIMS is the acronym of Multiple Alignment of Complete
Sequence Information Management System. MACSIMS allows
the automatic annotation of a MACS by integration of mined and
predicted structural and functional information.
By extension, the output of this program is called a MACSIMS.
Why didn't I receive an e-mail at the end of my job?
Our mail server may have had a problem or been too busy. However, the
results can be accessed manually from the Check job page with your job
identifier.
If you still cannot access your results after some hours (>3h for big
proteins with more than 300 residues), please contact the MAGOS
administrator.
What is the NorMD score?
The NorMD program provides a normalized score for the estimation of
protein Multiple Alignment of Complete Sequences (MACS) quality. A rule
of thumb is to consider that a MACS with a NorMD score higher
than 0.5 is a good quality MACS.
NorMD is an objective function that combines the advantages of
column-scoring techniques with the sensitivity of methods incorporating
residue similarity scores and does not depend upon the number, length
or overall similarity of the aligned sequences.
Why can't MAGOS generate a MACSIMS or a homology model?
If the query has no homologous sequence in the UniProt or the PDB
database, no MACSIMS and no model can be generated.
The homology model is built with the Geno3D software, which uses a
template sequence from the MACSIMS. If no homologous sequence is
found in the PDB database, no model will be generated.
To generate a good model, Geno3D needs a template
sequence sharing at least 30% identity with the query sequence, and
an alignment in which at least 40 residues are identical.
Why does the page take a long time to load the MACSIMS and/or Jmol?
The MACSIMS is an XML file whose size can exceed 1 MB, which can be too
much for an older browser or an older computer.
A solution is to reduce the number of sequences to 50 on the submit
page. Fewer sequences may mean less information in the MACSIMS, so you
should find a good balance between loading time and the number of
sequences in the alignment.
The Jmol applet can also take a long time to load the molecule because
the browser loads the applet and the MACSIMS at the same time.
Why does the Jmol applet stay in the loading state?
The Jmol applet can take time to load a molecule. If the applet takes
more than several minutes to load a molecule, reload the page
(F5) and wait again. If the applet still cannot load the molecule,
check your browser.
Why does the Jmol applet not work?
There are several possible reasons.
Jmol works with Java version >= 1.4. If your Java version is older than
1.4, please update your Java installation.
If you are working on a MacOS system, the problem comes from Java
LiveConnect, which is not supported natively by this system. See the
Java or MacOS documentation about LiveConnect.
On GNU/Linux systems (tested on Fedora), all functions work fine with
the Firefox, Mozilla and Opera browsers. The interface has also been
extensively tested on MS Windows systems: no problem occurred with
Firefox, Mozilla, Internet Explorer and Opera under XP.
If you have problems, check your Java configuration. You can look at
the help page for Java installation for Firefox, Mozilla or
Internet Explorer.
References
PipeAlign : a new toolkit for protein family analysis.
Plewniak F., Bianchetti L., Brelivet Y., Carles A., Chalmel F.,
Lecompte O., Mochel T., Moulinier L., Muller A., Muller J., Prigent V.,
Ripp R., Thierry J.C., Thompson J.D., Wicker N. and Poch O.
Nucleic Acids Res, 2003, Vol. 31, 3829-3832.
Geno3D: automatic comparative molecular modelling of protein.
Combet C., Jambon M., Deleage G. and Geourjon C.
Bioinformatics, 2002, Vol. 18, 213-214.
UniProt: the Universal Protein knowledgebase.
Apweiler R., Bairoch A., Wu C.H., Barker W.C., Boeckmann B., Ferro S.,
Gasteiger E., Huang H., Lopez R., Magrane M. et al.
Nucleic Acids Res, 2004, Vol. 32, D115-119.
The Protein Data Bank.
Berman H.M., Westbrook J., Feng Z., Gilliland G., Bhat T.N., Weissig
H., Shindyalov I.N. and Bourne P.E.
Nucleic Acids Res, 2000, Vol. 28, 235-242.
The Pfam protein families database.
Bateman A., Birney E., Cerruti L., Durbin R., Etwiller L., Eddy S.R.,
Griffiths-Jones S., Howe K.L., Marshall M. and Sonnhammer E.L.
Nucleic Acids Res, 2002, Vol. 30, 276-280.
The PROSITE database, its status in 2002
Falquet L., Pagni M., Bucher P., Hulo N., Sigrist C.J., Hofmann K. and
Bairoch A.
Nucleic Acids Res, 2002, Vol. 30, 235-238.
InterPro, progress and status in 2005.
Mulder N.J., Apweiler R., Attwood T.K., Bairoch A., Bateman A., Binns
D., Bradley P., Bork P., Bucher P., Cerutti L. et al.
Nucleic Acids Res, 2005, Vol. 33, D201-205.
Multiple Alignment of Complete Sequences Information Management System.
Thompson J.D., Muller A., Waterhouse A., Procter J., Barton G.J., Plewniak F. and Poch O.
BMC Bioinformatics. Submitted.
Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human
genes and genetic disorders.
Hamosh A., Scott A.F., Amberger J.S., Bocchini C.A. and McKusick V.A.
Nucleic Acids Res, 2005, Vol. 33, D514-517.