MAGOS - Help page

MAGOS automatically builds a homology model of your query protein and, at the same time, a hierarchical and annotated Multiple Alignment of Complete Sequences (MACS). You submit a single query protein in FASTA format, optionally with a template code, and MAGOS creates the associated MACSIMS and homology model for this query. The results are presented in a user-friendly interface that links the generated model to the computed MACS. The MAGOS web server thus lets you explore structural information interactively, in the light of the evolutionary relevance of the mined and predicted sequence information.



Flowchart of the MAGOS interconnected processes

magos process

Multiple Alignment of Complete Sequences (MACS) computation with PipeAlign

A tuned version of PipeAlign is used to compute a high-quality MACS. The PipeAlign process can be divided into several steps:


Ballast (homology search):
Starting from the query sequence, Ballast first runs BlastP to search for homologues in the UniProt and PDB databases or in the non-redundant UniRef90 database. It then post-processes the Blast results to identify small conserved segments (LMSs) that may characterise the protein family of the query.

Closest PDB homologue:
To define the closest PDB homologue, a BlastP search is performed with the query sequence against the PDB database (maximal E-value parameter set to 10). The PDB entry with the best score is defined as the closest PDB homologue. If it is not already present among the selected sequences, it is added to the pool of proteins to be aligned.
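The selection rule above can be sketched as follows. This is only an illustration under stated assumptions: the `BlastHit` record, the function names and the use of the bit score as "best score" are hypothetical and are not taken from the MAGOS or PipeAlign code.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class BlastHit:
    """Hypothetical record for one BlastP hit against the PDB database."""
    subject_id: str   # e.g. "1wsv_B"
    bit_score: float  # assumption: "best score" means highest bit score
    e_value: float


def closest_pdb_homologue(hits: Iterable[BlastHit],
                          max_e_value: float = 10.0) -> Optional[BlastHit]:
    """Return the best-scoring hit within the E-value limit, or None."""
    eligible = [h for h in hits if h.e_value <= max_e_value]
    if not eligible:
        return None
    return max(eligible, key=lambda h: h.bit_score)


def pool_with_closest(selected_ids: List[str], closest_id: str) -> List[str]:
    """Add the closest PDB homologue to the pool only if it is missing."""
    if closest_id in selected_ids:
        return list(selected_ids)
    return list(selected_ids) + [closest_id]
```

If no hit passes the E-value limit, the function returns `None`, which corresponds to the case where no template is available.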

DbClustal (sequence alignment):
The LMSs identified by Ballast are used by DbClustal as soft anchors to guide the building of the multiple alignment. The sequences aligned are those detected with an E-value < 10^-3 during the BlastP search, up to the maximal number of sequences to be aligned defined by the user (default = 100).
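As a rough sketch of the filter described above, the function below keeps hits under the E-value limit and caps their number. The function name and the `(sequence_id, e_value)` input format are hypothetical, and keeping the lowest E-values first when the cap is reached is an assumption, since the ordering used by the server is not stated here.

```python
from typing import Iterable, List, Tuple


def select_for_alignment(hits: Iterable[Tuple[str, float]],
                         e_value_limit: float = 1e-3,
                         max_sequences: int = 100) -> List[str]:
    """Return the identifiers of the sequences to align.

    hits: (sequence_id, e_value) pairs from the BlastP search.
    """
    # Keep only hits strictly below the E-value limit.
    kept = [(seq_id, e_val) for seq_id, e_val in hits if e_val < e_value_limit]
    # Assumption: the most significant hits are kept when the cap applies.
    kept.sort(key=lambda pair: pair[1])
    return [seq_id for seq_id, _ in kept[:max_sequences]]
```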

Rascal (alignment refinement):
Rascal scans the MACS to identify misaligned residues and blocks. It then realigns them in order to improve the MACS quality.

NorMD (objective function):
The quality of both DbClustal and Rascal alignments is evaluated with NorMD, an objective function which associates a score to an alignment.

Leon (unrelated sequence removal):
The highest-scoring MACS is processed by Leon, which removes potentially weakly related or highly fragmented sequences to generate the final MACS. The final MACS is also scored by NorMD.
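The choice between the DbClustal and Rascal alignments can be pictured with the short sketch below. The function names and the dictionary input are hypothetical; the 0.5 threshold is the rule of thumb quoted in the FAQ of this page.

```python
from typing import Dict


def pick_best_alignment(normd_scores: Dict[str, float]) -> str:
    """Return the name of the alignment with the highest NorMD score."""
    if not normd_scores:
        raise ValueError("no alignment scores provided")
    return max(normd_scores, key=normd_scores.get)


def is_good_quality(normd_score: float, threshold: float = 0.5) -> bool:
    """Rule of thumb: a MACS scoring above 0.5 is of good quality."""
    return normd_score > threshold
```

For example, `pick_best_alignment({"dbclustal": 0.48, "rascal": 0.61})` selects the Rascal alignment, which would then be passed on to Leon.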

DPC/Secator (clustering step):
Subfamilies are then automatically defined using the clustering programs DPC or Secator.

The alignment of the query and the closest PDB homologue is extracted from the final alignment and used to compute, in parallel, a model with Geno3D.


MACSIMS annotation of MACS


MACSIMS (MACS Information Management System) automatically annotates a MACS. It integrates structural and functional information mined from external databases such as PFAM, Prosite, InterPro and OMIM, as well as various ab initio predictions (global and subfamily residue conservation, transmembrane segments, coiled-coil regions, low-complexity segments with biased amino acid composition, etc.). The retrieved information can be propagated through the whole alignment, including the query, according to conservation criteria.

Model construction with Geno3D


Geno3D processes the alignment of the query and the template PDB sequences. It extracts geometrical restraints (dihedral angles and distances) for corresponding atoms between the query and the template, and builds the 3D structure of the protein using a geometrical approach. It uses CNS as its molecular building and analysis engine.
Good results are obtained when the two sequences of the input alignment share at least 30% identity and 40 common residues.
After modelling, the generated models are validated by structural alignment with the template (using CE). The model with the minimal energy is retained.
The resulting structure is a model and must be interpreted with care: it is not an experimental 3D structure!
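To check a query/template alignment against the 30% identity and 40 common residues guideline, a minimal sketch is given below. It is not Geno3D code: the function names are hypothetical, and computing the percentage over columns where both sequences have a residue is an assumption, as the exact denominator used by Geno3D is not stated here.

```python
from typing import Tuple


def identity_stats(aligned_query: str, aligned_template: str,
                   gap_chars: str = "-.") -> Tuple[int, float]:
    """Return (identical residues, percent identity) for two aligned rows."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned_columns = 0
    identical = 0
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q in gap_chars or t in gap_chars:
            continue  # skip columns with a gap in either sequence
        aligned_columns += 1
        if q == t:
            identical += 1
    percent = 100.0 * identical / aligned_columns if aligned_columns else 0.0
    return identical, percent


def meets_guideline(aligned_query: str, aligned_template: str,
                    min_identity: float = 30.0,
                    min_identical: int = 40) -> bool:
    """True if the alignment reaches both thresholds quoted above."""
    identical, percent = identity_stats(aligned_query, aligned_template)
    return percent >= min_identity and identical >= min_identical
```

A short alignment can be highly identical and still fail the check, because fewer than 40 residues are shared.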

Results


The final step of the MAGOS server retrieves all computed results and makes them available, linked to one another, through a user-friendly web interface (Jmol page).


How to submit a query?

sequence submission

Paste your query sequence in FASTA format into the dedicated area.

Query sequence example in FASTA format:
>query
MQRAVSVVARLGFRLQAFPPALCRPLSCAQEVLRRTPLYDFHLAHGGKMVAFAGWSLPVQYRDS
HTDSHLHTRQHCSLFDVSHMLQTKILGSDRVKLMESLVVGDIAELRPNQGTLSLFTNEAGGILD
DLIVTNTSEGHLYVVSNAGCWEKDLALMQDKVRELQNQGRDVGLEVLDNALLALQGPTAAQVLQ
AGVADDLRKLPFMTSAVMEVFGVSGCRVTRCGYTGEDGVEISVPVAGAVHLATAILKNPEVKLA
GLAARDSLRLEAGLCLYGNDIDEHTTPVEGSLSWTLGKRRRAAMDFPGAKVIVPQLKGRVQRRR
VGLMCEGAPMRAHSPILNMEGTKIGTVTSGCPSPSLKKNVAMGYVPCEYSRPGTMLLVEVRRKQ
QMAVVSKMPFVPTNYYTLK
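Before submitting, you may want to check locally that your input is a single, well-formed FASTA record. The sketch below is one way to do this; the function name and the accepted alphabet (the 20 standard amino acids plus X) are assumptions and do not describe the validation performed by the MAGOS server.

```python
from typing import Tuple

# Assumption: 20 standard amino acids plus X for unknown residues.
ACCEPTED_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYX")


def parse_single_fasta(text: str) -> Tuple[str, str]:
    """Return (name, sequence) for a FASTA text holding exactly one record."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA input must start with a '>' header line")
    if any(ln.startswith(">") for ln in lines[1:]):
        raise ValueError("only one query sequence is expected")
    name = lines[0][1:].strip()
    # Sequence lines are joined, so line wrapping does not matter.
    sequence = "".join(lines[1:]).upper()
    if not sequence:
        raise ValueError("the sequence is empty")
    unexpected = sorted(set(sequence) - ACCEPTED_RESIDUES)
    if unexpected:
        raise ValueError("unexpected characters: " + "".join(unexpected))
    return name, sequence
```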


Several options are available:
    - the maximal number of sequences to be aligned in the multiple alignment of complete sequences (default = 100)
    - the E-value limit for the sequences to be aligned (default = 0.001)
    - the database for the BlastP search (default = UniProt + PDB)
    - the BlastP low complexity filter option (default = True)
    - the BlastP gapped option (default = True)
    - the PDB structure to be used as template for modelling the query: fill in the "Template PDB ID code" area (4-character PDB identifier, e.g. 1wsv) and, if necessary, specify the chain in the "chain" field (e.g. B).
    - the maximal number of distance restraints for the model generation (default = 20000). This value should be changed with care, by expert users only.
    - the superposition of the three generated models (the model with the best RMSD to the template and the minimal energy value is selected as the best)
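The defaults listed above can be summarised as a small settings object. This is purely illustrative: the class and field names are hypothetical and do not correspond to the actual form fields or to any MAGOS programming interface, and the superposition option is omitted because its default is not stated.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MagosOptions:
    """Hypothetical container for the submission options and their defaults."""
    max_sequences: int = 100
    e_value_limit: float = 0.001
    database: str = "UniProt + PDB"
    low_complexity_filter: bool = True
    gapped: bool = True
    template_pdb_id: Optional[str] = None   # e.g. "1wsv"
    template_chain: Optional[str] = None    # e.g. "B"
    max_distance_restraints: int = 20000

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the options look sane."""
        problems = []
        if self.max_sequences < 1:
            problems.append("max_sequences must be at least 1")
        if self.e_value_limit <= 0:
            problems.append("e_value_limit must be positive")
        if self.template_pdb_id is not None:
            pdb_id = self.template_pdb_id
            if len(pdb_id) != 4 or not pdb_id.isalnum():
                problems.append("template_pdb_id must be a 4-character code")
        elif self.template_chain is not None:
            problems.append("template_chain requires template_pdb_id")
        if self.max_distance_restraints < 1:
            problems.append("max_distance_restraints must be at least 1")
        return problems
```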

Then click the "Submit" button.
An identifier will be assigned to your job.

job identifier

Please note this number for later reference.

The job will run on our server. The time required to process a request depends on the length of the submitted protein. With long proteins, it can take up to 3 hours.
If you have filled in the e-mail address field, you will receive an e-mail when the job is completed, with a link to the result page.
You can also access your results by pasting your job identifier in the dedicated area on the Check job page. Note that results are kept on our server for about 2 weeks. To view your results after that, save the compulsory result files and reload them later (see How to reload previous results).


How to check your results?

If you have provided your e-mail address, you will get an e-mail at the end of the job, with a link to the result page.
If not, you can access your results by pasting your job identifier in the dedicated area on the Check job page.

check results

Note that results will be kept on our server for about 2 weeks.

The check results page gives the state of a running job, with all the steps that have been completed.
If an essential step has failed, an error message is displayed in red.
If a non-essential step has failed, a message is displayed in orange.
If a step has run successfully, Done is displayed in green.

check results



What are the results?

Main results page

main result page


The main results page allows the downloading of all generated files:
    - Alignment file (MACS) in MSF format.
    - MACSIMS (Multiple Alignment of Complete Sequence Information Management System) output in XML format (compulsory for later use)
    - DSSP output in text format.
    - BlastP output in text format.
    - Model generated in PDB file format (compulsory for later use)
    - Disease file in text format. This file can be empty if no information about human diseases has been found.

The Jmol page, which displays the generated model interconnected with the generated MACSIMS (annotated MACS), can be accessed via the "Jmol display" link.
An image showing the modelled region relative to the full-length query sequence is also available.

model


Jmol page

You must have the latest Java plugin installed in order to use the Jmol applet.
You also need to enable JavaScript in your web browser. You can check your browser here.
Jmol functions have been tested on:
    - IE >6.0, Firefox >0.8, Mozilla >1.5 on Windows
    - Mozilla >1.5, Firefox >0.8 on Linux
With Opera >7.5 and Safari, the distance and log functions do not work.
With Konqueror >3.4, the Jmol applet does not work and stays in the loading state. MacOS X <10.3 does not support LiveConnect between Java and JavaScript, so the Jmol tools will not work with any browser; MACSIMS visualisation is still available.
Using the latest version of your browser is strongly recommended.

jmol page

Jmol right frame
The right frame is used to display and access information from the MACSIMS file (the annotated MACS).
In this frame:
    - the top sequence corresponds to the query sequence (QUERY),
    - the underlined part of the query sequence is the modelled part, i.e. CAQEVLRRTPLYD...
    - a popup window containing general information about each protein is linked to its name,
    - a colour code is applied to the protein names:
        in blue: the template protein used to compute the model, e.g. 1wsv_B.
        in red: the proteins involved in human diseases, e.g. P48728.
        in pink: the proteins homologous to a protein involved in human diseases, e.g. Q9TSZ7.
        in black: the other proteins.
    - the amino acid sequences are coloured according to their cluster (subfamily).
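The colour code for protein names can be read as a simple decision rule, sketched below. The function and parameter names are hypothetical, and the order of precedence (template first, then disease, then disease homologue) is an assumption based on the order of the list above.

```python
from typing import AbstractSet


def name_colour(protein_id: str,
                template_id: str,
                disease_ids: AbstractSet[str] = frozenset(),
                disease_homologue_ids: AbstractSet[str] = frozenset()) -> str:
    """Return the display colour of a protein name in the alignment frame."""
    if protein_id == template_id:
        return "blue"
    if protein_id in disease_ids:
        return "red"
    if protein_id in disease_homologue_ids:
        return "pink"
    return "black"
```

With empty disease sets, every name except the template comes out black, which matches the behaviour described for reloaded jobs when no disease file is uploaded.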

By default, only the first sequence of each cluster is displayed. Subfamilies can easily be collapsed or expanded for better visualisation and analysis by using the + or - image or the Uncollapse all or Collapse all button. A cross image means that there is only one sequence in the cluster.
The quality index (NorMD score) associated with the alignment is shown at the bottom of the frame.
The alignment annotations are accessible through the combo box just below (see MACSIMS features for details).
Selecting a feature maps it onto the MACS. For the secondary structure (STRUCT) feature, alpha-helices are shown in pink and beta-sheets in yellow.
If the modelled sequence carries the feature, it is also mapped onto the 3D structure model in the left frame.
Selecting a residue in the modelled part of the query sequence (the underlined part) locates this residue within the model (left frame). The selected residue appears in green on the model.


Jmol left frame
The left frame (applet frame) is used for 3D model rendering. It is divided into 2 parts: the visualization part above and the control panel below.
The control panel provides several display options. Basic options allow the selection of the applet background colour and zoom level, and a reset command (to set labels off).
In the "Select" area, you can change the rendering and colouring types for the structure. Click on the "go!" button to apply the changes.
The colour options are mapped simultaneously on the 3D model and on the MACSIMS top query sequence (right panel).
The "Link Parameters" area links the model (left frame) either with the MACSIMS (default) or with the alignment of the modelled sequence with the template PDB sequence in the right frame.

link parameter


The "Console" area is by default in "Selection mode". After the selection of two residues on the 3D model (in blue), the "Distance mode" computes the distance between the two residues. The 2 selected residues are simultaneously highlighted in the MACSIMS (right) frame.
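If you want to reproduce such a distance outside the applet from the downloaded PDB file of the model, a minimal sketch is given below. It is not the Jmol implementation: measuring between C-alpha atoms is an assumption (the applet may measure between the atoms actually picked), the function names are hypothetical, and the parser assumes a single-chain model in standard fixed-column PDB format.

```python
import math
from typing import Dict, Tuple

Coordinates = Tuple[float, float, float]


def ca_coordinates(pdb_text: str) -> Dict[int, Coordinates]:
    """Map residue number to C-alpha coordinates from PDB ATOM records."""
    coords = {}
    for line in pdb_text.splitlines():
        # Fixed PDB columns: atom name 13-16, residue number 23-26,
        # x 31-38, y 39-46, z 47-54 (1-based, inclusive).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            residue_number = int(line[22:26])
            coords[residue_number] = (float(line[30:38]),
                                      float(line[38:46]),
                                      float(line[46:54]))
    return coords


def residue_distance(coords: Dict[int, Coordinates], i: int, j: int) -> float:
    """Euclidean distance in angstroms between the C-alpha atoms of i and j."""
    return math.dist(coords[i], coords[j])
```

Here `math.dist` requires Python 3.8 or later; residue numbers are those of the model file, which may differ from positions in the full-length query.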

distance mode


Selection, distance and user commands are displayed in the console. Jmol/Rasmol commands can also be launched in the "Jmol/Rasmol command" area (advanced users).


MACSIMS features

Feature types:

ANCHOR : Predicted ANCHORs (Ballast)
BLOCK : Predicted conservation BLOCKs (Rascal)
COIL : Predicted COIL (ncoils)
LOWCOMP : Predicted LOW COMPlexity region
MOD_RES : Annotated MODified RESidues
PFAM A : Annotated PFAM A region
PFAM B : Annotated PFAM B region
PROSITE : Annotated PROSITE domain
REGION : Predicted REGION (associated to Rascal's blocks)
REPEAT : Annotated REPEAT
SEQERR : Predicted block of SEQuencing ERRor
SIGNAL : Annotated SIGNAL
SITE : Annotated functional SITE
STRUCT : Annotated secondary STRUCTures. Note: for the query sequence, secondary structures are deduced from the PDB file with the DSSP program.
SWDOMAIN : Annotated SWissprot DOMAIN
TRANSMEM : Predicted TRANSMEMbrane region
VARSPLIC : Annotated VARiants produced by alternative SPLICing
VARIANT : Annotated VARIANT (mutation)
(Annotations are mined from public databases such as SwissProt, PDB and PFAM, or predicted and propagated by MACSIMS.)


Feature descriptions (displayed when the mouse passes over a feature):

Several suffixes can complement the description:
PRED : for a predicted feature.
PROP : for a propagated feature.
WARN : for data mined from databases that appear questionable.
ERROR : for data mined from databases that are wrong (cross-validation step).


How to reload previous results?

You can reload previous jobs on the Reload job page, provided that you saved the compulsory result files from the main results page.
To access the Jmol interface, which displays the generated model interconnected with the generated MACSIMS (annotated MACS), you have to upload the MACSIMS file in XML format as well as the generated PDB file of the model. You can optionally upload the disease file too (if you do not, all the sequence names will be in black, except the template sequence in blue).

reload




FAQ

What is a MACSIMS?

MACSIMS is the acronym of Multiple Alignment of Complete Sequence Information Management System. MACSIMS allows the automatic annotation of a MACS by integration of mined and predicted structural and functional information.
By extension, the output of this program is called a MACSIMS.

Why didn't I receive an e-mail at the end of my job?

Our mail server may have had problems or been too busy. However, results can be accessed from the Check job page with your job identifier.
If the result page is still not available after several hours (more than 3 h for large proteins with more than 300 residues), please contact the MAGOS administrator.
You can also check your job manually on the Check job page.

What is the NorMD score?

The NorMD program provides a normalized score for the estimation of protein Multiple Alignment of Complete Sequences (MACS) quality. A rule of thumb is to consider that a MACS with a NorMD score higher than 0.5 is a good quality MACS.
NorMD is an objective function that combines the advantages of column-scoring techniques with the sensitivity of methods incorporating residue similarity scores and does not depend upon the number, length or overall similarity of the aligned sequences.

Why can't MAGOS generate a MACSIMS or a homology model?

If the query has no homologous sequence in the UniProt or PDB databases, no MACSIMS and no model can be generated.
The homology model is built by the Geno3D software, which uses a template sequence from the MACSIMS. If no homologous sequence is found in the PDB database, no model will be generated.
To generate a good model, Geno3D needs a template sequence sharing at least 30% identity with the query sequence, and an alignment in which at least 40 residues are identical.

Why does the page take a long time to load MACSIMS and/or Jmol?

MACSIMS is an XML file whose size can exceed 1 MB, which can be too much for older browsers or older computers.
One solution is to reduce the number of sequences to 50 on the submit page. Fewer sequences can, however, mean less information in the MACSIMS, so you should find a good balance between loading time and the number of sequences in the alignment.
The Jmol applet can also take a long time to load the molecule because the browser loads the applet and the MACSIMS at the same time.

Why does the Jmol applet stay in loading state?

The Jmol applet can take time to load a molecule. If the applet takes more than several minutes to load a molecule, reload the page (F5) and wait again. If the applet still cannot load the molecule, check your browser.

Why does the Jmol applet not work?

There are several possible causes.
Jmol requires Java version >= 1.4. If your Java version is older than 1.4, please update your Java installation.
If you are working on a MacOS system, the problem comes from Java LiveConnect, which is not natively supported by this system. See the Java or MacOS documentation about LiveConnect. On GNU/Linux systems (tested on Fedora), all functions work with the Firefox, Mozilla and Opera browsers. The interface has also been extensively tested on MS Windows: no problem occurred with Firefox, Mozilla, Internet Explorer and Opera under XP.
If you have problems, check your Java configuration. You can consult the help page for Java installation for Firefox, Mozilla and Internet Explorer.


References


PipeAlign : a new toolkit for protein family analysis. Plewniak F., Bianchetti L., Brelivet Y., Carles A., Chalmel F., Lecompte O., Mochel T., Moulinier L., Muller A., Muller J., Prigent V., Ripp R., Thierry J.C., Thompson J.D., Wicker N. and Poch O.
Nucleic Acids Research, 2003, Vol.31, 13:3829-3832

Geno3D: automatic comparative molecular modelling of protein.
Combet C., Jambon M., Deleage G. and Geourjon C.
Bioinformatics, 2002, Vol. 18, 213-214.

UniProt: the Universal Protein knowledgebase.
Apweiler R., Bairoch A., Wu C.H., Barker W.C., Boeckmann B., Ferro S., Gasteiger E., Huang H., Lopez R., Magrane M. et al.
Nucleic Acids Res, 2004, Vol. 32, D115-119.

The Protein Data Bank.
Berman H.M., Westbrook J., Feng Z., Gilliland G., Bhat T.N., Weissig H., Shindyalov I.N. and Bourne P.E.
Nucleic Acids Res, 2000, Vol. 28, 235-242.

The Pfam protein families database.
Bateman A., Birney E., Cerruti L., Durbin R., Etwiller L., Eddy S.R., Griffiths-Jones S., Howe K.L., Marshall M. and Sonnhammer E.L.
Nucleic Acids Res, 2002, Vol. 30, 276-280.

The PROSITE database, its status in 2002
Falquet L., Pagni M., Bucher P., Hulo N., Sigrist C.J., Hofmann K. and Bairoch A.
Nucleic Acids Res, 2002, Vol. 30, 235-238.

InterPro, progress and status in 2005.
Mulder N.J., Apweiler R., Attwood T.K., Bairoch A., Bateman A., Binns D., Bradley P., Bork P., Bucher P., Cerutti L. et al.
Nucleic Acids Res, 2005, Vol. 33, D201-205.

Multiple Alignment of Complete Sequences Information Management System.
Thompson J.D., Muller A., Waterhouse A., Procter J., Barton G.J., Plewniak F. and Poch O.
BMC Bioinformatics. Submitted.

Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders.
Hamosh A., Scott A.F., Amberger J.S., Bocchini C.A. and McKusick V.A.
Nucleic Acids Res, 2005, Vol. 33, D514-517.



Contacts:
For MACSIMS: Anne Friedrich or Olivier Poch
For Web Server, Jmol interface and Molecular Modeling: Nicolas Garnier or Emmanuel Bettler

This site works with resolution >=800*600
and is optimized for 1024*768