HUPO Proteomics Standards Initiative Protein Interaction Specification Documentation

Proteomics Standards Initiative

Molecular Interaction XML Format 2.5

Documentation of schema changes from version 1.0 to 2.5

December 2005

Significant changes have been made from MIF 1.0 to 2.5. The overall aims were to:

increase the expressive power of the format, going from “intersection” to “union” of molecular interaction annotation, allowing inter-database exchange of fully annotated records;
perform a cleanup and fix minor bugs.

Format changes

Although MIF 2.5 has many changes and is significantly more complex than 1.0, nearly all changes are optional additions, and the minimal representation of a given interaction has not been significantly expanded. In detail, the changes are:

<experimentDescription>

bibref is now mandatory. Submissions are considered bibrefs.
confidence is now a confidenceList, allowing multiple confidence values.

<interaction>

added id attribute
added interaction/parameterList element, mainly to describe kinetic parameters.
added an optional flag "negative" with default value false. If set to "true", it indicates that the interaction has explicitly been described as NOT being observed in the experiment.
added inferredInteractionList to allow correct description of complex topology, with supporting experimental evidence.
imexId has been added for the purpose of the IMEx molecular interaction exchange consortium.
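The default handling of the "negative" flag can be illustrated with a minimal Python sketch. It assumes that the flag is serialised as a child element <negative> with an xs:boolean value and that namespaces are omitted; the helper name is_negative is hypothetical and not part of the specification.

```python
import xml.etree.ElementTree as ET


def is_negative(interaction: ET.Element) -> bool:
    """Return the optional negative flag; a missing flag means False."""
    text = interaction.findtext("negative")  # assumed child element name
    if text is None:
        return False
    # xs:boolean accepts the lexical values true/false and 1/0.
    return text.strip() in ("true", "1")


observed = ET.fromstring('<interaction id="1"/>')
not_observed = ET.fromstring(
    '<interaction id="2"><negative>true</negative></interaction>'
)
print(is_negative(observed), is_negative(not_observed))  # False True
```

A reader that treats a missing flag as false keeps records that never mention the flag valid without modification.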

<proteinInteractor> and
<proteinParticipant>

To enable representation of more general interactions, not only protein interactions, proteinParticipant and proteinInteractor have been renamed participant and interactor, respectively, and an element interactorType, controlled by a new controlled vocabulary, has been added. This allows great flexibility in representing general molecular interactions.

<interactor>

added modelled flag: If true, the interaction is described for a species of interest, e.g. human, but has actually been investigated in another organism, e.g. mouse. The transfer will usually be based on a homology statement made by the data producer. If this optional element is missing, it is assumed to be set to false.
added intraMolecular flag: If true, it is an intramolecular interaction, e.g. an autophosphorylation. If missing, this element is assumed to be false.

<participant>

added id attribute
The addition of participant/interactionRef allows the representation of hierarchical structure in complexes, e.g. composition of a receptor complex from subunits, and interaction of such a receptor complex with a ligand.
participantIdentificationList has been added to allow the description of the participant identification method on a per-participant basis, not only a global method on the experiment level.
The previous “role” attribute has been split into the biological role, e.g. enzyme/target, and the experimental role, e.g. bait/prey.
New element experimentalFormList has been added to allow description of experimental forms, e.g. protein tags.
ExperimentalInteractorList allows the representation of homology-based deductions made by the data provider. For example, an experimentalist might work with mouse proteins to make a statement on a human system. In this case, the experimentally used protein would be stored in an experimentalInteractor element, and the human protein would be stored in the normal participant. At the <interaction> level, the <modelled> flag should be set.
The <confidence> element has been extended into a <confidenceList>.
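The hierarchical structure enabled by participant/interactionRef can be sketched in Python as follows. The fragment is a simplified, namespace-free illustration: it assumes that references are written as element text (<interactorRef>, <interactionRef>), and the ids, the sample entry and the function leaf_interactors are hypothetical.

```python
import xml.etree.ElementTree as ET

# Interaction 1 is a receptor complex of two subunits; interaction 2
# describes that complex (via interactionRef) binding a ligand.
ENTRY = """
<entry>
  <interactionList>
    <interaction id="1">
      <participantList>
        <participant id="11"><interactorRef>101</interactorRef></participant>
        <participant id="12"><interactorRef>102</interactorRef></participant>
      </participantList>
    </interaction>
    <interaction id="2">
      <participantList>
        <participant id="21"><interactionRef>1</interactionRef></participant>
        <participant id="22"><interactorRef>103</interactorRef></participant>
      </participantList>
    </interaction>
  </interactionList>
</entry>
"""


def leaf_interactors(entry, interaction_id, _path=()):
    """List interactor ids of an interaction, expanding nested complexes."""
    if interaction_id in _path:  # guard against circular references
        return []
    by_id = {i.get("id"): i for i in entry.iter("interaction")}
    leaves = []
    for participant in by_id[interaction_id].iter("participant"):
        nested = participant.findtext("interactionRef")
        if nested is not None:
            leaves.extend(
                leaf_interactors(entry, nested.strip(), _path + (interaction_id,))
            )
            continue
        ref = participant.findtext("interactorRef")
        if ref is not None:
            leaves.append(ref.strip())
    return leaves


entry = ET.fromstring(ENTRY)
print(leaf_interactors(entry, "2"))  # ['101', '102', '103']
```

The receptor subunits are stated once, in interaction 1, and the ligand-binding interaction refers to the complex as a whole rather than repeating its subunits.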

<feature>

added id attribute
renamed featureDescription -> featureType
renamed featureLocation -> featureRange
added ExperimentRefList. This allows reference to one or more experiments in which the feature has been determined.
added a <names> element
added an <attributeList> element, to allow handling of free text feature description.

<featureRange>

each feature now has a list of range elements, to allow representation of discontinuous features, e.g. structural domains.
the range element has been restructured to allow fuzzy locations, and start/end ranges.
site and position have been removed
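As an illustration of what a list of ranges with fuzzy boundaries expresses, the following Python sketch models the idea as plain data classes. It is a conceptual model only, not the schema definition: the class and field names (Boundary, Range, Feature, position, interval) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Boundary:
    """One end of a range: an exact position, or an interval if fuzzy."""
    position: Optional[int] = None
    interval: Optional[Tuple[int, int]] = None

    def span(self) -> Tuple[int, int]:
        """Return the lowest and highest position this boundary may take."""
        if self.interval is not None:
            return self.interval
        if self.position is None:
            raise ValueError("boundary has neither position nor interval")
        return (self.position, self.position)


@dataclass
class Range:
    begin: Boundary
    end: Boundary


@dataclass
class Feature:
    name: str
    ranges: List[Range]  # more than one range = discontinuous feature

    def outer_extent(self) -> Tuple[int, int]:
        """Smallest stretch of sequence that contains every range."""
        lows = [r.begin.span()[0] for r in self.ranges]
        highs = [r.end.span()[1] for r in self.ranges]
        return (min(lows), max(highs))


domain = Feature("discontinuous domain", [
    Range(Boundary(position=10), Boundary(position=40)),
    Range(Boundary(interval=(95, 100)), Boundary(position=130)),
])
print(domain.outer_extent())  # (10, 130)
```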

Administrative changes

id attributes

The type of id attributes has been changed from xs:ID to xs:int. xs:ID requires that any id is unique in the file. This was incompatible with the denormalised form of MIF 1.0, where e.g. the same protein may be listed more than once.
Ids are now defined to be arbitrary integers, unique to each object within an <entry>.
The type xs:int has been chosen to provide an easy mapping to standard data types, as it provides a limited range of integers, while xs:integer represents the mathematical concept of integers with an unlimited value range.
All major objects now have an id attribute, namely <experiment>, <interaction>, <interactor>, <participant>, <feature>.
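The practical consequence of choosing xs:int can be shown with a short Python check. xs:int is a signed 32-bit type, so valid ids lie between -2147483648 and 2147483647; the function name is_valid_xs_int is hypothetical.

```python
import re

XS_INT_MIN = -2**31      # -2147483648
XS_INT_MAX = 2**31 - 1   #  2147483647


def is_valid_xs_int(text: str) -> bool:
    """Check that an id is an optional sign plus digits within 32-bit range."""
    candidate = text.strip()
    if re.fullmatch(r"[+-]?[0-9]+", candidate) is None:
        return False
    return XS_INT_MIN <= int(candidate) <= XS_INT_MAX


print(is_valid_xs_int("42"))          # True
print(is_valid_xs_int("2147483648"))  # False: exceeds the xs:int range
print(is_valid_xs_int("P12345"))      # False: not an integer
```

Values in this range map directly onto the 32-bit integer types of most programming languages and databases.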

Method-related elements have been renamed for clarity:

ParticipantDetection -> ParticipantIdentificationMethod
InteractionDetection -> InteractionDetectionMethod
FeatureDetection -> FeatureDetectionMethod
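For a converter, the three renames amount to a simple tag mapping, sketched here in Python. The lower-camel-case spelling of the tags and the absence of namespaces are assumptions, and rename_method_elements is a hypothetical helper. The sketch covers only these renames, not the structural changes described elsewhere in this document.

```python
import xml.etree.ElementTree as ET

# Old tag -> new tag; lower-camel-case spelling is an assumption.
METHOD_RENAMES = {
    "participantDetection": "participantIdentificationMethod",
    "interactionDetection": "interactionDetectionMethod",
    "featureDetection": "featureDetectionMethod",
}


def rename_method_elements(root: ET.Element) -> int:
    """Rename method-related elements in place; return how many changed."""
    renamed = 0
    for element in root.iter():
        new_tag = METHOD_RENAMES.get(element.tag)
        if new_tag is not None:
            element.tag = new_tag
            renamed += 1
    return renamed


old = ET.fromstring(
    "<experimentDescription>"
    "<interactionDetection/><participantDetection/>"
    "</experimentDescription>"
)
count = rename_method_elements(old)
print(count, [child.tag for child in old])
# 2 ['interactionDetectionMethod', 'participantIdentificationMethod']
```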

namesType extended by addition of an optional list of aliases.
Ordered sequence of standard elements. New order is:

names
bibref
xref
other
attributeList
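The ordering rule can be checked mechanically, as in this Python sketch. It reads "other" as any element that is not one of the four named standard elements; the rank table and the function follows_standard_order are hypothetical, and namespaces are omitted.

```python
import xml.etree.ElementTree as ET

# Position of each standard element; anything else counts as "other".
RANK = {"names": 0, "bibref": 1, "xref": 2, "attributeList": 4}
OTHER_RANK = 3


def follows_standard_order(element: ET.Element) -> bool:
    """True if the direct children appear in the standard order."""
    ranks = [RANK.get(child.tag, OTHER_RANK) for child in element]
    return ranks == sorted(ranks)


good = ET.fromstring(
    "<experimentDescription><names/><bibref/><xref/>"
    "<hostOrganismList/><attributeList/></experimentDescription>"
)
bad = ET.fromstring("<experimentDescription><xref/><names/></experimentDescription>")
print(follows_standard_order(good), follows_standard_order(bad))  # True False
```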

created a new complex type confidenceType and inserted it in all previous occurrences of confidence elements.
Added attribute/nameAc to allow controlled vocabulary for attributes.
extended xrefType, now allows a controlled vocabulary representation of database xrefs.
For all string attributes and elements, the length has been set to at least 1. This avoids empty attributes and elements, which could cause problems in data exchange.
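A producer can test for violations of the minimum-length rule before export. The Python sketch below reports empty attribute values and empty text in a caller-supplied set of string elements. The default element names shortLabel and fullName, the sample fragment and the function empty_strings are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def empty_strings(root, string_elements=("shortLabel", "fullName")):
    """Return (tag, location) pairs for zero-length attributes and texts."""
    problems = []
    for element in root.iter():
        for name, value in element.attrib.items():
            if value == "":
                problems.append((element.tag, "@" + name))
        if element.tag in string_elements and not element.text:
            problems.append((element.tag, "text"))
    return problems


doc = ET.fromstring(
    '<interactor id="1">'
    "<names><shortLabel></shortLabel><fullName>kinase</fullName></names>"
    '<xref><primaryRef db="" id="P12345"/></xref>'
    "</interactor>"
)
print(empty_strings(doc))
# [('shortLabel', 'text'), ('primaryRef', '@db')]
```

Only elements named by the caller are checked for text, because elements that carry their content in attributes or child elements legitimately have no text of their own.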

Controlled vocabulary changes

The major change from PSI 1.0 to 2.5 requires a remapping of controlled vocabularies.

Proposed mappings from PSI 1.0 to 2.5 CVs are described in cv-1to25mapping.doc.

The reverse mapping is described in cv-25to1mapping.txt. This file is presented in plain text format to facilitate parsing.

Henning Hermjakob, 21/11/2005