Molecular Interaction XML Format 2.5
Documentation of schema changes from version 1.0 to 2.5
December 2005
Significant changes have been made from MIF 1.0 to 2.5. The overall aims were to:
- increase the expressive power of the format, going from the “intersection” to the “union” of molecular interaction annotation, allowing inter-database exchange of fully annotated records;
- perform a cleanup and fix minor bugs.
Format changes
Although MIF 2.5 has many changes and is significantly more complex than 1.0, nearly all changes are optional additions, and the minimal representation of a given interaction has not been significantly expanded. In detail, the changes are:
- <experimentDescription>
  - bibref is now mandatory. Submissions are also treated as bibrefs.
  - confidence is now a confidenceList, allowing multiple confidence values.
- <interaction>
  - added id attribute
  - added interaction/parameterList element to describe parameters, in particular kinetic parameters.
  - added an optional flag "negative" with default value false. If set to "true", it indicates that the interaction has explicitly been described as NOT being observed in the experiment.
  - added inferredInteractionList to allow correct description of complex topology, with supporting experimental evidence.
  - imexId has been added for the purpose of the IMEx molecular interaction exchange consortium.
- <proteinInteractor> and <proteinParticipant>
  - To enable representation of more general interactions, not only protein interactions, proteinParticipant and proteinInteractor have been renamed participant and interactor, respectively, and an element interactorType, controlled by a new controlled vocabulary, has been added. This gives high flexibility in representing general molecular interactions.
- <interaction> (additional flags)
  - added modelled flag: if true, the interaction is described in a species of interest, e.g. human, but has actually been investigated in another organism, e.g. mouse. The transfer will usually be based on a homology statement made by the data producer. If this optional element is missing, it is assumed to be false.
  - added intraMolecular flag: if true, the interaction is intramolecular, e.g. an autophosphorylation. If this element is missing, it is assumed to be false.
- <participant>
  - added id attribute
  - The addition of participant/interactionRef allows the representation of hierarchical structure in complexes, e.g. composition of a receptor complex from subunits, and interaction of such a receptor complex with a ligand.
  - participantIdentificationList has been added to allow the description of the participant identification method on a per-participant basis, not only as a global method at the experiment level.
  - The previous “role” attribute has been split into the biological role, e.g. enzyme/target, and the experimental role, e.g. bait/prey.
  - The new element experimentalFormList has been added to allow description of experimental forms, e.g. protein tags.
  - The experimentalInteractorList element allows the representation of homology-based deductions made by the data provider. For example, an experimentalist might work with mouse proteins to make a statement about a human system. In this case, the experimentally used protein would be stored in an experimentalInteractor element, and the human protein would be stored in the normal participant. At the <interaction> level, the <modelled> flag should be set.
  - The <confidence> element has been extended into a <confidenceList>.
- <feature>
  - added id attribute
  - renamed featureDescription -> featureType
  - renamed featureLocation -> featureRange
  - added experimentRefList. This allows referring to one or more experiments in which the feature has been determined.
  - added a <names> element
  - added an <attributeList> element, to allow handling of free-text feature descriptions.
- <featureRange>
  - each feature now has a list of range elements, to allow representation of discontinuous features, e.g. structural domains.
  - the range element has been restructured to allow fuzzy locations and start/end ranges.
  - site and position have been removed
- Administrative changes
  - id attributes
    - The type of id attributes has been changed from xs:ID to xs:int. xs:ID requires that any id is unique in the file. This was incompatible with the denormalised form of MIF 1.0, where e.g. the same protein may be listed more than once. Ids are now defined to be arbitrary integers, unique to each object within an <entry>. The type xs:int has been chosen to provide an easy mapping to standard data types, as it provides a limited range of integers (a signed 32-bit value, -2147483648 to 2147483647), while xs:integer represents the mathematical concept of integers with an unlimited value range.
    - All major objects now have an id attribute, namely <experiment>, <interaction>, <interactor>, <participant> and <feature>.
  - Method-related elements have been renamed for clarity:
    - ParticipantDetection -> ParticipantIdentificationMethod
    - InteractionDetection -> InteractionDetectionMethod
    - FeatureDetection -> FeatureDetectionMethod
  - namesType has been extended with an optional list of aliases.
  - Standard elements now appear in a fixed sequence. The new order is:
    - names
    - bibref
    - xref
    - other
    - attributeList
  - Created a new complex type, confidenceType, and used it for all previous occurrences of confidence elements.
  - Added attribute/nameAc to allow a controlled vocabulary for attributes.
  - Extended xrefType, which now allows a controlled vocabulary representation of database xrefs.
  - For all string attributes and elements, the minimum length has been set to 1. This avoids empty attributes and elements, which could cause problems in data exchange.
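The id rules described under Administrative changes lend themselves to a simple automated check. The following is a minimal sketch in Python using the standard-library ElementTree; the function check_entry_ids is hypothetical, and it assumes (1) that ids must be unique across all id-bearing elements of an <entry>, not only per element type, and (2) that <experimentDescription> is the element meant by <experiment> in the list above.

```python
import re
import xml.etree.ElementTree as ET

# Value range of the XML Schema type xs:int (a signed 32-bit integer).
XS_INT_MIN = -2**31
XS_INT_MAX = 2**31 - 1

# Elements that carry an id attribute in MIF 2.5 according to the list above.
# Assumption: "<experiment>" refers to <experimentDescription>.
ID_ELEMENTS = {"experimentDescription", "interaction", "interactor",
               "participant", "feature"}

# Lexical form of xs:int: optional sign, ASCII digits, surrounding whitespace.
INT_PATTERN = re.compile(r"\s*[+-]?[0-9]+\s*")


def local_name(tag):
    """Drop a '{namespace}' prefix so the check works with or without namespaces."""
    return tag.rsplit("}", 1)[-1]


def check_entry_ids(entry):
    """Hypothetical checker: list id problems found inside one <entry> element."""
    problems = []
    seen = {}  # id value -> name of the element that first used it
    for elem in entry.iter():
        if not isinstance(elem.tag, str):
            continue  # skip comments and processing instructions
        name = local_name(elem.tag)
        if name not in ID_ELEMENTS or "id" not in elem.attrib:
            continue
        raw = elem.attrib["id"]
        if not INT_PATTERN.fullmatch(raw):
            problems.append(f"{name} id {raw!r} is not an integer")
            continue
        value = int(raw)
        if not XS_INT_MIN <= value <= XS_INT_MAX:
            problems.append(f"{name} id {value} is outside the xs:int range")
        elif value in seen:
            # Assumption: ids are unique across all object types in the entry.
            problems.append(f"{name} id {value} duplicates the id of an earlier <{seen[value]}>")
        else:
            seen[value] = name
    return problems


entry = ET.fromstring(
    '<entry><interactor id="2"/><interactor id="2"/>'
    '<interaction id="3000000000"/></entry>'
)
for problem in check_entry_ids(entry):
    print(problem)
```

Run on the small example entry, the sketch reports the second interactor as a duplicate id and the interaction id as lying outside the xs:int range. A real validator would more likely rely on the published schema; the point here is only to make the xs:int and per-entry uniqueness rules concrete.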
Controlled vocabulary changes
The major change from PSI 1.0 to 2.5 requires a remapping of controlled vocabularies. Proposed mappings from PSI 1.0 to 2.5 CVs are described in cv-1to25mapping.doc. The reverse mapping is described in cv-25to1mapping.txt. This file is presented in plain text format to facilitate parsing.
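Besides the CV remapping, converting a 1.0 file also involves the one-to-one element renames listed under Format changes. As a minimal sketch (Python, standard-library ElementTree), those pure renames could be applied as follows. The lowerCamelCase tag spellings and the helper rename_elements are assumptions for illustration. The sketch deliberately leaves out the structural changes (role split, range restructuring, interactorType, new id type, namespace change), so its output is not yet valid MIF 2.5.

```python
import xml.etree.ElementTree as ET

# One-to-one element renames from MIF 1.0 to 2.5, taken from the change list.
# Assumption: the actual XML tags use these lowerCamelCase spellings.
RENAMES_1_TO_25 = {
    "proteinInteractor": "interactor",
    "proteinParticipant": "participant",
    "featureDescription": "featureType",
    "featureLocation": "featureRange",
    "participantDetection": "participantIdentificationMethod",
    "interactionDetection": "interactionDetectionMethod",
    "featureDetection": "featureDetectionMethod",
}


def rename_elements(root):
    """Hypothetical helper: rename 1.0 tags in place and return how many changed.

    Any '{namespace}' prefix is kept unchanged; switching to the 2.5
    namespace and all structural changes are out of scope here.
    """
    renamed = 0
    for elem in root.iter():
        if not isinstance(elem.tag, str):
            continue  # skip comments and processing instructions
        prefix, brace, name = elem.tag.rpartition("}")
        new_name = RENAMES_1_TO_25.get(name)
        if new_name is not None:
            elem.tag = prefix + brace + new_name
            renamed += 1
    return renamed


doc = ET.fromstring(
    "<entry><proteinInteractor id='1'/>"
    "<experimentDescription><participantDetection/></experimentDescription></entry>"
)
print(rename_elements(doc), ET.tostring(doc, encoding="unicode"))
```

Keeping the renames in a plain dictionary makes the mapping easy to review against the change list and to extend if further pure renames are identified.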
Henning Hermjakob, 21/11/2005