Molecular Interaction
XML Format Documentation
Version 2.0
October 20, 2004
Introduction
The Proteomics Standards Initiative (PSI) aims to define community
standards for data representation in proteomics to facilitate data
comparison, exchange and verification. For detailed information on all
PSI activities, please see http://psidev.sf.net.
This document describes the molecular interaction data exchange
format. PSI is following a leveled approach to building this
specification. This document describes level 2.0 beta. For documentation of
the previous level 1.0 please see
http://psidev.sourceforge.net/mi/xml/doc/user/.
Level 2.0 beta was never officially released.
Significant changes are currently being implemented and will be released in autumn 2005 as version
2.5.
PSI MI was designed by a consortium of molecular interaction data
providers from both academia and industry, including BIND, DIP,
IntAct, MINT, MIPS, GlaxoSmithKline, CellZome, Hybrigenics, the
Universities of Bielefeld, Bordeaux, and Cambridge, and others.
Purpose of the PSI MI XML format
The PSI MI format is a data exchange format for molecular
interactions. It is not a proposed database structure. Intended uses
are described in the use case documentation. These use case
descriptions also provide hints for future tools to be developed.
Purpose of this document
The purpose of this document is to describe the general structure of
the PSI MI XML specification in a more user-friendly manner than the
specification itself does. For the detailed and most up-to-date
description, please see the auto-generated documentation. This
documentation also provides additional information, e.g. sample data
files and use case descriptions.
The XML schema is located at http://psidev.sourceforge.net/mi/rel2/src/MIF2.xsd
Release schedule
PSI MI 2.0 beta will not be released. Significant changes are now being
implemented and will lead to version 2.5, to be released in autumn 2005.
Structure of a PSI MI record
The root element of a PSI MI XML file is the entrySet. An entrySet
contains one or more entries. Each entry is a self-contained unit. This
makes it easy to concatenate the contents of multiple files into a
single file by simply adding all the entries to one entrySet.
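The concatenation described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the element names entrySet and entry come from this document, but the sample files are invented, carry no XML namespace or attributes, and the helper name merge_entry_sets is hypothetical.

```python
import xml.etree.ElementTree as ET


def merge_entry_sets(documents):
    """Combine the entry elements of several entrySet documents into one entrySet."""
    merged = ET.Element("entrySet")
    for text in documents:
        root = ET.fromstring(text)
        # Each entry is self-contained, so it can be moved over unchanged.
        for entry in root.findall("entry"):
            merged.append(entry)
    return merged


# Two invented, heavily simplified files (no namespace, no attributes).
file_a = "<entrySet><entry><source/></entry></entrySet>"
file_b = "<entrySet><entry><source/></entry><entry><source/></entry></entrySet>"

merged = merge_entry_sets([file_a, file_b])
print(len(merged.findall("entry")))  # prints 3
```

A real file would need namespace-aware lookups; the point here is only that no cross-entry bookkeeping is required.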

Figure 1: The entry top level element
Each entry describes one or more protein interactions. The PSI MI
format can be used in two forms, a compact and an expanded form. In the
compact form, all interactors (proteins), experiments, and availability
statements are described once in the respective list elements, and then
only referred to by references from the individual interactions in the
interactionList. The compact form allows a dense, non-repetitive
representation of the data, in particular for large data sets.
In the expanded form, all proteins, experiments, and availability
statements are described directly in the interaction element. As a
result, each interaction is a self-contained element providing all
necessary information. The expanded form results in larger files, but
is more suitable for conversion to displayed data, e.g. HTML pages. The
PSI MI consortium provides tools to convert the compact into the
expanded form and back.
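To make the difference between the two forms concrete, the following Python sketch expands the interactor references of a compact entry. It is not the consortium's conversion tool. The element names interactorList, interactor, interactionList, interaction and participant are taken from this document; the id and ref attributes, the interactorRef and names elements, and the function name expand_interactors are assumptions made for illustration, and the authoritative names are those in the schema.

```python
import copy
import xml.etree.ElementTree as ET

# Invented compact-form entry; "id", "ref", "interactorRef" and "names" are
# hypothetical and only illustrate the referencing mechanism.
COMPACT = """
<entry>
  <interactorList>
    <interactor id="p1"><names>Protein A</names></interactor>
    <interactor id="p2"><names>Protein B</names></interactor>
  </interactorList>
  <interactionList>
    <interaction>
      <participant><interactorRef ref="p1"/></participant>
      <participant><interactorRef ref="p2"/></participant>
    </interaction>
  </interactionList>
</entry>
"""


def expand_interactors(entry):
    """Replace each interactorRef with a copy of the interactor it points to."""
    by_id = {i.get("id"): i for i in entry.findall("interactorList/interactor")}
    # Materialise the participants first so the tree is not edited mid-traversal.
    for participant in list(entry.iter("participant")):
        for ref in participant.findall("interactorRef"):
            position = list(participant).index(ref)
            participant.remove(ref)
            participant.insert(position, copy.deepcopy(by_id[ref.get("ref")]))
    # Once every reference is inlined, the shared list is no longer needed.
    entry.remove(entry.find("interactorList"))
    return entry


entry = expand_interactors(ET.fromstring(COMPACT))
print(ET.tostring(entry, encoding="unicode"))
```

Experiments and availability statements could be expanded the same way. The reverse conversion would collect identical descriptions into the list elements and replace them with references.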
The following paragraphs describe the top-level elements shown in
Figure 1 and their functions.
The source element describes the source of the entry, usually the
organisation which provides it. It also contains a release (number) and
a releaseDate.
The availabilityList provides statements on the availability of the
data, usually copyright statements. In the current version, the
availability statements are free text. The PSI MI format might later be
extended to provide predefined availability statements.
The experimentList contains experimentDescriptions. Each
experimentDescription describes one set of experimental parameters,
usually associated with a single publication. In large-scale
experiments, normally only one parameter is varied across a series of
experiments, usually the bait. The PSI MI format describes the constant
parameters, e.g. experimental techniques, in an experimentDescription,
while the variable parameters, e.g. the bait, are described in the
interaction element.
The interactorList describes a set of interactors participating in an
interaction. The interactor element describes the "normal" form of a
protein, consisting of "administrative" data such as name and
cross-references, together with the organism and amino acid sequence.
Attributes which are relevant for a specific interaction, in particular
sequence features, are described in the participant element within an
interaction.

Figure 2: Interaction element
*** To be further detailed ***
see http://psidev.sourceforge.net/mi/rel2/doc/auto/MIF2.html
Use of external controlled vocabularies
Where possible, external controlled vocabularies are referenced from
PSI MI. External controlled vocabularies are used in two forms:
- Open controlled vocabularies: We think that no existing controlled
vocabulary provides all necessary terms for the given attribute in the
PSI MI format. In this case, it is up to the data provider to choose a
controlled vocabulary, or to provide a free text string if no
appropriate controlled vocabulary exists.
- Closed controlled vocabularies: We think that there is a controlled
vocabulary which appropriately covers all necessary terms for the given
attribute. In this case, only terms from the defined vocabulary should
be used.
The closed controlled vocabularies referenced by PSI MI are listed in
the table below. All vocabularies are contained in a pair of files in
Gene Ontology flat file format: psi-mi2.dag and psi-mi2.def. The
correctness of references to external controlled vocabularies is
currently not enforced by the PSI MI schema. It is the responsibility
of the data provider to ensure that only existing terms at an
up-to-date data source are referenced.
| PSI MI XML level 2 data element    | term name                               | PSI-MI identifier |
|                                    | participant identification method       | MI:0002           |
|                                    | interaction detection method            | MI:0001           |
|                                    | interaction type                        | MI:0190           |
|                                    | participant role (example: enzyme)      | MI:0500           |
|                                    | experimental role (example: bait)       | MI:0495           |
|                                    | experimental form (example: his-tagged) | MI:0505           |
| featureType/featureDetectionMethod | feature detection method                | MI:0003           |
| featureType/featureType            | feature type                            | MI:0116           |
| deleted terms                      | obsolete                                | -                 |
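Because the schema does not enforce the correctness of these references, a data provider may want to run a check before publishing a file. The Python sketch below shows one possible shape for such a check. The identifier pattern (MI: followed by four digits) is inferred from the identifiers in the table, the function name check_reference is hypothetical, and KNOWN_TERMS holds only the identifiers from the table. A real check would load every term from the vocabulary files, which is not shown here.

```python
import re

# Identifiers copied from the table of closed controlled vocabularies.
KNOWN_TERMS = {
    "MI:0001": "interaction detection method",
    "MI:0002": "participant identification method",
    "MI:0003": "feature detection method",
    "MI:0116": "feature type",
    "MI:0190": "interaction type",
    "MI:0495": "experimental role",
    "MI:0500": "participant role",
    "MI:0505": "experimental form",
}

# Assumed identifier shape, inferred from the examples in this document.
MI_ID = re.compile(r"MI:\d{4}")


def check_reference(term_id, known_terms=KNOWN_TERMS):
    """Classify a controlled vocabulary reference as ok, unknown or malformed."""
    if not MI_ID.fullmatch(term_id):
        return "malformed"
    if term_id not in known_terms:
        return "unknown"
    return "ok"


for candidate in ["MI:0001", "MI:9999", "mi-1"]:
    print(candidate, check_reference(candidate))
```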
List of planned features
Because we are following a leveled approach, we are interested in
knowing what the community wishes to be included in the next level.
The latest list of features to discuss/include in the future can be
found here:
http://sourceforge.net/tracker/?atid=511101&group_id=65472&func=browse
How to comment
If you would like to comment on this document or on the PSI MI XML
specification, please send an e-mail to:
psidev-mi-dev@lists.sourceforge.net
Available data
No data in PSI-MI 2.0 format is available yet. Sample data will be added soon.
Tools
No PSI-MI 2.0 compatible tools are available yet. They will be added soon.
Data submission
The following databases currently accept submissions of PSI MI
formatted interaction data:
Further information and
relevant links
Databases involved:
Companies involved:
Related Efforts: