Proteomics Standards Initiative

Molecular Interaction XML Format Documentation

Version 2.0

October 20, 2004


Table of Contents

  1. Introduction
  2. Purpose of the PSI MI XML format
  3. Purpose of this document
  4. Release schedule
  5. Changes from PSI MI 1.0 to 2.0
  6. Structure of a PSI MI record
  7. Detailed Documentation
  8. Use of external controlled vocabularies
  9. List of planned features
  10. How to comment
  11. Available data
  12. Tools
  13. Data submission
  14. Further information and relevant links

Introduction

The Proteomics Standards Initiative (PSI) aims to define community standards for data representation in proteomics to facilitate data comparison, exchange and verification. For detailed information on all PSI activities, please see http://psidev.sf.net.

This document describes the molecular interaction data exchange format. PSI is following a leveled approach to building this specification. This document describes level 2.0 beta. For documentation of the previous level 1.0, please see http://psidev.sourceforge.net/mi/xml/doc/user/.

Level 2.0 beta was never officially released. Significant changes are currently being implemented and will be released in autumn 2005 as version 2.5.

PSI MI was designed by a consortium of molecular interaction data providers from both academia and industry, including BIND, DIP, IntAct, MINT, MIPS, GlaxoSmithKline, CellZome, Hybrigenics, the Universities of Bielefeld, Bordeaux, and Cambridge, and others.

Purpose of the PSI MI XML format

The PSI MI format is a data exchange format for molecular interactions. It is not a proposed database structure. Intended usages are described by the use cases documentation. These use case descriptions also provide hints for future tools to be developed.

Purpose of this document

The purpose of this document is to describe the general structure of the PSI MI XML specification in a more user-friendly manner than the specification does itself. For the detailed and most up-to-date description please see the auto-generated documentation. This documentation also provides additional information, e.g. sample data files and use case descriptions.
The XML schema is located at http://psidev.sourceforge.net/mi/rel2/src/MIF2.xsd

Release schedule

PSI MI 2.0 beta will not be released. Significant changes are now being implemented and will lead to version 2.5, to be released in autumn 2005.

Changes from PSI MI 1.0 to 2.0

Structure of a PSI MI record

The root element of a PSI MI XML file is the entrySet. An entrySet contains one or more entries. Each entry is a self-contained unit, so the contents of multiple files can be concatenated into a single file simply by adding all their entries to one entrySet.
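
The concatenation described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the toy documents use the element names entrySet, entry and source without any namespace or attributes, which real PSI MI files would carry, and the function name merge_entry_sets is hypothetical.

```python
import xml.etree.ElementTree as ET


def merge_entry_sets(xml_documents):
    """Collect the entries of several entrySet documents in one entrySet."""
    merged = ET.Element("entrySet")
    for text in xml_documents:
        root = ET.fromstring(text)
        # Each entry is self-contained, so it can be moved over unchanged.
        for entry in root.findall("entry"):
            merged.append(entry)
    return merged


# Toy inputs (assumption: no namespace, no attributes, empty source elements).
doc_a = "<entrySet><entry><source/></entry></entrySet>"
doc_b = "<entrySet><entry><source/></entry><entry><source/></entry></entrySet>"

merged = merge_entry_sets([doc_a, doc_b])
entry_count = len(merged.findall("entry"))
print(entry_count)
```

Because no entry refers to anything outside itself, the merge needs no renumbering or reference fixing in this sketch; a real tool would additionally have to preserve the schema's namespace and root attributes.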


Figure 1: The entry top level element

Each entry describes one or more protein interactions. The PSI MI format can be used in two forms, a compact and an expanded form. In the compact form, all interactors (proteins), experiments, and availability statements are described once in the respective list elements, and the individual interactions in the interactionList only refer to them. The compact form allows a dense, non-repetitive representation of the data, in particular for large data sets.
In the expanded form, all proteins, experiments, and availability statements are described directly in the interaction element. As a result, each interaction is a self-contained element providing all necessary information. The expanded form results in larger files, but is more suitable for conversion to displayed data, e.g. HTML pages. The PSI MI consortium provides tools to convert the compact into the expanded form and back.
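
The relationship between the two forms can be illustrated with plain Python dictionaries instead of XML. This is a minimal sketch, not the consortium's conversion tool: the keys id, name, method, experimentRefs and interactorRefs are hypothetical stand-ins for whatever identifiers and reference elements the schema actually defines.

```python
def expand(entry):
    """Return the interactions with every reference replaced by its description."""
    # Index the shared descriptions once by their (hypothetical) id key.
    interactors = {i["id"]: i for i in entry["interactorList"]}
    experiments = {e["id"]: e for e in entry["experimentList"]}
    expanded = []
    for interaction in entry["interactionList"]:
        expanded.append({
            "experiments": [experiments[r] for r in interaction["experimentRefs"]],
            "participants": [interactors[r] for r in interaction["interactorRefs"]],
        })
    return expanded


# Compact form: each protein and experiment is described exactly once.
compact_entry = {
    "experimentList": [{"id": "e1", "method": "two hybrid"}],
    "interactorList": [
        {"id": "p1", "name": "protein A"},
        {"id": "p2", "name": "protein B"},
        {"id": "p3", "name": "protein C"},
    ],
    "interactionList": [
        {"experimentRefs": ["e1"], "interactorRefs": ["p1", "p2"]},
        {"experimentRefs": ["e1"], "interactorRefs": ["p1", "p3"]},
    ],
}

expanded_interactions = expand(compact_entry)
```

In the expanded result, "protein A" and the experiment description appear in both interactions, which shows why the expanded form is larger but lets each interaction be rendered without looking anything up.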

In the next sections, the top level elements shown in Figure 1 and their function will be described.

The source element describes the source of the entry, usually the organisation which provides it. It also contains a release (number) and a releaseDate.

The availabilityList provides statements on the availability of the data, usually copyright statements. In the current version, the availability statements are free text. The PSI MI format might later be extended to provide predefined availability statements.

The experimentList contains experimentDescriptions. Each experimentDescription describes one set of experimental parameters, usually associated with a single publication. In large-scale experiments, normally only one parameter is varied across a series of experiments, usually the bait. The PSI MI format describes the constant parameters, e.g. experimental techniques, in an experimentDescription, while the variable parameters, e.g. the bait, are described in the interaction element.

The interactorList describes a set of interactors participating in an interaction. The interactor element describes the "normal" form of a protein, consisting of "administrative" data such as name and cross-references, as well as organism and amino acid sequence. Attributes which are relevant for a specific interaction, in particular sequence features, are described in the participant element within an interaction.


Figure 2: Interaction element

*** To be further detailed ***

Detailed Documentation

See http://psidev.sourceforge.net/mi/rel2/doc/auto/MIF2.html

Use of external controlled vocabularies

Where possible, external controlled vocabularies are referenced from PSI MI. External controlled vocabularies are used in two forms:
The closed controlled vocabularies referenced by PSI MI are listed in the table below. All vocabularies are contained in a pair of files in Gene Ontology flat file format: psi-mi2.dag and psi-mi2.def. The correctness of references to external controlled vocabularies is currently not enforced by the PSI MI schema. It is the responsibility of the data provider to ensure that only existing terms at an up-to-date data source are referenced.

PSI MI XML level 2 data element                                 Term name                                 PSI-MI identifier
experimentType/participantIdentificationMethod                  participant identification method         MI:0002
experimentType/interactionDetectionMethod                       interaction detection method              MI:0001
interactionElementType/interactionType                          interaction type                          MI:0190
interactionElementType/participantList/*/participantRole        participant role (Example: enzyme)        MI:0500
interactionElementType/experimentalFormList/*/experimentalRole  experimental role (Example: bait)         MI:0495
interactionElementType/experimentalFormList/*/experimentalForm  experimental form (Example: his-tagged)   MI:0505
featureType/featureDetectionMethod                              feature detection method                  MI:0003
featureType/featureType                                         feature type                              MI:0116
deleted terms                                                   obsolete                                  -
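
Since the schema does not enforce the correctness of controlled vocabulary references, a data provider may want to check them before release. The following Python sketch shows one possible check. It rests on several assumptions: KNOWN_TERMS holds only the identifiers listed in the table above as a stand-in for the full term list in psi-mi2.dag and psi-mi2.def (which are not parsed here), the identifier pattern "MI:" plus four digits is inferred from those examples, and find_invalid_references is a hypothetical helper name.

```python
import re

# Stand-in for the complete vocabulary (assumption: table entries only).
KNOWN_TERMS = {
    "MI:0001", "MI:0002", "MI:0003", "MI:0116",
    "MI:0190", "MI:0495", "MI:0500", "MI:0505",
}

# Identifier shape inferred from the examples in the table (assumption).
MI_ID = re.compile(r"^MI:\d{4}$")


def find_invalid_references(referenced_ids, known_terms):
    """Return (identifier, reason) pairs for references that fail the check."""
    problems = []
    for term_id in referenced_ids:
        if not MI_ID.match(term_id):
            problems.append((term_id, "malformed identifier"))
        elif term_id not in known_terms:
            problems.append((term_id, "unknown term"))
    return problems


problems = find_invalid_references(["MI:0190", "MI:9999", "mi-18"], KNOWN_TERMS)
for term_id, reason in problems:
    print(term_id, reason)
```

A production check would load the term list from an up-to-date copy of the vocabulary files rather than from a hard-coded set.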

List of planned features

Because we are following a leveled approach, we are interested in knowing what the community wishes to be included in the next level.

The latest list of features to discuss/include in the future can be found here:
http://sourceforge.net/tracker/?atid=511101&group_id=65472&func=browse

How to comment

If you would like to comment on this document or on the PSI MI XML specification, please send a mail to:
psidev-mi-dev@lists.sourceforge.net 

Available data

No data in PSI-MI 2.0 is available yet. Sample data will be added soon.

Tools

No PSI-MI 2.0 compatible tools are available yet. They will be added soon.

Data submission

The following databases currently accept submissions of PSI MI formatted interaction data:

Further information and relevant links

Databases involved:

Companies involved:

Related efforts: