Tara McIntosh

Molecular Interaction Map Corpus

The Molecular Interaction Map (MIM) corpus consists of manually annotated passages (instances) from full-text articles that describe interactions summarised by Kohn (1999) in a Molecular Interaction Map.

Each instance is marked with its location within the article it was retrieved from, and with any coreference or negated expressions that must be resolved before the MIM fact can be inferred. The corpus also captures the factual dependencies (synonyms and extra facts) that must be resolved to extract a fact completely. For example, a fact in the Results section may rely on a synonym that is defined in the Introduction.

Version 0.9

A sample version of the data is available. It contains 484 annotated instances from the MIM descriptions: A1, A4, C2, C30, C36, E13, N4, N6, P21, and P36. The MIM corpus is provided in XML format.
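
As a rough illustration of how instances of this kind could be read from XML, here is a minimal Python sketch. The corpus schema is not described here, so every element and attribute name below (`corpus`, `instance`, `mim`, `section`, `negated`, `text`, `dependency`, `type`) is a hypothetical stand-in, and the sample passages are placeholders rather than corpus data. Adjust the names to match the distributed files.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document. The tag and attribute names are invented for
# illustration and are NOT the actual MIM corpus schema; the passages are
# placeholders, not real annotations.
SAMPLE = """
<corpus>
  <instance id="i1" mim="A1" section="Results" negated="false">
    <text>Placeholder passage that uses a pronoun and an abbreviation.</text>
    <dependency type="coreference" section="Results"/>
    <dependency type="synonym" section="Introduction"/>
  </instance>
  <instance id="i2" mim="C2" section="Abstract" negated="true">
    <text>Placeholder passage containing a negated expression.</text>
  </instance>
</corpus>
"""


def load_instances(xml_text):
    """Parse the (assumed) XML layout into a list of plain dictionaries."""
    root = ET.fromstring(xml_text.strip())
    instances = []
    for node in root.iter("instance"):
        instances.append({
            "id": node.get("id"),
            "mim": node.get("mim"),            # MIM description label, e.g. "A1"
            "section": node.get("section"),    # where in the article it was found
            "negated": node.get("negated") == "true",
            "text": node.findtext("text", default=""),
            # Each dependency: (kind, section in which it is resolved)
            "dependencies": [
                (dep.get("type"), dep.get("section"))
                for dep in node.findall("dependency")
            ],
        })
    return instances


instances = load_instances(SAMPLE)

# Instances whose fact cannot be extracted without resolving something else.
needs_resolution = [inst["id"] for inst in instances if inst["dependencies"]]
```

Under these assumptions, `needs_resolution` lists the instances that depend on a coreference, synonym, or extra fact elsewhere in the article, which is the kind of dependency the corpus is designed to record.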

Publications based on preliminary versions of the corpus:

Tara McIntosh and James R. Curran. 2009. "Challenges for automatically extracting molecular interactions from full-text articles." In BMC Bioinformatics.
Tara McIntosh and James R. Curran. 2007. "Sentence retrieval for extracting biomedical knowledge." In Proceedings of the Conference of the Pacific Association for Computational Linguistics (PACLING). Melbourne, Australia.
Tara McIntosh and James R. Curran. 2007. "Challenges for extracting biomedical knowledge from full text." In Proceedings of the Workshop on BioNLP (BioNLP). Prague, Czech Republic.