Each instance is marked with its location within the article it was retrieved from, and with any coreference or negated expressions that must be resolved for the MIM fact to be inferred. The corpus also captures the factual dependencies (synonyms and extra facts) that must be resolved to extract a fact completely. For example, a fact in the Results section may require a synonym that is defined in the Introduction.
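To make the annotation scheme concrete, here is a minimal Python sketch of how one annotated instance and its dependencies might be held in memory once loaded. The class names, field names, and dependency kinds are hypothetical, chosen only for illustration; they are not the element names used in the released XML, and the fact, sentence, and protein names in the usage example are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dependency:
    # Hypothetical kinds: "synonym", "extra_fact", "coreference", "negation"
    kind: str
    section: str  # article section where this dependency is resolved
    text: str     # the span that resolves it


@dataclass
class Instance:
    fact: str      # the MIM fact this instance supports
    section: str   # section the instance was retrieved from
    sentence: str
    dependencies: list[Dependency] = field(default_factory=list)

    def sections_needed(self) -> list[str]:
        """Return every section that must be read to infer the fact, in first-seen order."""
        ordered = [self.section]
        for dep in self.dependencies:
            if dep.section not in ordered:
                ordered.append(dep.section)
        return ordered


# Placeholder example: a Results-section fact whose protein name
# is only linked to its synonym in the Introduction.
inst = Instance(
    fact="X binds Y",
    section="Results",
    sentence="XB co-immunoprecipitated with Y.",
    dependencies=[
        Dependency("synonym", "Introduction", "X, also known as XB"),
        Dependency("coreference", "Results", "XB"),
    ],
)
print(inst.sections_needed())  # ['Results', 'Introduction']
```

Keeping the dependencies as a list on each instance makes it easy to measure how often a fact cannot be extracted from its own sentence or section alone.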
Version 0.9
A sample version of the data is available. It contains 484 annotated instances from the MIM descriptions: A1, A4, C2, C30, C36, E13, N4, N6, P21, and P36. The MIM corpus is provided in XML format.

Publications based on preliminary versions of the corpus:
- Tara McIntosh and James R. Curran. 2009. "Challenges for automatically extracting molecular interactions from full-text articles." BMC Bioinformatics.
- Tara McIntosh and James R. Curran. 2007. "Sentence retrieval for extracting biomedical knowledge." In Proceedings of the Conference of the Pacific Association for Computational Linguistics (PACLING). Melbourne, Australia.
- Tara McIntosh and James R. Curran. 2007. "Challenges for extracting biomedical knowledge from full text." In Proceedings of the Workshop on BioNLP (BioNLP). Prague, Czech Republic.