Tara McIntosh

Projects

This page describes a number of my research projects.

Weighted Mutual Exclusion Bootstrapping (WMEB)

WMEB is an unsupervised bootstrapping algorithm for automatically extracting semantic lexicons from raw biomedical literature. Previous approaches suffer from semantic drift, where a lexicon's meaning shifts during bootstrapping. WMEB prevents semantic drift by extracting multiple competing classes simultaneously and exploiting statistical measures of association strength.
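To make the mutual exclusion idea concrete, here is a minimal sketch of one bootstrapping iteration over two competing categories. It is an illustration under stated assumptions, not the WMEB implementation. The function name `mutual_exclusion_step`, the `(pattern, term)` input format and the toy data are all hypothetical. Candidates are scored by a simple count of matching patterns, which stands in for the statistical association measures mentioned above.

```python
from collections import defaultdict


def mutual_exclusion_step(lexicons, contexts, top_k=1):
    """Run one simplified bootstrapping iteration (hypothetical helper).

    lexicons: dict mapping category name -> set of known terms
    contexts: iterable of (pattern, term) pairs observed in a corpus
    """
    contexts = set(contexts)

    # Collect the patterns that co-occur with each category's current terms.
    patterns = defaultdict(set)
    for pattern, term in contexts:
        for cat, terms in lexicons.items():
            if term in terms:
                patterns[cat].add(pattern)

    # Mutual exclusion on patterns: drop patterns claimed by 2+ categories.
    claimed = defaultdict(set)
    for cat, pats in patterns.items():
        for p in pats:
            claimed[p].add(cat)
    exclusive = {
        cat: {p for p in pats if len(claimed[p]) == 1}
        for cat, pats in patterns.items()
    }

    # Score unseen terms by the number of exclusive patterns they match.
    # (Assumption: a plain count replaces the association-strength weighting.)
    known = set().union(*lexicons.values())
    scores = {cat: defaultdict(int) for cat in lexicons}
    for pattern, term in contexts:
        if term in known:
            continue
        for cat in lexicons:
            if pattern in exclusive.get(cat, set()):
                scores[cat][term] += 1

    # Mutual exclusion on terms: drop candidates proposed by 2+ categories.
    proposers = defaultdict(set)
    for cat, cat_scores in scores.items():
        for term in cat_scores:
            proposers[term].add(cat)

    updated = {}
    for cat, cat_scores in scores.items():
        candidates = [t for t in cat_scores if len(proposers[t]) == 1]
        candidates.sort(key=lambda t: (-cat_scores[t], t))
        updated[cat] = lexicons[cat] | set(candidates[:top_k])
    return updated


# Toy corpus: "levels of X" matches both categories, so it is discarded.
toy_contexts = [
    ("expression of X", "BRCA1"), ("expression of X", "TP53"),
    ("mutation in X", "BRCA1"), ("mutation in X", "TP53"),
    ("treated with X", "aspirin"), ("treated with X", "ibuprofen"),
    ("dose of X", "aspirin"), ("dose of X", "ibuprofen"),
    ("levels of X", "BRCA1"), ("levels of X", "aspirin"),
    ("levels of X", "insulin"),
]
toy_seeds = {"GENE": {"BRCA1"}, "DRUG": {"aspirin"}}
print(mutual_exclusion_step(toy_seeds, toy_contexts))
```

In this toy run the ambiguous term "insulin" is reachable only through a pattern shared by both categories, so neither lexicon absorbs it.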

Extensions of WMEB use bagging and distributional similarity techniques to further detect and prevent semantic drift.

The systems are domain-independent and significantly outperform previous approaches. See [ALTA08], [ACL09] and [PhD Thesis] for details.

NegFinder

NegFinder is an unsupervised algorithm for automatically detecting competing categories during bootstrapping. The discovered negative categories are then exploited to reduce semantic drift. Prior to this work, WMEB required a domain expert to manually craft negative competing categories.

NegFinder exploits the agglomerative process of hierarchical clustering to efficiently detect drifting categories. State-of-the-art results were published in [EMNLP10].
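The agglomerative process itself can be sketched as follows. This is a generic average-link clustering over toy context vectors, not NegFinder's method: how NegFinder turns clusters into negative categories is not described on this page. The function names, the `min_sim` stopping threshold and the vectors are assumptions made purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def agglomerate(vectors, min_sim):
    """Average-link agglomerative clustering (hypothetical helper).

    Starts with one cluster per term and repeatedly merges the most
    similar pair, stopping once the best similarity falls below min_sim.
    """
    clusters = [[name] for name in sorted(vectors)]

    def avg_sim(c1, c2):
        total = sum(cosine(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        sim, i, j = max(
            (
                (avg_sim(clusters[i], clusters[j]), i, j)
                for i in range(len(clusters))
                for j in range(i + 1, len(clusters))
            ),
            key=lambda item: item[0],
        )
        if sim < min_sim:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]


# Toy vectors: counts of each term in four hypothetical context patterns.
toy_vectors = {
    "BRCA1": [5, 4, 0, 0],
    "TP53": [4, 5, 0, 0],
    "mouse": [0, 0, 5, 4],
    "rat": [0, 0, 4, 5],
}
print(agglomerate(toy_vectors, min_sim=0.5))
```

Here the terms of a lexicon that has started to absorb unrelated words separate into two groups, one of which could serve as a candidate competing category.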

Relation Guided Bootstrapping (RGB)

RGB is the first bootstrapping algorithm that automatically discovers open relationships between the target semantic categories. By simultaneously extracting lexicons and their open relations, it removes the need for manually crafted category and relationship constraints. State-of-the-art results will be presented at ACL 2011.

MaxConf

As part of my final undergraduate year, I developed new Association Rule (AR) mining algorithms to extract gene relationships from microarray data.

At the time, existing microarray data mining techniques suffered from two major weaknesses: a restriction on the number of genes that could be included in the analysis, and the assumption that only common gene relationships are of interest.

My research addressed both weaknesses. More specifically, I developed the first comprehensive AR algorithm, MaxConf, which can mine dense data with no support threshold (traditionally used to prune uncommon relationships) by incorporating new confidence threshold properties.
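To illustrate what mining by confidence alone means, here is a brute-force sketch restricted to single-item rules a -> b. It is not MaxConf: the confidence threshold properties that make MaxConf efficient are not described on this page, so none are modelled. The function name `confident_pair_rules` and the toy gene sets are hypothetical.

```python
from itertools import permutations


def confident_pair_rules(transactions, min_conf):
    """Return {(a, b): confidence} for rules a -> b (hypothetical helper).

    confidence(a -> b) = count(a and b) / count(a).
    No support threshold is applied, so rare rules are kept as long
    as they are confident enough.
    """
    item_counts = {}
    pair_counts = {}
    for transaction in transactions:
        items = set(transaction)
        for a in items:
            item_counts[a] = item_counts.get(a, 0) + 1
        for a, b in permutations(sorted(items), 2):
            pair_counts[(a, b)] = pair_counts.get((a, b), 0) + 1

    rules = {}
    for (a, b), n_ab in pair_counts.items():
        conf = n_ab / item_counts[a]
        if conf >= min_conf:
            rules[(a, b)] = conf
    return rules


# Toy data: each set lists the genes expressed together in one sample.
toy_samples = [
    {"g1", "g2"}, {"g1", "g2", "g3"}, {"g3"}, {"g3"},
    {"g3"}, {"g4", "g5"}, {"g3"}, {"g3"},
]
for rule, conf in sorted(confident_pair_rules(toy_samples, 0.9).items()):
    print(rule, conf)
```

The rule g4 -> g5 appears in only one of the eight samples, so a typical support threshold would prune it, yet its confidence is 1.0 and it survives here.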

Some of the work and data are open source and available on request.