Exploiting structure for event discovery using the MDI algorithm

Authors:
Martina Naughton
Affiliations:
University College Dublin, Ireland
Venue:
ACL '07 Proceedings of the 45th Annual Meeting of the ACL: Student Research Workshop
Year:
2007

Abstract

Effectively identifying events in unstructured text is difficult, largely because a single event can be expressed across several sentences. In this paper, we investigate clustering methods for grouping the text spans in a news article that refer to the same event. The key idea is to cluster the sentences using a novel distance metric that exploits regularities in the sequential structure of events within a document. Compared with a simple bag-of-words baseline, this approach yields a statistically significant increase in performance.
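
The abstract does not specify the structure-aware distance metric, so the following is only a minimal sketch of the kind of bag-of-words clustering baseline it mentions, not the paper's implementation. The function names, the tokenizer, the cosine distance, the single-link merge rule, and the 0.6 threshold are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from itertools import combinations


def bag_of_words(sentence):
    """Lowercase the sentence and count its word tokens (hypothetical tokenizer)."""
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty sentence as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def cluster_sentences(sentences, threshold=0.6, distance=cosine_distance):
    """Single-link agglomerative clustering over sentence indices.

    The closest pair of clusters is merged repeatedly until the closest
    pair is farther apart than `threshold` (an assumed stopping rule).
    """
    vectors = [bag_of_words(s) for s in sentences]
    clusters = [[i] for i in range(len(sentences))]

    def link(c1, c2):
        # Single link: distance between the two closest members.
        return min(distance(vectors[i], vectors[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: link(clusters[pair[0]], clusters[pair[1]]),
        )
        if link(clusters[x], clusters[y]) > threshold:
            break
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # x < y, so index x is unaffected
    return clusters
```

Because `distance` is a parameter, a metric that also accounts for where sentences occur in the document could be passed in place of `cosine_distance` without changing the clustering loop.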