Identifying events in unstructured text is difficult, largely because a single event can be expressed across several sentences. In this paper, we investigate clustering methods for grouping the text spans in a news article that refer to the same event. The key idea is to cluster sentences using a novel distance metric that exploits regularities in the sequential structure of events within a document. Compared with a simple bag-of-words baseline, this approach yields a statistically significant increase in performance.
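As a rough illustration of the general idea, the sketch below clusters the sentences of one document with a distance that blends bag-of-words dissimilarity with how far apart two sentences sit in the text. It is not the metric proposed in the paper, which the abstract does not specify. The positional term, the weight `alpha`, the average-link merging rule, and all function names are assumptions made for this example only. Setting `alpha=1.0` reduces the distance to a plain bag-of-words baseline.

```python
import math
from collections import Counter


def bow_cosine_distance(a, b):
    """Return 1 - cosine similarity of whitespace-tokenized word counts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return 1.0 - dot / norm if norm else 1.0


def combined_distance(sentences, i, j, alpha=0.5):
    """Hypothetical blend: lexical distance plus normalized positional gap.

    Assumption (not from the paper): sentences describing the same event
    tend to appear close together, so a large gap increases the distance.
    """
    lexical = bow_cosine_distance(sentences[i], sentences[j])
    positional = abs(i - j) / max(len(sentences) - 1, 1)
    return alpha * lexical + (1.0 - alpha) * positional


def cluster_sentences(sentences, k, alpha=0.5):
    """Average-link agglomerative clustering until k clusters remain.

    Returns a list of clusters, each a sorted list of sentence indices.
    """
    clusters = [[i] for i in range(len(sentences))]

    def linkage(c1, c2):
        total = sum(combined_distance(sentences, i, j, alpha)
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Find the pair of clusters with the smallest average distance.
        pairs = ((x, y) for x in range(len(clusters))
                 for y in range(x + 1, len(clusters)))
        a, b = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


if __name__ == "__main__":
    doc = [
        "A bomb exploded near the market on Monday",
        "The explosion near the market injured ten people",
        "Police later arrested two suspects in a raid",
        "The suspects were arrested during the raid at dawn",
    ]
    for cluster in cluster_sentences(doc, k=2):
        print(cluster)
```

A comparison against the baseline would run the same clustering with `alpha=1.0` and score both outputs against annotated event labels; the evaluation measure is left out here because the abstract does not name it.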