Within the vocabulary used in a set of news stories, a minority of terms will be topic-specific, in that they occur largely or solely within the stories belonging to a common event. When applying unsupervised learning techniques such as clustering, it is useful to determine which words are event-specific and which topic each relates to. Continuous language models are used to model the generation of news stories over time, and two measures are derived from these models: bendiness, which indicates whether a word is event-specific, and shape distance, which indicates whether two terms are likely to relate to the same topic. These measures are used to construct a new clustering technique that identifies and characterises the underlying events within the news stream.
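
The idea of filtering terms by bendiness and grouping them by shape distance can be illustrated with a small sketch. The abstract does not define either measure, so the definitions here are stand-in assumptions: each term is represented by discrete daily counts rather than a continuous language model, bendiness is taken as the Euclidean distance between a term's normalised time profile and a flat profile, and shape distance as the Euclidean distance between two normalised profiles. The function names, the thresholds, the greedy grouping step, and the example data are all hypothetical.

```python
from math import sqrt


def normalise(counts):
    """Scale a per-day count series so it sums to 1 (the term's 'shape')."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def bendiness(counts):
    """Assumed stand-in: distance of the term's shape from a flat (uniform) shape."""
    shape = normalise(counts)
    flat = 1.0 / len(shape)
    return sqrt(sum((p - flat) ** 2 for p in shape))


def shape_distance(counts_a, counts_b):
    """Assumed stand-in: Euclidean distance between two normalised shapes."""
    a, b = normalise(counts_a), normalise(counts_b)
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_terms(series, bend_min, dist_max):
    """Greedy grouping: drop flat terms, then attach each remaining term to the
    first cluster whose seed term lies within dist_max, else start a new cluster."""
    clusters = []
    for term, counts in series.items():
        if bendiness(counts) < bend_min:
            continue  # term is spread evenly over time, so not event-specific
        for cluster in clusters:
            if shape_distance(series[cluster[0]], counts) <= dist_max:
                cluster.append(term)
                break
        else:
            clusters.append([term])
    return clusters


# Hypothetical daily counts for five terms over six days.
series = {
    "the": [10, 10, 10, 10, 10, 10],
    "earthquake": [0, 8, 4, 1, 0, 0],
    "aftershock": [0, 6, 4, 2, 0, 0],
    "election": [0, 0, 0, 1, 5, 9],
    "ballot": [0, 0, 0, 2, 4, 8],
}

print(cluster_terms(series, bend_min=0.3, dist_max=0.3))
# [['earthquake', 'aftershock'], ['election', 'ballot']]
```

In this toy data the evenly distributed term "the" is discarded by the bendiness filter, and the remaining terms fall into two groups whose time profiles peak together. A single-pass greedy grouping is used only to keep the sketch short; it is sensitive to term order and is not claimed to be the clustering technique of the paper.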