Recent years have seen a vast increase in the volume of information published on weblog sites, along with new web technologies through which people discuss current events. The need for automatic tools to organize this mass of information is clear, but particular characteristics of weblogs, such as short posts and overlapping vocabulary, make the task difficult. In this work, we present a novel methodology for clustering weblog posts according to the topics they discuss. The methodology combines a generative probabilistic model with a Self-Term Expansion methodology. Our results demonstrate a considerable improvement over the baseline.
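The abstract does not say how Self-Term Expansion is computed, so the following is only an illustrative sketch under stated assumptions. It assumes that each term of a short post is expanded with terms that co-occur with it in the same collection, and that co-occurrence is scored with document-level pointwise mutual information. The function names `build_expansion_table` and `expand`, the `min_pmi` threshold, and the PMI scoring itself are hypothetical choices made for this example, not details taken from the paper. The generative probabilistic model is not shown.

```python
import math
from collections import Counter
from itertools import combinations


def build_expansion_table(docs, min_pmi=0.0):
    """Map each term to the terms it co-occurs with in the same collection.

    Assumption: association is measured by document-level pointwise mutual
    information (PMI); pairs with PMI above `min_pmi` are kept.
    `docs` is a list of token lists, one per post.
    """
    n_docs = len(docs)
    term_df = Counter()   # number of posts containing each term
    pair_df = Counter()   # number of posts containing each unordered term pair
    for tokens in docs:
        vocab = sorted(set(tokens))
        term_df.update(vocab)
        pair_df.update(combinations(vocab, 2))

    table = {}
    for (a, b), n_ab in pair_df.items():
        # PMI = log2( P(a, b) / (P(a) * P(b)) ), estimated from post counts.
        pmi = math.log2(n_ab * n_docs / (term_df[a] * term_df[b]))
        if pmi > min_pmi:
            table.setdefault(a, []).append(b)
            table.setdefault(b, []).append(a)
    return table


def expand(tokens, table):
    """Return the post's tokens followed by the related terms of each token."""
    expanded = list(tokens)
    for term in tokens:
        expanded.extend(table.get(term, []))
    return expanded
```

Because the expansion table is built from the collection being clustered rather than from an external thesaurus, a short post gains vocabulary from related posts, which is one plausible way to counter the shortness problem the abstract mentions. The expanded token lists would then be passed to whatever clustering or topic model is used.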