The analysis of blogs is emerging as an exciting new area in the text processing field, one that attempts to harness and exploit the vast quantity of information published by individuals. However, the particular characteristics of blogs (shortness, vocabulary size and nature, etc.) make it difficult to achieve good results with automated clustering techniques. Moreover, because many blogs may be considered narrow-domain, exploiting external linguistic resources can be of limited value. In this paper, we present a methodology that improves the performance of clustering techniques on blogs without relying on external resources. Our results show that this technique can significantly improve the quality of the resulting clusters.