Improving the Clustering of Blogosphere with a Self-term Enriching Technique

Authors:
Fernando Perez-Tellez; David Pinto; John Cardiff; Paolo Rosso
Affiliations:
Institute of Technology Tallaght, Social Media Research Group, Dublin, Ireland; Benemérita Universidad Autónoma de Puebla, Mexico; Institute of Technology Tallaght, Social Media Research Group, Dublin, Ireland; Natural Language Engineering Lab. - EliRF, Dept. Sistemas Informáticos y Computación, Universidad Politécnica, Valencia, Spain
Venue:
TSD '09 Proceedings of the 12th International Conference on Text, Speech and Dialogue
Year:
2009

Abstract

The analysis of blogs is emerging as an exciting new area of text processing that attempts to harness and exploit the vast quantity of information published by individuals. However, the particular characteristics of blogs (short length, vocabulary size and nature, etc.) make it difficult to achieve good results with automated clustering techniques. Moreover, because many blogs can be considered narrow-domain, exploiting external linguistic resources can be of limited value. In this paper, we present a methodology that improves the performance of clustering techniques on blogs without relying on external resources. Our results show that this technique can produce significant improvements in the quality of the clusters produced.