Clustering short texts using wikipedia

Authors:
Somnath Banerjee;Krishnan Ramanathan;Ajay Gupta
Affiliations:
Hewlett-Packard Labs, Bangalore, India;Hewlett-Packard Labs, Bangalore, India;Hewlett-Packard Labs, Bangalore, India
Venue:
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2007

Citing 2
Cited 47

Automatic text processing

Automatic text processing
Ontologies Improve Text Document Clustering

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining

Learning to classify short and sparse text & web with hidden topics from large-scale data collections

Proceedings of the 17th international conference on World Wide Web
Learning to link with wikipedia

Proceedings of the 17th ACM conference on Information and knowledge management
Growing Fields of Interest - Using an Expand and Reduce Strategy for Domain Model Extraction

WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
Clustering Documents Using a Wikipedia-Based Concept Representation

PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Exploiting Wikipedia as external knowledge for document clustering

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Web Search Clustering and Labeling with Hidden Topics

ACM Transactions on Asian Language Information Processing (TALIP)
Mining meaning from Wikipedia

International Journal of Human-Computer Studies
Exploiting internal and external semantics for the clustering of short texts using world knowledge

Proceedings of the 18th ACM conference on Information and knowledge management
New Labeling Strategy for Semi-supervised Document Categorization

KSEM '09 Proceedings of the 3rd International Conference on Knowledge Science, Engineering and Management
Web opinions analysis with scalable distance-based clustering

ISI'09 Proceedings of the 2009 IEEE international conference on Intelligence and security informatics
Data clustering: 50 years beyond K-means

Pattern Recognition Letters
Text clustering with important words using normalization

Proceedings of the 10th annual joint conference on Digital libraries
Short text classification in twitter to improve information filtering

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Collaboration analytics: mining work patterns from collaboration activities

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
On the difficulty of clustering company tweets

SMUC '10 Proceedings of the 2nd international workshop on Search and mining user-generated contents
A self-supervised approach for extraction of attribute-value pairs from wikipedia articles

SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval
Comparative study of clustering techniques for short text documents

Proceedings of the 20th international conference companion on World wide web
High-order co-clustering text data on semantics-based representation model

PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
A multi-layer text classification framework based on two-level representation model

Expert Systems with Applications: An International Journal
Text clustering based on granular computing and wikipedia

RSKT'11 Proceedings of the 6th international conference on Rough sets and knowledge technology
Discovering User Interest on Twitter with a Modified Author-Topic Model

WI-IAT '11 Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01
Enhancing accessibility of microblogging messages using semantic knowledge

Proceedings of the 20th ACM international conference on Information and knowledge management
A framework for joint community detection across multiple related networks

Neurocomputing
Topical clustering of search results

Proceedings of the fifth ACM international conference on Web search and data mining
Enriching short text representation in microblog for clustering

Frontiers of Computer Science in China
Mining wikipedia and yahoo! answers for question expansion in opinion QA

PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part I
Discovering collective viewpoints on micro-blogging events based on community and temporal aspects

ADMA'11 Proceedings of the 7th international conference on Advanced Data Mining and Applications - Volume Part I
Query phrase expansion using wikipedia in patent class search

AIRS'11 Proceedings of the 7th Asia conference on Information Retrieval Technology
A web 2.0 approach for organizing search results using wikipedia

AIRS'11 Proceedings of the 7th Asia conference on Information Retrieval Technology
Wikipedia-based smoothing for enhancing text clustering

AIRS'11 Proceedings of the 7th Asia conference on Information Retrieval Technology
Collective viewpoint identification of low-level participation

APWeb'12 Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications
Classification of short texts by deploying topical annotations

ECIR'12 Proceedings of the 34th European conference on Advances in Information Retrieval
Enhancement of co-authorship networks with content-similarity information

Proceedings of the International Conference on Advances in Computing, Communications and Informatics
CluChunk: clustering large scale user-generated content incorporating chunklet information

Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
Making your interests follow you on twitter

Proceedings of the 21st ACM international conference on Information and knowledge management
TCSST: transfer classification of short & sparse text using external data

Proceedings of the 21st ACM international conference on Information and knowledge management
Collaboratively built semi-structured content and Artificial Intelligence: The story so far

Artificial Intelligence
Wiki3C: exploiting wikipedia for context-aware concept categorization

Proceedings of the sixth ACM international conference on Web search and data mining
visualRSS: a platform to mine and visualise social data from RSS feeds

ICWE'12 Proceedings of the 12th international conference on Current Trends in Web Engineering
A document is known by the company it keeps: neighborhood consensus for short text categorization

Language Resources and Evaluation
Hyper Media News: a fully automated platform for large scale analysis, production and distribution of multimodal news content

Multimedia Tools and Applications
Enhancing short text clustering with small external repositories

AusDM '11 Proceedings of the Ninth Australasian Data Mining Conference - Volume 121
Probabilistic semantic similarity measurements for noisy short texts using Wikipedia entities

Proceedings of the 22nd ACM international conference on Information & knowledge management
Improving short text classification using public search engines

IUKM'13 Proceedings of the 2013 international conference on Integrated Uncertainty in Knowledge Modelling and Decision Making
Identification of collective viewpoints on microblogs

Data & Knowledge Engineering
Enhancing sentence-level clustering with ranking-based clustering framework for theme-based summarization

Information Sciences: an International Journal
An efficient Particle Swarm Optimization approach to cluster short texts

Information Sciences: an International Journal

Abstract

Subscribers to popular news or blog feeds (RSS/Atom) often face information overload, because these feeds usually deliver a large number of items periodically. One solution could be to cluster similar items in the feed reader, making the information more manageable for the user. Clustering items at the feed reader end is challenging, as usually only a small part of the actual article is received through the feed. In this paper, we propose a method of improving the accuracy of clustering short texts by enriching their representation with additional features from Wikipedia. Empirical results indicate that this enriched representation of text items can substantially improve clustering accuracy compared to the conventional bag-of-words representation.
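
The idea of enriching a bag-of-words representation can be illustrated with a minimal sketch. The abstract does not say how the Wikipedia features are retrieved or weighted, so everything specific here is an assumption for illustration only: the toy `TOY_WIKI_INDEX` dictionary stands in for a real Wikipedia lookup, and the function names, the `wiki:` feature prefix, and the use of cosine similarity are hypothetical choices, not the authors' method.

```python
import math
from collections import Counter

# Hypothetical stand-in for a Wikipedia lookup (assumption, not from the paper):
# maps a keyword to titles of related Wikipedia articles.
TOY_WIKI_INDEX = {
    "iphone": ["Apple_Inc.", "Smartphone"],
    "ipad": ["Apple_Inc.", "Tablet_computer"],
    "election": ["Politics"],
    "senate": ["Politics"],
}


def bag_of_words(text):
    """Conventional representation: lowercase word counts."""
    return Counter(t for t in text.lower().split() if t.isalnum())


def enriched(text):
    """Bag of words plus one extra feature per related article title."""
    vec = bag_of_words(text)
    for token in list(vec):
        for title in TOY_WIKI_INDEX.get(token, []):
            vec["wiki:" + title] += 1
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


item_a = "new iphone released"
item_b = "ipad sales rise"

# The two feed items share no words, so plain bag of words sees no similarity.
plain_sim = cosine(bag_of_words(item_a), bag_of_words(item_b))
# After enrichment both items carry the shared "wiki:Apple_Inc." feature.
enriched_sim = cosine(enriched(item_a), enriched(item_b))
print(plain_sim, enriched_sim)
```

In this toy case the two items have zero similarity under plain bag of words but a positive similarity once the shared article title is added, which is the kind of effect a clustering algorithm could then exploit on short, sparse feed items.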