Enhancing cluster labeling using Wikipedia

Authors:
David Carmel; Haggai Roitman; Naama Zwerdling
Affiliations:
IBM Haifa Research Lab, Haifa, Israel; IBM Haifa Research Lab, Haifa, Israel; IBM Haifa Research Lab, Haifa, Israel
Venue:
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Year:
2009

Abstract

This work investigates how cluster labeling can be enhanced by utilizing Wikipedia, the free online encyclopedia. We describe a general framework for cluster labeling that extracts candidate labels from Wikipedia in addition to important terms extracted directly from the text. The "labeling quality" of each candidate is then evaluated by several independent judges, and the top-scoring candidates are recommended for labeling. Our experimental results reveal that the Wikipedia labels agree with the manual labels that humans assign to a cluster much more than significant terms extracted directly from the text do. We show that in most cases, even when the human-assigned label appears in the text, purely statistical methods have difficulty identifying it as a good descriptor. Furthermore, our experiments show that for more than 85% of the clusters in our test collection, the manual label (or an inflection or synonym of it) appears among the top five labels recommended by our system.
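
The framework outlined in the abstract (pool candidate labels, have several judges score each one, recommend the top-scoring candidates) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the abstract does not say how the judges compute their scores or how the scores are combined, so the toy judges, the summation rule, and the names `recommend_labels` and `match_at_k` are hypothetical. The match check is also a simplification, since it accepts only a case-insensitive exact match, whereas the reported evaluation also counts inflections and synonyms.

```python
from collections import defaultdict


def recommend_labels(candidates, judges, k=5):
    """Return the k candidates with the highest combined judge score.

    Assumption: judge scores are combined by simple summation.
    """
    unique = list(dict.fromkeys(candidates))  # drop duplicates, keep order
    totals = defaultdict(float)
    for judge in judges:
        for label in unique:
            totals[label] += judge(label)
    # Highest total first; ties are broken alphabetically for determinism.
    ranked = sorted(unique, key=lambda label: (-totals[label], label))
    return ranked[:k]


def match_at_k(recommended, manual_label, k=5):
    """True if the manual label is among the top-k recommendations.

    Simplification: case-insensitive exact match only.
    """
    return manual_label.lower() in (label.lower() for label in recommended[:k])


# Hypothetical candidate pool: terms taken from the cluster text
# plus labels taken from Wikipedia.
text_terms = ["engine", "wheel"]
wikipedia_labels = ["Automobile", "Motor vehicle"]
candidates = text_terms + wikipedia_labels

# Two toy judges with made-up scores (not taken from the paper).
scores_a = {"engine": 0.4, "wheel": 0.2, "Automobile": 0.9, "Motor vehicle": 0.7}
scores_b = {"engine": 0.5, "wheel": 0.1, "Automobile": 0.8, "Motor vehicle": 0.6}
judges = [
    lambda label: scores_a.get(label, 0.0),
    lambda label: scores_b.get(label, 0.0),
]

top = recommend_labels(candidates, judges, k=2)
print(top)
print(match_at_k(top, "automobile", k=2))
```

With these toy scores, both Wikipedia candidates outrank the text terms, which mirrors the direction of the reported finding but carries no evidential weight of its own.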