Analysis of structural relationships for hierarchical cluster labeling

Authors:
Markus Muhr;Roman Kern;Michael Granitzer
Affiliations:
Know-Center Graz, Graz, Austria;Know-Center Graz, Graz, Austria;Graz University of Technology, Graz, Austria
Venue:
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Year:
2010

Abstract

Cluster label quality is crucial for browsing topic hierarchies obtained via document clustering. Intuitively, the hierarchical structure should influence labeling accuracy. However, most labeling algorithms ignore such structural properties, so the impact of hierarchical structure on labeling accuracy remains unclear. In our work, we integrate hierarchical information, i.e., sibling and parent-child relations, into the cluster labeling process. We adapt standard labeling approaches, namely Maximum Term Frequency, Jensen-Shannon Divergence, Chi Square Test, and Information Gain, to make use of these relationships, and we evaluate their impact on four datasets: the Open Directory Project, Wikipedia, TREC Ohsumed, and the CLEF IP European Patent dataset. We show that hierarchical relationships can be exploited to increase labeling accuracy, especially on high-level nodes.
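
To make the idea of using sibling relations concrete, the following is a minimal sketch of one possible reading, not the authors' implementation. It scores each term of a cluster by its contribution to the Jensen-Shannon divergence between the cluster's term distribution and that of a reference set, where the reference set is assumed to be the documents of the sibling clusters. The function names, the tokenized-document input format, and the rule of keeping only terms over-represented in the cluster are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_distribution(docs):
    """Relative term frequencies over a list of tokenized documents."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def jsd_term_scores(cluster_docs, reference_docs):
    """Per-term contribution to the Jensen-Shannon divergence (base 2)
    between cluster and reference term distributions.

    Assumption: only terms that are more frequent in the cluster than
    in the reference are kept as label candidates.
    """
    p = term_distribution(cluster_docs)
    q = term_distribution(reference_docs)
    scores = {}
    for term, p_t in p.items():
        q_t = q.get(term, 0.0)
        m_t = 0.5 * (p_t + q_t)  # mixture distribution
        contrib = 0.5 * p_t * math.log2(p_t / m_t)
        if q_t > 0:
            contrib += 0.5 * q_t * math.log2(q_t / m_t)
        if p_t > q_t:
            scores[term] = contrib
    return scores


def top_labels(cluster_docs, reference_docs, k=3):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = jsd_term_scores(cluster_docs, reference_docs)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy data: one cluster and the documents of its siblings.
cluster = [["jazz", "music", "guitar"], ["jazz", "music", "saxophone"]]
siblings = [["rock", "music", "guitar"], ["classical", "music", "piano"]]
print(top_labels(cluster, siblings, k=2))  # ['jazz', 'saxophone']
```

In this toy example, "music" is equally frequent in the cluster and its siblings, so it is not proposed as a label; contrasting against siblings rather than the whole collection is what pushes the shared parent-level term out of the ranking.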