Automatically labeling hierarchical clusters

Authors:
Pucktada Treeratpituk; Jamie Callan
Affiliations:
Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA
Venue:
dg.o '06 Proceedings of the 2006 international conference on Digital government research
Year:
2006

Abstract

Government agencies must often quickly organize and analyze large amounts of textual information, for example, comments received as part of notice-and-comment rulemaking. Hierarchical organization is popular because it represents information at different levels of detail and is convenient for interactive browsing. Good hierarchical clustering algorithms are available, but there are few good solutions for automatically labeling the nodes in a cluster hierarchy.

This paper presents a simple algorithm that automatically assigns labels to hierarchical clusters. The algorithm evaluates candidate labels using information from the cluster, the parent cluster, and corpus statistics. A trainable threshold enables the algorithm to assign just a few high-quality labels to each cluster. Experiments with Open Directory Project (ODP) hierarchies indicate that the algorithm creates cluster labels that are similar to labels created by ODP editors.
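The abstract does not specify how evidence from the cluster, the parent cluster, and the corpus is combined. The following is a rough sketch under that gap. It assumes a hypothetical linear score built from three statistics: the share of the cluster's documents that contain a term, how much more concentrated the term is in the cluster than in its parent, and an idf-style corpus weight. The function names (`score_candidates`, `select_labels`, `fit_threshold`), the weights, the label cap, and the F1-maximizing threshold search are all illustrative assumptions. None of them is the authors' actual formula or training procedure.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

Doc = Set[str]  # a document represented as its set of candidate terms


def doc_freq(docs: List[Doc]) -> Counter:
    """Count how many documents in `docs` contain each term."""
    df: Counter = Counter()
    for terms in docs:
        df.update(terms)
    return df


def score_candidates(
    cluster: List[Doc],
    parent: List[Doc],
    corpus_df: Dict[str, int],
    n_corpus: int,
    weights: Tuple[float, float, float] = (1.0, 1.0, 0.1),
) -> Dict[str, float]:
    """Hypothetical linear score (an assumption, NOT the formula from the paper)."""
    w_cov, w_contrast, w_idf = weights
    df_c, df_p = doc_freq(cluster), doc_freq(parent)
    scores: Dict[str, float] = {}
    for term, n in df_c.items():
        coverage = n / len(cluster)            # share of cluster docs containing the term
        parent_cov = df_p[term] / len(parent)  # same share within the parent cluster
        contrast = coverage - parent_cov       # positive if the term is more typical of this child
        idf = math.log(n_corpus / (1 + corpus_df.get(term, 0)))  # down-weights corpus-wide terms
        scores[term] = w_cov * coverage + w_contrast * contrast + w_idf * idf
    return scores


def select_labels(scores: Dict[str, float], threshold: float, max_labels: int = 3) -> List[str]:
    """Keep at most `max_labels` terms whose score reaches `threshold`, best first."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, s in ranked if s >= threshold][:max_labels]


def fit_threshold(
    examples: List[Tuple[Dict[str, float], Set[str]]], max_labels: int = 3
) -> Optional[float]:
    """Assumed training step: pick the threshold maximizing mean F1 against gold labels."""
    candidates = sorted({s for scores, _ in examples for s in scores.values()})
    best_t, best_f1 = None, -1.0
    for t in candidates:
        total = 0.0
        for scores, gold in examples:
            pred = set(select_labels(scores, t, max_labels))
            tp = len(pred & gold)
            if tp:
                p, r = tp / len(pred), tp / len(gold)
                total += 2 * p * r / (p + r)
        mean_f1 = total / len(examples)
        if mean_f1 > best_f1:
            best_t, best_f1 = t, mean_f1
    return best_t


if __name__ == "__main__":
    # Toy data: a parent cluster on taxes and a child cluster about income tax.
    parent = [{"tax", "income", "deduction"}, {"tax", "income", "credit"},
              {"tax", "property", "assessment"}, {"tax", "property", "appeal"}]
    cluster = parent[:2]
    corpus_df = {"tax": 50, "income": 10, "deduction": 5, "credit": 8}  # made-up counts
    scores = score_candidates(cluster, parent, corpus_df, n_corpus=1000)
    print(select_labels(scores, threshold=1.5))  # ['income']
```

In the toy run, "tax" covers every document in the cluster but gains nothing over the parent, so it scores below "income" and only "income" clears the threshold. A single learned threshold, rather than a fixed number of labels per cluster, lets some clusters receive one label and others several. That is one plausible way to read the abstract's claim that the threshold yields "just a few high-quality labels".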