The cluster-abstraction model: unsupervised learning of topic hierarchies from text data

Authors:
Thomas Hofmann
Affiliations:
Computer Science Division, UC Berkeley & International CS Institute, Berkeley, CA
Venue:
IJCAI '99: Proceedings of the 16th International Joint Conference on Artificial Intelligence, Volume 2
Year:
1999


Abstract

This paper presents a novel statistical latent class model for text mining and interactive information access. The described learning architecture, called the Cluster-Abstraction Model (CAM), is purely data-driven and utilizes context-specific word occurrence statistics. In an intertwined fashion, the CAM extracts hierarchical relations between groups of documents as well as an abstractive organization of keywords. An annealed version of the Expectation-Maximization (EM) algorithm for maximum likelihood estimation of the model parameters is derived. The benefits of the CAM for interactive retrieval and automated cluster summarization are investigated experimentally.
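
The abstract mentions an annealed version of EM without giving its form. As a rough illustration only, the sketch below shows one common way an E-step can be annealed: the log joint probability of each latent class is scaled by an inverse temperature before normalizing. The function name `tempered_posteriors`, the parameter `beta`, and the choice to temper the whole joint are assumptions made for this example; they are not taken from the paper, whose exact update may differ.

```python
import math


def tempered_posteriors(log_joint, beta):
    """Class posteriors proportional to exp(beta * log_joint[k]).

    log_joint: list of log p(x, class k) values for one observation
               (hypothetical input, computed elsewhere by the model).
    beta:      inverse temperature (assumed name). beta = 1 gives the
               ordinary EM posterior; beta = 0 gives a uniform posterior.
    """
    scaled = [beta * lj for lj in log_joint]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


# Example: the same observation at increasing inverse temperatures.
log_joint = [math.log(0.6), math.log(0.3), math.log(0.1)]
for beta in (0.0, 0.5, 1.0):
    print(beta, tempered_posteriors(log_joint, beta))
```

In an annealing schedule of this kind, `beta` would typically start small, so that assignments are nearly uniform, and be raised toward 1 over successive EM iterations, which is often used to reduce sensitivity to poor local maxima.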