Hierarchical Bayesian clustering for automatic text classification

Authors:
Makoto Iwayama;Takenobu Tokunaga
Affiliations:
Advanced Research Laboratory, Hitachi Ltd., Saitama, Japan;Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan
Venue:
IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2
Year:
1995

Abstract

Text classification, the grouping of texts into several clusters, has been used as a means of improving both the efficiency and the effectiveness of text retrieval/categorization. In this paper we propose a hierarchical clustering algorithm that constructs a set of clusters having the maximum Bayesian posterior probability, the probability that the given texts are classified into clusters. We call the algorithm Hierarchical Bayesian Clustering (HBC). The advantages of HBC are experimentally verified from several viewpoints: (1) HBC can reconstruct the original clusters more accurately than do other non-probabilistic algorithms; (2) when a probabilistic text categorization is extended to a cluster-based one, the use of HBC offers better performance than does the use of non-probabilistic algorithms.
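
The abstract gives no formulas, so the following is only a minimal sketch of one way a greedy, bottom-up clustering driven by a posterior-style score could look. Everything specific here is an assumption made for illustration: the function names (`term_dist`, `log_cluster_score`, `hbc`), the whitespace tokenization, and in particular the scoring rule, which approximates P(c|d) as P(c) multiplied by the sum over terms of P(t|c) P(t|d) / P(t). It should not be read as the authors' exact method.

```python
import math
from collections import Counter
from itertools import combinations


def term_dist(counts):
    """Turn raw term counts into a relative-frequency distribution."""
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def log_cluster_score(members, docs, p_term, n_docs):
    """Log of the product over d in c of P(c|d).

    ASSUMPTION: P(c|d) is approximated as P(c) * sum_t P(t|c) P(t|d) / P(t).
    This is an illustrative choice, not taken from the abstract.
    """
    pooled = Counter()
    for i in members:
        pooled.update(docs[i])
    p_term_given_cluster = term_dist(pooled)
    prior = len(members) / n_docs
    score = 0.0
    for i in members:
        overlap = sum(
            p_term_given_cluster[t] * p / p_term[t]
            for t, p in term_dist(docs[i]).items()
        )
        score += math.log(prior * overlap)
    return score


def hbc(texts, n_clusters):
    """Greedy agglomerative clustering (hypothetical sketch).

    Starts from singleton clusters and repeatedly merges the pair whose
    merge gives the largest gain in the total log score.
    Assumes every text contains at least one token.
    """
    docs = [Counter(text.lower().split()) for text in texts]
    all_terms = Counter()
    for doc in docs:
        all_terms.update(doc)
    p_term = term_dist(all_terms)
    n_docs = len(docs)

    clusters = [(i,) for i in range(n_docs)]
    score = {c: log_cluster_score(c, docs, p_term, n_docs) for c in clusters}

    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(clusters, 2):
            merged = tuple(sorted(a + b))
            merged_score = log_cluster_score(merged, docs, p_term, n_docs)
            gain = merged_score - score[a] - score[b]
            if best is None or gain > best[0]:
                best = (gain, a, b, merged, merged_score)
        _, a, b, merged, merged_score = best
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        score[merged] = merged_score
    return sorted(clusters)


# Toy input: two finance-like texts and two football-like texts.
texts = [
    "stock market shares",
    "market shares trading",
    "football match goal",
    "goal match league",
]
clusters = hbc(texts, 2)
print(clusters)
```

On this toy input the sketch groups texts 0 and 1 together and texts 2 and 3 together. Recomputing every candidate merge from scratch keeps the code short but is slow for large collections; a practical implementation would cache pairwise gains.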