Enhanced word clustering for hierarchical text classification

Authors:
Inderjit S. Dhillon;Subramanyam Mallela;Rahul Kumar
Affiliations:
Univ. of Texas, Austin;Univ. of Texas, Austin;Univ. of Texas, Austin
Venue:
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2002

Citing 18
Cited 34

What every computer scientist should know about floating-point arithmetic. ACM Computing Surveys (CSUR).
Elements of information theory.
The nature of statistical learning theory.
On the Optimality of the Simple Bayesian Classifier under Zero-One Loss. Machine Learning - Special issue on learning with probabilistic representations.
Distributional clustering of words for text classification. Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval.
Probabilistic latent semantic indexing. Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval.
On feature distributional clustering for text categorization. Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval.
Machine Learning.
Introduction to Modern Information Retrieval.
On Bias, Variance, 0/1-Loss, and the Curse-of-Dimensionality. Data Mining and Knowledge Discovery.
Concept Decompositions for Large Sparse Text Data Using Clustering. Machine Learning.
Text Categorization with Support Vector Machines: Learning with Many Relevant Features. ECML '98 Proceedings of the 10th European Conference on Machine Learning.
Hierarchically Classifying Documents Using Very Few Words. ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning.
A Comparative Study on Feature Selection in Text Categorization. ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning.
Using Taxonomy, Discriminants, and Signatures for Navigating in Text Databases. VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases.
Pattern Classification (2nd Edition).
Distributional clustering of English words. ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics.
Quantization. IEEE Transactions on Information Theory.

Text Mining with Information-Theoretic Clustering. Computing in Science and Engineering.
A practical web-based approach to generating topic hierarchy for text segments. Proceedings of the thirteenth ACM international conference on Information and knowledge management.
Data Driven Similarity Measures for k-Means Like Clustering Algorithms. Information Retrieval.
Adaptive sampling for thresholding in document filtering and classification. Information Processing and Management: an International Journal.
On the use of linear programming for unsupervised text classification. Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining.
Taxonomy generation for text segments: A practical web-based approach. ACM Transactions on Information Systems (TOIS).
Building implicit links from content for forum search. SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval.
Semi-supervised model-based document clustering: A comparative study. Machine Learning.
Exploiting asymmetry in hierarchical topic extraction. CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management.
A new feature selection score for multinomial naive Bayes text classification based on KL-divergence. ACLdemo '04 Proceedings of the ACL 2004 on Interactive poster and demonstration sessions.
A semi-supervised feature clustering algorithm with application to word sense disambiguation. HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing.
Addressing diverse user preferences in SQL-query-result navigation. Proceedings of the 2007 ACM SIGMOD international conference on Management of data.
Dynamic category profiling for text filtering and classification. Information Processing and Management: an International Journal.
Co-clustering based classification for out-of-domain documents. Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining.
Interactive high-quality text classification. Information Processing and Management: an International Journal.
Can Chinese web pages be classified with English data source? Proceedings of the 17th international conference on World Wide Web.
A heuristic algorithm for clustering rooted ordered trees. Intelligent Data Analysis.
Visual explanation of evidence in additive classifiers. IAAI'06 Proceedings of the 18th conference on Innovative applications of artificial intelligence - Volume 2.
Graph-based word clustering using a web search engine. EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing.
Adaptive email spam filtering based on information theory. WISE'07 Proceedings of the 8th international conference on Web information systems engineering.
Automatically computed document dependent weighting factor facility for Naïve Bayes classification. Expert Systems with Applications: An International Journal.
Long distance bigram models applied to word clustering. Pattern Recognition.
Automatic query generation and query relevance measurement for unsupervised language model adaptation of speech recognition. EURASIP Journal on Audio, Speech, and Music Processing.
Cluster based symbolic representation and feature selection for text classification. ADMA'10 Proceedings of the 6th international conference on Advanced data mining and applications - Volume Part II.
Dissimilarity based feature selection for text classification: a cluster based approach. Proceedings of the International Conference & Workshop on Emerging Trends in Technology.
High Relevance Keyword Extraction facility for Bayesian text classification on different domains of varying characteristic. Expert Systems with Applications: An International Journal.
A divergence-oriented approach for web users clustering. ICCSA'06 Proceedings of the 2006 international conference on Computational Science and Its Applications - Volume Part II.
Active learning for probability estimation using Jensen-Shannon divergence. ECML'05 Proceedings of the 16th European conference on Machine Learning.
Using maximal spanning trees and word similarity to generate hierarchical clusters of non-redundant RSS news articles. Journal of Intelligent Information Systems.
Sensor selection to support practical use of health-monitoring smart environments. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery.
Prior knowledge employment based on the K-L and Tanimoto distances matching for intelligent autonomous robots. ICIRA'12 Proceedings of the 5th international conference on Intelligent Robotics and Applications - Volume Part III.
p-PIC: Parallel power iteration clustering for big data. Journal of Parallel and Distributed Computing.
The curse of 140 characters: evaluating the efficacy of SMS spam detection on Android. Proceedings of the Third ACM workshop on Security and privacy in smartphones & mobile devices.
Multi-document text summarization using topic model and fuzzy logic. MLDM'13 Proceedings of the 9th international conference on Machine Learning and Data Mining in Pattern Recognition.

Abstract

In this paper we propose a new information-theoretic divisive algorithm for word clustering applied to text classification. In previous work, such "distributional clustering" of features has been found to improve classification accuracy over feature selection, especially with small numbers of features [2, 28]. However, the existing clustering techniques are agglomerative in nature and result in (i) sub-optimal word clusters and (ii) high computational cost. To explicitly capture the optimality of word clusters in an information-theoretic framework, we first derive a global criterion for feature clustering. We then present a fast, divisive algorithm that monotonically decreases this objective function value, thus converging to a local minimum. We show that our algorithm minimizes the "within-cluster Jensen-Shannon divergence" while simultaneously maximizing the "between-cluster Jensen-Shannon divergence". Compared with the previously proposed agglomerative strategies, our divisive algorithm achieves higher classification accuracy, especially with small numbers of features. We further show that feature clustering is an effective technique for building smaller class models in hierarchical classification. We present detailed experimental results using Naive Bayes and Support Vector Machines on the 20 Newsgroups data set and a 3-level hierarchy of HTML documents collected from the Dmoz Open Directory.
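
The objective described in the abstract can be illustrated with a small sketch. The Python below is an illustration under stated assumptions, not the authors' implementation. It assumes each word is represented by a distribution over classes together with a prior weight, that a cluster is summarized by the prior-weighted mean of its members' distributions, and that each word is reassigned to the cluster whose mean is closest in KL divergence. The abstract does not spell out initialization or the splitting strategy, so the sketch simply starts from a given assignment. All function names (`kl`, `weighted_mean`, `cluster_means`, `within_cluster_js`, `between_cluster_js`, `recluster`) and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in bits. Terms with p_i == 0 contribute nothing;
    the result is infinite if q_i == 0 where p_i > 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log2(pi / qi)
    return total


def weighted_mean(dists: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> List[float]:
    """Weighted average of distributions (weights are normalized here)."""
    mass = sum(weights)
    size = len(dists[0])
    return [sum(w * d[i] for d, w in zip(dists, weights)) / mass
            for i in range(size)]


def cluster_means(dists, weights, labels) -> Dict[int, List[float]]:
    """Prior-weighted mean distribution of every non-empty cluster."""
    means = {}
    for j in sorted(set(labels)):
        idx = [t for t, lab in enumerate(labels) if lab == j]
        means[j] = weighted_mean([dists[t] for t in idx],
                                 [weights[t] for t in idx])
    return means


def within_cluster_js(dists, weights, labels) -> float:
    """Sum over words of weight * KL(word distribution || its cluster mean)."""
    means = cluster_means(dists, weights, labels)
    return sum(w * kl(d, means[lab])
               for d, w, lab in zip(dists, weights, labels))


def between_cluster_js(dists, weights, labels) -> float:
    """Sum over clusters of cluster mass * KL(cluster mean || overall mean)."""
    means = cluster_means(dists, weights, labels)
    overall = weighted_mean(dists, weights)
    total = 0.0
    for j, mean in means.items():
        mass = sum(w for w, lab in zip(weights, labels) if lab == j)
        total += mass * kl(mean, overall)
    return total


def recluster(dists, weights, labels, max_iter: int = 100) -> List[int]:
    """Alternate between recomputing cluster means and reassigning each
    word to the nearest mean in KL divergence, until nothing changes."""
    labels = list(labels)
    for _ in range(max_iter):
        means = cluster_means(dists, weights, labels)
        new_labels = [min(means, key=lambda j: kl(d, means[j]))
                      for d in dists]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Toy data (assumed): four words, two classes, uniform word priors.
dists = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
weights = [0.25, 0.25, 0.25, 0.25]
start = [0, 0, 0, 1]  # deliberately poor initial assignment
final = recluster(dists, weights, start)
print(final)
print(within_cluster_js(dists, weights, start),
      within_cluster_js(dists, weights, final))
```

When the weights sum to 1, `within_cluster_js` plus `between_cluster_js` equals the weighted divergence of all words to the overall mean, whatever the assignment. That total is fixed by the data, which is why lowering the within-cluster term necessarily raises the between-cluster term.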