A subspace decision cluster classifier for text classification

Authors:
Yan Li;Edward Hung;Korris Chung
Affiliations:
Liuxian Road, West Campus, Shenzhen Polytechnic, Shenzhen 518055, China;PQ Building, MailBox 56, The Hong Kong Polytechnic University, Hung Hom, KLN, Hong Kong;PQ Building, MailBox 56, The Hong Kong Polytechnic University, Hung Hom, KLN, Hong Kong
Venue:
Expert Systems with Applications: An International Journal
Year:
2011

Citing 34
Cited 1

Multivariate Decision Trees

Machine Learning
Support-Vector Networks

Machine Learning
Automatic subspace clustering of high dimensional data for data mining applications

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
A re-examination of text categorization methods

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Hierarchical neural networks for text categorization (poster abstract)

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Hierarchical classification of Web content

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Text categorization for multi-page documents: a hybrid naive Bayes HMM approach

Proceedings of the 1st ACM/IEEE-CS joint conference on Digital libraries
Machine learning in automated text categorization

ACM Computing Surveys (CSUR)
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features

ECML '98 Proceedings of the 10th European Conference on Machine Learning
When Is ''Nearest Neighbor'' Meaningful?

ICDT '99 Proceedings of the 7th International Conference on Database Theory
Transductive Inference for Text Classification using Support Vector Machines

ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Detecting Concept Drift with Support Vector Machines

ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
An Interactive Approach to Building Classification Models by Clustering and Cluster Validation

IDEAL '00 Proceedings of the Second International Conference on Intelligent Data Engineering and Automated Learning, Data Mining, Financial Engineering, and Intelligent Agents
A Visual Method of Cluster Validation with Fastmap

PADKK '00 Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Current Issues and New Applications
A Machine Learning Algorithm Based on Supervised Clustering and Classification

AMT '01 Proceedings of the 6th International Computer Science Conference on Active Media Technology
A Comparative Study of Centroid-Based, Neighborhood-Based and Statistical Approaches for Effective Document Categorization

ICPR '02 Proceedings of the 16 th International Conference on Pattern Recognition (ICPR'02) Volume 4 - Volume 4
CBC: Clustering Based Text Classification Requiring Minimal Labeled Data

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Fast k-Nearest Neighbor Classification Using Cluster-Based Trees

IEEE Transactions on Pattern Analysis and Machine Intelligence
Automated Variable Weighting in k-Means Type Clustering

IEEE Transactions on Pattern Analysis and Machine Intelligence
Data Mining: Concepts and Techniques

Data Mining: Concepts and Techniques
A maximal figure-of-merit (MFoM)-learning approach to robust classifier design for text categorization

ACM Transactions on Information Systems (TOIS)
What are the grand challenges for data mining?: KDD-2006 panel report

ACM SIGKDD Explorations Newsletter
Exploring in the weblog space by detecting informative and affective articles

Proceedings of the 16th international conference on World Wide Web
An Entropy Weighting k-Means Algorithm for Subspace Clustering of High-Dimensional Sparse Data

IEEE Transactions on Knowledge and Data Engineering
Robust classification of rare queries using web knowledge

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
An improved centroid classifier for text categorization

Expert Systems with Applications: An International Journal
Deep classification in large-scale text hierarchies

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Building a Decision Cluster Classification Model for High Dimensional Data by a Variable Weighting k-Means Method

AI '08 Proceedings of the 21st Australasian Joint Conference on Artificial Intelligence: Advances in Artificial Intelligence
A class-feature-centroid classifier for text categorization

Proceedings of the 18th international conference on World wide web
A neural network model for hierarchical multilingual text categorization

ISNN'05 Proceedings of the Second international conference on Advances in neural networks - Volume Part II
Neighborhood density method for selecting initial cluster centers in k-means clustering

PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
The PASCAL recognising textual entailment challenge

MLCW'05 Proceedings of the First international conference on Machine Learning Challenges: evaluating Predictive Uncertainty Visual Object Classification, and Recognizing Textual Entailment
Multinomial naive bayes for text categorization revisited

AI'04 Proceedings of the 17th Australian joint conference on Advances in Artificial Intelligence
Question answering pilot task at CLEF 2004

CLEF'04 Proceedings of the 5th conference on Cross-Language Evaluation Forum: multilingual Information Access for Text, Speech and Images

High performance genetic algorithm based text clustering using parts of speech and outlier elimination

Applied Intelligence

Quantified Score

Hi-index	12.05

Visualization

Abstract

In this paper, a new classification method (SDCC) for high dimensional text data with multiple classes is proposed. In this method, a subspace decision cluster classification (SDCC) model consists of a set of disjoint subspace decision clusters, each labeled with a dominant class to determine the class of new objects falling in the cluster. A cluster tree is first generated from a training data set by recursively calling a subspace clustering algorithm Entropy Weighting k-Means algorithm. Then, the SDCC model is extracted from the subspace decision cluster tree. Various tests including Anderson-Darling test are used to determine the stopping condition of the tree growing. A series of experiments on real text data sets have been conducted. Their results show that the new classification method (SDCC) outperforms the existing methods like decision tree and SVM. SDCC is particularly suitable for large, high dimensional sparse text data with many classes.