Multinomial event model based abstraction for sequence and text classification

  • Authors:
  • Dae-Ki Kang; Jun Zhang; Adrian Silvescu; Vasant Honavar

  • Affiliations:
  • Artificial Intelligence Research Laboratory, Department of Computer Science, Iowa State University, Ames, IA (all authors)

  • Venue:
  • SARA'05 Proceedings of the 6th international conference on Abstraction, Reformulation and Approximation
  • Year:
  • 2005

Abstract

In many machine learning applications that deal with sequences, there is a need for learning algorithms that can effectively exploit the hierarchical grouping of words. We introduce the Word Taxonomy guided Naive Bayes Learner for the Multinomial Event Model (WTNBL-MN), which exploits a word taxonomy to generate compact classifiers, and the Word Taxonomy Learner (WTL), which automatically constructs a word taxonomy from sequence data. WTNBL-MN is a generalization of the Naive Bayes learner for the Multinomial Event Model that learns classifiers from data using a word taxonomy. WTL uses hierarchical agglomerative clustering to group words based on the distribution of class labels that co-occur with them. Our experimental results on protein localization sequences and Reuters text show that the proposed algorithms can generate Naive Bayes classifiers that are more compact, and often more accurate, than those produced by the standard Naive Bayes learner for the Multinomial Event Model.
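The taxonomy-construction step described above (hierarchical agglomerative clustering of words by the class-label distributions they co-occur with) can be sketched as a greedy bottom-up merge. The sketch below assumes Jensen-Shannon divergence between class-conditional distributions as the distance measure; the function names and toy word counts are illustrative, not taken from the paper.

```python
import math
from itertools import combinations

def js_divergence(p, q):
    """Jensen-Shannon divergence between two class-label distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def normalize(counts):
    total = sum(counts)
    return [c / total for c in counts]

def build_taxonomy(word_class_counts):
    """Greedily merge the two clusters whose class-label distributions are
    closest, until one root remains; returns the nested merge tree."""
    # Each cluster is (subtree, aggregated class-label counts).
    clusters = [(w, list(c)) for w, c in word_class_counts.items()]
    while len(clusters) > 1:
        # Find the closest pair by JS divergence of normalized class counts.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: js_divergence(
                       normalize(clusters[ij[0]][1]),
                       normalize(clusters[ij[1]][1])))
        (ti, ci), (tj, cj) = clusters[i], clusters[j]
        merged = ((ti, tj), [a + b for a, b in zip(ci, cj)])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]

# Toy data: word -> co-occurrence counts with classes (positive, negative).
counts = {"good": [9, 1], "great": [8, 2], "bad": [1, 9], "awful": [2, 8]}
tree = build_taxonomy(counts)
```

On this toy input the sketch groups the two positively-associated words together and the two negatively-associated words together before joining them at the root, which is the kind of class-distribution-driven abstraction hierarchy WTL is designed to produce.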