Leveraging Higher Order Dependencies between Features for Text Classification

Authors:
Murat C. Ganiz; Nikita I. Lytkin; William M. Pottenger
Affiliations:
Department of Computer Science, Lehigh University, USA, and DIMACS, Rutgers, The State University of New Jersey, USA; Department of Computer Science, Rutgers, The State University of New Jersey, USA; Department of Computer Science, Rutgers, The State University of New Jersey, USA, and DIMACS, Rutgers, The State University of New Jersey, USA
Venue:
ECML PKDD '09 Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part I
Year:
2009

Abstract

Traditional machine learning methods consider only the relationships between feature values within individual data instances, disregarding the dependencies that link features across instances. In this work, we develop a general approach to supervised learning that leverages higher-order dependencies between features. We introduce a novel Bayesian framework for classification named Higher Order Naive Bayes (HONB). Unlike approaches that assume data instances are independent, HONB leverages co-occurrence relations between feature values across different instances. Additionally, we generalize our framework by developing a novel data-driven space transformation that allows any classifier operating in vector spaces to take advantage of these higher-order co-occurrence relations. Results obtained on several benchmark text corpora demonstrate that the higher-order approaches achieve significant improvements in classification accuracy over the baseline (first-order) methods.
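To make "co-occurrence relations across different instances" concrete, the sketch below counts second-order paths between terms: term a co-occurs with a bridging term b in one document, and b co-occurs with c in a different document, which links a and c even if they never appear together. This is a hypothetical illustration, not the authors' HONB implementation. The function names `first_order_links` and `second_order_paths`, the treatment of documents as token lists, and the rule that the two documents on a path must differ are all assumptions made for the example. Running it on one class's documents at a time would give class-conditional counts; how HONB turns such counts into probability estimates is not specified in this abstract.

```python
from collections import defaultdict
from itertools import combinations


def first_order_links(docs):
    """Map each unordered term pair (sorted tuple) to the set of
    document indices in which both terms occur (hypothetical helper)."""
    links = defaultdict(set)
    for i, doc in enumerate(docs):
        # Deduplicate terms so each pair is recorded once per document.
        for a, b in combinations(sorted(set(doc)), 2):
            links[(a, b)].add(i)
    return links


def second_order_paths(docs):
    """Count paths a -(d1)- b -(d2)- c with d1 != d2 and a != c.

    Assumption for illustration: a path links a and c through a bridging
    term b that co-occurs with a in one document and with c in another.
    Returns a dict keyed by the sorted pair (a, c).
    """
    links = first_order_links(docs)
    # neighbours[term] -> list of (co-occurring term, document index)
    neighbours = defaultdict(list)
    for (a, b), doc_ids in links.items():
        for d in doc_ids:
            neighbours[a].append((b, d))
            neighbours[b].append((a, d))

    counts = defaultdict(int)
    for _bridge, nbrs in neighbours.items():
        # Each unordered pair of edges at the bridge term is one candidate path.
        for (a, d1), (c, d2) in combinations(nbrs, 2):
            if a != c and d1 != d2:
                counts[tuple(sorted((a, c)))] += 1
    return dict(counts)


if __name__ == "__main__":
    docs = [["apple", "fruit"], ["fruit", "juice"], ["juice", "orange"]]
    # "apple" and "juice" never share a document, but "fruit" bridges them.
    print(second_order_paths(docs))
```

With the three toy documents above, the pairs ("apple", "juice") and ("fruit", "orange") each receive one second-order path, although neither pair co-occurs directly. A first-order (bag-of-words) model sees no relation between them. Requiring d1 != d2 keeps the counts from simply re-expressing co-occurrence within a single document.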