Abstracting for Dimensionality Reduction in Text Classification

Authors:
Richard A. McAllister; Rafal A. Angryk
Affiliations:
Department of Computer Science, Montana State University, Bozeman, MT 59717; Department of Computer Science, Montana State University, Bozeman, MT 59717
Venue:
International Journal of Intelligent Systems
Year:
2013

Abstract

There is a growing interest in efficient models of text mining and an emergent need for new data structures that address word relationships. Detailed knowledge about the taxonomic environment of keywords that are used in text documents can provide valuable insight into the nature of the subject matter contained therein. Such insight may be used to enhance the data structures used in the text data mining task as relationships become usefully apparent. A popular scalable technique used to infer these relationships, while reducing dimensionality, has been Latent Semantic Analysis. We present a new approach, which uses an ontology of lexical abstractions to create abstraction profiles of documents and uses these profiles to perform text organization based on a process that we call frequent abstraction analysis. We introduce TATOO, the Text Abstraction TOOlkit, which is a full implementation of this new approach. We present our data model via an example of how taxonomically derived abstractions can be used to supplement semantic data structures for the text classification task. © 2013 Wiley Periodicals, Inc.
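
The idea of abstraction profiles can be illustrated with a minimal sketch. It is not the TATOO implementation, whose details are not given in this abstract. Everything here is an assumption made for illustration: the toy `HYPERNYM` taxonomy, the function names, and the choice of a simple document-frequency threshold (`min_support`) as a stand-in for frequent abstraction analysis.

```python
from collections import Counter

# Hypothetical toy taxonomy mapping each word to its parent (hypernym).
# A real system would draw on a lexical ontology; this table is invented.
HYPERNYM = {
    "beagle": "dog",
    "poodle": "dog",
    "dog": "animal",
    "cat": "animal",
    "siamese": "cat",
    "sedan": "car",
    "car": "vehicle",
    "truck": "vehicle",
}


def abstractions(word):
    """Return every ancestor of `word` in the toy taxonomy, nearest first."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def abstraction_profile(tokens):
    """Collect the abstractions reachable from any known keyword in a document."""
    profile = set()
    for token in tokens:
        profile.update(abstractions(token))
    return profile


def frequent_abstractions(docs, min_support):
    """Keep abstractions found in at least `min_support` (a fraction) of profiles."""
    counts = Counter()
    for tokens in docs:
        counts.update(abstraction_profile(tokens))
    threshold = min_support * len(docs)
    return {name for name, count in counts.items() if count >= threshold}


# Four tiny documents, already tokenized (assumed preprocessing).
docs = [
    ["beagle", "barks"],
    ["poodle", "siamese"],
    ["sedan", "truck"],
    ["cat", "sleeps"],
]

# The frequent abstractions become the reduced feature space.
features = sorted(frequent_abstractions(docs, min_support=0.5))
vectors = [[int(f in abstraction_profile(d)) for f in features] for d in docs]

print(features)  # ['animal', 'dog']
print(vectors)   # [[1, 1], [1, 1], [0, 0], [1, 0]]
```

In this toy run, eight distinct tokens are reduced to two abstraction features, and documents that share no keywords ("beagle" versus "poodle") receive the same vector because they share ancestors in the taxonomy. The resulting binary vectors could then be passed to any standard classifier.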