Learning semantic relatedness from term discrimination information
Expert Systems with Applications: An International Journal
Dimensionality reduction (DR) through feature extraction (FE) is desirable for efficient and effective processing of text documents. Many text FE techniques produce features that are not readily interpretable and require super-linear computation time. In this paper, we present a fast supervised DR/FE technique, named FEDIP, that is motivated by the notion of relatedness of terms to topics or contexts. This relatedness is quantified by the discrimination information that a term provides for a topic in a labeled document collection. Features are constructed by pooling the discrimination information of highly related terms for each topic. FEDIP's time complexity is linear in the size of the vocabulary and the document collection. FEDIP is evaluated for document classification with SVM and naive Bayes classifiers on six text data sets. The results show that FEDIP produces low-dimensional feature spaces that yield higher classification accuracy than LDA and LSI. FEDIP is also significantly faster than the other techniques on our evaluation data sets.
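The abstract's core idea — score each term's discrimination information for each topic, then build one feature per topic by pooling the scores of a document's highly related terms — can be sketched as follows. This is a minimal illustration, not FEDIP's published formulation: the smoothed document-frequency ratio used as the discrimination measure, the `threshold` parameter, and the sum-pooling are all assumptions chosen for simplicity.

```python
from collections import Counter, defaultdict

def discrimination_info(docs, labels):
    """Score each (term, topic) pair by how strongly the term discriminates
    the topic from the rest of the collection. A smoothed document-frequency
    ratio is used here as a stand-in for FEDIP's measure (assumption)."""
    topics = sorted(set(labels))
    term_topic = defaultdict(Counter)          # topic -> term document counts
    topic_docs = Counter(labels)               # number of docs per topic
    for doc, y in zip(docs, labels):
        for t in set(doc):
            term_topic[y][t] += 1
    vocab = {t for c in term_topic.values() for t in c}
    scores = {}
    for topic in topics:
        n_in = topic_docs[topic]
        n_out = len(docs) - n_in
        for t in vocab:
            df_in = term_topic[topic][t]
            df_out = sum(term_topic[y][t] for y in topics if y != topic)
            p_in = (df_in + 1) / (n_in + 2)    # Laplace-smoothed doc frequency
            p_out = (df_out + 1) / (n_out + 2)
            scores[(t, topic)] = p_in / p_out
    return topics, scores

def extract_features(doc, topics, scores, threshold=1.5):
    """One feature per topic: pool (sum) the weighted discrimination scores
    of the document's terms that are highly related to that topic."""
    counts = Counter(doc)
    feats = []
    for topic in topics:
        pooled = sum(counts[t] * scores[(t, topic)]
                     for t in counts
                     if scores.get((t, topic), 0.0) > threshold)
        feats.append(pooled)
    return feats

# Toy labeled collection; the resulting feature space has one dimension per topic.
docs = [["ball", "goal", "team"], ["team", "score", "goal"],
        ["stock", "market", "trade"], ["market", "price", "stock"]]
labels = ["sport", "sport", "finance", "finance"]
topics, scores = discrimination_info(docs, labels)
f = extract_features(["goal", "team", "price"], topics, scores)
```

Note how the pass over the collection visits each document's terms once and each (term, topic) pair once, which is what makes this style of feature construction linear in the vocabulary and collection size.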