We propose a novel text classification approach based on two main concepts: lexical dependency and pruning. We extend the standard bag-of-words method by including dependency patterns in the feature vector. We experiment with 37 lexical dependencies and analyze the effect of each dependency type separately in order to identify the most discriminative ones. We also analyze the effect of pruning (filtering out low-frequency features) for both word features and dependency features, tuning the parameter over eight pruning levels to determine the optimal levels. The experiments were repeated on three datasets with different characteristics. We observed a significant improvement in success rates as well as a reduction in the dimensionality of the feature vector. We argue that, in contrast to previous work in the literature, a much higher pruning level should be used in text classification. By analyzing the results from the dataset perspective, we also show that datasets with similar formality levels have similar leading dependencies and behave similarly as the pruning level varies.
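The combination of word features, dependency features, and frequency-based pruning can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the hand-written dependency triples, the feature-name prefixes (`w:`, `d:`), the function names, and the choice of total corpus count as the pruning statistic are all assumptions made for illustration. In practice the triples would come from a dependency parser.

```python
from collections import Counter

# Hypothetical pre-parsed corpus. Each document carries its tokens and its
# dependency triples (type, head, dependent); the triples are hand-written
# here and would normally be produced by a dependency parser.
docs = [
    {"words": ["stocks", "fell", "sharply"],
     "deps": [("nsubj", "fell", "stocks"), ("advmod", "fell", "sharply")]},
    {"words": ["stocks", "rose", "sharply"],
     "deps": [("nsubj", "rose", "stocks"), ("advmod", "rose", "sharply")]},
    {"words": ["prices", "fell"],
     "deps": [("nsubj", "fell", "prices")]},
]


def extract_features(doc, dep_types):
    """Return word features plus dependency features of the selected types."""
    feats = ["w:" + w for w in doc["words"]]
    feats += ["d:%s(%s,%s)" % d for d in doc["deps"] if d[0] in dep_types]
    return feats


def build_vocabulary(docs, dep_types, word_pl, dep_pl):
    """Keep features whose total corpus count reaches the pruning level.

    Words and dependencies get separate pruning levels (word_pl, dep_pl).
    Using the total count as the frequency is an assumption of this sketch.
    """
    totals = Counter()
    for doc in docs:
        totals.update(extract_features(doc, dep_types))
    return sorted(
        f for f, n in totals.items()
        if n >= (word_pl if f.startswith("w:") else dep_pl)
    )


def vectorize(doc, vocab, dep_types):
    """Map a document to a count vector over the pruned vocabulary."""
    counts = Counter(extract_features(doc, dep_types))
    return [counts[f] for f in vocab]


# Use only one dependency type, prune words seen fewer than 2 times,
# and keep every dependency feature (pruning level 1).
vocab = build_vocabulary(docs, {"nsubj"}, word_pl=2, dep_pl=1)
vectors = [vectorize(d, vocab, {"nsubj"}) for d in docs]
```

Raising `dep_pl` to 2 in this toy corpus removes every dependency feature, which shows how a higher pruning level shrinks the feature vector; restricting `dep_types` to one type at a time mirrors the idea of evaluating each dependency separately.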