Text classification with the support of pruned dependency patterns

  • Authors:
  • Levent Özgür; Tunga Güngör

  • Affiliations:
  • Department of Computer Engineering, Boğaziçi University, Bebek, 34342 Istanbul, Turkey (both authors)

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2010

Abstract

We propose a novel text classification approach based on two main concepts: lexical dependency and pruning. We extend the standard bag-of-words method by including dependency patterns in the feature vector. We perform experiments with 37 lexical dependencies and analyze the effect of each dependency type separately in order to identify the most discriminative dependencies. We also analyze the effect of pruning (filtering out features with low frequencies) for both word features and dependency features. Parameter tuning is performed with eight different pruning levels to determine the optimal levels. The experiments were repeated on three datasets with different characteristics. We observed a significant improvement in the success rates as well as a reduction in the dimensionality of the feature vector. We argue that, in contrast to previous work in the literature, a much higher pruning level should be used in text classification. By analyzing the results from the dataset perspective, we also show that datasets at similar formality levels have similar leading dependencies and behave similarly as the pruning level varies.
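To make the two ideas concrete, the following is a minimal Python sketch of a feature vector that combines bag-of-words features with dependency-pattern features and then prunes low-frequency features. It is not the authors' implementation: the dependency triples are hard-coded toy examples (in the paper they would come from a lexical dependency parser), and the PRUNING_LEVEL threshold and helper names are hypothetical.

```python
from collections import Counter

# Toy corpus: each document carries its tokens and a list of
# (dependency_type, head, dependent) triples. Here the triples are
# hard-coded purely for illustration; a real system would obtain them
# from a dependency parser.
documents = [
    {
        "tokens": ["the", "market", "rallied", "sharply"],
        "deps": [("nsubj", "rallied", "market"), ("advmod", "rallied", "sharply")],
    },
    {
        "tokens": ["the", "market", "fell", "sharply"],
        "deps": [("nsubj", "fell", "market"), ("advmod", "fell", "sharply")],
    },
]

PRUNING_LEVEL = 2  # keep only features occurring at least this many times (hypothetical threshold)

def extract_features(doc):
    """Combine bag-of-words features with dependency-pattern features."""
    feats = list(doc["tokens"])
    # Encode each dependency as a single pattern feature, e.g. "nsubj(rallied,market)".
    feats += [f"{rel}({head},{dep})" for rel, head, dep in doc["deps"]]
    return feats

# Count feature frequencies over the whole corpus.
freq = Counter(f for doc in documents for f in extract_features(doc))

# Pruning: drop features whose corpus frequency falls below the threshold.
vocabulary = sorted(f for f, count in freq.items() if count >= PRUNING_LEVEL)

def vectorize(doc):
    """Represent a document as a count vector over the pruned vocabulary."""
    counts = Counter(extract_features(doc))
    return [counts[f] for f in vocabulary]

print(vocabulary)
print([vectorize(d) for d in documents])
```

In this sketch, raising PRUNING_LEVEL shrinks the vocabulary and thus the dimensionality of the feature vector, which is the trade-off the paper explores across its eight pruning levels.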