Feature Reduction and Database Maintenance in NETNEWS Classification

  • Authors:
  • Wen-Lin Hsu; Sheau-Dong Lang

  • Venue:
  • IDEAS '99 Proceedings of the 1999 International Symposium on Database Engineering & Applications
  • Year:
  • 1999

Abstract

We propose a statistical feature-reduction technique that filters the most ambiguous articles out of the training data used to categorize NETNEWS articles. We also incorporate a batch updating scheme that periodically maintains the term structures of the news database after training. The baseline method combines the terms of all articles of each newsgroup in the training set, so that each newsgroup is represented as a single vector. After training, incoming news articles are classified by their similarity to the existing newsgroup categories. Our implementation uses an inverted file to store the trained term structures of each newsgroup, and a list similar to the inverted file to buffer newly arrived articles, for efficient routing and updating. Our experimental results on real NETNEWS articles and newsgroups demonstrate that (1) applying feature reduction to the training set improves routing accuracy, efficiency, and database storage; (2) updating improves routing accuracy; and (3) the batch technique improves the efficiency of the updating operation.
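
The baseline pipeline described in the abstract (one term vector per newsgroup, an inverted file for routing, and a buffer for batch updates) could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the term weighting or the similarity measure, so raw term frequencies and cosine similarity are assumed here, and the class and method names (`NewsgroupIndex`, `train`, `route`, `flush`) are hypothetical. The feature-reduction step is left out because the abstract does not specify the statistic it uses.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


class NewsgroupIndex:
    """Hypothetical sketch: newsgroup vectors stored as an inverted file."""

    def __init__(self):
        # Inverted file: term -> {newsgroup: accumulated term frequency}
        self.postings = defaultdict(dict)
        # Buffer of newly arrived articles: newsgroup -> pending term counts
        self.buffer = defaultdict(Counter)

    def _add(self, group, counts):
        # Merge term counts into the single vector of one newsgroup.
        for term, tf in counts.items():
            self.postings[term][group] = self.postings[term].get(group, 0) + tf

    def train(self, labeled_articles):
        # Combine the terms of all training articles of each newsgroup.
        for group, text in labeled_articles:
            self._add(group, Counter(tokenize(text)))

    def _norms(self):
        # Euclidean length of every newsgroup vector.
        squares = defaultdict(float)
        for groups in self.postings.values():
            for group, tf in groups.items():
                squares[group] += tf * tf
        return {g: math.sqrt(s) for g, s in squares.items()}

    def classify(self, text):
        # Assumed measure: cosine similarity between article and newsgroup.
        query = Counter(tokenize(text))
        dots = defaultdict(float)
        for term, qtf in query.items():
            # Only newsgroups sharing a term with the article are visited.
            for group, tf in self.postings.get(term, {}).items():
                dots[group] += qtf * tf
        if not dots:
            return None  # no term overlap with any trained newsgroup
        norms = self._norms()
        qnorm = math.sqrt(sum(v * v for v in query.values()))
        return max(dots, key=lambda g: dots[g] / (norms[g] * qnorm))

    def route(self, text):
        # Classify an incoming article and buffer its terms for a later update.
        group = self.classify(text)
        if group is not None:
            self.buffer[group].update(tokenize(text))
        return group

    def flush(self):
        # Batch update: merge all buffered terms into the inverted file at once.
        for group, counts in self.buffer.items():
            self._add(group, counts)
        self.buffer.clear()
```

In this sketch, `route` leaves the trained structures untouched, and `flush` applies all pending changes in one pass, which is the kind of saving a batch update is meant to provide over updating after every article.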