A simple probabilistic approach to classification and routing

  • Authors: Louise Guthrie; James Leistensnider

  • Affiliations: Lockheed Martin Corporation, Philadelphia, PA (both authors)

  • Venue: TIPSTER '96: Proceedings of a workshop held at Vienna, Virginia, May 6-8, 1996
  • Year: 1996

Abstract

Several classification and routing methods were implemented and compared. The experiments used FBIS documents from four categories. The measures used were the tf.idf and cosine similarity measures, and a maximum likelihood estimate based on assuming a multinomial distribution for the various topics (populations). In addition, the SMART program was run with 'lnc.ltc' weighting and compared to the others. Decisions for both our classification scheme (documents are put into any number of disjoint categories) and our routing scheme (documents are assigned a 'score' and ranked relative to each category) are based on the highest probability of correct classification or routing. All of the techniques described here are fully automatic and use a training set of relevant documents to produce lists of distinguishing terms and weights. All methods (ours and the ones we compared against) gave excellent results on the classification task, while the one based on the multinomial distribution produced the best results on the routing task.
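
To make the multinomial idea concrete, the following is a minimal sketch of how a maximum likelihood classifier and router of this general kind might look. It is not the authors' implementation: the function names, the add-alpha smoothing, the handling of unseen words, and the per-token normalization of the routing score are all assumptions introduced here for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter


def train_multinomial(docs_by_category, alpha=1.0):
    """Estimate per-category word probabilities from training documents.

    docs_by_category maps a category name to a list of token lists.
    alpha is an add-alpha smoothing constant (an assumption, not from the abstract).
    """
    vocab = {tok for docs in docs_by_category.values() for doc in docs for tok in doc}
    models = {}
    for category, docs in docs_by_category.items():
        counts = Counter(tok for doc in docs for tok in doc)
        total = sum(counts.values()) + alpha * len(vocab)
        models[category] = {tok: (counts[tok] + alpha) / total for tok in vocab}
    return models


def log_likelihood(tokens, model):
    """Log-probability of the tokens under one category's multinomial model.

    The multinomial coefficient is dropped because it is identical for every
    category. Words outside the training vocabulary are skipped (assumption).
    """
    return sum(math.log(model[tok]) for tok in tokens if tok in model)


def classify(tokens, models):
    """Assign the document to the category with the highest likelihood."""
    return max(models, key=lambda cat: log_likelihood(tokens, models[cat]))


def route(docs, models, category):
    """Rank document indices for one category, best first.

    The score is the average log-likelihood per token, so that long documents
    are not penalized merely for their length (assumption).
    """
    def score(tokens):
        return log_likelihood(tokens, models[category]) / max(1, len(tokens))

    return sorted(range(len(docs)), key=lambda i: score(docs[i]), reverse=True)
```

With a training set of tokenized documents per category, `train_multinomial` produces the term weights, `classify` picks a single best category for a new document, and `route` orders a batch of documents by how well they fit one category.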