Classification of RSS-Formatted documents using full text similarity measures

  • Authors:
  • Katarzyna Wegrzyn-Wolska;Piotr S. Szczepaniak

  • Affiliations:
  • Ecole Supérieure d'Ingenieurs en Informatique et Génie de Télécommunication, Avon-Fontainebleau, France;Institute of Computer Science, Technical University of Lodz, Lodz, Poland

  • Venue:
  • ICWE'05 Proceedings of the 5th international conference on Web Engineering
  • Year:
  • 2005

Abstract

The Web is an enormous, unbounded, and dynamically changing source of useful and varied information. News is among the most rapidly changing kinds of information. The starting point of this paper is a presentation of RSS, a useful data format frequently used by news publishers; some statistics on news syndication illustrate the current situation. Two recently developed methods for examining the similarity of textual documents are then briefly presented. Since RSS-supplied records always contain the same types of information (headlines, links, article summaries, etc.), applying methods of this kind makes diverse applications such as automatic news classification and filtering easier.
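
As a rough illustration of the idea in the abstract, the sketch below parses RSS items into their fixed fields (headline, link, summary) and assigns each item to the category whose keyword description it most resembles. The abstract does not specify the two similarity methods, so a plain word-set Jaccard measure is used here as a stand-in; it is not the authors' method. The feed content, the category keyword lists, and the function names (parse_items, jaccard, classify) are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed; real feeds carry the same item fields.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://example.com/</link>
    <description>Hypothetical feed for illustration</description>
    <item>
      <title>Stock markets rise sharply</title>
      <link>http://example.com/1</link>
      <description>Investors gain confidence as shares climb.</description>
    </item>
    <item>
      <title>Home team wins league match</title>
      <link>http://example.com/2</link>
      <description>A late goal decided the football match.</description>
    </item>
  </channel>
</rss>"""

# Hypothetical category descriptions (assumption, not from the paper).
CATEGORIES = {
    "finance": "stock markets investors shares economy bank",
    "sport": "football match team goal league win",
}


def parse_items(feed_xml):
    """Extract headline, link, and summary from every <item> element."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "summary": item.findtext("description", default=""),
        }
        for item in root.iter("item")
    ]


def tokens(text):
    """Lower-case the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Word-set Jaccard similarity: a stand-in for the paper's measures."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def classify(item, categories):
    """Return the category whose description is most similar to the item."""
    text = item["title"] + " " + item["summary"]
    return max(categories, key=lambda name: jaccard(text, categories[name]))


if __name__ == "__main__":
    for entry in parse_items(SAMPLE_FEED):
        print(entry["title"], "->", classify(entry, CATEGORIES))
```

Because every record exposes the same fields, the same comparison can be run unchanged on items from any feed; swapping jaccard for a different similarity function would leave the rest of the sketch untouched.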