Selecting interesting articles using their similarity based only on positive examples

  • Authors:
  • Jiří Hroza; Jan Žižka

  • Affiliations:
  • Faculty of Informatics, Department of Information Technologies, Masaryk University, Brno, Czech Republic;Faculty of Informatics, Department of Information Technologies, Masaryk University, Brno, Czech Republic

  • Venue:
  • CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing
  • Year:
  • 2005

Abstract

Automated searching for interesting text documents frequently suffers from a severe imbalance between the documents available as positive and as negative examples, or from one class being missing entirely. This paper proposes a ranking approach based on the k-NN algorithm, adapted to determine how similar a new document is to a representative collection of positive examples only. Guided by the precision-recall trade-off, a user can decide in advance how many articles the filter should release and how similar they must be.
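
The ranking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the document representation, the similarity measure, or how the k nearest neighbours are combined. This example therefore assumes bag-of-words term counts, cosine similarity, and the mean similarity to the k closest positive examples as the score. The function names and the parameters `min_score` and `max_results` are hypothetical; they stand in for the user's choice of "how similar" and "how many" articles pass the filter.

```python
import math
from collections import Counter


def vectorize(text):
    """Assumed representation: lowercase whitespace tokens with raw counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_score(doc_vec, positive_vecs, k=3):
    """Assumed scoring rule: mean similarity to the k most similar positives."""
    sims = sorted((cosine(doc_vec, p) for p in positive_vecs), reverse=True)
    top = sims[:k]
    return sum(top) / len(top) if top else 0.0


def rank_and_filter(candidates, positives, k=3, min_score=0.0, max_results=None):
    """Rank candidates by similarity to the positive collection only.

    min_score sets how similar a released article must be;
    max_results sets how many articles are released at most.
    """
    positive_vecs = [vectorize(p) for p in positives]
    scored = [(knn_score(vectorize(c), positive_vecs, k), c) for c in candidates]
    kept = [(score, text) for score, text in scored if score >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept if max_results is None else kept[:max_results]


if __name__ == "__main__":
    positives = [
        "neural networks learn text features",
        "text classification with neural networks",
    ]
    candidates = ["cooking pasta recipes", "neural networks for text"]
    for score, text in rank_and_filter(candidates, positives, k=2, min_score=0.1):
        print(f"{score:.3f}  {text}")
```

Raising `min_score` or lowering `max_results` releases fewer, more similar articles, which generally trades recall for precision. No negative examples are needed at any step.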