IFME: information filtering by multiple examples with under-sampling in a digital library environment

  • Authors:
  • Mingzhu Zhu;Chao Xu;Yi-Fang Brook Wu

  • Affiliations:
  • New Jersey Institute of Technology, Newark, NJ, USA;New Jersey Institute of Technology, Newark, NJ, USA;New Jersey Institute of Technology, Newark, NJ, USA

  • Venue:
  • Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

With the amount of digitalized documents increasing exponentially, it is more difficult for users to keep up to date with the knowledge in their domain. In this paper, we present a framework named IFME (Information Filtering by Multiple Examples) in a digital library environment to help users identify the literature related to their interests by leveraging the Positive Unlabeled learning (PU learning). Using a few relevant documents provided by a user and considering the documents in an online database as unlabeled data (called U), it ranks the documents in U using a PU learning algorithm. From the experimental results, we found that while the approach performed well when a large set of relevant feedback documents were available, it performed relatively poor when the relevant feedback documents were few. We improved IFME by combining PU learning with under-sampling to tune the performance. Using Mean Average Precision (MAP), our experimental results indicated that with under-sampling, the performance improved significantly even when the size of P was small. We believe the PU learning based IFME framework brings insights to develop more effective digital library systems.