With the number of digitized documents increasing exponentially, it is increasingly difficult for users to keep up to date with the knowledge in their domain. In this paper, we present IFME (Information Filtering by Multiple Examples), a framework for digital library environments that helps users identify literature related to their interests by leveraging Positive-Unlabeled (PU) learning. Given a few relevant documents provided by a user (the positive set P) and treating the documents in an online database as unlabeled data (the set U), IFME ranks the documents in U with a PU learning algorithm. From our experimental results, we found that while this approach performed well when a large set of relevance-feedback documents was available, it performed relatively poorly when few relevant documents were provided. We therefore improved IFME by combining PU learning with under-sampling. Measured by Mean Average Precision (MAP), our experimental results indicated that with under-sampling, performance improved significantly even when P was small. We believe the PU-learning-based IFME framework offers insights for developing more effective digital library systems.
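The abstract's two technical ingredients can be sketched in a few lines. The sketch below is illustrative only and does not reproduce the paper's actual algorithm: it scores unlabeled documents against a centroid of P minus a centroid of an under-sampled pseudo-negative set drawn from U (capped at |P|, mirroring the under-sampling idea), and it computes the Average Precision underlying MAP. The bag-of-words vectorizer, cosine scorer, and function names are all hypothetical simplifications.

```python
import random
from collections import Counter

def vectorize(doc):
    # Hypothetical representation: whitespace bag-of-words term counts.
    return Counter(doc.lower().split())

def centroid(vectors):
    # Mean term-weight vector of a document set.
    total = Counter()
    for v in vectors:
        total.update(v)
    n = len(vectors)
    return {t: w / n for t, w in total.items()}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sum(w * w for w in a.values()) ** 0.5
    nb = sum(w * w for w in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def rank_unlabeled(P, U, seed=0):
    """Rank docs in U by similarity to P minus similarity to an
    under-sampled pseudo-negative set drawn from U (size <= |P|).
    A stand-in for training a PU classifier on P vs. a subsample of U."""
    rng = random.Random(seed)
    pseudo_neg = rng.sample(U, min(len(P), len(U)))  # under-sampling step
    pos_c = centroid([vectorize(d) for d in P])
    neg_c = centroid([vectorize(d) for d in pseudo_neg])
    scored = [(cosine(vectorize(d), pos_c) - cosine(vectorize(d), neg_c), d)
              for d in U]
    return [d for _, d in sorted(scored, key=lambda s: -s[0])]

def average_precision(ranked_relevance):
    # AP over one ranked list of 0/1 relevance judgments;
    # MAP is the mean of AP over all queries/users.
    hits, precisions = 0, []
    for i, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(precisions) if precisions else 0.0
```

For example, a ranking that places relevant documents at positions 1 and 3 of four gets AP = (1/1 + 2/3) / 2 ≈ 0.833; averaging such scores across users gives the MAP figure the abstract reports.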