With the number of digitized documents increasing exponentially, it is increasingly difficult for users to keep up to date with the knowledge in their domain. In this paper, we present IFME (Information Filtering by Multiple Examples), a framework for digital library environments that helps users identify literature related to their interests by leveraging Positive-Unlabeled (PU) learning. Given a few relevant documents provided by a user (the positive set P) and treating the documents in an online database as unlabeled data (the set U), IFME ranks the documents in U with a PU learning algorithm. From our experimental results, we found that while this approach performed well when a large set of relevance-feedback documents was available, it performed relatively poorly when few relevant documents were provided. We therefore improved IFME by combining PU learning with under-sampling. Measured by Mean Average Precision (MAP), our experimental results indicated that with under-sampling, performance improved significantly even when P was small. We believe the PU-learning-based IFME framework offers insights for developing more effective digital library systems.
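The abstract's two technical ingredients can be sketched in a few lines. The sketch below is illustrative only and does not reproduce the paper's actual algorithm: it scores unlabeled documents against a centroid of P minus a centroid of an under-sampled pseudo-negative set drawn from U (capped at |P|, mirroring the under-sampling idea), and it computes the Average Precision underlying MAP. The bag-of-words vectorizer, cosine scorer, and function names are all hypothetical simplifications.

```python
import random
from collections import Counter

def vectorize(doc):
    # Hypothetical representation: whitespace bag-of-words term counts.
    return Counter(doc.lower().split())

def centroid(vectors):
    # Mean term-weight vector of a document set.
    total = Counter()
    for v in vectors:
        total.update(v)
    n = len(vectors)
    return {t: w / n for t, w in total.items()}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sum(w * w for w in a.values()) ** 0.5
    nb = sum(w * w for w in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def rank_unlabeled(P, U, seed=0):
    """Rank docs in U by similarity to P minus similarity to an
    under-sampled pseudo-negative set drawn from U (size <= |P|).
    A stand-in for training a PU classifier on P vs. a subsample of U."""
    rng = random.Random(seed)
    pseudo_neg = rng.sample(U, min(len(P), len(U)))  # under-sampling step
    pos_c = centroid([vectorize(d) for d in P])
    neg_c = centroid([vectorize(d) for d in pseudo_neg])
    scored = [(cosine(vectorize(d), pos_c) - cosine(vectorize(d), neg_c), d)
              for d in U]
    return [d for _, d in sorted(scored, key=lambda s: -s[0])]

def average_precision(ranked_relevance):
    # AP over one ranked list of 0/1 relevance judgments;
    # MAP is the mean of AP over all queries/users.
    hits, precisions = 0, []
    for i, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(precisions) if precisions else 0.0
```

For example, a ranking that places relevant documents at positions 1 and 3 of four gets AP = (1/1 + 2/3) / 2 ≈ 0.833; averaging such scores across users gives the MAP figure the abstract reports.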