Improving retrieval performance by relevance feedback
Readings in information retrieval
Learning to extract symbolic knowledge from the World Wide Web
AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
Learning to classify text from labeled and unlabeled documents
AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
IEEE Intelligent Systems
A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Transductive Inference for Text Classification using Support Vector Machines
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Building Text Classifiers Using Positive and Unlabeled Examples
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
PEBL: Web Page Classification without Negative Examples
IEEE Transactions on Knowledge and Data Engineering
Improving web search ranking by incorporating user behavior information
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Learning to rank with partially-labeled data
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Introduction to Information Retrieval
Introduction to Information Retrieval
Learning classifiers from only positive and unlabeled data
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Learning to Find Relevant Biological Articles without Negative Training Examples
AI '08 Proceedings of the 21st Australasian Joint Conference on Artificial Intelligence: Advances in Artificial Intelligence
Learning to classify texts using positive and unlabeled data
IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data
Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data
Beyond keyword search: discovering relevant scientific literature
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries
Hi-index | 0.00 |
It is often difficult for users to adopt keywords to express their information needs. Search-By-Multiple-Examples (SBME), a promising method for overcoming this problem, allows users to specify their information needs as a set of relevant documents rather than as a set of keywords. Most of the studies on SBME adopt the Positive Unlabeled learning (PU learning) techniques by treating the users' provided examples (denote as query examples) as positive set and the entire data collection as unlabeled set. However, it is inefficient to treat the entire data collection as unlabeled set, as its size can be huge. In addition, the query examples are treated as being relevant to a single topic, but it is often the case that they can be relevant to multiple topics. As the query examples are much fewer than the unlabeled data, the system performance may downgrade dramatically because of the class imbalance problem. What's more, the experiments conducted in these studies have not taken into account the settings in online search, which are very different from the controlled experiments scenario. This proposed research seeks to explore how to improve SBME by exploring: (1) how to predict user' information needs by modeling the content of the documents using probabilistic topic models; (2) how to deal with the class imbalance problem by reducing the size of the unlabeled data and adopting machine learning techniques. We will also conduct extensive experiments to better evaluate SBME using different sizes of query examples to simulate users' information needs.