Search by multiple examples

Authors:
Mingzhu Zhu;Yi-Fang Brook Wu
Affiliations:
New Jersey Institute of Technology, Newark, NJ, USA;New Jersey Institute of Technology, Newark, NJ, USA
Venue:
Proceedings of the 7th ACM international conference on Web search and data mining
Year:
2014

Citing 17
Cited 0

Improving retrieval performance by relevance feedback

Readings in information retrieval
Learning to extract symbolic knowledge from the World Wide Web

AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
Learning to classify text from labeled and unlabeled documents

AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
Support Vector Machines

IEEE Intelligent Systems
A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization

ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Transductive Inference for Text Classification using Support Vector Machines

ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Building Text Classifiers Using Positive and Unlabeled Examples

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
PEBL: Web Page Classification without Negative Examples

IEEE Transactions on Knowledge and Data Engineering
Improving web search ranking by incorporating user behavior information

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Learning to rank with partially-labeled data

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Introduction to Information Retrieval

Introduction to Information Retrieval
Learning classifiers from only positive and unlabeled data

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Learning to Find Relevant Biological Articles without Negative Training Examples

AI '08 Proceedings of the 21st Australasian Joint Conference on Artificial Intelligence: Advances in Artificial Intelligence
Learning to classify texts using positive and unlabeled data

IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data

Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data
Beyond keyword search: discovering relevant scientific literature

Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
IFME: information filtering by multiple examples with under-sampling in a digital library environment

Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries

Quantified Score

Hi-index	0.00

Visualization

Abstract

It is often difficult for users to adopt keywords to express their information needs. Search-By-Multiple-Examples (SBME), a promising method for overcoming this problem, allows users to specify their information needs as a set of relevant documents rather than as a set of keywords. Most of the studies on SBME adopt the Positive Unlabeled learning (PU learning) techniques by treating the users' provided examples (denote as query examples) as positive set and the entire data collection as unlabeled set. However, it is inefficient to treat the entire data collection as unlabeled set, as its size can be huge. In addition, the query examples are treated as being relevant to a single topic, but it is often the case that they can be relevant to multiple topics. As the query examples are much fewer than the unlabeled data, the system performance may downgrade dramatically because of the class imbalance problem. What's more, the experiments conducted in these studies have not taken into account the settings in online search, which are very different from the controlled experiments scenario. This proposed research seeks to explore how to improve SBME by exploring: (1) how to predict user' information needs by modeling the content of the documents using probabilistic topic models; (2) how to deal with the class imbalance problem by reducing the size of the unlabeled data and adopting machine learning techniques. We will also conduct extensive experiments to better evaluate SBME using different sizes of query examples to simulate users' information needs.