Query sampling for learning data fusion

Authors:
Ting-Chu Lin;Pu-Jen Cheng
Affiliations:
Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan, ROC;Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan, ROC
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

Abstract

Data fusion merges the results of multiple independent retrieval models into a single ranked list. Several earlier studies have shown that combining different models can achieve better retrieval performance than any of the individual models alone. Although supervised fusion methods have produced many promising results, training data sampling has attracted little attention in previous work on data fusion. In evaluations on TREC and NTCIR datasets, we found that the performance of a given model varied widely from one training example to another, so not all training examples were equally effective. In this paper, we propose two novel approaches, a greedy approach and a boosting approach, which select effective training data by query sampling to improve the performance of supervised data fusion algorithms such as BayesFuse, ProbFuse and MAPFuse. Extensive experiments were conducted on five data sets: TREC-3, 4, 5 and NTCIR-3, 4. The results show that our sampling approaches can significantly improve the retrieval performance of those data fusion methods.
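
The abstract does not describe how the greedy approach works, so the following is only a minimal sketch of what greedy query sampling could look like, assuming plain forward selection against a held-out score. It is not the paper's algorithm. The names `greedy_query_sampling`, `evaluate`, and `toy_evaluate`, and the per-query usefulness values, are all hypothetical. In a real setting `evaluate` would train a fusion model (for example ProbFuse) on the given queries and return a validation measure such as MAP.

```python
from typing import Callable, Hashable, List, Sequence


def greedy_query_sampling(
    queries: Sequence[Hashable],
    evaluate: Callable[[List[Hashable]], float],
) -> List[Hashable]:
    """Forward-select training queries while the held-out score improves.

    `evaluate(subset)` is a hypothetical callback (an assumption of this
    sketch): it should train a fusion model on `subset` and return a
    held-out effectiveness score.
    """
    selected: List[Hashable] = []
    remaining = list(queries)
    best_score = float("-inf")
    while remaining:
        # Score every one-query extension of the current subset.
        scored = [(evaluate(selected + [q]), q) for q in remaining]
        score, best_q = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break  # no remaining query improves the score
        selected.append(best_q)
        remaining.remove(best_q)
        best_score = score
    return selected


# Toy stand-in for "train on subset, score on validation queries":
# made-up values in which some training queries hurt the fused result.
usefulness = {"q1": 0.3, "q2": -0.2, "q3": 0.1, "q4": -0.05}


def toy_evaluate(subset: List[Hashable]) -> float:
    return sum(usefulness[q] for q in subset)


chosen = greedy_query_sampling(list(usefulness), toy_evaluate)
print(chosen)
```

With these toy values the selection keeps the two helpful queries and stops once every remaining candidate would lower the score. Each round retrains once per remaining query, so the cost grows quadratically with the number of training queries.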