Data fusion merges the results of multiple independent retrieval models into a single ranked list. Several earlier studies have shown that combining different models can yield better retrieval performance than any of the individual models alone. Although supervised fusion methods have produced many promising results, training data sampling has attracted little attention in previous work on data fusion. In evaluations on TREC and NTCIR datasets, we observed that the performance of a single model varies widely from one training example to another, so not all training examples are equally effective. In this paper, we propose two novel approaches, one greedy and one based on boosting, that select effective training data by query sampling to improve the performance of supervised data fusion algorithms such as BayesFuse, ProbFuse and MAPFuse. Extensive experiments were conducted on five data sets: TREC-3, TREC-4, TREC-5, NTCIR-3 and NTCIR-4. The results show that our sampling approaches can significantly improve the retrieval performance of these data fusion methods.
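The abstract does not spell out the greedy procedure, so the following is only a minimal sketch of one plausible reading: generic greedy forward selection over training queries. It is not the authors' algorithm. The function name `greedy_query_selection` and the callback `score_subset` are hypothetical; `score_subset` stands in for "train a supervised fusion method on these queries and measure retrieval quality on held-out data", and the toy scoring function exists only so the example runs.

```python
from typing import Callable, Hashable, Iterable, List, Tuple


def greedy_query_selection(
    queries: Iterable[Hashable],
    score_subset: Callable[[List[Hashable]], float],
) -> Tuple[List[Hashable], float]:
    """Greedy forward selection of training queries (illustrative only).

    score_subset is an assumed callback that returns a quality score
    (higher is better) for a fusion model trained on the given queries.
    """
    remaining = list(queries)
    selected: List[Hashable] = []
    best_score = float("-inf")

    while remaining:
        # Find the single query whose addition gives the highest score.
        top_idx = -1
        top_score = best_score
        for i, query in enumerate(remaining):
            candidate_score = score_subset(selected + [query])
            if candidate_score > top_score:
                top_idx, top_score = i, candidate_score

        # Stop as soon as no remaining query strictly improves the score.
        if top_idx < 0:
            break

        selected.append(remaining.pop(top_idx))
        best_score = top_score

    return selected, best_score


# Toy stand-in for "train and evaluate": each query has a fixed, made-up
# contribution, and a subset's score is the sum of its contributions.
toy_gain = {"q1": 3, "q2": -2, "q3": 5, "q4": 0}


def toy_score(subset: List[Hashable]) -> float:
    return float(sum(toy_gain[q] for q in subset))


selected, score = greedy_query_selection(toy_gain, toy_score)
```

In this toy run the harmful query (`q2`) and the neutral one (`q4`) are left out, which mirrors the abstract's observation that not all training examples are equally effective. In practice each call to `score_subset` would retrain the fusion model, so the cost grows quadratically with the number of candidate queries.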