Recent years have witnessed a persistent interest in generating pseudo test collections, both for training and for evaluation purposes. We describe a method for generating queries and relevance judgments for microblog search in an unsupervised way. Our starting point is the intuition that tweets with a hashtag are relevant to the topic covered by the hashtag and hence to a suitable query derived from the hashtag. Our baseline method selects all commonly used hashtags and treats the tweets associated with each hashtag as its relevance judgments; we then generate a query from these tweets. Next, we generate a timestamp for each query, allowing us to use temporal information in the training process. We then enrich the generation process with knowledge derived from an editorial test collection for microblog search. We use our pseudo test collections in two ways. First, we tune parameters of a variety of well-known retrieval methods on them. Correlations with parameter sweeps on an editorial test collection are high on average, with a large variance over retrieval algorithms. Second, we use the pseudo test collections as training sets in a learning-to-rank scenario. In many cases, performance is close to that achieved by training on an editorial test collection. Our results demonstrate the utility of tuning and training microblog search algorithms on automatically generated training material.
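The baseline idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how "commonly used" is thresholded, how query terms are chosen, or how the timestamp is derived. The sketch assumes a hypothetical `min_tweets` cutoff, picks query terms by a simple log-ratio of their frequency in the hashtag's tweets against the whole collection, and uses the time of the latest tagged tweet as the query timestamp. The function name `build_pseudo_collection` and all parameters are illustrative.

```python
import math
import re
from collections import Counter, defaultdict

HASHTAG_RE = re.compile(r"#(\w+)")
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    # Remove hashtags first so the tag itself cannot leak into the query.
    return TOKEN_RE.findall(HASHTAG_RE.sub(" ", text.lower()))


def build_pseudo_collection(tweets, min_tweets=2, query_len=2):
    """Build pseudo topics from (tweet_id, unix_time, text) tuples.

    Returns {hashtag: {"query": [...], "relevant": [...], "timestamp": int}}.
    All thresholds and the term-scoring rule are assumptions of this sketch.
    """
    by_tag = defaultdict(list)
    background = Counter()
    for tweet_id, ts, text in tweets:
        tokens = tokenize(text)
        background.update(tokens)
        for tag in set(HASHTAG_RE.findall(text.lower())):
            by_tag[tag].append((tweet_id, ts, tokens))
    total = sum(background.values())

    topics = {}
    for tag, group in by_tag.items():
        if len(group) < min_tweets:
            continue  # assumed notion of "commonly used"
        fg = Counter(tok for _, _, toks in group for tok in toks)
        fg_total = sum(fg.values())
        if fg_total == 0:
            continue  # tweets contained nothing but hashtags

        def score(term):
            # Frequency-weighted log-ratio of foreground vs. collection use.
            p_fg = fg[term] / fg_total
            p_bg = background[term] / total
            return p_fg * math.log(p_fg / p_bg)

        # Sort by descending score; break ties alphabetically.
        terms = sorted(fg, key=lambda t: (-score(t), t))[:query_len]
        topics[tag] = {
            "query": terms,
            "relevant": sorted(tid for tid, _, _ in group),
            # Assumption: the query is "issued" at the latest tagged tweet.
            "timestamp": max(ts for _, ts, _ in group),
        }
    return topics
```

Calling `build_pseudo_collection` on a list of tweets yields one pseudo topic per sufficiently frequent hashtag; the `relevant` lists can then serve as judgments when tuning or training a ranker.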