A probabilistic learning approach for document indexing
ACM Transactions on Information Systems (TOIS) - Special issue on research and development in information retrieval
Document language models, query models, and risk minimization for information retrieval
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Cumulated gain-based evaluation of IR techniques
ACM Transactions on Information Systems (TOIS)
Introduction to topic detection and tracking
Topic detection and tracking
On the bursty evolution of blogspace
WWW '03 Proceedings of the 12th international conference on World Wide Web
Combining document representations for known-item search
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Newsjunkie: providing personalized newsfeeds via analysis of information novelty
Proceedings of the 13th international conference on World Wide Web
Information diffusion through blogspace
Proceedings of the 13th international conference on World Wide Web
Retrieval evaluation with incomplete information
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Cluster-based retrieval using language models
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Corpus structure, language models, and ad hoc information retrieval
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Simple BM25 extension to multiple weighted fields
Proceedings of the thirteenth ACM international conference on Information and knowledge management
Choosing document structure weights
Information Processing and Management: an International Journal
Integrating word relationships into language models
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Improving web search ranking by incorporating user behavior information
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
LDA-based document models for ad-hoc retrieval
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Efficient search in large textual collections with redundancy
Proceedings of the 16th international conference on World Wide Web
TopX: efficient and versatile top-k query processing for semistructured data
The VLDB Journal — The International Journal on Very Large Data Bases
Mining the search trails of surfing crowds: identifying relevant websites from user activity
Proceedings of the 17th international conference on World Wide Web
Recrawl scheduling based on information longevity
Proceedings of the 17th international conference on World Wide Web
Finding the right facts in the crowd: factoid question answering over social media
Proceedings of the 17th international conference on World Wide Web
Retrieval and feedback models for blog feed search
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Discriminative probabilistic models for passage based retrieval
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Discovering key concepts in verbose queries
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Overview of the INEX 2007 Ad Hoc Track
Focused Access to XML Documents
An improved markov random field model for supporting verbose queries
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Compact full-text indexing of versioned document collections
Proceedings of the 18th ACM conference on Information and knowledge management
Leveraging temporal dynamics of document content in relevance ranking
Proceedings of the third ACM international conference on Web search and data mining
Linear time series models for term weighting in information retrieval
Journal of the American Society for Information Science and Technology
Analysis of the INEX 2009 ad hoc track results
INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval
Detecting and exploiting stability in evolving heterogeneous information spaces
Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries
Temporal latent semantic analysis for collaboratively generated content: preliminary results
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Keeping keywords fresh: a BM25 variation for personalized keyword extraction
Proceedings of the 2nd Temporal Web Analytics Workshop
User edits classification using document revision histories
EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Collaboratively built semi-structured content and Artificial Intelligence: The story so far
Artificial Intelligence
Temporal web dynamics and its application to information retrieval
Proceedings of the sixth ACM international conference on Web search and data mining
Hi-index | 0.00 |
The generative process underlies many information retrieval models, notably statistical language models. Yet these models only examine one (current) version of the document, effectively ignoring the actual document generation process. We posit that a considerable amount of information is encoded in the document authoring process, and this information is complementary to the word occurrence statistics upon which most modern retrieval models are based. We propose a new term weighting model, Revision History Analysis (RHA), which uses the revision history of a document (e.g., the edit history of a page in Wikipedia) to redefine term frequency - a key indicator of document topic/relevance for many retrieval models and text processing tasks. We then apply RHA to document ranking by extending two state-of-the-art text retrieval models, namely, BM25 and the generative statistical language model (LM). To the best of our knowledge, our paper is the first attempt to directly incorporate document authoring history into retrieval models. Empirical results show that RHA provides consistent improvements for state-of-the-art retrieval models, using standard retrieval tasks and benchmarks.