Document retrieval: shallow data, deep theories; historical reflections, potential directions

Author: Karen Spärck Jones
Affiliation: Computer Laboratory, University of Cambridge, Cambridge, UK
Venue: ECIR '03: Proceedings of the 25th European Conference on IR Research
Year: 2003

Abstract

This paper reviews the development of statistically based retrieval. Since the 1950s, statistical techniques have clearly demonstrated their practical worth, and statistical theories their staying power, for document or text retrieval. In the last decade, the TREC programme and the Web have offered new retrieval challenges to which these methods have successfully risen. They are now one element in the much wider and very productive spread of statistical methods to all areas of information and language processing, in which innovative approaches to modelling their data and tasks are being applied.