Searching for historical word-forms in a database of 17th-century English text using spelling-correction methods

Authors:
Alexander M. Robertson; Peter Willett
Affiliations:
Department of Information Studies, University of Sheffield, Western Bank, Sheffield, UK, S10 2TN (both authors)
Venue:
SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
1992

Abstract

This paper discusses the application of algorithmic spelling-correction techniques to identifying those words in a database of 17th-century English text that are most similar to a query word in modern English. The experiments used n-gram matching, non-phonetic coding and dynamic programming methods for spelling correction, and demonstrated that high-recall searches can be carried out, although some of the searches demand substantial computational resources. The methods are, in principle, applicable to historical texts in many languages and from many different periods.
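
To make two of the technique families named in the abstract concrete, the following is a minimal sketch of n-gram matching (here a Dice coefficient over padded character digrams) and of dynamic programming (here a unit-cost edit distance), combined to rank historical spellings against a modern query word. It is an illustration under stated assumptions and not the authors' implementation: the function names, the padding scheme, the use of sets rather than multisets, the tie-breaking rule and the sample vocabulary are all hypothetical choices made for this example. Non-phonetic coding is not covered.

```python
def ngrams(word, n=2):
    """Return the set of character n-grams of a word.

    Assumption: the word is lower-cased and padded with one space on each
    side so that word-initial and word-final letters get their own n-grams.
    """
    padded = f" {word.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice_similarity(a, b, n=2):
    """Dice coefficient between the n-gram sets of two words (0.0 to 1.0)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # both words too short to yield any n-gram
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def edit_distance(a, b):
    """Unit-cost edit distance (insert, delete, substitute) by dynamic programming.

    Only two rows of the table are kept, so memory is O(len(b)).
    """
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def rank_variants(query, vocabulary, top_k=5):
    """Rank vocabulary words by similarity to the query.

    Assumption: Dice similarity is the primary key (higher is better) and
    edit distance breaks ties (lower is better); this ordering is a choice
    made for the sketch, not one taken from the paper.
    """
    q = query.lower()
    ranked = sorted(
        vocabulary,
        key=lambda w: (-dice_similarity(q, w), edit_distance(q, w.lower()), w),
    )
    return ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical vocabulary of older spellings, for illustration only.
    vocabulary = ["musick", "magick", "musique", "mutton"]
    for word in rank_variants("music", vocabulary):
        print(word, round(dice_similarity("music", word), 3),
              edit_distance("music", word))
```

The n-gram score needs only set intersections and can be computed cheaply over a large vocabulary, whereas the dynamic programming table costs time proportional to the product of the two word lengths for every comparison, which is one plausible reason such searches can be computationally demanding.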