Retrieval in text collections with historic spelling using linguistic and spelling variants
Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
In this paper, we describe a new approach to retrieval in texts with non-standard spelling, which is important for historic texts in English or German. For this purpose, we present a new algorithm for generating search term variants in historic orthography. By applying a spell checker to a corpus of historic texts, we generate a list of candidate terms to which the contemporary spellings have to be assigned manually. From these pairs, our algorithm produces a set of probabilistic rules. The rule probabilities can then be used for ranking in the retrieval stage. An experimental comparison shows that our approach outperforms competing methods.
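
To make the idea of probabilistic variant generation concrete, the following is a minimal Python sketch. It is not the algorithm from the paper, whose details the abstract does not give. It assumes that a rule maps a substring in contemporary spelling to a historic replacement and carries a probability, that each variant results from a single rule application, and that the rule probability serves directly as a ranking weight. The names `Rule`, `RULES` and `variants`, and the two example rules with their probabilities, are hypothetical.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    modern: str    # substring in contemporary spelling
    historic: str  # replacement assumed to occur in historic spelling
    prob: float    # assumed probability that the rule applies


# Hypothetical rules and probabilities, for illustration only.
RULES = [
    Rule("t", "th", 0.3),
    Rule("ei", "ey", 0.2),
]


def variants(term: str, rules: list[Rule]) -> list[tuple[str, float]]:
    """Return (variant, weight) pairs, highest weight first.

    Each variant comes from applying one rule at one position. If several
    applications yield the same string, the highest weight is kept.
    """
    best = {term: 1.0}  # the unmodified term keeps full weight
    for rule in rules:
        start = term.find(rule.modern)
        while start != -1:
            end = start + len(rule.modern)
            candidate = term[:start] + rule.historic + term[end:]
            best[candidate] = max(best.get(candidate, 0.0), rule.prob)
            start = term.find(rule.modern, start + 1)
    # Sort by descending weight, then alphabetically for a stable order.
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for variant, weight in variants("teil", RULES):
        print(f"{variant}\t{weight}")
```

In a retrieval setting, the generated variants could be added to the query as alternatives, with each weight contributing to the document score. Combining several rules in one variant, for example by multiplying their probabilities, would be a natural extension but is likewise an assumption here.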