Early modern books written in Latin contain many abbreviations of common words, a convention inherited from earlier manuscript practice. While a reader well versed in Latin can usually decipher these abbreviations easily, they pose technical problems for full-text digitization: they are difficult to capture by OCR or manual keying and, if they are not expanded correctly, they limit the effectiveness of information retrieval and reading support tools in the digital library. In this paper, I describe a method for the automatic expansion and disambiguation of these abbreviations.
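To make the task concrete, the following is a minimal sketch of one generic way to expand and disambiguate abbreviations; it is not the method described in the paper, whose details are not given in this abstract. It assumes a single simplified convention (a macron over a vowel marks an omitted m or n), generates every possible reading, and keeps the reading with the highest count in a word list. The names `MACRON_VOWELS`, `LEXICON`, `candidates`, `expand`, and `expand_text` are hypothetical, and the frequency counts are invented for illustration.

```python
import unicodedata
from itertools import product

# Assumed, simplified rule: a macron vowel stands for "vowel + omitted m or n".
MACRON_VOWELS = {"ā": "a", "ē": "e", "ī": "i", "ō": "o", "ū": "u"}

# Toy lexicon with invented counts; a real system would need a large Latin word list.
LEXICON = {"cum": 120, "non": 200, "tamen": 40, "eundem": 6, "eumdem": 2}


def candidates(token):
    """Return every spelling obtained by reading each macron vowel as vowel+m or vowel+n."""
    token = unicodedata.normalize("NFC", token.lower())
    options = []
    for ch in token:
        if ch in MACRON_VOWELS:
            base = MACRON_VOWELS[ch]
            options.append([base + "m", base + "n"])
        else:
            options.append([ch])
    return ["".join(parts) for parts in product(*options)]


def expand(token):
    """Pick the attested candidate with the highest count.

    If no candidate is in the lexicon, the token is returned unchanged.
    """
    attested = [c for c in candidates(token) if c in LEXICON]
    if not attested:
        return token
    return max(attested, key=LEXICON.get)


def expand_text(text):
    """Expand each whitespace-separated token independently."""
    return " ".join(expand(tok) for tok in text.split())


if __name__ == "__main__":
    print(expand_text("nō cū eūdem"))
```

Here `eūdem` has two attested readings (`eumdem` and `eundem`), and the sketch resolves the ambiguity by frequency alone. A fuller approach would presumably also weigh the surrounding words, which this example does not attempt.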