A new generation of textual corpora: mining corpora from very large collections
Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
When printed hypertexts go digital: information extraction from the parsing of indices
Proceedings of the 20th ACM conference on Hypertext and hypermedia
Text retrieval from early printed books
Proceedings of The Third Workshop on Analytics for Noisy Unstructured Text Data
This paper addresses the automatic indexing and reformulation of ancient dictionaries. The objective is to provide easy access to ancient printed documents from the 16th to the 19th century for a diverse public (historians, scientists, librarians, etc.). Since facsimile mode alone is insufficient, we aim to go further by using indexing based on the formal structure that represents certain contents, in order to make these documents easier to explore. Starting from an initial indexing experiment carried out on more recent documents, the TLF ("Trésor de la Langue Française": Treasury of the French Language) at the ATILF laboratory (Nancy, France), we extend the indexing technique to the automatic reformulation and re-edition of ancient dictionaries. However, given the scale of the problem, we limited our investigations to a very specific collection of the ATILF laboratory, the "Trévoux" dictionary (defined later).