Word sense discrimination is an important first step towards automatic detection of language evolution within large, historic document collections. By comparing the word senses found over time, we can reveal and use important information that improves the understanding and accessibility of a digital archive. Algorithms for word sense discrimination have been developed with today's language in mind and have thus been evaluated on well-selected, modern datasets. The quality of the word senses found in the discrimination step has a large impact on the detection of language evolution. Therefore, as a first step, we verify that word sense discrimination can be applied successfully to digitized historic documents and that the results correspond correctly to word senses. Because the accessibility of digitized historic collections is also influenced by the quality of optical character recognition (OCR), as a second step we investigate the effects of OCR errors on word sense discrimination results. All evaluations in this paper are performed on The Times Archive, a collection of newspaper articles from 1785 to 1985.