We describe in detail our work toward creating a dynamic lexicon from the texts in a large digital library. By leveraging a small structured knowledge source (a 30,457-word treebank), we are able to extract selectional preferences for words from a 3.5-million-word Latin corpus. This is promising news for low-resource languages and for digital collections seeking to turn a small human investment into a much larger gain. The library architecture in which this work is developed allows us to query customized subcorpora to report on lexical usage by author, genre, or era, and to continually update the lexicon as new texts are added to the collection.