Building a dynamic lexicon from a digital library

Authors: David Bamman; Gregory Crane
Affiliations: Tufts University, Medford, MA, USA (both authors)
Venue: Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries
Year: 2008

Abstract

We describe in detail our work toward creating a dynamic lexicon from the texts in a large digital library. By leveraging a small structured knowledge source (a 30,457-word treebank), we are able to extract selectional preferences for words from a 3.5-million-word Latin corpus. This is promising news for low-resource languages and for digital collections seeking to turn a small human investment into a much larger gain. The library architecture in which this work is developed lets us query customized subcorpora to report on lexical usage by author, genre, or era, and lets us continually update the lexicon as new texts are added to the collection.
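
As a rough illustration of what "selectional preferences" queried over a subcorpus could look like, the sketch below ranks the objects of a verb by pointwise mutual information (PMI), optionally restricted to one author. It is not the authors' method: the scoring function, the function name `object_preferences`, and the toy `TRIPLES` data are all assumptions made for this example, standing in for verb-object pairs read off automatically parsed text.

```python
import math
from collections import Counter

# Hypothetical toy data: (author, verb lemma, object lemma) triples, as if
# read off automatically parsed sentences. Not taken from the paper.
TRIPLES = [
    ("Caesar", "duco", "exercitus"),
    ("Caesar", "duco", "exercitus"),
    ("Caesar", "duco", "legio"),
    ("Caesar", "mitto", "legio"),
    ("Caesar", "mitto", "legatus"),
    ("Ovid", "duco", "uxor"),
    ("Ovid", "mitto", "epistula"),
    ("Ovid", "mitto", "epistula"),
]


def object_preferences(triples, verb, author=None):
    """Rank the objects of `verb` by PMI, optionally within one author's texts.

    PMI is an assumed stand-in for whatever association measure a real
    system would use; it is known to overrate rare pairs.
    """
    # Restrict to the requested subcorpus (all authors when author is None).
    pairs = [(v, o) for a, v, o in triples if author is None or a == author]
    total = len(pairs)
    pair_counts = Counter(pairs)
    verb_counts = Counter(v for v, _ in pairs)
    obj_counts = Counter(o for _, o in pairs)

    scores = {}
    for (v, o), n in pair_counts.items():
        if v != verb:
            continue
        # PMI = log2( P(v, o) / (P(v) * P(o)) )
        scores[o] = math.log2(n * total / (verb_counts[v] * obj_counts[o]))

    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for author in (None, "Caesar", "Ovid"):
        print(author or "all authors", object_preferences(TRIPLES, "duco", author))
```

Because the subcorpus filter is applied before any counting, the same verb can show different preferred objects for different authors, and re-running the function after appending new triples reflects the enlarged collection without any other change.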