NULEX: an open-license broad coverage lexicon

Authors:
Clifton J. McFate;Kenneth D. Forbus
Affiliations:
Northwestern University, Evanston, IL;Northwestern University, Evanston, IL
Venue:
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Year:
2011

Abstract

Broad coverage lexicons for the English language have traditionally been handmade. This approach, while accurate, requires too much human labor. Furthermore, existing resources contain gaps in coverage, contain only specific types of information, or are incompatible with other resources. We believe that the state of open-license technology is such that a comprehensive syntactic lexicon can be automatically compiled. This paper describes the creation of such a lexicon, NU-LEX, an open-license feature-based lexicon for general-purpose parsing that combines WordNet, VerbNet, and Wiktionary and contains over 100,000 words. NU-LEX was integrated into a bottom-up chart parser. We ran the parser on three sets of sentences, 50 sentences in total, from the Simple English Wikipedia and compared its performance to that of the same parser using Comlex. The two configurations performed almost equally, with NU-LEX finding all lex-items for 50% of the sentences and Comlex succeeding for 52%. Furthermore, NU-LEX's shortcomings primarily fell into two categories, suggesting future research directions.
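The evaluation figure reported in the abstract is the share of sentences for which every lexical item was found. The sketch below illustrates one way such a sentence-level coverage rate could be computed. It is not the authors' evaluation code: the function names, the lowercasing step, and the toy lexicon and sentences are all hypothetical, and a real lexicon lookup would also need to handle inflected forms and multiword entries. If the reported percentages are taken over all 50 sentences, 50% and 52% correspond to 25 and 26 sentences, a difference of one sentence.

```python
from typing import Iterable, List, Set


def sentence_fully_covered(tokens: Iterable[str], lexicon: Set[str]) -> bool:
    """Return True if every token has an entry in the lexicon.

    Assumption: lookup is a case-insensitive exact match on the surface form.
    """
    return all(tok.lower() in lexicon for tok in tokens)


def coverage_rate(sentences: List[List[str]], lexicon: Set[str]) -> float:
    """Fraction of sentences in which all tokens were found in the lexicon."""
    if not sentences:
        return 0.0
    covered = sum(sentence_fully_covered(s, lexicon) for s in sentences)
    return covered / len(sentences)


# Toy data, invented purely for illustration (not from the paper).
toy_lexicon = {"the", "dog", "barks", "a", "cat", "sleeps"}
toy_sentences = [
    ["The", "dog", "barks"],       # fully covered
    ["A", "cat", "sleeps"],        # fully covered
    ["The", "axolotl", "sleeps"],  # "axolotl" is missing
    ["A", "dog", "yodels"],        # "yodels" is missing
]

rate = coverage_rate(toy_sentences, toy_lexicon)
print(f"{rate:.0%}")  # 2 of 4 sentences are fully covered
```

Counting at the sentence level is a strict measure: a single missing word marks the whole sentence as a failure, so longer sentences are harder to cover than a per-token rate would suggest.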