StringNet as a computational resource for discovering and investigating linguistic constructions

  • Authors:
  • David Wible; Nai-Lung Tsao

  • Affiliations:
  • National Central University, Jhongli City, Taoyuan County, Taiwan; National Central University, Jhongli City, Taoyuan County, Taiwan

  • Venue:
  • EUCCL '10 Proceedings of the NAACL HLT Workshop on Extracting and Using Constructions in Computational Linguistics
  • Year:
  • 2010

Abstract

We describe and motivate the design of a lexico-grammatical knowledgebase called StringNet and illustrate its significance for research into constructional phenomena in English. StringNet consists of a massive archive of what we call hybrid n-grams. Unlike traditional n-grams, hybrid n-grams can consist of any co-occurring combination of POS tags, lexemes, and specific word forms. Further, we detect and represent superordinate and subordinate relations among hybrid n-grams by cross-indexing, allowing the navigation of StringNet through these hierarchies, from specific fixed expressions ("It's the thought that counts") up to their hosting proto-constructions (e.g. the It Cleft construction: "it's the [noun] that [verb]"). StringNet supports discovery of grammatical dependencies (e.g., subject-verb agreement) in non-canonical configurations as well as lexical dependencies (e.g., adjective/noun collocations specific to families of constructions).
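
To make the notion of a hybrid n-gram concrete, the following is a minimal sketch, not the authors' implementation. It assumes each token comes annotated with a word form, a lexeme (lemma), and a POS tag, and it enumerates every combination of those three levels across a short window. The `Token` type, the function names `hybrid_ngrams` and `subsumes`, and the bracketed tag labels are illustrative assumptions; nothing here reflects how StringNet is actually built, filtered, or cross-indexed.

```python
from itertools import product
from typing import Iterator, NamedTuple, Sequence, Tuple


class Token(NamedTuple):
    form: str   # specific word form, e.g. "counts"
    lemma: str  # lexeme, e.g. "count"
    pos: str    # POS tag, written here as "[verb]" (hypothetical tag labels)


# Abstraction levels, ordered from most specific to most general.
LEVELS = ("form", "lemma", "pos")

Slot = Tuple[str, str]          # (level name, value)
HybridNgram = Tuple[Slot, ...]  # one slot per token in the window


def hybrid_ngrams(tokens: Sequence[Token]) -> Iterator[HybridNgram]:
    """Yield every hybrid n-gram for one window of tokens.

    Each slot is realized independently as a word form, a lexeme, or a
    POS tag, so a window of n tokens yields 3**n patterns.
    """
    for choice in product(LEVELS, repeat=len(tokens)):
        yield tuple((level, getattr(tok, level)) for tok, level in zip(tokens, choice))


def subsumes(general: HybridNgram, specific: HybridNgram) -> bool:
    """Return True if `general` is at least as abstract as `specific` in every slot.

    Assumes both patterns were generated from the same token window, so
    only the abstraction level of each slot needs to be compared.
    """
    rank = {name: i for i, name in enumerate(LEVELS)}
    return len(general) == len(specific) and all(
        rank[g[0]] >= rank[s[0]] for g, s in zip(general, specific)
    )


window = [
    Token("the", "the", "[det]"),
    Token("thought", "thought", "[noun]"),
    Token("that", "that", "[pron]"),
    Token("counts", "count", "[verb]"),
]
patterns = list(hybrid_ngrams(window))
print(len(patterns))  # 81, i.e. 3**4
```

In this sketch, the fixed expression (all slots at the word-form level) and a more schematic pattern such as "the [noun] that [verb]" both appear among the generated patterns, and `subsumes` gives a simple stand-in for the superordinate/subordinate relation the abstract describes. A real resource would need frequency thresholds or pruning, since the number of patterns grows exponentially with window length.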