Identifying semantic relations in text

  • Authors:
  • Daniel Gildea;Daniel Jurafsky

  • Affiliations:
  • University of California, Berkeley, and International Computer Science Institute;University of Colorado, Boulder

  • Venue:
  • Exploring artificial intelligence in the new millennium
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

Over the past decade, natural language processing has been transformed by the adoption of statistical methods. The statistical approach began with shallow problems such as part-of-speech tagging, progressed to syntactic parsing, and is now being applied to higher-level semantic tasks. We present a statistical system for identifying the semantic relationships, or semantic roles, filled by constituents of a sentence. The system operates at the level of frame semantics, which provide us with an intermediate representation between the detail of complete theories of semantics and simpler domain-specific slot-filler representations. Given an input sentence, the system labels constituents with roles such as SPEAKER, MESSAGE, and TOPIC, identifying participants in various types of actions or states.The system is based on statistical classifiers that were trained on roughly 50,000 sentences hand labeled with semantic roles in the FrameNet semantic labeling project. We then parsed each training sentence and extracted various lexical and syntactic features, including the syntactic category of the constituent, its grammatical function, and position in the sentence. These features were combined with knowledge of the target verb, noun, or adjective: as well as information such as the prior probabilities of various combinations of semantic roles. We also used various methods of lexical clustering to generalize across possible fillers of roles. Test sentences were parsed, annotated with these features, and then passed through the classifiers.Our system achieves 80% accuracy in identifying the semantic role of presegmented constituents. At the harder task of simultaneously segmenting constituents and identifying their semantic role, the system achieved 65% precision and 61% recall.