Extracting sense trees from the Romanian thesaurus by sense segmentation & dependency parsing

  • Authors:
  • Neculai Curteanu;Alex Moruz;Diana Trandabăţ

  • Affiliations:
  • Romanian Academy;"Al. I. Cuza" University, Iaşi;"Al. I. Cuza" University, Iaşi

  • Venue:
  • COGALEX '08 Proceedings of the workshop on Cognitive Aspects of the Lexicon
  • Year:
  • 2008

Abstract

This paper introduces a new strategy for parsing large dictionaries (thesauri), called Dictionary Sense Segmentation & Dependency (DSSD), designed to obtain the sense tree of a dictionary entry, i.e. the hierarchy of its defined meanings. The real novelty of the proposed approach is that, unlike 'standard' dictionary parsing, DSSD aims to separate, and succeeds in separating, the two essential processes of dictionary entry parsing: sense tree construction and sense definition parsing. The key tools for (autonomous) sense tree building are defining the dictionary sense marker classes, establishing a tree-like hierarchy of these classes, and using an appropriate search procedure for sense markers within the DSSD parsing algorithm. DSSD is inspired by SCD (Segmentation-Cohesion-Dependency) (Curteanu, 1988, 2006), a similar but more general approach that uses the same techniques and data structures for (Romanian) free text parsing. A DSSD-based parser, implemented in Java, currently builds sense trees from DTLR (Dicţionarul Tezaur al Limbii Române, the Romanian Language Thesaurus) entries, 91% of which are correct, with significant potential for improving and extending the lexical-semantic analysis of DTLR.
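The abstract does not spell out the DTLR marker inventory or the exact DSSD search procedure, so the following Java code is only a minimal sketch of the general idea under stated assumptions. It assumes three hypothetical marker classes (Roman numerals such as `I.`, Arabic numerals such as `1.`, lowercase letters such as `a)`), ranked from most general to most specific; a single regular-expression scan that locates markers; and a stack that attaches each marker to the nearest open marker of a more general class. Definitions are stored as unparsed strings, mirroring the separation of sense tree construction from sense definition parsing. The names `SenseTreeSketch`, `MarkerClass`, `build`, `render`, the regex, and the sample entry are all invented for illustration; the real DTLR hierarchy may contain other marker classes, and this is not the authors' parser.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SenseTreeSketch {

    // Hypothetical marker classes, declared from most general to most specific;
    // ordinal() is used as the level of the class in the marker hierarchy.
    enum MarkerClass { ROMAN_NUMERAL, ARABIC_NUMERAL, LOWERCASE_LETTER }

    // A marker must start the text or follow whitespace, and be followed by whitespace.
    // Group 1: Roman numeral "IV.", group 2: Arabic numeral "3.", group 3: letter "b)".
    private static final Pattern MARKER = Pattern.compile(
            "(?<![^\\s])(?:([IVX]+)\\.|(\\d+)\\.|([a-z])\\))(?=\\s)");

    static final class Node {
        final String marker;      // e.g. "II."; empty for the entry root
        final int level;          // -1 for the root, otherwise the MarkerClass ordinal
        String definition = "";   // raw definition text, deliberately left unparsed
        final List<Node> children = new ArrayList<>();

        Node(String marker, int level) {
            this.marker = marker;
            this.level = level;
        }
    }

    static MarkerClass classify(Matcher m) {
        if (m.group(1) != null) return MarkerClass.ROMAN_NUMERAL;
        if (m.group(2) != null) return MarkerClass.ARABIC_NUMERAL;
        return MarkerClass.LOWERCASE_LETTER;
    }

    // Builds the sense tree from marker positions and classes only; the text
    // between two consecutive markers becomes the definition of the earlier one.
    static Node build(String entry) {
        Node root = new Node("", -1);
        Deque<Node> open = new ArrayDeque<>(); // path from the root to the last marker
        open.push(root);
        Node current = root;
        int textStart = 0;
        Matcher m = MARKER.matcher(entry);
        while (m.find()) {
            current.definition = entry.substring(textStart, m.start()).trim();
            int level = classify(m).ordinal();
            // Close every open sense of the same or a more specific class;
            // the root (level -1) is never popped.
            while (open.peek().level >= level) {
                open.pop();
            }
            Node node = new Node(m.group(), level);
            open.peek().children.add(node);
            open.push(node);
            current = node;
            textStart = m.end();
        }
        current.definition = entry.substring(textStart).trim();
        return root;
    }

    // Appends one indented line per sense, depth-first.
    static void render(Node node, String indent, StringBuilder out) {
        for (Node child : node.children) {
            out.append(indent).append(child.marker).append(' ')
               .append(child.definition).append('\n');
            render(child, indent + "  ", out);
        }
    }

    public static void main(String[] args) {
        String entry = "HEADWORD I. first sense 1. first subsense a) nuance a "
                + "b) nuance b 2. second subsense II. second sense";
        Node root = build(entry);
        StringBuilder out = new StringBuilder(root.definition).append('\n');
        render(root, "  ", out);
        System.out.print(out);
    }
}
```

Running `main` prints the headword followed by the indented tree: `I.` with children `1.` (itself holding `a)` and `b)`) and `2.`, then `II.` at the top level. Because a node's parent depends only on marker class levels, the tree is built without looking inside any definition. The flip side, in this simplified sketch, is that marker-like strings occurring inside definition text (for example a bibliographic "2. ") would be mistaken for sense markers; a realistic search procedure would need to rule such cases out.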