Query refinement through lexical clustering of scientific textual databases

  • Authors:
  • Eric SanJuan

  • Affiliations:
  • LITA Université Paul Verlaine & URI-INIST/CNRS, Metz, France

  • Venue:
  • NLDB'05 Proceedings of the 10th international conference on Natural Language Processing and Information Systems
  • Year:
  • 2005

Abstract

The TermWatch system automatically extracts multi-word terms from scientific texts using morphological analysis and relates them through linguistic variations. The resulting terminological network is clustered with a 3-level hierarchical graph algorithm and mapped onto a 2D space. Clusters are automatically labeled according to variation activity. After a detailed review of the methodology, this paper evaluates, in the context of querying a scientific textual database, the overlap of terms and cluster labels with the keywords selected by human indexers, as well as the set of possible queries that can be built from the clustering output. The results show that the linguistic variation paradigm is a robust way of automatically extracting and structuring a user-comprehensible terminological resource for query refinement.
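The pipeline described above (relate terms through variations, cluster the resulting network, label clusters by variation activity, compare labels with indexer keywords) can be illustrated with a toy sketch. This is not the TermWatch implementation: the variation test `is_variant` is a simplified assumption (same head word, one term's words contained in the other's), connected components stand in for the 3-level hierarchical graph algorithm, and all function names, terms, and keywords are hypothetical.

```python
def is_variant(a, b):
    """Toy variation test (assumption): same head word (last word) and
    the words of one term are a strict subset of the other's."""
    wa, wb = a.split(), b.split()
    if wa[-1] != wb[-1]:
        return False
    sa, sb = set(wa), set(wb)
    return sa < sb or sb < sa


def build_graph(terms):
    """Link every pair of terms related by the toy variation test."""
    terms = list(terms)
    graph = {t: set() for t in terms}
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            if is_variant(a, b):
                graph[a].add(b)
                graph[b].add(a)
    return graph


def find_clusters(graph):
    """Connected components, used here in place of hierarchical clustering."""
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    return clusters


def label_cluster(cluster, graph):
    """Label = term with the most variation links (ties broken alphabetically)."""
    return min(cluster, key=lambda t: (-len(graph[t]), t))


def label_overlap(labels, keywords):
    """Fraction of cluster labels that also appear among indexer keywords."""
    return sum(1 for l in labels if l in keywords) / len(labels)


# Hypothetical extracted terms and indexer keywords, for illustration only.
terms = [
    "information retrieval",
    "textual information retrieval",
    "scientific information retrieval",
    "graph clustering",
    "hierarchical graph clustering",
    "query refinement",
]
keywords = {"information retrieval", "query refinement", "terminology"}

graph = build_graph(terms)
labels = [label_cluster(c, graph) for c in find_clusters(graph)]
score = label_overlap(labels, keywords)
print(labels, round(score, 2))
```

In this sketch the cluster labels double as candidate refinements for a query: a user searching for one term in a cluster could be offered the label and its variants as alternatives.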