Extracting lay paraphrases of specialized expressions from monolingual comparable medical corpora

  • Authors:
  • Louise Deléger; Pierre Zweigenbaum

  • Affiliations:
  • INSERM, Paris, France; CNRS, LIMSI, Orsay, France

  • Venue:
  • BUCC '09 Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora
  • Year:
  • 2009

Abstract

Whereas multilingual comparable corpora have been used to identify translations of words or terms, monolingual corpora can help identify paraphrases. The present work addresses paraphrases found between two different discourse types: specialized and lay texts. We therefore built comparable corpora of specialized and lay texts in order to detect equivalent lay and specialized expressions. We identified two devices used in such paraphrases: nominalizations and neo-classical compounds. The results showed that the paraphrases had good precision and that nominalizations were indeed relevant in the context of studying the differences between specialized and lay language. Results for neo-classical compounds were less conclusive. This study also demonstrates that simple paraphrase acquisition methods can work on texts with a rather small degree of similarity, once similar text segments are detected.