Extracting lay paraphrases of specialized expressions from monolingual comparable medical corpora

Authors:
Louise Deléger;Pierre Zweigenbaum
Affiliations:
INSERM, Paris, France;CNRS, LIMSI, Orsay, France
Venue:
BUCC '09 Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora
Year:
2009

Citing 13
Cited 10

Information fusion for multidocument summarization: paraphrasing and generation

Information fusion for multidocument summarization: paraphrasing and generation
Introduction to the special issue on the web as corpus

Computational Linguistics - Special issue on web as corpus
TextTiling: segmenting text into multi-paragraph subtopic passages

Computational Linguistics
Identifying word translations in non-parallel texts

ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Syntagmatic and paradigmatic representations of term variation

ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Extracting paraphrases from a parallel corpus

ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Learning to paraphrase: an unsupervised approach using multiple-sequence alignment

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Extracting structural paraphrases from aligned monolingual corpora

PARAPHRASE '03 Proceedings of the second international workshop on Paraphrasing - Volume 16
Paraphrase acquisition for information extraction

PARAPHRASE '03 Proceedings of the second international workshop on Paraphrasing - Volume 16
Paraphrasing with bilingual parallel corpora

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Local Rephrasing Suggestions for Supporing the Work of Writers

GoTAL '08 Proceedings of the 6th international conference on Advances in Natural Language Processing
Mining a lexicon of technical terms and lay equivalents

BioNLP '07 Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing
Aligning needles in a haystack: paraphrase acquisition across the web

IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing

For the sake of simplicity: unsupervised extraction of lexical simplifications from Wikipedia

HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
A survey of paraphrasing and textual entailment methods

Journal of Artificial Intelligence Research
Generating phrasal and sentential paraphrases: A survey of data-driven methods

Computational Linguistics
Characterizing patient-friendly "micro-explanations"of medical events

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Monolingual alignment by edit rate computation on sentential paraphrase pairs

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Putting it simply: a context-aware approach to lexical simplification

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Parallel sentence generation from comparable corpora for improved SMT

Machine Translation
Web-based validation for contextual targeted paraphrasing

MTTG '11 Proceedings of the Workshop on Monolingual Text-To-Text Generation
Using discourse information for paraphrase extraction

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Exploiting discourse information to identify paraphrases

Expert Systems with Applications: An International Journal

Quantified Score

Hi-index	0.01

Visualization

Abstract

Whereas multilingual comparable corpora have been used to identify translations of words or terms, monolingual corpora can help identify paraphrases. The present work addresses paraphrases found between two different discourse types: specialized and lay texts. We therefore built comparable corpora of specialized and lay texts in order to detect equivalent lay and specialized expressions. We identified two devices used in such paraphrases: nominalizations and neo-classical compounds. The results showed that the paraphrases had a good precision and that nominalizations were indeed relevant in the context of studying the differences between specialized and lay language. Neo-classical compounds were less conclusive. This study also demonstrates that simple paraphrase acquisition methods can also work on texts with a rather small degree of similarity, once similar text segments are detected.