Extracting paraphrases of technical terms from noisy parallel software corpora

Authors:
Xiaoyin Wang;David Lo;Jing Jiang;Lu Zhang;Hong Mei
Affiliations:
Singapore Management University, Singapore and Peking University, Beijing, China;Singapore Management University, Singapore;Singapore Management University, Singapore;Peking University, Beijing, China;Peking University, Beijing, China
Venue:
ACLShort '09 Proceedings of the ACL-IJCNLP 2009 Conference Short Papers
Year:
2009

Abstract

In this paper, we study the problem of extracting technical paraphrases from a parallel software corpus, namely, a collection of duplicate bug reports. Paraphrase acquisition is a fundamental task in the emerging area of text mining for software engineering. Existing paraphrase extraction methods are not entirely suitable here because bug reports are noisy. We propose a number of techniques to address this noisy data problem. Our empirical evaluation shows that our method significantly improves on an existing method, by up to 58%.