Lexical statistical machine translation for language migration

Authors:
Anh Tuan Nguyen;Tung Thanh Nguyen;Tien N. Nguyen
Affiliations:
Iowa State University, USA;Iowa State University, USA;Iowa State University, USA
Venue:
Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering
Year:
2013

Abstract

Prior research has shown that source code also exhibits naturalness, i.e., it is written by humans and is likely to be repetitive. That work also showed that an n-gram language model is useful for predicting the next token in a source file given a large corpus of existing source code. In this paper, we investigate how well statistical machine translation (SMT) models for natural languages can help in migrating source code from one programming language to another. We treat source code as a sequence of lexical tokens and apply a phrase-based SMT model to the lexemes of those tokens. Our empirical evaluation on migrating two Java projects to C# showed that lexical, phrase-based SMT can achieve high lexical translation accuracy (BLEU scores of 81.3-82.6%). Users would have to manually edit only 11.9-15.8% of the tokens in the resulting code to correct it. However, a high percentage of the translated methods (49.5-58.6%) are syntactically incorrect. Therefore, our results call for a more program-oriented SMT model that can better integrate the syntactic and semantic information of a program to support language migration.
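
To make the "sequence of lexical tokens" view and the manual-edit figure more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simplified regular-expression lexer (the pattern `TOKEN_RE` and the function names are hypothetical) and interprets the edit effort as the token-level Levenshtein distance between translated and reference code, divided by the number of tokens in the translated code. The abstract does not spell out the exact tokenizer or metric definition, so both are assumptions.

```python
import re

# Hypothetical, simplified lexer (an assumption, not the paper's tokenizer):
# identifiers/keywords, numbers, a few two-character operators, then any
# other single non-whitespace character.
TOKEN_RE = re.compile(
    r"[A-Za-z_]\w*|\d+(?:\.\d+)?|==|!=|<=|>=|&&|\|\||\+\+|--|\S"
)


def lex(code):
    """Split source text into a flat list of lexemes."""
    return TOKEN_RE.findall(code)


def token_edit_distance(hyp, ref):
    """Levenshtein distance over tokens; insert, delete, substitute cost 1."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,         # delete h from the translation
                            curr[j - 1] + 1,     # insert r into the translation
                            prev[j - 1] + cost)) # keep or substitute
        prev = curr
    return prev[-1]


def edit_ratio(translated_code, reference_code):
    """Fraction of translated tokens a user would need to edit (assumed metric)."""
    hyp, ref = lex(translated_code), lex(reference_code)
    return token_edit_distance(hyp, ref) / max(len(hyp), 1)


# Illustrative, made-up example of a lexically close but incorrect C# output.
translated = "string s = list.size().ToString();"
reference = "string s = list.Count.ToString();"

if __name__ == "__main__":
    print(lex(translated))
    print(f"edit ratio: {edit_ratio(translated, reference):.3f}")
```

In this made-up example, one substitution and two deletions fix the output, which illustrates how a translation can be close to the reference at the token level while still needing manual correction.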