Natural languages like English are rich, complex, and powerful. The highly creative and graceful use of languages like English and Tamil, by masters like Shakespeare and Avvaiyar, can certainly delight and inspire. But in practice, given cognitive constraints and the exigencies of daily life, most human utterances are far simpler and much more repetitive and predictable. In fact, these utterances can be very usefully modeled using modern statistical methods. This fact has led to the phenomenal success of statistical approaches to speech recognition, natural language translation, question-answering, and text mining and comprehension. We begin with the conjecture that most software is also natural, in the sense that it is created by humans at work, with all the attendant constraints and limitations, and thus, like natural language, it is also likely to be repetitive and predictable. We then proceed to ask whether (a) code can be usefully modeled by statistical language models and (b) such models can be leveraged to support software engineers. Using the widely adopted n-gram model, we provide empirical evidence supporting a positive answer to both questions. We show that code is also very repetitive, and in fact even more so than natural languages. As an example use of the model, we have developed a simple code completion engine for Java that, despite its simplicity, already improves Eclipse's completion capability. We conclude the paper by laying out a vision for future research in this area.
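To make the n-gram idea concrete, the following is a minimal Python sketch of a trigram model over code tokens, used both to measure predictability (cross-entropy in bits per token) and to rank completion candidates. It is an illustration under stated assumptions rather than the engine described above: the whitespace tokenization, the add-one smoothing, the `vocab_size` value, and the function names `train_ngram`, `suggest`, and `cross_entropy` are all hypothetical choices made for this example.

```python
import math
from collections import Counter, defaultdict


def train_ngram(tokens, n=3):
    """Map each context of n-1 tokens to a Counter of the tokens that follow it."""
    counts = defaultdict(Counter)
    padded = ["<s>"] * (n - 1) + list(tokens)
    for i in range(n - 1, len(padded)):
        context = tuple(padded[i - n + 1:i])
        counts[context][padded[i]] += 1
    return counts


def suggest(counts, context, k=3):
    """Return up to k of the most frequent next tokens seen after `context`."""
    seen = counts.get(tuple(context), Counter())
    return [token for token, _ in seen.most_common(k)]


def cross_entropy(counts, tokens, n=3, vocab_size=1000):
    """Average negative log2 probability per token.

    Uses add-one smoothing with an assumed vocabulary size; this is a
    simplification chosen for brevity, not a recommended smoothing method.
    """
    padded = ["<s>"] * (n - 1) + list(tokens)
    total = 0.0
    for i in range(n - 1, len(padded)):
        context = tuple(padded[i - n + 1:i])
        seen = counts.get(context, Counter())
        p = (seen[padded[i]] + 1) / (sum(seen.values()) + vocab_size)
        total -= math.log2(p)
    return total / len(tokens)


# Toy "corpus": one Java-like loop, split on whitespace (an assumed tokenizer).
corpus = "for ( int i = 0 ; i < n ; i ++ ) { sum += a [ i ] ; }".split()
counts = train_ngram(corpus, n=3)

completions = suggest(counts, ["for", "("])           # candidates after "for ("
familiar = cross_entropy(counts, corpus)               # token order the model has seen
unfamiliar = cross_entropy(counts, corpus[::-1])       # same tokens, reversed order
```

In this toy setting `completions` is `['int']`, and `familiar` is lower than `unfamiliar`: token sequences that resemble the training data are cheaper to encode, which is the sense in which repetitive code is predictable. A realistic experiment would train on a large corpus, use a proper lexer, and evaluate on held-out files.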