Automated software engineering tools (e.g., program search, concern location, code reuse, and quality assessment) increasingly rely on natural language information from comments and identifiers in code. The first step in analyzing words from identifiers is splitting each identifier into its constituent words. Unlike natural languages, where spaces and punctuation delineate words, identifiers cannot contain spaces. One common way to split identifiers is to follow programming language naming conventions. For example, Java programmers often use camel case, where words are delineated by uppercase letters or non-alphabetic characters. However, programmers also create identifiers by concatenating words with no discernible delineation, which poses challenges for automatic identifier splitting. In this paper, we present an algorithm that automatically splits identifiers into sequences of words by mining word frequencies in source code. With these word frequencies, our identifier splitter uses a scoring technique to select the most appropriate partitioning for an identifier. In an evaluation of over 8000 identifiers from open source Java programs, our approach, Samurai, outperforms existing state-of-the-art techniques.
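The two ideas described above, convention-based splitting followed by frequency-based splitting of same-case tokens, can be illustrated with a minimal Python sketch. It is not the Samurai algorithm: the abstract does not give the scoring function, so the one below (maximize the log frequency of the rarest part, preferring fewer parts on ties) is an assumption made purely for illustration. The `WORD_FREQ` table and all function names are hypothetical; a real table would be mined from source code.

```python
import math
import re
from functools import lru_cache

# Hypothetical word frequencies, standing in for counts mined from source code.
WORD_FREQ = {"get": 500, "file": 300, "name": 400, "list": 250,
             "string": 350, "user": 200, "id": 150}


def split_by_convention(identifier):
    """Split on non-alphabetic characters and camel-case boundaries."""
    tokens = []
    for part in re.split(r"[^A-Za-z]+", identifier):
        # Either an uppercase run not followed by a lowercase letter ("XML" in
        # "XMLParser"), or an optionally capitalized lowercase word ("Parser").
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", part))
    return tokens


def score(word, freq):
    """Log-scaled frequency; words missing from the table score 0."""
    return math.log(freq.get(word, 0) + 1)


def split_same_case(token, freq):
    """Split a token with no case boundaries using word frequencies.

    Assumed scoring (not from the paper): a segmentation is scored by its
    least frequent part, and ties go to the segmentation with fewer parts.
    """
    @lru_cache(maxsize=None)
    def best(s):
        # Start with the unsplit string as the baseline candidate.
        candidates = [(score(s, freq), (s,))]
        for i in range(1, len(s)):
            left_score = score(s[:i], freq)
            if left_score == 0:
                continue  # only split after a known word
            rest_score, rest_parts = best(s[i:])
            candidates.append((min(left_score, rest_score),
                               (s[:i],) + rest_parts))
        return max(candidates, key=lambda c: (c[0], -len(c[1])))

    # Tokens are lowercased before lookup, so the output is lowercase.
    return list(best(token.lower())[1])


def split_identifier(identifier, freq=WORD_FREQ):
    """Apply the convention-based split, then the frequency-based split."""
    words = []
    for token in split_by_convention(identifier):
        words.extend(split_same_case(token, freq))
    return words


if __name__ == "__main__":
    for name in ["getFilename", "USERID_list", "XMLParser", "string"]:
        print(name, "->", split_identifier(name))
```

In this sketch, a token is left unsplit unless every part of some segmentation appears in the frequency table, which keeps it conservative on unknown words. Digits are treated as delimiters and dropped.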