The large number of software source code projects available on the Internet and within companies creates new information retrieval challenges. Present-day source code search engines, such as Google Code Search, tend to treat source code as plain text, much as they treat Web pages. However, source code files differ from Web pages and plain text files: each file must follow the syntax rules of its language, so a source file can be viewed as a structured document whose elements together accomplish a task. In this paper, we parse each source code file into elements called blocks. Blocks are of two kinds, non-leaf blocks and leaf blocks, which are then used for indexing and ranking. Leaf blocks are further categorized into code-data and meta-data blocks, each of which undergoes its own stemming and stop-word filtering when the source code index is built. Finally, to make code search more flexible, we propose a block-specified query scheme. Experimental results indicate that our approach provides a more flexible code search mechanism and returns a higher number of relevant items.
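To make the idea concrete, the following is a minimal sketch of block-aware indexing and block-specified querying. It is not the paper's implementation, and everything in it is an assumption made for illustration: the function names, the `kind:term` query syntax, the stop-word lists, and the crude plural-stripping stemmer. It also flattens the block tree, handling only leaf blocks: comments stand in for meta-data blocks, and everything else is treated as code-data.

```python
import re
from collections import defaultdict

# Hypothetical stop-word lists; the abstract does not specify the real ones.
META_STOP_WORDS = {"the", "a", "an", "of", "to", "is", "and", "this"}
CODE_STOP_WORDS = {"public", "private", "static", "void", "int", "return", "new", "class"}

# Matches /* ... */ and // ... comments (simplification: ignores string literals).
COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)


def split_leaf_blocks(source):
    """Split Java-like source into (kind, text) pairs: 'meta' for comments, 'code' otherwise."""
    blocks, pos = [], 0
    for match in COMMENT.finditer(source):
        code = source[pos:match.start()]
        if code.strip():
            blocks.append(("code", code))
        blocks.append(("meta", match.group()))
        pos = match.end()
    if source[pos:].strip():
        blocks.append(("code", source[pos:]))
    return blocks


def tokenize(kind, text):
    """Apply a different filtering pipeline to each block kind."""
    if kind == "meta":
        # Natural-language pipeline: lowercase, drop stop words, strip a plural "s".
        words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in META_STOP_WORDS]
        return [w[:-1] if len(w) > 3 and w.endswith("s") and not w.endswith("ss") else w
                for w in words]
    # Code pipeline: split identifiers on camelCase, drop language keywords, no stemming.
    terms = []
    for ident in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text):
        parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", ident)
        terms.extend(p.lower() for p in parts)
    return [t for t in terms if t not in CODE_STOP_WORDS]


def build_index(files):
    """Map (block kind, term) to the set of file names containing that term in that kind."""
    index = defaultdict(set)
    for name, source in files.items():
        for kind, text in split_leaf_blocks(source):
            for term in tokenize(kind, text):
                index[(kind, term)].add(name)
    return index


def search(index, query):
    """Evaluate a query such as 'code:print meta:greeting'; all parts must match."""
    result = None
    for part in query.split():
        kind, _, term = part.partition(":")
        hits = index.get((kind, term.lower()), set())
        result = hits if result is None else result & hits
    return sorted(result or [])


files = {
    "A.java": "// sorts numbers quickly\nclass QuickSort { void sortArray(int[] a) {} }",
    "B.java": "/* prints a greeting */\nclass Hello { void printGreeting() {} }",
}
index = build_index(files)
print(search(index, "code:quick"))                # ['A.java']
print(search(index, "meta:quick"))                # []
print(search(index, "code:print meta:greeting"))  # ['B.java']
```

Because the index key includes the block kind, the same word can match in one kind of block and miss in the other: `quick` is found in the code-data of `A.java` (from the identifier `QuickSort`) but not in its comment, which contains only `quickly`. A text-only index would not be able to express that distinction.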