A block-structured model for source code retrieval

Authors:
Sheng-Kuei Hsu; Shi-Jen Lin
Affiliations:
National Central University, Jhongli, Taiwan, ROC and Nanya Institute of Technology, Taiwan, ROC; Nanya Institute of Technology, Jhongli, Taiwan, ROC
Venue:
ACIIDS'11: Proceedings of the Third International Conference on Intelligent Information and Database Systems, Volume Part II
Year:
2011

Abstract

The large number of source code projects available on the Internet and within companies poses new information retrieval challenges. Present-day source code search engines, such as Google Code Search, tend to treat source code as plain text, just as they treat Web pages. However, source code files differ from Web pages and plain text files in that each file must conform to the syntax rules of its programming language, so a source file can be viewed as a structured document whose elements together complete a task. In this paper, we parse each source code file into elements called blocks, which are either non-leaf blocks or leaf blocks, for further indexing and ranking. Leaf blocks are further categorized into code-data blocks and meta-data blocks, which undergo different stemming and stop-word filtering processes when the source code index is built. Finally, to provide a flexible code search scheme, we also propose a block-specified query scheme. Experimental results indicate that our approach provides a more flexible code search mechanism that retrieves a higher number of relevant items.
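The abstract does not say exactly how each leaf-block category is processed or what a block-specified query looks like, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that code-data blocks hold identifiers and statements (here labeled `code`) and are filtered against a small list of language keywords, with identifiers split on camelCase/snake_case and no stemming. It assumes that meta-data blocks hold natural-language comments (labeled `comment`) and are filtered against English stop words and then stemmed, using a toy suffix stripper in place of a real stemmer such as Porter's. It also assumes a hypothetical `type:term` query prefix to restrict a term to one block type. Parsing files into the non-leaf/leaf block tree is skipped; leaf blocks are supplied directly. All names (`LeafBlock`, `terms_for`, `build_index`, `search`, the stop lists) are hypothetical.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LeafBlock:
    """Hypothetical leaf block: block_type "code" = code-data, "comment" = meta-data."""
    file: str
    block_type: str
    text: str


# Assumed stop lists: language keywords for code-data, English words for meta-data.
CODE_STOP = {"public", "private", "static", "void", "int", "return", "new", "for", "if", "else"}
META_STOP = {"the", "a", "an", "of", "to", "and", "is", "in", "this", "for"}
META_BLOCKS = {"comment"}


def split_identifier(token):
    """Split camelCase / snake_case identifiers into lowercase parts."""
    return [p.lower() for p in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", token)]


def toy_stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer, not Porter's algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms_for(block):
    """Apply the (assumed) category-specific filtering pipeline to one leaf block."""
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", block.text)
    if block.block_type in META_BLOCKS:
        # Meta-data: lowercase, drop English stop words, then stem.
        return [toy_stem(t.lower()) for t in tokens if t.lower() not in META_STOP]
    # Code-data: drop keywords, split identifiers, keep words unstemmed.
    terms = []
    for t in tokens:
        if t not in CODE_STOP:
            terms.extend(split_identifier(t))
    return terms


def build_index(blocks):
    """Inverted index: term -> set of (file, block_type) postings."""
    index = defaultdict(set)
    for b in blocks:
        for term in terms_for(b):
            index[term].add((b.file, b.block_type))
    return index


def normalize(word, block_type):
    """Normalize a query word with the same pipeline as the targeted block type."""
    w = word.lower()
    if block_type in META_BLOCKS:
        return {toy_stem(w)}
    if block_type:
        return {w}
    return {w, toy_stem(w)}  # unrestricted term: try both pipelines


def search(index, query):
    """Return files matching every query term; 'type:term' restricts a term to one block type."""
    result = None
    for part in query.split():
        block_type, _, word = part.rpartition(":")
        hits = set()
        for term in normalize(word, block_type):
            for f, bt in index.get(term, ()):
                if not block_type or bt == block_type:
                    hits.add(f)
        result = hits if result is None else result & hits
    return result or set()


blocks = [
    LeafBlock("Sorter.java", "comment", "Sorts the records in ascending order"),
    LeafBlock("Sorter.java", "code", "public void sortRecords(int[] records)"),
    LeafBlock("Reader.java", "code", "String readFile(String path)"),
    LeafBlock("Reader.java", "comment", "Reads records from a file"),
]
index = build_index(blocks)
print(sorted(search(index, "comment:sorting")))          # ['Sorter.java']
print(sorted(search(index, "code:sorting")))             # []
print(sorted(search(index, "comment:record code:sort")))  # ['Sorter.java']
```

Because the two leaf-block categories are normalized differently, `comment:sorting` matches the stemmed comment term `sort`, whereas `code:sorting` finds nothing: code-data terms are left unstemmed in this sketch. Whether the paper applies stemming to code-data blocks at all is not stated in the abstract; swapping the two pipelines is a one-line change in `terms_for` and `normalize`.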