A block-structured model for source code retrieval

  • Authors: Sheng-Kuei Hsu; Shi-Jen Lin
  • Affiliations: National Central University, Jhongli, Taiwan, ROC and Nanya Institute of Technology, Taiwan, ROC; Nanya Institute of Technology, Jhongli, Taiwan, ROC
  • Venue: ACIIDS'11 Proceedings of the Third International Conference on Intelligent Information and Database Systems - Volume Part II
  • Year: 2011

Abstract

The large number of software source code projects available on the Internet and within companies creates new information retrieval challenges. Present-day source code search engines, such as Google Code Search, tend to treat source code as pure text, as they do Web pages. However, source code files differ from Web pages and plain text files in that each file must follow a set of rules, called syntax, so a source file can be seen as a structured document whose elements together complete a task. In this paper, we parse each source code file into elements called blocks, which are either non-leaf blocks or leaf blocks, for further indexing and ranking. Leaf blocks are categorized into code-data and meta-data blocks, each with its own stemming and stop-word filtering process used in building the source code index. Finally, to provide a flexible code search scheme, we also propose a block-specified query scheme. Experimental results indicate that our approach provides a more flexible code search mechanism and retrieves a higher number of relevant items.
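The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify the parser, the stemmer, the stop lists, or the query syntax. Everything named here is an assumption made for illustration: leaf blocks are approximated by splitting each line at a `//` comment marker (naively, ignoring string literals), code-data terms are split on camelCase without stemming, meta-data terms get a crude suffix-stripping stand-in for a real stemmer, and a query of the hypothetical form `code:term` or `meta:term` restricts the search to one block type.

```python
import re
from collections import defaultdict

# Hypothetical stop lists, chosen only for this illustration.
CODE_STOP = {"int", "char", "void", "return", "if", "else", "for", "while"}
TEXT_STOP = {"the", "a", "an", "of", "to", "and", "is", "in", "this"}


def split_blocks(source):
    """Naively split source into leaf blocks: ('code', text) and ('meta', text).

    Assumption: '//' always starts a comment (string literals are ignored).
    """
    blocks = []
    for line in source.splitlines():
        code, sep, comment = line.partition("//")
        if code.strip():
            blocks.append(("code", code.strip()))
        if sep and comment.strip():
            blocks.append(("meta", comment.strip()))
    return blocks


def code_terms(text):
    """Code-data analysis: split identifiers on camelCase, drop keywords, no stemming."""
    terms = []
    for word in re.findall(r"[A-Za-z]+", text):
        for part in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", word):
            if part.lower() not in CODE_STOP:
                terms.append(part.lower())
    return terms


def stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def meta_terms(text):
    """Meta-data analysis: lowercase, drop English stop words, then stem."""
    return [stem(w) for w in re.findall(r"[a-z]+", text.lower()) if w not in TEXT_STOP]


ANALYZERS = {"code": code_terms, "meta": meta_terms}


def build_index(files):
    """Map (block_type, term) to the set of file names containing that term."""
    index = defaultdict(set)
    for name, source in files.items():
        for kind, text in split_blocks(source):
            for term in ANALYZERS[kind](text):
                index[(kind, term)].add(name)
    return index


def search(index, query):
    """Block-specified query: 'code:x' or 'meta:x'; a bare term searches both."""
    kind, sep, word = query.partition(":")
    kinds = (kind,) if sep else ("code", "meta")
    if not sep:
        word = query
    hits = set()
    for k in kinds:
        for term in ANALYZERS[k](word):
            hits |= index.get((k, term), set())
    return hits


files = {
    "a.c": "// Parses the incoming request\nint parseRequest(char *buf) {\n    return 0; // returns status\n}",
    "b.c": "int status = 0; // parse later",
}
index = build_index(files)
print(sorted(search(index, "code:parse")))
print(sorted(search(index, "meta:parse")))
```

In this toy corpus, `code:parse` matches only the file whose identifier contains `parse`, while `meta:parse` also matches the file that mentions the word only in a comment. Keeping separate analyzers per block type is what lets the query term be normalized the same way as the indexed text it is compared against.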