Parsing C/C++ Code without Pre-processing

  • Authors:
  • Yoann Padioleau

  • Affiliations:
  • University of Illinois, Urbana Champaign,

  • Venue:
  • CC '09 Proceedings of the 18th International Conference on Compiler Construction: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

It is difficult to develop style-preserving source-to-source transformation engines for C and C++. The main reason is not the complexity of those languages, but the use of the C pre-processor (cpp ), especially ifdefs and macros. This has for example hindered the development of refactoring tools for C and C++. In this paper we propose to combine multiple techniques and heuristics to parse C/C++ source files as-is, while still having only a few modifications to the original grammars of C and C++. We rely on the fact that in most C and C++ software, programmers follow a limited number of conventions on the use of cpp which makes it possible to disambiguate different situations by just looking at the context, names, or indentation of cpp constructs. We have implemented a parser, Yacfe, based on these techniques and evaluated it on 16 large open source projects. Yacfe can on average parse 96% of those projects correctly. As a side effect, we also found mistakes in code that was not compiled because it was protected by particular ifdefs, but that was still analyzed by Yacfe. Using Yacfe on new projects may require adapting some of our techniques. We found that as conventions and idioms are shared by many projects, the adaptation time is on average less than 2 hours for a new project.