Parsing C/C++ Code without Pre-processing

Authors:
Yoann Padioleau
Affiliations:
University of Illinois, Urbana-Champaign
Venue:
CC '09 Proceedings of the 18th International Conference on Compiler Construction: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
Year:
2009

Abstract

It is difficult to develop style-preserving source-to-source transformation engines for C and C++. The main reason is not the complexity of those languages but the use of the C pre-processor (cpp), especially ifdefs and macros. This has, for example, hindered the development of refactoring tools for C and C++. In this paper we propose combining multiple techniques and heuristics to parse C/C++ source files as-is, with only a few modifications to the original grammars of C and C++. We rely on the fact that in most C and C++ software, programmers follow a limited number of conventions on the use of cpp, which makes it possible to disambiguate different situations just by looking at the context, names, or indentation of cpp constructs. We have implemented a parser, Yacfe, based on these techniques and evaluated it on 16 large open source projects. On average, Yacfe can parse 96% of those projects correctly. As a side effect, we also found mistakes in code that was never compiled because it was guarded by particular ifdefs but was still analyzed by Yacfe. Using Yacfe on new projects may require adapting some of our techniques. Because conventions and idioms are shared by many projects, we found that the adaptation time is on average less than 2 hours for a new project.
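
To make the idea of disambiguating cpp constructs by name and context more concrete, here is a minimal sketch of one such heuristic. It is not taken from Yacfe: the regular expression, the function names `looks_like_macro_statement` and `find_macro_statements`, and the specific rule (an all-uppercase identifier, optionally with arguments, alone on a line with no trailing semicolon) are assumptions made purely for illustration.

```python
import re

# Hypothetical convention, assumed for illustration only: an ALL_CAPS
# identifier, optionally followed by a flat parenthesized argument list,
# alone on its line and with no trailing semicolon, is probably a macro
# used as a statement.
_MACRO_STMT = re.compile(r"^\s*([A-Z_][A-Z0-9_]*)\s*(\([^()]*\))?\s*$")


def looks_like_macro_statement(line: str) -> bool:
    """Guess from naming and context whether a line is a macro statement."""
    return _MACRO_STMT.match(line) is not None


def find_macro_statements(source: str) -> list[tuple[int, str]]:
    """Return (line number, stripped text) for each suspected macro statement."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if looks_like_macro_statement(line):
            hits.append((number, line.strip()))
    return hits


sample = '''static int probe(void)
{
    DEBUG_ENTER("probe")
    int x = compute();
    return x;
}
'''

print(find_macro_statements(sample))
```

A standard C grammar would reject `DEBUG_ENTER("probe")` followed directly by a declaration, since the semicolon is missing. A parser that flags such lines up front can treat them as a dedicated macro-statement token and leave the rest of the grammar largely unchanged. A real tool would work on tokens, not raw lines, and would need further rules for nested parentheses, multi-line invocations, and macros written in lowercase.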