Combining C and C++

Posted in Embedded C/C++

Typically, embedded device firmware is implemented mainly in C, with some low-level parts written directly in Assembly. As the requirements placed on the devices grow more complex, the firmware’s code base grows larger and more complex in turn. A way to deal with this complexity is to utilize concepts from the object-oriented world. Although this can be done in pure C (a topic which deserves, and will indeed get, a separate entry), it can be easier to implement in C++.

A selling point of C++ that is especially relevant to the embedded environment is the fact that features you don’t use don’t cost you anything. This means that the engineer can choose the features needed to design the system and “suffer” just their costs, without incurring penalties for features that were excluded from the system. This is in contrast with almost all other object-oriented languages, making them much less suitable for constrained embedded work.

Naturally, it is rather difficult to implement all the firmware in C++, since some parts of the embedded code base simply cannot work with the complexities of an object-oriented language or are entirely unsuitable for representation in object form. These include interrupt handlers, low-level drivers and most of the OS code (either proprietary or from a vendor). Thus, it is necessary to combine both languages – C and C++ – in the firmware’s code base.

Luckily, most toolchains provide both C and C++ compilers. This is especially true of GCC-based toolchains, but some of the commercial offerings provide a C++ compiler as well. After compilation, the output of both C and C++ code is the same – a binary object file. These object files will then be handled by a linker which doesn’t care about C or C++ (this is not strictly true, of course).

So, the job of adding C++ code to the embedded project boils down to adding a .cpp file, making sure the toolchain knows to invoke the C++ compiler for it (by type association in the Makefile, the project’s properties or whatever method is provided by the IDE) and building the project.

However, all is not peachy. There are some subtle (and not so subtle) differences between C and C++ that are important to know about and handle correctly. The main differences are in the way C and C++ handle functions – calling conventions and name mangling.

Taking the x86 world as an example, regular Visual Studio C/C++ code typically uses the cdecl calling convention, while the Win32 API typically uses the stdcall calling convention. Should the calling conventions be mixed up, the compiler should (hopefully) warn about it. If it doesn’t, expect to face a corrupted stack, since it will either be cleaned up twice or not at all. Depending on the conventions involved, parameters may arrive in the wrong place as well.

Luckily, embedded platforms typically have more registers than x86, making the calling convention simply “put all arguments into registers”. However, with many function arguments, or on architectures with a limited register count, this might still be a concern, and it depends on the implementation and defaults of the compiler toolchain. Note that the compiler will not catch some calling convention mismatches, so the problem is revealed only at runtime. Debugging such mismatches is a lot of fun at 2 AM: the caller’s watch window shows everything in perfect order, yet the callee’s arguments are all mixed up. Diving into the Assembly is a sure way to see the difference.

Name mangling (or decoration) is a much more common problem when combining C and C++, and luckily it is almost always caught by the linker. The typical C name mangling is just adding an underscore at the beginning of the function name, or adding a few characters to denote the calling convention. Consider the following functions:
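A set of C functions consistent with the symbol names listed below might look like this. The return types, parameter lists and bodies are assumptions chosen for illustration, and the `STDCALL` and `FASTCALL` macros are hypothetical helpers so that the sketch also builds on compilers that lack the Microsoft calling-convention keywords.

```c
/* The calling-convention keywords are Microsoft extensions for 32-bit x86.
   On other compilers these hypothetical macros expand to nothing. */
#if defined(_MSC_VER) && defined(_M_IX86)
#define STDCALL  __stdcall
#define FASTCALL __fastcall
#else
#define STDCALL
#define FASTCALL
#endif

/* Default (cdecl) convention: decorated as _hydrogen_c */
void hydrogen_c(void) {}

/* cdecl: decorated as _deuterium_c */
int deuterium_c(void) { return 2; }

/* cdecl: decorated as _helium_c */
int helium_c(int a) { return a + 2; }

/* stdcall with 8 bytes of arguments: decorated as _beryllium@8 */
int STDCALL beryllium(int a, int b) { return a + b; }

/* fastcall with 8 bytes of arguments: decorated as @boron@8 */
int FASTCALL boron(int a, int b) { return a * b; }
```

With 32-bit Visual Studio the suffix `@8` is the total size of the arguments in bytes, which is why two `int` parameters were assumed for the last two functions.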

Their mangled names using the Visual Studio compiler are (obtained from the assembly listing):
PUBLIC _hydrogen_c
PUBLIC _deuterium_c
PUBLIC _helium_c
PUBLIC _beryllium@8
PUBLIC @boron@8

gcc C name mangling is even simpler – do nothing:
.type hydrogen_c, @function
.type deuterium_c, @function
.type helium_c, @function

C++ name mangling is much more complex, encoding the class name (to differentiate between functions with the same name in different classes), the argument list (to allow function overloading) and other information. Typically, it is encoded in a relatively simple manner, which keeps it readable for those familiar with the format. Consider the same functions in C++ (plus an example of overloading and a class function):
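C++ counterparts that would produce the mangled names listed below could be sketched as follows. The signatures are read off the mangled names themselves (for example, `YAHHH@Z` decodes to an `int` function taking two `int` parameters), while the bodies are assumptions added only so that the sketch builds.

```cpp
// void hydrogen_cpp(void)
void hydrogen_cpp() {}

// int deuterium_cpp(void)
int deuterium_cpp() { return 2; }

// Two overloads sharing one name; the mangled names differ
// only in the encoded parameter list.
int helium_cpp(int a) { return a + 2; }
int helium_cpp(int a, int b) { return a + b; }

// A member function; the class name becomes part of the mangled name.
class elements {
public:
    int lithium(int a, int b);
};

int elements::lithium(int a, int b) { return a * b; }
```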

The Visual Studio mangled names are much more complex:
PUBLIC ?hydrogen_cpp@@YAXXZ
PUBLIC ?deuterium_cpp@@YAHXZ
PUBLIC ?helium_cpp@@YAHH@Z
PUBLIC ?helium_cpp@@YAHHH@Z
PUBLIC ?lithium@elements@@QAEHHH@Z

And so are the gcc ones:
.type _Z12hydrogen_cppv, @function
.type _Z13deuterium_cppv, @function
.type _Z10helium_cppi, @function
.type _Z10helium_cppii, @function
.type _ZN8elements7lithiumEii, @function

Since the linker works only with the mangled names, it is important for the C and C++ compilers to agree on them. If the two portions of the code rely on a shared header file (as they should) to declare the common functions, each compiler (the C compiler and the C++ compiler) will interpret the function declaration differently and will generate code that expects a different function name.

Consider a header file that declares carbon:
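A minimal sketch of such a header is shown here. The file name `elements.h` and the include guard are assumptions; the signature of `carbon` follows from the mangled name and the gcc error message quoted further down.

```c
/* elements.h (hypothetical file name) */
#ifndef ELEMENTS_H
#define ELEMENTS_H

/* Plain declaration with no linkage specification:
   each compiler applies its own name mangling to it. */
int carbon(int a, int b);

#endif /* ELEMENTS_H */
```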

C code that includes the above header and calls “carbon”:
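One way the C caller might look is sketched here. The header name and the `last_sum` variable are hypothetical, and the declaration is repeated inline so that the snippet stands alone. The definition at the bottom is only a stand-in that lets the sketch link when built by itself; in the scenario discussed here, `carbon` lives in a separate C++ file.

```c
/* caller.c (hypothetical file name), built by the C compiler */

/* Stands in for: #include "elements.h" (hypothetical header name) */
int carbon(int a, int b);

/* Hypothetical variable, present only to make the result observable */
int last_sum = 0;

/* The C compiler records this call as a reference to the C-style
   symbol (_carbon with Visual Studio, carbon with gcc). */
void hydrogen_c(void)
{
    last_sum = carbon(1, 2);
}

/* Stand-in definition so that this sketch links on its own.
   In the scenario described, carbon is defined in C++ code instead. */
int carbon(int a, int b)
{
    return a + b;
}
```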

And C++ code that does the same:
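The C++ side could be sketched along these lines. The caller name `nitrogen_cpp` and the header name are hypothetical, and the body of `carbon` is an assumption. Because the declaration carries no linkage specification, both the call and the definition use the C++-mangled name.

```cpp
// caller.cpp (hypothetical file name), built by the C++ compiler

// Stands in for: #include "elements.h" (hypothetical header name)
int carbon(int a, int b);

// Hypothetical caller: the C++ compiler records this call as a
// reference to the C++-mangled symbol for carbon(int, int).
int nitrogen_cpp()
{
    return carbon(3, 4);
}

// In the scenario discussed here carbon is defined on the C++ side,
// so it is emitted under its C++-mangled name only.
int carbon(int a, int b)
{
    return a + b;
}
```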

The resulting expected names are, for the C code:
EXTRN _carbon:PROC
And for the C++ code:
EXTRN ?carbon@@YAHHH@Z:PROC
Thus, a linker error is given – only one mangled name is actually defined, while there are two different declarations and usages. The specific error depends on where the function is defined – if it is defined in C code, then the C++ version will be unresolved; if it is defined in C++ code, then the C version will be unresolved. In our case the function is defined in the C++ code, so we get the following linker error:

For Visual Studio:
LNK2019: unresolved external symbol _carbon referenced in function _hydrogen_c
And for gcc (which unhelpfully de-mangles the name, making it harder to identify the offending function):
For C code: undefined reference to 'carbon'
For C++ code: undefined reference to 'carbon(int, int)'

To resolve this incompatibility, we need to tell the compiler to go to the lowest common denominator – C style name mangling. This is done using the extern "C" block. This will ensure that the compiler produces names that are mangled in the C style, making them compatible between C and C++ code. Naturally this cannot apply to classes (or any other C++ only feature) and is used for global functions only. We can use extern "C" both for declarations and for definitions. We can wrap the declaration in the header file:
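A minimal sketch of the wrapped declaration, as the C++ compiler would see it. The header name is hypothetical.

```cpp
/* elements.h (hypothetical file name), as seen by the C++ compiler */
extern "C" {

/* Inside the block, carbon gets C linkage: its symbol name follows
   the C convention instead of the C++ mangling scheme. */
int carbon(int a, int b);

}
```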

If the function’s definition is in a C file, then no additional handling is needed. If we want to define the function in a C++ file, we need to wrap the definition as well:
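A definition in a C++ file might then be wrapped like this. The body of `carbon` is an assumption for illustration.

```cpp
// carbon.cpp (hypothetical file name), built by the C++ compiler
extern "C" {

// Defined in C++, but exported under its C-style name so that
// both C and C++ callers resolve to the same symbol.
int carbon(int a, int b)
{
    return a + b;
}

}
```

The body is still compiled as C++, so it is free to use C++ features internally; only the symbol name and calling interface follow the C rules.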

However, there is a problem with using extern "C" – it’s a C++ only directive, and the C compiler will complain about it. The solution is typically either to use an exclusive header file for the shared functions and then use extern "C" around the #include directive in C++ files, or to use an #ifdef for the compiler’s C++ flag to make sure that the extern "C" directive is only used for C++ code.

The first solution would look like this in the C++ code:
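A sketch of this approach follows. The header name `shared_functions.h` is hypothetical, and its single declaration is written out in place of the `#include` line so that the snippet compiles by itself.

```cpp
// In a real project the block would contain only the directive
//     #include "shared_functions.h"
// The header itself stays plain C and never mentions extern "C".
extern "C" {
int carbon(int a, int b);   // stands in for the #include line
}
```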

The second solution would look like this in the header file:
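A sketch of such a header, relying on the standard `__cplusplus` macro that every C++ compiler defines and no C compiler does. The file name and include guard are assumptions.

```cpp
/* elements.h (hypothetical file name), usable from both C and C++ */
#ifndef ELEMENTS_H
#define ELEMENTS_H

#ifdef __cplusplus
extern "C" {            /* seen only by the C++ compiler */
#endif

int carbon(int a, int b);

#ifdef __cplusplus
}                       /* closes the extern "C" block */
#endif

#endif /* ELEMENTS_H */
```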

To sum up this post, I’ll give an example that caused abundant head-scratching in a company I once worked for. Our embedded code base was starting to transition to a mix of C and C++, combining the old proprietary OS written in C with new application-level code written in C++. The code base started out many decades ago, and had some features that seem to have predated the C89 standard. One that was of interest was the Boolean type definition – following a convention set somewhere in the late eighties to early nineties in that company, global numerical definitions were done using enums, not #defines. Thus, the Boolean type was defined as:
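An enum-based Boolean of this kind might look as follows. The enumerator names are assumptions; only the type name `BOOL_T` comes from the text.

```c
/* Enumerator names are assumed for illustration. */
typedef enum {
    FALSE = 0,
    TRUE  = 1
} BOOL_T;
```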

This appeared in a generic header file, which was then included in practically every .c and .cpp file in the project. One day a strange thing happened when passing a pointer to a structure that contained a BOOL_T as one of the fields between a C function and a C++ function. The structure’s definition was in a shared header file, so there were no duplicate types or anything of that sort. Deeper investigation showed that the memory of the structure was interpreted differently between the C and the C++ code. Adding packing directives did little to help with that.
Consider the following structure definition:
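A structure of this shape could be sketched as follows. Only the names `MYSTERY_STRUCT_T`, `BOOL_T`, `digit_1`, `digit_2` and `digit_3` come from the text; the field types, the field order, the `is_valid` name and the enumerator names are assumptions, chosen so that the layout is consistent with the 8-byte and 12-byte sizes mentioned further down.

```c
#include <stdint.h>

/* Repeated here so that the sketch stands alone; enumerator names are assumed. */
typedef enum { FALSE = 0, TRUE = 1 } BOOL_T;

typedef struct {
    BOOL_T   is_valid;  /* hypothetical field name; its size depends on the enum size */
    uint8_t  digit_1;   /* assumed type */
    uint16_t digit_2;   /* assumed type */
    uint32_t digit_3;   /* assumed type */
} MYSTERY_STRUCT_T;
```

Every field after `is_valid` sits at an offset that depends on how many bytes the compiler gives the enum.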

Passing a MYSTERY_STRUCT_T* between functions resulted in the values of digit_1, digit_2 and digit_3 being different between the C compiled code and the C++ compiled code, regardless of the extern "C" used.

The reason behind this is the different way the ancient gcc-based compiler was handling the enum that defined BOOL_T. For C code the compiler used --enum_is_best by default (setting the enum’s type to be the smallest size that will hold all possible values, similar to defining the enum as packed in gcc), while for C++ code the compiler used --enum_is_int by default (similar to the flag by the same name in gcc, forcing enums to occupy 4 bytes).

Thus, the MYSTERY_STRUCT_T looks at first glance to be occupying 8 bytes (and so it does in the C realm), but in the C++ code it actually occupied 12 bytes and was interpreted as such when passed by reference to C++ functions. This caused all sorts of interesting values to crop up in the digit fields.
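The effect can be reproduced on a modern compiler by fixing the enum size explicitly. The sketch below is an emulation, not the original code: it assumes C++11 (for enums with a fixed underlying type) and a common ABI where `uint32_t` is 4-byte aligned, and all type, field and function names in it are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// One-byte enum: emulates the C-side default described above.
enum BoolSmall : std::uint8_t { SMALL_FALSE = 0, SMALL_TRUE = 1 };

// Four-byte enum: emulates the C++-side default described above.
enum BoolInt : std::int32_t { INT_FALSE = 0, INT_TRUE = 1 };

// Layout as built by the C compiler: 1 + 1 + 2 + 4 = 8 bytes.
struct MysteryAsBuiltByC {
    BoolSmall     is_valid;
    std::uint8_t  digit_1;
    std::uint16_t digit_2;
    std::uint32_t digit_3;
};

// Layout as built by the C++ compiler: 4 + 1 + (1 padding) + 2 + 4 = 12 bytes.
struct MysteryAsBuiltByCpp {
    BoolInt       is_valid;
    std::uint8_t  digit_1;
    std::uint16_t digit_2;
    std::uint32_t digit_3;
};

// Copies the bytes written with the C-side layout into a zero-filled
// buffer and reads digit_3 back through the C++-side layout.
inline std::uint32_t digit_3_seen_by_cpp(const MysteryAsBuiltByC& from_c)
{
    unsigned char raw[sizeof(MysteryAsBuiltByCpp)] = {};
    static_assert(sizeof(MysteryAsBuiltByC) <= sizeof raw,
                  "C-side layout must fit in the buffer");
    std::memcpy(raw, &from_c, sizeof from_c);

    MysteryAsBuiltByCpp view;
    std::memcpy(&view, raw, sizeof view);
    return view.digit_3;
}
```

Under these assumptions the C++-side view looks for `digit_3` at offset 8, past the end of the 8 bytes the C side actually wrote, so it never sees the value the C code stored.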

Admittedly this happened with a very old, very patchy toolchain. Modern compilers and embedded toolchains hopefully don’t suffer from such inconsistencies in default compiler configuration between C and C++, making interoperability easier.
