19 Feb, 2013

Episode Two: Those Whose Names shall not be Spoken

A naming convention is a set of rules for choosing the character sequence to be used for identifiers which denote variables, types and functions etc. in source code and documentation. They can be a source of enormous controversy, the kind of controversy that start wars —flame wars that is—. C++ specifies a naming convention for reserved identifiers, names that can be safely used by the implementation and standard library, and which result in undefined behavior if used in a program. Such naming convention is standard, so there shouldn't be any controversy, and any standard conformant code has to make sure it does not declare any one of those reserved names.

Reserved names

Reserved global names

[2.11/3] [...] some identifiers are reserved for use by C++ implementations and standard libraries (17.6.4.3.2) and shall not be used otherwise; no diagnostic is required.

[17.6.4.3.2/1] Certain sets of names and function signatures are always reserved to the implementation:

  • Each name that contains a double underscore __ or begins with an underscore followed by an uppercase letter (2.12) is reserved to the implementation for any use.

  • Each name that begins with an underscore is reserved to the implementation for use as a name in the global namespace.

Names that contain a double underscore in its name are reserved. They are typically used by the implementation to introduce new keywords or intrinsics. The standard even makes use of these reserved names to show the equivalence between range-based for statements and regular for statements, so as to enforce the fact that the names introduced do not conflict with the names of anything defined by the user. In [6.5.4/1], it shows that a statement of the form:

for ( for-range-declaration : range-init ) statement

is equivalent to:

{
    auto && __range = range-init;
    for ( auto __begin = begin-expr,
               __end = end-expr;
        __begin != __end;
        ++__begin ) {
        for-range-declaration = *__begin;
        statement
    }
}

This restriction applies not only to names beginning with a double underscore, but to names with a double underscore anywhere within its name. A double underscore is two consecutive underscores _, and not two underscores anywhere in a name as it is sometimes believed.

Names that begin with an underscore followed by an uppercase letter are reserved. These names are often seen in the implementation of standard libraries. Since no legal program can use them, a standard library implementor can safely use those knowing that if they ever break user code the party at fault is the user and not them. They can also be sporadically seen in other places, like the awkward and long forgotten _Bool type.

If you ever worked with the Windows API or MFC, you may have seen and even been encouraged to use the _T or _TEXT macro. By the previous rule, _T is a name reserved to the implementation for any use and the Windows API is at fault —even though they technically are the implementation for which those names are reserved—. wxWidgets has also unfortunately inherited said illegally named macro as a convenience, which just forwards to the preferred wxT macro. And just for completeness, _T happens to be a reserved identifier for any use in C as well, making it an invalid macro name in both languages.

The restriction on names beginning with an underscore in the global namespace follows the restrictions on reserved identifiers specified by C. This means that you can define types, objects and functions with names beginning with an underscore as long as they are in a namespace other than the global one. Macros, on the other hand, don't play by namespace rules so they can effectively be considered to be declared in the global namespace, and the std namespace, and any and all other namespaces as well. By this rule, _T is once again an invalid macro name.

Reserved user-defined literal names

[17.6.4.3.5/1]. Literal suffix identifiers that do not start with an underscore are reserved for future standardization.

Literal suffix identifiers must start with an underscore. Some names not starting with an underscore are already in use today, those are the integral suffixes u, U, l, L, ll, LL and some combinations of them, and the floating point suffixes f, F, l, L. Others have potential future implementations, like b and/or B for binary literals implemented either as integral suffixes or standard user-defined literals, i for std::complex literals with zero as its real component, or even s for literal std::chrono::seconds.

Remember that names starting with an underscore followed by a capital letter are reserved for any use and that user-defined literal names must begin with an underscore. Given those restrictions, is _Km a valid name? As it turns out, the standard currently requires there be whitespace before the name in the declaration of such user-defined literal suffix:

[13.5.8] literal-operator-id:
operator "" identifier

[13.5.8/1] The identifier in a literal-operator-id is called a literal suffix identifier .

[13.5.8/8] [ Example:
[...]

float operator ""E(const char*); // error: ""E (with no intervening space) is a single token

[...] —end example ]

The intervening space makes _Km a separate preprocessor-token and as such it will be expanded if it happens to name a macro, which is one of the allowed uses of that reserved name for an implementation. That renders _Km and any other user-defined literal suffix that starts with an underscore followed by a capital letter a potentially invalid one, as it will be impossible to even declare them when they happens to collide with an implementation macro.

Hopefully, this is likely to change soon. The proposed resolution is to drop the requirement for intervening space, allowing the declaration to consist of a single preprocessor-token thus avoiding macro expansion.

As of the C++14 draft standard this has been resolved:

[13.5.8] literal-operator-id:
operator string-literal identifier
operator user-defined-string-literal

[13.5.8/1] The string-literal or user-defined-string-literal in a literal-operator-id shall have no encoding-prefix and shall contain no characters other than the implicit terminating ’\0’. The ud-suffix of the user-defined-string-literal or the identifier in a literal-operator-id is called a literal suffix identifier. [ Note: some literal suffix identifiers are reserved for future standardization; see 17.6.4.3.5. —end note ]

[13.5.8/8] [ Example:
[...]

float operator ""E(const char*); // OK

[...] —end example ]

Reserved macro names

[17.6.4.3.1/2] A translation unit shall not #define or #undef names lexically identical to keywords, to the identifiers listed in Table 3, or to the attribute-tokens described in 7.6.

Macros cannot be named after keywords, which should be no surprise. Although technically not keywords, this also applies to the alternative operator representations and, and_eq, bitand, bitor, compl, not, not_eq, or, or_eq, xor, xor_eq.

Table 3 refers to the identifiers with special meaning —override and final— which can be think of as positional keywords, they only act as keywords in specific contexts. It is important to note that the use of these names is reserved in macros but its use is allowed elsewhere, that is in fact the reason they are positional keywords and not full fledged keywords after all.

#define final 0 // this is invalid
int const final = 0; // this is ok

The attribute-tokens are noreturn and carries_dependency.

[17.6.4.3.1/1] A translation unit that includes a standard library header shall not #define or #undef names declared in any standard library header.

Names declared in standard library headers include offsetof and NULL, but also string, swap and cout. More interestingly, the set of those names will likely increase whenever a new standard is issued. For instance, function and thread names where valid macro names until C++11; although the rules where also made stricter to apply to any standard library header, not just those included within the translation unit. Any valid identifier can potentially be declared in a standard library header in the future —they may even be in use today and you not be aware of it, seemingly odd things like propagate_on_container_move_assignment—, so avoid macros when possible and otherwise choose complex names to reduce the possibility of a collision.

A note on macro names

Macros have their own name space —labels do as well, but who uses goto anymore—, however they still interfere with identifiers in other spaces since they know nothing about the language but just bluntly operate on tokens. Not only their name space is flat, so its really easy to collide with other definitions if not careful, but they will replace any use of their names pretty much everywhere. That is probably the reason why there seems to be a general agreement that macros should have fully capitalized names. However, that is not enough.

By that rule, T and even T0 would be fully capitalized valid macro names, but those happen to be fairly regular choices for template parameters standing for generic types. There used to be —and maybe still is— a C physics library that defined a macro T just for stylistic reasons, as it resolved to plain whitespace. Suffice to say such library was unusable in conjunction with any slightly templated C++ code, with the exception of the standard library that shields itself by using only reserved names. Of course, nothing prevents a rogue library from declaring macros with a reserved name like _Ty, which would break more than one well known standard library.

Another recipe for disaster comes from the Windows API —you again, but others have done it as well— as it includes min and max functions, except that being a C API it can only sensibly provide those as macro functions. They happen to conflict with the use of the standard implementation of those, std::min and std::max. Luckily, there is a simple way of protecting against macro functions, as they will only be expanded when they are followed by an opening parenthesis ( with whitespace tokens allowed in between. The workaround is to inject something between the macro function name and a consecutive parenthesis. Adding parenthesis around the function name will do, although that has other effects as well. Another possibility is to place yet another macro after the function name, one that resolves to whitespace —but it better not be named T—.

std::max(0, 1); // error if max is defined as a macro function

(std::max)(0, 1); // ok, but no ADL (fine in this particular case since ADL does not kick in on qualified names)
std::max<int>(0, 1); // ok, only for function templates

#define INTERCEPT_MACRO_FUNCTION_SUBSTITUTION  
std::max INTERCEPT_MACRO_FUNCTION_SUBSTITUTION (0, 1); // ok, but gee...

std::max /*whitespace?*/ (0, 1); // error, comments are turned into whitespace early in the translation process

Besides being fully capitalized, macro names should be as random and unique as possible —UUIDs and hashes would make for perfect macro names— so to reduce to a minimum any chance of conflict. Local macros, those that are #defined and #undefined within a file scope, are somewhat less dangerous yet their names should still hint that they are macros.

Other reserved names

Reserved external names

[17.6.4.3.3/1] Each name declared as an object with external linkage in a header is reserved to the implementation to designate that library object with external linkage, both in namespace std and in the global namespace.

The list of such reserved names includes errno, declared or defined in <cerrno>.

[17.6.4.3.3/2] Each global function signature declared with external linkage in a header is reserved to the implementation to designate that function signature with external linkage.

The list of such reserved function signatures with external linkage includes setjmp(jmp_buf), declared or defined in <csetjmp>, and va_end(va_list), declared or defined in <cstdarg>.

[17.6.4.3.3/3] Each name from the Standard C library declared with external linkage is reserved to the implementation for use as a name with extern "C" linkage, both in namespace std and in the global namespace.

[17.6.4.3.3/4]. Each function signature from the Standard C library declared with external linkage is reserved to the implementation for use as a function signature with both extern "C" and extern "C++" linkage, or as a name of namespace scope in the global namespace.

Reserved type names

[17.6.4.3.4/1] For each type T from the Standard C library, the types ::T and std::T are reserved to the implementation and, when defined, ::T shall be identical to std::T.

These types are clock_t, div_t, FILE, fpos_t, lconv, ldiv_t, mbstate_t, ptrdiff_t, sig_atomic_t, size_t, time_t, tm, va_list, wctrans_t, wctype_t and wint_t.

Summary

C++ reserves a set of names to be used by implementations and standard libraries. Standard conformant programs shall avoid declaring those names. Additionally, polite code chooses macro names so that they have a low chance of colliding with other potential identifiers.

  • Avoid declaring names that contain a double underscore.
  • Avoid declaring names that begin with an underscore followed by a capital letter.
  • Avoid declaring names that begin with an underscore in the global namespace.
  • Avoid declaring user-defined literals that do not begin with an underscore.
  • Avoid declaring names that conflict with names in the standard library.