19 Jan, 2013

Episode One: To Be or Not to Be Const

Const-correctness is the form of program correctness that deals with the proper declaration of objects as mutable or immutable. Constness is a compile-time enforced construct that communicates information about a value's intended use: whether something may or may not be modified. Embedding such intent by means of a keyword results in clearer, easier-to-follow code, and if you happen to stray from the declared intent the compiler will step up and make you think twice about what you are trying to do.

Being const

When the const-qualifier is applied to a declaration it results in something that is immutable.

  • const in objects:

    An object that is declared const is immutable. Its value cannot be changed, thus its value will always be the one with which the object was initialized.

  • const in pointers/references:

    A pointer/reference to const does not need to point/refer to a const object, but it is treated as if it does. The pointer/reference cannot be used to modify the object it points/refers to, even if that object is a non-const object that can be modified through some other access path.

  • const in member functions:

    A member function that is declared const can't change the observable state of the object on which it is called. The const-qualifier applies to the implicit this argument.

  • const in function arguments:

    As function argument types decay, top-level const-qualifiers are removed, so they are not part of the function's type. It is the function's own copy of the argument that is const, not the argument passed by the caller; the qualifier only affects how that copy can be used within the definition of the function.

    The following two definitions in fact define the same function, so providing both is a redefinition error:

    void foo( int p ){ }
    void foo( int const p ){ }
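
The four cases above can be sketched in one place. This is only an illustration: the names counter, peek, twice and demo are made up for the example, and the commented-out lines are the ones a compiler would be expected to reject.

```cpp
#include <cassert>

// Hypothetical type used only for illustration.
class counter {
    int value = 0;

public:
    void increment() { ++value; }        // non-const: may modify *this
    int get() const { return value; }    // const: this is a counter const*
};

// Reference to const: the referenced object cannot be modified through c,
// even when the caller's object is itself non-const.
int peek( counter const& c ) {
    // c.increment();                    // error: increment() is not const
    return c.get();
}

// Top-level const on a by-value parameter only restricts the local copy;
// the function's type is still int( int ).
int twice( int const n ) {
    // n *= 2;                           // error: n is const inside the body
    return n * 2;
}

int demo() {
    int const limit = 3;                 // const object: keeps its initial value
    // limit = 4;                        // error: assignment to a const object

    counter c;                           // non-const object
    counter const* p = &c;               // pointer to const, pointing at non-const
    c.increment();                       // fine: modified through another access path
    // p->increment();                   // error: cannot modify through p
    assert( p->get() == 1 );             // p observes the change made through c

    return p->get() + peek( c ) + twice( limit );
}
```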

bitwise vs logical

The idea of constness does not imply that the variable as it is stored in the computer's memory is unwriteable (bitwise constness), but rather that its logical state does not change (logical constness). Sometimes an object's logical state does not change, but its bits do (think of an object that maintains a reference count, a cache, etc.). This is accomplished with the mutable keyword.

[7.1.1/10] The mutable specifier can be applied only to names of class data members (9.2) and cannot be applied to names declared const or static, and cannot be applied to reference members. [ Example:

class X {
    mutable const int* p; // OK
    mutable int* const q; // ill-formed
  };

—end example ]

[7.1.1/11] The mutable specifier on a class data member nullifies a const specifier applied to the containing class object and permits modification of the mutable class member even though the rest of the object is const (7.1.6.1).

A member object that is declared as mutable can always be changed, even inside a const member function or by a pointer/reference to const.
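
As a minimal sketch of logical constness, consider a hypothetical dictionary type (not from the Standard or from the text above) that counts how often its size is queried. The counter is bookkeeping rather than logical state, so it is declared mutable.

```cpp
#include <cassert>

// Hypothetical type: lookups is bookkeeping, not part of the logical state.
class dictionary {
    int entries = 0;
    mutable int lookups = 0;     // may change even when the object is const

public:
    void add() { ++entries; }

    int size() const {
        ++lookups;               // allowed: lookups is mutable
        // ++entries;            // error: entries is not mutable
        return entries;
    }

    int lookup_count() const { return lookups; }
};

int count_lookups() {
    dictionary const d{};        // a const object ...
    d.size();
    d.size();                    // ... whose mutable member still changes
    assert( d.size() == 0 );     // the logical state is untouched
    return d.lookup_count();     // three calls to size() so far
}
```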

So you want to shoot yourself in the foot

Another way to bypass intent is to cast the constness away by means of const_cast. With it, it's possible to turn a pointer/reference to a const object into one to a non-const object. It's also possible to go the other way and turn a pointer/reference to a non-const object into one to a const object, but that is less common since that conversion is a standard conversion that can be done implicitly.

But there is a caveat:

[7.1.6.1/4] Except that any class member declared mutable (7.1.1) can be modified, any attempt to modify a const object during its lifetime (3.8) results in undefined behavior.

A static const object with no mutable subobjects that is initialized with a constant expression may be placed in a read-only memory location (either ROM or otherwise enforced). If you cast away the constness of one of those objects and then try to modify it, you have undefined behavior. This situation is more common than it may initially seem, especially when dealing with string literals:

char const* some_string = "Hello World?";
const_cast< char* >( some_string )[11] = '!'; // we need const_cast here or the compiler will complain

Take a look at what is going on there: we are declaring a string literal "Hello World?"

static char const __hello_world_literal[] = "Hello World?";

we take a pointer to it

char const* some_string = __hello_world_literal;

and then we cast the constness away and try to modify it

const_cast< char* >( some_string )[11] = '!';

Trying to modify something in read-only memory is, of course, undefined behavior. This will usually, but not necessarily, result in a crash when executed.

String literals happen to be one of those places where C and C++ differ, having const elements in C++ but not in C:

char* p = "abc"; // valid in C, invalid in C++

Some C++ compilers will, perhaps for backwards compatibility reasons, allow such invalid code by essentially adding an implicit const_cast. Now we have the same kind of undefined behavior, but without any of the compiler errors nor the ugly const_cast hinting that it should be handled with care.
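
For contrast, here is a sketch of uses of const_cast that stay within defined behavior, because nothing that was declared const is ever written to. The functions legacy_length, length_of and last_char_after_edit are hypothetical names introduced for this example, and legacy_length is assumed never to write through its argument.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical legacy-style function: it takes a non-const pointer but,
// as an assumption of this sketch, never writes through it.
std::size_t legacy_length( char* s ) { return std::strlen( s ); }

std::size_t length_of( char const* s ) {
    // Well-defined: const is removed only to satisfy the signature,
    // and the pointee is never modified.
    return legacy_length( const_cast< char* >( s ) );
}

char last_char_after_edit() {
    char buffer[] = "Hello World?";         // non-const array: a writable copy of the literal
    char const* view = buffer;              // constness added by the pointer only
    const_cast< char* >( view )[11] = '!';  // well-defined: buffer was never declared const
    assert( std::strlen( buffer ) == 12 );  // only the last character changed
    return buffer[11];
}
```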

const implies thread-safe

This is what the Standard Language has to say on thread-safety:

[1.10/4] Two expression evaluations conflict if one of them modifies a memory location (1.7) and the other one accesses or modifies the same memory location.

[1.10/21] The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.

which is nothing else than the sufficient condition for a data race to occur:

  1. There are two or more actions being performed at the same time on a given thing; and
  2. At least one of them is a write.

The Standard Library builds on that, going a bit further:

[17.6.5.9/1] This section specifies requirements that implementations shall meet to prevent data races (1.10). Every standard library function shall meet each requirement unless otherwise specified. Implementations may prevent data races in cases other than those specified below.

[17.6.5.9/3] A C++ standard library function shall not directly or indirectly modify objects (1.10) accessible by threads other than the current thread unless the objects are accessed directly or indirectly via the function’s non-const arguments, including this.

which in simple words says that it expects operations on const objects to be thread-safe. This means that the Standard Library won't introduce a data race as long as operations on const objects of your own types either

  1. Consist entirely of reads —that is, there are no writes—; or
  2. Internally synchronize writes.

If this expectation does not hold for one of your types, then using it directly or indirectly together with any component of the Standard Library may result in a data race. In conclusion, const does mean thread-safe from the Standard Library's point of view. It is important to note that this is merely a contract and it won't be enforced by the compiler; if you break it you get undefined behavior and you are on your own. Whether const is present or not will not affect code generation, at least not with respect to data races.
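
As a small sketch of the first case, two threads can read the same const std::vector at the same time without any synchronization, since neither of them writes to it. The helper names sum and sum_from_two_threads are made up for the example.

```cpp
#include <cassert>
#include <numeric>
#include <thread>
#include <vector>

// Sums the vector using only const operations (begin/end on a const object).
long sum( std::vector< int > const& values ) {
    return std::accumulate( values.begin(), values.end(), 0L );
}

long sum_from_two_threads() {
    std::vector< int > const values{ 1, 2, 3, 4 };
    long first = 0, second = 0;

    // Both threads only read values; each one writes to its own result.
    std::thread a( [&]{ first = sum( values ); } );
    std::thread b( [&]{ second = sum( values ); } );
    a.join();
    b.join();

    assert( first == second );   // both threads saw the same contents
    return first + second;
}
```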

const is not synchronized

Consider the following overly simplified class representing a rectangle:

class rect {
    int width = 0, height = 0;

public:
    /*...*/
    void set_size( int new_width, int new_height ) {
        width = new_width;
        height = new_height;
    }
    int area() const {
        return width * height;
    }
};

The member-function area is thread-safe; not because it's const, but because it consists entirely of read operations. There are no writes involved, and at least one write is necessary for a data race to occur. That means that you can call area from as many threads as you want and you will get correct results all the time.

Note that this doesn't mean that rect is thread-safe. In fact, it's easy to see that if a call to area were to happen at the same time as a call to set_size on a given rect, then area could end up computing its result based on an old width and a new height (or even on garbled values).

But that is all right: rect isn't const, so it's not even expected to be thread-safe after all. An object declared const rect, on the other hand, would be thread-safe since no writes are possible (and if you are considering const_cast-ing something originally declared const, then you get undefined behavior and that's it).

So what does it mean then?

Let's assume —for the sake of argument— that multiplication operations are extremely costly and we better avoid them when possible. We could compute the area only if it is requested, and then cache it in case it is requested again in the future:

class rect {
    int width = 0, height = 0;

    mutable int cached_area = 0;
    mutable bool cached_area_valid = true;

public:
    /*...*/
    void set_size( int new_width, int new_height ) {
        if( new_width != width || new_height != height )
            cached_area_valid = false;
        width = new_width;
        height = new_height;
    }
    int area() const {
        if( !cached_area_valid ) {
            cached_area = width;
            cached_area *= height;
            cached_area_valid = true;
        }
        return cached_area;
    }
};

[If this example seems too artificial, you could mentally replace int by a very large dynamically allocated integer which is inherently non thread-safe and for which multiplications are extremely costly.]

The member-function area is no longer thread-safe: it is doing writes now and is not internally synchronized. Is it a problem? The call to area may happen as part of the copy-constructor of another object, that constructor could have been called by some operation on a standard container from a different thread, and at that point the Standard Library expects this operation to behave as a read with regard to data races. But we are doing writes!

As soon as we put a rect in a standard container —directly or indirectly— we are entering a contract with the Standard Library. To keep doing writes in a const function while still honoring that contract, we need to internally synchronize those writes:

#include <mutex>

class rect {
    int width = 0, height = 0;

    mutable std::mutex cache_mutex;
    mutable int cached_area = 0;
    mutable bool cached_area_valid = true;

public:
    /*...*/
    void set_size( int new_width, int new_height ) {
        if( new_width != width || new_height != height )
        {
            std::lock_guard< std::mutex > guard( cache_mutex );

            cached_area_valid = false;
        }
        width = new_width;
        height = new_height;
    }
    int area() const {
        std::lock_guard< std::mutex > guard( cache_mutex );

        if( !cached_area_valid ) {
            cached_area = width;
            cached_area *= height;
            cached_area_valid = true;
        }
        return cached_area;
    }
};

Note that we made the area function thread-safe, but the set_size function is deliberately not thread-safe, so rect still isn't thread-safe. A call to area happening at the same time as a call to set_size may still end up computing the wrong value, since the assignments to width and height are not protected by the mutex.

If we really wanted a thread-safe rect, we would use a synchronization primitive to protect the non thread-safe rect.
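
One possible reading of that, as a sketch only: wrap the plain rect in a hypothetical synchronized_rect that holds a single std::mutex for every access. The wrapper and the function reader_sees_consistent_area are names invented for this example; the non-caching rect is repeated so that the sketch stands on its own.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// The plain, non thread-safe rect from the first example.
class rect {
    int width = 0, height = 0;

public:
    void set_size( int new_width, int new_height ) {
        width = new_width;
        height = new_height;
    }
    int area() const { return width * height; }
};

// Hypothetical wrapper: every access to the inner rect holds the same mutex.
class synchronized_rect {
    mutable std::mutex mutex;
    rect inner;

public:
    void set_size( int new_width, int new_height ) {
        std::lock_guard< std::mutex > guard( mutex );
        inner.set_size( new_width, new_height );
    }
    int area() const {
        std::lock_guard< std::mutex > guard( mutex );
        return inner.area();
    }
};

// A writer and a reader run concurrently. The reader sees either the old
// size (area 4) or the new one (area 12), never a mix such as 6 or 8.
bool reader_sees_consistent_area() {
    synchronized_rect r;
    r.set_size( 2, 2 );
    int seen = -1;

    std::thread writer( [&]{ r.set_size( 3, 4 ); } );
    std::thread reader( [&]{ seen = r.area(); } );
    writer.join();
    reader.join();

    assert( r.area() == 12 );    // the write has completed after join
    return seen == 4 || seen == 12;
}
```

Using one mutex for both member functions keeps width and height consistent with each other, at the cost of serializing readers as well as writers.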

Summary

Const-correctness is a good thing. It lets us express an intent directly with language constructs, and we get some help from the compiler to enforce it.

  • const means logically immutable. Implement your const methods so that this expectation holds.
  • const means thread-safe (not necessarily synchronized). Implement your const methods so that this expectation holds.
  • mutable members can always be changed, even inside const functions.
  • const_casts that remove constness are only ok if the object was originally declared non-const.
