30 Aug, 2013

Episode Six: Boollocks

In the beginning there was no bool. And C++ said "let there be bool", and there was bool...

The Boolean data type is an integral type with only two values: true and false, intended to represent the truth values of logic and Boolean algebra. With only two values, what can possibly go wrong?

Genesis

The initial implementations of the C language (1972) provided no Boolean type, and to this day Boolean values are commonly represented by ints in C programs. The comparison operators are defined to return an int, either 0 for false or non-0 for true. The same convention is assumed by the logical operators and condition-testing statements.

After enums were added to the ANSI version of C (1989), many programmers got used to defining their own Boolean types as such, for readability reasons. However, enumerated types are equivalent to integers according to the language standards; so the effective identity between Booleans and integers is still valid for C programs.

C++ (standardized in 1998) has a separate Boolean data type bool, but with automatic conversions from scalar and pointer values that are very similar to those of C —and if you think a builtin bool type is not needed, think again—.

C99 added a Boolean data type _Bool. Additionally, the standard requires that macros are defined in the <stdbool.h> header to alias the type as bool, as well as providing macros for true and false.

[7.18/1] The header <stdbool.h> defines four macros.

[7.18/2] The macro bool expands to _Bool.

[7.18/3] The remaining three macros are suitable for use in #if preprocessing directives. They are true which expands to the integer constant 1, false which expands to the integer constant 0, and __bool_true_false_are_defined which expands to the integer constant 1.

[7.18/4] Notwithstanding the provisions of 7.1.3, a program may undefine and perhaps then redefine the macros bool, true, and false.

[7.1.3] states that once <stdbool.h> is included, the macro names defined in it are reserved for use only as specified, and that declaring or defining entities with those names results in undefined behavior.

Even more, the future library directions section reads:

[7.31.9/1] The ability to undefine and perhaps then redefine the macros bool, true, and false is an obsolescent feature.

A Partial Specialization Gone Wrong

A bit is enough to store the possible values of bool. However, there is no type with a size less than that of char, which has to be at least 8 bits long. This means that every bool will waste at least 7 bits. If we had several bools to store, we could pack them together into one of the other integral types and save a considerable amount of space. That is the idea behind the partial specialization of std::vector of bools.

[23.3.8] To optimize space allocation, a specialization of vector for bool elements is provided:

namespace std {
    template <class Allocator> class vector<bool, Allocator> {
    public:
      // types:
      typedef bool const_reference;
      /*...*/
      // bit reference:
      class reference {
        friend class vector;
        reference() noexcept;
      public:
        ~reference();
        operator bool() const noexcept;
        reference& operator=(const bool x) noexcept;
        reference& operator=(const reference& x) noexcept;
        void flip() noexcept; // flips the bit
      };
      /*...*/
    };
  }

The problem with this particular specialization is that std::vector<bool> is not really a std::vector, and in fact it is not even a container!

Contiguous Storage

A std::vector<T> will store its elements contiguously in memory, as long as its elements are not bools:

[23.3.7.1/1] A vector is a sequence container that supports random access iterators. In addition, it supports (amortized) constant time insert and erase operations at the end; insert and erase in the middle take linear time. Storage management is handled automatically, though hints can be given to improve efficiency. The elements of a vector are stored contiguously, meaning that if v is a vector<T, Allocator> where T is some type other than bool, then it obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size().

While for a std::vector<bool>:

[23.3.8/3] There is no requirement that the data be stored as a contiguous allocation of bool values. A space-optimized representation of bits is recommended instead.

General Container Requirements

A std::vector<T> is a container, as long as its elements are not bools:

[23.3.7.1/2] A vector satisfies all of the requirements of a container and of a reversible container (given in two tables in 23.2) [...]

[23.2.1/4] In Tables 96 and 97, X denotes a container class containing objects of type T [...]

Expression Return type
X::reference lvalue of T
X::const_reference const lvalue of T

While for a std::vector<bool>:

[23.3.8/4] reference is a class that simulates the behavior of references of a single bit in vector<bool>. The conversion operator returns true when the bit is set, and false otherwise. The assignment operator sets the bit when the argument is (convertible to) true and clears it otherwise.

...and const_reference is just plain bool.

Allocator-Aware Container Requirements

A std::vector<T> is an allocator-aware container, as long as its elements are not bools:

[23.3.7.1/2] A vector satisfies all of the requirements [...] of an allocator-aware container (Table 99).

[23.2.1/3] For the components affected by this subclause that declare an allocator_type, objects stored in these components shall be constructed using the allocator_traits<allocator_type>::construct function and destroyed using the allocator_traits<allocator_type>::destroy function (20.8.8.2). These functions are called only for the container’s element type, not for internal types used by the container. [ Note: This means, for example, that a node-based container might need to construct nodes containing aligned buffers and call construct to place the element into the buffer. —end note ]

While for a std::vector<bool>:

[23.3.8/2] [...] operations dealing with the bool value type map to bit values in the container storage and allocator_traits::construct (20.8.8.2) is not used to construct these values.

Wrapping up

The committee intended std::vector<bool> to moonlight as a dynamic counterpart to std::bitset, and the result is a partial specialization that can't fulfill either of its jobs. Since bits are not addressable, a data structure of packed bools —dynamic or not— will never be a container. For example, the following code snippet will work for every mutable container, but not for a std::vector<bool>:

template <class Container>
void test(Container& c)
{
  for (auto& e : c) {
    // do something to e
  }
}

A dynamic counterpart to std::bitset would better be named std::dynamic_bitset and not std::vector<bool>. But even if that happens, due to backwards compatibility we might never get std::vector<bool> back. Meanwhile, the space optimization of std::vector<bool> is optional —yet not the odd interface—, so Boost.Dynamic_Bitset might be a better alternative.

Tribool

or: How I Learned to Stop Worrying and Love the Overloaded Logical Operators

There is more in logic than just Boolean algebra. A many-valued logic system has more than just two truth values. In particular, a three-valued logic system has three truth values: true, false and some indeterminate third value.

The fundamental data type of a three-valued logic system is the tribool —a different beast than std::optional<bool>—, and Boost.Tribool is an implementation of it. Quoting from the documentation:

tribool supports the 3-state logic operators ! (negation), && (logical-and), and || (logical-or), with bool and tribool values. For instance:

tribool x = some_op();
  tribool y = some_other_op();
  if (x && y) {
    // both x and y are true
  }
  else if (!(x && y)) {
    // either x or y is false
  }
  else {
    // neither x nor y is false, but we don't know that both are true
  
    if (x || y) {
      // either x or y is true
    }
  }

The catch is that those operators are overloaded, and overloaded operators are just regular functions. As such, overloaded operators && and || loose their special short-circuit evaluation semantics. Consider the following scenario:

struct X {
  boost::tribool hi_there() const { /*...*/ }
};

X *x = nullptr;
if (x != nullptr && x->hi_there()) { // boom!
  /*...*/
}

The problem here is that boost::tribool is an argument to &&, so the overloaded operator && is called instead of the builtin operator. As with any ordinary function, each parameter shall be initialized with its corresponding argument —the evaluation order is indeterminated, but there is a more pressing issue—. The right hand side argument, x->some_op(), will be evaluated regardless of the value of the left hand side argument —which might not have been evaluated yet— and it will be dereferencing a null pointer each time x != nullptr does not hold. It should be noted that the offending expression makes no mention of boost::tribool, which would be in itself a red flag.

Syntax sugar is nice to have, but not when the cost to pay is that of deviant semantics.

Boolean Conversions

An implicit conversion to bool is another construct involving bool that is not without issues —issues that were covered by a previous episode—. What follows is an excerpt from Episode Five: Explicit is Better than Implicit:

Consider a conversion from a user-defined class X to bool. Given that bool is an integral type, if said conversion function is not explicit then suddenly a value of type X could participate in all kinds of integral expressions. While it may make sense for an object representing a file to convert to false when said file is not open, it is utterly senseless to add files or to compare files using relational operators.

[...]

Before explicit conversion functions where introduced in C++11, conversions to bool presented a real problem —as exemplified earlier—. Components of the IOStreams subset of the standard library initially tackled this problem by defining a conversion to void* instead, for which there is a standard conversion to bool. This prevents its participation in arithmetic expressions, but it opens the door to nonsense code such as delete std::cout. A solution using member pointers was then born, which conveniently cannot be the operand in a delete expression, and it is known as the —now obsolete— safe bool idiom.

Summary

The Boolean data type in C++ is bool and its possible truth values are true and false.

  • bool is an integral type.
  • A std::vector<bool> does not provide the guarantees of a std::vector nor of a container. Generic algorithms may fail for that particular partial specialization.
  • Overloaded logical operators && and || do not have short-circuit evaluation semantics.
  • Types with an implicit-conversion to bool can participate in all kinds of integral expressions.

References: