10 Sep, 2014

Episode Ten: When Size Does Matter

In the C++ lands every object has mass; for any complete type T, sizeof(T) is greater than zero. This keeps array indexing and pointer arithmetics from collapsing, but it also means that empty objects occupy space. Furthermore, when an empty object is placed in a class next to a bigger member, padding may —and in all likeliness will— be added due to alignment requirements, resulting in an empty member taking more than just one byte of storage. Certainly something has to be done about this...

The Empty Base Optimization

A motivating example is that of std::unique_ptr, a smart pointer which automatically manages the lifetime of an object of whom is the sole owner. As such, it is an ideal candidate to replace error prone uses of manually managed naked pointers. However, std::unique_ptr<T> does not only hold a pointer-to-T, but also a deleter object —an object that disposes of the pointer as appropriate— which defaults to std::default_delete<T>. A std::unique_ptr<T> seeks to impose no overhead over T*, but by having to store an additional object even when it's empty—which std::default_delete is— it seems to be starting quite at a disadvantage.

Consider the following suboptimal implementation of unique_ptr:

namespace suboptimal {
  template <typename T, typename D = std::default_delete<T>>
  class unique_ptr {
    T* ptr;
    D deleter;

  public:
    /*...*/
  };
}

For a particular implementation —one with 64-bit pointers and aligned memory access—, the size of suboptimal::unique_ptr<int> is twice that of int*:

std::cout << "sizeof \n"
          << " - int* : " << sizeof(int*) << '\n' // 8
          << " - std::default_delete<int> : "
            << sizeof(std::default_delete<int>) << '\n' // 1
          << " - suboptimal::unique_ptr<int> : "
            << sizeof(suboptimal::unique_ptr<int>) << '\n'; // 16

Yet, even though it is not guaranteed by the standard, for any decent implementation the size of std::unique_ptr<T> will be that of T*:

std::cout << " - std::unique_ptr<int> : "
            << sizeof(std::unique_ptr<int>) << '\n'; // 8

The deleter is still stored within the unique_ptr, it has to be since the interface exposes a reference to it; it is stored in a place where it is allowed —not guaranteed— to occupy no space when empty: a base class subobject. Both the compiler optimization and the technique that exploits it are commonly referred to as the empty base optimization (EBO), often indistinctively.

The Object Model

An object is a region of storage, defined by the C++ object model.

1.8 [intro.object]/2 Objects can contain other objects, called subobjects. A subobject can be a member subobject (9.2), a base class subobject (Clause 10), or an array element. An object that is not a subobject of any other object is called a complete object.

1.8 [intro.object]/4 If a complete object, a data member (9.2), or an array element is of class type, its type is considered the most derived class, to distinguish it from the class type of any base class subobject; an object of a most derived class type or of a non-class type is called a most derived object.

A special guarantee is given for base class subobjects, which may have zero size:

1.8 [intro.object]/5 Unless it is a bit-field (9.6), a most derived object shall have a non-zero size and shall occupy one or more bytes of storage. Base class subobjects may have zero size. An object of trivially copyable or standard-layout type (3.9) shall occupy contiguous bytes of storage.

Different complete objects have distinct addresses, subobjects may share the same address as long as they are of different types:

1.8 [intro.object]/6 Unless an object is a bit-field or a base class subobject of zero size, the address of that object is the address of the first byte it occupies. Two objects that are not bit-fields may have the same address if one is a subobject of the other, or if at least one is a base class subobject of zero size and they are of different types; otherwise, they shall have distinct addresses. [...]

Those base class subobjects that may have zero size are informally known as empty class types. While not a language notion, the library formalizes this property of a type through its std::is_empty type trait. The following examples depict situations where at least one subobject is an empty base class:

struct empty {};
struct empty_1 : empty {};
struct empty_2 : empty {};

// Address shared between A and empty (zero size)
struct A : empty {};

// Address shared between B and empty (non-zero size)
struct B : empty { empty x; };

// Address shared between C, empty_1 and empty_1::empty (non-zero size)
// Address shared between empty_2 and empty_2::empty
struct C : empty_1, empty_2 {};

// Address shared between C, empty_1 and empty_1::empty (non-zero size)
// Address shared between empty_2, empty_2::empty (non-zero size)
struct D : empty_1, empty_2 { empty x; };

A Note on Standard Layout

The memory model is written in such a way that implementors may choose to implement the empty base optimization—most do—, but it is not mandatory. However, C++11 introduced the notion of a standard-layout class, for which the empty base optimization is required:

9 [class]/7 A standard-layout class is a class that:

  • has no non-static data members of type non-standard-layout class (or array of such types) or reference,
  • has no virtual functions (10.3) and no virtual base classes (10.1),
  • has the same access control (Clause 11) for all non-static data members,
  • has no non-standard-layout base classes,
  • either has no non-static data members in the most derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and
  • has no base classes of the same type as the first non-static data member.

9 [class]/16 Two standard-layout struct (Clause 9) types are layout-compatible if they have the same number of non-static data members and corresponding non-static data members (in declaration order) have layout-compatible types (3.9).

9.2 [class.mem]/19 If a standard-layout class object has any non-static data members, its address is the same as the address of its first non-static data member. Otherwise, its address is the same as the address of its first base class subobject (if any). [Note: There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning, as necessary to achieve appropriate alignment. —end note]

A standard-layout class can have multiple empty base classes, yet the address of its first non-static data member shall be the same as that of the object. This can only be achieved if the empty-base optimization is applied to all base clases.

It should be noted that the intent is to restrict standard-layout classes to those in which the empty-base optimization can effectively be applied. However, under the given definition, classes with more than one base class subobject of the same type —C and D above— are incorrectly considered standard-layout classes. Were C truly a standard-layout class it would have to be layout-compatible with A , yet they have different sizes. Furthermore, for D to be a truly standard-layout class the addresses of the subobject x and empty_1 shall be the same as that of the object D, while at the same time the addresses of the subobjects x, empty_1 and empty_2 shall be distinct.

Implementation

Taking advantage of the empty-base optimization is not hard, but it is not trivial either. Simply inheriting from a possibly empty type is not an option, as it is only possible to inherit from non-union class types that are not marked as final. Within those, there are some inconvenient cases related to virtual. Inheriting from a class that declares or inherits a virtual function would inadvertently turn the holder into a polymorphic class. Furthermore, inheriting from a class with virtual base classes would require that the holder initializes it —as they must be initialized by the most derived type—, which it can't possibly know how to.

The std::is_empty type trait meets almost all of these requirements, as if it was carefuly tailored to help leveraging the empty-base optimization:

20.10.4.3 [meta.unary.prop]

template <class T>
  struct is_empty;
  • Condition: T is a class type, but not a union type, with no non-static data members other than bit-fields of length 0, no virtual member functions, no virtual base classes, and no base class B for which is_empty<B>::value is false.

  • Preconditions: If T is a non-union class type, T shall be a complete type.

The case not addressed by it is that of class types marked as final, for which the std::is_final type trait was introduced in C++14:

20.10.4.3 [meta.unary.prop]

template <class T>
  struct is_final;
  • Condition: T is a class type marked with the class-virt-specifier final (Clause 9). [Note: A union is a class type that can be marked with final. —end note]

  • Preconditions: If T is a class type, T shall be a complete type.

A candidate for the empty-base optimization is thus a type T such that std::is_empty<T>::value is true and std::is_final<T>::value is false. With these tools, it is possible to implement a space efficient unique_ptr:

namespace better_but_not_quite_there_yet {
  template <typename T, typename D = std::default_delete<T>,
    bool = std::is_empty<D>::value && !std::is_final<D>::value>
  class unique_ptr : private D { // empty-base optimization
    T* ptr;

  public:
    /*...*/

    D& get_deleter() noexcept { return *this; }
    D const& get_deleter() const noexcept { return *this; }
  };

  template <typename T, typename D>
  class unique_ptr<T, D, false> {
    T* ptr;
    D deleter;

  public:
    /*...*/

    D& get_deleter() noexcept { return deleter; }
    D const& get_deleter() const noexcept { return deleter; }
  };
}

This is better, in the sense that there is no space overhead for empty deleters, but it is not quite there yet:

std::cout << " - better_but_not_quite_there_yet::unique_ptr<int> : "
            << sizeof(better_but_not_quite_there_yet::unique_ptr<int>) << '\n'; // 8

Taking advantage of the empty-base optimization in this particular way has some serious drawbacks. The obvious one is that the interface of unique_ptr has changed by exposing an additional template parameter. A more fundamental one is that implementing the optimization in this way is not generic, the effort has to be replicated for each class that wants to leverage it. For that reason, a compressed_pair —like the one from Boost, or better yet one that is C++11/14 aware— is a most welcomed addition to the toolbox:

template <std::size_t I, typename T,
  bool = std::is_empty<T>::value && !std::is_final<T>::value>
class compressed_member : private T // empty-base optimization
{
public:
  /*..*/

  constexpr T& get() noexcept { return *this; }
  constexpr T const& get() const noexcept { return *this; }
};

template <std::size_t I, typename T>
class compressed_member<I, T, false>
{
  T value_;
public:
  /*..*/

  constexpr T& get() noexcept { return value_; }
  constexpr T const& get() const noexcept { return value_; }
};

template <typename T1, typename T2>
class compressed_pair
 : private compressed_member<0, T1>
 , private compressed_member<1, T2> {
public:
  /*...*/

  constexpr T1& first() noexcept { return compressed_member<0, T1>::get(); }
  constexpr T1 const& first() const noexcept { return compressed_member<0, T1>::get(); }
  constexpr T2& second() noexcept { return compressed_member<1, T2>::get(); }
  constexpr T2 const& second() const noexcept { return compressed_member<1, T2>::get(); }
};

The reason compressed_pair must be a distinct type is that std::pair specifies its member subobjects, and as such they cannot be replaced with base class subobjects when empty. However, a std::pair is nothing but a two element std::tuple, which fortunately does not specify any member subobjects. A decent implementation of std::tuple will be space efficient, employing the empty-base optimization —at least for the basic case of no subobjects with conflicting addresses—.

It is now possible to use compressed_pair to implement unique_ptr, as well as any other classes that could benefit from the empty-base optimization:

namespace ideal {
  template <typename T, typename D = std::default_delete<T>> // clean interface
  class unique_ptr { // single definition
    compressed_pair<T*, D> m; // empty-base optimization if applicable

  public:
    /*...*/

    T* get() const noexcept { return m.first(); }

    D& get_deleter() noexcept { return m.second(); }
    D const& get_deleter() const noexcept { return m.second(); }
  };
}

std::cout << " - ideal::unique_ptr<int> : "
            << sizeof(ideal::unique_ptr<int>) << '\n'; // 8

A Note on Facility Inheritance

Facility inheritance is used as a convenience to provide a set of functionality to its heirs. Examples of it are std::iterator and boost::noncopyable. These facility classes tend to be empty, however if they are not carefully designed they can end up preventing the empty-base optimization:

struct foo : private boost::noncopyable {};
struct bar : private boost::noncopyable { foo x; };

std::cout << " - bar : " << sizeof(bar) << '\n'; // 2

Even though foo is an empty class, the empty-base optimization does not apply to bar. If it did, then two distinct subobjects of type boost::noncopyable would end up sharing the same address. The problem is that even though there is no semantic relation between the two classes, this is not reflected in the type system. When designing a class for facility inheritance, it is best to keep in mind the empty-base optimization and make different things have distinct types:

namespace better {
  template <typename Derived> struct noncopyable {
    /*same as boost::noncopyable*/
  };
}

struct foo : private better::noncopyable<foo> {};
struct bar : private better::noncopyable<bar> { foo x; };

std::cout << " - bar : " << sizeof(bar) << '\n'; // 1

Summary

Every complete object occupies storage; sizeof(T) is always greater than zero, even for empty objects. Because sometimes size does matter, an empty base class subobject is allowed to have zero size. The empty-base optimization can be leveraged to keep an object's size compact when the type of an object to be hold could potentially be empty —like models of standard library concepts such as Allocator, Deleter, et.al.—.

  • Different complete objects have distinct addresses. Subobjects can share a single address with the complete object that holds them; this is the case for the first base subobject and/or the first member subobject. Different objects of the same type never share a single address.
  • The empty-base optimization allows for an empty base subobject to have zero size, as long as it doesn't cause two different subobjects of the same type share a single address.
  • The empty-base optimization is mandatory for a standard-layout class, optional otherwise.
  • Candidates that could benefit from the empty-base optimization can be determined using std::is_empty<T>::value together with !std::is_final<T>::value.
  • Facility inheritance can easily interfere with the empty-base optimization, if not properly design so that there are no common types between unrelated classes.

References:

Empty base optimization - cppreference.com

Is the Empty Base Class Optimization now a mandatory optimization (at least for standard-layout classes)?, StackOverflow.com