Hi! Hope you're enjoying this blog. I have a new home at www.goldsborough.me. Be sure to also check by there for new posts <3

Sunday, July 19, 2015

Literals in C++

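C++11 introduced the concept of user-defined literals, and C++14 added a set of ready-made ones to the standard library. What does this mean? It means that this snippet of code is valid (as of C++14, with <chrono> and <iostream> included):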

using namespace std::chrono_literals;
    
std::cout << "One day has "
          << (23h + 59min + 60s).count()
          << " seconds." << std::endl;
    
// Output: One day has 86400 seconds.

But first, let me outline what your grandmother's rusty C++ compiler can already do when it comes to writing integer literals with a prefix. Long before C++11, it was possible to write integers in their octal representation, simply by prefixing them with the digit 0. The compiler then reads the digits that follow as base 8. In the same way, a '0x' prefix stood and stands for a hexadecimal value:

std::cout << 10 << " "
          << 010 << " "
          << 0x10
          << std::endl;

// Output: 10 8 16

Here comes the first change modern C++ brought along regarding literals: binary literals, which were standardized in C++14 (several compilers had offered them as an extension before that). Nowadays, you can not only write integers in their hexadecimal, decimal or octal representation, but also in binary, by prefixing the number with '0b' and writing out the individual bits:

std::cout << 10 << " "
          << 010 << " "
          << 0x10 << " "
          << 0b10
          << std::endl;

// Output: 10 8 16 2

That's cute, but not really impressive. There are, however, three more areas where the C++14 standard library introduced literal suffixes that turn a literal into an object of a class type, depending on which characters follow it, namely: strings, time durations and complex numbers.

String Literals


For strings, you can turn a string literal (i.e. a char array) into an object of type std::string simply by appending an 's', given that you have a using directive for the 'std::string_literals' (or std::literals::string_literals) namespace:

using namespace std::literals::string_literals;

auto string = "Hello World!"s;

std::cout << string.size() << std::endl;
    
// Output: 12

While this kind of syntax is certainly interesting, it lacks clarity and I would most likely always prefer to explicitly declare the string type. Imagine someone has no previous knowledge of the new standard literals, does not see the using directive and comes across this expression. How much confusion would the "magic s" cause? Between 4 and 5 kilograms of confusion, I would estimate.
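
To be fair to the suffix, there are cases where it does more than save typing. Here is a minimal sketch of two of them, an embedded null character and type deduction with auto. The function name check_string_suffix is made up for this example, and the asserts state what I expect each expression to evaluate to:

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical helper, only for illustration: compares a std::string built
// through the const char* constructor with one built through the s suffix.
void check_string_suffix()
{
    using namespace std::literals::string_literals;

    // The const char* constructor stops reading at the first '\0'.
    std::string from_pointer("abc\0def");
    assert(from_pointer.size() == 3);

    // operator""s is passed the full length of the literal.
    auto from_literal = "abc\0def"s;
    assert(from_literal.size() == 7);

    // Without the suffix, auto deduces const char* rather than std::string.
    auto plain = "abc";
    static_assert(std::is_same<decltype(plain), const char*>::value,
                  "a plain literal decays to const char*");
    static_assert(std::is_same<decltype(from_literal), std::string>::value,
                  "the s suffix yields std::string");
    (void)plain; // silence unused-variable warnings
}
```

Calling check_string_suffix() from main should run through without tripping an assert.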

Complex Literals


Next, it is now possible to use literals for complex numbers. Where you previously would have declared a complex number with a real and imaginary part like so:

std::complex<double> z(1, 2);

You can now use the following syntax, given that you make the std::literals, std::complex_literals or std::literals::complex_literals namespace visible with an appropriate using directive:

using namespace std::complex_literals;
    
auto z = 1.0 + 2i;
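
As a quick sanity check, the two spellings can be compared directly. This is a small sketch; check_complex_literal is a name I made up for it, and the asserts document the values I expect:

```cpp
#include <cassert>
#include <complex>

// Hypothetical helper, only for illustration: the literal form and the
// constructor form should describe the same complex number.
void check_complex_literal()
{
    using namespace std::complex_literals;

    std::complex<double> from_constructor(1, 2);
    auto from_literal = 1.0 + 2i; // 2i is std::complex<double>(0, 2)

    assert(from_literal == from_constructor);
    assert(from_literal.real() == 1.0);
    assert(from_literal.imag() == 2.0);

    // The imaginary unit squared is -1.
    assert(1i * 1i == std::complex<double>(-1, 0));
}
```

Note that the real part is written as 1.0 and not 1. The operator+ for std::complex is a template that needs both operands to agree on the value type, so mixing an int with a std::complex<double> does not compile.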


Duration Literals


The next domain of the standard library where you can find literals is the <chrono> header and its handling of time spans and durations. I already gave an example of the duration literals at the top of this article. At this point, it is worth discussing how the literal syntax is implemented. The answer is surprisingly simple: defining a literal means nothing more than writing a function, a literal operator, that takes the value of an integer, floating-point, character or string literal as its argument (there is a pre-defined set of possible parameter lists you can have) and returns whatever you like. Here, for example, is how the literal operators behind the "24h" syntax could be defined. The standard library's own suffix is plain h; I use _h so that the snippet is legal in user code:

constexpr std::chrono::hours
operator""_h(unsigned long long hrs)
{
    return std::chrono::hours(hrs);
}

constexpr std::chrono::duration<long double, std::ratio<3600>>
operator""_h(long double hrs)
{
    return std::chrono::duration<long double, std::ratio<3600>>(hrs);
}


Where you would write operator* to overload the multiplication operator or operator== for comparison, here you write operator"" followed by the suffix you want to define, e.g. _h for hours. In this case, there is one overload that takes an unsigned long long and returns a duration with an integer representation (std::chrono::hours), while the other overload's parameter and return type both use floating-point types. Overload resolution picks between them depending on whether the literal is an integer or a floating-point literal. The same pattern is used for minutes (min), seconds (s), milliseconds (ms), microseconds (us) and nanoseconds (ns).
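
To see the two overloads at work with the standard library's own suffixes, here is a small sketch that checks everything at compile time. The variable name ninety is mine; everything else comes from <chrono>:

```cpp
#include <chrono>
#include <type_traits>

using namespace std::chrono_literals;

// An integer literal selects the unsigned long long overload,
// so 2h is exactly std::chrono::hours.
static_assert(std::is_same<decltype(2h), std::chrono::hours>::value,
              "2h is std::chrono::hours");

// A floating-point literal selects the long double overload and yields
// a duration with a floating-point representation.
static_assert(std::is_floating_point<decltype(1.5h)::rep>::value,
              "1.5h has a floating-point rep");

// Mixed units are converted to the finer unit: 2h + 30min is 150 minutes.
static_assert((2h + 30min).count() == 150, "2h + 30min == 150min");

// Going from a fractional duration to an integer one needs duration_cast.
constexpr auto ninety =
    std::chrono::duration_cast<std::chrono::minutes>(1.5h);
static_assert(ninety.count() == 90, "1.5h == 90min");
```

If any of these assumptions were wrong, the file would simply not compile.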

User-defined Literals


While the standards committee has indeed granted us the right to create and use our own literal operators, there are a few catches to be aware of, which I will quickly touch upon before showing you a few examples of user-defined literal operators. The first catch is that user-defined literal suffixes must begin with an underscore, as those not starting with an underscore are reserved for the standard. The second catch, which I already mentioned briefly further above, is that there is a pre-defined set of parameter lists that you can have for your literal operators. This list includes:

  • ( const char * ) 
  • ( unsigned long long int )
  • ( long double )   
  • ( char )   
  • ( wchar_t )  
  • ( char16_t ) 
  • ( char32_t ) 
  • ( const char * , std::size_t ) 
  • ( const wchar_t * , std::size_t )
  • ( const char16_t * , std::size_t )  
  • ( const char32_t * , std::size_t )

This is why, as you may have noticed, the hour-duration literal operator took a long double parameter in one overload and an unsigned long long parameter in the other. What exactly is the underlying motivation of the standards committee for this restriction of our liberty? Well, that's just none of your business. Damn you. But on a more serious note, my personal guess is that each of the non-pointer types here is the widest type in its category, so every literal of that category can be converted to it without overflow caused by the conversion itself (the literal could of course still be too large for even the widest type). If multiple overloads were possible for a value, e.g. one taking an int and another taking an unsigned long long, then given just the literal value 5 the compiler would have a pretty difficult time figuring out which to pick and would likely start to cry. Thus, all values of a category are converted to the widest type of that category, and the need for such overloads disappears.
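
To make the list a little less abstract, here is a sketch with one operator for four of the forms. The suffixes _doubled, _chars, _len and _is_digit are invented for this example and do nothing useful beyond showing what each form receives:

```cpp
#include <cstddef>

// ( unsigned long long ): the compiler hands over the parsed value.
constexpr unsigned long long operator""_doubled(unsigned long long value)
{
    return value * 2;
}

// ( const char * ): the "raw" form. It receives the characters of a
// numeric literal exactly as written, as a null-terminated string.
constexpr std::size_t operator""_chars(const char* text)
{
    std::size_t n = 0;
    while (text[n] != '\0') ++n; // loops in constexpr functions need C++14
    return n;
}

// ( const char *, std::size_t ): used for string literals. The length
// does not count the terminating null character.
constexpr std::size_t operator""_len(const char*, std::size_t length)
{
    return length;
}

// ( char ): used for character literals.
constexpr bool operator""_is_digit(char c)
{
    return c >= '0' && c <= '9';
}

static_assert(21_doubled == 42, "cooked integer form");
static_assert(0x10_chars == 4, "raw form sees the four characters 0x10");
static_assert("hello"_len == 5, "string form");
static_assert('7'_is_digit && !'x'_is_digit, "character form");
```

The raw form is only considered for an integer or floating-point literal when no matching unsigned long long or long double overload exists for that suffix.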

Here, now, a quick example of a user-defined type and an appropriate literal operator:

class Foo
{
public:
    
    constexpr explicit Foo(std::size_t x)
    : _x(x)
    { }
    
    void set(std::size_t x)
    {
        _x = x;
    }
    
    std::size_t get() const
    {
        return _x;
    }
    
private:
    
    std::size_t _x;
};

constexpr Foo operator""_f(unsigned long long x)
{
    return Foo(x);
}

As you can see, because the Foo class deals with integers, the parameter for the literal operator is an unsigned long long. Moreover, the literal begins with an underscore to not conflict with standard literals. Now, we can use the following sassy syntax to declare and initialize an object of type Foo:

int main(int argc, char * argv [])
{
    auto foo = 5_f;
    
    std::cout << foo.get() << std::endl;
    
    // Output: 5
}

That is all neat and nice, but what about a real-world, applicable example of user-defined literals? The obvious candidate is any form of unit suffix. For a C++ CSS parser, you could have _em, _px, _pt or _pct suffixes to denote length units. For physics simulations, you could very easily declare things to be of a certain unit such as Newtons (_N) for force or Joules (_J) for energy and then allow implicit conversions between certain units when physically allowed. For the purpose of this article I wrote a neatly-commented little example using literals for units of weight, which you can find as a gist here.
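
To give a rough idea of what such unit suffixes look like, here is a minimal sketch for weights. It is not the gist mentioned above; the Weight class and the _g and _kg suffixes are assumptions made up for this illustration:

```cpp
// Hypothetical type: a weight that is always stored in grams.
class Weight
{
public:

    constexpr explicit Weight(long double grams)
    : _grams(grams)
    { }

    constexpr long double grams() const { return _grams; }

    constexpr long double kilograms() const { return _grams / 1000; }

private:

    long double _grams;
};

constexpr Weight operator+(Weight a, Weight b)
{
    return Weight(a.grams() + b.grams());
}

// One overload per literal category, so that both 500_g and 2.5_kg work.
constexpr Weight operator""_g(unsigned long long g)
{
    return Weight(static_cast<long double>(g));
}

constexpr Weight operator""_g(long double g)
{
    return Weight(g);
}

constexpr Weight operator""_kg(unsigned long long kg)
{
    return Weight(static_cast<long double>(kg) * 1000);
}

constexpr Weight operator""_kg(long double kg)
{
    return Weight(kg * 1000);
}

static_assert((1_kg + 500_g).grams() == 1500, "1 kg + 500 g is 1500 g");
static_assert((2.5_kg).grams() == 2500, "2.5 kg is 2500 g");
static_assert((1500_g).kilograms() == 1.5, "1500 g is 1.5 kg");
```

Because every suffix converts to grams on the way in, mixing units in one expression works without any further conversion code.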

Digit separators


As a very last topic for this article about literals in C++11 and beyond, I just very briefly want to touch upon a little feature that may make your code a lot more readable. Previously, you had no way of separating digits for very long integer literals:

unsigned long long x = 204582349058239;

Since C++14, it's possible to separate digits by single quotes:

unsigned long long x = 204'582'349'058'239;

Much better!
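
The separators also work with the other prefixes shown earlier, and the compiler ignores them entirely when computing the value. A small compile-time sketch (the variable name big is mine):

```cpp
// The quotes do not change the value of the literal.
static_assert(1'000'000 == 1000000, "decimal");

// They combine with binary and hexadecimal prefixes as well.
static_assert(0b1111'0000 == 0xF0, "binary nibbles");
static_assert(0xFF'FF == 65535, "hexadecimal bytes");

// Grouping by three is only a convention; any grouping is accepted.
static_assert(1'0'0 == 100, "arbitrary grouping");

constexpr unsigned long long big = 204'582'349'058'239;
static_assert(big == 204582349058239, "same value as without separators");
```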

And that'll be it for this post. Hope I could help!
