Hi! Hope you're enjoying this blog. I have a new home at www.goldsborough.me. Be sure to also check by there for new posts <3

Wednesday, April 1, 2015

Python's built-in zip() function in C++

We all know, use and love Python's built-in zip() function.

In Python, the implementation of the zip() function probably looks like, or at least could be replicated by this cute little lambda:

zip = lambda *args: [[i[n] for i in args] for n in range(len(min(args,key=len)))]

Python, being a weakly typed programming language with support for list comprehensions as well as elements of functional programming such as the lambda function feature shown here, makes it easy to create these super concise, variable length, who-cares-about-types-anyway functions.

Now, if we leave type utopia and enter the world of strongly typed programming languages such as C++, things look a little different. Types suddenly matter and variable length parameter lists take a little more thought to implement than simply adding an asterisk (note to self: great success, no longer spelt it asterix). As a matter of fact, parameter packs and variadic template functions, the combination of features we'll use in this tutorial, were only added in C++11, but provide some much-needed and really, really cool functionality.

At the end of this tutorial, this C++ code will work:

int main(int argc, char * argv [])
{
    std::vector<int> a = {1,2,3};
    std::vector<int> b = {4,5,12,44};
    std::vector<int> c = {6,7,8,9};
    
    for (auto i : zip(a, b, c))
    {
        for (auto j : i)
        {
            std::cout << j << "\t";
        }
        
        std::cout << std::endl;
    }
}

Pretty concise, no?

Before I show you and describe the code that makes the above behavior possible, let me write a few words about variadic template functions in C++. Basically, they enable us to have template functions that take a variable number of parameters. Example:

void foo()
{ }

template<typename T, typename... Args>
void foo(const T& head, const Args&... tail)
{
    std::cout << head << std::endl;
    
    foo(tail...);
}

Variadic template functions are created by having a parameter pack argument for a template function or class. This parameter pack type is denoted by the "typename" keyword followed by an ellipsis ("...") and a name, in the function above "Args" (seems to be the conventional name for C++ parameter packs). When a function then takes such a parameter pack as an argument, as does foo() above, you can call the function with a variable number of arguments, e.g. foo(1, 2, 3, 4) or foo("lorem", "ipsum").

The problem is that you cannot actually retrieve any values from this parameter pack, you can only expand it, i.e. you can't do args[2] to get the third element of a parameter pack "args", you can only do args... (note the ellipsis) which will yield the full list of parameters passed to it.

To tackle this problem, we have two template types: T and Args. Args is the template parameter pack I already introduced you to. By having a second template type T as the first argument, we can capture the first item passed the function, which will enable us to do things with it! We can then recursively call this variadic template function each time making one item from the parameter pack the "head" (type T) which we can use, while packing all other arguments into the parameter pack, the "tail" (of type Args...), which we pass on to the next stack. When the tail is empty at the very end, the overload of the function taking no parameters is called.

So, what happens when we call foo(1,2,3)? The "head" of type T will be 1 at first, while 2 and 3 are stored in the parameter pack Args. Then we call foo(tail...), thereby expanding the parameter pack. This is equivalent to foo(2,3). In the next function stack, head is equal to 2 and 3 is stored in the tail. The parameter pack expansion is then equivalent to foo(3), thus setting head equal to 3 in this stack while the parameter pack is empty. When we call foo(tail...) one last time, it will be equivalent to foo(). This is why we need a parameter-less overload of foo(), even though we don't have to do anything in it. It will just start the movement back down the stacks.

The output would therefore be:

>> foo(1,2,3);
>> 1
>> 2
>> 3

Now that the behavior of C++ variadic template functions should be clear(er), I can show you the code for the zip() function in C++ (also found here as a gist):

template<typename T>
class Zip
{
    
public:
    
    typedef std::vector<T> container_t;
    
    template<typename... Args>
    Zip(const T& head, const Args&... args)
    : items_(head.size()),
      min_(head.size())
    {
        zip_(head, args...);
    }
    
    inline operator container_t() const
    {
        return items_;
    }
    
    inline container_t operator()() const
    {
        return items_;
    }
    
private:
    
    template<typename... Args>
    void zip_(const T& head, const Args&... tail)
    {
        // If the current item's size is less than
        // the one stored in min_, reset the min_
        // variable to the item's size
        if (head.size() < min_) min_ = head.size();
        
        for (std::size_t i = 0; i < min_; ++i)
        {
            // Use begin iterator and advance it instead
            // of using the subscript operator adds support
            // for lists. std::advance has constant complexity
            // for STL containers whose iterators are
            // RandomAccessIterators (e.g. vector or deque)
            typename T::const_iterator itr = head.begin();
            
            std::advance(itr, i);
            
            items_[i].push_back(*itr);
        }
        
        // Recursive call to zip_(T, Args...)
        // while the expansion of tail... is not empty
        // else calls the overload with no parameters
        return zip_(tail...);
    }
 
    inline void zip_()
    {
        // If min_ has shrunk since the
        // constructor call
        items_.resize(min_);
    }
 
    /*! Holds the items for iterating. */
    container_t items_;
    
    /*! The minimum number of values held by all items */
    std::size_t min_;
    
};
 
template<typename T, typename... Args>
typename Zip<T>::container_t zip(const T& head, const Args&... tail)
{
    return Zip<T>(head, tail...);
}

As you can see, the zip() function is a variadic template function, but it is only an extra level of indirection to a template class, Zip<T>, whose constructor is again a variadic template function. The code is commented, but I will explain the general procedure of a call to zip().

Just to re-cap: the aim of the zip function and the Zip class is to restructure a variable number of containers into a new set of containers, where each container in the new set holds all items for a single column of the old set. Because we want each container of the new set to have the same length, we must use the minimum number of columns in the old set for our restructuring process. This sounds kind of complicated, but we're essentially just transposing the old set of containers (switching rows for columns).

We can't completely escape strong typing, even with the use of templates, because we have to store the values from the old set of containers into some other set while restructuring. Therefore, Zip<T> is a template class where T is the type of container we're passing, e.g. std::vector<int> in the main loop at the top of this post. When we call zip with a set of vectors of ints, the private variable "items_" will be of type std::vector<std::vector<int>> (typedef container_t). items_ is the object we return from zip(), by the way, so the return type of zip() is Zip<T>::container_t.

So a function call to zip(T, Args) will be forwarded to Zip<T>(T, Args), which again forwards to its own variadic template function that does the actual work of unpacking and restructuring the containers: zip_(T, Args), which has the overload for when Args is empty: zip_().

So how do we make the restructured containers / items retrievable? I decided that the best and most intuitive way would be to overload the implicit conversion operator to container_t, as well as overload the function call operator of the Zip<T> class. The overload of the conversion operator is useful any time we want to directly convert the Zip object to the container_t, e.g. as we do in the zip() function (the implicit conversion happens when we return the object). However, I figured that one case in which this would not work is if passing the Zip object as a template parameter to a function that expects the zipped container and not the Zip class, as there would be no implicit conversion in that case. To solve this problem, we have the possibility of explicitly retrieving items_ through the function call operator:

template<typename T>
void foo(const T& t)
{
    std::cout << t.size() << std::endl;
}

Zip<std::list<int>> zipObject(a,b,c);

foo(zipObject); // Error, Zip has no member size()
    
foo(zipObject()); // All okay, Zip::container_t has member size()

Given that the Zip object is supposed to simulate function behavior as well as possible, I chose the function call operator overload over having an explicit get() member or anything, thus making Zip into a true functor object. Of course, ideally, we shouldn't be handling the Zip object ourselves, but just be calling the zip() function which returns Zip::container_t in a much more intuitive way.

So after all this explaining business, what actually is the output of the zip() function. Does it even work?

int main(int argc, char * argv [])
{
    std::vector<int> a = {1,2,3};
    std::vector<int> b = {4,5,12,44};
    std::vector<int> c = {6,7,8,9};
    
    for (auto i : zip(a, b, c))
    {
        for (auto j : i)
        {
            std::cout << j << "\t";
        }
        
        std::cout << std::endl;
    }
}

Output:

>>> 1    4    6 
>>> 2    5    7 
>>> 3    12   8

Yey! Thanks for reading! :)

2 comments :