Intervals: the upcoming evolution of C ++

The C ++ 20 standard will appear soon, in which, most likely, they will add the concept of intervals ( ranges ), but few know what they are and what they eat. I couldn’t find the Russian-speaking sources available to a wide audience about this beast, so in this article I would like to tell you more about it based on the lecture by Arno Schödl “From Iterators to Ranges: The Upcoming Evolution Of the STL” from Meeting C ++ 2015- th year. I will try to make this article as clear as possible for those who first come across this concept, and at the same time I will talk about all sorts of chips like interval adapters for those who are already familiar with this concept and want to learn more.

Libraries with ranges

At the time of this writing, there are three main libraries that implement the intervals:

The first library, in fact, is the progenitor of this concept (which is not surprising, after all, what is missing in the Boost library collection :)). The second is the Eric Niebler Library, which will be discussed later. And finally, the latest library, as you might guess, was written by the company think-cell, which can be said to have developed and improved Boost.Range.

Why are intervals our future?

For those who are not familiar with the concept of interval, we define this non-trivial concept as something that has a beginning and an end (a pair of iterators ).

Let's now consider the following problem: there is a vector, it is necessary to remove from it all repeating elements. Under the current standard, we would solve it like this:

std::vector<T> vec=...; std::sort( vec.begin(), vec.end() ); vec.erase( std::unique( vec.begin(), vec.end() ), vec.end() );

In this case, we specify the name of the vector as much as 6 times! However, using the concept of intervals (by combining iterators at the beginning and end of a vector into one object), you can write many times simpler by specifying the desired vector only once :

 tc::unique_inplace( tc::sort(vec) );

What is the interval at the moment within the current standard?

In the C ++ 11 standard, a range-based for loop and universal access to the beginning / end of containers was added, and in the last C ++ 17 standard, nothing new was added at intervals.

 for ( int& i : <range_expression> ) { ... }

 std::begin/end(<range_expression>)

Future intervals

Let us now dwell on the previously mentioned library Range V3. Erik Nibler, her creator, created the Range's Technical Specification as his home project, modifying the algorithm library to support the intervals. It looks like this:

 namespace ranges { template< typename Rng, typename What > decltype(auto) find( Rng && rng, What const& what ) { return std::find( ranges::begin(rng), ranges::end(rng), what ); } }

On his website there is some kind of preview of what he wants to standardize, this is the Range V3 .

What, after all, can range be considered?

First of all, containers (vector, string, list, etc.), because they have a beginning and an end. It is clear that containers own their elements, that is, when we turn to containers, we turn to all their elements. Similarly, when copying and declaring permanent (deep copying and constancy). Secondarily , views can also be considered as intervals. Views is just a pair of iterators pointing to the beginning and the end, respectively. Here is their simplest implementation:

 template<typename It> struct iterator_range { It m_itBegin; It m_itEnd; It begin() const { return m_itBegin; } It end() const { return m_itEnd; } };

Views, in turn, only refer to the elements, so copying and constancy are lazy (this does not affect the elements).

Interval adapters

At this, the inventors of the intervals did not dwell, because otherwise this concept would be quite useless. Therefore, they introduced such a thing as range adapters.

Transformer adapter

Consider the following problem: Let the vector int 's be given, in which you need to find the first element equal to 4:

 std::vector<int> v; auto it = ranges::find(v, 4);

Now let's imagine that the type of the vector is not int, but some kind of complex self-written structure, but in which there is an int, and the task is still the same:

 struct A { int id; double data; }; std::vector<A> v={...}; auto it = ranges::find_if( v, [](A const& a) { return a.id == 4; } );

It is clear that these two codes are similar in semantics, however, they differ significantly in syntax, because in the latter case we had to manually write a function that runs through the int field. But if you use a transforming adapter ( transform adapter ), then everything looks much more succinctly:

 struct A { int id; double data; }; std::vector<A> v={...}; auto it = ranges::find( tc::transform(v, std::mem_fn(&A::id)), 4);

In fact, the transforming adapter “transforms” our structure by creating a wrapper class around the int field. It is clear that the pointer points to the id field, but if we wanted it to point to the whole structure, then we need to add .base () at the end. This command encapsulates a field, which is why the pointer can run through the entire structure:

 auto it = ranges::find( tc::transform(v, std::mem_fn(&A::id)), 4).base();

Here is an example implementation of a transforming adapter (it consists of iterators, each of which has its own functor):

 template<typename Base, typename Func> struct transform_range { struct iterator { private: Func m_func; // в каждом итераторе decltype( tc::begin(std::declval<Base&>()) ) m_it; public: decltype(auto) operator*() const { return m_func(*m_it); } decltype(auto) base() const { return (m_it); } ... }; };

Filter adapter

And if in the last task we needed to find not the first such element, but to “filter” the entire field of int 's for the presence of such elements? In this case, we would use a filter adapter:

 tc::filter( v, [](A const& a) { return 4 == a.id; } );

Note that the filter is performed lazily during iterations.

But its naive implementation (something like this is implemented in Boost.Range):

 template<typename Base, typename Func> struct filter_range { struct iterator { private: Func m_func; // функтор и ДВА итератора decltype( ranges::begin(std::declval<Base&>()) ) m_it; decltype( ranges::begin(std::declval<Base&>()) ) m_itEnd; public: iterator& operator++() { ++m_it; while( m_it != m_itEnd && !static_cast<bool>(m_func(*m_it)) ) ++m_it; return *this; } ... }; };

As we can see, it already requires two iterators instead of one, as it was in the transforming adapter. The second iterator is necessary in order not to accidentally overstep the bounds of the container during iterations.

Some optimizations

Well, what does the iterator from tc :: filter (tc :: filter (tc :: filter (...))) look like?

Boost.Range

As part of the implementation above, it looks like this:

Not to look nervous!

m_func3
m_it3
m_func2
m_it2
m_func1
m_it1;
m_itEnd1;
m_itEnd2
m_func1
m_it1;
m_itEnd1;
m_itEnd3
m_func2
m_it2
m_func1
m_it1;
m_itEnd1;
m_itEnd2
m_func1
m_it1;
m_itEnd1;

Obviously, it is terribly inefficient.

Range v3

Let's think about how to optimize this adapter. The idea of Eric Nibler was that we should put general information (a functor and a pointer to the end) into an adapter object, and then we can store a link to this adapter object and the desired iterator

*m_rng
m_it

Then, as part of this implementation, the triple filter will look something like this:

Tyk

m_rng3
m_it3
m_rng2
m_it2
m_rng1
m_it1

This is still not perfect, although at times faster than the previous implementation.

think-cell index concept

Now consider the solution of the company think-cell. They introduced the so-called concept of indexes ( index concept ) to solve this problem. An index is an iterator that performs all the same operations as a regular iterator, but does so by referring to intervals.

 template<typename Base, typename Func> struct index_range { ... using Index = ...; Index begin_index() const; Index end_index() const; void increment_index( Index& idx ) const; void decrement_index( Index& idx ) const; reference dereference( Index& idx ) const; ... };

We show how you can combine an index with a regular iterator.

It is clear that a regular iterator can also be considered an index. In the opposite direction, compatibility can be implemented for example:

 template<typename IndexRng> struct iterator_for_index { IndexRng* m_rng; typename IndexRng::Index m_idx; iterator& operator++() { m_rng.increment_index(m_idx); return *this; } ... };

Then the triple filter will be implemented super-efficiently:

 template<typename Base, typename Func> struct filter_range { Func m_func; Base& m_base; using Index = typename Base::Index; void increment_index( Index& idx ) const { do { m_base.increment_index(idx); } while ( idx != m_base.end_index() && !m_func(m_base.dereference_index(idx)) ); } };

 template<typename IndexRng> struct iterator_for_index { IndexRng* m_rng; typename IndexRng::Index m_idx; ... };

As part of this implementation, the algorithm will work quickly regardless of the depth of the filter.

Intervals with lvalue and rvalue containers

Now let's see how the intervals work with lvalue and rvalue containers:

lvalue

Range V3 and think-cell behave the same with lvalue. Suppose we have this code:

 auto rng = view::filter(vec, pred1); bool b = ranges::any_of(rng, pred2);

Here we have a pre-declared vector that lies in memory (lvalue), and we need to create an interval and then somehow work with it. We create a view using view :: filter or tc :: filter and become happy, there are no errors, and we can use this view, for example, in any_of.

Range V3 and rvalue

However, if our vector had not yet been remembered (for example, if we only created it), and we would have the same task, then we would try to write and faced with an error:

 auto rng = view::filter(create_vector(), pred1); // не скомпилируется bool b = ranges::any_of(rng, pred2);

Why did it arise? View will be a hanging link to rvalue due to the fact that we create a vector and directly put in the filter, that is, the filter will have an rvalue link, which will then point to something unknown when the compiler goes to the next line and an error occurs. In order to solve this problem, in Range V3 came up with an action :

 auto rng = action::filter(create_vector(), pred1); // теперь скомпилируется bool b = ranges::any_of(rng, pred2);

Action does everything at once, that is, it simply takes a vector, filters it by predicate and puts it in the interval. However, the disadvantage is that it is no longer lazy, and think-cell tried to correct this minus.

think-cell and rvalue

Think-cell made it so that instead of view a container is created there:

 auto rng = tc::filter(creates_vector(), pred1); bool b = ranges::any_of(rng, pred2);

As a result, we do not encounter a similar error, because in their implementation the filter collects the rvalue container instead of the link, so this happens lazily. In Range V3, they didn’t want to do that, because they were afraid that there would be errors due to the fact that the filter behaves either as a view or as a container, but the think-cell is convinced that programmers understand how the filter behaves, and most of the errors arise precisely because of this "laziness."

Generator intervals

Let us generalize the concept of intervals. In fact, there are intervals without iterators. They are called generator ranges . Suppose we have a GUI widget (interface element), and we call the move widget. We have a window that asks to move its widget, we also have a button in the list box , and another window should also flip through its widgets, that is, we call traverse_widgets , which connects the elements to the functor ( you can say, there is an enumeration function where you connect the functor, and the function lists in this functor all the elements that it has ).

 template<typename Func> void traverse_widgets( Func func ) { if (window1) { window1->traverse_widgets(std::ref(func)); } func(button1); func(listbox1); if (window2) { window2->traverse_widgets(std::ref(func)); } }

This is somewhat similar to the interval of widgets, but there are no iterators here. Writing them directly would be inefficient and, above all, very difficult. In this case, we can say that such constructions are also considered as intervals. Then for such cases the use of useful interval methods takes place, such as any_of :

 mouse_hit_any_widget=tc::any_of( [] (auto func) { traverse_widgets(func); }, [] (auto const& widget) { return widget.mouse_hit(); } );

think-cell tries to implement the methods so that they have the same interface for all types of intervals:

 namespace tc { template< typename Rng > bool any_of( Rng const& rng ) { bool bResult = false; tc::enumerate( rng, [&](bool_context b) { bResult = bResult || b; } ); return bResult; } }

Using tc :: enumerate , the difference between the intervals is hidden, since such an implementation adheres to the concept of internal iteration (what the concepts of external and internal iteration are described in more detail in the lecture), however, such an implementation has its drawbacks, namely std :: any_of stops as soon as true is encountered. They try to solve this problem, for example, by adding exceptions (the so-called interrupted generator intervals ).

Conclusion

I hate the range-based for loop, because it motivates people to write it wherever it is needed and where it is not needed, because of which the brevity of the code often worsens, for example, people write this:

 bool b = false; for (int n : rng) { if ( is_prime(n) ) { b = true; break; } }

instead of this:

 bool b = tc::any_of( rng, is_prime );

Source: https://habr.com/ru/post/440388/