Standard Library Algorithms: Changes and Additions
Today we have a guest post from Marc Gregoire, Software Architect at Nikon Metrology and Microsoft MVP since 2007.
The C++14 standard already contains a wealth of different kinds of algorithms. C++17 adds a couple more algorithms and updates some existing ones. This article explains what’s new and what has changed in the C++17 Standard Library.
New Algorithms
Sampling
C++17 includes the following new sampling algorithm:

sample(first, last, out, n, gen)
It uses the given random number generator (gen) to pick n random elements from the given range [first, last) and writes them to the given output iterator (out).
Here is a simple piece of code that constructs a vector containing the integers 1 to 20. It then sets up a random number generator, and finally generates 10 sequences of 5 values, in which each value is randomly sampled from the data vector:
#include <algorithm>
#include <ctime>
#include <iostream>
#include <iterator>
#include <numeric>
#include <random>
#include <vector>

using namespace std;
vector<int> data(20);
iota(begin(data), end(data), 1);
copy(cbegin(data), cend(data), ostream_iterator<int>(cout, " "));
cout << '\n';
random_device seeder;
const auto seed = seeder.entropy() ? seeder() : time(nullptr);
default_random_engine generator(
    static_cast<default_random_engine::result_type>(seed));
const size_t numberOfSamples = 5;
vector<int> sampledData(numberOfSamples);
for (size_t i = 0; i < 10; ++i)
{
    sample(cbegin(data), cend(data), begin(sampledData),
        numberOfSamples, generator);
    copy(cbegin(sampledData), cend(sampledData),
        ostream_iterator<int>(cout, " "));
    cout << '\n';
}
Here is an example of a possible output:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
8 9 17 19
7 12 13 18
7 8 14 18
4 5 10 20
4 8 13 17
3 4 5 20
7 8 9 13
7 8 10 15
5 8 12 13
3 8 10 19
Iterating
The C++ Standard Library already included for_each() to process each element in a given range. C++17 adds a for_each_n(first, n, func) algorithm. It calls the given function object (func) for each element in the range given by a first iterator (first) and a number of elements (n). As such, it is very similar to for_each(), but for_each_n() only processes the first n elements of the range.
Here is a simple example that generates a vector of 20 values, then uses for_each_n() to print the first 5 values to the console:
#include <algorithm>
#include <iostream>
#include <numeric>
#include <vector>

using namespace std;
vector<int> data(20);
iota(begin(data), end(data), 1);
for_each_n(begin(data), 5,
    [](const auto& value) { cout << value << '\n'; });
Searching
C++17 includes a couple of specialized searchers, all defined in
<functional>
:

default_searcher

boyer_moore_searcher

boyer_moore_horspool_searcher
The Boyer–Moore searchers are often used to find a piece of text in a large block of text, and are usually more efficient than the default searcher. In practice, the two Boyer–Moore searchers are able to skip certain characters instead of having to compare each individual character, which gives them sublinear complexity and makes them much faster than the default searcher. See the Wikipedia article on the Boyer–Moore algorithm for more details.
To use these specialized searchers, you create an instance of one of them and pass that instance as the last parameter to std::search(), for example:
#include <algorithm>
#include <functional>
#include <iostream>
#include <string>

using namespace std;
const string haystack = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.";
const string needle = "consectetur";
const auto result = search(cbegin(haystack), cend(haystack),
    boyer_moore_searcher(cbegin(needle), cend(needle)));
if (result != cend(haystack))
    cout << "Found it.\n";
else
    cout << "Not found.\n";
If you want to carry out multiple searches for the same pattern, you can construct a single std::boyer_moore_searcher instance and reuse it, rather than creating a new one for each std::search() call.
Generalized Sum Algorithms
Scan
The following generalized sum algorithms have been added to the C++17 Standard Library:

exclusive_scan(first, last, out, init[, bin_op])

inclusive_scan(first, last, out[, bin_op[, init]])

transform_exclusive_scan(first, last, out, init, bin_op, un_op)

transform_inclusive_scan(first, last, out, bin_op, un_op[, init])
Where bin_op is a binary operator (std::plus<>() by default), and un_op is a unary operator.
All these algorithms calculate a sequence of sums of the elements in a given range [first, last), denoted as [e0, en). The calculated sums are written to [out, out + (last − first)), denoted as [s0, sn). Suppose further that we denote the binary operator (bin_op) as ⊕. The exclusive_scan() algorithm then calculates the following sequence of sums:
s0 = init
s1 = init ⊕ e0
s2 = init ⊕ e0 ⊕ e1
…
sn−1 = init ⊕ e0 ⊕ e1 ⊕ … ⊕ en−2
While inclusive_scan() calculates the following sums:
s0 = init ⊕ e0
s1 = init ⊕ e0 ⊕ e1
…
sn−1 = init ⊕ e0 ⊕ e1 ⊕ … ⊕ en−1
The only difference is that inclusive_scan() includes the ith element in the ith sum, while exclusive_scan() does not.
exclusive_scan() and inclusive_scan() are similar to partial_sum(). However, partial_sum() evaluates everything from left to right, while exclusive_scan() and inclusive_scan() evaluate everything in a nondeterministic order. That means that their result will be nondeterministic if the binary operator used is not associative. Because of this nondeterministic order, these algorithms can be executed in parallel by specifying a parallel execution policy, as discussed later.
The sums calculated by inclusive_scan() with init equal to 0 and an associative binary operator are exactly the same as the sums calculated by partial_sum().
transform_exclusive_scan() and transform_inclusive_scan() are very similar. The only difference is that they apply the given unary operator before calculating the sum. Suppose the unary operator is denoted as a function call f(). The transform_exclusive_scan() algorithm then calculates the following sequence of sums:
s0 = init
s1 = init ⊕ f(e0)
s2 = init ⊕ f(e0) ⊕ f(e1)
…
sn−1 = init ⊕ f(e0) ⊕ f(e1) ⊕ … ⊕ f(en−2)
And the transform_inclusive_scan() algorithm calculates the following sums:
s0 = init ⊕ f(e0)
s1 = init ⊕ f(e0) ⊕ f(e1)
…
sn−1 = init ⊕ f(e0) ⊕ f(e1) ⊕ … ⊕ f(en−1)
Reduce
Additionally, the following two reduce algorithms have been added:

reduce(first, last[, init[, bin_op]])

transform_reduce(first, last, init, bin_op, un_op)
Where bin_op is a binary operator, std::plus<>() by default. These algorithms result in a single value, similar to accumulate().
Suppose again that the range [first, last) is denoted as [e0, en). reduce() then calculates the following sum:
init ⊕ e0 ⊕ e1 ⊕ … ⊕ en−1
While transform_reduce() results in the following sum, assuming the unary operator is denoted as a function call f():
init ⊕ f(e0) ⊕ f(e1) ⊕ … ⊕ f(en−1)
Unlike accumulate(), reduce() supports parallel execution. The accumulate() algorithm always evaluates everything deterministically from left to right, while the evaluation order is nondeterministic for reduce(). A consequence is that the result of reduce() will be nondeterministic if the binary operator is not associative or not commutative.
The sum calculated by reduce() with init equal to 0 is exactly the same as the result of calling accumulate(), as long as the binary operator used is associative and commutative.
Finally, there is another set of overloads for transform_reduce():

transform_reduce(first1, last1, first2, init[, bin_op1, bin_op2])
It requires two ranges: a range [first1, last1), denoted as [a0, an), and a range starting at first2, denoted as [b0, bn). Suppose bin_op1 (std::plus<>() by default) is denoted as ⊕, and bin_op2 (std::multiplies<>() by default) is denoted as ⊖. It then calculates the following sum:
init ⊕ (a0 ⊖ b0) ⊕ (a1 ⊖ b1) ⊕ … ⊕ (an−1 ⊖ bn−1)
Parallel Algorithms
A major addition to the C++17 Standard Library is support for parallel execution of more than 60 of its algorithms, such as sort(), all_of(), find(), transform(), …
If a Standard Library algorithm supports parallel execution, then it accepts an execution policy as its first parameter. This policy determines to what degree the algorithm may parallelize or vectorize its execution. Currently, the following policy types and instances are defined in the
std::execution
namespace in the
<execution>
header:
Execution Policy Type         Global Instance   Description
sequenced_policy              seq               No parallel execution is allowed.
parallel_policy               par               Parallel execution is allowed.
parallel_unsequenced_policy   par_unseq         Parallel and vectorized execution is allowed. Execution is also allowed to switch between different threads.
The parallel_unsequenced_policy imposes a lot of restrictions on what the algorithm's function callbacks are allowed to do. With that policy, the calls to the callbacks are unsequenced, so the callbacks are not allowed to allocate or deallocate memory, acquire mutexes, and so on. The other policies do not have such restrictions, and in fact guarantee that their callback calls are sequenced, although possibly in a nondeterministic order. In any case, you are responsible for preventing data races and deadlocks.
Using these parallel policies is straightforward. Here is a quick example that generates a vector of 1 billion double values, then uses the std::transform() algorithm to calculate the square root of each value in parallel:
#include <algorithm>
#include <cmath>
#include <execution>
#include <numeric>
#include <vector>
using namespace std;
vector<double> data(1'000'000'000);
iota(begin(data), end(data), 1);
transform(execution::par_unseq, begin(data), end(data), begin(data),
    [](const auto& value) { return sqrt(value); });
If you run this piece of code on an 8-core machine, the CPU load can look as follows. The peak you see on all eight cores is the parallel execution of the call to std::transform().
Utility Functions
C++17 also includes a couple of handy utility functions that are not really algorithms, but still useful to know.
clamp()
std::clamp(value, low, high) is defined in the <algorithm> header. It ensures that a given value is within a given range [low, high]. The result of calling clamp() is:

a reference to low if value < low

a reference to high if value > high

otherwise, a reference to the given value

One use case is to clamp audio samples to a 16-bit range:
#include <algorithm>
#include <iostream>

using namespace std;
const int low = -32'768;
const int high = 32'767;
cout << clamp(12'000, low, high) << '\n';
cout << clamp(-36'000, low, high) << '\n';
cout << clamp(40'000, low, high) << '\n';
The output of this code snippet is as follows:
12000
-32768
32767
gcd() and lcm()
std::gcd() returns the greatest common divisor of two integer types, while lcm() returns the least common multiple of two integer types. Both algorithms are defined in the <numeric> header.
Using these algorithms is straightforward, for example:
cout << gcd(24, 44) << '\n';
cout << lcm(24, 44) << '\n';
The output is as follows:
4
264
Removed Algorithms
C++17 has removed one algorithm: std::random_shuffle(). This algorithm was already marked as deprecated in C++14. You should use std::shuffle() instead.
Further Reading Material
Have a look at my book, “Professional C++, 4th Edition”, published by Wiley/Wrox, for a more in-depth overview of all the functionality provided by the C++17 Standard Library. It also includes a description of all language features that have been added by C++17.
You can also read more about certain C++17 features in my C++17 blog post series.