diff --git a/README.md b/README.md
index c50cfdf..a3c83a3 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,5 @@
- Document Number: N4707
- Date: 2017-11-22
+ Document Number: N4726
+ Date: 2018-02-12
Revises:
Project: Programming Language C++
Project Number: TS 19570
@@ -7,18 +7,13 @@
NVIDIA Corporation
jhoberock@nvidia.com
-# Parallelism TS Editor's Report, post-Albuquerque mailing
+# Parallelism TS Editor's Report, pre-Jacksonville mailing
-N4706 is the proposed working draft of Parallelism TS Version 2. It contains changes to the Parallelism TS as directed by the committee at the Albuquerque meeting.
+N4725 is the proposed working draft of Parallelism TS Version 2. It contains editorial changes to the Parallelism TS.
-N4706 updates the previous draft, N4696, published in the pre-Toronto mailing.
-
-# Technical Changes
-
-* Apply P0776R1 - Rebase the Parallelism TS onto the C++17 Standard
-* Apply P0075R2 - Template Library for Parallel For Loops
+N4725 updates the previous draft, N4706, published in the post-Toronto mailing.
# Acknowledgements
-Thanks to Alisdair Meredith and Pablo Halpern for reviewing these changes.
+Thanks to Pablo Halpern and Matthias Kretz for suggesting editorial changes.
diff --git a/algorithms.html b/algorithms.html
index 554032f..ab1cd37 100644
--- a/algorithms.html
+++ b/algorithms.html
@@ -1,511 +1,103 @@
- Function objects passed into parallel algorithms as objects of type
- Parallel algorithms have template parameters named
- The invocations of element access functions in parallel algorithms invoked with an execution
- policy object of type
- The invocations of element access functions in parallel algorithms invoked with an execution
- policy object of type
- The invocations of element access functions in parallel algorithms invoked with an
- execution policy of type Parallel algorithms
-
-
- In general
- This clause describes components that C++ programs may use to perform operations on containers
- and other sequences in parallel.
-
-
-
-
- Requirements on user-provided function objects
-
- BinaryPredicate
,
- Compare
, and BinaryOperation
shall not directly or indirectly modify
- objects via their arguments.
-
-
- Effect of execution policies on algorithm execution
-
-
- ExecutionPolicy
which describe
- the manner in which the execution of these algorithms may be parallelized and the manner in
- which they apply the element access functions.
-
-
-
- sequential_execution_policy
execute in sequential order in
- the calling thread.
-
-
-
- parallel_execution_policy
are permitted to execute in an
- unordered fashion in either the invoking thread or in a thread implicitly created by the library
- to support parallel algorithm execution. Any such invocations executing in the same thread are
- indeterminately sequenced with respect to each other.
-
-
- using namespace std::experimental::parallel;
-int a[] = {0,1};
-std::vector<int> v;
-for_each(par, std::begin(a), std::end(a), [&](int i) {
- v.push_back(i*2+1);
-});
-
-
- The program above has a data race because of the unsynchronized access to the container
- v
.
-
-
-
-
-
-using namespace std::experimental::parallel;
-std::atomic<int> x = 0;
-int a[] = {1,2};
-for_each(par, std::begin(a), std::end(a), [&](int n) {
- x.fetch_add(1, std::memory_order_relaxed);
- // spin wait for another iteration to change the value of x
- while (x.load(std::memory_order_relaxed) == 1) { }
-});
-
- The above example depends on the order of execution of the iterations, and is therefore
- undefined (may deadlock).
-
-
-
-
-
-
- The above example synchronizes access to object
-using namespace std::experimental::parallel;
-int x=0;
-std::mutex m;
-int a[] = {1,2};
-for_each(par, std::begin(a), std::end(a), [&](int) {
- m.lock();
- ++x;
- m.unlock();
-});
x
ensuring that it is
- incremented correctly.
-
-
- unsequenced_policy
are permitted to execute
- in an unordered fashion in the calling thread, unsequenced with respect to one another
- within the calling thread.
-
-
-
-
-
- The invocations of element access functions in parallel algorithms invoked with an
- executino policy of type vector_policy
are permitted to execute
- in an unordered fashion in the calling thread, unsequenced with respect to one another
- within the calling thread, subject to the sequencing constraints of wavefront application
- (for_loop
or for_loop_strided
.
-
- The invocations of element access functions in parallel algorithms invoked with an execution
- policy of type parallel_vector_execution_policy
- are permitted to execute in an unordered fashion in unspecified threads, and unsequenced
- with respect to one another within each thread.
-
-- -
-- - Since
parallel_vector_execution_policy
allows the execution of element access functions to be
- interleaved on a single thread, synchronization, including the use of mutexes, risks deadlock. Thus the
- synchronization with parallel_vector_execution_policy
is restricted as follows:-- - A standard library function is vectorization-unsafe if it is specified to synchronize with - another function invocation, or another function invocation is specified to synchronize with it, and if - it is not a memory allocation or deallocation function. Vectorization-unsafe standard library functions - may not be invoked by user code called from
parallel_vector_execution_policy
algorithms.-- -
-using namespace std::experimental::parallel; -int x=0; -std::mutex m; -int a[] = {1,2}; -for_each(par_vec, std::begin(a), std::end(a), [&](int) { - m.lock(); - ++x; - m.unlock(); -});- - The above program is invalid because the applications of the function object are not - guaranteed to run on different threads. -
-- -
m.lock
on the same thread, which may deadlock.
- -- -
parallel_execution_policy
or the
- parallel_vector_execution_policy
invocation allow the implementation to fall back to
- sequential execution if the system cannot parallelize an algorithm invocation due to lack of
- resources.
-
- Algorithms invoked with an execution policy object of type execution_policy
- execute internally as if invoked with the contained execution policy object.
-
- The semantics of parallel algorithms invoked with an execution policy object of - implementation-defined type are implementation-defined. -
- - -- For the purposes of this section, an evaluation is a value computation or side effect of - an expression, or an execution of a statement. Initialization of a temporary object is considered a - subexpression of the expression that necessitates the temporary object. -
- -- An evaluation A contains an evaluation B if: - -
- An evaluation A is ordered before an evaluation B if A is deterministically
- sequenced before B.
+ For the purposes of this section, an evaluation is a value computation or side effect of + an expression, or an execution of a statement. Initialization of a temporary object is considered a + subexpression of the expression that necessitates the temporary object. +
-- For an evaluation A ordered before an evaluation B, both contained in the same - invocation of an element access function, A is a vertical antecedent of B if: +
+ An evaluation A contains an evaluation B if: -
goto
statement or asm
declaration that jumps to a statement outside of S, orswitch
statement executed within S that transfers control into a substatement of a nested selection or iteration statement, orthrow
longjmp
.
-
- In the following, Xi and Xj refer to evaluations of the same expression
- or statement contained in the application of an element access function corresponding to the ith and
- jth elements of the input sequence.
+ An evaluation A is ordered before an evaluation B if A is deterministically
+ sequenced before B.
- Horizontally matched is an equivalence relationship between two evaluations of the same expression. An - evaluation Bi is horizontally matched with an evaluation Bj if: +
+ For an evaluation A ordered before an evaluation B, both contained in the same + invocation of an element access function, A is a vertical antecedent of B if: +
- Let f be a function called for each argument list in a sequence of argument lists. - Wavefront application of f requires that evaluation Ai be sequenced - before evaluation Bi if i < j and and: - +
goto
statement or asm
declaration that jumps to a statement outside of S, orswitch
statement executed within S that transfers control into a substatement of a nested selection or iteration statement, orthrow
longjmp
.
ExecutionPolicy
algorithm overloads
- The Parallel Algorithms Library provides overloads for each of the algorithms named in
- Table 1, corresponding to the algorithms with the same name in the C++ Standard Algorithms Library.
-
- For each algorithm in ExecutionPolicy
, which shall be the first template parameter.
-
- In addition, each such overload shall have the new function parameter as the
- first function parameter of type ExecutionPolicy&&
.
-
- Unless otherwise specified, the semantics of ExecutionPolicy
algorithm overloads
- are identical to their overloads without.
-
- Parallel algorithms shall not participate in overload resolution unless
- is_execution_policy<decay_t<ExecutionPolicy>>::value
is true
.
-
- adjacent_difference | adjacent_find | all_of | any_of |
- copy | copy_if | copy_n | count |
- count_if | equal | exclusive_scan | fill |
- fill_n | find | find_end | find_first_of |
- find_if | find_if_not | for_each | for_each_n |
- generate | generate_n | includes | inclusive_scan |
- inner_product | inplace_merge | is_heap | is_heap_until |
- is_partitioned | is_sorted | is_sorted_until | lexicographical_compare |
- max_element | merge | min_element | minmax_element |
- mismatch | move | none_of | nth_element |
- partial_sort | partial_sort_copy | partition | partition_copy |
- reduce | remove | remove_copy | remove_copy_if |
- remove_if | replace | replace_copy | replace_copy_if |
- replace_if | reverse | reverse_copy | rotate |
- rotate_copy | search | search_n | set_difference |
- set_intersection | set_symmetric_difference | set_union | sort |
- stable_partition | stable_sort | swap_ranges | transform |
- transform_exclusive_scan | transform_inclusive_scan | transform_reduce | uninitialized_copy |
- uninitialized_copy_n | uninitialized_fill | uninitialized_fill_n | unique |
- unique_copy | | | |
+ In the following, Xi and Xj refer to evaluations of the same expression
+ or statement contained in the application of an element access function corresponding to the ith and
+ jth elements of the input sequence.
- Define GENERALIZED_SUM(op, a1, ..., aN)
as follows:
+ Horizontally matched is an equivalence relationship between two evaluations of the same expression. An
+ evaluation Bi is horizontally matched with an evaluation Bj if:
a1
when N
is 1
op(GENERALIZED_SUM(op, b1, ..., bK)
, GENERALIZED_SUM(op, bM, ..., bN))
where
-
- b1, ..., bN
may be any permutation of a1, ..., aN
and1 < K+1 = M ≤ N
.
- Define GENERALIZED_NONCOMMUTATIVE_SUM(op, a1, ..., aN)
as follows:
+ Let f be a function called for each argument list in a sequence of argument lists.
+ Wavefront application of f requires that evaluation Ai be sequenced
+ before evaluation Bj if i < j and and:
a1
when N
is 1
op(GENERALIZED_NONCOMMUTATIVE_SUM(op, a1, ..., aK), GENERALIZED_NONCOMMUTATIVE_SUM(op, aM,
..., aN)
where 1 < K+1 = M ≤ N
.
- <experimental/algorithm>
synopsis-#include <algorithm> - -namespace std {-namespace std::experimental { -inline namespace parallelism_v2 { -inline namespace v2 { - template<class ExecutionPolicy, - class InputIterator, class Function> - void for_each(ExecutionPolicy&& exec, - InputIterator first, InputIterator last, - Function f); - template<class InputIterator, class Size, class Function> - InputIterator for_each_n(InputIterator first, Size n, - Function f); - template<class ExecutionPolicy, - class InputIterator, class Size, class Function> - InputIterator for_each_n(ExecutionPolicy&& exec, - InputIterator first, Size n, - Function f);+#include <algorithm> + +namespace std::experimental { +inline namespace parallelism_v2 { namespace execution {@@ -548,11 +125,10 @@ Header
template<class T> ordered_update_t<T> ordered_update(T& ref) noexcept; } - // Exposition only: Suppress template argument deduction. template<class T> struct no_deduce { using type = T; }; -template<class T> struct no_dedude_t = typename no_deduce<T>::type; +template<class T> struct no_deduc<experimental/algorithm>
synopsisde_t = typename no_deduce<T>::type;Support for reductions template<class T, class BinaryOperation> @@ -605,18 +181,14 @@ Header
class I, class Size, class S, class... Rest> void for_loop_n_strided(ExecutionPolicy&& exec, I start, Size n, S stride, Rest&&... rest); - -<experimental/algorithm>
synopsis}} } -}
Each of the function templates in this subclause ([parallel.alg.reductions]) returns a reduction object of unspecified type having a reduction value type and encapsulating a reduction identity value for the reduction, a @@ -640,93 +212,83 @@
plus<T>
, incrementing
the accumulator would be consistent with the combiner but doubling it or assigning to it would not.
-
CopyConstructible
and MoveAssignable
. The expression var = combiner(var, var)
shall be well-formed.CopyConstructible
and MoveAssignable
. The expression var = combiner(var, var)
shall be well-formed.T
, reduction identity identity
, combiner function object combiner
, and using the object referenced by var
as its live-out object.T
, reduction identity identity
, combiner function object combiner
, and using the object referenced by var
as its live-out object.CopyConstructible
and MoveAssignable
.T
, reduction identity and combiner operation as specified in table var
as its live-out object.CopyConstructible
and MoveAssignable
.>T
, reduction identity and combiner operation as specified in table var
as its live-out object.
- Function | Reduction Identity | Combiner Operation |
+ Function | Reduction Identity | Combiner Operation |
- reduction_plus | T() | x + y |
+ reduction_plus | T() | x + y |
- reduction_multiplies | T(1) | x * y |
+ reduction_multiplies | T(1) | x * y |
- reduction_bit_and | (~T()) | X & y |
+ reduction_bit_and | (~T()) | X & y |
- reduction_bit_or | T() | x | y |
+ reduction_bit_or | T() | x | y |
- reduction_bit_xor | T() | x ^ y |
+ reduction_bit_xor | T() | x ^ y |
- reduction_min | var | min(x, y) |
+ reduction_min | var | min(x, y) |
- reduction_max | var | max(x, y) |
+ reduction_max | var | max(x, y) |
y
and sets s
ot the sum of the squares.
+ y
and sets s
toextern int n; extern float x[], y[], a; @@ -739,15 +301,14 @@-Reductions
} );
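The identity and combiner columns of the reductions table can be exercised with a small sequential model. This is a minimal sketch, not the TS interface: `reduce_sketch` and the `*_sketch` wrappers are hypothetical names, and only the identity values and combiner expressions are taken from the table.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper, not part of the TS: a sequential model of one reduction.
// A private accumulator starts at `identity`, absorbs every element with
// `combiner`, and is finally combined into the live-out value `var`.
template <class T, class Combiner>
T reduce_sketch(T var, T identity, Combiner combiner, const std::vector<T>& xs) {
    T acc = identity;
    for (const T& x : xs) {
        acc = combiner(acc, x);
    }
    return combiner(var, acc);
}

// reduction_plus: identity T(), combiner x + y.
inline int plus_sketch(int var, const std::vector<int>& xs) {
    return reduce_sketch(var, int(), [](int x, int y) { return x + y; }, xs);
}

// reduction_multiplies: identity T(1), combiner x * y.
inline int multiplies_sketch(int var, const std::vector<int>& xs) {
    return reduce_sketch(var, int(1), [](int x, int y) { return x * y; }, xs);
}

// reduction_bit_and: identity (~T()), combiner x & y.
inline unsigned bit_and_sketch(unsigned var, const std::vector<unsigned>& xs) {
    return reduce_sketch(var, ~unsigned(),
                         [](unsigned x, unsigned y) { return x & y; }, xs);
}

// reduction_min: the identity is the initial value of var itself.
inline int min_sketch(int var, const std::vector<int>& xs) {
    return reduce_sketch(var, var,
                         [](int x, int y) { return std::min(x, y); }, xs);
}
```

For `reduction_min` and `reduction_max` the table lists `var` as the identity, which is why `min_sketch` passes `var` both as the live-out value and as the accumulator's starting value.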
Each of the function templates in this section return an induction object of unspecified type having an induction value type and encapsulating an initial value i of that type and, optionally, a stride. @@ -763,75 +324,68 @@
remove_cv_t>remove_reference_t>T<<
,
- initial value var
, and (if specified) stride stride
. If T
is an lvalue reference
- to non-const
type, then the object referenced by var
becomes the live-out object for the
- induction object; otherwise there is no live-out object.
-
+ remove_cv_t<remove_reference_t<T>>
,
+ initial value var
, and (if specified) stride stride
. If T
is an lvalue reference
+ to non-const
type, then the object referenced by var
becomes the live-out object for the
+ induction object; otherwise there is no live-out object.
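The value type and live-out rule for an induction object can be written as compile-time checks. This is a minimal sketch, assuming C++17 and assuming `var` is declared as a forwarding reference `T&&`; `induction_value_t` and `has_live_out_v` are hypothetical names that do not appear in the TS.

```cpp
#include <type_traits>

// Hypothetical trait: the induction value type described above.
template <class T>
using induction_value_t = std::remove_cv_t<std::remove_reference_t<T>>;

// Hypothetical trait: true when T is an lvalue reference to a non-const type,
// which is the case where the referenced object becomes the live-out object.
template <class T>
inline constexpr bool has_live_out_v =
    std::is_lvalue_reference_v<T> &&
    !std::is_const_v<std::remove_reference_t<T>>;

// With a forwarding reference, an lvalue argument of type U deduces T as U&.
static_assert(std::is_same_v<induction_value_t<const int&>, int>);
static_assert(has_live_out_v<int&>);         // non-const lvalue: live-out object
static_assert(!has_live_out_v<const int&>);  // const lvalue: no live-out object
static_assert(!has_live_out_v<int>);         // rvalue: no live-out object
```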
+ ExecutionPolicy
, I
shall be an integral type
or meet the requirements of a forward iterator type; otherwise, I
shall be an integral
type or meet the requirements of an input iterator type. Size
shall be an integral type
@@ -843,222 +397,65 @@ ExecutionPolicy
, f shall meet the requirements of CopyConstructible
;
otherwise, f shall meet the requirements of MoveConstructible
.
-
rest
parameter pack. The
- length of the input sequence is:
-
- n
, if specified,
- finish - start
if neither n
nor stride
is specified,
- 1 + (finish-start-1)/stride
if stride
is positive,
- 1 + (start-finish-1)/-stride
.
- start
. Each subsequent element is generated by adding
- stride
to the previous element, if stride
is specified, otherwise by incrementing
- the previous element. I
is an
- iterator type, the iterators in the input sequence are not dereferenced before
- being passed to f.induction
, then the additional argument is the
- induction value for that induction object corresponding to the position of the application of f in the input
- sequence.
- f
to the result of dereferencing every iterator in the range [first,last)
.
-
- first
satisfies the requirements of a mutable iterator, f
may
- apply nonconstant functions through the dereferenced iterator.
- f
exactly last - first
times.f
returns a result, the result is ignored.for_each
does not return a copy of
- its Function
parameter, since parallelization may not permit efficient state
- accumulation.
- for_each
requires
- Function
to meet the requirements of CopyConstructible
.
- rest
parameter pack. The
+ length of the input sequence is:
- n
, if specified,
+ Function
shall meet the requirements of MoveConstructible
-
- Function
need not meet the requirements of CopyConstructible
.
- finish - start
if neither n
nor stride
is specified,
+ f
to the result of dereferencing every iterator in the range
- [first,first + n)
, starting from first
and proceeding to first + n - 1
.
-
- first
satisfies the requirements of a mutable iterator,
- f
may apply nonconstant functions through the dereferenced iterator.
- 1 + (finish-start-1)/stride
if stride
is positive,
+ first + n
for non-negative values of n
and first
for negative values.
- 1 + (start-finish-1)/-stride
.
+ f
returns a result, the result is ignored.
- start
. Each subsequent element is generated by adding
+ stride
to the previous element, if stride
is specified, otherwise by incrementing
+ the previous element. advance
distance
I
is an
+ iterator type, the iterators in the input sequence are not dereferenced before
+ being passed to f.rest
f
to the result of dereferencing every iterator in the range
- [first,first + n)
, starting from first
and proceeding to first + n - 1
.
-
- first
satisfies the requirements of a mutable iterator,
- f
may apply nonconstant functions through the dereferenced iterator.
- induction
, then the additional argument is the
+ induction value for that induction object corresponding to the position of the application of f in the input
+ sequence.
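The length rules and element generation for the input sequence can be modelled for integral arguments as follows. This is a sketch under stated assumptions: the helper names are hypothetical, `long long` stands in for the integral type `I`, and the strided helper assumes a nonzero stride and a non-empty range in the direction of the stride.

```cpp
#include <cassert>
#include <vector>

// Length when neither n nor stride is specified: finish - start.
inline long long input_length(long long start, long long finish) {
    return finish - start;
}

// Length of the strided form:
//   1 + (finish - start - 1) / stride   if stride is positive,
//   1 + (start - finish - 1) / -stride  otherwise.
inline long long input_length_strided(long long start, long long finish,
                                      long long stride) {
    return stride > 0 ? 1 + (finish - start - 1) / stride
                      : 1 + (start - finish - 1) / -stride;
}

// Generates the input sequence: the first element is start, and each
// subsequent element is the previous element plus stride.
inline std::vector<long long> input_sequence(long long start, long long finish,
                                             long long stride) {
    std::vector<long long> seq;
    const long long n = input_length_strided(start, finish, stride);
    long long value = start;
    for (long long k = 0; k < n; ++k) {
        seq.push_back(value);
        value += stride;
    }
    return seq;
}
```

With `start = 0`, `finish = 10`, and `stride = 3`, the length is `1 + 9/3 = 4` and the elements are 0, 3, 6, 9; with `start = 10`, `finish = 0`, and `stride = -3`, the length is also 4 and the elements are 10, 7, 4, 1.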
+ first + n
for non-negative values of n
and first
for negative values.f
returns a result, the result is ignored.for_each_n
requires
- Function
to meet the requirements of CopyConstructible
.
- f
.
- f
returns a result, the result is ignored.f
exits via an exception, then terminate
will be called, consistent
with all other potentially-throwing operations invoked with vector_policy
execution.
@@ -1167,7 +558,7 @@
- An object of type ordered_update_t
is a proxy for an object of type T
+ An object of type ordered_update_t<T>
is a proxy for an object of type T
intended to be used within a parallel application of an element access function using a
policy object of type vector_policy
. Simple increments, assignments, and compound
assignments to the object are forwarded to the proxied object, but are sequenced as though
@@ -1195,536 +586,4 @@
<experimental/numeric>
synopsis-namespace std { -namespace experimental { -namespace parallel { -inline namespace v2 { - template<class InputIterator> - typename iterator_traits<InputIterator>::value_type - reduce(InputIterator first, InputIterator last); - template<class ExecutionPolicy, - class InputIterator> - typename iterator_traits<InputIterator>::value_type - reduce(ExecutionPolicy&& exec, - InputIterator first, InputIterator last); - template<class InputIterator, class T> - T reduce(InputIterator first, InputIterator last, T init); - template<class ExecutionPolicy, - class InputIterator, class T> - T reduce(ExecutionPolicy&& exec, - InputIterator first, InputIterator last, T init); - template<class InputIterator, class T, class BinaryOperation> - T reduce(InputIterator first, InputIterator last, T init, - BinaryOperation binary_op); - template<class ExecutionPolicy, class InputIterator, class T, class BinaryOperation> - T reduce(ExecutionPolicy&& exec, - InputIterator first, InputIterator last, T init, - BinaryOperation binary_op); - - template<class InputIterator, class OutputIterator, - class T> - OutputIterator - exclusive_scan(InputIterator first, InputIterator last, - OutputIterator result, - T init); - template<class ExecutionPolicy, - class InputIterator, class OutputIterator, - class T> - OutputIterator - exclusive_scan(ExecutionPolicy&& exec, - InputIterator first, InputIterator last, - OutputIterator result, - T init); - template<class InputIterator, class OutputIterator, - class T, class BinaryOperation> - OutputIterator - exclusive_scan(InputIterator first, InputIterator last, - OutputIterator result, - T init, BinaryOperation binary_op); - template<class ExecutionPolicy, - class InputIterator, class OutputIterator, - class T, class BinaryOperation> - OutputIterator - exclusive_scan(ExecutionPolicy&& exec, - InputIterator first, InputIterator last, - OutputIterator result, - T init, BinaryOperation binary_op); - - template<class InputIterator, class OutputIterator> - 
OutputIterator - inclusive_scan(InputIterator first, InputIterator last, - OutputIterator result); - template<class ExecutionPolicy, - class InputIterator, class OutputIterator> - OutputIterator - inclusive_scan(ExecutionPolicy&& exec, - InputIterator first, InputIterator last, - OutputIterator result); - template<class InputIterator, class OutputIterator, - class BinaryOperation> - OutputIterator - inclusive_scan(InputIterator first, InputIterator last, - OutputIterator result, - BinaryOperation binary_op); - template<class ExecutionPolicy, - class InputIterator, class OutputIterator, - class BinaryOperation> - OutputIterator - inclusive_scan(ExecutionPolicy&& exec, - InputIterator first, InputIterator last, - OutputIterator result, - BinaryOperation binary_op); - template<class InputIterator, class OutputIterator, - class BinaryOperation, class T> - OutputIterator - inclusive_scan(InputIterator first, InputIterator last, - OutputIterator result, - BinaryOperation binary_op, T init); - template<class ExecutionPolicy, - class InputIterator, class OutputIterator, - class BinaryOperation, class T> - OutputIterator - inclusive_scan(ExecutionPolicy&& exec, - InputIterator first, InputIterator last, - OutputIterator result, - BinaryOperation binary_op, T init); - - template<class InputIterator, class UnaryOperation, - class T, class BinaryOperation> - T transform_reduce(InputIterator first, InputIterator last, - UnaryOperation unary_op, - T init, BinaryOperation binary_op); - template<class ExecutionPolicy, - class InputIterator, class UnaryOperation, - class T, class BinaryOperation> - T transform_reduce(ExecutionPolicy&& exec, - InputIterator first, InputIterator last, - UnaryOperation unary_op, - T init, BinaryOperation binary_op); - - template<class InputIterator, class OutputIterator, - class UnaryOperation, class T, class BinaryOperation> - OutputIterator - transform_exclusive_scan(InputIterator first, InputIterator last, - OutputIterator result, - UnaryOperation 
unary_op, - T init, BinaryOperation binary_op); - template<class ExecutionPolicy, - class InputIterator, class OutputIterator, - class UnaryOperation, class T, class BinaryOperation> - OutputIterator - transform_exclusive_scan(ExecutionPolicy&& exec, - InputIterator first, InputIterator last, - OutputIterator result, - UnaryOperation unary_op, - T init, BinaryOperation binary_op); - - template<class InputIterator, class OutputIterator, - class UnaryOperation, class BinaryOperation> - OutputIterator - transform_inclusive_scan(InputIterator first, InputIterator last, - OutputIterator result, - UnaryOperation unary_op, - BinaryOperation binary_op); - template<class ExecutionPolicy, - class InputIterator, class OutputIterator, - class UnaryOperation, class BinaryOperation> - OutputIterator - transform_inclusive_scan(ExecutionPolicy&& exec, - InputIterator first, InputIterator last, - OutputIterator result, - UnaryOperation unary_op, - BinaryOperation binary_op); - - template<class InputIterator, class OutputIterator, - class UnaryOperation, class BinaryOperation, class T> - OutputIterator - transform_inclusive_scan(InputIterator first, InputIterator last, - OutputIterator result, - UnaryOperation unary_op, - BinaryOperation binary_op, T init); - template<class ExecutionPolicy, - class InputIterator, class OutputIterator, - class UnaryOperation, class BinaryOperation, class T> - OutputIterator - transform_inclusive_scan(ExecutionPolicy&& exec, - InputIterator first, InputIterator last, - OutputIterator result, - UnaryOperation unary_op, - BinaryOperation binary_op, T init); -} -} -} -} --
reduce(first, last, typename iterator_traits<InputIterator>::value_type{})
.reduce(first, last, init, plus<>())
.GENERALIZED_SUM(binary_op, init, *first, ..., *(first + (last - first) - 1))
.binary_op
shall not invalidate iterators or subranges, nor modify elements in the
- range [first,last)
.last - first
) applications of binary_op
.reduce
and accumulate
is that the behavior
- of reduce
may be non-deterministic for non-associative or non-commutative binary_op
.
- exclusive_scan(first, last, result, init, plus<>())
.i
in [result,result + (last - first))
the
- value of GENERALIZED_NONCOMMUTATIVE_SUM(binary_op, init, *first, ..., *(first + (i - result) - 1))
.
- result
.binary_op
shall not invalidate iterators or subranges, nor modify elements in the
- ranges [first,last)
or [result,result + (last - first))
.
- last - first
) applications of binary_op
.exclusive_scan
and inclusive_scan
is that
- exclusive_scan
excludes the i
th input element from the i
th
- sum. If binary_op
is not mathematically associative, the behavior of
- exclusive_scan
may be non-deterministic.
- inclusive_scan(first, last, result, plus<>())
.
- i
in [result,result + (last - first))
the value of
- GENERALIZED_NONCOMMUTATIVE_SUM(binary_op, *first, ..., *(first + (i - result)))
or
- GENERALIZED_NONCOMMUTATIVE_SUM(binary_op, init, *first, ..., *(first + (i - result)))
- if init
is provided.
- result
.
- binary_op
shall not invalidate iterators or subranges, nor modify elements in the
- ranges [first,last)
or [result,result + (last - first))
.
- last - first
) applications of binary_op
.exclusive_scan
and inclusive_scan
is that
- inclusive_scan
includes the i
th input element in the i
th sum.
- If binary_op
is not mathematically associative, the behavior of
- inclusive_scan
may be non-deterministic.
- GENERALIZED_SUM(binary_op, init, unary_op(*first), ..., unary_op(*(first + (last - first) -
1)))
.
- unary_op
nor binary_op
shall invalidate subranges, or modify elements in the range [first,last)
last - first
) applications each of unary_op
and binary_op
.transform_reduce
does not apply unary_op
to init
.i
in [result,result + (last - first))
the value of
- GENERALIZED_NONCOMMUTATIVE_SUM(binary_op, init, unary_op(*first), ..., unary_op(*(first + (i
- result) - 1)))
.
- result
.unary_op
nor binary_op
shall invalidate iterators or subranges, or modify elements in the
- ranges [first,last)
or [result,result + (last - first))
.
- last - first
) applications each of unary_op
and binary_op
.transform_exclusive_scan
and transform_inclusive_scan
is that transform_exclusive_scan
- excludes the ith input element from the ith sum. If binary_op
is not mathematically associative, the behavior of
- transform_exclusive_scan
may be non-deterministic. transform_exclusive_scan
does not apply unary_op
to init
.
- i
in [result,result + (last - first))
the value of
- GENERALIZED_NONCOMMUTATIVE_SUM(binary_op, unary_op(*first), ..., unary_op(*(first + (i -
result))))
or
- GENERALIZED_NONCOMMUTATIVE_SUM(binary_op, init, unary_op(*first), ..., unary_op(*(first + (i
- result))))
- if init
is provided.
- result
.unary_op
nor binary_op
shall invalidate iterators or subranges, or modify elements in the ranges [first,last)
- or [result,result + (last - first))
.
- last - first
) applications each of unary_op
and binary_op
.transform_exclusive_scan
and transform_inclusive_scan
is that transform_inclusive_scan
- includes the ith input element from the ith sum. If binary_op
is not mathematically associative, the behavior of
- transform_inclusive_scan
may be non-deterministic. transform_inclusive_scan
does not apply unary_op
to init
.
-
- During the execution of a standard parallel algorithm,
- if temporary memory resources are required and none are available,
- the algorithm throws a std::bad_alloc
exception.
-
- During the execution of a standard parallel algorithm, if the invocation of an element access function - exits via an uncaught exception, the behavior of the program is determined by the type of - execution policy used to invoke the algorithm: - -
parallel_vector_execution_policy
, unsequenced_policy
, or vector_policy
,
- std::terminate
shall be called.
- sequential_execution_policy
or
- parallel_execution_policy
, the execution of the algorithm exits via an
- exception. The exception shall be an exception_list
containing all uncaught exceptions thrown during
- the invocations of element access functions, or optionally the uncaught exception if there was only one.-- -
for_each
is executed sequentially,
- if an invocation of the user-provided function object throws an exception, for_each
can exit via the uncaught exception, or throw an exception_list
containing the original exception.
- -- -
std::bad_alloc
, all exceptions thrown during the execution of
- the algorithm are communicated to the caller. It is unspecified whether an algorithm implementation will "forge ahead" after
- encountering and capturing a user exception.
- --
std::bad_alloc
exception even if one or more
- user-provided function objects have exited via an exception. For example, this can happen when an algorithm fails to allocate memory while
- creating or adding elements to the exception_list
object.
- <experimental/exception_list>
synopsis-namespace std {-namespace std::experimental { -inline namespace parallelism_v2 { -inline namespace v2 {+namespace std::experimental { +inline namespace parallelism_v2 { class exception_list : public exception { public: -typedef unspecified iterator;- using iterator = unspecified; + using iterator = unspecified; size_t size() const noexcept; iterator begin() const noexcept; @@ -73,20 +18,16 @@Header
const char* what() const noexcept override; }; -<experimental/exception_list>
synopsis}} } -}
- The class exception_list owns a sequence of exception_ptr objects. The parallel
- algorithms may use the exception_list to communicate uncaught exceptions encountered during parallel execution to the
- caller of the algorithm.
+ The class exception_list owns a sequence of exception_ptr objects.
- The type exception_list::iterator
shall fulfill the requirements of
+ The type exception_list::iterator
 fulfills the requirements of
ForwardIterator
.
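To illustrate what "owns a sequence of exception_ptr objects" with a forward iterator means for a caller, here is a minimal stand-in. It is not the TS class: `exception_list_sketch`, its constructor, and `count_runtime_errors` are hypothetical, and the TS does not say how an `exception_list` is populated. Only the member names `size`, `begin`, `end`, and `what` mirror the synopsis.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for illustration only.
class exception_list_sketch : public std::exception {
public:
    // A vector's const_iterator satisfies the forward iterator requirements.
    using iterator = std::vector<std::exception_ptr>::const_iterator;

    // Assumption: construction from a vector is invented for this sketch.
    explicit exception_list_sketch(std::vector<std::exception_ptr> errors)
        : errors_(std::move(errors)) {}

    std::size_t size() const noexcept { return errors_.size(); }
    iterator begin() const noexcept { return errors_.begin(); }
    iterator end() const noexcept { return errors_.end(); }
    const char* what() const noexcept override { return "exception_list_sketch"; }

private:
    std::vector<std::exception_ptr> errors_;
};

// Rethrows each stored exception in order and counts those derived from
// std::runtime_error; every other exception type is ignored.
inline std::size_t count_runtime_errors(const exception_list_sketch& list) {
    std::size_t count = 0;
    for (const std::exception_ptr& p : list) {
        try {
            std::rethrow_exception(p);
        } catch (const std::runtime_error&) {
            ++count;
        } catch (...) {
            // Not a runtime_error: skipped in this sketch.
        }
    }
    return count;
}
```

Because each element is an `exception_ptr`, a caller has to rethrow it to inspect the dynamic type, as `count_runtime_errors` does.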
- This clause describes classes that are execution policy types. An object - of an execution policy type indicates the kinds of parallelism allowed in the execution - of an algorithm and expresses the consequent requirements on the element - access functions. -
-std::vector<int> v = ... - -// standard sequential sort -std::sort(v.begin(), v.end()); - -using namespace std::experimental::parallel; - -// explicitly sequential sort -sort(seq, v.begin(), v.end()); - -// permitting parallel execution -sort(par, v.begin(), v.end()); - -// permitting vectorization as well -sort(par_vec, v.begin(), v.end()); - -// sort with dynamically-selected execution -size_t threshold = ... -execution_policy exec = seq; -if (v.size() > threshold) -{ - exec = par; -} - -sort(exec, v.begin(), v.end()); --
--
<experimental/execution_policy>
synopsis<experimental/execution>
synopsis-#include <execution> - -namespace std {-namespace std::experimental { -inline namespace parallelism_v2 { -inline namespace v2 { -+namespace std::experimental { +inline namespace parallelism_v2 { namespace execution {- template<class T> struct is_execution_policy; - template<class T> constexpr bool is_execution_policy_v = is_execution_policy<T>::value; - - - class sequential_execution_policy; - - - class parallel_execution_policy; +#include <execution> - - class parallel_vector_execution_policy; - - - class execution_policy; - class unsequenced_policy; @@ -80,75 +16,14 @@ Header
<experimental/execution
synopsi_policy>class vector_policy; - + inline constexpr unsequenced_policy unseq{ unspecified }; - inline constexpr parallel_policy par{ unspecified }; + inline constexpr vector_policy vec parallel_policy par{ unspecified }; } -}} } -}
-template<class T> struct is_execution_policy { see below }; -- -
is_execution_policy
can be used to detect parallel execution policies for the purpose of excluding function signatures from otherwise ambiguous overload resolution participation.
is_execution_policy<T>
shall be a UnaryTypeTrait with a BaseCharacteristic of true_type
if T
is the type of a standard or implementation-defined execution policy, otherwise false_type
.
-
-
-- -
The behavior of a program that adds specializations for is_execution_policy
is undefined.
-class sequential_execution_policy{ unspecified }; -- -
The class sequential_execution_policy
is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and require that a parallel algorithm's execution may not be parallelized.
-class parallel_execution_policy{ unspecified }; -- -
The class parallel_execution_policy
is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm's execution may be parallelized.
-class parallel_vector_execution_policy{ unspecified }; -- -
The class parallel_vector_execution_policy
is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm's execution may be vectorized and parallelized.
The class unsequenced_policy
is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm's execution may be vectorized, e.g., executed on a single thread using instructions that operate on multiple data items.
The invocations of element access functions in parallel algorithms invoked with an execution policy of type unsequenced_policy
are permitted to execute in an unordered fashion in the calling thread, unsequenced with respect to one another within the calling thread.
-
The invocations of element access functions in parallel algorithms invoked with an execution policy of type unsequenced_policy
are permitted to execute in an unordered fashion in the calling thread, unsequenced with respect to one another within the calling thread.
+
During the execution of a parallel algorithm with the experimental::execution::unsequenced_policy
policy, if the invocation of an element access function exits via an uncaught exception, terminate()
shall be called.
During the execution of a parallel algorithm with the experimental::execution::unsequenced_policy
policy, if the invocation of an element access function exits via an uncaught exception, terminate()
willshall be called.
The class vector_policy
is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm's execution may be vectorized. Additionally, such vectorization will result in an execution that respects the sequencing constraints of wavefront application ([parallel.alg.general.wavefront]). [Note: The implementation thus makes stronger guarantees than for unsequenced_policy
, for example. -- end note]
The invocations of element access functions in parallel algorithms invoked with an execution policy of type vector_policy
are permitted to execute in unordered fashion in the calling thread, unsequenced with respect to one another within the calling thread, subject to the sequencing constraints of wavefront application ([parallel.alg.general.wavefront]) for the last argument to for_loop
or for_loop_strided
.
During the execution of a parallel algorithm with the experimental::execution::vector_policy
policy, if the invocation of an element access function exits via an uncaught exception, terminate()
shall be called.
-class execution_policy -{ - public: -- -- template<class T> execution_policy(const T& exec); - template<class T> execution_policy& operator=(const T& exec); - - - const type_info& type() const noexcept; - template<class T> T* get() noexcept; - template<class T> const T* get() const noexcept; -}; -
The class execution_policy
is a container for execution policy objects.
- execution_policy
allows dynamic control over standard algorithm execution.
std::vector<float> sort_me = ... - -using namespace std::experimental::parallel; -execution_policy exec = seq; - -if(sort_me.size() > threshold) -{ - exec = std::par; -} - -std::sort(exec, std::begin(sort_me), std::end(sort_me));-
Objects of type execution_policy
shall be constructible and assignable from objects of
- type T
for which is_execution_policy<T>::value
is true
.
execution_policy
construct/assignexecution_policy
object with a copy of exec
's state.is_execution_policy<T>::value
is true
.
- exec
's state to *this
.*this
execution_policy
object accesstypeid(T)
, such that T
is the type of the execution policy object contained by *this
.target_type() == typeid(T)
, a pointer to the stored execution policy object; otherwise a null pointer.The invocations of element access functions in parallel algorithms invoked with an execution policy of type vector_policy
are permitted to execute in unordered fashion in the calling thread, unsequenced with respect to one another within the calling thread, subject to the sequencing constraints of wavefront application (for_loop
, for_loop_n, or for_loop_strided
, or for_loop_strided_n
.
is_execution_policy<T>::value
is true
.During the execution of a parallel algorithm with the experimental::execution::vector_policy
policy, if the invocation of an element access function exits via an uncaught exception, terminate()
willshall be called.
--constexpr sequential_execution_policy seq{}; -constexpr parallel_execution_policy par{}; -constexpr parallel_vector_execution_policy par_vec{};-constexpr execution::unsequenced_policy unseq{}; -constexpr execution::vector_policy vec{}; +inline constexpr execution::unsequenced_policy unseq{}; +inline constexpr execution::vector_policy vec{};
The header <experimental/execution
declares a global object associated with each type of execution policy defined by this Technical Specification._policy>
The header <experimental/execution>
declares a global object associated with each type of execution policy defined by this Technical Specification.
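The phrase "a unique type to disambiguate parallel algorithm overloading" can be illustrated with ordinary tag dispatch. The sketch below is hypothetical: it declares its own empty policy types and global objects in a namespace called `sketch` rather than using the TS header, and the function `apply` is invented for the example. Both overloads run the same sequential loop; only overload selection is being shown, not vectorization.

```cpp
#include <string>
#include <vector>

namespace sketch {

// Stand-ins for the TS policy types (assumption: empty tag classes).
class unsequenced_policy {};
class vector_policy {};

// One global object per policy type, mirroring unseq and vec.
inline constexpr unsequenced_policy unseq{};
inline constexpr vector_policy vec{};

// Overload chosen when the caller passes an unsequenced_policy object.
template <class F>
std::string apply(unsequenced_policy, std::vector<int>& v, F f) {
    for (int& x : v) f(x);
    return "unsequenced_policy overload";
}

// Overload chosen when the caller passes a vector_policy object.
template <class F>
std::string apply(vector_policy, std::vector<int>& v, F f) {
    for (int& x : v) f(x);
    return "vector_policy overload";
}

}  // namespace sketch
```

Because the policy objects have distinct types, the call `sketch::apply(sketch::vec, v, f)` resolves at compile time with no runtime branching on the policy.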
std
. Unless otherwise specified, all
components described in this Technical Specification are declared in namespace
- std::experimental::parallelism_v2parallel::v2
.
+ std::experimental::parallelism_v2
.
std
.
@@ -15,7 +15,7 @@ Unless otherwise specified, references to such entities described in this
Technical Specification are assumed to be qualified with
- std::experimental::parallelism_v2
, and references to entities described in the C++
+ parallel::v2std::experimental::parallelism_v2
, and references to entities described in the C++
Standard Library are assumed to be qualified with std::
.
Extensions that are expected to eventually be added to an existing header @@ -43,29 +43,14 @@
__cpp_lib_experimental_parallel_algorithm
<experimental/algorithm>
<experimental/exception_list>
<experimental/execution_policy>
<experimental/numeric>
- __cpp_lib_experimental_parallel_task_block
<experimental/exception_list>
<experimental/exception_list>
<experimental/task_block>
__cpp_lib_experimental_execution_vector_policy
<experimental/algorithm>
<experimental/execution>
__cpp_lib_experimental_parallel_for_loop
<experimental/algorithm>
__cpp_lib_experimental_parallel_for_loop
<experimental/algorithm>
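A program might consult the macros in the table above roughly as follows. This is a sketch under stated assumptions: it only detects whether `__cpp_lib_experimental_parallel_for_loop` is defined after including `<experimental/algorithm>` (when that header exists), and the names `has_ts_for_loop` and `for_loop_backend` are hypothetical.

```cpp
#include <string>

// Include the TS header only when the implementation ships it.
#ifdef __has_include
#  if __has_include(<experimental/algorithm>)
#    include <experimental/algorithm>
#  endif
#endif

// Record at compile time whether the for_loop feature macro is defined.
#ifdef __cpp_lib_experimental_parallel_for_loop
constexpr bool has_ts_for_loop = true;
#else
constexpr bool has_ts_for_loop = false;
#endif

// Hypothetical helper reporting which path a program would take.
inline std::string for_loop_backend() {
    return has_ts_for_loop ? "TS for_loop" : "plain loop fallback";
}
```

On an implementation that does not provide the TS, the macro is simply undefined and the fallback branch is selected.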
ISO/IEC 14882:2017— is herein called the C++ Standard.
- The library described in ISO/IEC 14882:2017— clauses 20-3317-30 is herein called
+
ISO/IEC 14882:2017 is herein called the C++ Standard.
+ The library described in ISO/IEC 14882:2017 clauses 20-33 is herein called
the C++ Standard Library. The C++ Standard Library components described in
- ISO/IEC 14882:2017— clauses 28, 29.8 and 23.10.1025, 26.7 and 20.7.2 are herein called the C++ Standard
+ ISO/IEC 14882:2017 clauses 28, 29.8 and 23.10.10 are herein called the C++ Standard
Algorithms Library.
Unless otherwise specified, the whole of the C++ Standard's Library
- introduction (C++14 §20) is included into this
+ introduction (C++14 §20) is included into this
Technical Specification by reference.
<experimental/task_block>
synopsis-@@ -29,21 +25,17 @@namespace std {-namespace std::experimental { -inline namespace parallelism_v2 { -inline namespace v2 {+namespace std::experimental { +inline namespace parallelism_v2 { class task_cancelled_exception; class task_block; template<class F> - void define_task_block(F&& f); + void define_task_block(F&& f); template<class F> - void define_task_block_restore_thread(F&& f); -}+ void define_task_block_restore_thread(F&& f); } } -}
<experimental/task_block>
synopsistask_cancelled_exception
-namespace std {-namespace std::experimental { -inline namespace parallelism_v2 { -inline namespace v2 {+namespace std::experimental { +inline namespace parallelism_v2 { class task_cancelled_exception : public exception { public: task_cancelled_exception() noexcept; - virtual const char* what() const noexcept override; + virtual const char* what() const noexcept override; }; -}} } -}
@@ -69,10 +61,8 @@
task_cancelled_exception
member function what
task_block
-namespace std {-namespace std::experimental { -inline namespace parallelism_v2 { -inline namespace v2 {+namespace std::experimental { +inline namespace parallelism_v2 { class task_block { @@ -80,19 +70,17 @@Class
~task_block(); public: - task_block(const task_block&) = delete; - task_block& operator=(const task_block&) = delete; - void operator&() const = delete; + task_block(const task_block&) = delete; + task_block& operator=(const task_block&) = delete; + void operator&() const = delete; template<class F> - void run(F&& f); + void run(F&& f); void wait(); }; -task_block
}} } -}
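The shape of the `task_block` interface (a non-copyable block with `run` and `wait`, created by `define_task_block`) can be sketched with a simplified stand-in. The types `task_block_sketch` and `define_task_block_sketch` below are hypothetical. As a simplifying assumption they defer every task and execute it on the calling thread at the join point, whereas an actual implementation may run tasks in parallel. The sketch only shows the fork-join structure: every task passed to `run` has finished by the time the defining call returns.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for task_block; tasks are deferred, not parallel.
class task_block_sketch {
public:
    task_block_sketch() = default;
    task_block_sketch(const task_block_sketch&) = delete;
    task_block_sketch& operator=(const task_block_sketch&) = delete;

    // Registers a task; in this sketch it runs later, inside wait().
    template <class F>
    void run(F&& f) {
        pending_.emplace_back(std::forward<F>(f));
    }

    // Join point: drains all pending tasks, including any that
    // running tasks add to this block while it is being drained.
    void wait() {
        while (!pending_.empty()) {
            std::function<void()> task = std::move(pending_.back());
            pending_.pop_back();
            task();
        }
    }

private:
    std::vector<std::function<void()>> pending_;
};

// Creates a block, passes it to f, and joins before returning.
template <class F>
void define_task_block_sketch(F&& f) {
    task_block_sketch tb;
    std::forward<F>(f)(tb);
    tb.wait();
}
```

A caller passes a callable taking `task_block_sketch&` and calls `run` on it for each child task; results written by those tasks are visible once `define_task_block_sketch` returns.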
diff --git a/terms_and_definitions.html b/terms_and_definitions.html
index 86a659d..a725262 100644
--- a/terms_and_definitions.html
+++ b/terms_and_definitions.html
@@ -1,7 +1,6 @@
For the purposes of this document, the terms and definitions given in the C++ Standard and the following apply. A parallel algorithm is a function template described by this Technical Specification declared in namespace
- Parallel algorithms access objects indirectly accessible via their arguments by invoking the following functions:
-
- Terms and definitions
-
-
-
-Terms and definitions
-
std::experimental::parallel::v2
with a formal template parameter named ExecutionPolicy
.
-
-
- These functions are herein called element access functions.
-
- sort
function may invoke the following element access functions:
-
-
-
- RandomAccessIterator
.
- swap
function on the elements of the sequence (as per 25.4.1.1 [sort]/2).
- Compare
function object.
-