llvm-project/clang/docs/OpenMPSupport.rst

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

363 lines
51 KiB
ReStructuredText
Raw Normal View History

.. raw:: html
<style type="text/css">
.none { background-color: #FFCCCC }
.part { background-color: #FFFF99 }
.good { background-color: #CCFF99 }
</style>
.. role:: none
.. role:: part
.. role:: good
.. contents::
:local:
==============
OpenMP Support
==============
Clang fully supports OpenMP 4.5, almost all of 5.0 and most of 5.1/2.
Clang supports offloading to X86_64, AArch64, PPC64[LE], NVIDIA GPUs (all models) and AMD GPUs (all models).
In addition, the LLVM OpenMP runtime `libomp` supports the OpenMP Tools
Interface (OMPT) on x86, x86_64, AArch64, and PPC64 on Linux, Windows, and macOS.
OMPT is also supported for NVIDIA and AMD GPUs.
For the list of supported features from OpenMP 5.0 and 5.1
see `OpenMP implementation details`_ and `OpenMP 51 implementation details`_.
General improvements
====================
- New collapse clause scheme to avoid expensive remainder operations.
Compute loop index variables after collapsing a loop nest via the
collapse clause by replacing the expensive remainder operation with
multiplications and additions.
- When using the collapse clause on a loop nest the default behavior
is to automatically extend the representation of the loop counter to
64 bits for the cases where the sizes of the collapsed loops are not
known at compile time. To prevent this conservative choice and use
at most 32 bits, compile your program with the
`-fopenmp-optimistic-collapse`.
GPU devices support
===================
Data-sharing modes
------------------
Clang supports two data-sharing models for Cuda devices: `Generic` and `Cuda`
modes. The default mode is `Generic`. `Cuda` mode can give an additional
performance and can be activated using the `-fopenmp-cuda-mode` flag. In
`Generic` mode all local variables that can be shared in the parallel regions
are stored in the global memory. In `Cuda` mode local variables are not shared
between the threads and it is user responsibility to share the required data
between the threads in the parallel regions. Often, the optimizer is able to
reduce the cost of `Generic` mode to the level of `Cuda` mode, but the flag,
as well as other assumption flags, can be used for tuning.
Features not supported or with limited support for Cuda devices
---------------------------------------------------------------
- Cancellation constructs are not supported.
- Doacross loop nest is not supported.
- User-defined reductions are supported only for trivial types.
- Nested parallelism: inner parallel regions are executed sequentially.
- Debug information for OpenMP target regions is supported, but sometimes it may
be required to manually specify the address class of the inspected variables.
In some cases the local variables are actually allocated in the global memory,
but the debug info may be not aware of it.
.. _OpenMP implementation details:
OpenMP 5.0 Implementation Details
=================================
The following table provides a quick overview over various OpenMP 5.0 features
and their implementation status. Please post on the
`Discourse forums (Runtimes - OpenMP category)`_ for more
information or if you want to help with the
implementation.
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
|Category | Feature | Status | Reviews |
+==============================+==============================================================+==========================+=======================================================================+
| loop | support != in the canonical loop form | :good:`done` | D54441 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| loop | #pragma omp loop (directive) | :part:`partial` | D145823 (combined forms) |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| loop | #pragma omp loop bind | :part:`worked on` | D144634 (needs review) |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| loop | collapse imperfectly nested loop | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| loop | collapse non-rectangular nested loop | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| loop | C++ range-base for loop | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| loop | clause: if for SIMD directives | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| loop | inclusive scan (matching C++17 PSTL) | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| memory management | memory allocators | :good:`done` | r341687,r357929 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| memory management | allocate directive and allocate clause | :good:`done` | r355614,r335952 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| OMPD | OMPD interfaces | :good:`done` | https://reviews.llvm.org/D99914 (Supports only HOST(CPU) and Linux |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| OMPT | OMPT interfaces (callback support) | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| thread affinity | thread affinity | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| task | taskloop reduction | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| task | task affinity | :part:`not upstream` | https://github.com/jklinkenberg/openmp/tree/task-affinity |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| task | clause: depend on the taskwait construct | :good:`done` | D113540 (regular codegen only) |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| task | depend objects and detachable tasks | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| task | mutexinoutset dependence-type for tasks | :good:`done` | D53380,D57576 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| task | combined taskloop constructs | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| task | master taskloop | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| task | parallel master taskloop | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| task | master taskloop simd | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| task | parallel master taskloop simd | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| SIMD | atomic and simd constructs inside SIMD code | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| SIMD | SIMD nontemporal | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | infer target functions from initializers | :part:`worked on` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | infer target variables from initializers | :good:`done` | D146418 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | OMP_TARGET_OFFLOAD environment variable | :good:`done` | D50522 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | support full 'defaultmap' functionality | :good:`done` | D69204 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | device specific functions | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | clause: device_type | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | clause: extended device | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | clause: uses_allocators clause | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | clause: in_reduction | :part:`worked on` | r308768 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | omp_get_device_num() | :good:`done` | D54342,D128347 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | structure mapping of references | :none:`unclaimed` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | nested target declare | :good:`done` | D51378 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | implicitly map 'this' (this[:1]) | :good:`done` | D55982 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | allow access to the reference count (omp_target_is_present) | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | requires directive | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | clause: unified_shared_memory | :good:`done` | D52625,D52359 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | clause: unified_address | :part:`partial` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | clause: reverse_offload | :part:`partial` | D52780,D155003 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | clause: atomic_default_mem_order | :good:`done` | D53513 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | clause: dynamic_allocators | :part:`unclaimed parts` | D53079 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | user-defined mappers | :good:`done` | D56326,D58638,D58523,D58074,D60972,D59474 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | mapping lambda expression | :good:`done` | D51107 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | clause: use_device_addr for target data | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | support close modifier on map clause | :good:`done` | D55719,D55892 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | teams construct on the host device | :good:`done` | r371553 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | support non-contiguous array sections for target update | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | pointer attachment | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| atomic | hints for the atomic construct | :good:`done` | D51233 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| base language | C11 support | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| base language | C++11/14/17 support | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| base language | lambda support | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc | array shaping | :good:`done` | D74144 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc | library shutdown (omp_pause_resource[_all]) | :good:`done` | D55078 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc | metadirectives | :part:`mostly done` | D91944 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc | conditional modifier for lastprivate clause | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc | iterator and multidependences | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc | depobj directive and depobj dependency kind | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc | user-defined function variants | :good:`done`. | D67294, D64095, D71847, D71830, D109635 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc | pointer/reference to pointer based array reductions | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc | prevent new type definitions in clauses | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| memory model | memory model update (seq_cst, acq_rel, release, acquire,...) | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
.. _OpenMP 51 implementation details:
OpenMP 5.1 Implementation Details
=================================
The following table provides a quick overview over various OpenMP 5.1 features
and their implementation status.
Please post on the
`Discourse forums (Runtimes - OpenMP category)`_ for more
information or if you want to help with the
implementation.
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
|Category | Feature | Status | Reviews |
+==============================+==============================================================+==========================+=======================================================================+
| atomic | 'compare' clause on atomic construct | :good:`done` | D120290, D120007, D118632, D120200, D116261, D118547, D116637 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| atomic | 'fail' clause on atomic construct | :part:`worked on` | D123235 (in progress) |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
[OpenMP] Support OpenMP 5.1 attributes OpenMP 5.1 added support for writing OpenMP directives using [[]] syntax in addition to using #pragma and this introduces support for the new syntax. In OpenMP, the attributes take one of two forms: [[omp::directive(...)]] or [[omp::sequence(...)]]. A directive attribute contains an OpenMP directive clause that is identical to the analogous #pragma syntax. A sequence attribute can contain either sequence or directive arguments and is used to ensure that the attributes are processed sequentially for situations where the order of the attributes matter (remember: https://eel.is/c++draft/dcl.attr.grammar#4.sentence-4). The approach taken here is somewhat novel and deserves mention. We could refactor much of the OpenMP parsing logic to work for either pragma annotation tokens or for attribute clauses. It would be a fair amount of effort to share the logic for both, but it's certainly doable. However, the semantic attribute system is not designed to handle the arbitrarily complex arguments that OpenMP directives contain. Adding support to thread the novel parsed information until we can produce a semantic attribute would be considerably more effort. What's more, existing OpenMP constructs are not (often) represented as semantic attributes. So doing this through Attr.td would be a massive undertaking that would likely only benefit OpenMP and comes with additional risks. Rather than walk down that path, I am taking advantage of the fact that the syntax of the directives within the directive clause is identical to that of the #pragma form. Once the parser recognizes that we're processing an OpenMP attribute, it caches all of the directive argument tokens and then replays them as though the user wrote a pragma. This reuses the same OpenMP parsing and semantic logic directly, but does come with a risk if the OpenMP committee decides to purposefully diverge their pragma and attribute syntaxes. So, despite this being a novel approach that does token replay, I think it's actually a better approach than trying to do this through the declarative syntax in Attr.td.
2021-07-12 06:51:19 -04:00
| base language | C++ attribute specifier syntax | :good:`done` | D105648 |
2019-12-13 18:40:54 -06:00
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | 'present' map type modifier | :good:`done` | D83061, D83062, D84422 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | 'present' motion modifier | :good:`done` | D84711, D84712 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | 'present' in defaultmap clause | :good:`done` | D92427 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | map clause reordering based on 'present' modifier | :none:`unclaimed` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | device-specific environment variables | :none:`unclaimed` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | omp_target_is_accessible routine | :none:`unclaimed` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | omp_get_mapped_ptr routine | :good:`done` | D141545 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | new async target memory copy routines | :good:`done` | D136103 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | thread_limit clause on target construct | :part:`partial` | D141540 (offload), D152054 (host, in progress) |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | has_device_addr clause on target construct | :none:`unclaimed` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | iterators in map clause or motion clauses | :none:`unclaimed` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | indirect clause on declare target directive | :none:`unclaimed` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | allow virtual functions calls for mapped object on device | :part:`partial` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | interop construct | :part:`partial` | parsing/sema done: D98558, D98834, D98815 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device | assorted routines for querying interoperable properties | :part:`partial` | D106674 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| loop | Loop tiling transformation | :good:`done` | D76342 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| loop | Loop unrolling transformation | :good:`done` | D99459 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| loop | 'reproducible'/'unconstrained' modifiers in 'order' clause | :part:`partial` | D127855 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| memory management | alignment for allocate directive and clause | :good:`done` | D115683 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| memory management | new memory management routines | :none:`unclaimed` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| memory management | changes to omp_alloctrait_key enum | :none:`unclaimed` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| memory model | seq_cst clause on flush construct | :none:`unclaimed` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc | 'omp_all_memory' keyword and use in 'depend' clause | :good:`done` | D125828, D126321 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc | error directive | :good:`done` | D139166 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc | scope construct | :none:`worked on` | D157933 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc | routines for controlling and querying team regions | :part:`partial` | D95003 (libomp only) |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc | changes to ompt_scope_endpoint_t enum | :none:`unclaimed` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc | omp_display_env routine | :good:`done` | D74956 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc | extended OMP_PLACES syntax | :none:`unclaimed` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc | OMP_NUM_TEAMS and OMP_TEAMS_THREAD_LIMIT env vars | :good:`done` | D138769 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc | 'target_device' selector in context specifier | :none:`unclaimed` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc | begin/end declare variant | :good:`done` | D71179 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc | dispatch construct and function variant argument adjustment | :part:`worked on` | D99537, D99679 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc | assumes directives | :part:`worked on` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc | assume directive | :part:`worked on` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc | nothing directive | :good:`done` | D123286 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc | masked construct and related combined constructs | :part:`worked on` | D99995, D100514 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc | default(firstprivate) & default(private) | :good:`done` | D75591 (firstprivate), D125912 (private) |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| other | deprecating master construct | :none:`unclaimed` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| OMPT | new barrier types added to ompt_sync_region_t enum | :none:`unclaimed` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| OMPT | async data transfers added to ompt_target_data_op_t enum | :none:`unclaimed` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| OMPT | new barrier state values added to ompt_state_t enum | :none:`unclaimed` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| OMPT | new 'emi' callbacks for external monitoring interfaces | :good:`done` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| OMPT | device tracing interface | :none:`unclaimed` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| task | 'strict' modifier for taskloop construct | :none:`unclaimed` | |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| task | inoutset in depend clause | :good:`done` | D97085, D118383 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| task | nowait clause on taskwait | :part:`partial` | parsing/sema done: D131830, D141531 |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
[OpenMP][OpenACC] Implement `ompx_hold` map type modifier extension in Clang (1/2) This patch implements Clang support for an original OpenMP extension we have developed to support OpenACC: the `ompx_hold` map type modifier. The next patch in this series, D106510, implements OpenMP runtime support. Consider the following example: ``` #pragma omp target data map(ompx_hold, tofrom: x) // holds onto mapping of x { foo(); // might have map(delete: x) #pragma omp target map(present, alloc: x) // x is guaranteed to be present printf("%d\n", x); } ``` The `ompx_hold` map type modifier above specifies that the `target data` directive holds onto the mapping for `x` throughout the associated region regardless of any `target exit data` directives executed during the call to `foo`. Thus, the presence assertion for `x` at the enclosed `target` construct cannot fail. (As usual, the standard OpenMP reference count for `x` must also reach zero before the data is unmapped.) Justification for inclusion in Clang and LLVM's OpenMP runtime: * The `ompx_hold` modifier supports OpenACC functionality (structured reference count) that cannot be achieved in standard OpenMP, as of 5.1. * The runtime implementation for `ompx_hold` (next patch) will thus be used by Flang's OpenACC support. * The Clang implementation for `ompx_hold` (this patch) as well as the runtime implementation are required for the Clang OpenACC support being developed as part of the ECP Clacc project, which translates OpenACC to OpenMP at the directive AST level. These patches are the first step in upstreaming OpenACC functionality from Clacc. * The Clang implementation for `ompx_hold` is also used by the tests in the runtime implementation. That syntactic support makes the tests more readable than low-level runtime calls can. Moreover, upstream Flang and Clang do not yet support OpenACC syntax sufficiently for writing the tests. * More generally, the Clang implementation enables a clean separation of concerns between OpenACC and OpenMP development in LLVM. That is, LLVM's OpenMP developers can discuss, modify, and debug LLVM's extended OpenMP implementation and test suite without directly considering OpenACC's language and execution model, which can be handled by LLVM's OpenACC developers. * OpenMP users might find the `ompx_hold` modifier useful, as in the above example. See new documentation introduced by this patch in `openmp/docs` for more detail on the functionality of this extension and its relationship with OpenACC. For example, it explains how the runtime must support two reference counts, as specified by OpenACC. Clang recognizes `ompx_hold` unless `-fno-openmp-extensions`, a new command-line option introduced by this patch, is specified. Reviewed By: ABataev, jdoerfert, protze.joachim, grokos Differential Revision: https://reviews.llvm.org/D106509
2021-08-31 15:17:07 -04:00
OpenMP Extensions
=================
The following table provides a quick overview over various OpenMP
[OpenMP][OpenACC] Implement `ompx_hold` map type modifier extension in Clang (1/2) This patch implements Clang support for an original OpenMP extension we have developed to support OpenACC: the `ompx_hold` map type modifier. The next patch in this series, D106510, implements OpenMP runtime support. Consider the following example: ``` #pragma omp target data map(ompx_hold, tofrom: x) // holds onto mapping of x { foo(); // might have map(delete: x) #pragma omp target map(present, alloc: x) // x is guaranteed to be present printf("%d\n", x); } ``` The `ompx_hold` map type modifier above specifies that the `target data` directive holds onto the mapping for `x` throughout the associated region regardless of any `target exit data` directives executed during the call to `foo`. Thus, the presence assertion for `x` at the enclosed `target` construct cannot fail. (As usual, the standard OpenMP reference count for `x` must also reach zero before the data is unmapped.) Justification for inclusion in Clang and LLVM's OpenMP runtime: * The `ompx_hold` modifier supports OpenACC functionality (structured reference count) that cannot be achieved in standard OpenMP, as of 5.1. * The runtime implementation for `ompx_hold` (next patch) will thus be used by Flang's OpenACC support. * The Clang implementation for `ompx_hold` (this patch) as well as the runtime implementation are required for the Clang OpenACC support being developed as part of the ECP Clacc project, which translates OpenACC to OpenMP at the directive AST level. These patches are the first step in upstreaming OpenACC functionality from Clacc. * The Clang implementation for `ompx_hold` is also used by the tests in the runtime implementation. That syntactic support makes the tests more readable than low-level runtime calls can. Moreover, upstream Flang and Clang do not yet support OpenACC syntax sufficiently for writing the tests. * More generally, the Clang implementation enables a clean separation of concerns between OpenACC and OpenMP development in LLVM. That is, LLVM's OpenMP developers can discuss, modify, and debug LLVM's extended OpenMP implementation and test suite without directly considering OpenACC's language and execution model, which can be handled by LLVM's OpenACC developers. * OpenMP users might find the `ompx_hold` modifier useful, as in the above example. See new documentation introduced by this patch in `openmp/docs` for more detail on the functionality of this extension and its relationship with OpenACC. For example, it explains how the runtime must support two reference counts, as specified by OpenACC. Clang recognizes `ompx_hold` unless `-fno-openmp-extensions`, a new command-line option introduced by this patch, is specified. Reviewed By: ABataev, jdoerfert, protze.joachim, grokos Differential Revision: https://reviews.llvm.org/D106509
2021-08-31 15:17:07 -04:00
extensions and their implementation status. These extensions are not
currently defined by any standard, so links to associated LLVM
documentation are provided. As these extensions mature, they will be
considered for standardization. Please post on the
`Discourse forums (Runtimes - OpenMP category)`_ to provide feedback.
[OpenMP][OpenACC] Implement `ompx_hold` map type modifier extension in Clang (1/2) This patch implements Clang support for an original OpenMP extension we have developed to support OpenACC: the `ompx_hold` map type modifier. The next patch in this series, D106510, implements OpenMP runtime support. Consider the following example: ``` #pragma omp target data map(ompx_hold, tofrom: x) // holds onto mapping of x { foo(); // might have map(delete: x) #pragma omp target map(present, alloc: x) // x is guaranteed to be present printf("%d\n", x); } ``` The `ompx_hold` map type modifier above specifies that the `target data` directive holds onto the mapping for `x` throughout the associated region regardless of any `target exit data` directives executed during the call to `foo`. Thus, the presence assertion for `x` at the enclosed `target` construct cannot fail. (As usual, the standard OpenMP reference count for `x` must also reach zero before the data is unmapped.) Justification for inclusion in Clang and LLVM's OpenMP runtime: * The `ompx_hold` modifier supports OpenACC functionality (structured reference count) that cannot be achieved in standard OpenMP, as of 5.1. * The runtime implementation for `ompx_hold` (next patch) will thus be used by Flang's OpenACC support. * The Clang implementation for `ompx_hold` (this patch) as well as the runtime implementation are required for the Clang OpenACC support being developed as part of the ECP Clacc project, which translates OpenACC to OpenMP at the directive AST level. These patches are the first step in upstreaming OpenACC functionality from Clacc. * The Clang implementation for `ompx_hold` is also used by the tests in the runtime implementation. That syntactic support makes the tests more readable than low-level runtime calls can. Moreover, upstream Flang and Clang do not yet support OpenACC syntax sufficiently for writing the tests. * More generally, the Clang implementation enables a clean separation of concerns between OpenACC and OpenMP development in LLVM. That is, LLVM's OpenMP developers can discuss, modify, and debug LLVM's extended OpenMP implementation and test suite without directly considering OpenACC's language and execution model, which can be handled by LLVM's OpenACC developers. * OpenMP users might find the `ompx_hold` modifier useful, as in the above example. See new documentation introduced by this patch in `openmp/docs` for more detail on the functionality of this extension and its relationship with OpenACC. For example, it explains how the runtime must support two reference counts, as specified by OpenACC. Clang recognizes `ompx_hold` unless `-fno-openmp-extensions`, a new command-line option introduced by this patch, is specified. Reviewed By: ABataev, jdoerfert, protze.joachim, grokos Differential Revision: https://reviews.llvm.org/D106509
2021-08-31 15:17:07 -04:00
+------------------------------+-----------------------------------------------------------------------------------+--------------------------+--------------------------------------------------------+
|Category | Feature | Status | Reviews |
+==============================+===================================================================================+==========================+========================================================+
| atomic extension | `'atomic' strictly nested within 'teams' | :good:`prototyped` | D126323 |
| | <https://openmp.llvm.org/docs/openacc/OpenMPExtensions.html#atomicWithinTeams>`_ | | |
+------------------------------+-----------------------------------------------------------------------------------+--------------------------+--------------------------------------------------------+
| device extension | `'ompx_hold' map type modifier | :good:`prototyped` | D106509, D106510 |
| | <https://openmp.llvm.org/docs/openacc/OpenMPExtensions.html#ompx-hold>`_ | | |
+------------------------------+-----------------------------------------------------------------------------------+--------------------------+--------------------------------------------------------+
.. _Discourse forums (Runtimes - OpenMP category): https://discourse.llvm.org/c/runtimes/openmp/35