===========================
OpenMP 16.0.0 Release Notes
===========================

.. warning::
   These are in-progress notes for the upcoming LLVM 16.0.0 release.
   Release notes for previous releases can be found on
   `the Download Page <https://releases.llvm.org/download.html>`_.

Introduction
============

This document contains the release notes for the OpenMP runtime, release 16.0.0.
Here we describe the status of OpenMP, including major improvements
from the previous release. All OpenMP releases may be downloaded
from the `LLVM releases web site <https://llvm.org/releases/>`_.

Non-comprehensive list of changes in this release
=================================================

* OpenMP target offloading is no longer supported on 32-bit Linux systems.
  ``libomptarget`` and plugins will not be built on 32-bit systems.

* OpenMP target offloading plugins have been re-implemented as the NextGen
  plugins. They share an internal unified interface that implements the common
  behavior of all the plugins. Generic optimizations and features can therefore
  be implemented once, in the plugin interface, and every plugin picks them up
  with no additional effort. The new plugins also behave more consistently,
  which simplifies debugging. The NextGen module includes the NVIDIA CUDA, the
  AMDGPU and the GenericELF64bit plugins. These NextGen plugins are enabled by
  default and replace the original ones. The new plugins can be disabled by
  setting the environment variable ``LIBOMPTARGET_NEXTGEN_PLUGINS`` to ``false``
  (default: ``true``).

* Added support for building the OpenMP runtime for Windows on AArch64 and ARM
  with MinGW-based toolchains.

* Made the OpenMP runtime tests run successfully on Windows.

* Improved performance and internalization when compiling in LTO mode using
  ``-foffload-lto``.

* Created the ``nvptx-arch`` and ``amdgpu-arch`` tools to query the user's
  installed GPUs.

* Removed ``CLANG_OPENMP_NVPTX_DEFAULT_ARCH`` in favor of using the new
  ``nvptx-arch`` tool.

* Added support for ``--offload-arch=native`` which queries the user's locally
  available GPU architectures. Now ``-fopenmp --offload-arch=native`` is
  sufficient to target all of the user's GPUs.

* Added ``-fopenmp-target-jit`` to enable JIT support. Only basic JIT
  functionality is supported in this release. A couple of JIT-related
  environment variables were added; they are described on the
  `LLVM/OpenMP runtimes page <https://openmp.llvm.org/design/Runtimes.html#libomptarget-jit-opt-level>`_.

* OpenMP now supports ``-Xarch_host`` to pass compiler arguments only to the
  host compilation.

* Improved ``clang-format`` when used on OpenMP offloading applications.

* The ``f16`` suffix is supported when compiling OpenMP programs if the target
  supports it.

* Python 3 is now required to run the OpenMP LIT tests.

* Fixed a number of bugs and regressions.

* Improved host thread utilization on target nowait regions. Target tasks are
  now continuously re-enqueued by the OpenMP runtime until their device-side
  operations are completed, unblocking the host thread to execute other tasks.

* Target task re-enqueueing can be controlled on a per-thread basis using
  exponential backoff counting. ``OMPTARGET_QUERY_COUNT_THRESHOLD`` defines how
  many target tasks must be re-enqueued before the thread starts blocking on the
  device operations (defaults to 10). ``OMPTARGET_QUERY_COUNT_MAX`` defines the
  maximum value for the per-thread re-enqueue counter (defaults to 5).
  ``OMPTARGET_QUERY_COUNT_BACKOFF_FACTOR`` defines the decrement factor applied
  to the counter when a target task is completed (defaults to 0.5).

* GPU dynamic shared memory (also known as local data share, LDS) can now be
  allocated per kernel via the ``ompx_dyn_cgroup_mem(<Bytes>)`` clause. For an
  example, see https://openmp.llvm.org/design/Runtimes.html#dynamic-shared-memory.

* OpenMP-Opt (run as part of O1/O2/O3) now lowers GPU resource usage more
  effectively and improves performance.

* Added support for record-and-replay of individual OpenMP offload kernels.
  Enabling recording in the host OpenMP target runtime library stores, per
  kernel, the device image, device memory state, and kernel launch information.
  The newly added command-line tool ``llvm-omp-kernel-replay`` replays kernel
  execution. Environment variables control recording/replaying:

  * ``LIBOMPTARGET_RECORDING=<0|1>``: ``0`` disables recording (default), ``1``
    enables recording.
  * ``LIBOMPTARGET_RR_DEVMEM_SIZE=<integer in bytes>``: amount of device memory
    to pre-allocate for storing/loading when recording/replaying (default 64GB).
  * ``LIBOMPTARGET_RR_SAVE_OUTPUT=<0|1>``: ``0`` disables saving device memory
    post-kernel execution (default), ``1`` enables saving device memory
    post-kernel execution (used for verification with
    ``llvm-omp-kernel-replay``).

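As a minimal sketch of the target nowait improvement listed above, the C
function below launches a deferred target task and an independent host task
side by side, so the host thread has other work to pick up while the device
operations are in flight. The function name ``run_overlap`` and the array size
``N`` are illustrative assumptions and not part of the runtime; only standard
OpenMP directives are used.

```c
#define N 1024

/* Hypothetical helper: returns 0 if both the offloaded result and the
   host-side result are correct, 1 otherwise. */
int run_overlap(void) {
  static double a[N], b[N], c[N];
  double host_sum = 0.0;

  for (int i = 0; i < N; ++i) {
    a[i] = (double)i;
    b[i] = 2.0 * i;
  }

  #pragma omp parallel
  #pragma omp single
  {
    /* Deferred target task: the encountering thread does not block here. */
    #pragma omp target teams distribute parallel for nowait \
        map(to: a, b) map(from: c)
    for (int i = 0; i < N; ++i)
      c[i] = a[i] + b[i];

    /* Independent host task that can run while the device is busy. */
    #pragma omp task shared(host_sum)
    for (int i = 0; i < N; ++i)
      host_sum += a[i];

    /* Wait for both the target task and the host task to finish. */
    #pragma omp taskwait
  }

  /* Check the offloaded vector add, then the host-side sum. */
  for (int i = 0; i < N; ++i)
    if (c[i] != 3.0 * i)
      return 1;
  return host_sum == (double)N * (N - 1) / 2 ? 0 : 1;
}
```

Built with ``-fopenmp --offload-arch=native``, the vector addition should be
offloaded when a supported GPU is present. Without ``-fopenmp`` the pragmas are
ignored and the function runs sequentially on the host with the same result.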