The patch contains a basic BumpAllocator for (AMD)GPUs to allow us to
run more tests. The allocator implements `malloc`, both internally and
externally, while we continue to default to the NVIDIA `malloc` when we
target NVIDIA GPUs. Once we have smarter or customizable allocators we
should consider this choice, for now, this allocator is better than
none. It traps if it is out of memory, making it easy to debug. Heap
size is configured via `LIBOMPTARGET_HEAP_SIZE` and defaults to 512MB.
It allows to track allocation statistics via
`LIBOMPTARGET_DEVICE_RTL_DEBUG=8` (together with
`-fopenmp-target-debug=8`). Two tests were added, and one was enabled.
This is the next step towards fixing
https://github.com/llvm/llvm-project/issues/66708
As discussed on the weekly OpenMP meeting on the second of August 2023, the default version
in the OpenMP documentation shoud be changed from OpenMP 5.0 to 5.1.
Differential Revision: https://reviews.llvm.org/D156901
This change adds the option of using different units for blocktimes specified via the KMP_BLOCKTIME environment variable. The parsing of the environment now recognizes units suffixes: ms and us. If a units suffix is not specified, the default unit is ms. Thus default behavior is still the same, and any previous usage still works the same. Internally, blocktime is now converted to microseconds everywhere, so settings that exceed INT_MAX in microseconds are considered "infinite".
kmp_set/get_blocktime are updated to use the units the user specified with KMP_BLOCKTIME, and if not specified, ms are used.
Added better range checking and inform messages for the two time units. Large values of blocktime for default (ms) case (beyond INT_MAX/1000) are no longer allowed, but will autocorrect with an INFORM message.
The delay for determining ticks per usec was lowered. It is now 1 million ticks which was calculated as ~450us based on 2.2GHz clock which is pretty typical base clock frequency on X86:
(1e6 Ticks) / (2.2e9 Ticks/sec) * (1e6 usec/sec) = 454 usec
Really short benchmarks can be affected by longer delay.
Update KMP_BLOCKTIME docs.
Portions of this commit were authored by Johnny Peyton.
Differential Revision: https://reviews.llvm.org/D157646
If the Envar is set to true (default), busy HSA queues will be
actively avoided when assigning a queue to a Stream.
Otherwise, we will initialize a new HSA queue for each requested
Stream, then default to round robin once the set maximum has been
reached.
Reviewed By: jdoerfert, kevinsala
Differential Revision: https://reviews.llvm.org/D156996
This feature was supposed to allow you to trace execution inside of
Libomptarget. However, this never really worked properly. The printing
was always reoganized, only worked for single threads, and pretty much
only told you a handful of things about a runtime library that's an
implementation detail to all users. Despite this, it contributed about
40% of the total filesize of the deviceRTL. This patch simply removes
this functionalit which I think was past due.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D157001
This patch lazily initializes queues/streams/events since their initialization
might come at a cost even if we do not use them.
To further benefit from this, AMDGPU/HSA queue management is moved into the
AMDGPUStreamManager of an AMDGPUDevice. Streams may now use different HSA queues
during their lifetime and identify busy queues.
When a Stream is requested from the resource manager, it will search for and
try to assign an idle queue. During the search for an idle queue the manager
may initialize more queues, up to the set maximum (default: 4).
When no idle queue could be found: resort to round robin selection.
With contributions from Johannes Doerfert <johannes@jdoerfert.de>
Depends on D156245
Reviewed By: kevinsala
Differential Revision: https://reviews.llvm.org/D154523
This command adds an OpenMP offloading specific command line reference. The OpenMP FAQ links to the .rst new file.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D156387
When I was trying to improve the OpenMP documentation, I found that the information in `OpenMP/docs/README.md` did not contain up-to-date information about how to build the OpenMP documentation with Sphinx. When I ran `make
docs-openmp-html`, the command failed because there were a few syntax errors in `openmp/docs/design/Runtimes.rst`. This commit fixes the syntax errors and updates the documentation on building the OpenMP documentation.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D156470
I have added a few things to the OpenMP FAQ which I think were missing. Feel free to suggest some changes. Are there missing options in the offloading command line reference? And what do you think about the section "Q: Why is my
build taking a long time"?
Differential Revision: https://reviews.llvm.org/D156387
This points users to the `libc` documentation and explains the basics of
how it's used inside the runtime.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D155318
It's time to remove the old plugins as the next-gen has already been set
to default in LLVM 16.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D142820
If a combined loop has insufficient parallelism (= low trip count), we
might end up with too few teams/blocks. To counter that we can reduce
the number of threads per team we use. This patch implements a heuristic
and exposes a new environment variable to control the minimum of threads
to be employed in this case.
Issue reported by:
Felipe Cabarcas Jaramillo <cabarcas@udel.edu> (@fel-cab).
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D152014
This reverts commit d763c6e5e2d0a6b34097aa7dabca31e9aff9b0b6.
Adds the patch by @hans from
https://github.com/llvm/llvm-project/issues/62719
This patch fixes the Windows build.
d763c6e5e2d0a6b34097aa7dabca31e9aff9b0b6 reverted the reviews
D144509 [CMake] Bumps minimum version to 3.20.0.
This partly undoes D137724.
This change has been discussed on discourse
https://discourse.llvm.org/t/rfc-upgrading-llvms-minimum-required-cmake-version/66193
Note this does not remove work-arounds for older CMake versions, that
will be done in followup patches.
D150532 [OpenMP] Compile assembly files as ASM, not C
Since CMake 3.20, CMake explicitly passes "-x c" (or equivalent)
when compiling a file which has been set as having the language
C. This behaviour change only takes place if "cmake_minimum_required"
is set to 3.20 or newer, or if the policy CMP0119 is set to new.
Attempting to compile assembly files with "-x c" fails, however
this is workarounded in many cases, as OpenMP overrides this with
"-x assembler-with-cpp", however this is only added for non-Windows
targets.
Thus, after increasing cmake_minimum_required to 3.20, this breaks
compiling the GNU assembly for Windows targets; the GNU assembly is
used for ARM and AArch64 Windows targets when building with Clang.
This patch unbreaks that.
D150688 [cmake] Set CMP0091 to fix Windows builds after the cmake_minimum_required bump
The build uses other mechanism to select the runtime.
Fixes#62719
Reviewed By: #libc, Mordante
Differential Revision: https://reviews.llvm.org/D151344
This is an ongoing series of commits that are reformatting our
Python code. This catches the last of the python files to
reformat. Since they where so few I bunched them together.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: jhenderson, #libc, Mordante, sivachandra
Differential Revision: https://reviews.llvm.org/D150784
This reverts commit 65429b9af6a2c99d340ab2dcddd41dab201f399c.
Broke several projects, see https://reviews.llvm.org/D144509#4347562 onwards.
Also reverts follow-up commit "[OpenMP] Compile assembly files as ASM, not C"
This reverts commit 4072c8aee4c89c4457f4f30d01dc9bb4dfa52559.
Also reverts fix attempt "[cmake] Set CMP0091 to fix Windows builds after the cmake_minimum_required bump"
This reverts commit 7d47dac5f828efd1d378ba44a97559114f00fb64.
They don't convey any useful information and make the documentation
unnecessarily hard to read.
Differential Revision: https://reviews.llvm.org/D149641
Adds HSA timeout hint of 2 seconds to the AMDGPU nextgen-plugin to improve
performance of small kernels.
The HSA runtime may stay in HSA_WAIT_STATE_ACTIVE for up to the timeout
value before switching to HSA_WAIT_STATE_BLOCKED. This can improve
latency from which small kernels can benefit.
The value was determined via experimentation w/ different benchmarks.
The timeout value can be overriden using the environment variable
LIBOMPTARGET_AMDGPU_STREAM_BUSYWAIT with a value in microseconds.
Original author: Greg Rodgers <Gregory.Rodgers@amd.com>
Contributions from: JP Lehr <JanPatrick.Lehr@amd.com>
Differential Revision: https://reviews.llvm.org/D148808
Summary:
At some point we stopped copying this file to the server, but
realistically this is just a static `.pdf` hosted in the LLVM repository
so we can link it directly.
We recently reverted a patch that automatically set the rpath on OpenMP
executables. This was used because the `libomptarget.so` library is only
expected to work with the same version of compiler that will be using
it. This patch adds some documentation for how to get similar behaviour
as before using a clang configuration file.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D147943
Some build bots have not been updated to the new minimal CMake version.
Reverting for now and ping the buildbot owners.
This reverts commit 44c6b905f8527635e49bb3ea97dea315f92d38ec.
This partly undoes D137724.
This change has been discussed on discourse
https://discourse.llvm.org/t/rfc-upgrading-llvms-minimum-required-cmake-version/66193
Note this does not remove work-arounds for older CMake versions, that
will be done in followup patches.
Reviewed By: mehdi_amini, MaskRay, ChuanqiXu, to268, thieta, tschuett, phosek, #libunwind, #libc_vendors, #libc, #libc_abi, sivachandra, philnik, zibi
Differential Revision: https://reviews.llvm.org/D144509
Dynamic memory allows users to allocate fast shared memory when a kernel
is launched. We support a single size for all kernels via the
`LIBOMPTARGET_SHARED_MEMORY_SIZE` environment variable but now we can
control it per kernel invocation, hence allow computed values.
Note: Only the nextgen plugins will allocate memory based on the clause,
the old plugins will silently miscompile.
Differential Revision: https://reviews.llvm.org/D141233
The JIT is a great debugging tool since we can modify the IR manually
before launching it in an existing test case. The new flasks allow to
skip optimizations, to use the exact given IR, as well as to provide a
finished object file. The latter is useful to try out different backend
options and to have complete freedom with pass pipelines.
Documentation is included. Minimal refactoring was performed to make the
second object fit in nicely.