For `libomp` on AIX, we build shared object `libomp.so` first and then
archive it into `libomp.a`. Due to a CMake for AIX problem, the install
step also tries to publish `libomp.so`. While we use a script to build
`libomp.a` out-of-tree for Clang and avoided the problem, this chokes
the in-tree build for Flang. The issue will be reported to CMake but
before a fixed CMake is available, this patch ensures only `libomp.a` is
published.
If I use commas to delimit architectures in
`LIBOMPTARGET_DEVICE_ARCHITECTURES`, cmake for the runtimes complains:
```
Unknown GPU architecture 'sm_70,sm_80,sm_90'
```
Semicolons are required instead.
This patch adds support for pause resource with a new enumerator
omp_pause_stop_tool. The expected behavior of this enumerator is
* omp_pause_resource: not allowed
* omp_pause_resource_all: equivalent to omp_pause_hard
Summary:
The 'omp_alloc' function should be callable from a target region. This
patch implemets it by simply calling `malloc` for every non-default
trait value allocator. All the special access modifiers are
unimplemented and return null. The null allocator returns null as the
spec states it should not be usable from the target.
Start __kmp_invoke_microtask with PACBTI in order to identify the
function as a valid branch target. Before returning, SP is
authenticated.
Also add the BTI and PAC markers to z_Linux_asm.S.
With this patch, libomp.so can now be generated with DT_AARCH64_BTI_PLT
when built with -mbranch-protection=standard.
The implementation is based on the code available in compiler-rt.
This fixes several of those when building with MSVC on Windows:
```
[3625/7617] Building CXX object
projects\openmp\runtime\src\CMakeFiles\omp.dir\kmp_affinity.cpp.obj
C:\src\git\llvm-project\openmp\runtime\src\kmp_affinity.cpp(2637):
warning C4062: enumerator 'KMP_HW_UNKNOWN' in switch of enum 'kmp_hw_t'
is not handled
C:\src\git\llvm-project\openmp\runtime\src\kmp.h(628): note: see
declaration of 'kmp_hw_t'
```
* Separate wasi and emscripten as they have different constraints and
abilities
* Emscripten mimics Linux/POSIX by statically linking the musl runtime.
This allow nearly all KMP_OS_LINUX code paths to work correctly. There
are only a few places that need to be adjusted related to dynamic
linking (dl_open)
* Internally link openmp globals
* With CommonLinkage it is needed to emit them in an assembly file, now
they are defined and used within each compilation unit
* With ExternalLinkage they suffer from duplicate symbols during linking
for unnamed globals like reduction/critical
* Interestingly this aligns with the TODO comment above this code
Similar to (de)allocation traces, we can record kernel launch stack
traces and display them in case of an error. However, the AMD GPU plugin
signal handler, which is invoked on memroy faults, cannot pinpoint the
offending kernel. Insteade print `<NUM>`, set via
`OFFLOAD_TRACK_NUM_KERNEL_LAUNCH_TRACES=<NUM>`, many traces. The
recoding/record uses a ring buffer of fixed size (for now 8).
For `trap` errors, we print the actual kernel name, and trace if
recorded.
As a first step towards a GPU sanitizer we now can track allocations and
deallocations in order to report double frees, and other problems during
deallocation.
On non-hyperthreaded machines, the thread id is not always explicit in
the /proc/cpuinfo file. This patch adds a check to ensure the thread ids
are put in.
These are Intel-specific changes for the CPUID leaf 31 method for
detecting machine topology.
* Cleanup known levels usage in x2apicid topology algorithm
Change to be a constant mask of all Intel topology type values.
* Take unknown ids into account when sorting them
If a hardware id is unknown, then put further down the hardware thread
list so it will take last priority when assigning to threads.
* Have sub ids printed out for hardware thread dump
* Add caches to topology
New` kmp_cache_ids_t` class helps create cache ids which are then put
into the topology table after regular topology type ids have been put
in.
* Allow empty masks in place list creation
Have enumeration information and place list generation take into account
that certain hardware threads may be lacking certain layers
* Allow different procs to have different number of topology levels
Accommodates possible situation where CPUID.1F has different depth for
different hardware threads. Each hardware thread has a topology
description which is just a small set of its topology levels. These
descriptions are tracked to see if the topology is uniform or not.
* Change regular ids with logical ids
Instead of keeping the original sub ids that the x2apicid topology
detection algorithm gives, change each id to its logical id which is a
number: [0, num_items - 1]. This makes inserting new layers into the
topology significantly simpler.
* Insert caches into topology
This change takes into account that most topologies are uniform and
therefore can use the quicker method of inserting caches as equivalent
layers into the topology.
The debug assert is meant to check that the index is a valid which means
the runtime needs to check against the size of the array instead of the
number of threads. A free()-ed thread put back in the thread pool may
index into anywhere inside the task team's available array from 0 to
tt_max_threads potentially.
Fixes: #94260
Add the reverse directive which will be introduced in the upcoming
OpenMP 6.0 specification. A preview has been published in [Technical
Report 12](https://www.openmp.org/wp-content/uploads/openmp-TR12.pdf).
---------
Co-authored-by: Alexey Bataev <a.bataev@outlook.com>
This change makes the runtime use new OMPT state and sync kinds
introduced in OpenMP 5.1 in place of the deprecated implicit state and
sync kinds. Events from implicit barriers use different enumerators for
workshare, parallel, and teams.
from PEP8
(https://peps.python.org/pep-0008/#programming-recommendations):
> Comparisons to singletons like None should always be done with is or
is not, never the equality operators.
Co-authored-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com>
from PEP8
(https://peps.python.org/pep-0008/#programming-recommendations):
> Comparisons to singletons like None should always be done with is or
is not, never the equality operators.
Co-authored-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com>
For the case where baseTypeSize does not match the size of T, read the
value into a separate char buffer first. This avoids accessing buf using
two different types.
Fixes https://github.com/llvm/llvm-project/issues/94616.
Use more specific values from `ompt_work_t` to allow the tool identify
the schedule of a worksharing-loop. With this patch, the runtime will
report the schedule chosen by the runtime rather than necessarily the
schedule literally requested by the clause.
E.g., for guided + just one iteration per thread, the runtime would
choose and report static.
Fixes issue #63904
Add support to the runtime for 6.0 spec features that allow num_threads
clause to take a list, and also make use of the strict modifier.
Provides new compiler interface functions for these features.
Sometimes it might be beneficial to spawn more thread blocks instead of
reusing existing for multiple loop iterations.
**Alternatives considered:**
Make `DefaultNumBlocks` settable via an environment variable.
---------
Co-authored-by: Joseph Huber <huberjn@outlook.com>
Without a value set conditional checks like
if(NOT ${OPENMP_STANDALONE_BUILD})
will not be able to evaluate to true.
Fixes issue introduced from PR #93463, which did not allow the OMPT
variable to be propogated up to offload during a runtimes build.
PR #75125 introduced upward propagation of some OMPT-related CMake
variables.
For stand-alone builds this results in a warning that `SCOPE_PARENT` has
no meaning in a top-level directory.
- The first hidden-helper-thread did not trigger thread-begin
- The "detaching" from a target-task when waiting for completion missed
to call task-switch
- Target tasks identified themself as explicit task
Co-authored-by: Kaloyan Ignatov <kaloyan.ignatov@rwth-aachen.de>
Summary:
The `add_llvm_library` interface handles installing the llvm libraries,
however we want to do our own handling. Otherwise, this will install
into the `./lib` location instead of the `./lib/<target>` one.
Avoid calling NULL function pointers in cases where ompt_start_tool
succeeds but those tsan functions
do not really exist.
Fix https://github.com/llvm/llvm-project/issues/93524
---------
Co-authored-by: Joachim <protze@rz.rwth-aachen.de>
Update the folder titles for targets in the monorepository that have not
seen taken care of for some time. These are the folders that targets are
organized in Visual Studio and XCode
(`set_property(TARGET <target> PROPERTY FOLDER "<title>")`)
when using the respective CMake's IDE generator.
* Ensure that every target is in a folder
* Use a folder hierarchy with each LLVM subproject as a top-level folder
* Use consistent folder names between subprojects
* When using target-creating functions from AddLLVM.cmake, automatically
deduce the folder. This reduces the number of
`set_property`/`set_target_property`, but are still necessary when
`add_custom_target`, `add_executable`, `add_library`, etc. are used. A
LLVM_SUBPROJECT_TITLE definition is used for that in each subproject's
root CMakeLists.txt.
This reverts commit 098c6dfa8157681699a71fce9e3d94515e66311f.
This reverts commit 8c718a3a91df4ab68dc3f1ca3887ea730c9aed84.
This reverts commit 4fb02de9d490d0773441aa30124bb4d1272230d3.