We used to perform team reduction on global memory allocated in the
runtime and by clang. This was racy as multiple instances of a kernel,
or different kernels with team reductions, would use the same locations.
Since we now have the kernel launch environment, we can allocate dynamic
memory per-launch, allowing us to move all the state into a non-racy
place.
Fixes: https://github.com/llvm/llvm-project/issues/70249
This patch moves `ArraySizeModifier` before `Type` declaration so that it's complete at `ArrayTypeBitfields` declaration. It's also converted to scoped enum along the way.
We used to pass the min/max threads/teams values through different paths
from the frontend to the middle end. This simplifies the situation by
passing the values once, only when we will create the KernelEnvironment,
which contains the values. At that point we also manifest the metadata,
as appropriate. Some footguns have also been removed, e.g., our target
check is now triple-based, not calling convention-based, as the latter
is dependent on the ordering of operations. The types of the values have
been unified to int32_t.
This patch starts the support for OpenMP kernel language, basically to write
OpenMP target region in SIMT style, similar to kernel languages such as CUDA.
What included in this first patch is the `ompx_bare` clause for `target teams`
directive. When `ompx_bare` exists, globalization is disabled such that local
variables will not be globalized. The runtime init/deinit function calls will
not be emitted. That being said, almost all OpenMP executable directives are
not supported in the region, such as parallel, task. This patch doesn't include
the Sema checks for that, so the use of them is UB. Simple directives, such as
atomic, can be used. We provide a set of APIs (for C, they are prefix with
`ompx_`; for C++, they are in `ompx` namespace) to get thread id, block id, etc.
Please refer to
https://tianshilei.me/wp-content/uploads/llvm-hpc-2023.pdf for more details.
This patch starts the support for OpenMP kernel language, basically to
write
OpenMP target region in SIMT style, similar to kernel languages such as
CUDA.
What included in this first patch is the `ompx_bare` clause for `target
teams`
directive. When `ompx_bare` exists, globalization is disabled such that
local
variables will not be globalized. The runtime init/deinit function calls
will
not be emitted. That being said, almost all OpenMP executable directives
are
not supported in the region, such as parallel, task. This patch doesn't
include
the Sema checks for that, so the use of them is UB. Simple directives,
such as
atomic, can be used. We provide a set of APIs (for C, they are prefix
with
`ompx_`; for C++, they are in `ompx` namespace) to get thread id, block
id, etc.
For more details, you can refer to
https://tianshilei.me/wp-content/uploads/llvm-hpc-2023.pdf.
This patch updates the `OpenMPIRBuilderConfig` structure to hold all
available 'requires' clauses, and it replicates part of the code
generation for the 'requires' registration function from clang in the
`OMPIRBuilder`, to be used with flang.
Porting the rest of features of the clang implementation to the IRBuilder
and sharing it between clang and flang remains for a future patch, due to the
complexity of the logic selecting the attributes of the generated
registration function.
Differential Revision: https://reviews.llvm.org/D147217
Fix for an issue where clang was not adding the address space according
to the data layout, instead was using the default which resulted in a
crash at times. The fix includes changes to the cases of
LargeCapMemAlloc and CGroupMemAlloc where we are setting the AddrSpace
according to the DataLayout.
This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime.
This is a combination and refinement of patch series D116908, D116909, and D116910.
Depend on D155886.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D142569
This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime.
This is a combination and refinement of patch series D116908, D116909, and D116910.
Depend on D155886.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D142569
This patch renames the `OpenMPIRBuilderConfig` flags to reduce confusion over
their meaning. `IsTargetCodegen` becomes `IsGPU`, whereas `IsEmbedded` becomes
`IsTargetDevice`. The `-fopenmp-is-device` compiler option is also renamed to
`-fopenmp-is-target-device` and the `omp.is_device` MLIR attribute is renamed
to `omp.is_target_device`. Getters and setters of all these renamed properties
are also updated accordingly. Many unit tests have been updated to use the new
names, but an alias for the `-fopenmp-is-device` option is created so that
external programs do not stop working after the name change.
`IsGPU` is set when the target triple is AMDGCN or NVIDIA PTX, and it is only
valid if `IsTargetDevice` is specified as well. `IsTargetDevice` is set by the
`-fopenmp-is-target-device` compiler frontend option, which is only added to
the OpenMP device invocation for offloading-enabled programs.
Differential Revision: https://reviews.llvm.org/D154591
The loop directive is a descriptive construct which allows the compiler
flexibility in how it generates code for the directive's associated
loop(s). See OpenMP specification 5.2 [257:8-9].
Codegen added in this patch for the combined 'loop' directives are:
'target teams loop' -> 'target teams distribute parallel for'
'teams loop' -> 'teams distribute parallel for'
'target parallel loop' -> 'target parallel for'
'parallel loop' -> 'parallel for'
NOTE: The implementation of the 'loop' directive itself is unchanged.
Differential Revision: https://reviews.llvm.org/D145823
Partial progress towards replacing uses of CreateElementBitCast, as it
no longer does what its name suggests.
Reviewed By: barannikov88
Differential Revision: https://reviews.llvm.org/D154229
In getNVPTXLaneID(CodeGenFunction &), the value of LaneIDBits is 4294967295 since function call llvm::Log2_32(CGF->getTarget()->getGridValue().GV_Warp_Size) might return 4294967295.
unsigned LaneIDBits =
llvm::Log2_32(CGF.getTarget().getGridValue().GV_Warp_Size);
unsigned LaneIDMask = ~0u >> (32u - LaneIDBits);
The shift amount (32U - LaneIDBits) might be 33, So it has undefined behavior for right shifting by more than 31 bits.
This patch adds an assert to guard the LaneIDBits overflow issue with LaneIDMask value.
Reviewed By: tahonermann
Differential Revision: https://reviews.llvm.org/D151606
Reported by Coverity:
1. Inside "ASTReader.cpp" file, in clang::ASTReader::FindExternalLexicalDecls(clang::DeclContext const *, llvm::function_ref<bool (clang::Decl::Kind)>, llvm::SmallVectorImpl<clang::Decl *> &): Using the auto keyword without an & causes a copy.
auto_causes_copy: Using the auto keyword without an & causes the copy of an object of type pair.
2. Inside "ASTReader.cpp" file, in clang::ASTReader::ReadAST(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, unsigned int, llvm::SmallVectorImpl<clang::ASTReader::ImportedSubmodule> *): Using the auto keyword without an & causes a copy.
auto_causes_copy: Using the auto keyword without an & causes the copy of an object of type DenseMapPair.
3. Inside "CGOpenMPRuntimeGPU.cpp" file, in clang::CodeGen::CGOpenMPRuntimeGPU::emitGenericVarsEpilog(clang::CodeGen::CodeGenFunction &, bool): Using the auto keyword without an & causes a copy.
auto_causes_copy: Using the auto keyword without an & causes the copy of an object of type pair.
4. Inside "ASTWriter.cpp" file, in clang::ASTWriter::WriteHeaderSearch(clang::HeaderSearch const &): Using the auto keyword without an & causes a copy.
auto_causes_copy: Using the auto keyword without an & causes the copy of an object of type UnresolvedHeaderDirective.
Reviewed By: tahonermann
Differential Revision: https://reviews.llvm.org/D149461
This reverts commit 35cfadfbe2decd9633560b3046fa6c17523b2fa9.
It makes a couple of buildbots unhappy because of the following test failures:
- `Transforms/OpenMP/add_attributes.ll'`
- `mapping/declare_mapper_target_data.cpp` on AMDGPU
This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime.
This is a combination and refinement of patch series D116908, D116909, and D116910.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D142569
This patch prefixes omp outlined helpers and reduction funcs
with the original function's name.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D140722
This patch attempts to prefix omp outlined helpers and reduction funcs
with the original function's name.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D140722
When constructing an attribute, the syntactic form was specified
using two arguments: an attribute-independent syntax type and an
attribute-specific spelling index. This patch replaces them with
a single argument.
In most cases, that's done using a new Form class that combines the
syntax and spelling into a single object. This has the minor benefit
of removing a couple of constructors. But the main purpose is to allow
additional information to be stored as well, beyond just the syntax and
spelling enums.
In the case of the attribute-specific Create and CreateImplicit
functions, the patch instead uses the attribute-specific spelling
enum. This helps to ensure that the syntax and spelling are
consistent with each other and with the Attr.td definition.
If a Create or CreateImplicit caller specified a syntax and
a spelling, the patch drops the syntax argument and keeps the
spelling. If the caller instead specified only a syntax
(so that the spelling was SpellingNotCalculated), the patch
simply drops the syntax argument.
There were two cases of the latter: TargetVersion and Weak.
TargetVersionAttrs were created with GNU syntax, which matches
their definition in Attr.td, but which is also the default.
WeakAttrs were created with Pragma syntax, which does not match
their definition in Attr.td. Dropping the argument switches
them to AS_GNU too (to match [GCC<"weak">]).
Differential Revision: https://reviews.llvm.org/D148102
This patch adds the OffloadEntriesInfoManager to the OpenMPIRBuilder, and
allows the OffloadEntriesInfoManager to access the Configuration in the
OpenMPIRBuilder. With the shared Config there is no risk for inconsistencies,
and there is no longer the need for clang to have a separate
OffloadEntriesInfoManager.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D146549
Dynamic memory allows users to allocate fast shared memory when a kernel
is launched. We support a single size for all kernels via the
`LIBOMPTARGET_SHARED_MEMORY_SIZE` environment variable but now we can
control it per kernel invocation, hence allow computed values.
Note: Only the nextgen plugins will allocate memory based on the clause,
the old plugins will silently miscompile.
Differential Revision: https://reviews.llvm.org/D141233
This patch moves some of the logic on how to emit target region functions and
adds emitTargetRegionFunction to the OpenMPIRBuilder. Also the
OpenMPOffloadMandatory flag is added to the config.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D139634
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
This value was added to clang/Basic in D111566, but is only used during
codegen, where we can use the LLVM IR DataLayout instead. I noticed this
because the downstream CHERI targets would have to also set this value
for AArch64/RISC-V/MIPS. Instead of duplicating more information between
LLVM IR and Clang, this patch moves getTargetAddressSpace(QualType T) to
CodeGenTypes, where we can consult the DataLayout.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D138296
This change moves the getName function from clang and moves the separator class
members from CGOpenMPRuntime into OMPIRBuilder. Also enusre all the getters
in the config class are const.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D137725
This patch introudces the OpenMPIRBuilderConfig class which contains various
flags that are needed to lower OMP constructs to LLVM-IR. The purpose is to
keep the flags in one place so they do not have to be passed in every time.
The flags can be set optionally since some uses cases don't rely on functions
that depend on these flags.
Reviewed By: jdoerfert, tschuett
Differential Revision: https://reviews.llvm.org/D138220
This patch moves the createOffloadEntriesAndInfoMetadata to OpenMPIRBuilder,
the createOffloadEntry helper function. The clang specific error handling is
invoked using a callback. This code will also be used by flang in the future.
We use protected visibility for almost everything with offloading. This
is because it provides us with the ability to read things from the host
without the expectation that it will be preempted by a shared library
load, bugs related to this have happened when offloading to the host.
This patch just makes the `exec_mode` global generated for each plugin
have protected visibility.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D135285
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.
Reviewed By: jdoerfert, jhuber6, ABataev
Differential Revision: https://reviews.llvm.org/D102107