210 Commits

Author SHA1 Message Date
Sergio Afonso
56975b4ecd
[OpenMPIRBuilder] Split calculation of canonical loop trip count, NFC (#127820)
This patch splits off the calculation of canonical loop trip counts from
the creation of canonical loops. This makes it possible to reuse this
logic to, for instance, populate the `__tgt_target_kernel` runtime call
for SPMD kernels.

This feature is used to simplify one of the existing OpenMPIRBuilder
tests.
2025-02-25 10:32:54 +00:00
Akash Banerjee
785a5b4676
[MLIR][OpenMP] Add LLVM translation support for OpenMP UserDefinedMappers (#124746)
This patch adds OpenMPToLLVMIRTranslation support for the OpenMP Declare
Mapper directive.

Since both MLIR and Clang now support custom mappers, I've changed the
respective function params to no longer be optional as well.

Depends on #121005
2025-02-18 17:55:48 +00:00
Abid Qadeer
5f7acf7259
[flang][OMPIRbuilder] Set debug loc on terminator created by splitBB. (#125897)
Fixes #125088.

When splitBB is called with createBranch=true, it creates a branch
instruction in the old block. But no debug loc is set on that branch
instruction. If that is used as InsertPoint in the restoreIP, it has the
potential to set the current debug location to null and subsequent
instruction will come out without a debug location. This caused the
verification check to fail as shown in the bug report.

This PR changes splitBB and spliceBB function to also take a debugLoc
parameter which can be used to set the debug location of the branch
instruction.
2025-02-05 22:35:43 +00:00
Abid Qadeer
e151b1d1f6
[MLIR][OpenMP] Use correct DebugLoc in target construct callbacks. (#125856)
This is same as PR #125106 which somehow is stuck in a "Processing
Update" loop for many hours now. I am going to close that one and push
this one instead.

While working on https://github.com/llvm/llvm-project/issues/125088, I
noticed a problem with the TargetBodyGenCallbackTy and
TargetGenArgAccessorsCallbackTy. The OMPIRBuilder and MLIR side Both
maintain their own IRBuilder and when control goes from one to other, we
have to take care to not use a stale debug location. The code currently
rely on restoreIP to set the insertion point and the debug location. But
if the passes InsertPointTy has an empty block, then the debug location
will not be updated (see SetInsertPoint). This can cause invalid debug
location to be attached to instruction and the verifier will complain.

Similarly when we exit the callback, the debug location of the Builder
is not set to what it was before the callback. This again can cause
verification failures.

This PR resets the debug location at the start and also uses an
InsertPointGuard to restore the debug location at exit.

Both of these problems would have been caught by the unit tests but they
were not setting the debug location of the builder before calling the
createTarget so the problem was hidden. I have updated the tests
accordingly.
2025-02-05 14:59:37 +00:00
Ritanya-B-Bharadwaj
8c36665267
[OpenMP]Initial parsing/sema support for target_device selector set (#118471)
This patch adds initial support for target_device selector set - Section
9.2 (Spec 6.0)
2025-02-05 19:24:24 +05:30
Jeremy Morse
81d18ad864
[NFC][DebugInfo] Make some block-start-position methods return iterators (#124287)
As part of the "RemoveDIs" work to eliminate debug intrinsics, we're
replacing methods that use Instruction*'s as positions with iterators. A
number of these (such as getFirstNonPHIOrDbg) are sufficiently
infrequently used that we can just replace the pointer-returning version
with an iterator-returning version, hopefully without much/any
disruption.

Thus this patch has getFirstNonPHIOrDbg and
getFirstNonPHIOrDbgOrLifetime return an iterator, and updates all
call-sites. There are no concerns about the iterators returned being
converted to Instruction*'s and losing the debug-info bit: because the
methods skip debug intrinsics, the iterator head bit is always false
anyway.
2025-01-27 16:27:54 +00:00
Alex MacLean
07ed8187ac
[OpenMP] Replace nvvm.annotation usage with kernel calling conventions (#122320)
Specifying a kernel with the `ptx_kernel` or `amdgpu_kernel` calling
convention is a more idiomatic and compile-time performant than using
the `nvvm.annoation !"kernel"` metadata.

Transition OMPIRBuilder to use calling conventions for PTX kernels and
no longer emit `nvvm.annoation`. Update OpenMPOpt to work with kernels
specified via calling convention as well as metadata. Update OpenMP
tests to use the calling conventions.
2025-01-24 16:56:10 -08:00
Jeremy Morse
6292a808b3
[NFC][DebugInfo] Use iterator-flavour getFirstNonPHI at many call-sites (#123737)
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; but to ensure some type safety however, we'd like to
have all calls to getFirstNonPHI use the iterator-returning version.

This patch changes a bunch of call-sites calling getFirstNonPHI to use
getFirstNonPHIIt, which returns an iterator. All these call sites are
where it's obviously safe to fetch the iterator then dereference it. A
follow-up patch will contain less-obviously-safe changes.

We'll eventually deprecate and remove the instruction-pointer
getFirstNonPHI, but not before adding concise documentation of what
considerations are needed (very few).

---------

Co-authored-by: Stephen Tozer <Melamoto@gmail.com>
2025-01-24 13:27:56 +00:00
Mats Jun Larsen
7bb949ec61
[IR][unittests] Replace of PointerType::getUnqual(Type) with opaque version (NFC) (#123901)
Follow up to https://github.com/llvm/llvm-project/issues/123569
2025-01-22 18:02:51 +09:00
Sergio Afonso
9bc8828093
[OMPIRBuilder][MLIR] Add support for target 'if' clause (#122478)
This patch implements support for handling the 'if' clause of OpenMP
'target' constructs in the OMPIRBuilder and updates MLIR to LLVM IR
translation of the `omp.target` MLIR operation to make use of this new
feature.
2025-01-15 10:16:19 +00:00
Sergio Afonso
d0b641b7e2
[OMPIRBuilder] Propagate attributes to outlined target regions (#117875)
This patch copies the target-cpu and target-features attributes of
functions containing target regions into the corresponding outlined
function holding the target region.

This mirrors what is currently being done for all other outlined
functions through the `CodeExtractor` in `OpenMPIRBuilder::finalize()`.
2025-01-14 12:35:50 +00:00
Sergio Afonso
fabc443e93
[OMPIRBuilder] Support runtime number of teams and threads, and SPMD mode (#116051)
This patch introduces a `TargetKernelRuntimeAttrs` structure to hold
host-evaluated `num_teams`, `thread_limit`, `num_threads` and trip count
values passed to the runtime kernel offloading call.

Additionally, kernel type information is used to influence target device
code generation and the `IsSPMD` flag is replaced by `ExecFlags`, which
provides more granularity.
2025-01-14 12:34:37 +00:00
Sergio Afonso
27bc6bdaba
[OMPIRBuilder] Introduce struct to hold default kernel teams/threads (#116050)
This patch introduces the `OpenMPIRBuilder::TargetKernelDefaultAttrs`
structure used to simplify passing default and constant values for
number of teams and threads, and possibly other target kernel-related
information in the future.

This is used to forward values passed to `createTarget` to
`createTargetInit`, which previously used a default unrelated set of
values.
2025-01-14 11:08:55 +00:00
Sergio Afonso
b79ed8729b
[OpenMP][OMPIRBuilder] Handle non-failing calls properly (#115863)
The preprocessor definition used to enable asserts and the one that
`llvm::Error` and `llvm::Expected` use to ensure all created instances are
checked are not the same. By making these checks inside of an `assert` in cases
where errors are not expected, certain build configurations would trigger
runtime failures (e.g. `-DLLVM_ENABLE_ASSERTIONS=OFF
-DLLVM_UNREACHABLE_OPTIMIZE=ON`).

The `llvm::cantFail()` function, which was intended for this use case, is used
by this patch in place of `assert` to prevent these runtime failures. In tests,
new preprocessor definitions based on `ASSERT_THAT_EXPECTED` and
`EXPECT_THAT_EXPECTED` are used instead, to avoid silent failures in release
builds.
2025-01-09 10:28:16 +00:00
Kaviya Rajendiran
d3eb65f15d
[MLIR][OpenMP] Lowering aligned clause to LLVM IR for SIMD directive (#119536)
This patch,
- Added a translation support for aligned clause in SIMD directive by passing the alignment details to "llvm.assume" intrinsic.
- Updated the insertion point for llvm.assume intrinsic call in "OMPIRBuilder.cpp".
- Added a check in aligned clause MLIR lowering, to ensure that the alignment value must be a power of 2.
2025-01-03 16:22:38 +05:30
Krzysztof Parzyszek
03cbe42627
[flang][OpenMP] Rework LINEAR clause (#119278)
The OmpLinearClause class was a variant of two classes, one for when the
linear modifier was present, and one for when it was absent. These two
classes did not follow the conventions for parse tree nodes, (i.e.
tuple/wrapper/union formats), which necessitated specialization of the
parse tree visitor.

The new form of OmpLinearClause is the standard tuple with a list of
modifiers and an object list. The specialization of parse tree visitor
for it has been removed.
Parsing and unparsing of the new form bears additional complexity due to
syntactical differences between OpenMP 5.2 and prior versions: in OpenMP
5.2 the argument list is post-modified, while in the prior versions, the
step modifier was a post-modifier while the linear modifier had an
unusual syntax of `modifier(list)`.

With this change the LINEAR clause is no different from any other
clauses in terms of its structure and use of modifiers. Modifier
validation and all other checks work the same as with other clauses.
2024-12-12 12:19:35 -06:00
Kareem Ergawy
f9734b9df1
[mlir][OpenMP] - MLIR to LLVMIR translation support for delayed privatization of allocatables in omp.target ops (#116576)
This PR adds support to translate the `private` clause from MLIR to
LLVMIR when used on allocatables in the context of an `omp.target` op.

This replaces https://github.com/llvm/llvm-project/pull/113208.

Parent PR: https://github.com/llvm/llvm-project/pull/116770. Only the
latest commit is relevant to the PR.
2024-12-12 14:39:58 +01:00
Krzysztof Parzyszek
cdbd22876b
[flang][OpenMP] Use new modifiers in ALLOCATE clause (#117627)
Again, this simplifies the semantic checks and lowering quite a bit.
Update the check for positive alignment to use a more informative
message, and to highlight the modifier itsef, not the whole clause.
Remove the checks for the allocator expression itself being positive:
there is nothing in the spec that says that it should be positive.

Remove the "simple" modifier from the AllocateT template, since both
simple and complex modifiers are the same thing, only differing in
syntax.
2024-12-02 08:11:49 -06:00
Sergio Afonso
d87964de78
[OpenMP][OMPIRBuilder] Error propagation across callbacks (#112533)
This patch implements an approach to communicate errors between the
OMPIRBuilder and its users. It introduces `llvm::Error` and
`llvm::Expected` objects to replace the values returned by callbacks
passed to `OMPIRBuilder` codegen functions. These functions then check
the result for errors when callbacks are called and forward them back to
the caller, which has the flexibility to recover, exit cleanly or dump a
stack trace.

This prevents a failed callback to leave the IR in an invalid state and
still continue the codegen process, triggering unrelated assertions or
segmentation faults. In the case of MLIR to LLVM IR translation of the
'omp' dialect, this change results in the compiler emitting errors and
exiting early instead of triggering a crash for not-yet-implemented
errors. The behavior in Clang and openmp-opt stays unchanged, since
callbacks will continue always returning 'success'.
2024-10-25 11:30:16 +01:00
Youngsuk Kim
2a65f081b6
[llvm][OpenMPIRBuilderTest] Avoid Type::getPointerTo() (NFC) (#111196)
`llvm::Type::getPointerTo()` is to be deprecated & removed soon.
2024-10-04 16:28:08 -04:00
Jay Foad
eb6e7e8f89
[unittests] Use {} instead of std::nullopt to initialize empty ArrayRef (#109388)
Follow up to #109133.
2024-09-21 10:59:50 +01:00
Abid Qadeer
9e08db796b
[OpenMPIRBuilder] Don't drop debug info for target region. (#80692)
When an outlined function is generated for omp target region, a
corresponding DISubprogram was not being generated. This resulted in all
the debug information for the target region being dropped.

This commit adds DISubprogram for the outlined function if there is one
available for the parent function. It also updates the current debug
location so that the right scope is used for the entries in the outlined
function.

There are places in the OpenMPIRBuilder which changes insertion point but
don't update the debug location accordingly. They cause issue when debug info
is enabled. I have fixed a few that I observed to cause issue. But there may be
more and a systematic cleanup may be required.

With this change in place, I can set source line breakpoint in target
region and run to them in debugger.
2024-09-04 10:16:14 +01:00
Sergio Afonso
84b1e59580
[MLIR][OpenMP][OMPIRBuilder] Add lowering support for omp.target_triples (#100156)
This patch modifies MLIR to LLVM IR lowering of the OpenMP dialect to take into
consideration the contents of the `omp.target_triples` module attribute while
generating code for `omp.target` operations.

It adds the `OpenMPIRBuilderConfig::TargetTriples` field and initializes it
using the `amendOperation` flow of the `OpenMPToLLVMIRTranslation` pass. Some
changes are introduced into the `OpenMPIRBuilder` to allow passing the
information about whether a target region is intended to be offloaded from
outside.

The result of this change is that offloading calls are only generated when the
`--offload-arch` or `-fopenmp-targets` options are given to the compiler.
Otherwise, only the host fallback code is generated. This fixes linker errors
currently triggered by `flang-new` if a source file containing a `target`
construct is compiled without any of the aforementioned options.

Several unit tests impacted by these changes, which are intended to check host
code generated for `omp.target` operations, are updated to contain the new
attribute. Without it, no calls to `__tgt_target_kernel` and associated control
flow operations are generated.

Fixes #100209.
2024-08-02 11:58:40 +01:00
Pranav Bhandarkar
5b4e5f8ac6
[OpenMPIRBuilder][Clang][NFC] - Combine emitOffloadingArrays and emitOffloadingArraysArgument in OpenMPIRBuilder (#97088)
This patch introduces a new interface in `OpenMPIRBuilder` that combines
the creation of the so-called offloading pointer arrays and their
subsequent preparation as arguments to the OpenMP runtime library. We
then use this in Clang. 

This is intended to be used in the near future
by other frontends such as Flang when lowering MLIR to LLVMIR.
2024-07-25 16:28:11 -05:00
Krzysztof Parzyszek
a0c590795e
[Frontend][OpenMP] Allow implicit clauses to fail to apply (#100460)
The `linear(x)` clause implies `firstprivate(x)` on the compound
construct if `x` is not an induction variable. With more construct
combinations coming in OpenMP 6.0, the `firstprivate` clause may not be
possible to apply, e.g. in "masked simd".
An additional benefit from this change is that it allows treating leaf
constructs as combined constructs with a single constituent. Otherwise,
a `linear` clause on a lone `simd` construct could imply a
`firstprivate` clause that can't be applied.
2024-07-25 09:20:18 -05:00
harishch4
b4ab52c8e7
[Flang][OpenMP] Lowering Order clause to MLIR (#96730) 2024-06-27 11:58:12 +05:30
Akash Banerjee
6b1c51bc05
[OpenMP] Migrate GPU Reductions CodeGen from Clang to OMPIRBuilder (#80343)
This patch migrates the CGOpenMPRuntimeGPU::emitReduction and related functions to the OpenMPIRBUilder. In future patches MLIR OpenMP translation would be making use of these functions.

Co-authored-by: Jan Leyonberg <jan.leyonberg@amd.com>
2024-06-26 20:18:38 +01:00
Stephen Tozer
d75f9dd1d2 Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497)"
Reverts the above commit, as it updates a common header function and
did not update all callsites:

  https://lab.llvm.org/buildbot/#/builders/29/builds/382

This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.
2024-06-24 18:00:22 +01:00
Stephen Tozer
6481dc5761
[IR][NFC] Update IRBuilder to use InsertPosition (#96497)
Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock
alongside a BasicBlock::iterator, using the fact that we can now get the
parent basic block from the iterator even if it points to the sentinel.
This patch removes the BasicBlock argument from each constructor or call
to setInsertPoint.

This has no functional effect, but later on as we look to remove the
`Instruction *InsertBefore` argument from instruction-creation
(discussed
[here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)),
this will simplify the process by allowing us to deprecate the
InsertPosition constructor directly and catch all the cases where we use
instructions rather than iterators.
2024-06-24 17:27:43 +01:00
Krzysztof Parzyszek
eb88e7c1d9
[Frontend][OpenMP] Remove reduction from allowed clauses for target (#90754)
The "reduction" clause is not allowed on the "target" construct.
2024-05-30 07:56:58 -05:00
Tom Eccles
74a87548e5
[flang][MLIR][OpenMP] make reduction by-ref toggled per variable (#92244)
Fixes #88935

Toggling reduction by-ref broke when multiple reduction clauses were
used. Decisions made for the by-ref status for later clauses could then
invalidate decisions for earlier clauses. For example,

```
reduction(+:scalar,scalar2) reduction(+:array)
```

The first clause would choose by value reduction and generate by-value
reduction regions, but then after this the second clause would force
by-ref to support the array argument. But by the time the second clause
is processed, the first clause has already had the wrong kind of
reduction regions generated.

This is solved by toggling whether a variable should be reduced by
reference per variable. In the above example, this allows only `array`
to be reduced by ref.
2024-05-16 15:27:59 +01:00
Krzysztof Parzyszek
4ec4a8e7fe
[Frontend][OpenMP] Privatizing clauses in construct decomposition (#92176)
Add remaining clauses with the "privatizing" property to construct
decomposition, specifically to the part handling the `allocate` clause.

---------

Co-authored-by: Tom Eccles <t@freedommail.info>
2024-05-15 09:04:07 -05:00
Krzysztof Parzyszek
be7c9e3957 [flang][OpenMP] Decompose compound constructs, do recursive lowering (#90098)
A compound construct with a list of clauses is broken up into individual
leaf/composite constructs. Each such construct has the list of clauses
that apply to it based on the OpenMP spec.

Each lowering function (i.e. a function that generates MLIR ops) is now
responsible for generating its body as described below.

Functions that receive AST nodes extract the construct, and the clauses
from the node. They then create a work queue consisting of individual
constructs, and invoke a common dispatch function to process (lower) the
queue.

The dispatch function examines the current position in the queue, and
invokes the appropriate lowering function. Each lowering function
receives the queue as well, and once it needs to generate its body, it
either invokes the dispatch function on the rest of the queue (if any),
or processes nested evaluations if the work queue is at the end.

Re-application of ca1bd5995f6ed934f9187305190a5abfac049173 with fixes for
compilation errors.
2024-05-13 10:32:16 -05:00
Krzysztof Parzyszek
25a3ba3315 Revert "[flang][OpenMP] Decompose compound constructs, do recursive lowering (#90098)"
It breaks some builds, e.g.
https://lab.llvm.org/buildbot/#/builders/268/builds/13909

This reverts commit ca1bd5995f6ed934f9187305190a5abfac049173.
2024-05-13 08:43:45 -05:00
Krzysztof Parzyszek
ca1bd5995f
[flang][OpenMP] Decompose compound constructs, do recursive lowering (#90098)
A compound construct with a list of clauses is broken up into individual
leaf/composite constructs. Each such construct has the list of clauses
that apply to it based on the OpenMP spec.

Each lowering function (i.e. a function that generates MLIR ops) is now
responsible for generating its body as described below.

Functions that receive AST nodes extract the construct, and the clauses
from the node. They then create a work queue consisting of individual
constructs, and invoke a common dispatch function to process (lower) the
queue.

The dispatch function examines the current position in the queue, and
invokes the appropriate lowering function. Each lowering function
receives the queue as well, and once it needs to generate its body, it
either invokes the dispatch function on the rest of the queue (if any),
or processes nested evaluations if the work queue is at the end.
2024-05-13 08:09:24 -05:00
Krzysztof Parzyszek
4631e7bad6
[Frontend][OpenMP] Add unit tests for getLeafConstructsOrSelf, NFC (#90110) 2024-04-30 11:46:30 -05:00
Krzysztof Parzyszek
d577518d98
[Frontend][OpenMP] Implement getLeafOrCompositeConstructs (#89104)
This function will break up a construct into constituent leaf and
composite constructs, e.g. if OMPD_c_d_e and OMPD_d_e are composite
constructs, then OMPD_a_b_c_d_e will be broken up into the list {OMPD_a,
OMPD_b, OMPD_c_d_e}.
2024-04-24 08:03:36 -05:00
Krzysztof Parzyszek
70d3ddb280
[Frontend][OpenMP] Add functions for checking construct type (#87258)
Implement helper functions to identify leaf, composite, and combined
constructs.
2024-04-23 08:10:40 -05:00
Krzysztof Parzyszek
40137ff0d8
[Frontend][OpenMP] Refactor getLeafConstructs, add getCompoundConstruct (#87247)
Emit a special leaf construct table in DirectiveEmitter.cpp, which will
allow both decomposition of a construct into leafs, and composition of
constituent constructs into a single compound construct (if possible).
The function `getLeafConstructs` is no longer auto-generated, but
implemented in OMP.cpp.

The table contains a row for each directive, and each row has the
following format
`dir_id, num_leafs, leaf1, leaf2, ..., leafN, -1, ...`
The rows are sorted lexicographically with respect to the leaf
constructs. This allows a binary search for the row corresponding to the
given list of leafs.

There is an auxiliary table that for each directive contains the index
of the row corresponding to that directive.

Looking up leaf constructs for a directive `dir_id` is of constant time,
and and consists of two lookups: `LeafTable[Auxiliary[dir_id]]`.
Finding a compound directive given the set of leafs is of time O(logn),
and is roughly represented by
`row = binary_search(LeafTable); return row[0]`.

The functions `getLeafConstructs` and `getCompoundConstruct` use these
lookup methods internally.
2024-04-22 14:41:11 -05:00
Sergio Afonso
3eb0ba34b0
[MLIR][Flang][OpenMP] Make omp.simdloop into a loop wrapper (#87365)
This patch updates the definition of `omp.simdloop` to enforce the
restrictions of a wrapper operation. It has been renamed to `omp.simd`,
to better reflect the naming used in the spec. All uses of "simdloop" in
function names have been updated accordingly.

Some changes to Flang lowering and OpenMP to LLVM IR translation are
introduced to prevent the introduction of compilation/test failures. The
eventual long term solution might be different.
2024-04-17 11:28:30 +01:00
Joseph Huber
470aefb240
[Offload][NFC] Remove omp_ prefix from offloading entries (#88071)
Summary:
These entires are generic for offloading with the new driver now. Having
the `omp` prefix was a historical artifact and is confusing when used
for CUDA. This patch just renames them for now, future patches will
rework the binary format to make it more common.
2024-04-09 15:50:15 -05:00
Akash Banerjee
e9da5f0083
[OpenMP] Fix target data region codegen being omitted for device pass (#85218)
This patch enables the BodyCodeGen callback to still trigger for the
TargetData nested region during the device pass. There maybe Target code
nested within the TargetData region for which this is required.

Also add tests for the same.
2024-03-19 13:04:23 +00:00
Leandro Lupori
64422cf826
[llvm][mlir][OMPIRBuilder] Translate omp.single's copyprivate (#80488)
Use the new copyprivate list from omp.single to emit calls to
__kmpc_copyprivate, during the creation of the single operation
in OMPIRBuilder.

This is patch 4 of 4, to add support for COPYPRIVATE in Flang.
Original PR: https://github.com/llvm/llvm-project/pull/73128
2024-02-28 13:33:42 -03:00
agozillon
dcf4ca558c
[OpenMP][MLIR][OMPIRBuilder] Add a small optional constant alloca raise function pass to finalize, utilised in convertTarget (#78818)
This patch seeks to add a mechanism to raise constant (not ConstantExpr
or runtime/dynamic) sized allocations into the entry block for select
functions that have been inserted into a list for processing. This
processing occurs during the finalize call, after OutlinedInfo regions
have completed. This currently has only been utilised for
createOutlinedFunction, which is triggered for TargetOp generation in
the OpenMP MLIR dialect lowering to LLVM-IR.

This currently is required for Target kernels generated by
createOutlinedFunction to avoid subsequent optimization passes doing
some unintentional malformed optimizations for AMD kernels (unsure if it
occurs for other vendors). If the allocas are generated inside of the
kernel and are not in the entry block and are subsequently passed to a
function this can lead to required instructions being erased or
manipulated in a way that causes the kernel to run into a HSA access
error.

This fix is related to a series of problems found in:
https://github.com/llvm/llvm-project/issues/74603

This problem primarily presents itself for Flang's HLFIR AssignOp
currently, when utilised with a scalar temporary constant on the RHS and
a descriptor type on the LHS. It will generate a call to a runtime
function, wrap the RHS temporary in a newly allocated descriptor (an
llvm struct), and pass both the LHS and RHS descriptor into the runtime
function call. This will currently be
embedded into the middle of the target region in the user entry block,
which means the allocas are also embedded in the middle, which seems to
pose
issues when later passes are executed. This issue may present itself in
other HLFIR operations or unrelated operations that generate allocas as
a by product, but for the moment, this one test case is the only
scenario I've found this problem.

Perhaps this is not the appropriate fix, I am very open to other
suggestions, I've tried a few others (at varying levels of the
flang/mlir compiler flow), but this one is the smallest and least
intrusive change set. The other two, that come to mind (but I've not
fully looked into, the former I tried a little with blocks but it had a
few issues I'd need to think through):

- Having a proper alloca only block (or region) generated for TargetOps
that we could merge into the entry block that's generated by
convertTarget's createOutlinedFunction.
- Or diverging a little from Clang's current target generation and using
the CodeExtractor to generate the user code as an outlined function
region invoked from the kernel we make, with our kernel arguments passed
into it. Similar to the current parallel generation. I am not sure how
well this would intermingle with the existing parallel generation though
that's layered in.

Both of these methods seem like quite a divergence from the current
status quo, which I am not entirely sure is merited for the small test
this change aims to fix.
2024-02-23 22:59:41 +01:00
Joseph Huber
cc374d8056
[OpenMP] Remove register_requires global constructor (#80460)
Summary:
Currently, OpenMP handles the `omp requires` clause by emitting a global
constructor into the runtime for every translation unit that requires
it. However, this is not a great solution because it prevents us from
having a defined order in which the runtime is accessed and used.

This patch changes the approach to no longer use global constructors,
but to instead group the flag with the other offloading entires that we
already handle. This has the effect of still registering each flag per
requires TU, but now we have a single constructor that handles
everything.

This function removes support for the old `__tgt_register_requires` and
replaces it with a warning message. We just had a recent release, and
the OpenMP policy for the past four releases since we switched to LLVM
is that we do not provide strict backwards compatibility between major
LLVM releases now that the library is versioned. This means that a user
will need to recompile if they have an old binary that relied on
`register_requires` having the old behavior. It is important that we
actively deprecate this, as otherwise it would not solve the problem of
having no defined init and shutdown order for `libomptarget`. The
problem of `libomptarget` not having a define init and shutdown order
cascades into a lot of other issues so I have a strong incentive to be
rid of it.

It is worth noting that the current `__tgt_offload_entry` only has space
for a 32-bit integer here. I am planning to overhaul these at some point
as well.
2024-02-21 11:33:32 -06:00
Kazu Hirata
5c9d82de6b [llvm] Use StringRef::{starts,ends}_with (NFC)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-13 22:46:02 -08:00
Dominik Adamski
bb4484d41e
[OpenMPIRBuilder] Add support for target workshare loops (#73360)
The workshare loop for target region uses the new OpenMP device runtime.
The code generation scheme for the new device runtime is presented
below:

Input code:
```
workshare-loop {
  loop-body
}
```

Output code:
helper function which represents loop body:
```
function-loop-body(counter, loop-body-args) {
  loop-body
}
```
workshare-loop is replaced by the proper device runtime call:
```
call __kmpc_new_worksharing_rtl(function-loop-body, loop-body-args,
                                                     loop-tripcount, ...)
```
This PR uses the new device runtime functions which were added in PR:
https://github.com/llvm/llvm-project/pull/73225
2023-12-06 09:47:09 +01:00
Fangrui Song
dd3184c30f [unittest,examples] Replace uses of IRBuilder::getInt8PtrTy with getPtrTy. NFC 2023-11-27 08:29:13 -08:00
Paulo Matos
7b9d73c2f9
[NFC] Remove Type::getInt8PtrTy (#71029)
Replace this with PointerType::getUnqual().
Followup to the opaque pointer transition. Fixes an in-code TODO item.
2023-11-07 17:26:26 +01:00
Dominik Adamski
2cce0f6c57
[OpenMP][OMPIRBuilder] Add support to omp target parallel (#67000)
Added support for LLVM IR code generation which is used for handling omp
target parallel code. The call for __kmpc_parallel_51 is generated and
the parallel region is outlined to separate function.

The proper setup of kmpc_target_init mode is not included in the commit.
It is assumed that the SPMD mode for target initialization is properly
set by other codegen functions.
2023-11-06 11:44:00 +01:00