1902 Commits

Author SHA1 Message Date
Valentin Clement (バレンタイン クレメン)
f4d87c42a6
[flang][cuda] Add asyncId to allocate entry point (#134947) 2025-04-09 10:52:02 -07:00
vdonaldson
8ebc98c3b0
[flang] Update IEEE_SUPPORT_FLAG implementation (#134937)
Optional argument X in an IEEE_SUPPORT_FLAG(FLAG, X) call may be an
array.
2025-04-08 21:20:21 -04:00
Tom Eccles
4c09ae0b2e
[flang][OpenMP] Lowering for CANCEL and CANCELLATIONPOINT (#134248)
These will still hit TODOs in OpenMPToLLVMIRConversion.cpp
2025-04-08 10:29:18 +01:00
Tom Eccles
446d4f51eb
[flang][OpenMP][Lower] fix statement context cleanup insertion point (#133891)
The statement context is used for lowering clauses for openmp operations
using generalised helpers from flang lowering. The statement context
stores closures which generate code for cleaning up temporary values
generated by the lowering helper. These closures are run when the
statement construct is destroyed. Keeping the statement context local to
the clause or operation being lowered without any special handling was
not correct because any cleanup code would be generated at the insertion
point when that statement context went out of scope (which would in
general be inside of the newly created container operation). It would be
better to generate the cleanup code after the newly created operation
(clause processing is synchronous even for deferred tasks).

Currently supported clauses are mostly populated with simple scalar
values that require no cleanup. Even the simple array sections added by
#132994 needed no cleanup because indexing the right values of the array
did not create any temporaries. Supporting array sections with vector
indexing will generate hlfir.destroy operations for cleanup. This patch
fixes where those will be created. Those hlfir.destroy operations don't
generate any FIR (or LLVM) code, but the issue still exists
theoretically.

I wasn't able to find any clauses which have any cleanup to use to test
this PR. It is probably NFC for the current lowering. This will be
tested in [the PR adding vector subscripting of array
sections](https://github.com/llvm/llvm-project/pull/133892).
2025-04-08 10:27:27 +01:00
vdonaldson
c1c0d551ba
[flang] Non-type-bound defined IO lowering for an array of derived type (#134667)
Update Non-type-bound IO lowering to call OutputDerivedType for an array
of derived type (rather than OutputDescriptor).
2025-04-07 13:24:48 -04:00
Leandro Lupori
01ec74dfd0
[flang][OpenMP] Fix copyprivate of procedure pointers (#134292)
Just modify the assert to consider fir::BoxProcType as valid. No
other changes are needed.

Fixes #131549
2025-04-07 13:18:07 -03:00
Michael Klemm
7fa388d77b
[Flang][OpenMP] Fix bug with default(none) and host-assoc threadprivate variable (#134122)
When a host associated `threadprivate` variable was used in a parallel
region with `default(none)` in an internal subroutine was failing,
because the compiler did not properly determine that the variable was
pre-determined `threadprivate` and thus should not have been reported as
missing a DSA.
2025-04-07 17:20:17 +02:00
Zhen Wang
8f0d8d28cc
Delete duplicated hlfir.declare op of induction variables of do concurrent when inside cuf kernel directive. (#134467)
Delete duplicated creation of hlfir.declare op of do concurrent
induction variables when inside cuf kernel directive.
Obtain the correct hlfir.declare op generated from bindSymbol, and add
it to ivValues.
2025-04-06 19:31:09 -07:00
Valentin Clement (バレンタイン クレメン)
24dfcc0c02
[flang][cuda] Use the nvvm.vote.sync op for all and any (#134433)
NVVM operations are now available for all and any as well. Use the op
and clean up the generation function to handle all the 3 vote sync
kinds.
2025-04-04 13:45:03 -07:00
Eugene Epshteyn
61af05fe82
[flang] Add runtime and lowering implementation for extended intrinsic PUTENV (#134412)
Implement extended intrinsic PUTENV, both function and subroutine forms.
Add PUTENV documentation to flang/docs/Intrinsics.md. Add functional and
semantic unit tests.
2025-04-04 16:26:08 -04:00
Valentin Clement (バレンタイン クレメン)
cd2f85a24b
[mlir][NVVM] Add ops for vote all and any sync (#134309)
Add operations for `nvvm.vote.all.sync` and `nvvm.vote.any.sync`
intrinsics similar to `nvvm.vote.ballot.sync`.
2025-04-04 11:06:10 -07:00
Asher Mancinelli
85fd83ed49
[flang][nfc] Use llvm memmove intrinsic over regular call (#134294)
Follow up to #134170.

We should be using the LLVM intrinsics instead of plain fir.calls when
we can. Existing code creates a declaration for the llvm intrinsic and
a regular fir.call, which makes it hard for consumers of the IR to find
all the intrinsic calls.
2025-04-04 06:13:30 -07:00
Kareem Ergawy
6333f8457c
[flang][OpenMP] Move reductions from loop to teams when loop is mapped to distribute (#132920)
Follow-up to #132003, in particular, see
https://github.com/llvm/llvm-project/pull/132003#issuecomment-2739701936.

This PR extends reduction support for `loop` directives. Consider the
following scenario:
```fortran
subroutine bar
  implicit none
  integer :: x, i

  !$omp teams loop reduction(+: x)
  DO i = 1, 5
    call foo()
  END DO
end subroutine
```
Note the following:
* According to the spec, the `reduction` clause will be attached to
`loop` during earlier stages in the compiler.
* Additionally, `loop` cannot be mapped to `distribute parallel for` due
to the call to a foreign function inside the loop's body.
* Therefore, `loop` must be mapped to `distribute`.
* However, `distribute` does not have `reduction` clauses.
* As a result, we have to move the `reduction`s from the `loop` to its
parent `teams` directive, which is what is done by this PR.
2025-04-04 06:20:51 +02:00
Andre Kuhlenschmidt
b11eece1bb
[flang][intrinsics] Implement the time intrinsic (#133823)
This PR implements the nonstandard intrinsic time.

In addition to running the unit tests, I also double checked that the
example code works by manually compiling and running it.
2025-04-03 15:33:40 -07:00
Andre Kuhlenschmidt
85fdab33b0
[flang][intrinsic] add nonstandard intrinsic unlink (#134162)
This PR adds the intrinsic `unlink` to flang. 

## Test plan
- Added two codegen unit tests and ensured flang-check continues to
pass.
- Manually compiled and ran the example from the documentation.
2025-04-03 14:33:53 -07:00
Valentin Clement (バレンタイン クレメン)
fb6f60ddc5
[flang][cuda][NFC] Use NVVM VoteBallotOp (#134307)
`llvm.nvvm.vote.ballot.sync` has its own operation so use it in
lowering.
2025-04-03 14:19:31 -07:00
Valentin Clement (バレンタイン クレメン)
de40f6101d
[flang][cuda][NFC] Use NVVM op for match all (#134303) 2025-04-03 14:19:21 -07:00
Valentin Clement (バレンタイン クレメン)
7288f1bc32
[flang][cuda] Use nvvm operation for match any (#134283)
The string used for intrinsic was not the correct one
"llvm.nvvm.match.any.sync.i32p". There was an extra `p` at the end.

Use the NVVM operation instead so we don't duplicate it.
2025-04-03 12:08:30 -07:00
Slava Zakharin
3f6ae3f0a8
[flang] Added driver options for arrays repacking. (#134002)
Added options:
  * -f[no-]repack-arrays
  * -f[no-]stack-repack-arrays
  * -frepack-arrays-contiguity=whole/innermost
2025-04-03 10:43:28 -07:00
Valentin Clement (バレンタイン クレメン)
3e59ff27e5
[flang][cuda] Fix pred type for vote functions (#134166) 2025-04-03 10:33:09 -07:00
Asher Mancinelli
d7d91500b6
[flang][nfc] Initial changes needed to use llvm intrinsics instead of regular calls (#134170)
Flang uses `fir.call <llvm intrinsic>` in a few places. This means
consumers of the IR need to strcmp every fir.call if they want to find a
particular LLVM intrinsic.
Emit LLVM memcpy intrinsics instead.
2025-04-03 08:37:40 -07:00
Sergio Afonso
18dd299fb1
[Flang][MLIR][OpenMP] Host-evaluation of omp.loop bounds (#133908)
This patch updates Flang lowering and kernel flags identification in
MLIR so that loop bounds on `target teams loop` constructs are evaluated
on the host, making the trip count available to the corresponding
`__tgt_target_kernel` call emitted for the target region.

This is necessary in order to properly execute these constructs as
`target teams distribute parallel do`.

Co-authored-by: Kareem Ergawy <kareem.ergawy@amd.com>
2025-04-03 15:06:19 +01:00
Valentin Clement (バレンタイン クレメン)
db21ae7803
[flang][cuda] Support any_sync and ballot_sync (#134135) 2025-04-02 14:26:09 -07:00
Krzysztof Parzyszek
564e04b703
[flang][OpenMP] Use function symbol on DECLARE TARGET (#134107)
Consider:
```
function foo()
  !$omp declare target(foo) ! This `foo` was a function-result symbol
  ...
end
```
When resolving symbols, for this case use the symbol corresponding to
the function instead of the symbol corresponding to the function result.

Currently, this will result in an error:
```
error: A variable that appears in a DECLARE TARGET directive must be
declared in the scope of a module or have the SAVE attribute, either
explicitly or implicitly
```
2025-04-02 15:16:33 -05:00
Juan Manuel Martinez Caamaño
beae0e9f1a
[AMDGPU] Use a target feature to enable __builtin_amdgcn_global_load_lds on gfx9/10 (#133055)
This patch introduces the `vmem-to-lds-load-insts` target feature, which
can be used to enable builtins `__builtin_amdgcn_global_load_lds` and
`__builtin_amdgcn_raw_ptr_buffer_load_lds` on platforms which have this
feature.

This feature is only available on gfx9/10.

A limitation of using a common target feature for both builtins is that
we could have made `__builtin_amdgcn_raw_ptr_buffer_load_lds` available
on gfx6,7,8.
2025-04-02 20:00:09 +02:00
Jean-Didier PAILLEUX
c309abd925
[flang] Implement !DIR$ NOVECTOR and !DIR$ NOUNROLL[_AND_JAM] (#133885)
Hi,
This patch implements support for the following directives :
- `!DIR$ NOUNROLL_AND_JAM` to disable unrolling and jamming on a DO
LOOP.
- `!DIR$ NOUNROLL` to disable unrolling on a DO LOOP.
- `!DIR$ NOVECTOR` to disable vectorization on a DO LOOP.
2025-04-02 14:30:01 +02:00
Tom Eccles
9b2fd1a6ec
[flang][OpenMP] Bump default OpenMP version to 3.1 (#133745)
Precise OpenMP standards support information is being documented in
#132707

Flang now has good support for OpenMP Version 3.1 and earlier.
2025-04-02 10:43:48 +01:00
Valentin Clement (バレンタイン クレメン)
ae8dd63681
[flang][cuda] Add interface and lowering for all_sync (#134001) 2025-04-01 17:59:11 -07:00
Jean-Didier PAILLEUX
513a91a5f1
[flang/flang-rt] Implement PERROR intrinsic form GNU Extension (#132406)
Add the implementation of the `PERROR(STRING) ` intrinsic from the GNU
Extension to prints on the stderr a newline-terminated error message
corresponding to the last system error prefixed by `STRING`.
(https://gcc.gnu.org/onlinedocs/gfortran/PERROR.html)
2025-04-01 15:47:54 +02:00
Tom Eccles
e17d864f55
[flang][OpenMP][Lower] lower array subscripts for task depend (#132994)
The OpenMP standard says that all dependencies in the same set of
inter-dependent tasks must be non-overlapping. This simplification means
that the OpenMP only needs to keep track of the base addresses of
dependency variables. This can be seen in kmp_taskdeps.cpp, which stores
task dependency information in a hash table, using the base address as a
key.

This patch generates a rebox operation to slice boxed arrays, but only
the box data address is used for the task dependency. The extra box is
optimized away by LLVM at O3.

Vector subscripts are TODO (I will address in my next patch).

This also fixes a bug for ordinary subscripts when the symbol was mapped
to a box:

Fixes #132647
2025-04-01 10:26:14 +01:00
Jean-Didier PAILLEUX
bae3577002
[flang] Define ERF, ERFC and ERFC_SCALED intrinsics with Q and D prefix (#125217)
`ERF`, `ERFC` and `ERFC_SCALED` intrinsics prefixed by `Q` and `D` are
missing. Codes such as `CP2K`(https://github.com/cp2k/cp2k) and
`TurboRVB`(https://github.com/sissaschool/turborvb) use these intrinsics
just like defined in the GNU standard and here:
https://www.ibm.com/docs/fr/xl-fortran-aix/16.1.0?topic=reference-intrinsic-procedures
These intrinsics are based on the existing intrinsics but apply a
restriction on the type kind.

- `DERF`, `DERFC` and `DERFC_SCALED` are for double précision only.
- `QERF`, `QERFC` and `QERFC_SCALED` are for quad précision only.
2025-04-01 08:07:26 +02:00
Thirumalai Shaktivel
091dcb8fc2
[Flang] Make a private copy for the common block variables in copyin clause (#111359)
Fixes: https://github.com/llvm/llvm-project/issues/82949
2025-04-01 11:35:44 +05:30
Thirumalai Shaktivel
374a5bea52
[Flang][OpenMP] Add PointerAssociateScalar to Cray Pointer used in the DSA (#133232)
Issue: Cray Pointer is not associated to Cray Pointee, leading to
Segmentation fault

Fix: GetUltimate, retrieves the base symbol in the current scope, which
gets passed all the references and returns the original symbol

---------

Co-authored-by: Michael Klemm <michael.klemm@amd.com>
2025-03-29 15:39:12 +01:00
swatheesh-mcw
fe30cf18ab
Revert "Revert "[flang][openmp] Adds Parser and Semantic Support for Interop Construct, and Init and Use Clauses."" (#132343)
Reverts llvm/llvm-project#132005
2025-03-28 15:21:52 +00:00
Nick Sarnie
48b7530273
[clang][flang][Triple][llvm] Add isOffload function to LangOpts and isGPU function to Triple (#126956)
I'm adding support for SPIR-V, so let's consolidate these checks.

---------

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
2025-03-28 14:19:20 +00:00
Joseph Huber
772173f548
[Clang][AMDGPU] Remove special handling for COV4 libraries (#132870)
Summary:
When we were first porting to COV5, this lead to some ABI issues due to
a change in how we looked up the work group size. Bitcode libraries
relied on the builtins to emit code, but this was changed between
versions. This prevented the bitcode libraries, like OpenMP or libc,
from being used for both COV4 and COV5. The solution was to have this
'none' functionality which effectively emitted code that branched off of
a global to resolve to either version.

This isn't a great solution because it forced every TU to have this
variable in it. The patch in
https://github.com/llvm/llvm-project/pull/131033 removed support for
COV4 from OpenMP, which was the only consumer of this functionality.
Other users like HIP and OpenCL did not use this because they linked the
ROCm Device Library directly which has its own handling (The name was
borrowed from it after all).

So, now that we don't need to worry about backward compatibility with
COV4, we can remove this special handling. Users can still emit COV4
code, this simply removes the special handling used to make the OpenMP
device runtime bitcode version agnostic.
2025-03-28 07:35:16 -05:00
Eugene Epshteyn
2c8e26081f
[flang] Add HOSTNM runtime and lowering intrinsics implementation (#131910)
Implement GNU extension intrinsic HOSTNM, both function and subroutine
forms. Add HOSTNM documentation to `flang/docs/Intrinsics.md`. Add
lowering and semantic unit tests.

(This change is modeled after GETCWD implementation.)
2025-03-25 13:17:17 -04:00
vdonaldson
92e0560347
[flang] ieee_denorm (#132307)
Add support for the nonstandard ieee_denorm exception for real kinds 3,
4, 8 on x86 processors.
2025-03-25 13:02:43 -04:00
Leandro Lupori
ef56f4b5a0
[flang][OpenMP] Fix reduction of arrays with non-default lower bounds (#132228)
Using LoopNest's indices with ShapeShifts that have non-default
lower bounds results in accesses to incorrect array elements.
To avoid having to adjust each index, a ShapeShift with default
lower bounds can be used instead.

Fixes #131751
2025-03-24 09:48:41 -03:00
Kareem Ergawy
4fa5ab382e
[flang][OpenMP] Skip multi-block teams regions when processing loop directives (#132687)
Fixes a regression when the generic `loop` directive conversion pass
encounters a multi-block `teams` region. At the moment, we skip such
regions.
2025-03-24 07:43:11 -05:00
Kareem Ergawy
c031579cb7
[flang][OpenMP] Hoist reduction info from nested loop ops to parent teams ops (#132003)
Fixes a bug in reductions when converting `teams loop` constructs with
`reduction` clauses.

According to the spec (v5.2, p340, 36):

```
The effect of the reduction clause is as if it is applied to all leaf
constructs that permit the clause, except for the following constructs:
* ....
* The teams construct, when combined with the loop construct.
```

Therefore, for a combined directive similar to: `!$omp teams loop
reduction(...)`, the earlier stages of the compiler assign the
`reduction` clauses only to the `loop` leaf and not to the `teams` leaf.

On the other hand, if we have a combined construct similar to: `!$omp
teams distribute parallel do`, the `reduction` clauses are assigned both
to the `teams` and the `do` leaves. We need to match this behavior when
we convert `teams` op with a nested `loop` op since the target set of
constructs/ops will be incorrect without moving the reductions up to the
`teams` op as well.

This PR introduces a pattern that does exactly this. Given the following
input:
```
omp.teams {
  omp.loop reduction(@red_sym %red_op -> %red_arg : !fir.ref<i32>) {
    omp.loop_nest ... {
      ...
    }
  }
}
```
this pattern updates the `omp.teams` op in-place to:
```
omp.teams reduction(@red_sym %red_op -> %teams_red_arg : !fir.ref<i32>) {
  omp.loop reduction(@red_sym %teams_red_arg -> %red_arg : !fir.ref<i32>) {
    omp.loop_nest ... {
      ...
    }
  }
}
```

Note the following:
* The nested `omp.loop` is not rewritten by this pattern, this happens
through `GenericLoopConversionPattern`.
* The reduction info are cloned from the nested `omp.loop` op to the
parent `omp.teams` op.
* The reduction operand of the `omp.loop` op is updated to be the
**new** reduction block argument of the `omp.teams` op.
2025-03-21 14:12:15 -05:00
jeanPerier
42d49a1a5b
[flang] fix openmp tests after #132261 (#132391)
I missed these tests in my test update after #132261 because they are
under `! REQUIRES: openmp_runtime`.
Broke build bots:
https://lab.llvm.org/buildbot/#/builders/50/builds/11789
2025-03-21 14:04:37 +01:00
jeanPerier
44261dae5b
[flang][NFC] use hlfir.declare first result when both results are raw pointers (#132261)
Currently, the helpers to get fir::ExtendedValue out of hlfir::Entity
use hlfir.declare second result (`#1`) in most cases. This is because
this result is the same as the input and matches what FIR was getting
before lowering to HLFIR.

But this creates odd situations when both hlfir.declare are raw pointers
and either result ends-up being used in the IR depending on whether the
code was generated by a helper using fir::ExtendedValue, or via "pure
HLFIR" helpers using the first result.

This will typically prevent simple CSE and easy identification that two
operation (e.g load/store) are touching the exact same memory location
without using alias analysis or "manual detection" (looking for common
hlfir.declare defining op).

Hence, when hlfir.declare results are both raw pointers, use `#0` when
producing `fir::ExtendedValue`.
When `#0` is a fir.box, keep using `#1` because these are not the same. 
The only code change is in HLFIRTools.cpp and is pretty small, but there
is a big test fallout of `#1` to `#0`.
2025-03-21 11:41:04 +01:00
Sergio Afonso
b231f6f862
[MLIR][OpenMP] Improve omp.map.info verification (#132066)
This patch makes the `map_type` and `map_capture_type` arguments of the
`omp.map.info` operation required, which was already an invariant being
verified by its users via `verifyMapClause()`. This makes it clearer, as
getters no longer return misleading `std::optional` values.

Checks for the `mapper_id` argument are moved to a verifier for the
operation, rather than being checked by users.

Functionally NFC, but not marked as such due to a reordering of
arguments in the assembly format of `omp.map.info`.
2025-03-20 15:48:45 +00:00
Slava Zakharin
2c91f10362
[flang] Fixed repacking for TARGET and INTENT(OUT) (#131972)
TARGET dummy arrays can be accessed indirectly, so it is unsafe
to repack them.
INTENT(OUT) dummy arrays that require finalization on entry
to their subroutine must be copied-in by `fir.pack_arrays`.

In addition, based on my testing results, I think it will be useful
to document that `LOC` and `IS_CONTIGUOUS` will have different values
for the repacked arrays. I still need to decide where to document
this, so just added a note in the design doc for the time being.
2025-03-19 17:12:32 -07:00
Peter Klausler
a0ae88bffe
[flang] Disable 3 x86-64 tests on non-x86-64 (#132088)
Now that COMPLEX(KIND=10) is properly disabled where 80-bit
floating-point types are not available, three tests that were not
peculiar to x86-64 are failing on other targets. Make them specific to
x86-64.
2025-03-19 12:52:59 -07:00
Peter Klausler
9f284e1784
[flang] Disabling REAL kinds must also disable their COMPLEX (#131353)
When disabling kinds of REAL in the TargetCharacteristics, one must also
disable the corresponding kinds of COMPLEX.

Fixes https://github.com/llvm/llvm-project/issues/131088.
2025-03-19 12:00:51 -07:00
Kiran Chandramohan
96b112fb61
Revert "[flang][openmp] Adds Parser and Semantic Support for Interop Construct, and Init and Use Clauses." (#132005)
Reverts llvm/llvm-project#120584

Reverting due to CI failure
https://lab.llvm.org/buildbot/#/builders/157/builds/22946
2025-03-19 11:13:52 +00:00
swatheesh-mcw
ee8a759bfb
[flang][openmp] Adds Parser and Semantic Support for Interop Construct, and Init and Use Clauses. (#120584)
Adds Parser and Semantic Support for the below construct and clauses:
- Interop Construct
- Init Clause
- Use Clause

Note:
The other clauses supported by Interop Construct such as Destroy, Use,
Depend and Device are added already.
2025-03-19 10:49:17 +00:00
Tom Eccles
e7c6e3557b
[flang][OpenMP] Fix threadprivate pointer variable in common block (#131888)
Fixes #112538

The problem was that the host associated symbol for the threadprivate
variable doesn't have all of the symbol attributes (e.g. POINTER). This
caused the lowering code to generate the wrong type, eventually hitting
an assertion.
2025-03-19 10:12:52 +00:00