7511 Commits

Author SHA1 Message Date
Valentin Clement (バレンタイン クレメン)
18ff8df958
[flang][cuda] Register managed variables with double descriptor (#134444)
Allocatable or pointer module variables with the CUDA managed attribute
are defined with a double descriptor. One on the host and one on the
device. Only the data pointed to by the descriptor will be allocated in
managed memory.
Allow the registration of any allocatable or pointer module variables
like device or constant.
2025-04-04 14:38:01 -07:00
Valentin Clement (バレンタイン クレメン)
24dfcc0c02
[flang][cuda] Use the nvvm.vote.sync op for all and any (#134433)
NVVM operations are now available for all and any as well. Use the op
and clean up the generation function to handle all the 3 vote sync
kinds.
2025-04-04 13:45:03 -07:00
Eugene Epshteyn
61af05fe82
[flang] Add runtime and lowering implementation for extended intrinsic PUTENV (#134412)
Implement extended intrinsic PUTENV, both function and subroutine forms.
Add PUTENV documentation to flang/docs/Intrinsics.md. Add functional and
semantic unit tests.
2025-04-04 16:26:08 -04:00
Valentin Clement (バレンタイン クレメン)
cd2f85a24b
[mlir][NVVM] Add ops for vote all and any sync (#134309)
Add operations for `nvvm.vote.all.sync` and `nvvm.vote.any.sync`
intrinsics similar to `nvvm.vote.ballot.sync`.
2025-04-04 11:06:10 -07:00
Peter Klausler
5942f0269e
[flang] Preserve compiler directives in -E output (#133959)
No longer require -fopenmp or -fopenacc with -E, unless specific version
number options are also required for predefined macros. This means that
most source can be preprocessed with -E and then later compiled with
-fopenmp, -fopenacc, or neither.

This means that OpenMP conditional compilation lines (!$) are also
passed through to -E output. The tricky part of this patch was dealing
with the fact that those conditional lines can also contain regular
Fortran line continuation, and that now has to be deferred when !$ lines
are interspersed.
2025-04-04 09:49:57 -07:00
Peter Klausler
1bef59c9db
[flang][preprocessor] Further macro replacement of continued identifiers (#134302)
The preprocessor can perform macro replacement within identifiers when
they are split up with Fortran line continuation, but is failing to do
macro replacement on a continued identifier when none of its parts are
replaced.
2025-04-04 08:44:22 -07:00
Peter Klausler
507ce46b6f
[flang][preprocessor] Directive continuation must skip empty macros (#134149)
When a compiler directive continuation line starts with keyword macro
names that have empty expansions, skip them.
2025-04-04 08:43:56 -07:00
Peter Klausler
efd7caac2e
[flang] IEEE_SUPPORT_FLAG(..., LOCAL) in specification expression (#134270)
The optional second argument to IEEE_SUPPORT_FLAG (and related functions
from the intrinsic IEEE_ARITHMETIC module) is needed only for its type,
not its value. Restrictions on local objects as arguments to function
references in specification expressions shouldn't apply to it.

Define a new attribute for dummy data object characteristics to
distinguish such arguments, set it for the appropriate intrinsic
function references, and test it during specification expression
validation.
2025-04-04 08:43:25 -07:00
Peter Klausler
262b3f7615
[flang] Remove runtime dependence on C++ support for types (#134164)
Fortran::runtime::Descriptor::BytesFor() only works for Fortran
intrinsic types for which a C++ type counterpart exists, so it crashes
on some types that are legitimate Fortran types like REAL(2). Move some
logic from Evaluate into a new header in flang/Common, then use it to
avoid this needless dependence on C++.
2025-04-04 08:42:38 -07:00
Peter Klausler
3674a5f18e
[flang] Permit unused USE association of subprogram name (#134009)
A function or subroutine can allow an object of the same name to appear
in its scope, so long as the name is not used. This is similar to the
case of a name being imported from multiple distinct modules, and
implemented by the same representation.

It's not clear whether this is conforming behavior or a common
extension.
2025-04-04 08:41:32 -07:00
Peter Klausler
c8bde44cfc
[flang] Implement FSEEK and FTELL (#133003)
Add function and subroutine forms of FSEEK and FTELL as intrinsic
procedures. Accept common aliases from legacy compilers as well.
    
A separate patch to llvm-test-suite will enable tests for these
procedures once this patch has merged.
    
Depends on https://github.com/llvm/llvm-project/pull/132423; CI builds
will likely fail until that patch is merged and this PR is rebased.
2025-04-04 08:40:51 -07:00
Asher Mancinelli
85fd83ed49
[flang][nfc] Use llvm memmove intrinsic over regular call (#134294)
Follow up to #134170.

We should be using the LLVM intrinsics instead of plain fir.calls when
we can. Existing code creates a declaration for the llvm intrinsic and
a regular fir.call, which makes it hard for consumers of the IR to find
all the intrinsic calls.
2025-04-04 06:13:30 -07:00
Sergio Afonso
a17d49687a
[Flang][Driver][AMDGPU] Fix -mcode-object-version (#134230)
This patch updates flang to follow clang's behavior when processing the
`-mcode-object-version` option.

It is now used to populate an LLVM module flag called
`amdhsa_code_object_version` expected by the backend and also updates
the driver to add the `--amdhsa-code-object-version` option to the
frontend invocation for device compilation of AMDGPU targets.
2025-04-04 11:54:49 +01:00
Kareem Ergawy
6333f8457c
[flang][OpenMP] Move reductions from loop to teams when loop is mapped to distribute (#132920)
Follow-up to #132003, in particular, see
https://github.com/llvm/llvm-project/pull/132003#issuecomment-2739701936.

This PR extends reduction support for `loop` directives. Consider the
following scenario:
```fortran
subroutine bar
  implicit none
  integer :: x, i

  !$omp teams loop reduction(+: x)
  DO i = 1, 5
    call foo()
  END DO
end subroutine
```
Note the following:
* According to the spec, the `reduction` clause will be attached to
`loop` during earlier stages in the compiler.
* Additionally, `loop` cannot be mapped to `distribute parallel for` due
to the call to a foreign function inside the loop's body.
* Therefore, `loop` must be mapped to `distribute`.
* However, `distribute` does not have `reduction` clauses.
* As a result, we have to move the `reduction`s from the `loop` to its
parent `teams` directive, which is what is done by this PR.
2025-04-04 06:20:51 +02:00
Andre Kuhlenschmidt
b11eece1bb
[flang][intrinsics] Implement the time intrinsic (#133823)
This PR implements the nonstandard intrinsic time.

In addition to running the unit tests, I also double checked that the
example code works by manually compiling and running it.
2025-04-03 15:33:40 -07:00
Andre Kuhlenschmidt
85fdab33b0
[flang][intrinsic] add nonstandard intrinsic unlink (#134162)
This PR adds the intrinsic `unlink` to flang. 

## Test plan
- Added two codegen unit tests and ensured flang-check continues to
pass.
- Manually compiled and ran the example from the documentation.
2025-04-03 14:33:53 -07:00
Valentin Clement (バレンタイン クレメン)
fb6f60ddc5
[flang][cuda][NFC] Use NVVM VoteBallotOp (#134307)
`llvm.nvvm.vote.ballot.sync` has its own operation so use it in
lowering.
2025-04-03 14:19:31 -07:00
Valentin Clement (バレンタイン クレメン)
de40f6101d
[flang][cuda][NFC] Use NVVM op for match all (#134303) 2025-04-03 14:19:21 -07:00
Valentin Clement (バレンタイン クレメン)
7288f1bc32
[flang][cuda] Use nvvm operation for match any (#134283)
The string used for intrinsic was not the correct one
"llvm.nvvm.match.any.sync.i32p". There was an extra `p` at the end.

Use the NVVM operation instead so we don't duplicate it.
2025-04-03 12:08:30 -07:00
Slava Zakharin
3f6ae3f0a8
[flang] Added driver options for arrays repacking. (#134002)
Added options:
  * -f[no-]repack-arrays
  * -f[no-]stack-repack-arrays
  * -frepack-arrays-contiguity=whole/innermost
2025-04-03 10:43:28 -07:00
Valentin Clement (バレンタイン クレメン)
3e59ff27e5
[flang][cuda] Fix pred type for vote functions (#134166) 2025-04-03 10:33:09 -07:00
Asher Mancinelli
d7d91500b6
[flang][nfc] Initial changes needed to use llvm intrinsics instead of regular calls (#134170)
Flang uses `fir.call <llvm intrinsic>` in a few places. This means
consumers of the IR need to strcmp every fir.call if they want to find a
particular LLVM intrinsic.
Emit LLVM memcpy intrinsics instead.
2025-04-03 08:37:40 -07:00
Sergio Afonso
18dd299fb1
[Flang][MLIR][OpenMP] Host-evaluation of omp.loop bounds (#133908)
This patch updates Flang lowering and kernel flags identification in
MLIR so that loop bounds on `target teams loop` constructs are evaluated
on the host, making the trip count available to the corresponding
`__tgt_target_kernel` call emitted for the target region.

This is necessary in order to properly execute these constructs as
`target teams distribute parallel do`.

Co-authored-by: Kareem Ergawy <kareem.ergawy@amd.com>
2025-04-03 15:06:19 +01:00
Valentin Clement (バレンタイン クレメン)
db21ae7803
[flang][cuda] Support any_sync and ballot_sync (#134135) 2025-04-02 14:26:09 -07:00
Krzysztof Parzyszek
564e04b703
[flang][OpenMP] Use function symbol on DECLARE TARGET (#134107)
Consider:
```
function foo()
  !$omp declare target(foo) ! This `foo` was a function-result symbol
  ...
end
```
When resolving symbols, for this case use the symbol corresponding to
the function instead of the symbol corresponding to the function result.

Currently, this will result in an error:
```
error: A variable that appears in a DECLARE TARGET directive must be
declared in the scope of a module or have the SAVE attribute, either
explicitly or implicitly
```
2025-04-02 15:16:33 -05:00
Kazu Hirata
aa33c09561 [flang] Fix a warning
This patch fixes:

  flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp:184:18: error:
  unused variable 'loc' [-Werror,-Wunused-variable]
2025-04-02 10:14:50 -07:00
vdonaldson
8a0f694381
[flang] Legacy ASSIGN statement target processing (#133737)
Like other target statements, the statement associated with the label in
a legacy ASSIGN statement could be inside a construct. Constructs
containing such a target must therefore be marked as unstructured,
fairly similar to how targets are processed in `markBranchTarget`.
2025-04-02 09:52:13 -04:00
Kareem Ergawy
de6c9096ba
[flang][OpenMP] Handle "loop-local values" in do concurrent nests (#127635)
Extends `do concurrent` mapping to handle "loop-local values". A
loop-local value is one that is used exclusively inside the loop but
allocated outside of it. This usually corresponds to temporary values
that are used inside the loop body for initialzing other variables for
example. After collecting these values, the pass localizes them to the
loop nest by moving their allocations.

PR stack:
- https://github.com/llvm/llvm-project/pull/126026
- https://github.com/llvm/llvm-project/pull/127595
- https://github.com/llvm/llvm-project/pull/127633
- https://github.com/llvm/llvm-project/pull/127634
- https://github.com/llvm/llvm-project/pull/127635 (this PR)
2025-04-02 15:43:19 +02:00
مهدي شينون (Mehdi Chinoune)
666df54ea6
[flang] Fold double bessel functions on Windows. (#130253)
There are no functions for `float`.
see:
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/bessel-functions-j0-j1-jn-y0-y1-yn
2025-04-02 14:43:09 +01:00
Jean-Didier PAILLEUX
c309abd925
[flang] Implement !DIR$ NOVECTOR and !DIR$ NOUNROLL[_AND_JAM] (#133885)
Hi,
This patch implements support for the following directives :
- `!DIR$ NOUNROLL_AND_JAM` to disable unrolling and jamming on a DO
LOOP.
- `!DIR$ NOUNROLL` to disable unrolling on a DO LOOP.
- `!DIR$ NOVECTOR` to disable vectorization on a DO LOOP.
2025-04-02 14:30:01 +02:00
Kareem Ergawy
ef56b53712
[flang][OpenMP] Extend do concurrent mapping to multi-range loops (#127634)
Adds support for converting mulit-range loops to OpenMP (on the host
only for now). The changes here "prepare" a loop nest for collapsing by
sinking iteration variables to the innermost `fir.do_loop` op in the
nest.

PR stack:
- https://github.com/llvm/llvm-project/pull/126026
- https://github.com/llvm/llvm-project/pull/127595
- https://github.com/llvm/llvm-project/pull/127633
- https://github.com/llvm/llvm-project/pull/127634 (this PR)
- https://github.com/llvm/llvm-project/pull/127635
2025-04-02 12:43:04 +02:00
Tom Eccles
9b2fd1a6ec
[flang][OpenMP] Bump default OpenMP version to 3.1 (#133745)
Precise OpenMP standards support information is being documented in
#132707

Flang now has good support for OpenMP Version 3.1 and earlier.
2025-04-02 10:43:48 +01:00
Kareem Ergawy
3f8bfc9f7f
[flang][OpenMP] Map simple do concurrent loops to OpenMP host constructs (#127633)
Upstreams one more part of the ROCm `do concurrent` to OpenMP mapping
pass. This PR add support for converting simple loops to the equivalent
OpenMP constructs on the host: `omp parallel do`. Towards that end, we
have to collect more information about loop nests for which we add new
utils in the `looputils` name space.

PR stack:
- https://github.com/llvm/llvm-project/pull/126026
- https://github.com/llvm/llvm-project/pull/127595
- https://github.com/llvm/llvm-project/pull/127633 (this PR)
- https://github.com/llvm/llvm-project/pull/127634
- https://github.com/llvm/llvm-project/pull/127635
2025-04-02 11:26:58 +02:00
Kareem Ergawy
41d718b1cf
[flang][OpenMP] Upstream do concurrent loop-nest detection. (#127595)
Upstreams the next part of do concurrent to OpenMP mapping pass (from
AMD's ROCm implementation). See
https://github.com/llvm/llvm-project/pull/126026 for more context.

This PR add loop nest detection logic. This enables us to discover
muli-range do concurrent loops and then map them as "collapsed" loop
nests to OpenMP.

This is a follow up for
https://github.com/llvm/llvm-project/pull/126026, only the latest commit
is relevant.

This is a replacement for
https://github.com/llvm/llvm-project/pull/127478 using a
`/user/<username>/<branchname>` branch.

PR stack:
- https://github.com/llvm/llvm-project/pull/126026
- https://github.com/llvm/llvm-project/pull/127595 (this PR)
- https://github.com/llvm/llvm-project/pull/127633
- https://github.com/llvm/llvm-project/pull/127634
- https://github.com/llvm/llvm-project/pull/127635
2025-04-02 10:12:52 +02:00
Kareem Ergawy
5d364481e3
[flang][OpenMP] Upstream first part of do concurrent mapping (#126026)
This PR starts the effort to upstream AMD's internal implementation of
`do concurrent` to OpenMP mapping. This replaces #77285 since we
extended this WIP quite a bit on our fork over the past year.

An important part of this PR is a document that describes the current
status downstream, the upstreaming status, and next steps to make this
pass much more useful.

In addition to this document, this PR also contains the skeleton of the
pass (no useful transformations are done yet) and some testing for the
added command line options.

This looks like a huge PR but a lot of the added stuff is documentation.

It is also worth noting that the downstream pass has been validated on
https://github.com/BerkeleyLab/fiats. For the CPU mapping, this achived
performance speed-ups that match pure OpenMP, for GPU mapping we are
still working on extending our support for implicit memory mapping and
locality specifiers.

PR stack:
- https://github.com/llvm/llvm-project/pull/126026 (this PR)
- https://github.com/llvm/llvm-project/pull/127595
- https://github.com/llvm/llvm-project/pull/127633
- https://github.com/llvm/llvm-project/pull/127634
- https://github.com/llvm/llvm-project/pull/127635
2025-04-02 09:24:38 +02:00
Valentin Clement (バレンタイン クレメン)
ae8dd63681
[flang][cuda] Add interface and lowering for all_sync (#134001) 2025-04-01 17:59:11 -07:00
Andre Kuhlenschmidt
b6edd25f17
[flang][intrinsics] NFC: make comment consistent (#133972)
Just makes this named argument comment consistent with all the others in
the file.
2025-04-01 14:30:10 -07:00
Slava Zakharin
58551faaf1
[flang] Inline fir.is_contiguous_box in some cases. (#133812)
Added inlining for `rank == 1` and `innermost` cases.
2025-04-01 08:41:11 -07:00
Jean-Didier PAILLEUX
513a91a5f1
[flang/flang-rt] Implement PERROR intrinsic form GNU Extension (#132406)
Add the implementation of the `PERROR(STRING) ` intrinsic from the GNU
Extension to prints on the stderr a newline-terminated error message
corresponding to the last system error prefixed by `STRING`.
(https://gcc.gnu.org/onlinedocs/gfortran/PERROR.html)
2025-04-01 15:47:54 +02:00
Tom Eccles
e17d864f55
[flang][OpenMP][Lower] lower array subscripts for task depend (#132994)
The OpenMP standard says that all dependencies in the same set of
inter-dependent tasks must be non-overlapping. This simplification means
that the OpenMP only needs to keep track of the base addresses of
dependency variables. This can be seen in kmp_taskdeps.cpp, which stores
task dependency information in a hash table, using the base address as a
key.

This patch generates a rebox operation to slice boxed arrays, but only
the box data address is used for the task dependency. The extra box is
optimized away by LLVM at O3.

Vector subscripts are TODO (I will address in my next patch).

This also fixes a bug for ordinary subscripts when the symbol was mapped
to a box:

Fixes #132647
2025-04-01 10:26:14 +01:00
Jean-Didier PAILLEUX
bae3577002
[flang] Define ERF, ERFC and ERFC_SCALED intrinsics with Q and D prefix (#125217)
`ERF`, `ERFC` and `ERFC_SCALED` intrinsics prefixed by `Q` and `D` are
missing. Codes such as `CP2K`(https://github.com/cp2k/cp2k) and
`TurboRVB`(https://github.com/sissaschool/turborvb) use these intrinsics
just like defined in the GNU standard and here:
https://www.ibm.com/docs/fr/xl-fortran-aix/16.1.0?topic=reference-intrinsic-procedures
These intrinsics are based on the existing intrinsics but apply a
restriction on the type kind.

- `DERF`, `DERFC` and `DERFC_SCALED` are for double précision only.
- `QERF`, `QERFC` and `QERFC_SCALED` are for quad précision only.
2025-04-01 08:07:26 +02:00
Thirumalai Shaktivel
091dcb8fc2
[Flang] Make a private copy for the common block variables in copyin clause (#111359)
Fixes: https://github.com/llvm/llvm-project/issues/82949
2025-04-01 11:35:44 +05:30
Slava Zakharin
5f268d04f9
[flang] Code generation for fir.pack/unpack_array. (#132080)
The code generation relies on `ShallowCopyDirect` runtime
to copy data between the original and the temporary arrays
(both directions). The allocations are done by the compiler
generated code. The heap allocations could have been passed
to `ShallowCopy` runtime, but I decided to expose the allocations
so that the temporary descriptor passed to `ShallowCopyDirect`
has `nocapture` - maybe this will be better for LLVM optimizations.
2025-03-31 11:42:17 -07:00
Slava Zakharin
0ac8cb1b3d
[flang] Recognize fir.pack_array in LoopVersioning. (#133191)
This change enables LoopVersioning when `fir.pack_array` is met
in the def-use chain. It fixes a couple of huge performance regressions
caused by enabling `-frepack-arrays`.
2025-03-31 11:41:43 -07:00
Thirumalai Shaktivel
374a5bea52
[Flang][OpenMP] Add PointerAssociateScalar to Cray Pointer used in the DSA (#133232)
Issue: Cray Pointer is not associated to Cray Pointee, leading to
Segmentation fault

Fix: GetUltimate, retrieves the base symbol in the current scope, which
gets passed all the references and returns the original symbol

---------

Co-authored-by: Michael Klemm <michael.klemm@amd.com>
2025-03-29 15:39:12 +01:00
swatheesh-mcw
fe30cf18ab
Revert "Revert "[flang][openmp] Adds Parser and Semantic Support for Interop Construct, and Init and Use Clauses."" (#132343)
Reverts llvm/llvm-project#132005
2025-03-28 15:21:52 +00:00
Nick Sarnie
48b7530273
[clang][flang][Triple][llvm] Add isOffload function to LangOpts and isGPU function to Triple (#126956)
I'm adding support for SPIR-V, so let's consolidate these checks.

---------

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
2025-03-28 14:19:20 +00:00
Joseph Huber
772173f548
[Clang][AMDGPU] Remove special handling for COV4 libraries (#132870)
Summary:
When we were first porting to COV5, this lead to some ABI issues due to
a change in how we looked up the work group size. Bitcode libraries
relied on the builtins to emit code, but this was changed between
versions. This prevented the bitcode libraries, like OpenMP or libc,
from being used for both COV4 and COV5. The solution was to have this
'none' functionality which effectively emitted code that branched off of
a global to resolve to either version.

This isn't a great solution because it forced every TU to have this
variable in it. The patch in
https://github.com/llvm/llvm-project/pull/131033 removed support for
COV4 from OpenMP, which was the only consumer of this functionality.
Other users like HIP and OpenCL did not use this because they linked the
ROCm Device Library directly which has its own handling (The name was
borrowed from it after all).

So, now that we don't need to worry about backward compatibility with
COV4, we can remove this special handling. Users can still emit COV4
code, this simply removes the special handling used to make the OpenMP
device runtime bitcode version agnostic.
2025-03-28 07:35:16 -05:00
Bruno Cardoso Lopes
7c3ecffe9b
[MLIR][LLVMIR] Add support for the full form of global_{ctor,dtor} (#133176)
Currently only ctor/dtor list and their priorities are supported. This
PR adds support for the missing data field.

Few implementation notes:
- The assembly printer has a fixed form because previous `attr_dict`
will sort the dict by key name, making global_dtor and global_ctor
differ in the order of printed arguments.
- LLVM's `ptr null` is being converted to `#llvm.zero` otherwise we'd
have to create a region to use the default operation conversion from
`ptr null`, which is silly given that the field only support null or a
symbol.
2025-03-27 14:11:05 -07:00
Andre Kuhlenschmidt
077940621d
[flang][openacc] Make OpenACC block construct parse errors less verbose. (#131042)
This PR does reduces the verbosity of parser errors for OpenACC block
constructs that do not parse correctly because they are missing their
trailing end block directive by:
- Removing the redundant error messages created by parsing 3 different
styles of directive tokens.
- Providing a general mechanism of configuring the max number of
contexts printed for every syntax error.
- Not printing less specific contexts that are at the same location.

Prior to the changes:
```
$ flang -fc1 -fopenacc -fsyntax-only flang/test/Parser/acc-data-statement.f90 2>&1 | tee acc-data-statement.prior.log | wc -l
262
```

[acc-data-statement.prior.log](https://github.com/user-attachments/files/19298165/acc-data-statement.prior.log)

```
$ flang -fc1 -fopenacc -fsyntax-only flang/test/Parser/acc-data-statement.f90 2>&1 | tee acc-data-statement.prior.log | wc -l
73
```

[acc-data-statement.post.log](https://github.com/user-attachments/files/19298181/acc-data-statement.post.log)
2025-03-26 12:36:04 -07:00