2529 Commits

Author SHA1 Message Date
Kareem Ergawy
5d364481e3
[flang][OpenMP] Upstream first part of do concurrent mapping (#126026)
This PR starts the effort to upstream AMD's internal implementation of
`do concurrent` to OpenMP mapping. This replaces #77285 since we
extended this WIP quite a bit on our fork over the past year.

An important part of this PR is a document that describes the current
status downstream, the upstreaming status, and next steps to make this
pass much more useful.

In addition to this document, this PR also contains the skeleton of the
pass (no useful transformations are done yet) and some testing for the
added command line options.

This looks like a huge PR but a lot of the added stuff is documentation.

It is also worth noting that the downstream pass has been validated on
https://github.com/BerkeleyLab/fiats. For the CPU mapping, this achieved
performance speed-ups that match pure OpenMP. For GPU mapping, we are
still working on extending our support for implicit memory mapping and
locality specifiers.

PR stack:
- https://github.com/llvm/llvm-project/pull/126026 (this PR)
- https://github.com/llvm/llvm-project/pull/127595
- https://github.com/llvm/llvm-project/pull/127633
- https://github.com/llvm/llvm-project/pull/127634
- https://github.com/llvm/llvm-project/pull/127635
2025-04-02 09:24:38 +02:00
Valentin Clement (バレンタイン クレメン)
ae8dd63681
[flang][cuda] Add interface and lowering for all_sync (#134001) 2025-04-01 17:59:11 -07:00
Valentin Clement (バレンタイン クレメン)
bb179c483a
[flang][rt] Allow ReportFatalUserError to be built on device (#133979) 2025-04-01 13:50:42 -07:00
Valentin Clement (バレンタイン クレメン)
01889de8e9
[flang][device] Enable Stop functions on device build (#133803)
Update `StopStatement` and `StopStatementText` to be built for the
device.
2025-04-01 10:06:45 -07:00
Jean-Didier PAILLEUX
513a91a5f1
[flang/flang-rt] Implement PERROR intrinsic from GNU Extension (#132406)
Add the implementation of the `PERROR(STRING)` intrinsic from the GNU
Extension, which prints to stderr a newline-terminated error message
corresponding to the last system error, prefixed by `STRING`.
(https://gcc.gnu.org/onlinedocs/gfortran/PERROR.html)
2025-04-01 15:47:54 +02:00
Slava Zakharin
5f268d04f9
[flang] Code generation for fir.pack/unpack_array. (#132080)
The code generation relies on `ShallowCopyDirect` runtime
to copy data between the original and the temporary arrays
(both directions). The allocations are done by the compiler
generated code. The heap allocations could have been passed
to `ShallowCopy` runtime, but I decided to expose the allocations
so that the temporary descriptor passed to `ShallowCopyDirect`
has `nocapture` - maybe this will be better for LLVM optimizations.
2025-03-31 11:42:17 -07:00
swatheesh-mcw
fe30cf18ab
Revert "Revert "[flang][openmp] Adds Parser and Semantic Support for Interop Construct, and Init and Use Clauses."" (#132343)
Reverts llvm/llvm-project#132005
2025-03-28 15:21:52 +00:00
Krzysztof Parzyszek
33cd00f8c8
[flang] Use more generic overload for Operation in Traverse (#133305)
Currently there are two specific overloads: for unary operations, i.e.
`Operation<D, R, O>`, and binary ones `Operation<D, R, LO, RO>`.

This makes it impossible for a derived class to use a single overload to
handle all types of operations: `Operation<D, R, O...>`. Since the base
overloads need to be included in the derived class's scope, via `using
Base::operator()` either one of the specific overloads will always be a
better candidate than the more generic derived one.

```
class MyVisitor : public Traverse<...> {
  using Traverse<...>::operator();

  template <typename D, typename R, typename... O>
  Result operator()(const Operation<D, R, O...> &op) const {
    // Will never be used.
  }
};
```
This patch replaces the two specific overloads for Operation in Traverse
with a single generic overload, while preserving the existing
functionality, and allowing derived classes to use a single overload as
well.
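
The overload-resolution effect described above can be reproduced outside flang. The following is a minimal sketch, assuming stand-in types: `Operation`, `SpecificBase`, `GenericBase`, `OldStyleVisitor`, and `NewStyleVisitor` are hypothetical names for illustration, not flang's actual declarations.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the operation node: D, R, then the operand types.
template <typename D, typename R, typename... O> struct Operation {};

// Base with two specific overloads (the situation before the patch).
struct SpecificBase {
  template <typename D, typename R, typename O>
  std::string operator()(const Operation<D, R, O> &) const {
    return "base unary";
  }
  template <typename D, typename R, typename LO, typename RO>
  std::string operator()(const Operation<D, R, LO, RO> &) const {
    return "base binary";
  }
};

// Base with a single generic overload (the situation after the patch).
struct GenericBase {
  template <typename D, typename R, typename... O>
  std::string operator()(const Operation<D, R, O...> &) const {
    return "base generic";
  }
};

// The derived generic overload loses to the more specialized base overloads.
struct OldStyleVisitor : SpecificBase {
  using SpecificBase::operator();
  template <typename D, typename R, typename... O>
  std::string operator()(const Operation<D, R, O...> &) const {
    return "derived generic";
  }
};

// The derived overload has the same signature as the base one, so it hides it.
struct NewStyleVisitor : GenericBase {
  using GenericBase::operator();
  template <typename D, typename R, typename... O>
  std::string operator()(const Operation<D, R, O...> &) const {
    return "derived generic";
  }
};

using Unary = Operation<int, int, int>;
using Binary = Operation<int, int, int, int>;

// Self-check of which overload is selected in each arrangement.
inline void checkOverloads() {
  assert(OldStyleVisitor{}(Unary{}) == "base unary");
  assert(OldStyleVisitor{}(Binary{}) == "base binary");
  assert(NewStyleVisitor{}(Unary{}) == "derived generic");
  assert(NewStyleVisitor{}(Binary{}) == "derived generic");
}
```

With the specific base overloads, partial ordering prefers the non-variadic templates, so the derived overload is only reached for operand counts the base does not cover. With a single generic base overload, the identically declared derived template hides it.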
2025-03-28 08:17:31 -05:00
Peter Klausler
3bc8aa7823
[flang] Catch whole assumed-size array as RHS (#132819)
The right-hand side expression of an intrinsic assignment statement may
not be the name of an assumed-size array dummy argument.
2025-03-26 12:09:57 -07:00
Peter Klausler
4ea5aa09de
[flang][NFC] Restore I/O runtime API header name (#132423)
flang/include/flang/Runtime/io-api.h was changed into io-api-consts.h,
then wrapped into a new io-api.h that includes io-api-consts.h, does
some redundant includes and declarations, and then declares the
prototype of one function, InquiryKeywordHashDecode.

Make that function static in io-stmt.cpp prior to its sole call site,
then undo the renaming, to reduce confusion and redundancy.
2025-03-26 12:09:16 -07:00
Valentin Clement (バレンタイン クレメン)
e6dda9c23a
[flang][cuda] Only create shared memory global when needed (#132999) 2025-03-26 09:26:50 -07:00
Kajetan Puchalski
529c5b71c6
[flang] Add -f[no-]slp-vectorize flags (#132801)
Add -f[no-]slp-vectorize to the flang driver.
Add corresponding -fvectorize-slp to the flang frontend.

Enable -fslp-vectorize at -O2 and higher in flang to match the current
behaviour in clang.

---------

Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
2025-03-26 16:10:35 +00:00
Eugene Epshteyn
2c8e26081f
[flang] Add HOSTNM runtime and lowering intrinsics implementation (#131910)
Implement GNU extension intrinsic HOSTNM, both function and subroutine
forms. Add HOSTNM documentation to `flang/docs/Intrinsics.md`. Add
lowering and semantic unit tests.

(This change is modeled after GETCWD implementation.)
2025-03-25 13:17:17 -04:00
vdonaldson
92e0560347
[flang] ieee_denorm (#132307)
Add support for the nonstandard ieee_denorm exception for real kinds 3,
4, 8 on x86 processors.
2025-03-25 13:02:43 -04:00
Joseph Huber
ef2735d243
[Flang] Detect endianness in the preprocessor (#132767)
Summary:
Currently we use `TestBigEndian` in CMake to determine endianness. This
doesn't work on all platforms and is deprecated since CMake 3.20.
Instead of using CMake, we can just use the GNU/Clang preprocessor
definitions.

The only difficulty is MSVC, mostly because they don't support the same
macros. But, as far as I'm aware, MSVC / Windows targets are always
little endian, and if not we can just override it for that specific
target in the future.
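
A minimal sketch of preprocessor-based detection, assuming the GNU/Clang predefined macros `__BYTE_ORDER__`, `__ORDER_LITTLE_ENDIAN__`, and `__ORDER_BIG_ENDIAN__`. The names `kHostIsBigEndian`, `firstByteIsMostSignificant`, and `checkEndianness` are hypothetical and are not the identifiers used in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
constexpr bool kHostIsBigEndian = true;
#elif defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
constexpr bool kHostIsBigEndian = false;
#elif defined(_MSC_VER)
// Assumption taken from the commit message: MSVC targets are little endian.
constexpr bool kHostIsBigEndian = false;
#else
#error "Cannot determine endianness from the preprocessor"
#endif

// Runtime cross-check: inspect the first byte of a known 32-bit pattern.
inline bool firstByteIsMostSignificant() {
  const std::uint32_t probe = 0x01020304u;
  unsigned char bytes[sizeof probe];
  std::memcpy(bytes, &probe, sizeof probe);
  return bytes[0] == 0x01;
}

// The compile-time answer should agree with what memory actually holds.
inline void checkEndianness() {
  assert(kHostIsBigEndian == firstByteIsMostSignificant());
}
```

The runtime probe is only there to validate the compile-time constant. It is not needed once the macros are trusted.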
2025-03-24 18:29:05 -05:00
Krzysztof Parzyszek
c221d64206
[flang] Remove mentions of evaluate::Variable<T> (#132805)
The template itself was not defined anywhere. The closest thing was a
forward declaration in flang/include/flang/Evaluate/variable.h.
2025-03-24 18:26:57 -05:00
Kareem Ergawy
6328506536
[flang][fir] Add rewrite pattern to convert fir.do_concurrent to fir.do_loop (#132207)
Rewrites `fir.do_concurrent` ops to a corresponding nest of `fir.do_loop
... unordered` ops.
2025-03-24 12:09:32 +01:00
Krzysztof Parzyszek
68180d8d16
[flang][OpenMP] Use OmpDirectiveSpecification in standalone directives (#131163)
This uses OmpDirectiveSpecification in the rest of the standalone
directives.
2025-03-20 06:50:43 -05:00
Peter Klausler
6b9716b7f4
[flang] Catch bad usage case of whole assumed-size array (#132052)
Whole assumed-size arrays are generally not allowed outside specific
contexts, where expression analysis notes that they can appear. But
contexts can nest, and in the case of an actual argument that turns out
to be an array constructor, the permission to use a whole assumed-size
array must be rescinded.

Fixes https://github.com/llvm/llvm-project/issues/131909.
2025-03-19 12:02:34 -07:00
Peter Klausler
9f284e1784
[flang] Disabling REAL kinds must also disable their COMPLEX (#131353)
When disabling kinds of REAL in the TargetCharacteristics, one must also
disable the corresponding kinds of COMPLEX.

Fixes https://github.com/llvm/llvm-project/issues/131088.
2025-03-19 12:00:51 -07:00
Peter Klausler
587f997db7
[flang] Catch C15104(4) violations when coindexing is present (#130677)
The value of a structure constructor component can't have a pointer
ultimate component if it is a coindexed designator.
2025-03-19 11:58:59 -07:00
Krzysztof Parzyszek
cd26dd5595
[flang][OpenMP] Use OmpDirectiveSpecification in simple directives (#131162)
The `OmpDirectiveSpecification` contains the directive name, the list of
arguments, and the list of clauses. It was introduced to store the
directive specification in METADIRECTIVE, and could be reused everywhere
a directive representation is needed.
In the long term this would unify the handling of common directive
properties, as well as creating actual constructs from METADIRECTIVE by
linking the contained directive specification with any associated user
code.
2025-03-19 11:34:40 -05:00
Kiran Chandramohan
96b112fb61
Revert "[flang][openmp] Adds Parser and Semantic Support for Interop Construct, and Init and Use Clauses." (#132005)
Reverts llvm/llvm-project#120584

Reverting due to CI failure
https://lab.llvm.org/buildbot/#/builders/157/builds/22946
2025-03-19 11:13:52 +00:00
swatheesh-mcw
ee8a759bfb
[flang][openmp] Adds Parser and Semantic Support for Interop Construct, and Init and Use Clauses. (#120584)
Adds Parser and Semantic Support for the below construct and clauses:
- Interop Construct
- Init Clause
- Use Clause

Note:
The other clauses supported by Interop Construct such as Destroy, Use,
Depend and Device are added already.
2025-03-19 10:49:17 +00:00
Slava Zakharin
fd0e20a64b
[flang] Generate fir.pack/unpack_array in Lowering. (#131704)
Basic generation of array repacking operations in Lowering.
2025-03-18 21:26:33 -07:00
Slava Zakharin
7d7b58bc5d
[flang-rt] Added ShallowCopy API. (#131702)
This API will be used for copying non-contiguous arrays
into contiguous temporaries to support `-frepack-arrays`.
The builder factory API will be used in the following commits.
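
As an illustration of the idea only (not the runtime's actual API or descriptor handling), repacking a strided one-dimensional section into a contiguous temporary and copying it back can be sketched as follows. `packStrided` and `unpackStrided` are hypothetical helper names, and the element type is fixed to `int` for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy `count` elements that sit `stride` elements apart, starting at `base`,
// into a contiguous temporary.
inline std::vector<int> packStrided(const int *base, std::size_t count,
                                    std::size_t stride) {
  assert(stride >= 1 && "stride is measured in elements and must be positive");
  std::vector<int> temp(count);
  for (std::size_t i = 0; i < count; ++i)
    temp[i] = base[i * stride];
  return temp;
}

// Copy the contiguous temporary back into the original strided storage.
inline void unpackStrided(const std::vector<int> &temp, int *base,
                          std::size_t stride) {
  assert(stride >= 1 && "stride is measured in elements and must be positive");
  for (std::size_t i = 0; i < temp.size(); ++i)
    base[i * stride] = temp[i];
}
```

For example, packing every second element of a six-element array yields a three-element temporary, and unpacking writes only those three positions back.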
2025-03-18 12:58:25 -07:00
Kareem Ergawy
1094ffcafb
[flang][fir] Add MLIR op for do concurrent (#130893)
Adds new MLIR ops to model `do concurrent`. In order to make `do
concurrent` representation self-contained, a loop is modeled using 2
ops, one wrapper and one that contains the actual body of the loop. For
example, a 2D `do concurrent` loop is modeled as follows:

```mlir
  fir.do_concurrent {
    %i = fir.alloca i32
    %j = fir.alloca i32
    fir.do_concurrent.loop
      (%i_iv, %j_iv) = (%i_lb, %j_lb) to (%i_ub, %j_ub) step (%i_st, %j_st) {
      %0 = fir.convert %i_iv : (index) -> i32
      fir.store %0 to %i : !fir.ref<i32>

      %1 = fir.convert %j_iv : (index) -> i32
      fir.store %1 to %j : !fir.ref<i32>
    }
  }
```

The `fir.do_concurrent` wrapper op encapsulates both the actual loop and
the allocations required for the iteration variables. The
`fir.do_concurrent.loop` op is a multi-dimensional op that contains the
loop control and body. See the ops' docs for more info.
2025-03-18 10:53:44 +01:00
Valentin Clement (バレンタイン クレメン)
74d4fc0a3e
[flang][cuda][NFC] Use ssa value for offset in shared memory op (#131661)
Switch from attribute to a value as we need to support dynamic offset
when multiple variables are used with dynamic shared memory.
2025-03-17 14:23:34 -07:00
Valentin Clement (バレンタイン クレメン)
4fb20b85fd
[flang][cuda] Compute offset on cuf.shared_memory ops (#131395)
Add a pass to compute the size of the shared memory (static shared
memory) and the offset of each variable to be placed in shared memory.
The global representing the shared memory is also created during this
pass.

In case of dynamic shared memory, the global has a type of
`!fir.array<0xi8>` and the size of the memory is set at kernel launch.
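
A minimal sketch of how a static layout like this could be computed, assuming each variable has a known size and a power-of-two alignment. The commit message does not state the pass's alignment policy, so `SharedVar`, `Layout`, `alignTo`, and `computeLayout` are hypothetical and only illustrate the offset arithmetic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one variable placed in shared memory.
struct SharedVar {
  std::size_t size;  // size in bytes
  std::size_t align; // required alignment in bytes (power of two)
};

// Offsets of each variable plus the total size of the backing global.
struct Layout {
  std::vector<std::size_t> offsets;
  std::size_t totalSize;
};

// Round `value` up to the next multiple of `align`.
inline std::size_t alignTo(std::size_t value, std::size_t align) {
  assert(align != 0 && (align & (align - 1)) == 0 && "power of two expected");
  return (value + align - 1) & ~(align - 1);
}

// Place the variables one after another, padding for alignment as needed.
inline Layout computeLayout(const std::vector<SharedVar> &vars) {
  Layout layout{{}, 0};
  for (const SharedVar &var : vars) {
    const std::size_t offset = alignTo(layout.totalSize, var.align);
    layout.offsets.push_back(offset);
    layout.totalSize = offset + var.size;
  }
  return layout;
}
```

Under these assumptions, three variables of sizes 4, 1, and 8 bytes with alignments 4, 1, and 8 land at offsets 0, 4, and 8, for a total of 16 bytes.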
2025-03-14 19:34:35 -07:00
Valentin Clement (バレンタイン クレメン)
4818623924
[flang][cuda] Add cuf.shared_memory operation (#131392)
Introduce `cuf.shared_memory` operation. The operation is used to get
the pointer in shared memory for a specific variable. The shared memory
is materialized as a global in address space 3 and the different
variables point to it at different offsets.

Follow-up patches will add lowering and conversion of this operation.
2025-03-14 15:43:25 -07:00
Slava Zakharin
00f9c855fb
[flang] Added fir.is_contiguous_box and fir.box_total_elements ops. (#131047)
These are helper operations to aid with expanding fir.pack_array.
2025-03-14 08:25:05 -07:00
jeanPerier
3ff3b29dd6
[flang] lower remaining cases of pointer assignments inside forall (#130772)
Implement handling of `NULL()` RHS, polymorphic pointers, as well as
lower bounds or bounds remapping in pointer assignment inside FORALL.

In the end, these cases do not require updating hlfir.region_assign;
lowering can simply prepare the new descriptor for the LHS inside the
RHS region.

Looking more closely at the polymorphic cases, there is no need to call
the runtime; fir.rebox and fir.embox do handle the dynamic type setting
correctly.

After this patch, the last remaining TODO is the allocatable assignment
inside FORALL, which, like some cases here, is more likely an accidental
feature given FORALL was deprecated in F2003 at the same time as
allocatable components were added.
2025-03-14 10:51:46 +01:00
Valentin Clement (バレンタイン クレメン)
57d87ed7f0
[flang][NFC] Add parenthesis to avoid warning (#131219)
Remove warning introduced in 369da8421c2f7
2025-03-13 14:28:35 -07:00
Valentin Clement (バレンタイン クレメン)
369da8421c
[flang][cuda] Allow assumed-size declaration for SHARED variable (#130833)
Avoid triggering an assertion for shared variable using the assumed-size
syntax.

```
attributes(global) subroutine sharedstar()
  real, shared :: s(*) ! ok. dynamic shared memory.
end subroutine
```
2025-03-13 11:06:17 -07:00
Tom Eccles
01aca42363
[flang] Add support for -f[no-]verbose-asm (#130788)
This flag provides extra commentary in the assembly output.
2025-03-13 15:22:13 +00:00
Mats Petersson
d0188ebcc2
[flang][OpenMP] Add symbols omp_in, omp_out and omp_priv in DECLARE RED… (#129908)
…UCTION

This patch allows better parsing of the reduction and initializer
components, including supporting derived types in both those places.

There is more work needed here, but this is a definite improvement in
what can be handled through parser and semantics.

Note that declare reduction is still not supported in lowering, so any
attempt to compile DECLARE REDUCTION code will end with a TODO aka "Not
yet implemented" abort in the compiler.

Note that this version of the code does not cover declaring multiple
reductions using the same name with different types. This will be
fixed in a future patch. [This was also the case before this change].

One existing test was modified to actually compile (it didn't in its
original form).
2025-03-13 09:39:45 +00:00
Slava Zakharin
74eba972ca
[flang] Definitions of fir.pack/unpack_array operations. (#130698)
As defined in #127147.
2025-03-11 14:15:29 -07:00
jeanPerier
356bf3fa2d
Reland " [flang] Rely on global initialization for simpler derived types" (#130290)
Currently, all derived types are initialized through `_FortranAInitialize`, which is functionally correct, but bears poor runtime performance. This patch falls back on global initialization for "simpler" derived types to speed up the initialization.

Note: this relands #114002 with the fix for the LLVM timeout regressions that have been seen. The fix is to use the added fir.copy to avoid aggregate load/store.

Co-authored-by: NimishMishra <42909663+NimishMishra@users.noreply.github.com>
2025-03-11 15:19:43 +01:00
jeanPerier
1ddf18057a
[flang] introduce fir.copy to avoid load store of aggregates (#130289)
Introduce a FIR operation to do memcopy/memmove of compile time constant size types.

This is to avoid requiring derived type copies to be done with load/store
which is badly supported in LLVM when the aggregate type is "big" (no
threshold can easily be defined here, better to always avoid them for
fir.type).

This was the root cause of the regressions caused by #114002 which introduced a
load/store of fir.type<> which caused hangs/asserts to fire in LLVM on
several benchmarks.

See https://llvm.org/docs/Frontend/PerformanceTips.html#avoid-creating-values-of-aggregate-type
2025-03-11 09:31:03 +01:00
Peter Klausler
c189852218
[flang] Ignore empty keyword macros before directives (#130333)
Ignore any keyword macros with empty directives that might appear before
a compiler directive.

Fixes https://github.com/llvm/llvm-project/issues/126459.
2025-03-10 13:21:10 -07:00
Peter Klausler
d53079055e
[flang] Catch coindexed procedure pointer/binding references (#129931)
A procedure designator cannot be coindexed, except for cases in which
the coindexing doesn't matter (i.e. a binding that can't be overridden).
2025-03-10 13:18:07 -07:00
Peter Klausler
53c3a2c69a
[flang] Static checking for empty coarrays (#129610)
A coarray must not have a zero extent on a codimension; that would yield
an empty coarray. When cobounds are constants, verify them.
2025-03-10 13:16:31 -07:00
مهدي شينون (Mehdi Chinoune)
cf5aa559a8
[flang] Don't redefine pid_t on MinGW-w64. (#130288) 2025-03-10 17:27:47 +00:00
Krzysztof Parzyszek
5ba7a3bd4c
[flang][OpenMP] Parse cancel-directive-name as clause (#130146)
The cancellable construct names on CANCEL or CANCELLATION POINT
directives are actually clauses (with the same names as the
corresponding constructs).

Instead of parsing them into a custom structure, parse them as a clause,
which will make CANCEL/CANCELLATION POINT follow the same uniform scheme
as other constructs (<directive> [(<arguments>)] [clauses]).
2025-03-10 11:58:02 -05:00
Krzysztof Parzyszek
4e453d5292
[flang][OpenMP] Accept old FLUSH syntax in METADIRECTIVE (#130122)
Accommodate it in OmpDirectiveSpecification, which may become the
primary component of the actual FLUSH construct in the future.
2025-03-10 08:12:46 -05:00
Krzysztof Parzyszek
d67947162f
[flang][OpenMP] Implement HAS_DEVICE_ADDR clause (#128568)
The HAS_DEVICE_ADDR clause indicates that the listed object(s) exist at an
address that is a valid device address. Specifically,
`has_device_addr(x)` means that (in C/C++ terms) `&x` is a device
address.

When entering a target region, `x` does not need to be allocated on the
device, or have its contents copied over (in the absence of additional
mapping clauses). Passing its address verbatim to the region for use is
sufficient, and is the intended goal of the clause.

Some Fortran objects use descriptors in their in-memory representation.
If `x` had a descriptor, both the descriptor and the contents of `x`
would be located in the device memory. However, the descriptors are
managed by the compiler, and can be regenerated at various points as
needed. The address of the effective descriptor may change, hence it's
not safe to pass the address of the descriptor to the target region.
Instead, the descriptor itself is always copied, but for objects like
`x`, no further mapping takes place (as this keeps the storage pointer
in the descriptor unchanged).

---------

Co-authored-by: Sergio Afonso <safonsof@amd.com>
2025-03-10 08:11:01 -05:00
Kajetan Puchalski
0c7e895de3
[flang] Move parser invocations into ParserActions (#130309)
FrontendActions.cpp is currently one of the biggest compilation units in
all of flang. Measuring its compilation gives the following metrics:

User time (seconds): 139.21
System time (seconds): 4.65
Maximum resident set size (kbytes): 5891440 (5.61 GB)

This commit separates out explicit invocations of the parser into a
separate compilation unit - ParserActions.cpp - through helper functions
in order to decrease the maximum compilation time and memory usage of a
single unit.
After the split, the measurements of FrontendActions.cpp are as follows:

User time (seconds): 70.08
System time (seconds): 3.16
Maximum resident set size (kbytes): 3961492 (3.7 GB)

While the ones for the newly created ParserActions.cpp are as follows:

User time (seconds): 104.33
System time (seconds): 3.37
Maximum resident set size (kbytes): 4185600 (3.99 GB)

---------

Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
2025-03-10 11:33:47 +00:00
Valentin Clement (バレンタイン クレメン)
ae42f07103
[flang][cuda] Allow array pointer for atomicexch and atomiccas (#130363) 2025-03-07 15:36:08 -08:00
Valentin Clement (バレンタイン クレメン)
829e8993e5
[flang][cuda] Lower __LDCA, __LDCS, __LDLU, __LDCV, __LDCG with arrays (#130357) 2025-03-07 15:35:52 -08:00
Valentin Clement (バレンタイン クレメン)
dcda314b6c
[flang][cuda] Fix atomicxor lowering to accept arrays (#130331)
The first argument can be the address of a scalar, an array element, or
even just the address of the first element of an array. Update lowering
to not trigger elemental lowering.
2025-03-07 13:05:42 -08:00