9432 Commits

Author SHA1 Message Date
David Truby
44aa476aa1
[flang] AArch64 ABI for BIND(C) VALUE parameters (#118305)
This patch adds handling for derived type VALUE parameters in BIND(C)
functions for AArch64.
2024-12-18 07:43:22 +00:00
David Truby
4c6e13f644
[flang] Add cmake error if building with clang-cl and MSVC 17.12 (#120114) 2024-12-18 06:15:29 +00:00
Kareem Ergawy
dc936f3c19
Revert "[flang][OpenMP] Implicitly map allocatable record fields (#117867)" (#120360) 2024-12-18 06:52:24 +01:00
Kareem Ergawy
db09014a07
[flang][OpenMP] Implicitly map allocatable record fields (#117867)
This is a starting PR to implicitly map allocatable record fields.

This PR contains the following changes:
1. Re-purposes some of the utils used in `Lower/OpenMP.cpp` so that
   these utils work on the `mlir::Value` level rather than the
   `semantics::Symbol` level. This takes one step towards enabling
   MLIR passes to more easily do some lowering themselves (e.g. creating
   `omp.map.bounds` ops for implicitly captured data like this PR
   does).
2. Adds support for implicitly capturing and mapping allocatable fields
   in record types.

There is still quite some distance to cover to have full support for
this. I added a number of todos to guide further development.

Co-authored-by: Andrew Gozillon <andrew.gozillon@amd.com>
2024-12-18 05:37:58 +01:00
Valentin Clement (バレンタイン クレメン)
81333cfc52
[flang][cuda] Relax host array check for cuda constant (#120333)
Arrays with the CONSTANT attribute declared in the specification part of a
module are device arrays and should not trigger the host array check.
2024-12-17 17:04:32 -08:00
Valentin Clement (バレンタイン クレメン)
15c61a208f
[flang][cuda] Do not consider SHARED array as host array (#120306)
Update the current `FindHostArray` so that it does not return shared arrays
as host arrays.
2024-12-17 13:42:14 -08:00
Valentin Clement (バレンタイン クレメン)
97b7bace67
[flang][cuda] Allow host array with PARAMETER attribute in device context (#120298)
Host arrays are normally not allowed in device context unless they have
a `PARAMETER` attribute. This patch updates the check so no error is
emitted in that case.
2024-12-17 13:41:24 -08:00
Valentin Clement (バレンタイン クレメン)
bbeafe4b94
[flang][cuda] Apply implicit data attribute to local arrays (#120293)
Add the implicit data attribute to local arrays that don't have one.
This simplifies the host array detection in semantics.
2024-12-17 12:56:39 -08:00
Peter Klausler
a957cedea9
[flang] Handle substring in data statement constant (#120130)
The case of a constant substring wasn't handled in the parser for data
statement constants.

Fixes https://github.com/llvm/llvm-project/issues/119005.
2024-12-17 12:10:50 -08:00
Peter Klausler
b2c363e261
[flang] Fix generic resolution with actual/dummy procedure incompatib… (#120105)
…ility

We generally allow any legal procedure pointer target as an actual
argument for association with a dummy procedure, since many actual
procedures are underspecified EXTERNALs. But for proper generic
resolution, it is necessary to disallow incompatible functions with
explicit result types.

Fixes https://github.com/llvm/llvm-project/issues/119151.
2024-12-17 12:10:29 -08:00
Slava Zakharin
9d33874936
[flang] Support -f[no-]realloc-lhs. (#120165)
-frealloc-lhs is the default.
If -fno-realloc-lhs is specified, then an allocatable on the left
side of an intrinsic assignment is not implicitly (re)allocated
to conform with the right hand side. Fortran runtime will issue
an error if there is a mismatch in shape/type/allocation-status.
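
A minimal model of the behavior described above, written in Python rather than Fortran purely for illustration. The `assign` helper and its `realloc_lhs` parameter are hypothetical names, an unallocated allocatable is modeled as `None`, and only the allocation status and the shape (length) are checked; the type check mentioned above is not modeled.

```python
def assign(lhs, rhs, realloc_lhs=True):
    """Model `lhs = rhs` for a rank-1 allocatable `lhs`.

    `lhs` is None when unallocated, otherwise a list. Returns the new lhs.
    """
    if realloc_lhs:
        # -frealloc-lhs (default): (re)allocate when unallocated or when
        # the shape does not conform to the right-hand side.
        if lhs is None or len(lhs) != len(rhs):
            return list(rhs)
        lhs[:] = rhs
        return lhs
    # -fno-realloc-lhs: no implicit (re)allocation; a mismatch is an error.
    if lhs is None:
        raise RuntimeError("left-hand side is not allocated")
    if len(lhs) != len(rhs):
        raise RuntimeError("shape mismatch between left- and right-hand side")
    lhs[:] = rhs
    return lhs
```

In this model, `assign([0, 0], [1, 2, 3])` returns a new three-element list, while the same call with `realloc_lhs=False` raises.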
2024-12-17 09:06:05 -08:00
paperchalice
b07e7b76c5
[cmake] Drop AddFileDependencies and CMakeParseArguments (#120002)
These modules are deprecated and have trivial implementations in modern
cmake.
2024-12-17 19:24:32 +08:00
Valentin Clement (バレンタイン クレメン)
5e1f87e849
[flang][cuda] Correctly allocate memory for descriptor load (#120164)
CodeGen will allocate memory for a new descriptor on descriptor loads.
CUDA Fortran local descriptors are allocated in managed memory by the
runtime. The newly allocated storage for a CUDA descriptor must also be
allocated through the runtime.
2024-12-16 19:12:05 -08:00
Valentin Clement (バレンタイン クレメン)
67ae944bfa
[flang][cuda] Check for use of host array in device context (#119756)
Now that variables have an implicit attribute, we can check for illegal use
of module host variables in device context.
2024-12-16 13:34:22 -08:00
Slava Zakharin
0b442bc516
[flang][NFC] Added debug output to opt-bufferization pass. (#119936) 2024-12-16 12:58:22 -08:00
Valentin Clement (バレンタイン クレメン)
f8656204d7
[flang][cuda] Do not lower device target in program as global (#120126)
As was done in #102512, do not create a global for arrays declared in a
program unit with a CUDA data attribute.
2024-12-16 12:34:01 -08:00
Slava Zakharin
2402bccc80
[flang] Turn SimplifyHLFIRIntrinsics into a greedy rewriter. (#119946)
This is almost an NFC, except that folding changed ordering
of some operations.
2024-12-16 08:00:29 -08:00
Slava Zakharin
f239922cdc
[flang] Enable hlfir.sum inlining by default. (#119937)
There is already a LIT test for hlfir.sum inlining that uses
the engineering option. I would like to keep the option
for a short period of time to be able to revert
in case of performance regressions that I was not able to see.
2024-12-16 07:59:15 -08:00
Valentin Clement (バレンタイン クレメン)
65e00315c9
[flang][cuda] Adapt TargetRewrite to support gpu.launch_func (#119933) 2024-12-16 06:53:46 -08:00
macurtis-amd
3c3094b60d
[flang] Ensure directive sentinels are in cols 1-5 in pp output (#119406)
Preprocessor output is intended to be valid fixed form.
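
A rough sketch of the property the title asks for, in Python for illustration only. The helper names are hypothetical, and the set of recognized sentinels (`!`, `c` or `*` followed by `$omp` or `$acc`) is an assumption; other directive sentinels are not modeled.

```python
import re

# Assumed sentinel spelling: a fixed-form comment character followed by
# "$omp" or "$acc", matched case-insensitively.
SENTINEL = re.compile(r"[!c*]\$(omp|acc)", re.IGNORECASE)

def sentinel_columns(line):
    """Return 1-based (first, last) columns of a leading sentinel, or None."""
    stripped = line.lstrip(" ")
    match = SENTINEL.match(stripped)
    if match is None:
        return None
    first = len(line) - len(stripped) + 1
    return (first, first + len(match.group(0)) - 1)

def is_valid_fixed_form(line):
    """A directive line is acceptable only if its sentinel sits in columns 1-5."""
    cols = sentinel_columns(line)
    return cols is None or cols == (1, 5)
```

Under these assumptions, `is_valid_fixed_form("!$omp parallel")` holds, while an indented `"   !$omp parallel"` is rejected because its sentinel occupies columns 4-8.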
2024-12-16 08:25:37 -06:00
Valentin Clement (バレンタイン クレメン)
1345ee4232
[flang][cuda] Do not apply implicit data attribute on dummy arg with VALUE (#119927)
Dummy arguments with the VALUE attribute do not need the implicit data
attribute.
2024-12-13 14:41:49 -08:00
Valentin Clement (バレンタイン クレメン)
3273d0bb14
[flang][cuda] Apply implicit data attribute only in device context (#119919)
Fix the condition so the implicit device data attribute is not applied
when the routine has `attribute(host)`
2024-12-13 13:43:33 -08:00
Slava Zakharin
a00946fc94
[flang] Simplify hlfir.sum total reductions. (#119482)
I am trying to switch to keeping the reduction value in a temporary
scalar location so that I can use hlfir::genLoopNest easily.
This also allows using omp.loop_nest with worksharing for OpenMP.
2024-12-13 13:08:28 -08:00
Slava Zakharin
af5d3afff5
[flang] Improve disjoint/identical slices recognition in opt-bufferization. (#119780)
The changes are needed to be able to optimize
'x(9,:)=SUM(x(1:8,:),DIM=1)'
without a temporary array. This pattern exists in exchange2.

The patch also fixes an existing problem in Flang with this test:
```
program main
  integer :: a(10) = (/1,2,3,4,5,6,7,8,9,10/)
  integer :: expected(10) = (/1,10,9,8,7,6,5,4,3,2/)
  print *, 'INPUT: ', a
  print *, 'EXPECTED: ', expected
  call test(a, 10, 2, 10, 9)
  print *, 'RESULT: ', a
contains
  subroutine test(a, size, x, y, z)
    integer :: x, y, z, size
    integer :: a(:)
    a(x:y:1) = a(z:x-1:-1) + 1
  end subroutine test
end program main
```
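
The test above assigns between two overlapping sections of `a` that are traversed in opposite directions. The following Python sketch (an illustration only, with hypothetical helper names; it does not reproduce the pass's analysis) shows why evaluating the right-hand side into a temporary gives the expected result for this input, while a naive element-by-element update in place does not.

```python
def section(first, last, stride):
    """1-based indices of the Fortran section first:last:stride."""
    step = 1 if stride > 0 else -1
    return list(range(first, last + step, stride))

def assign_with_temp(a, lhs, rhs):
    # Evaluate the whole right-hand side before storing anything.
    tmp = [a[j - 1] + 1 for j in rhs]
    for i, value in zip(lhs, tmp):
        a[i - 1] = value
    return a

def assign_in_place(a, lhs, rhs):
    # Naive loop: later reads may see elements already overwritten.
    for i, j in zip(lhs, rhs):
        a[i - 1] = a[j - 1] + 1
    return a

a = list(range(1, 11))
lhs = section(2, 10, 1)   # a(x:y:1) with x=2, y=10
rhs = section(9, 1, -1)   # a(z:x-1:-1) with z=9, x=2

with_temp = assign_with_temp(list(a), lhs, rhs)
in_place = assign_in_place(list(a), lhs, rhs)
print(with_temp)  # [1, 10, 9, 8, 7, 6, 5, 4, 3, 2], the EXPECTED array
print(in_place)   # differs from the expected array
```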
2024-12-13 13:08:02 -08:00
Krzysztof Parzyszek
c57a8f5b3f
[flang][OpenMP] Remove redundant Fortran:: from namespaces, NFC
Apply clang-format after the changes.
2024-12-13 11:00:05 -06:00
Mats Petersson
75e6d0eb4d
[flang][OpenMP]Add support for OpenMP ERROR directive (#119582)
Lowering leads to a TODO, with a test to confirm.

Also testing unparse.

---------

Co-authored-by: Krzysztof Parzyszek <Krzysztof.Parzyszek@amd.com>
2024-12-13 14:05:48 +00:00
Ivan R. Ivanov
7c9404c279
[flang][OpenMP] Add frontend support for ompx_bare clause (#111106) 2024-12-13 21:44:43 +09:00
Valentin Clement (バレンタイン クレメン)
37978c466b
[flang][cuda] Remove unused variable 2024-12-12 15:04:16 -08:00
Valentin Clement (バレンタイン クレメン)
ea04148c27
[flang][cuda] Extend implicit global handling to any type descriptor (#119769)
Relax the check to also handle other type descriptor globals.
2024-12-12 14:52:49 -08:00
Valentin Clement (バレンタイン クレメン)
7141837957
[flang][cuda] Implicitly add DEVICE attribute in device/global functions (#119743)
Variables in global and device function/subroutine that have no CUDA
Fortran data attribute are implicitly DEVICE.
2024-12-12 12:47:34 -08:00
Slava Zakharin
139e69b7bc
[flang] Simple folding for hlfir.shape_of. (#119649)
This folding makes sure there are no hlfir.shape_of users
of hlfir.elemental. This may enable more InlineElementals matches,
because InlineElementals looks for exactly two uses of an hlfir.elemental.
2024-12-12 10:38:34 -08:00
Krzysztof Parzyszek
03cbe42627
[flang][OpenMP] Rework LINEAR clause (#119278)
The OmpLinearClause class was a variant of two classes, one for when the
linear modifier was present, and one for when it was absent. These two
classes did not follow the conventions for parse tree nodes (i.e.
tuple/wrapper/union formats), which necessitated specialization of the
parse tree visitor.

The new form of OmpLinearClause is the standard tuple with a list of
modifiers and an object list. The specialization of parse tree visitor
for it has been removed.
Parsing and unparsing of the new form bears additional complexity due to
syntactical differences between OpenMP 5.2 and prior versions: in OpenMP
5.2 the argument list is post-modified, while in the prior versions, the
step modifier was a post-modifier while the linear modifier had an
unusual syntax of `modifier(list)`.

With this change the LINEAR clause is no different from any other
clauses in terms of its structure and use of modifiers. Modifier
validation and all other checks work the same as with other clauses.
2024-12-12 12:19:35 -06:00
Krzysztof Parzyszek
58f9c4fc00
[flang][OpenMP] Semantic checks for IN_REDUCTION and TASK_REDUCTION (#118841)
Update parsing of these two clauses and add semantic checks for them.
Simplify some code in IsReductionAllowedForType and
CheckReductionOperator.
2024-12-12 12:19:12 -06:00
Kareem Ergawy
f9734b9df1
[mlir][OpenMP] - MLIR to LLVMIR translation support for delayed privatization of allocatables in omp.target ops (#116576)
This PR adds support to translate the `private` clause from MLIR to
LLVMIR when used on allocatables in the context of an `omp.target` op.

This replaces https://github.com/llvm/llvm-project/pull/113208.

Parent PR: https://github.com/llvm/llvm-project/pull/116770. Only the
latest commit is relevant to the PR.
2024-12-12 14:39:58 +01:00
Tom Eccles
32403f79f4
[flang][unittests] fix test broken when run as root (#119604)
It is convenient to run tests as root inside of a docker container.

The test (and the library function it is testing) are already
unsupported on Windows so it is safe to use UNIX-isms here.
2024-12-12 09:41:44 +00:00
Valentin Clement (バレンタイン クレメン)
956d0dd624
[flang][cuda] Support builtin global in device global pass (#119626) 2024-12-11 17:09:56 -08:00
Valentin Clement (バレンタイン クレメン)
151901c762
[flang][rt][device] Use enum-set.h as Fortran.h (#119611) 2024-12-11 15:38:38 -08:00
Slava Zakharin
5eef9ba784
[flang] Inline hlfir.cshift as hlfir.elemental. (#119480) 2024-12-11 15:00:07 -08:00
Leandro Lupori
db9856b516
[flang][OpenMP][NFC] Turn symTable into a reference (#119435)
Convert `DataSharingProcessor::symTable` from pointer to reference.
This avoids accidental null pointer dereferences and makes it
possible to use `symTable` when delayed privatization is disabled.
2024-12-11 16:26:19 -03:00
Mats Petersson
00e1cc4c9d
[flang][OpenMP]Add support for fail clause (#118683)
Support the atomic compare option of a fail(memory-order) clause.

Additional tests introduced to check that parsing and semantics checks
for the new clause are handled.

Lowering for atomic compare is still unsupported and will end in a TODO
(aka "Not yet implemented"). A test for this case with the fail clause
is also present.
2024-12-11 16:29:02 +00:00
Paul Osmialowski
03019c687f
[clang][driver] When -fveclib=ArmPL flag is in use, always link against libamath (#116432)
Using `-fveclib=ArmPL` without `-lamath` is likely to result in link-time
errors.
2024-12-11 14:01:29 +00:00
khaki3
609899f443
[flang][cuda] Avoid stack corruption when setting kernel launch parameters (#119469)
In order to get the pointer to a structure member, `getelementptr`
typically requires two indices: one to indicate the structure itself,
and another to specify the member's position. We are missing the former
in `GPULaunchKernelConversion`, so generated code may cause stack
corruption. This PR corrects the indices of a structure used as a kernel
launch temp.
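
The address arithmetic behind this can be sketched in Python with `ctypes` (an illustration only). `LaunchTemp` and `gep_offset` are hypothetical names, and the field layout `(i64, i64, i32, ptr)` is an assumption chosen for the example. With a single index, the calculation steps over whole structures; the leading zero index is what keeps the result inside the one allocated structure.

```python
import ctypes

class LaunchTemp(ctypes.Structure):
    # Hypothetical stand-in for the kernel-launch temporary (assumed layout).
    _fields_ = [
        ("f0", ctypes.c_int64),
        ("f1", ctypes.c_int64),
        ("f2", ctypes.c_int32),
        ("f3", ctypes.c_void_p),
    ]

def gep_offset(struct_type, indices):
    """Byte offset of a getelementptr-like address calculation.

    The first index steps over whole objects of struct_type; an optional
    second index selects a member inside the selected object.
    """
    offset = indices[0] * ctypes.sizeof(struct_type)
    if len(indices) > 1:
        name = struct_type._fields_[indices[1]][0]
        offset += getattr(struct_type, name).offset
    return offset

member = 2
missing_first_index = gep_offset(LaunchTemp, [member])  # past the struct
with_first_index = gep_offset(LaunchTemp, [0, member])  # inside the struct
print(missing_first_index, with_first_index, ctypes.sizeof(LaunchTemp))
```

Since only one structure is allocated, any offset at or beyond `ctypes.sizeof(LaunchTemp)` points outside it, which is consistent with the stack corruption described above.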
2024-12-10 16:08:22 -08:00
Valentin Clement (バレンタイン クレメン)
850c932f05
[flang][cuda] Walk through cuf kernel for implicit globals (#119455)
Globals used in cuf kernel need to be flagged as well.
2024-12-10 14:01:53 -08:00
Valentin Clement (バレンタイン クレメン)
8c19c24a78
[flang][cuda][NFC] Add missing template declaration (#119443) 2024-12-10 13:10:23 -08:00
Valentin Clement (バレンタイン クレメン)
dc5236e6b1
[flang][cuda] Update target rewrite to work on gpu.func (#119283)
Update the pass so it can perform the signature rewrite on gpu.func.
2024-12-10 12:36:49 -08:00
khaki3
e9866d5d14
[flang][cuda] Fix GPULaunchKernelConversion to generate correct kernel launch parameters (#119431)
For the call to _FortranACUFLaunchKernel, we store the pointer to a
member of a temporary structure in a parameter array. However, when we
obtain an element pointer from the parameter array, its address is
calculated based on the type of the structure. This PR properly treats
the parameter array as an array of pointers.

Example:

```mlir
%30 = llvm.load %29 : !llvm.ptr -> i32
%31 = llvm.mlir.constant(1 : i32) : i32
%32 = llvm.alloca %31 x !llvm.struct<(i64, i64, i32, ptr)> : (i32) -> !llvm.ptr
%33 = llvm.mlir.constant(4 : i32) : i32
%34 = llvm.alloca %33 x !llvm.ptr : (i32) -> !llvm.ptr
%35 = llvm.mlir.constant(0 : i32) : i32
%36 = llvm.getelementptr %32[%35] : (!llvm.ptr, i32) -> !llvm.ptr, !llvm.struct<(i64, i64, i32, ptr)>
llvm.store %8, %36 : i64, !llvm.ptr
%37 = llvm.getelementptr %34[%35] : (!llvm.ptr, i32) -> !llvm.ptr, !llvm.struct<(i64, i64, i32, ptr)>
llvm.store %36, %37 : !llvm.ptr, !llvm.ptr
...
llvm.call @_FortranACUFLaunchKernel(%47, %8, %8, %8, %2, %8, %8, %7, %34, %48) : (!llvm.ptr, i64, i64, i64, i64, i64, i64, i32, !llvm.ptr, !llvm.ptr) -> () 
```
In this example, `%37 = llvm.getelementptr %34[%35] : (!llvm.ptr, i32)
-> !llvm.ptr, !llvm.struct<(i64, i64, i32, ptr)>` will be `%37 =
llvm.getelementptr %34[%35] : (!llvm.ptr, i32) -> !llvm.ptr, !llvm.ptr`.
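
To see why the element type matters, here is a small Python `ctypes` sketch (illustrative only; `LaunchTemp` and `slot_offsets` are hypothetical names). It compares the byte offsets of the four parameter slots when the array is indexed with the structure type versus with a pointer type.

```python
import ctypes

class LaunchTemp(ctypes.Structure):
    # Mirrors !llvm.struct<(i64, i64, i32, ptr)> from the example above.
    _fields_ = [
        ("f0", ctypes.c_int64),
        ("f1", ctypes.c_int64),
        ("f2", ctypes.c_int32),
        ("f3", ctypes.c_void_p),
    ]

def slot_offsets(element_type, count):
    """Byte offset of each slot when indexing with the given element type."""
    size = ctypes.sizeof(element_type)
    return [i * size for i in range(count)]

count = 4  # the parameter array is allocated as 4 pointers
array_bytes = count * ctypes.sizeof(ctypes.c_void_p)
as_struct = slot_offsets(LaunchTemp, count)       # stride used before the fix
as_ptr = slot_offsets(ctypes.c_void_p, count)     # stride used after the fix
print(array_bytes, as_struct, as_ptr)
```

With the structure type as the element type, every slot after the first lands at or beyond the end of the pointer array; with the pointer type, all four slots stay within it.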
2024-12-10 11:32:32 -08:00
Valentin Clement (バレンタイン クレメン)
0469bb91aa
[flang][cuda] Fix lowering when step is a variable (#119421)
Add missing conversion.
2024-12-10 09:48:15 -08:00
Slava Zakharin
c7634c1b61
[flang] Disabled hlfir.sum inlining by default. (#119287)
To temporarily address the exchange2 perf regression reported in #118556,
I disabled the inlining by default and put it under the engineering
option `-flang-simplify-hlfir-sum`.
2024-12-10 09:18:50 -08:00
jeanPerier
28a0ad09c1
[flang][hlfir] fix issue 118922 (#119219)
hlfir.elemental codegen optimizes out the final as_expr copy for temps
local to its body, but sometimes clean-up may have been emitted for
this temp, and the code did not handle that.
This caused #118922 and #113843.

Only elide the copy if the as_expr is the last op.
2024-12-10 15:00:32 +01:00
Paul Osmialowski
f8a1f42dd5
[test][flang][driver] Fix test that assumes libomp default (#119368)
This patch supplements the fix introduced by PR #119319.
2024-12-10 13:52:55 +00:00