This is a starting PR to implicitly map allocatable record fields.
This PR contains the following changes:
1. Re-purposes some of the utils used in `Lower/OpenMP.cpp` so that
these utils work on the `mlir::Value` level rather than the
`semantics::Symbol` level. This takes one step towards to enabling
MLIR passes to more easily do some lowering themselves (e.g. creating
`omp.map.bounds` ops for implicitely caputured data like this PR
does).
2. Adds support for implicitely capturing and mapping allocatable fields
in record types.
There is quite some distant to still cover to have full support for
this. I added a number of todos to guide further development.
Co-authored-by: Andrew Gozillon <andrew.gozillon@amd.com>
Co-authored-by: Andrew Gozillon <andrew.gozillon@amd.com>
…ility
We generally allow any legal procedure pointer target as an actual
argument for association with a dummy procedure, since many actual
procedures are underspecified EXTERNALs. But for proper generic
resolution, it is necessary to disallow incompatible functions with
explicit result types.
Fixes https://github.com/llvm/llvm-project/issues/119151.
-frealloc-lhs is the default.
If -fno-realloc-lhs is specified, then an allocatable on the left
side of an intrinsic assignment is not implicitly (re)allocated
to conform with the right hand side. Fortran runtime will issue
an error if there is a mismatch in shape/type/allocation-status.
CodeGen will allocate memory for a new descriptor on descriptor loads.
CUDA Fortran local descriptor are allocated in managed memory by the
runtime. The newly allocated storage for cuda descriptor must also be
allocated through the runtime.
There is already a LIT test for hlfir.sum inlining that uses
the engineering option. I would like to keep the option
for short period of time to be able to revert
in case of performance regressions that I was not able to see.
I am trying to switch to keeping the reduction value in a temporary
scalar location so that I can use hlfir::genLoopNest easily.
This also allows using omp.loop_nest with worksharing for OpenMP.
The changes are needed to be able to optimize
'x(9,:)=SUM(x(1:8,:),DIM=1)'
without a temporary array. This pattern exists in exchange2.
The patch also fixes an existing problem in Flang with this test:
```
program main
integer :: a(10) = (/1,2,3,4,5,6,7,8,9,10/)
integer :: expected(10) = (/1,10,9,8,7,6,5,4,3,2/)
print *, 'INPUT: ', a
print *, 'EXPECTED: ', expected
call test(a, 10, 2, 10, 9)
print *, 'RESULT: ', a
contains
subroutine test(a, size, x, y, z)
integer :: x, y, z, size
integer :: a(:)
a(x:y:1) = a(z:x-1:-1) + 1
end subroutine test
end program main
```
This folding makes sure there are no hlfir.shape_of users
of hlfir.elemental - this may enable more InlineElementals matches,
because it is looking for exactly two uses of an hlfir.elemental.
The OmpLinearClause class was a variant of two classes, one for when the
linear modifier was present, and one for when it was absent. These two
classes did not follow the conventions for parse tree nodes, (i.e.
tuple/wrapper/union formats), which necessitated specialization of the
parse tree visitor.
The new form of OmpLinearClause is the standard tuple with a list of
modifiers and an object list. The specialization of parse tree visitor
for it has been removed.
Parsing and unparsing of the new form bears additional complexity due to
syntactical differences between OpenMP 5.2 and prior versions: in OpenMP
5.2 the argument list is post-modified, while in the prior versions, the
step modifier was a post-modifier while the linear modifier had an
unusual syntax of `modifier(list)`.
With this change the LINEAR clause is no different from any other
clauses in terms of its structure and use of modifiers. Modifier
validation and all other checks work the same as with other clauses.
It is convenient to run tests as root inside of a docker container.
The test (and the library function it is testing) are already
unsupported on Windows so it is safe to use UNIX-isms here.
Convert `DataSharingProcessor::symTable` from pointer to reference.
This avoids accidental null pointer dereferences and makes it
possible to use `symTable` when delayed privatization is disabled.
Support the atomic compare option of a fail(memory-order) clauses.
Additional tests introduced to check that parsing and semantics checks
for the new clause is handled.
Lowering for atomic compare is still unsupported and wil end in a TOOD
(aka "Not yet implemented"). A test for this case with the fail clause
is also present.
In order to get the pointer to a structure member, `getelementptr`
typically requires two indices: one to indicate the structure itself,
and another to specify the member's position. We are missing the former
in `GPULaunchKernelConversion`, so generated code may cause stack
corruption. This PR corrects the indices of a structure used as a kernel
launch temp.
For the call to _FortranACUFLaunchKernel, we store the pointer to a
member of a temporary structure in a parameter array. However, when we
obtain an element pointer from the parameter array, its address is
calculated based on the type of the structure. This PR properly treats
the parameter array as an array of pointers.
Example:
```mlir
%30 = llvm.load %29 : !llvm.ptr -> i32
%31 = llvm.mlir.constant(1 : i32) : i32
%32 = llvm.alloca %31 x !llvm.struct<(i64, i64, i32, ptr)> : (i32) -> !llvm.ptr
%33 = llvm.mlir.constant(4 : i32) : i32
%34 = llvm.alloca %33 x !llvm.ptr : (i32) -> !llvm.ptr
%35 = llvm.mlir.constant(0 : i32) : i32
%36 = llvm.getelementptr %32[%35] : (!llvm.ptr, i32) -> !llvm.ptr, !llvm.struct<(i64, i64, i32, ptr)>
llvm.store %8, %36 : i64, !llvm.ptr
%37 = llvm.getelementptr %34[%35] : (!llvm.ptr, i32) -> !llvm.ptr, !llvm.struct<(i64, i64, i32, ptr)>
llvm.store %36, %37 : !llvm.ptr, !llvm.ptr
...
llvm.call @_FortranACUFLaunchKernel(%47, %8, %8, %8, %2, %8, %8, %7, %34, %48) : (!llvm.ptr, i64, i64, i64, i64, i64, i64, i32, !llvm.ptr, !llvm.ptr) -> ()
```
In this example, `%37 = llvm.getelementptr %34[%35] : (!llvm.ptr, i32)
-> !llvm.ptr, !llvm.struct<(i64, i64, i32, ptr)>` will be `%37 =
llvm.getelementptr %34[%35] : (!llvm.ptr, i32) -> !llvm.ptr, !llvm.ptr`.
To temporarily address exchange2 perf regression reported in #118556
I disabled the inlining by default, and put it under engineering
option `-flang-simplify-hlfir-sum`.
hlfir.elemental codegen optimize-out the final as_expr copy for temps
local to its body, but sometimes, clean-up may have been emitted for
this temp, and the code did not handle that.
This caused #118922 and @113843.
Only elide the copy if the as_expr is the last op.