9399 Commits

Author SHA1 Message Date
Kareem Ergawy
f9734b9df1
[mlir][OpenMP] - MLIR to LLVMIR translation support for delayed privatization of allocatables in omp.target ops (#116576)
This PR adds support for translating the `private` clause from MLIR to
LLVMIR when used on allocatables in the context of an `omp.target` op.

This replaces https://github.com/llvm/llvm-project/pull/113208.

Parent PR: https://github.com/llvm/llvm-project/pull/116770. Only the
latest commit is relevant to the PR.
2024-12-12 14:39:58 +01:00
Tom Eccles
32403f79f4
[flang][unittests] fix test broken when run as root (#119604)
It is convenient to run tests as root inside a Docker container.

The test (and the library function it is testing) are already
unsupported on Windows so it is safe to use UNIX-isms here.
2024-12-12 09:41:44 +00:00
Valentin Clement (バレンタイン クレメン)
956d0dd624
[flang][cuda] Support builtin global in device global pass (#119626) 2024-12-11 17:09:56 -08:00
Valentin Clement (バレンタイン クレメン)
151901c762
[flang][rt][device] Use enum-set.h as Fortran.h (#119611) 2024-12-11 15:38:38 -08:00
Slava Zakharin
5eef9ba784
[flang] Inline hlfir.cshift as hlfir.elemental. (#119480) 2024-12-11 15:00:07 -08:00
Leandro Lupori
db9856b516
[flang][OpenMP][NFC] Turn symTable into a reference (#119435)
Convert `DataSharingProcessor::symTable` from pointer to reference.
This avoids accidental null pointer dereferences and makes it
possible to use `symTable` when delayed privatization is disabled.
2024-12-11 16:26:19 -03:00
Mats Petersson
00e1cc4c9d
[flang][OpenMP] Add support for fail clause (#118683)
Support the fail(memory-order) clause used with atomic compare.

Additional tests check that parsing and semantic checks for the new
clause are handled.

Lowering for atomic compare is still unsupported and will end in a TODO
(aka "Not yet implemented"). A test for this case with the fail clause
is also present.
2024-12-11 16:29:02 +00:00
Paul Osmialowski
03019c687f
[clang][driver] When -fveclib=ArmPL flag is in use, always link against libamath (#116432)
Using `-fveclib=ArmPL` without `-lamath` likely results in link-time
errors.
2024-12-11 14:01:29 +00:00
khaki3
609899f443
[flang][cuda] Avoid stack corruption when setting kernel launch parameters (#119469)
In order to get the pointer to a structure member, `getelementptr`
typically requires two indices: one to indicate the structure itself,
and another to specify the member's position. We are missing the former
in `GPULaunchKernelConversion`, so generated code may cause stack
corruption. This PR corrects the indices of a structure used as a kernel
launch temp.
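The following Python sketch models the byte-offset arithmetic of `getelementptr` to show why dropping the leading index is dangerous. It assumes a 64-bit target with natural alignment for `!llvm.struct<(i64, i64, i32, ptr)>`. The helper names are hypothetical and do not exist in flang or MLIR.

```python
# Hypothetical model of LLVM `getelementptr` byte offsets for
# !llvm.struct<(i64, i64, i32, ptr)>, assuming a 64-bit target where
# every scalar is naturally aligned to its own size.
FIELD_SIZES = [8, 8, 4, 8]


def align_up(value, align):
    return (value + align - 1) // align * align


def field_offsets(sizes):
    offsets, cur = [], 0
    for size in sizes:
        cur = align_up(cur, size)  # pad to the field's natural alignment
        offsets.append(cur)
        cur += size
    return offsets


def struct_size(sizes):
    end = field_offsets(sizes)[-1] + sizes[-1]
    return align_up(end, max(sizes))  # tail padding to the struct alignment


def gep_offset(indices, sizes):
    """Byte offset produced by `getelementptr %base[indices...]` on the struct type."""
    offset = indices[0] * struct_size(sizes)  # first index steps over whole structs
    if len(indices) > 1:
        offset += field_offsets(sizes)[indices[1]]  # second index selects a member
    return offset
```

With both indices, member 2 is at `gep_offset([0, 2], FIELD_SIZES) == 16`, which is inside the 32-byte struct. With only the member index, `gep_offset([2], FIELD_SIZES) == 64`. That address lies two whole structs past a single-element `alloca`, so a store there overwrites unrelated stack memory.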
2024-12-10 16:08:22 -08:00
Valentin Clement (バレンタイン クレメン)
850c932f05
[flang][cuda] Walk through cuf kernel for implicit globals (#119455)
Globals used in a CUF kernel need to be flagged as well.
2024-12-10 14:01:53 -08:00
Valentin Clement (バレンタイン クレメン)
8c19c24a78
[flang][cuda][NFC] Add missing template declaration (#119443) 2024-12-10 13:10:23 -08:00
Valentin Clement (バレンタイン クレメン)
dc5236e6b1
[flang][cuda] Update target rewrite to work on gpu.func (#119283)
Update the pass so it can perform the signature rewrite on gpu.func.
2024-12-10 12:36:49 -08:00
khaki3
e9866d5d14
[flang][cuda] Fix GPULaunchKernelConversion to generate correct kernel launch parameters (#119431)
For the call to _FortranACUFLaunchKernel, we store the pointer to a
member of a temporary structure in a parameter array. However, when we
obtain an element pointer from the parameter array, its address is
calculated based on the type of the structure. This PR properly treats
the parameter array as an array of pointers.

Example:

```mlir
%30 = llvm.load %29 : !llvm.ptr -> i32
%31 = llvm.mlir.constant(1 : i32) : i32
%32 = llvm.alloca %31 x !llvm.struct<(i64, i64, i32, ptr)> : (i32) -> !llvm.ptr
%33 = llvm.mlir.constant(4 : i32) : i32
%34 = llvm.alloca %33 x !llvm.ptr : (i32) -> !llvm.ptr
%35 = llvm.mlir.constant(0 : i32) : i32
%36 = llvm.getelementptr %32[%35] : (!llvm.ptr, i32) -> !llvm.ptr, !llvm.struct<(i64, i64, i32, ptr)>
llvm.store %8, %36 : i64, !llvm.ptr
%37 = llvm.getelementptr %34[%35] : (!llvm.ptr, i32) -> !llvm.ptr, !llvm.struct<(i64, i64, i32, ptr)>
llvm.store %36, %37 : !llvm.ptr, !llvm.ptr
...
llvm.call @_FortranACUFLaunchKernel(%47, %8, %8, %8, %2, %8, %8, %7, %34, %48) : (!llvm.ptr, i64, i64, i64, i64, i64, i64, i32, !llvm.ptr, !llvm.ptr) -> () 
```
In this example, `%37 = llvm.getelementptr %34[%35] : (!llvm.ptr, i32)
-> !llvm.ptr, !llvm.struct<(i64, i64, i32, ptr)>` will be `%37 =
llvm.getelementptr %34[%35] : (!llvm.ptr, i32) -> !llvm.ptr, !llvm.ptr`.
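The sketch below computes where each store into the four-slot parameter array `%34` lands, depending on which element type the `getelementptr` uses. It assumes a 64-bit target with 8-byte pointers, where `!llvm.struct<(i64, i64, i32, ptr)>` occupies 32 bytes. The function names are hypothetical.

```python
PTR_SIZE = 8      # assumed pointer size on a 64-bit target
STRUCT_SIZE = 32  # assumed size of !llvm.struct<(i64, i64, i32, ptr)>
NUM_PARAMS = 4    # %34 = llvm.alloca 4 x !llvm.ptr


def slot_offsets(num_params, gep_elem_size):
    """Byte offset of each parameter slot for a given GEP element type size."""
    return [i * gep_elem_size for i in range(num_params)]


def stores_in_bounds(num_params, gep_elem_size):
    alloca_bytes = num_params * PTR_SIZE  # the array really holds pointers
    # Each llvm.store writes one pointer at the computed slot.
    return all(off + PTR_SIZE <= alloca_bytes
               for off in slot_offsets(num_params, gep_elem_size))
```

When the struct type is used, every slot after the first falls outside the 32-byte array. Treating the elements as `!llvm.ptr` keeps all four stores in bounds.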
2024-12-10 11:32:32 -08:00
Valentin Clement (バレンタイン クレメン)
0469bb91aa
[flang][cuda] Fix lowering when step is a variable (#119421)
Add missing conversion.
2024-12-10 09:48:15 -08:00
Slava Zakharin
c7634c1b61
[flang] Disabled hlfir.sum inlining by default. (#119287)
To temporarily address the exchange2 performance regression reported in
#118556, I disabled the inlining by default and put it under the
engineering option `-flang-simplify-hlfir-sum`.
2024-12-10 09:18:50 -08:00
jeanPerier
28a0ad09c1
[flang][hlfir] fix issue 118922 (#119219)
hlfir.elemental codegen optimizes out the final as_expr copy for temps
local to its body, but sometimes clean-up may have been emitted for
this temp, and the code did not handle that.
This caused #118922 and #113843.

Only elide the copy if the as_expr is the last op.
2024-12-10 15:00:32 +01:00
Paul Osmialowski
f8a1f42dd5
[test][flang][driver] Fix test that assumes libomp default (#119368)
This patch supplements the fix introduced by PR #119319.
2024-12-10 13:52:55 +00:00
NimishMishra
edc50f3954
[flang][OpenMP] Add lowering support for task detach (#119128)
This PR adds lowering of the task detach clause to MLIR.
2024-12-10 03:25:06 -08:00
执着
e8baa792e7
Backtrace support for flang (#118179)
Fixed build failures in old PRs due to missing files
2024-12-10 10:31:48 +00:00
Yusuke MINATO
a88677edc0
Reland "[flang] Integrate the option -flang-experimental-integer-overflow into -fno-wrapv" (#118933)
This relands #110063.
The performance issue on 503.bwaves_r was found to be unrelated to the
patch, and it is resolved by fbd89bcc when LTO is enabled.
2024-12-10 16:26:53 +09:00
Valentin Clement
7bcd459dce [flang][cuda][NFC] Fix typo in test filename 2024-12-09 19:22:30 -08:00
Valentin Clement (バレンタイン クレメン)
a1d71c3693
[flang][cuda] Additional update to ExternalNameConversion (#119276) 2024-12-09 17:39:51 -08:00
Valentin Clement (バレンタイン クレメン)
650e736904
[flang][cuda][NFC] Add some diagnostic when module or fct are not found (#119277) 2024-12-09 17:39:36 -08:00
Valentin Clement (バレンタイン クレメン)
75623bfe1b
[flang][cuda] Handle gpu.return in AbstractResult pass (#119035) 2024-12-09 17:39:16 -08:00
Razvan Lupusoru
a0eb794da8
[MLIR][acc] Introduce varType to acc data clause operations (#119007)
The acc data clause operations hold an operand named `varPtr`. This was
intended to hold a pointer to a variable - where the element type of
that pointer specifies the type of the variable. However, for both
memref and llvm dialects, this assumption is not true. This is because
memref element type for cases like memref<10xf32> is simply f32 and for
LLVM, after opaque pointers, the variable type is no longer recoverable.

Thus, introduce varType to ensure that appropriate semantics are kept.

Both the parser and printer for this new type attribute allow it to be
omitted in cases where a dialect's getElementType() applied to
`varPtr`'s type yields a recoverable type. In particular, for FIR, no
changes are needed in the MLIR unit tests.
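Here is a minimal sketch of the printing rule described above, using a simplified textual type model. `varType` is printed only when it cannot be recovered from the type of `varPtr`. The `varType(...)` spelling and both helper functions are hypothetical; this is not the actual ACC dialect printer.

```python
def recoverable_element_type(ptr_type):
    """Hypothetical stand-in for a dialect's getElementType() on varPtr's type."""
    prefix = "!fir.ref<"
    if ptr_type.startswith(prefix) and ptr_type.endswith(">"):
        return ptr_type[len(prefix):-1]
    return None  # e.g. an opaque !llvm.ptr carries no element type


def print_var_type(ptr_type, var_type):
    # Elide varType when it matches what can be recovered from varPtr's type.
    if recoverable_element_type(ptr_type) == var_type:
        return ""
    return f" varType({var_type})"
```

For a FIR reference such as `!fir.ref<f32>` nothing extra is printed. For an opaque `!llvm.ptr` the variable type must be spelled out.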
2024-12-09 15:14:48 -08:00
Slava Zakharin
44cd8f0d06
[flang] Lower CSHIFT to hlfir.cshift operation. (#118917) 2024-12-09 14:02:58 -08:00
Valentin Clement (バレンタイン クレメン)
1d4b5c161f
[flang][cuda] Change how abstract result pass is scheduled on func.func and gpu.func (#119034)
Use `pm.nest` to schedule the pass on nested `func.func` and `gpu.func`
in the `gpu.module`.

AbstractResult pass is not meant to run on the whole gpu.module at once.
2024-12-09 13:31:27 -08:00
Slava Zakharin
110b891f93
[flang] Added lowering for hlfir.cshift operation. (#118918) 2024-12-09 11:02:11 -08:00
Kiran Chandramohan
4e59721cc6
[Flang][OpenMP] Make boxed procedure pass aware of OpenMP private ops (#118261)
Fixes #109727
2024-12-09 17:27:18 +00:00
Kiran Chandramohan
2344cc4983
[Flang] Update Maintainers (#117124)
Move to a markdown file and update maintainers.
This brings the project closer to updated guidance
(https://llvm.org/docs/DeveloperPolicy.html#maintainers). A list of
active and inactive maintainers is provided. Maintainers are also
grouped into lead or component maintainers.
2024-12-09 17:18:06 +00:00
Slava Zakharin
084451cdd2
[flang] Do not inline SUM with invalid DIM argument. (#118911)
Such SUMs might appear in dead code after constant propagation.
They do not have to be inlined.
2024-12-09 07:55:52 -08:00
Slava Zakharin
1ca392764a
[flang] Added definition of hlfir.cshift operation. (#118732)
CSHIFT intrinsic will be lowered to this operation, which
then can be optimized as inline sequence or lowered into
a runtime call.
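For a rank-1 array, CSHIFT has a simple elemental form, `result(i) = array(1 + MODULO(i - 1 + shift, n))`, which is what makes an inline expansion possible. The sketch below illustrates only the intrinsic's semantics. It is not the `hlfir.cshift` lowering itself, and the function name is hypothetical.

```python
def cshift_elemental(array, shift):
    """Rank-1 CSHIFT written elementally: result(i) = array(1 + MODULO(i - 1 + shift, n))."""
    n = len(array)

    def element(i):  # i is a 1-based Fortran index
        # Python's % with a positive divisor is non-negative, like Fortran MODULO.
        return array[(i - 1 + shift) % n]

    return [element(i) for i in range(1, n + 1)]
```

For example, `cshift_elemental([1, 2, 3, 4, 5], 2)` yields `[3, 4, 5, 1, 2]`, matching `CSHIFT((/1,2,3,4,5/), 2)`.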
2024-12-09 07:55:22 -08:00
Zhaoxin Yang
669f704d0d
[Flang][LoongArch] Enable clang command-line options in flang. (#118244)
This mainly includes the following LoongArch-specific options:
-m[no-]lsx, -m[no-]lasx, -msimd=, -m[no-]frecipe, -m[no-]lam-bh,
-m[no-]lamcas, -m[no-]ld-seq-sa, -m[no-]div32, -m[no-]annotate-tablejump.
2024-12-09 19:59:39 +08:00
Valentin Clement (バレンタイン クレメン)
16c2a1016e
Revert "[flang] Allow to pass an async id to allocate the descriptor (#118713)" (#119109)
This reverts commit 7d1c661381d36018fd105f4ad4c2d6dc45e7288b.

This commit breaks some device runtime builds. Need time to investigate.
2024-12-07 19:55:12 -08:00
Paul Osmialowski
755519f7f6
[clang][driver] Use $ prefix with config file options to have them added after all of the command line options (#117573)
Currently, if a -l (or -Wl,) flag is added into a config file
(e.g. clang.cfg), it is situated before any object file in the
effective command line. If the library requested by a given -l flag is
static, its symbols will not be made visible to any of the object
files provided by the user. Also, the presence of any of the linker
flags in a config file confuses the driver whenever the user invokes
clang without any parameters (see issue #67209).

This patch attempts to solve both problems by allowing the argument
list to be split into two parts. The head part of the list will
be used as before, but the tail part will be appended after the
command line flags provided by the user and only when it is known
that the linking should occur. The $-prefixed arguments will be added
to the tail part.
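The split can be modelled in a few lines of Python. This is a hypothetical sketch of the described behaviour, not the clang driver code. Arguments prefixed with `$` go to the tail, and the tail is appended after the user's arguments only when linking will occur.

```python
def split_config_args(config_args):
    """Split config-file arguments into a head part and a $-prefixed tail part."""
    head, tail = [], []
    for arg in config_args:
        if arg.startswith("$"):
            tail.append(arg[1:])  # strip the $ marker
        else:
            head.append(arg)
    return head, tail


def effective_command_line(config_args, user_args, will_link):
    head, tail = split_config_args(config_args)
    # The tail follows the user's flags and objects, and only when linking.
    return head + user_args + (tail if will_link else [])
```

With a config file containing `-O2` and `$-lfoo` (a hypothetical library), `clang main.o` would see `-O2 main.o -lfoo`. When no link step happens, `-lfoo` is dropped entirely.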
2024-12-07 11:18:44 +00:00
Thirumalai Shaktivel
e73ec1a74a
[Flang][OpenMP] Add some semantic checks for Linear clause (#111354)
This PR adds all the missing semantics for the Linear clause based on
the OpenMP 5.2 restrictions. The restriction details are mentioned
below.

OpenMP 5.2:
5.4.6 linear Clause restrictions
- A linear-modifier may be specified as ref or uval only on a declare
simd directive.
- If linear-modifier is not ref, all list items must be of type integer.
- If linear-modifier is ref or uval, all list items must be dummy
arguments without the VALUE attribute.
- List items must not be Cray pointers or variables that have the
POINTER attribute. Cray pointer support has been deprecated.
- If linear-modifier is ref, list items must be polymorphic variables,
assumed-shape arrays, or variables with the ALLOCATABLE attribute.
- A common block name must not appear in a linear clause.
- A list item cannot appear more than once.

4.4.4 ordered Clause restriction
- If n is explicitly specified, a linear clause must not be specified on
the same directive.

5.11 aligned Clause restriction
- Each list item must have C_PTR or Cray pointer type or have the
POINTER or ALLOCATABLE attribute. Cray pointer support has been
deprecated.
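Below is a minimal Python sketch of a subset of the linear-clause checks listed above: modifier placement, integer type, dummy arguments without VALUE, no pointers, no common blocks, and no duplicates. The `ListItem` record and the diagnostic texts are hypothetical. flang's actual checks operate on semantic symbols and emit different messages, and the ref-specific shape rule is omitted here.

```python
from dataclasses import dataclass


@dataclass
class ListItem:
    name: str
    type_name: str              # e.g. "integer", "real"
    is_dummy: bool = False
    has_value_attr: bool = False
    is_pointer: bool = False
    is_cray_pointer: bool = False
    is_common_block: bool = False


def check_linear(items, modifier, directive):
    """Return diagnostics for a linear clause; modifier is None, "val", "ref" or "uval"."""
    errors = []
    if modifier in ("ref", "uval") and directive != "declare simd":
        errors.append(f"linear modifier {modifier} is only allowed on declare simd")
    seen = set()
    for item in items:
        if item.is_common_block:
            errors.append(f"common block /{item.name}/ cannot appear in a linear clause")
            continue
        if item.name in seen:
            errors.append(f"{item.name} appears more than once")
        seen.add(item.name)
        if modifier != "ref" and item.type_name != "integer":
            errors.append(f"{item.name} must be of type integer")
        if modifier in ("ref", "uval") and (not item.is_dummy or item.has_value_attr):
            errors.append(f"{item.name} must be a dummy argument without VALUE")
        if item.is_pointer or item.is_cray_pointer:
            errors.append(f"{item.name} must not be a POINTER or Cray pointer")
    return errors
```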
2024-12-06 12:11:46 -06:00
Krzysztof Parzyszek
02db35a1d6
[flang][OpenMP] Implement CheckReductionObjects for all reduction clauses (#118689)

Currently we only do semantic checks for REDUCTION. There are two other
clauses, IN_REDUCTION and TASK_REDUCTION, which will also need those
checks. Implement a function that checks the common list-item
requirements for all those clauses.
2024-12-06 12:00:48 -06:00
jeanPerier
d6ec7c82f3
[flang][CUF] fix missing header after #112188 (#118993)
Otherwise, builds with `-DFLANG_CUF_RUNTIME` hit:

```
runtime/CUDA/descriptor.cpp:44:24: error: invalid use of incomplete type 'const class Fortran::runtime::Descriptor'
   44 |   std::size_t count{src->SizeInBytes()};
```
2024-12-06 17:22:47 +01:00
Michael Kruse
c91ba04328
[Flang][NFC] Split runtime headers in preparation for cross-compilation. (#112188)
Split some headers into headers for public and private declarations in
preparation for #110217. Moving the runtime-private headers into the
runtime-private include directory will occur in #110298.

* Do not use `sizeof(Descriptor)` in the compiler. The size of the
descriptor is target-dependent while `sizeof(Descriptor)` is the size of
the Descriptor for the host platform which might be too small when
cross-compiling to a different platform. Another problem is that the
emitted assembly ((cross-)compiling to the same target) is not identical
between Flang instances running on different systems. Moving the declaration of
`class Descriptor` out of the included header will also reduce the
amount of #included sources.

* Do not use `sizeof(ArrayConstructorVector)` and
`alignof(ArrayConstructorVector)` in the compiler. Same reason as with
`Descriptor`.

* Compute the descriptor's extra flags without instantiating a
Descriptor. `Fortran::runtime::Descriptor` is defined in the runtime
source, but not the compiler source.

* Move `InquiryKeywordHashDecode` into runtime-private header. The
function is defined in the runtime sources and trying to call it in the
compiler would lead to a link-error.

* Move allocator-kind magic numbers into a common header. They are the
only declarations from `allocator-registry.h` that are also used in the
compiler.

This does not make Flang cross-compile ready yet; the main goal is to
avoid transitive header dependencies from Flang to clang-rt. There are
more places that assume the host platform is the same as the target platform.
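The `sizeof` pitfall can be reproduced with Python's `ctypes`. Like `sizeof` in the compiler, `ctypes` reports layouts for the host. The two-field header below is a hypothetical stand-in that is far simpler than the real `Descriptor`, and it assumes `size_t` has the same width as a pointer, as on common platforms.

```python
import ctypes


class HostDescriptorHeader(ctypes.Structure):
    # Hypothetical, heavily simplified descriptor header laid out for the host.
    _fields_ = [("base_addr", ctypes.c_void_p), ("elem_len", ctypes.c_size_t)]


def target_header_size(target_pointer_bytes):
    # Both fields are assumed pointer-sized, so derive the size from the
    # target's pointer width instead of from the host's layout.
    return 2 * target_pointer_bytes


HOST_POINTER_BYTES = ctypes.sizeof(ctypes.c_void_p)
```

On a 64-bit host, `ctypes.sizeof(HostDescriptorHeader)` is 16, but a 32-bit target needs `target_header_size(4) == 8`. Baking the host value into generated code would therefore be wrong for that target.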
2024-12-06 15:29:00 +01:00
Renaud Kauffmann
27e458c8cb
[flang][cuda] Distinguish constant fir.global from globals with a #cuf.cuda<constant> attribute (#118912)
1. In `CufOpConversion`, `isDeviceGlobal` was renamed
`isRegisteredGlobal` and moved to the common file. `isRegisteredGlobal`
excludes constant `fir.global` operations from registration. This is to
avoid calls to `_FortranACUFGetDeviceAddress` on globals which do not
have any symbols in the runtime. This was done for
`_FortranACUFRegisterVariable` in #118582, but also needs to be done
here after #118591.
2. `CufDeviceGlobal` no longer adds the `#cuf.cuda<constant>` attribute
to the constant global. As discussed in #118582, a module variable with
the #cuf.cuda<constant> attribute is not a compile-time constant. Yet,
the compile-time constant also needs to be copied into the GPU module.
The candidates for copy to the GPU modules are:
   - the globals needing registration regardless of their uses in device
     code (they can be referred to in host code as well)
   - the compile-time constants when used in device code

3. The registration of "constant" module device variables
(#cuf.cuda<constant>) can be restored in `CufAddConstructor`.
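The selection rule for GPU-module candidates can be sketched as follows. Two simplifications are made for illustration: the `Global` record is hypothetical, and a global is assumed to need registration exactly when it carries a CUDA data attribute and is not a compile-time constant. The real passes inspect `fir.global` operations.

```python
from dataclasses import dataclass


@dataclass
class Global:
    name: str
    has_cuda_data_attr: bool   # e.g. a device, managed or constant CUDA attribute
    is_constant: bool          # compile-time constant fir.global
    used_in_device_code: bool


def is_registered_global(g):
    # Compile-time constants have no runtime symbol, so they are never registered.
    return g.has_cuda_data_attr and not g.is_constant


def gpu_module_candidates(globals_):
    # Registered globals are copied regardless of use; constants only when
    # device code refers to them.
    return [g.name for g in globals_
            if is_registered_global(g) or (g.is_constant and g.used_in_device_code)]
```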
2024-12-05 18:36:48 -08:00
Slava Zakharin
cc46d0bee9
[flang] Expand SUM(DIM=CONSTANT) into an hlfir.elemental. (#118556)
An array SUM with the specified constant DIM argument
may be expanded into hlfir.elemental with a reduction loop
inside it processing all elements of the specified dimension.
The expansion allows further optimization of the cases like
`A=SUM(B+1,DIM=1)` in the optimized bufferization pass
(given that it can prove there are no read/write conflicts).
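The expansion can be pictured with a hypothetical Python sketch for a rank-2 operand. Each element of the result runs its own reduction loop over the selected dimension and evaluates the operand on the fly. As a result, for `A=SUM(B+1,DIM=1)` no temporary for `B+1` is materialized. Indices are 1-based to mirror Fortran.

```python
def elemental_sum_dim(shape, element, dim):
    """SUM(expr, DIM=dim) for a rank-2 expr, computed one result element at a time.

    element(i, j) evaluates the operand at 1-based indices (i, j) on demand.
    """
    n1, n2 = shape
    if dim == 1:
        # result(j) = sum over i of expr(i, j)
        return [sum(element(i, j) for i in range(1, n1 + 1)) for j in range(1, n2 + 1)]
    if dim == 2:
        # result(i) = sum over j of expr(i, j)
        return [sum(element(i, j) for j in range(1, n2 + 1)) for i in range(1, n1 + 1)]
    raise ValueError("DIM must be 1 or 2 for a rank-2 operand")


B = [[1, 2], [3, 4]]  # B(i, j) is stored as B[i - 1][j - 1]
A = elemental_sum_dim((2, 2), lambda i, j: B[i - 1][j - 1] + 1, dim=1)
```

Here `A` is `[6, 8]`, the column sums of `B+1`.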
2024-12-05 09:36:12 -08:00
Slava Zakharin
3f0cc068ce
[flang] Assume matching shapes in elemental assignment with non-realloc lhs. (#118552)
The optimized bufferization pass cannot optimize very simple cases of
elemental assignments because of the suboptimal order of checks. This
patch relies
on the fact that in a legal program the lhs and rhs of an assignment
have matching shapes, when lhs is not an allocatable and rhs is a result
of an elemental array operation.
2024-12-05 09:34:32 -08:00
Valentin Clement (バレンタイン クレメン)
83ccaad473
[flang][cuda] Use async id for device stream allocation (#118733)
When a stream is specified, use cudaMallocAsync with that stream.
2024-12-05 08:57:10 -08:00
Krzysztof Parzyszek
8a90b5b317 [flang][test] Change re.I to flags=re.I in re.sub
Follow-up to da6099c9ad. As a positional argument, the `re.I` was in
place of `count`, not `flags`.
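The bug is easy to reproduce. In `re.sub(pattern, repl, string, count=0, flags=0)`, the fourth positional argument is `count`. A positional `re.I` therefore silently becomes `count=2`, and the match stays case-sensitive. The sample text is made up for illustration.

```python
import re

text = "!$OMP PARALLEL\n!$omp end parallel\n"

# Wrong: the 4th positional parameter is `count`, so re.I (== 2) limits the
# number of substitutions and the pattern is still matched case-sensitively.
wrong = re.sub(r"!\$omp", "", text, re.I)

# Right: pass the flag by keyword.
right = re.sub(r"!\$omp", "", text, flags=re.I)
```

Only `right` strips the upper-case `!$OMP`. Python 3.13 and later also emit a `DeprecationWarning` when `count` is passed positionally, which helps catch this mistake.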
2024-12-05 09:41:40 -06:00
jeanPerier
ff78cd5f3d
[flang] fix private pointers and default initialized variables (#118494)
Both OpenMP privatization and DO CONCURRENT LOCAL lowering were incorrect
for pointers and derived types with default initialization.

For pointers, the descriptor was not established with the rank/type
code/element size, leading to undefined behavior if any inquiry was made
to it prior to a pointer assignment (and if/when using the runtime for
pointer assignments, the descriptor must have been established).

For derived types with default initialization, the copies were not
default initialized.
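The difference between a raw and an established pointer descriptor can be sketched in Python. The `Descriptor` record and helpers below are hypothetical and far simpler than the flang runtime's. Where real code would exhibit undefined behavior, the model raises an error instead.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Descriptor:
    base_addr: Optional[int] = None
    elem_len: Optional[int] = None   # None means "never established"
    rank: Optional[int] = None
    type_code: Optional[str] = None
    extents: List[int] = field(default_factory=list)

    def is_established(self):
        return None not in (self.elem_len, self.rank, self.type_code)


def establish_disassociated_pointer(elem_len, rank, type_code):
    # A private copy of a POINTER gets a null base address but valid
    # rank/type/element-size metadata, so inquiries are well defined.
    return Descriptor(base_addr=None, elem_len=elem_len, rank=rank,
                      type_code=type_code, extents=[0] * rank)


def associated(desc):
    if not desc.is_established():
        raise RuntimeError("inquiry on an unestablished descriptor")
    return desc.base_addr is not None
```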
2024-12-05 14:09:48 +01:00
Krzysztof Parzyszek
da6099c9ad
[flang][test] Recognize !$acc and !$omp spelled with capital letters (#118666)
If there are any continuation lines in the source, they will be printed
by the unparser with capital letters (at least in case of OpenMP). To
avoid having them stripped out, recognize their spellings using capital
letters as well.
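A case-insensitive sentinel match looks like the following. The pattern and function name are hypothetical and are not the actual test-script code.

```python
import re

# Match OpenMP/OpenACC sentinels regardless of case, e.g. "!$omp", "!$OMP", "!$Acc",
# optionally preceded by indentation.
SENTINEL = re.compile(r"^\s*!\$(omp|acc)", flags=re.IGNORECASE)


def is_directive_line(line):
    return SENTINEL.match(line) is not None
```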

---------

Co-authored-by: Michael Kruse <github@meinersbur.de>
2024-12-05 06:44:38 -06:00
Michael Kruse
0cda970ecc
[Flang][NFC] Split common headers to reduce dependencies. (#110244)
Fortran.h and target.h define symbols, some of which are used by both the Fortran runtime (Flang-RT) and the Fortran compiler (Flang), and others only by Flang. With the upcoming refactoring of the Fortran runtime into its own subproject (#110217), move the declarations that are used by both into new headers to minimize the amount of code that will need to be shared by Flang-RT and Flang.

Details:

 * `Fortran.h`: Flang-RT only uses some enum definitions out of this file, but not `AsFortran`, which is defined in `Fortran.cpp`. Moving the enums into `Fortran-consts.h` allows keeping `Fortran.cpp` within Flang.

 * `target.h`: Contains some floating-point definitions that are used by the non-GTest unittests in `fp-testing.h`. Flang-RT also has some non-GTest unittests. Moving those definitions avoids the dependence on the entire FortranEvaluate library.
2024-12-05 11:29:32 +01:00
Valentin Clement (バレンタイン クレメン)
7d1c661381
[flang] Allow to pass an async id to allocate the descriptor (#118713)
This is a patch in preparation for supporting the stream-ordered memory
allocator in CUDA Fortran.

This patch adds an asynchronous id to the AllocatableAllocate runtime
function and to Descriptor::Allocate so it can be passed down to the
registered allocator. It is up to the allocator to use this value or
not.

A follow-up patch will implement that asynchronous allocator for CUDA
Fortran.
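The idea of forwarding an async id that each registered allocator may use or ignore can be sketched in Python. All names here are hypothetical, including the `NO_ASYNC_ID` sentinel and the allocator kinds. The returned strings only describe which call a real allocator might make; nothing is actually allocated.

```python
from typing import Callable, Dict

NO_ASYNC_ID = -1  # hypothetical sentinel meaning "no stream requested"


class AllocatorRegistry:
    def __init__(self):
        self._allocators: Dict[int, Callable[[int, int], str]] = {}

    def register(self, kind: int, alloc: Callable[[int, int], str]) -> None:
        self._allocators[kind] = alloc

    def allocate(self, kind: int, size: int, async_id: int = NO_ASYNC_ID) -> str:
        # The async id is simply forwarded; each allocator decides whether to use it.
        return self._allocators[kind](size, async_id)


def host_alloc(size: int, async_id: int) -> str:
    return f"malloc({size})"  # ignores async_id


def device_alloc(size: int, async_id: int) -> str:
    if async_id != NO_ASYNC_ID:
        return f"cudaMallocAsync({size}, stream={async_id})"
    return f"cudaMalloc({size})"
```

Keeping the parameter on the common entry point means existing allocators need no behavioral change. Only allocators that understand streams act on it.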
2024-12-04 18:24:40 -08:00
vdonaldson
df43af40ec
Vkd1 (#118721) 2024-12-04 19:16:49 -05:00
vdonaldson
17f99accf2
[flang] build test fix/suppression (#118716) 2024-12-04 18:47:45 -05:00