It is convenient to run tests as root inside of a docker container.
The test (and the library function it is testing) are already
unsupported on Windows so it is safe to use UNIX-isms here.
Convert `DataSharingProcessor::symTable` from pointer to reference.
This avoids accidental null pointer dereferences and makes it
possible to use `symTable` when delayed privatization is disabled.
Support the atomic compare option of a fail(memory-order) clauses.
Additional tests introduced to check that parsing and semantics checks
for the new clause is handled.
Lowering for atomic compare is still unsupported and wil end in a TOOD
(aka "Not yet implemented"). A test for this case with the fail clause
is also present.
In order to get the pointer to a structure member, `getelementptr`
typically requires two indices: one to indicate the structure itself,
and another to specify the member's position. We are missing the former
in `GPULaunchKernelConversion`, so generated code may cause stack
corruption. This PR corrects the indices of a structure used as a kernel
launch temp.
For the call to _FortranACUFLaunchKernel, we store the pointer to a
member of a temporary structure in a parameter array. However, when we
obtain an element pointer from the parameter array, its address is
calculated based on the type of the structure. This PR properly treats
the parameter array as an array of pointers.
Example:
```mlir
%30 = llvm.load %29 : !llvm.ptr -> i32
%31 = llvm.mlir.constant(1 : i32) : i32
%32 = llvm.alloca %31 x !llvm.struct<(i64, i64, i32, ptr)> : (i32) -> !llvm.ptr
%33 = llvm.mlir.constant(4 : i32) : i32
%34 = llvm.alloca %33 x !llvm.ptr : (i32) -> !llvm.ptr
%35 = llvm.mlir.constant(0 : i32) : i32
%36 = llvm.getelementptr %32[%35] : (!llvm.ptr, i32) -> !llvm.ptr, !llvm.struct<(i64, i64, i32, ptr)>
llvm.store %8, %36 : i64, !llvm.ptr
%37 = llvm.getelementptr %34[%35] : (!llvm.ptr, i32) -> !llvm.ptr, !llvm.struct<(i64, i64, i32, ptr)>
llvm.store %36, %37 : !llvm.ptr, !llvm.ptr
...
llvm.call @_FortranACUFLaunchKernel(%47, %8, %8, %8, %2, %8, %8, %7, %34, %48) : (!llvm.ptr, i64, i64, i64, i64, i64, i64, i32, !llvm.ptr, !llvm.ptr) -> ()
```
In this example, `%37 = llvm.getelementptr %34[%35] : (!llvm.ptr, i32)
-> !llvm.ptr, !llvm.struct<(i64, i64, i32, ptr)>` will be `%37 =
llvm.getelementptr %34[%35] : (!llvm.ptr, i32) -> !llvm.ptr, !llvm.ptr`.
To temporarily address exchange2 perf regression reported in #118556
I disabled the inlining by default, and put it under engineering
option `-flang-simplify-hlfir-sum`.
hlfir.elemental codegen optimize-out the final as_expr copy for temps
local to its body, but sometimes, clean-up may have been emitted for
this temp, and the code did not handle that.
This caused #118922 and @113843.
Only elide the copy if the as_expr is the last op.
The acc data clause operations hold an operand named `varPtr`. This was
intended to hold a pointer to a variable - where the element type of
that pointer specifies the type of the variable. However, for both
memref and llvm dialects, this assumption is not true. This is because
memref element type for cases like memref<10xf32> is simply f32 and for
LLVM, after opaque pointers, the variable type is no longer recoverable.
Thus, introduce varType to ensure that appropriate semantics are kept.
Both the parser and printer for this new type attribute allow it to not
be specified in cases where a dialect's getElementType() applied to
`varPtr`'s type has a recoverable type. And more specifically, for FIR,
no changes are needed in the MLIR unit tests.
Use `pm.nest` to schedule the pass on nested `func.func` and `gpu.func`
in the `gpu.module`.
AbstractResult pass is not meant to run on the whole gpu.module at once.
Move to a markdown file and update maintainers.
This brings the project closer to updated guidance
(https://llvm.org/docs/DeveloperPolicy.html#maintainers). A list of
active and inactive maintainers is provided. Maintainers are also
grouped into lead or component maintainers.
Mainly including the following LoongArch specific options: -m[no-]lsx,
-m[no-]lasx, -msimd=, -m[no-]frecipe, -m[no-]lam-bh, -m[no-]lamcas,
-m[no-]ld-seq-sa, -m[no-]div32,
-m[no-]annotate-tablejump
Currently, if a -l (or -Wl,) flag is added into a config file
(e.g. clang.cfg), it is situated before any object file in the
effective command line. If the library requested by given -l flag is
static, its symbols will not be made visible to any of the object
files provided by the user. Also, the presence of any of the linker
flags in a config file confuses the driver whenever the user invokes
clang without any parameters (see issue #67209).
This patch attempts to solve both of the problems, by allowing a split
of the arguments list into two parts. The head part of the list will
be used as before, but the tail part will be appended after the
command line flags provided by the user and only when it is known
that the linking should occur. The $-prefixed arguments will be added
to the tail part.
This PR adds all the missing semantics for the Linear clause based on
the OpenMP 5.2 restrictions. The restriction details are mentioned
below.
OpenMP 5.2:
5.4.6 linear Clause restrictions
- A linear-modifier may be specified as ref or uval only on a declare
simd directive.
- If linear-modifier is not ref, all list items must be of type integer.
- If linear-modifier is ref or uval, all list items must be dummy
arguments without the VALUE attribute.
- List items must not be Cray pointers or variables that have the
POINTER attribute. Cray pointer support has been deprecated.
- If linear-modifier is ref, list items must be polymorphic variables,
assumed-shape arrays, or variables with the ALLOCATABLE attribute.
- A common block name must not appear in a linear clause.
- The list-item cannot appear more than once
4.4.4 ordered Clause restriction
- If n is explicitly specified, a linear clause must not be specified on
the same directive.
5.11 aligned Clause restriction
- Each list item must have C_PTR or Cray pointer type or have the
POINTER or ALLOCATABLE attribute. Cray pointer support has been
deprecated.
…lauses
Currently we only do semantic checks for REDUCTION. There are two other
clauses, IN_REDUCTION, and TASK_REDUCTION which will also need those
checks. Implement a function that checks the common list-item
requirements for all those clauses.
Otherwise, builds with `-DFLANG_CUF_RUNTIME` hits:
```
runtime/CUDA/descriptor.cpp:44:24: error: invalid use of incomplete type 'const class Fortran::runtime::Descriptor'
44 | std::size_t count{src->SizeInBytes()};
```
Split some headers into headers for public and private declarations in
preparation for #110217. Moving the runtime-private headers in
runtime-private include directory will occur in #110298.
* Do not use `sizeof(Descriptor)` in the compiler. The size of the
descriptor is target-dependent while `sizeof(Descriptor)` is the size of
the Descriptor for the host platform which might be too small when
cross-compiling to a different platform. Another problem is that the
emitted assembly ((cross-)compiling to the same target) is not identical
between Flang's running on different systems. Moving the declaration of
`class Descriptor` out of the included header will also reduce the
amount of #included sources.
* Do not use `sizeof(ArrayConstructorVector)` and
`alignof(ArrayConstructorVector)` in the compiler. Same reason as with
`Descriptor`.
* Compute the descriptor's extra flags without instantiating a
Descriptor. `Fortran::runtime::Descriptor` is defined in the runtime
source, but not the compiler source.
* Move `InquiryKeywordHashDecode` into runtime-private header. The
function is defined in the runtime sources and trying to call it in the
compiler would lead to a link-error.
* Move allocator-kind magic numbers into common header. They are the
only declarations out of `allocator-registry.h` in the compiler as well.
This does not make Flang cross-compile ready yet, the main goal is to
avoid transitive header dependencies from Flang to clang-rt. There are
more assumptions that host platform is the same as the target platform.
1. In `CufOpConversion` `isDeviceGlobal` was renamed
`isRegisteredGlobal` and moved to the common file. `isRegisteredGlobal`
excludes constant `fir.global` operation from registration. This is to
avoid calls to `_FortranACUFGetDeviceAddress` on globals which do not
have any symbols in the runtime. This was done for
`_FortranACUFRegisterVariable` in #118582, but also needs to be done
here after #118591
2. `CufDeviceGlobal` no longer adds the `#cuf.cuda<constant>` attribute
to the constant global. As discussed in #118582 a module variable with
the #cuf.cuda<constant> attribute is not a compile time constant. Yet,
the compile time constant also needs to be copied into the GPU module.
The candidates for copy to the GPU modules are
- the globals needing regsitrations regardless of their uses in device
code (they can be referred to in host code as well)
- the compile time constant when used in device code
3. The registration of "constant" module device variables (
#cuf.cuda<constant>) can be restored in `CufAddConstructor`
An array SUM with the specified constant DIM argument
may be expanded into hlfir.elemental with a reduction loop
inside it processing all elements of the specified dimension.
The expansion allows further optimization of the cases like
`A=SUM(B+1,DIM=1)` in the optimized bufferization pass
(given that it can prove there are no read/write conflicts).
The optimized bufferization pass cannot optimize very simple cases of
elemental
assignments, because of the suboptimal checks order. This patch relies
on the fact that in a legal program the lhs and rhs of an assignment
have matching shapes, when lhs is not an allocatable and rhs is a result
of an elemental array operation.
Both OpenMP privatization and DO CONCURRENT LOCAL lowering was incorrect
for pointers and derived type with default initialization.
For pointers, the descriptor was not established with the rank/type
code/element size, leading to undefined behavior if any inquiry was made
to it prior to a pointer assignment (and if/when using the runtime for
pointer assignments, the descriptor must have been established).
For derived type with default initialization, the copies were not
default initialized.
If there are any continuation lines in the source, they will be printed
by the unparser with capital letters (at least in case of OpenMP). To
avoid having them stripped out, recognize their spellings using capital
letters as well.
---------
Co-authored-by: Michael Kruse <github@meinersbur.de>
Fortran.h and target.h are defining symbols where some are used by both, the Fortran runtime (Flang-RT) and Fortran compiler (Flang), and others are used by Flang only. With the upcoming refactoring of the Fortran runtime into its own subproject (#110217), move the declarations that are used by both into new headers to minimize the amount of code that will need to be shared by Flang-RT and Flang.
Details:
* `Fortran.h`: Flang-RT only uses some enum definitions out of this file, but not `AsFortran` which is defined in `Fortran.cpp`. Moving the enums into `Fortran-consts.h` allows keeping `Fortran.cpp` within Flang.
* `target.h`: Contains some floating-point definitions that is used by the non-GTest unittests in `fp-testing.h`. Flang-RT also uses some non-GTest as well. Moving those definitions avoids the dependence on the entire FortranEvaluate library.
This is a patch in preparation for the support stream ordered memory
allocator in CUDA Fortran.
This patch adds an asynchronous id to the AllocatableAllocate runtime
function and to Descriptor::Allocate so it can be passed down to the
registered allocator. It is up to the allocator to use this value or
not.
A follow up patch will implement that asynchronous allocator for CUDA
Fortran.