Following the conclusion of the
[RFC](https://discourse.llvm.org/t/rfc-names-for-flang-rt-libraries/84321),
rename Flang's runtime libraries as follows:
* libFortranRuntime.(a|so) to libflang_rt.runtime.(a|so)
* libFortranFloat128Math.a to libflang_rt.quadmath.a
* libCufRuntime_cuda_${CUDAToolkit_VERSION_MAJOR}.(a|so) to
libflang_rt.cuda_${CUDAToolkit_VERSION_MAJOR}.(a|so)
This follows the same naming scheme as Compiler-RT libraries
(`libclang_rt.${component}.(a|so)`). It provides some consistency
between Flang's runtime libraries for current and potential future
library components.
Introduce the CMake switch FLANG_INCLUDE_RUNTIME. When set to off, do
not add build instructions for the runtime.
This is required for Flang-RT (#110217) and the current runtime CMake
code to co-exist. When using `LLVM_ENABLE_RUNTIME=flang-rt`, the in-tree
build instructions are in conflict and must be disabled.
Avoid using the same library for runtime and compiler. `FortranDecimal`
was used in two ways:
1. As an auxiliary library needed for `libFortranRuntime.a`. This patch
adds the two source files of FortranDecimal directly into
FortranRuntime, so `FortranRuntime` is not used anymore.
2. As a library used by the Flang compiler. As the only remaining use of
the library, extra CMake code to make it compatible with the runtime can
be removed.
Before this PR, `enable_cuda_compilation` is applied to `FortranDecimal`
which causes everything that links to it, including flang (the
compiler), to depend on libcudart when CUDA support is enabled.
Having two runtime library just makes everything more complicated while
the user ideally should not be concerned with how the runtime is
structured internally. Some logic was copied for FortranDecimal because
of this, such as the ability to be compiled out-of tree
(b75a3c9f31c1ffdc9856aee32991d8129b372ee7) which is undocumented, the
logic to link against the various versions of Microsofts runtime library
(#70833), and avoiding dependency on the C++ runtime
(7783bba22c7add678d796741d30669c73159b3d8).
The non-GTest library will be shared by unittests of Flang and Flang-RT.
Promote it as a regular library for use by both projects.
In the long term, we may want to convert these to regular GTest checks
to avoid having multiple testing frameworks.
1. Our static functions are a bit spread out in this file. I am
gathering them in an anonymous namespace
2. Moving the code to get the `target` attribute on a `fir.global` into
its own utility.
Move non-common files from FortranCommon to FortranSupport (analogous to
LLVMSupport) such that
* declarations and definitions that are only used by the Flang compiler,
but not by the runtime, are moved to FortranSupport
* declarations and definitions that are used by both ("common"), the
compiler and the runtime, remain in FortranCommon
* generic STL-like/ADT/utility classes and algorithms remain in
FortranCommon
This allows a for cleaner separation between compiler and runtime
components, which are compiled differently. For instance, runtime
sources must not use STL's `<optional>` which causes problems with CUDA
support. Instead, the surrogate header `flang/Common/optional.h` must be
used. This PR fixes this for `fast-int-sel.h`.
Declarations in include/Runtime are also used by both, but are
header-only. `ISO_Fortran_binding_wrapper.h`, a header used by compiler
and runtime, is also moved into FortranCommon.
Unfortunately we still have a lot of cases like
!fir.box<!fir.array<10xi32>> where we read dimensions from the mold in
case there are non-default lower bounds stored inside the box. I will
address this in the next patch.
This PR changes the emitted block structure of alloc, init, and copy
regions for `omp.private` and `omp.declare_reduction` ops a little bit.
In particular, this decouples init and copy regions from the alloca
insertion-point. The main motivation is fix "Instruction does not
dominate all uses!" errors that happen specially when an init region
uses a value from the OpenMP region it is being inlined into. The issue
happens because, previous to this PR, we inline the init region right
after the latest alloc block (since we used the alloca IP); which in
some cases (see exmaple below), is too early and causes the use
dominance issue.
Example that would break without this PR (when delayed privatization is
enabled for `omp.wsloop`s):
```fortran
subroutine test2 (xyz)
integer :: i
integer :: xyz(:)
!$omp target map(from:xyz)
!$omp do private(xyz)
do i = 1, 10
xyz(i) = i
end do
!$omp end target
end subroutine
```
Before this patch, reduction of derived type components crashed the
compiler when trying to create the omp.declare_reduction.
In OpenMP 3.1 the standard says "a list item that appears in a reduction
clause must be a named variable of intrinsic type" (page 106). As I
understand it, a derived type component is not a variable.
OpenMP 4.0 added declare reduction, partly so that users could define
their own reductions on derived types. The above wording was removed
from the standard but derived type components were never explicitly
allowed.
OpenMP 5.0 added "A variable that is part of another variable, with the
exception of array elements, cannot appear in17 a reduction clause".
All standard versions also require the reduction argument to be
"definable", which roughly means that it is a variable. A
derived type component is more like an expression.
Fixes#125445
First part of a series of patches to improve private/reduction init and
cleanup region generation.
This commit is NFC. I factored out processing for each datatype into its
own method so that it is easier to keep track of what is being handled
where (I found the old gigantic init region generation function
difficult to navigate). The methods all share context in a helper class
to avoid having to pass a very large number of arguments.
I also removed the conflation between the mold argument and the mold
argument after loading. This should make it easier to avoid generating
dead uses of the mold argument in a later non-nfc patch.
`syncthreads_and`, `syncthreads_count`, `syncthreads_or`, `synwrap` must
take their argument by value. This patch updates the interfaces and
makes sure these functions can be called inside a cuff kernel as well.
IsSymplyContiguous was visiting expressions and returning false on
expressions like `x(::2) + y`, which triggered an assert in lowering
when preparing arguments for copy-in/out.
Update it to return false for everything that is not a variable, except
when provided a flag to treat PARAMETER bases as variables. This flags
is required for internal usages in lowering where lowering needs to now
if the read-only memory is being addressed contiguously or not.
Update call lowering to always copy parameter array section into
contiguous writable memory when passing them. The rational here is that
copy-out generated in nested calls using the dummy arguments will cause
a segfault.
The Fortran libraries are not part of MLIR, so they should use
target_link_libraries() rather than mlir_target_link_libraries().
This fixes an issue introduced in
https://github.com/llvm/llvm-project/pull/120966.
When -fimplicit-none-ext is passed, flang behaves as if "implicit
none(external)" was specified for all relevant constructs in Fortran
source file.
Note: implicit17.f90 was based on implicit07.f90 with `implicit
none(external)` removed and `-fimplicit-none-ext` added.
The current implementation of OpenACC lowering includes explicit
expansion of following cases:
- Creation of `acc.bounds` operations for all arrays, including those
whose dimensions are captured in the type (eg `!fir.array<100xf32>`)
- Expansion of box types by only putting the box's address in the data
clause. The address was extracted with a `fir.box_addr` operation and
the bounds were filled with `fir.box_dims` operation.
However, with the creation of the new type interface `MappableType`, the
idea is that specific type-based semantics can now be used. This also
really simplifies representation in the IR. Consider the following
example:
```
subroutine sub(arr)
real :: arr(:)
!$acc enter data copyin(arr)
end subroutine
```
Before the current PR, the relevant acc dialect IR looked like:
```
func.func @_QPsub(%arg0: !fir.box<!fir.array<?xf32>> {fir.bindc_name =
"arr"}) {
...
%1:2 = hlfir.declare %arg0 dummy_scope %0 {uniq_name = "_QFsubEarr"} :
(!fir.box<!fir.array<?xf32>>, !fir.dscope) ->
(!fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>>)
%c1 = arith.constant 1 : index
%c0 = arith.constant 0 : index
%2:3 = fir.box_dims %1#0, %c0 : (!fir.box<!fir.array<?xf32>>, index)
-> (index, index, index)
%c0_0 = arith.constant 0 : index
%3 = arith.subi %2#1, %c1 : index
%4 = acc.bounds lowerbound(%c0_0 : index) upperbound(%3 : index)
extent(%2#1 : index) stride(%2#2 : index) startIdx(%c1 : index)
{strideInBytes = true}
%5 = fir.box_addr %1#0 : (!fir.box<!fir.array<?xf32>>) ->
!fir.ref<!fir.array<?xf32>>
%6 = acc.copyin varPtr(%5 : !fir.ref<!fir.array<?xf32>>) bounds(%4) ->
!fir.ref<!fir.array<?xf32>> {name = "arr", structured = false}
acc.enter_data dataOperands(%6 : !fir.ref<!fir.array<?xf32>>)
```
After the current change, it looks like:
```
func.func @_QPsub(%arg0: !fir.box<!fir.array<?xf32>> {fir.bindc_name =
"arr"}) {
...
%1:2 = hlfir.declare %arg0 dummy_scope %0 {uniq_name = "_QFsubEarr"} :
(!fir.box<!fir.array<?xf32>>, !fir.dscope) ->
(!fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>>)
%2 = acc.copyin var(%1#0 : !fir.box<!fir.array<?xf32>>) ->
!fir.box<!fir.array<?xf32>> {name = "arr", structured = false}
acc.enter_data dataOperands(%2 : !fir.box<!fir.array<?xf32>>)
```
Restoring the old behavior can be done with following command line
options:
`--openacc-unwrap-fir-box=true --openacc-generate-default-bounds=true`
Fixes a bug in updating `lastprivate` variables. Previously, we only
iterated over the symbols collected from `lastprivate` clauses. This
meants that for pre-determined symbols, we did not implement the update
correctly (e.g. for loop iteration variables of `simd` constructs).
When a global variable is used in the OpenMP target region, it is passed
as an argument to the function that implements target region. But the
`DeclareOp` for this incarnation still have the original name of the
variable. As some of our checks to decide if a variable is global or nor
are based on the name, this can result in a local variable being treated
as global. This PR hardens the check a bit. We now also check that
memory ref is actually an `AddrOfOp` before looking at the name.
Implement parsing and symbol resolution for directives that take
arguments. There are a few, and most of them take objects. Special
handling is needed for two that take more specialized arguments: DECLARE
MAPPER and DECLARE REDUCTION.
This only affects directives in METADIRECTIVE's WHEN and OTHERWISE
clauses. Parsing and semantic checks of other cases is unaffected.
Dummy arguments from other entry statement that are not live in the current entry have no backing storage, user code referring to them is not allowed to be reached. The compiler was generating initialization/destruction code for them when INTENT(OUT), causing undefined behaviors.
….cpp
This is intended to reduce the memory usage while compiling flang
sources.
There were over 7500 instantiations of function templates defined in the
Utils.h file. Most of them were not referenced anywhere outside, except
for specialized implementations of getHashValue and isEqual in
IterationSpace.cpp. These function were also moved to Utils.cpp.
This implements checks of the validity of context set selectors and
trait selectors, plus the types of trait properties. Clause properties
are also validated, but not name or extension properties.
---------
Co-authored-by: Tom Eccles <tom.eccles@arm.com>
Fix isSimplyContiguous:
- It ignored scalars, causing scalar fir.box to not be opened when
possible in `translateToExtendedValue`
Fix isOptional:
It is not reliable when the memory SSA value cannot be linked to a
declare. This is exposed by the `isSimplyContiguous` fix,
This is wrong because declare operation should not always assumed to be
visible (e.g., value may travel through a select, or the optional be
generated by the compiler like genOptionalBox in
lib/Lower/ConvertCall.cpp).
- Turn `isOptional` into a safer `mayBeOptional`
- Update translateToExtendedValue to open scalar fir.box for such values
in a fir.if.
- It turned out some `translateToExtendedValue` usage relied on fir.box
of optional scalars to be left untouched (mainly because they want to
forward those fir.box to the runtime), add an option to allow that.
This patch shares core interface methods dealing with argument and
result attributes from CallableOpInterface with the CallOpInterface and
makes them mandatory to gives more consistent guarantees about concrete
operations using these interfaces.
This allows adding argument attributes on call like operations, which is
sometimes required to get proper ABI, like with llvm.call (and llvm.invoke).
The patch adds optional `arg_attrs` and `res_attrs` attributes to operations using
these interfaces that did not have that already.
They can then re-use the common "rich function signature"
printing/parsing helpers if they want (for the LLVM dialect, this is
done in the next patch).
Part of RFC: https://discourse.llvm.org/t/mlir-rfc-adding-argument-and-result-attributes-to-llvm-call/84107