For example, determine that the address in p below cannot alias the
address of v:
```
subroutine test()
real, pointer :: p
real, target :: t
real :: v
p => t
v = p
end subroutine test
```
This is needed to generate proper ABI flags in the ELF header for LTO
builds. If these flags aren't set correctly, we can't link with objects
that were built with the correct flags.
For non-LTO builds the mcpu/mattr in the TargetMachine will cause the
backend to infer an ABI. For LTO builds the mcpu/mattr aren't set.
I've only added lp64, lp64f, and lp64d ABIs. ilp32* requires riscv32
which is not yet supported in flang. lp64e requires a different
DataLayout string and would need additional plumbing.
Fixes#115679
This patch changes the codegen for non-precise acosh, asin, asinh and
atanh calls to generate math ops instead. This wasn't done before
because the math dialect did not have the corresponding operations at
the time.
In fixed form, an initial line can have an OpenMP conditional
compilation sentinel only if columns 3 through 5 have only
white space or numbers.
Fixes https://github.com/llvm/llvm-project/issues/89560
We currently drop the `DeclareOp` which are not in the entry block of
the function. This was done to side step a problem with the variables in
OpenMP target region. Now that issue has been addressed in #118314 so we
can lift this restriction as well.
This is a followup to https://github.com/llvm/llvm-project/pull/126745,
generalizing it to always use TargetFolder, including inside function
bodies.
This avoids generating non-canonical constant expressions that can be
folded away.
The OpenACC type interfaces have been updated to require that a type
self-identify which type category it belongs to. Ensure that FIR types
are able to provide this self identification.
In addition to implementing the new API, the PointerLikeType interface
attachment was moved to FIROpenACCSupport library like MappableType to
ensure all type interfaces and their implementation are now in the same
spot.
Explicitly disable EH & RTTI when building Flang runtime library. This
fixes the runtime built when Flang is built standalone against system
LLVM that was compiled with EH & RTTI enabled.
I think this change may be sufficient to lift the top-level
`LLVM_ENABLE_EH` restriction from Flang. However, I'd prefer if somebody
more knowledgeable decided on that.
Last piece that implements the TODO for sret and byval setting on
indirect calls.
This includes a fix to the codegen last patch. I thought types in in
type attributes were automatically converted in dialect conversion
passes, but that is not the case. The sret and byval type needs to be
converted to llvm types in codegen (mlir FuncOp conversion is doing a
similar conversion).
This patch introduces a directive set for combined constructs where
`teams` is the last leaf. This is used in a couple places to simplify
checks, which is NFC, but it also replaces two incorrect uses of
`topTeamsSet`.
Before, these checks would incorrectly skip combined constructs where
`teams` was the last leaf construct when checking for allowed nested
constructs inside of a `teams` region. Similarly, it would also
incorrectly perform these checks whenever a compound `teams` construct
where `teams` was the first leaf construct was found.
After tuneCPU was changed to std::string in
c8376a93bb9853cbcedeb22d80a9b200060eaf85 the flang builds broke, due to
a missing initializer.
If we want to add tuneCPU to the MLIRToLLVMPassPipelineConfig, we might
want to tackle that separately after the build is restored. This should
be no different than the previous behaviour.
For non-basic types GenericOptionParser::findArgStrForValue will return
null, ultimately an llvm_unreachable, when the specific values are not
found. Add the enum, much like the debug-level option in AddDebugInfo to
resolve this problem. Also change tuneCPU to be std::string or it will
also fail.
We currently handle sequences of fixed-length arrays properly by **not**
emitting length parameters for `embox` ops inside the `omp.private` op.
However, we do not handle the scalar case. This PR extends
`getLengthParameters` defined in `PrivateReductionUtils.cpp` to handle
such cases.
Fixes issue reported in #125732.
The LLVM dialect lowers globals using IRBuilder, relying on it creating
constant expressions where possible. As we remove support for more
constant expressions (per
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179),
this can cause issues for cases where the constant expression is no
longer supported, and the operation cannot be constant folded without
DataLayout being available. In particular, I ran into this issue with
flang and the removal of mul constant expressions.
Address this by using TargetFolder when creating globals, which will
perform DL-aware constant folding. I think it would make sense to also
do this in general, but I'm starting with globals where not doing this
can result in translation failures.
Ideally, globals with these problematic expressions would never be
generated in the first place, but there has been little movement on
fixing this (https://github.com/llvm/llvm-project/issues/96047).
Add pretty printer/parser for fir.call argument/result attributes and
propagate them to llvm.call.
This will allow implementing the TODO about ABI relevant argument
attribute in indirect calls.
Typically, we do not track memory sources after a load because of the
dynamic nature of the load and the fact that the alias analysis is a
simple static analysis.
However, the code is written in a way that makes it seem like we are
continuing to track memory but in reality we are only doing so when we
know that the tracked memory is a leaf and therefore when there will
only be one more iteration through the switch statement. In other words,
we are iterating one more time, to gather data about a box, anticipating
that this will be the last time. This is a hack that helped avoid
cut-and-paste from other case statements but gives the wrong impression
about the intention of the code and makes it confusing.
To make it clear that there is no more tracking, we gather all the
necessary data from the memref of the load, in the case statement for
the load, and exit the loop. I am also limiting this data gathering for
the case when we load a box reference while we were actually following
data, as tests have shows, is the only case when we need it for. Other
cases will be handled conservatively, but this can change in the future,
on a case-by-case basis.
---------
Co-authored-by: Joel E. Denny <jdenny.ornl@gmail.com>
When the loop induction variable implicit private clause was being
generated, the name was left empty. The intent is that the data clause
operation holds the source language variable name. Thus, add the missing
name now.
MLIR_MAIN_SRC_DIR and MLIR_INCLUDE_DIR point to the source directory,
which is not installed. As such, the installed MLIRConfig.cmake also
should not reference it.
The comment indicates that these are needed for mlir_tablegen(), but I
don't see any related uses.
The motivation for this is the use in flang, where we end up inheriting
a meaningless MLIR_MAIN_SRC_DIR from a previous MLIR build, whose source
directory doesn't exist anymore, and that cannot be overridden with the
correct path, because it's not a cached variable.
Instead do what all the other projects do for LLVM_MAIN_SRC_DIR and
initialize MLIR_MAIN_SRC_DIR to CMAKE_CURRENT_SOURCE_DIR/../mlir.
For MLIR_INCLUDE_DIR there already is an exported MLIR_INCLUDE_DIRS,
which can be used instead.
One constructor was missing to propagate fast-math flags
from an operation to the builder. It is fixed now.
And the builder creation in one opt-bufferization case should take
the rewriter, I think.
The acc.delete operation has semantics of decrementing present counter
and deleting the data when the counter reaches zero. Since both
acc.present and acc.nocreate are both intended to increment present
counter, this matching exit action must be inserted.
This is also what was specified in OpenACC dialect documentation:
https://mlir.llvm.org/docs/Dialects/OpenACCDialect/#operation-categories
https://github.com/llvm/llvm-project/pull/123331 added support for the
unrolling directive. In the presence of an explicit unrolling factor,
that unrolling factor would be unconditionally passed into the metadata
even when it was 1 or 0. These special cases should instead disable
unrolling. Adding an explicit unrolling factor of 0 triggered this
assertion which is fixed by this patch:
```
unsigned int unrollCountPragmaValue(const llvm::Loop*):
Assertion `Count >= 1 && "Unroll count must be positive."' failed.
```
Updated tests and documentation.
Following the conclusion of the
[RFC](https://discourse.llvm.org/t/rfc-names-for-flang-rt-libraries/84321),
rename Flang's runtime libraries as follows:
* libFortranRuntime.(a|so) to libflang_rt.runtime.(a|so)
* libFortranFloat128Math.a to libflang_rt.quadmath.a
* libCufRuntime_cuda_${CUDAToolkit_VERSION_MAJOR}.(a|so) to
libflang_rt.cuda_${CUDAToolkit_VERSION_MAJOR}.(a|so)
This follows the same naming scheme as Compiler-RT libraries
(`libclang_rt.${component}.(a|so)`). It provides some consistency
between Flang's runtime libraries for current and potential future
library components.
Introduce the CMake switch FLANG_INCLUDE_RUNTIME. When set to off, do
not add build instructions for the runtime.
This is required for Flang-RT (#110217) and the current runtime CMake
code to co-exist. When using `LLVM_ENABLE_RUNTIME=flang-rt`, the in-tree
build instructions are in conflict and must be disabled.
Avoid using the same library for runtime and compiler. `FortranDecimal`
was used in two ways:
1. As an auxiliary library needed for `libFortranRuntime.a`. This patch
adds the two source files of FortranDecimal directly into
FortranRuntime, so `FortranRuntime` is not used anymore.
2. As a library used by the Flang compiler. As the only remaining use of
the library, extra CMake code to make it compatible with the runtime can
be removed.
Before this PR, `enable_cuda_compilation` is applied to `FortranDecimal`
which causes everything that links to it, including flang (the
compiler), to depend on libcudart when CUDA support is enabled.
Having two runtime library just makes everything more complicated while
the user ideally should not be concerned with how the runtime is
structured internally. Some logic was copied for FortranDecimal because
of this, such as the ability to be compiled out-of tree
(b75a3c9f31c1ffdc9856aee32991d8129b372ee7) which is undocumented, the
logic to link against the various versions of Microsofts runtime library
(#70833), and avoiding dependency on the C++ runtime
(7783bba22c7add678d796741d30669c73159b3d8).
The non-GTest library will be shared by unittests of Flang and Flang-RT.
Promote it as a regular library for use by both projects.
In the long term, we may want to convert these to regular GTest checks
to avoid having multiple testing frameworks.
1. Our static functions are a bit spread out in this file. I am
gathering them in an anonymous namespace
2. Moving the code to get the `target` attribute on a `fir.global` into
its own utility.
Move non-common files from FortranCommon to FortranSupport (analogous to
LLVMSupport) such that
* declarations and definitions that are only used by the Flang compiler,
but not by the runtime, are moved to FortranSupport
* declarations and definitions that are used by both ("common"), the
compiler and the runtime, remain in FortranCommon
* generic STL-like/ADT/utility classes and algorithms remain in
FortranCommon
This allows a for cleaner separation between compiler and runtime
components, which are compiled differently. For instance, runtime
sources must not use STL's `<optional>` which causes problems with CUDA
support. Instead, the surrogate header `flang/Common/optional.h` must be
used. This PR fixes this for `fast-int-sel.h`.
Declarations in include/Runtime are also used by both, but are
header-only. `ISO_Fortran_binding_wrapper.h`, a header used by compiler
and runtime, is also moved into FortranCommon.
Unfortunately we still have a lot of cases like
!fir.box<!fir.array<10xi32>> where we read dimensions from the mold in
case there are non-default lower bounds stored inside the box. I will
address this in the next patch.
This PR changes the emitted block structure of alloc, init, and copy
regions for `omp.private` and `omp.declare_reduction` ops a little bit.
In particular, this decouples init and copy regions from the alloca
insertion-point. The main motivation is fix "Instruction does not
dominate all uses!" errors that happen specially when an init region
uses a value from the OpenMP region it is being inlined into. The issue
happens because, previous to this PR, we inline the init region right
after the latest alloc block (since we used the alloca IP); which in
some cases (see exmaple below), is too early and causes the use
dominance issue.
Example that would break without this PR (when delayed privatization is
enabled for `omp.wsloop`s):
```fortran
subroutine test2 (xyz)
integer :: i
integer :: xyz(:)
!$omp target map(from:xyz)
!$omp do private(xyz)
do i = 1, 10
xyz(i) = i
end do
!$omp end target
end subroutine
```
Before this patch, reduction of derived type components crashed the
compiler when trying to create the omp.declare_reduction.
In OpenMP 3.1 the standard says "a list item that appears in a reduction
clause must be a named variable of intrinsic type" (page 106). As I
understand it, a derived type component is not a variable.
OpenMP 4.0 added declare reduction, partly so that users could define
their own reductions on derived types. The above wording was removed
from the standard but derived type components were never explicitly
allowed.
OpenMP 5.0 added "A variable that is part of another variable, with the
exception of array elements, cannot appear in17 a reduction clause".
All standard versions also require the reduction argument to be
"definable", which roughly means that it is a variable. A
derived type component is more like an expression.
Fixes#125445
First part of a series of patches to improve private/reduction init and
cleanup region generation.
This commit is NFC. I factored out processing for each datatype into its
own method so that it is easier to keep track of what is being handled
where (I found the old gigantic init region generation function
difficult to navigate). The methods all share context in a helper class
to avoid having to pass a very large number of arguments.
I also removed the conflation between the mold argument and the mold
argument after loading. This should make it easier to avoid generating
dead uses of the mold argument in a later non-nfc patch.
`syncthreads_and`, `syncthreads_count`, `syncthreads_or`, `synwrap` must
take their argument by value. This patch updates the interfaces and
makes sure these functions can be called inside a cuff kernel as well.