This patch defines `fir::SafeTempArrayCopyAttrInterface` and the
corresponding
OpenACC/OpenMP related attributes in FIR dialect. The actual
implementations
are just placeholders right now, and array repacking becomes a no-op
if `-fopenacc/-fopenmp` is used for the compilation.
Implement extended intrinsic PUTENV, both function and subroutine forms.
Add PUTENV documentation to flang/docs/Intrinsics.md. Add functional and
semantic unit tests.
Add function and subroutine forms of FSEEK and FTELL as intrinsic
procedures. Accept common aliases from legacy compilers as well.
A separate patch to llvm-test-suite will enable tests for these
procedures once this patch has merged.
Depends on https://github.com/llvm/llvm-project/pull/132423; CI builds
will likely fail until that patch is merged and this PR is rebased.
This PR implements the nonstandard intrinsic time.
In addition to running the unit tests, I also double checked that the
example code works by manually compiling and running it.
This PR adds the intrinsic `unlink` to flang.
## Test plan
- Added two codegen unit tests and ensured flang-check continues to
pass.
- Manually compiled and ran the example from the documentation.
Hi,
This patch implements support for the following directives :
- `!DIR$ NOUNROLL_AND_JAM` to disable unrolling and jamming on a DO
LOOP.
- `!DIR$ NOUNROLL` to disable unrolling on a DO LOOP.
- `!DIR$ NOVECTOR` to disable vectorization on a DO LOOP.
This PR starts the effort to upstream AMD's internal implementation of
`do concurrent` to OpenMP mapping. This replaces #77285 since we
extended this WIP quite a bit on our fork over the past year.
An important part of this PR is a document that describes the current
status downstream, the upstreaming status, and next steps to make this
pass much more useful.
In addition to this document, this PR also contains the skeleton of the
pass (no useful transformations are done yet) and some testing for the
added command line options.
This looks like a huge PR but a lot of the added stuff is documentation.
It is also worth noting that the downstream pass has been validated on
https://github.com/BerkeleyLab/fiats. For the CPU mapping, this achived
performance speed-ups that match pure OpenMP, for GPU mapping we are
still working on extending our support for implicit memory mapping and
locality specifiers.
PR stack:
- https://github.com/llvm/llvm-project/pull/126026 (this PR)
- https://github.com/llvm/llvm-project/pull/127595
- https://github.com/llvm/llvm-project/pull/127633
- https://github.com/llvm/llvm-project/pull/127634
- https://github.com/llvm/llvm-project/pull/127635
Add the implementation of the `PERROR(STRING) ` intrinsic from the GNU
Extension to prints on the stderr a newline-terminated error message
corresponding to the last system error prefixed by `STRING`.
(https://gcc.gnu.org/onlinedocs/gfortran/PERROR.html)
This change enables LoopVersioning when `fir.pack_array` is met
in the def-use chain. It fixes a couple of huge performance regressions
caused by enabling `-frepack-arrays`.
Implement GNU extension intrinsic HOSTNM, both function and subroutine
forms. Add HOSTNM documentation to `flang/docs/Intrinsics.md`. Add
lowering and semantic unit tests.
(This change is modeled after GETCWD implementation.)
This patch makes the `map_type` and `map_capture_type` arguments of the
`omp.map.info` operation required, which was already an invariant being
verified by its users via `verifyMapClause()`. This makes it clearer, as
getters no longer return misleading `std::optional` values.
Checks for the `mapper_id` argument are moved to a verifier for the
operation, rather than being checked by users.
Functionally NFC, but not marked as such due to a reordering of
arguments in the assembly format of `omp.map.info`.
TARGET dummy arrays can be accessed indirectly, so it is unsafe
to repack them.
INTENT(OUT) dummy arrays that require finalization on entry
to their subroutine must be copied-in by `fir.pack_arrays`.
In addition, based on my testing results, I think it will be useful
to document that `LOC` and `IS_CONTIGUOUS` will have different values
for the repacked arrays. I still need to decide where to document
this, so just added a note in the design doc for the time being.
This API will be used for copying non-contiguous arrays
into contiguous temporaries to support `-frepack-arrays`.
The builder factory API will be used in the following commits.
This is a document describing why and how to add support for repacking
of assumed-shape dummy arrrays to provide more efficient data cache.
It proposes adding new FIR operations and outlines the compiler flow
handling these operations.
I would like to hear feedback on all of it, but especially on:
* The possibility of detecting safeness of the repacking
in the context of OpenACC/OpenMP. If it is not possible
to do the runtime checks to determine safety, then
there is not need to add the `TempCopyIsSafe` attributes
to the instruction.
* Whether it is possible to preserve the debug information
in cases where `fir.pack_array` is sunk after `[hl]fir.declare`,
so that before the `fir.pack_array` a debugger will refer
to the values in the original array, and after `fir.pack_array`
it will refer to the copy.
Some Arm processors support exception halting control and some do not.
An Arm executable will run on either type of processor, so it is
effectively unknown at compile time whether or not this support will be
available. ieee_support_halting is therefore implemented with a runtime
check.
The result of a call to ieee_support_standard depends in part on support
for halting control. Update the ieee_support_standard implementation to
check for support for halting control at runtime.
Two new ieee_round_type values were added in f18 beyond the four values
defined in f03 and f08: ieee_away and ieee_other. Contemporary hardware
typically does not have support for these rounding modes, so flang does
not support them. ieee_support_rounding calls for these values return
false. Current generated code handles some attempts to set the rounding
mode to one of these unsupported values by setting the mode to
ieee_nearest. Update the code to explicitly do this in all cases.
This patch implements support for the UNROLL_AND_JAM directive to enable
or disable unrolling and jamming on a `DO LOOP`.
It must be placed immediately before a `DO LOOP` and applies only to the
loop that follows. N is an integer that specifying the unrolling factor.
This is done by adding an attribute to the branch into the loop in LLVM
to indicate that the loop should unrolled and jammed.
Extract Flang's runtime library to use the LLVM_ENABLE_RUNTIME
mechanism. It will only become active when
`LLVM_ENABLE_RUNTIMES=flang-rt` is used, which also changes the
`FLANG_INCLUDE_RUNTIME` to `OFF` so the old runtime build rules do not
conflict. This also means that unless `LLVM_ENABLE_RUNTIMES=flang-rt` is
passed, nothing changes with the current build process.
Motivation:
* Consistency with LLVM's other runtime libraries (compiler-rt, libc,
libcxx, openmp offload, ...)
* Allows compiling the runtime for multiple targets at once using the
LLVM_RUNTIME_TARGETS configuration options
* Installs the runtime into the compiler's per-target resource directory
so it can be automatically found even when cross-compiling
Also see RFC discussion at
https://discourse.llvm.org/t/rfc-use-llvm-enable-runtimes-for-flangs-runtime/80826
https://github.com/llvm/llvm-project/pull/123331 added support for the
unrolling directive. In the presence of an explicit unrolling factor,
that unrolling factor would be unconditionally passed into the metadata
even when it was 1 or 0. These special cases should instead disable
unrolling. Adding an explicit unrolling factor of 0 triggered this
assertion which is fixed by this patch:
```
unsigned int unrollCountPragmaValue(const llvm::Loop*):
Assertion `Count >= 1 && "Unroll count must be positive."' failed.
```
Updated tests and documentation.
Following the conclusion of the
[RFC](https://discourse.llvm.org/t/rfc-names-for-flang-rt-libraries/84321),
rename Flang's runtime libraries as follows:
* libFortranRuntime.(a|so) to libflang_rt.runtime.(a|so)
* libFortranFloat128Math.a to libflang_rt.quadmath.a
* libCufRuntime_cuda_${CUDAToolkit_VERSION_MAJOR}.(a|so) to
libflang_rt.cuda_${CUDAToolkit_VERSION_MAJOR}.(a|so)
This follows the same naming scheme as Compiler-RT libraries
(`libclang_rt.${component}.(a|so)`). It provides some consistency
between Flang's runtime libraries for current and potential future
library components.
Avoid using the same library for runtime and compiler. `FortranDecimal`
was used in two ways:
1. As an auxiliary library needed for `libFortranRuntime.a`. This patch
adds the two source files of FortranDecimal directly into
FortranRuntime, so `FortranRuntime` is not used anymore.
2. As a library used by the Flang compiler. As the only remaining use of
the library, extra CMake code to make it compatible with the runtime can
be removed.
Before this PR, `enable_cuda_compilation` is applied to `FortranDecimal`
which causes everything that links to it, including flang (the
compiler), to depend on libcudart when CUDA support is enabled.
Having two runtime library just makes everything more complicated while
the user ideally should not be concerned with how the runtime is
structured internally. Some logic was copied for FortranDecimal because
of this, such as the ability to be compiled out-of tree
(b75a3c9f31c1ffdc9856aee32991d8129b372ee7) which is undocumented, the
logic to link against the various versions of Microsofts runtime library
(#70833), and avoiding dependency on the C++ runtime
(7783bba22c7add678d796741d30669c73159b3d8).
In the discussion around #116792, @rjmccall mentioned that ARCMigrate
has been obsoleted and that we could go ahead and remove it from Clang,
so this patch does just that.
Add the implementation of the IERRNO intrinsic to get the last system
error number, as given by the C errno variable.
This intrinsic is also used in RAMSES
(https://github.com/ramses-organisation/ramses/).
Runtime function call to a void function are producing a ssa value
because the FunctionType result is set to NoneType with is later
translated to a empty struct. This is not an issue when going to LLVM IR
but it breaks when lowering a gpu module to PTX. This patch update the
RTModel to correctly set the FunctionType result type to nothing.
This is one runtime call before this patch at the LLVM IR dialect step.
```
%45 = llvm.call @_FortranAAssign(%arg0, %1, %44, %4) : (!llvm.ptr, !llvm.ptr, !llvm.ptr, i32) -> !llvm.struct<()>
```
After the patch the call would be correctly formed
```
llvm.call @_FortranAAssign(%arg0, %1, %44, %4) : (!llvm.ptr, !llvm.ptr, !llvm.ptr, i32) -> ()
```
Without the patch it would lead to error like:
```
ptxas /tmp/mlir-cuda_device_mod-nvptx64-nvidia-cuda-sm_60-e804b6.ptx, line 10; error : Output parameter cannot be an incomplete array.
ptxas /tmp/mlir-cuda_device_mod-nvptx64-nvidia-cuda-sm_60-e804b6.ptx, line 125; error : Call has wrong number of parameters
```
The change is pretty much mechanical.
The F23 standard requires that a call to intrinsic module procedure
ieee_support_halting be foldable to a constant at compile time in some
contexts. See for example F23 Clause 10.1.11 [Specification expression]
list item (13), Clause 1.1.12 [Constant expression] list item (11), and
references to specification and constant expressions elsewhere, such as
constraints C1012, C853, and C704.
Some Arm processors allow a user to control processor behavior when an
arithmetic exception is signaled, and some Arm processors do not have
this capability. An Arm executable will run on either type of processor,
so it is effectively unknown at compile time whether or not this support
will be available at runtime. This in conflict with the standard
requirement.
This patch addresses this conflict by implementing ieee_support_halting
calls on Arm processors to check if this capability is present at
runtime. A call to ieee_support_halting in a constant context, such as
in the specification part of a program unit, will generate a compile
time "cannot be computed as a constant value" error. The expectation is
that such calls are unlikely to appear in production code.
Code generation for other processors will continue to generate a compile
time constant result for ieee_support_halting calls.
Implement the UNSIGNED extension type and operations under control of a
language feature flag (-funsigned).
This is nearly identical to the UNSIGNED feature that has been available
in Sun Fortran for years, and now implemented in GNU Fortran for
gfortran 15, and proposed for ISO standardization in J3/24-116.txt.
See the new documentation for details; but in short, this is C's
unsigned type, with guaranteed modular arithmetic for +, -, and *, and
the related transformational intrinsic functions SUM & al.
A character length specifier in an entity declaration or a component
declaration is required by the standard to follow any array bounds or
coarray bounds that are present. Several Fortran compilers allow the
character length specifier to follow the name and appear before the
bounds.
Fixes https://github.com/llvm/llvm-project/issues/117372.
When there's an error in a DO statement loop control, error recovery
isn't great. A bare "DO" is a valid statement, so a failure to parse its
loop control doesn't fail on the whole statement. Its partial parse ends
after the keyword, and as some other statement parsers can get further
into the input before failing, errors in the loop control can lead to
confusing error messages about bad pointer assignment statements and
others. So just check that a bare "DO" is followed by the end of the
statement.
It's my understanding that all code review pre-commit takes place on
GitHub Pull Requests and that post-commit review is done either on the
closed PR or the commit on GitHub.
Nearly every Fortran compiler supports "PRINT namelistname" as a synonym
for "WRITE (*, NML=namelistname)". Implement this extension via parse
tree rewriting.
Fixes https://github.com/llvm/llvm-project/issues/111738.