Add the implementation of the `PERROR(STRING)` intrinsic, a GNU
extension that prints to stderr a newline-terminated error message
corresponding to the last system error, prefixed by `STRING`.
(https://gcc.gnu.org/onlinedocs/gfortran/PERROR.html)
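For illustration only, here is a minimal C++ sketch of the message layout that the C library `perror` uses, which the intrinsic is assumed to mirror. The helper name `formatPerrorMessage` is hypothetical and is not part of the Flang runtime; it only builds the string instead of writing it to stderr.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper: builds the text that perror(prefix) would print for
// the error code `err`, following the POSIX "prefix: message\n" layout.
std::string formatPerrorMessage(const char *prefix, int err) {
  std::string message;
  // POSIX omits the "prefix: " part when the prefix is null or empty.
  if (prefix != nullptr && prefix[0] != '\0') {
    message += prefix;
    message += ": ";
  }
  message += std::strerror(err); // system text for the error code
  message += '\n';               // the message is newline-terminated
  return message;
}
```

Whether the intrinsic treats an empty `STRING` exactly like `perror` does is an assumption of this sketch.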
The OpenMP standard says that all dependencies in the same set of
inter-dependent tasks must be non-overlapping. This simplification means
that the OpenMP runtime only needs to keep track of the base addresses of
dependency variables. This can be seen in kmp_taskdeps.cpp, which stores
task dependency information in a hash table, using the base address as a
key.
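As a rough sketch of why the base address is enough, the following C++ models a dependency table keyed by base address. It is not the data structure in kmp_taskdeps.cpp: `DepEntry`, `DepTable`, and `addDependence` are hypothetical names, and only plain `in` and `out` dependencies are modeled.

```cpp
#include <unordered_map>
#include <vector>

// Hypothetical per-address record: the last task that wrote the location and
// the tasks that have read it since that write.
struct DepEntry {
  int lastWriter = -1; // -1 means "no writer seen yet"
  std::vector<int> readersSinceWrite;
};

// Hypothetical dependency table keyed by the base address of the dependency
// variable. Because dependencies are either identical or disjoint, two
// dependencies conflict only if their base addresses are equal.
class DepTable {
public:
  // Registers a dependency of `task` on `base` and returns the tasks that
  // must complete first.
  std::vector<int> addDependence(int task, const void *base, bool isOut) {
    DepEntry &entry = table_[base];
    std::vector<int> predecessors;
    if (isOut) {
      // A writer waits for the readers since the last write, or, if there
      // are none, for the last writer itself.
      if (!entry.readersSinceWrite.empty())
        predecessors = entry.readersSinceWrite;
      else if (entry.lastWriter >= 0)
        predecessors.push_back(entry.lastWriter);
      entry.lastWriter = task;
      entry.readersSinceWrite.clear();
    } else {
      // A reader waits only for the last writer.
      if (entry.lastWriter >= 0)
        predecessors.push_back(entry.lastWriter);
      entry.readersSinceWrite.push_back(task);
    }
    return predecessors;
  }

private:
  std::unordered_map<const void *, DepEntry> table_;
};
```

In this model a task depending on `a(5:8)` and a task depending on `a(1:4)` use different keys and never see each other, which is the behavior the non-overlapping rule permits.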
This patch generates a rebox operation to slice boxed arrays, but only
the box data address is used for the task dependency. The extra box is
optimized away by LLVM at O3.
Vector subscripts are TODO (I will address them in my next patch).
This also fixes a bug for ordinary subscripts when the symbol was mapped
to a box:
Fixes #132647
The code generation relies on `ShallowCopyDirect` runtime
to copy data between the original and the temporary arrays
(both directions). The allocations are done by the compiler
generated code. The heap allocations could have been passed
to `ShallowCopy` runtime, but I decided to expose the allocations
so that the temporary descriptor passed to `ShallowCopyDirect`
has `nocapture` - maybe this will be better for LLVM optimizations.
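The copy-in/copy-out scheme described above can be sketched in C++ as follows. This is only a model under simplifying assumptions (a one-dimensional array of `double` with a positive element stride); `packStrided` and `unpackStrided` are hypothetical names and do not reflect the `ShallowCopyDirect` signature.

```cpp
#include <cstddef>
#include <vector>

// Copy-in: gather `count` elements spaced `stride` elements apart into a
// contiguous temporary. The std::vector stands in for the heap allocation
// made by the compiler-generated code.
std::vector<double> packStrided(const double *base, std::size_t count,
                                std::size_t stride) {
  std::vector<double> temp(count);
  for (std::size_t i = 0; i < count; ++i)
    temp[i] = base[i * stride];
  return temp;
}

// Copy-out: scatter the temporary back into the original strided storage.
void unpackStrided(const std::vector<double> &temp, double *base,
                   std::size_t stride) {
  for (std::size_t i = 0; i < temp.size(); ++i)
    base[i * stride] = temp[i];
}
```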
This change enables LoopVersioning when `fir.pack_array` is met
in the def-use chain. It fixes a couple of huge performance regressions
caused by enabling `-frepack-arrays`.
Issue: A Cray pointer is not associated with its Cray pointee, leading to
a segmentation fault.
Fix: Use GetUltimate, which retrieves the base symbol in the current scope;
it gets past all the references and returns the original symbol.
---------
Co-authored-by: Michael Klemm <michael.klemm@amd.com>
Summary:
When we were first porting to COV5, this led to some ABI issues due to
a change in how we looked up the work group size. Bitcode libraries
relied on the builtins to emit code, but this was changed between
versions. This prevented the bitcode libraries, like OpenMP or libc,
from being used for both COV4 and COV5. The solution was to have this
'none' functionality which effectively emitted code that branched off of
a global to resolve to either version.
This isn't a great solution because it forced every TU to have this
variable in it. The patch in
https://github.com/llvm/llvm-project/pull/131033 removed support for
COV4 from OpenMP, which was the only consumer of this functionality.
Other users like HIP and OpenCL did not use this because they linked the
ROCm Device Library directly which has its own handling (The name was
borrowed from it after all).
So, now that we don't need to worry about backward compatibility with
COV4, we can remove this special handling. Users can still emit COV4
code; this simply removes the special handling used to make the OpenMP
device runtime bitcode version agnostic.
Currently only ctor/dtor list and their priorities are supported. This
PR adds support for the missing data field.
A few implementation notes:
- The assembly printer has a fixed form because previous `attr_dict`
will sort the dict by key name, making global_dtor and global_ctor
differ in the order of printed arguments.
- LLVM's `ptr null` is being converted to `#llvm.zero` otherwise we'd
have to create a region to use the default operation conversion from
`ptr null`, which is silly given that the field only supports null or a
symbol.
This PR reduces the verbosity of parser errors for OpenACC block
constructs that do not parse correctly because they are missing their
trailing end block directive by:
- Removing the redundant error messages created by parsing 3 different
styles of directive tokens.
- Providing a general mechanism of configuring the max number of
contexts printed for every syntax error.
- Not printing less specific contexts that are at the same location.
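The last two points can be sketched as a small pruning step. This is a simplified model, not the parser's message-handling code: `Context` and `pruneContexts` are hypothetical names, and the sketch assumes contexts arrive ordered from most to least specific, with same-location contexts adjacent.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical context record: a source offset plus the context text.
struct Context {
  std::size_t offset;
  std::string text;
};

// Keeps at most `maxContexts` contexts and drops any context that sits at
// the same location as the more specific context kept just before it.
std::vector<Context> pruneContexts(const std::vector<Context> &contexts,
                                   std::size_t maxContexts) {
  std::vector<Context> kept;
  for (const Context &context : contexts) {
    if (kept.size() == maxContexts)
      break; // configured limit reached
    if (!kept.empty() && kept.back().offset == context.offset)
      continue; // less specific context at the same location
    kept.push_back(context);
  }
  return kept;
}
```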
Prior to the changes:
```
$ flang -fc1 -fopenacc -fsyntax-only flang/test/Parser/acc-data-statement.f90 2>&1 | tee acc-data-statement.prior.log | wc -l
262
```
[acc-data-statement.prior.log](https://github.com/user-attachments/files/19298165/acc-data-statement.prior.log)
After the changes:
```
$ flang -fc1 -fopenacc -fsyntax-only flang/test/Parser/acc-data-statement.f90 2>&1 | tee acc-data-statement.post.log | wc -l
73
```
[acc-data-statement.post.log](https://github.com/user-attachments/files/19298181/acc-data-statement.post.log)
The map of symbols requiring new local aliases for USE association needs
to use the symbols' ultimate resolutions to avoid missing cases that can
arise in convoluted codes with lots of confusing renamings.
Fixes https://github.com/llvm/llvm-project/issues/132435.
flang/include/flang/Runtime/io-api.h was changed into io-api-consts.h,
then wrapped into a new io-api.h that includes io-api-consts.h, does
some redundant includes and declarations, and then declares the
prototype of one function, InquiryKeywordHashDecode.
Make that function static in io-stmt.cpp prior to its sole call site,
then undo the renaming, to reduce confusion and redundancy.
Add -f[no-]slp-vectorize to the flang driver.
Add corresponding -fvectorize-slp to the flang frontend.
Enable -fslp-vectorize at -O2 and higher in flang to match the current
behaviour in clang.
---------
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
When building Flang with Clang, we need to do the same quadmath.h
wrapping as we do for flang-rt. I extracted the CMake code
into FlangCommon.cmake, and cleaned up the argument passing
to execute_process (note that `-###` was treated as `-` in the original
code, because `#` starts a comment). I believe the Clang command
does not require the input source file, so I removed it as well.
Implement GNU extension intrinsic HOSTNM, both function and subroutine
forms. Add HOSTNM documentation to `flang/docs/Intrinsics.md`. Add
lowering and semantic unit tests.
(This change is modeled after the GETCWD implementation.)
Using LoopNest's indices with ShapeShifts that have non-default
lower bounds results in accesses to incorrect array elements.
To avoid having to adjust each index, a ShapeShift with default
lower bounds can be used instead.
Fixes #131751
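To make the indexing problem concrete, here is a tiny C++ model of how an index maps to a storage offset. The function name `elementOffset` is hypothetical, and the sketch assumes the loop nest produces one-based indices.

```cpp
// Zero-based storage offset of the element designated by `index` in a
// one-dimensional array whose lower bound is `lowerBound`.
long elementOffset(long index, long lowerBound) { return index - lowerBound; }
```

With default lower bounds, the first loop index 1 gives `elementOffset(1, 1) == 0`, the first element. With a lower bound of 5, the same index gives `elementOffset(1, 5) == -4`, which lies outside the array. Rebuilding the shape with default lower bounds avoids adjusting every index.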
Fixes a bug in reductions when converting `teams loop` constructs with
`reduction` clauses.
According to the spec (v5.2, p340, 36):
```
The effect of the reduction clause is as if it is applied to all leaf
constructs that permit the clause, except for the following constructs:
* ....
* The teams construct, when combined with the loop construct.
```
Therefore, for a combined directive similar to: `!$omp teams loop
reduction(...)`, the earlier stages of the compiler assign the
`reduction` clauses only to the `loop` leaf and not to the `teams` leaf.
On the other hand, if we have a combined construct similar to: `!$omp
teams distribute parallel do`, the `reduction` clauses are assigned both
to the `teams` and the `do` leaves. We need to match this behavior when
we convert `teams` op with a nested `loop` op since the target set of
constructs/ops will be incorrect without moving the reductions up to the
`teams` op as well.
This PR introduces a pattern that does exactly this. Given the following
input:
```
omp.teams {
  omp.loop reduction(@red_sym %red_op -> %red_arg : !fir.ref<i32>) {
    omp.loop_nest ... {
      ...
    }
  }
}
```
this pattern updates the `omp.teams` op in-place to:
```
omp.teams reduction(@red_sym %red_op -> %teams_red_arg : !fir.ref<i32>) {
  omp.loop reduction(@red_sym %teams_red_arg -> %red_arg : !fir.ref<i32>) {
    omp.loop_nest ... {
      ...
    }
  }
}
```
Note the following:
* The nested `omp.loop` is not rewritten by this pattern; that happens
through `GenericLoopConversionPattern`.
* The reduction info is cloned from the nested `omp.loop` op to the
parent `omp.teams` op.
* The reduction operand of the `omp.loop` op is updated to be the
**new** reduction block argument of the `omp.teams` op.
I'm not sure why strides were not allowed in array sections: the stride
is explicitly allowed by the standard from the first version where array
sections were introduced. The limitation is that the stride must not be
negative.
Here I have added the check for a negative stride and updated the test
for a zero length section to take account of the stride.
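As a sketch of the two checks, the following C++ computes the number of elements in a section `lower:upper:stride` using the usual Fortran triplet count. The name `sectionLength` and the `-1` sentinel are hypothetical and do not come from the Flang semantics code. Rejecting a zero stride is an extra assumption of the sketch, made to avoid a division by zero.

```cpp
#include <algorithm>

// Returns the number of elements designated by lower:upper:stride, or -1 if
// the stride is rejected.
long sectionLength(long lower, long upper, long stride) {
  if (stride < 0)
    return -1; // negative strides are not allowed
  if (stride == 0)
    return -1; // assumption of this sketch: zero is rejected as well
  // Fortran triplet count: MAX(0, (upper - lower + stride) / stride).
  return std::max(0L, (upper - lower + stride) / stride);
}
```

A result of 0 corresponds to a zero-length section; for example `5:3:3` is empty, while `1:10:3` designates elements 1, 4, 7, and 10.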
Currently, the helpers to get fir::ExtendedValue out of hlfir::Entity
use hlfir.declare second result (`#1`) in most cases. This is because
this result is the same as the input and matches what FIR was getting
before lowering to HLFIR.
But this creates odd situations when both hlfir.declare results are raw
pointers and either result ends up being used in the IR depending on whether
the code was generated by a helper using fir::ExtendedValue, or via "pure
HLFIR" helpers using the first result.
This will typically prevent simple CSE and easy identification that two
operations (e.g., load/store) are touching the exact same memory location
without using alias analysis or "manual detection" (looking for a common
hlfir.declare defining op).
Hence, when hlfir.declare results are both raw pointers, use `#0` when
producing `fir::ExtendedValue`.
When `#0` is a fir.box, keep using `#1` because these are not the same.
The only code change is in HLFIRTools.cpp and is pretty small, but there
is a big test fallout of `#1` to `#0`.
This patch makes the `map_type` and `map_capture_type` arguments of the
`omp.map.info` operation required, which was already an invariant being
verified by its users via `verifyMapClause()`. This makes it clearer, as
getters no longer return misleading `std::optional` values.
Checks for the `mapper_id` argument are moved to a verifier for the
operation, rather than being checked by users.
Functionally NFC, but not marked as such due to a reordering of
arguments in the assembly format of `omp.map.info`.
TARGET dummy arrays can be accessed indirectly, so it is unsafe
to repack them.
INTENT(OUT) dummy arrays that require finalization on entry
to their subroutine must be copied in by `fir.pack_array`.
In addition, based on my testing results, I think it will be useful
to document that `LOC` and `IS_CONTIGUOUS` will have different values
for the repacked arrays. I still need to decide where to document
this, so just added a note in the design doc for the time being.
Whole assumed-size arrays are generally not allowed outside specific
contexts, where expression analysis notes that they can appear. But
contexts can nest, and in the case of an actual argument that turns out
to be an array constructor, the permission to use a whole assumed-size
array must be rescinded.
Fixes https://github.com/llvm/llvm-project/issues/131909.
When reinterpreting an ambiguously parsed function reference as a
structure constructor, use the original symbol of the type in the
representation of the derived type spec of the structure constructor,
not its ultimate resolution. The distinction turns out to matter when
generating module files containing derived type constants as
initializers when the derived types' names have undergone USE
association renaming.
Fixes https://github.com/llvm/llvm-project/issues/131579.
A PURE subprogram can't have a local variable with the SAVE attribute.
An ASSOCIATE or SELECT TYPE construct entity whose selector is a
variable will return true from IsSave(); exclude them from the local
variable check.
Fixes https://github.com/llvm/llvm-project/issues/131356.
This patch adds the `OpWithBodyGenInfo::blockArgs` field and updates
`createBodyOfOp()` to prevent the need for `BlockArgOpenMPOpInterface`
operations to pass the same callback, minimizing chances of introducing
inconsistent behavior.
The `OmpDirectiveSpecification` contains the directive name, the list of
arguments, and the list of clauses. It was introduced to store the
directive specification in METADIRECTIVE, and could be reused everywhere
a directive representation is needed.
In the long term this would unify the handling of common directive
properties, as well as creating actual constructs from METADIRECTIVE by
linking the contained directive specification with any associated user
code.