The `OmpDirectiveSpecification` contains directive name, the list of
arguments, and the list of clauses. It was introduced to store the
directive specification in METADIRECTIVE, and could be reused everywhere
a directive representation is needed.
In the long term this would unify the handling of common directive
properties, as well as creating actual constructs from METADIRECTIVE by
linking the contained directive specification with any associated user
code.
Adds Parser and Semantic Support for the below construct and clauses:
- Interop Construct
- Init Clause
- Use Clause
Note:
The other clauses supported by Interop Construct such as Destroy, Use,
Depend and Device are added already.
Fixes#112538
The problem was that the host associated symbol for the threadprivate
variable doesn't have all of the symbol attributes (e.g. POINTER). This
caused the lowering code to generate the wrong type, eventually hitting
an assertion.
There is no reason for character element type to be forbidden in this
helper.
The assert was firing in character pointer assignment in FORALL after
#130772 added a usage of this helper.
I am working on `-frepack-array` feature (#127147), which produces
non-trivial manipulations with arguments of `fir.declare`.
In this case, we end up with CFG computation of the `fir.declare`
argument, and AddDebugInfo pass incorrectly mapped two dummy arguments
to the same arg index in the debug attributes.
This patch makes sure that we assign the arg index only if we can prove
that we've traced the block argument to the function's entry block.
I believe this problem is not specific to `-frepack-arrays`, e.g.
it may appear due to MLIR inlining as well.
Arrays with assumed-length types are represented with a box
without explicit length parameters. This patch fixes the verification
to allow it for `fir.pack_array`.
Adds new MLIR ops to model `do concurrent`. In order to make `do
concurrent` representation self-contained, a loop is modeled using 2
ops, one wrapper and one that contains the actual body of the loop. For
example, a 2D `do concurrent` loop is modeled as follows:
```mlir
fir.do_concurrent {
%i = fir.alloca i32
%j = fir.alloca i32
fir.do_concurrent.loop
(%i_iv, %j_iv) = (%i_lb, %j_lb) to (%i_ub, %j_ub) step (%i_st, %j_st) {
%0 = fir.convert %i_iv : (index) -> i32
fir.store %0 to %i : !fir.ref<i32>
%1 = fir.convert %j_iv : (index) -> i32
fir.store %1 to %j : !fir.ref<i32>
}
}
```
The `fir.do_concurrent` wrapper op encapsulates both the actual loop and
the allocations required for the iteration variables. The
`fir.do_concurrent.loop` op is a multi-dimensional op that contains the
loop control and body. See the ops' docs for more info.
Below semantic checks for Taskloop clause mentioned in OpenMP [5.2]
specification were missing, this patch contains the semantic checks,
corresponding error messages and test cases:
OpenMP standard [5.2]:
[12.6] Taskloop Construct
[Restrictions]
Restrictions to the taskloop construct are as follows:
• The reduction-modifier must be default.
• The conditional lastprivate-modifier must not be specified.
Authored-by: shkaushi <sharang.kaushik@amd.com>
Add a pass to compute the size of the shared memory (static shared
memory) and the offsets of each variables to be placed in shared memory.
The global representing the shared memory is also created during this
pass.
In case of dynamic shared memory, the global as a type of
`!fir.array<0xi8>` and the size of the memory is set at kernel launch.
Introduce `cuf.shared_memory` operation. The operation is used to get
the pointer in shared memory for a specific variable. The shared memory
is materialized as a global in address space 3 and the different
variables are pointing to it at different offset.
Follow up patches will add lowering and conversion of this operation.
Global with the CUDA shared data attribute needs to be lowered to llvm
globals with the correct address space (3). Address space is set from
the `mlir::NVVM::NVVMMemorySpace::kSharedMemorySpace` enum from
`mlir/Dialect/LLVMIR/NVVMDialect.h`
Implement handling of `NULL()` RHS, polymorphic pointers, as well as
lower bounds or bounds remapping in pointer assignment inside FORALL.
These cases eventually do not require updating hlfir.region_assign,
lowering can simply prepare the new descriptor for the LHS inside the
RHS region.
Looking more closely at the polymorphic cases, there is not need to call
the runtime, fir.rebox and fir.embox do handle the dynamic type setting
correctly.
After this patch, the last remaining TODO is the allocatable assignment
inside FORALL, which like some cases here, is more likely an accidental
feature given FORALL was deprecated in F2003 at the same time than
allocatable components where added.
With `LLVM_ENABLE_RUNTIMES=openmp`, flang enables the OpenMP regression
tests, but `check-flang` was not ensuring that the OpenMP requirements
are built first. Fix by adding a `libomp-mod` to `flang-test-depends`.
Adding `libomp-mod` to extra_targets is necessary because there is no
target from openmp/ that is reachable from the parent
bootstrapping-build. `ninja openmp` fails because openmp/ has no
`openmp` target. `check-openmp` would also run the OpenMP tests and does
not even build `omp_lib.mod`. `runtimes` would build all the runtimes,
not just OpenMP.
Also fix the misleading CMake configure status messages that suggest the
only way to build omp_lib.mod/.h is `LLVM_ENABLE_PROJECTS=openmp`.
Avoid triggering an assertion for shared variable using the assumed-size
syntax.
```
attributes(global) subroutine sharedstar()
real, shared :: s(*) ! ok. dynamic shared memory.
end subroutine
```
Given the following input:
```fortran
program rep_loopbind
implicit none
integer :: i
real :: priv_val
!$omp teams private(priv_val)
!$omp distribute
do i=1,1000
end do
!$omp end teams
end program
```
the `AllocaOpConversion` pattern in `FIRToLLVMLowering` would **move**
the private allocations that belong to the `teams` directive (i.e. the
allocations needed for the private copies of `priv_val` and the loop's
iteration variable) from the the `omp.teams` op to the outside scope.
This is not correct since these allocations should be eventually emitted
inside the outlined region for the `teams` directive. Without this fix,
these allocation would be emitted in the parent function (or the parent
scope whatever it is).
When outlining an offload region, Flang creates a unique name by
querying an inode ID. However, when the name of the actual source file
does not match the logical file in a `#line` preprocessor directive,
code-gen was failing as it could not determine the inode ID. This PR
checks for this condition and if the logical file name does not exist,
the inode is replaced with a hash value created from the source code
itself.
…UCTION
This patch allows better parsing of the reduction and initializer
components, including supporting derived types in both those places.
There is more work needed here, but this is a definite improvement in
what can be handled through parser and semantics.
Note that declare reduction is still not supported in lowering, so any
attempt to compile DECLARE REDUCTION code will end with a TODO aka "Not
yet implemented" abort in the compiler.
Note that this version of the code does not cover declaring multiple
reductions using the same name with different types. This is will be
fixed in a future patch. [This was also the case before this change].
One existing test modified to actually compile (as it didn't in the
original form).
Improve the check for whether a type can be passed by copy. Currently,
passing by copy is done via the OMP_MAP_LITERAL mapping, which can only
transfer as much data as can be contained in a pointer representation.
This PR addresses some of the issues described in
https://github.com/llvm/llvm-project/issues/127617. Key changes:
- Stop assuming fixed-form for `-x f95` unless the input is a `.i` file.
This change ensures compatibility with `-save-temps` workflows while
preventing unintended fixed-form assumptions.
- Ensure `-x f95-cpp-input` enables `-cpp` by default, aligning Flang's
behavior with `gfortran`.
The saturated floating point conversion intrinsics match the semantics in the standard more closely than the fptosi/fptoui instructions.
Case 2 of 16.9.100 is
> INT (A [, KIND])
> If A is of type real, there are two cases: if |A| < 1, INT (A) has the
value 0; if |A| ≥ 1, INT (A) is the integer whose magnitude is the
largest integer that does not exceed the magnitude of A and whose sign
is the same as the sign of A.
Currently, converting a floating point value into an integer type too
small to hold the constant will be converted to poison in opt, leaving
us with garbage:
```
> cat t.f90
program main
real(kind=16) :: f
integer(kind=4) :: i
f=huge(f)
i=f
print *, i
end program main
# current upstream
> for i in `seq 10`; do; ./a.out; done
-862156992
-1497393344
-739096768
-1649494208
1761228608
-1959270592
-746244288
-1629194432
-231217344
382322496
```
With the saturated fptoui/fptosi intrinsics, we get the appropriate
values
```
# mine
> flang -O2 ./t.f90 && ./a.out
2147483647
> perl -e 'printf "%d\n", (2 ** 31) - 1'
2147483647
```
One notable difference: NaNs being converted to ints will become zero, unlike current flang (and some other compilers). Newer versions of GCC have this behavior.
The standard prohibits privatising namelist variables. We also decided
in #110671 to prohibit reductions of namelist variables.
This commit prevents this rule from being circumvented through the use
of equivalence statements.
Fixes#122824
So far, flang was not setting argument attributes on direct calls
assuming that putting them on the function operation was enough.
It was clarified in
38565da525
that they must be set on both call and functions, even for direct calls.
Crashes have been observed because of the lack of the attribute when
compiling `abs(x)` at `O2` and above on X86-64 for complex(16).
Currently, all derived types are initialized through `_FortranAInitialize`, which is functionally correct, but bears poor runtime performance. This patch falls back on global initialization for "simpler" derived types to speed up the initialization.
Note: this relands #114002 with the fix for the LLVM timeout regressions that have been seen. The fix is to use the added fir.copy to avoid aggregate load/store.
Co-authored-by: NimishMishra <42909663+NimishMishra@users.noreply.github.com>
These previously crashed the compiler because !fir.ptr (not wrapped
inside of a box) was not supported.
Real POINTER variables are supported as !fir.box<!fir.ptr<>>. The
version for EQUIVALENCE doesn't need to do anything different to
!fir.ref<>.
Introduce a FIR operation to do memcopy/memmove of compile time constant size types.
This is to avoid requiring derived type copies to done with load/store
which is badly supported in LLVM when the aggregate type is "big" (no
threshold can easily be defined here, better to always avoid them for
fir.type).
This was the root cause of the regressions caused by #114002 which introduced a
load/store of fir.type<> which caused hand/asserts to fire in LLVM on
several benchmarks.
See https://llvm.org/docs/Frontend/PerformanceTips.html#avoid-creating-values-of-aggregate-type
Previous PR: https://github.com/llvm/llvm-project/pull/129308
Changes:
* The alloc-32.fir test is now marked as requiring the X86 target.
* Drive-by fixes uncovered when fixing tests involving malloc
We currently don't consider an integer division to be a constant
expression unless its divisor is known and nonzero. This produces a
weird result in the linked test; do better.
Fixes https://github.com/llvm/llvm-project/issues/130534.
These are constraints that preclude the need to obtain type information
from descriptors on other images, essentially. When designating a
polymorphic component, its base may not be coindexed; nor shall a
coindexed designator have a type with a polymorphic potential subobject
component.
The relationship between a Cray pointee and its pointer is not in the
symbol table entry, but instead in a per-scope map. Use this information
to ensure that if a pointee/pointer is needed in the module file, so is
its pointer/pointee.
Fixes https://github.com/llvm/llvm-project/issues/130270.
A object's value can't be copied from another image by means of an
intrinsic assignment statement if it has a derived type that contains a
pointer subobject ultimate component.
The check for equality of actual and dummy argument coranks was taking
place only for ALLOCATABLE coarrays; perform the check for all cases,
and refine the ALLOCATABLE check to apply only to cases that don't fail
the new more general check.
F'2023 10.2.1.2 paragraph 2 imposes some requirements on the left-hand
sides of assignments when they have coindices, and one was not checked
while another was inaccurately checked. In short, intrinsic assignment
to a coindexed object can't change its type, and neither can it affect
allocatable components.