10126 Commits

Author SHA1 Message Date
Valentin Clement (バレンタイン クレメン)
3f30207a23
[flang][cuda] Fix __ldXX function signatures (#130628)
Function are taking array reference and not array directly.
2025-03-10 13:37:08 -07:00
Peter Klausler
c189852218
[flang] Ignore empty keyword macros before directives (#130333)
Ignore any keyword macros with empty directives that might appear before
a compiler directive.

Fixes https://github.com/llvm/llvm-project/issues/126459.
2025-03-10 13:21:10 -07:00
Peter Klausler
8227f2a35a
[flang] Fix Cray pointers in module file output (#130315)
The relationship between a Cray pointee and its pointer is not in the
symbol table entry, but instead in a per-scope map. Use this information
to ensure that if a pointee/pointer is needed in the module file, so is
its pointer/pointee.

Fixes https://github.com/llvm/llvm-project/issues/130270.
2025-03-10 13:20:52 -07:00
Peter Klausler
40d9096fda
[flang] Enforce C15104(5) for coindexed values (#130203)
A object's value can't be copied from another image by means of an
intrinsic assignment statement if it has a derived type that contains a
pointer subobject ultimate component.
2025-03-10 13:20:35 -07:00
Peter Klausler
73b96cff01
[flang] Refine coarray subobjects (#130183)
A subobject of a coarray is not also a coarray if it involves an
allocatable component, pointer component, or vector-valued subscript.
2025-03-10 13:20:18 -07:00
Peter Klausler
fd8de7524d
[flang] Check actual/dummy coranks in more cases (#130167)
The check for equality of actual and dummy argument coranks was taking
place only for ALLOCATABLE coarrays; perform the check for all cases,
and refine the ALLOCATABLE check to apply only to cases that don't fail
the new more general check.
2025-03-10 13:19:58 -07:00
Peter Klausler
bbc27fbb1c
[flang] Refine checks on assignments to coarrays (#129966)
F'2023 10.2.1.2 paragraph 2 imposes some requirements on the left-hand
sides of assignments when they have coindices, and one was not checked
while another was inaccurately checked. In short, intrinsic assignment
to a coindexed object can't change its type, and neither can it affect
allocatable components.
2025-03-10 13:19:39 -07:00
Peter Klausler
f6fc29d331
[flang] Check coranks on MOVE_ALLOC (#129944)
The FROM= and TO= arguments in a reference to the MOVE_ALLOC intrinsic
subroutine must have the same corank.
2025-03-10 13:18:23 -07:00
Peter Klausler
d53079055e
[flang] Catch coindexed procedure pointer/binding references (#129931)
A procedure designator cannot be coindexed, except for cases in which
the coindexing doesn't matter (i.e. a binding that can't be overridden).
2025-03-10 13:18:07 -07:00
Peter Klausler
b7557ab34d
[flang] Catch disallowed usage of coarrays in defined I/O (#129907)
Defined input/output subroutines must conform to documented interfaces
that do not allow for coarray dummy arguments.
2025-03-10 13:17:50 -07:00
Peter Klausler
8c535669a3
[flang] Semantic checking of LOCK statement (#129806)
Add checks for the LOCK statement, and complete and enable their tests.
2025-03-10 13:17:31 -07:00
Peter Klausler
33b061f8ea
[flang] Enforce C955 on DEALLOCATE (#129788)
Constraint C955 in F'2023 prohibits an allocate-object from being
coindexed. We catch this on ALLOCATE statements but missed it on
DEALLOCATE.
2025-03-10 13:17:12 -07:00
Peter Klausler
706e7d83cf
[flang] Enforce C1027 (F'2023) (#129617)
The LHS of a pointer assignment statement may not be coindexed.
2025-03-10 13:16:50 -07:00
Peter Klausler
53c3a2c69a
[flang] Static checking for empty coarrays (#129610)
A coarray must not have a zero extent on a codimension; that would yield
an empty coarray. When cobounds are constants, verify them.
2025-03-10 13:16:31 -07:00
Peter Klausler
e2733c82bd
[flang] Accept useless label on top-level FUNCTION (#129603)
The look-ahead parser for function program units didn't allow for a
useless label on the statement.

Fixes https://github.com/llvm/llvm-project/issues/129456.
2025-03-10 13:16:09 -07:00
Slava Zakharin
aa38754117
[RFC][flang] Add support for assumed-shape dummy arrays repacking. (#127147)
This is a document describing why and how to add support for repacking
of assumed-shape dummy arrrays to provide more efficient data cache.
It proposes adding new FIR operations and outlines the compiler flow
handling these operations.
I would like to hear feedback on all of it, but especially on:
  * The possibility of detecting safeness of the repacking
    in the context of OpenACC/OpenMP. If it is not possible
    to do the runtime checks to determine safety, then
    there is not need to add the `TempCopyIsSafe` attributes
    to the instruction.
  * Whether it is possible to preserve the debug information
    in cases where `fir.pack_array` is sunk after `[hl]fir.declare`,
    so that before the `fir.pack_array` a debugger will refer
    to the values in the original array, and after `fir.pack_array`
    it will refer to the copy.
2025-03-10 11:08:37 -07:00
مهدي شينون (Mehdi Chinoune)
cf5aa559a8
[flang] Don't redefine pid_t on MinGW-w64. (#130288) 2025-03-10 17:27:47 +00:00
Krzysztof Parzyszek
5ba7a3bd4c
[flang][OpenMP] Parse cancel-directive-name as clause (#130146)
The cancellable construct names on CANCEL or CANCELLATION POINT
directives are actually clauses (with the same names as the
corresponding constructs).

Instead of parsing them into a custom structure, parse them as a clause,
which will make CANCEL/CANCELLATION POINT follow the same uniform scheme
as other constructs (<directive> [(<arguments>)] [clauses]).
2025-03-10 11:58:02 -05:00
Krzysztof Parzyszek
4e453d5292
[flang][OpenMP] Accept old FLUSH syntax in METADIRECTIVE (#130122)
Accommodate it in OmpDirectiveSpecification, which may become the
primary component of the actual FLUSH construct in the future.
2025-03-10 08:12:46 -05:00
Krzysztof Parzyszek
d67947162f
[flang][OpenMP] Implement HAS_DEVICE_ADDR clause (#128568)
The HAS_DEVICE_ADDR indicates that the object(s) listed exists at an
address that is a valid device address. Specifically,
`has_device_addr(x)` means that (in C/C++ terms) `&x` is a device
address.

When entering a target region, `x` does not need to be allocated on the
device, or have its contents copied over (in the absence of additional
mapping clauses). Passing its address verbatim to the region for use is
sufficient, and is the intended goal of the clause.

Some Fortran objects use descriptors in their in-memory representation.
If `x` had a descriptor, both the descriptor and the contents of `x`
would be located in the device memory. However, the descriptors are
managed by the compiler, and can be regenerated at various points as
needed. The address of the effective descriptor may change, hence it's
not safe to pass the address of the descriptor to the target region.
Instead, the descriptor itself is always copied, but for objects like
`x`, no further mapping takes place (as this keeps the storage pointer
in the descriptor unchanged).

---------

Co-authored-by: Sergio Afonso <safonsof@amd.com>
2025-03-10 08:11:01 -05:00
Kajetan Puchalski
0c7e895de3
[flang] Move parser invocations into ParserActions (#130309)
FrontendActions.cpp is currently one of the biggest compilation units in
all of flang. Measuring its compilation gives the following metrics:

User time (seconds): 139.21
System time (seconds): 4.65
Maximum resident set size (kbytes): 5891440 (5.61 GB)

This commit separates out explicit invocations of the parser into a
separate compilation unit - ParserActions.cpp - through helper functions
in order to decrease the maximum compilation time and memory usage of a
single unit.
After the split, the measurements of FrontendActions.cpp are as follows:

User time (seconds): 70.08
System time (seconds): 3.16
Maximum resident set size (kbytes): 3961492 (3.7 GB)

While the ones for the newly created ParserActions.cpp as follows:

User time (seconds): 104.33
System time (seconds): 3.37
Maximum resident set size (kbytes): 4185600 (3.99 GB)

---------

Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
2025-03-10 11:33:47 +00:00
R
3121da52aa Revert "[flang] In AllocMemOp lowering, convert types for calling malloc on 32-bit (#129308)"
This reverts commit cf1964af5a461196904b663ede04c26555fcff69.

This causes breakage on all the non-x86 buildbots as they don't have the i686
target enabled. This was missed in pre-commit CI.
2025-03-08 02:42:24 +00:00
R
cf1964af5a
[flang] In AllocMemOp lowering, convert types for calling malloc on 32-bit (#129308)
Although 32-bit targets are currently not officially supported, add a type conversion in the AllocMemOp lowering when calling the `malloc` function on 32-bit targets. This fixes a type mismatch, and this fix makes it easier to potentially support such targets in the future.

This involves making sure the `LLVMTypeConverter` has the necessary information to know the target bit width.

Co-authored-by: Valentin Clement (バレンタイン クレメン) <clementval@gmail.com>
2025-03-08 02:25:17 +00:00
Valentin Clement (バレンタイン クレメン)
ae42f07103
[flang][cuda] Allow array pointer for atomicexch and atomiccas (#130363) 2025-03-07 15:36:08 -08:00
Valentin Clement (バレンタイン クレメン)
829e8993e5
[flang][cuda] Lower __LDCA, __LDCS, __LDLU, __LDCV, __LDCG with arrays (#130357) 2025-03-07 15:35:52 -08:00
agozillon
f1178815d2
[Flang][OpenMP][MLIR] Implement close, present and ompx_hold modifiers for Flang maps (#129586)
This PR adds an initial implementation for the map modifiers close,
present and ompx_hold, primarily just required adding the appropriate
map type flags to the map type bits. In the case of ompx_hold it
required adding the map type to the OpenMP dialect. Close has a bit of a
problem when utilised with the ALWAYS map type on descriptors, so it is
likely we'll have to make sure close and always are not applied to the
descriptor simultaneously in the future when we apply always to the
descriptors to facilitate movement of descriptor information to device
for consistency, however, we may find an alternative to this with
further investigation. For the moment, it is a TODO/Note to keep track
of it.
2025-03-07 22:22:30 +01:00
Valentin Clement (バレンタイン クレメン)
dcda314b6c
[flang][cuda] Fix atmoicxor lowering to accept arrays (#130331)
The first agrument can be an address of a scalare, an array element or
even just the address of the first element of an array. Update lowering
to not trigger elemental lowering.
2025-03-07 13:05:42 -08:00
Valentin Clement (バレンタイン クレメン)
5668c7bb90
[flang][cuda] Add more interfaces for __ldca, __ldcs, __ldlu and __ldcv (#130218) 2025-03-07 10:19:20 -08:00
Renaud Kauffmann
d4754db15d
Test fix: Adding REQUIRES: asserts (#130314) 2025-03-07 09:49:47 -08:00
Renaud Kauffmann
718c4ed8a0
[flang] [NFCI] Using getSource instead of getOriginalDef (#128984)
As discussed in past MRs, this change removes the use of getOriginalDef
to use getSource instead to gather data from an indirection.
2025-03-07 08:47:42 -08:00
Krzysztof Parzyszek
b47dac609b [flang][OpenMP] Remove pessimizing move introduced in 90f45a15ab
This unbreaks the builders that diagnose this situation:

  error: moving a temporary object prevents copy elision [-Werror,-Wpess
imizing-move]
    341 |   static OmpClauseList empty{std::move(decltype(OmpClauseList:
:v){})};
        |                              ^
2025-03-07 08:38:15 -06:00
Tom Eccles
d31a7dde48
Revert " [flang] Rely on global initialization for simpler derived types" (#130278)
Reverts llvm/llvm-project#114002

This causes a regression building cam4_r from spec2017
2025-03-07 13:59:29 +00:00
Krzysztof Parzyszek
90f45a15ab
[flang][OpenMP] Implement OmpDirectiveName, use in OmpDirectiveSpecif… (#130121)
…ication

The `OmpDirectiveName` class has a source in addition to wrapping the
llvm::omp::Directive.
2025-03-07 07:56:40 -06:00
Tom Eccles
f7daa9d302
[mlir][OpenMP] fix crash outlining infinite loop (#129872)
Previously an extra block was created by splitting the previous exit
block. This produced incorrect results when the outlined region
statically never terminated because then there wouldn't be a valid exit
block for the outlined region, this caused this newly added block to
have an incoming edge from outside of the outlining region, which caused
outlining to fail.

So far as I can tell this extra block no longer serves any purpose. The
comment says it is supposed to collate multiple control flow edges into
one place, but the code as it is now does not achieve this. In fact, as
can be seen from the changes to lit tests, this block was not actually
outlined in the end. This is because there are actually two code
extractors: one in the callback for creating a parallel op which is used
to find what the input/output variables are (which does have this block
added to it), and another one which actually does the outlining (which
this block was not added to).

Tested with the gfortran and fujitsu test suites.

Fixes #112884
2025-03-07 11:02:52 +00:00
jeanPerier
40e245a9aa
[flang] add support for procedure pointer assignment inside FORALL (#130114)
Very similar to object pointer assignment, the difference is the SSA
types of the LHS (!fir.ref<!fir.boxproc<()->()>> and RHS
(!fir.boxproc<()->()).

The RHS must be saved as simple address, not descriptors (it is not
possible to make CFI descriptor out of procedure entity).
2025-03-07 10:28:02 +01:00
Kiran Kumar T P
c02019141c
[LLVM-FLANG] [OpenMP] [Taskloop] - Add test case with cancel construct inside taskloop (#129862)
Added a test case with cancel construct inside taskloop. Currently
taskloop lowering is not supported so below error is issued: "not yet
implemented: Taskloop construct"
Once the lowering patch is merged, todo error should be issued for
cancel construct. "not yet implemented: OpenMPCancelConstruct"
2025-03-07 11:15:22 +05:30
Kareem Ergawy
9543e9e927
[flang][OpenMP] Handle pre-detemined lastprivate for simd (#129507)
This PR tries to fix `lastprivate` update issues in composite
constructs. In particular, pre-determined `lastprivate` symbols are
attached to the wrong leaf of the composite construct (the outermost
one). When using delayed privatization (should be the default mode in
the future), this results in trying to update the `lastprivate` symbol
in the wrong construct (outside the `omp.loop_nest` op).

For example, given the following input:
```fortran
!$omp target teams distribute parallel do simd collapse(2) private(y_max)
  do i=x_min,x_max
    do j=y_min,y_max
    enddo
  enddo
```

Without the fixes introduced in this PR, the `DataSharingProcessor`
tries to generate the `lastprivate` update ops in the `parallel` op
since this is the op for which the DSP instance is created.

The fix consists of 2 main parts:
1. Instead of creating a single DSP instance, one instance is created
for the leaf constructs that might need privatization (whether for
explicit, implicit, or pre-determined symbols).
2. When generating the `lastprivate` comparison ops, we don't directly
use the SSA values of the UBs and steps. Instead, we regenerated these
SSA values from the original loop bounds' expressions. We have to do
this to avoid using `host_eval` values in the `lastprivate` comparison
logic which is illegal.
2025-03-07 05:44:39 +01:00
Thirumalai Shaktivel
e15545cad8
[Flang][OpenMP] Allow copyprivate and nowait on the directive clauses (#127769)
Issue:
- Single construct used to throw a semantic error for copyprivate and
  nowait clause when used in the single directive.
- Also, the copyprivate with nowait restriction has been removed from
  OpenMP 6.0

Fix:
- Allow copyprivate and nowait on both single and end single directive
- Allow at most one nowait clause
- Throw a warning when the same list item is used in the copyprivate clause
  on the end single directive

From Reference guide (OpenMP 5.2, 2.10.2):
```
!$omp single [clause[ [,]clause] ... ]
loosely-structured-block
!$omp end single [end-clause[ [,]end-clause] ...]

clause:
  copyprivate (list)
  nowait
  [...]

end-clause:
  copyprivate (list)
  nowait
```

Towards: https://github.com/llvm/llvm-project/issues/110008
2025-03-07 09:24:32 +05:30
Valentin Clement (バレンタイン クレメン)
478e516140
[flang][cuda] Sync double descriptor after c_f_pointer call (#130194)
After a global device pointer is set through `c_f_pointer`, we need to
sync the double descriptor so the version on the device is also up to
date.
2025-03-06 19:19:51 -08:00
Kelvin Li
996092d5a5
[flang] probably convert Fortran logical to i1 in expanding hlfir.maxloc/hlfir.minloc opcodes (#129791)
If mask is a scalar, it always converts to !fir.box<!fir.array<1xi1>>.
The wrong value may be picked up when passing to the function
on the big endian platform. This patch is to do the conversion 
based on the original type of the mask and convert the value to 
i1 after the load.
2025-03-06 15:47:44 -05:00
Valentin Clement (バレンタイン クレメン)
c8898b09f9
[flang][rt] Use allocator registry to allocate the pointer payload (#129992)
pointer allocation is done through `AllocateValidatedPointerPayload`.
This function was not updated to use the registered allocators in the
descriptor to perform the allocation. This patch makes use of the
allocator.
The footer word is not set and not checked for allocator other than the
default one. The support will likely come in a follow up patch but this
will necessitate more functions to be registered to be able to set and
get the footer value when the allocation in on the device.
2025-03-06 08:47:27 -08:00
Kiran Chandramohan
e2911aa2c2
[Flang][OpenMP] Fix crash when loop index var is pointer or allocatable (#129717)
Use hlfir dereferencing for pointers and allocatables and use hlfir
assign. Also, change the code updating IV in lastprivate.

Note: This is a small change. Modifications in existing tests are
changes from fir.store to hlfir.assign.

Fixes #121290
2025-03-06 12:19:34 +00:00
Nikita Popov
979c275097
[IR] Store Triple in Module (NFC) (#129868)
The module currently stores the target triple as a string. This means
that any code that wants to actually use the triple first has to
instantiate a Triple, which is somewhat expensive. The change in #121652
caused a moderate compile-time regression due to this. While it would be
easy enough to work around, I think that architecturally, it makes more
sense to store the parsed Triple in the module, so that it can always be
directly queried.

For this change, I've opted not to add any magic conversions between
std::string and Triple for backwards-compatibilty purses, and instead
write out needed Triple()s or str()s explicitly. This is because I think
a decent number of them should be changed to work on Triple as well, to
avoid unnecessary conversions back and forth.

The only interesting part in this patch is that the default triple is
Triple("") instead of Triple() to preserve existing behavior. The former
defaults to using the ELF object format instead of unknown object
format. We should fix that as well.
2025-03-06 10:27:47 +01:00
Matthias Springer
a6151f4e23
[mlir][IR] Move match and rewrite functions into separate class (#129861)
The vast majority of rewrite / conversion patterns uses a combined
`matchAndRewrite` instead of separate `match` and `rewrite` functions.

This PR optimizes the code base for the most common case where users
implement a combined `matchAndRewrite`. There are no longer any `match`
and `rewrite` functions in `RewritePattern`, `ConversionPattern` and
their derived classes. Instead, there is a `SplitMatchAndRewriteImpl`
class that implements `matchAndRewrite` in terms of `match` and
`rewrite`.

Details:
* The `RewritePattern` and `ConversionPattern` classes are simpler
(fewer functions). Especially the `ConversionPattern` class, which now
has 5 fewer functions. (There were various `rewrite` overloads to
account for 1:1 / 1:N patterns.)
* There is a new class `SplitMatchAndRewriteImpl` that derives from
`RewritePattern` / `OpRewritePatern` / ..., along with a type alias
`RewritePattern::SplitMatchAndRewrite` for convenience.
* Fewer `llvm_unreachable` are needed throughout the code base. Instead,
we can use pure virtual functions. (In cases where users previously had
to implement `rewrite` or `matchAndRewrite`, etc.)
* This PR may also improve the number of [`-Woverload-virtual`
warnings](https://discourse.llvm.org/t/matchandrewrite-hiding-virtual-functions/84933)
that are produced by GCC. (To be confirmed...)

Note for LLVM integration: Patterns with separate `match` / `rewrite`
implementations, must derive from `X::SplitMatchAndRewrite` instead of
`X`.

---------

Co-authored-by: River Riddle <riddleriver@gmail.com>
2025-03-06 08:48:51 +01:00
Valentin Clement (バレンタイン クレメン)
2130285564
[flang][cuda] Make sure allocator id is set for pointer allocate (#129950) 2025-03-05 17:29:09 -08:00
Zhen Wang
d1abbb4dc5
[flang][cuda] Change induction variable from i32 to index for doconcurrent inside cuf kernel directive (#129924)
Use `index` instead of `i32` for induction variables for doconcurrent
inside cuf kernel directive. Regular do loop inside cuf kernel directive
also uses `index`:
```
cuf.kernel<<<*, *>>> (%arg0 : index) = ...
```
2025-03-05 14:50:42 -08:00
Krzysztof Parzyszek
44c6a23789
[flang][OpenMP][AMDGPU] Allow REAL(10) to compile on AMDGPU (#129742)
This will allow the following code to compile
```
program p
  real(10) :: x
  !$omp target
    continue
  !$omp end target
end
```
2025-03-05 14:23:59 -06:00
Mats Petersson
9925359fee
[flang][llvm][openmp]Add Initializer clause to OMP.td (#129540)
Then use this in the Flang compiler for parsing the OpenMP declare
reduction.

This has no real functional change to the existing code, it's only
moving the declaration itself around.

A few tests has been updated, to reflect the new type names.
2025-03-05 15:41:24 +00:00
NimishMishra
0ae1f0a310
[flang] Rely on global initialization for simpler derived types (#114002)
Currently, all derived types are initialized through `_FortranAInitialize`, which is functionally correct, but bears poor runtime performance. This patch falls back on global initialization for "simpler" derived types to speed up the initialization.
2025-03-05 05:44:51 -08:00
jeanPerier
7302e1b94e
[flang] implement simple pointer assignments inside FORALL (#129522)
The semantic of pointer assignments inside FORALL requires evaluating
the targets (RHS) and pointer variables (LHS) of all iterations before
evaluating the assignments.

In practice, if the compiler can prove that the RHS and LHS evaluations
are not impacted by the assignments, the evaluation of the FORALL
assignment statement can be done in a single loop. However, if the
compiler cannot prove this, it needs to "save" the addresses of the
targets and/or the pointer descriptors of each iterations before doing
the assignments.

This patch implements the most common cases where there is no lower bound
spec, no bounds remapping, the LHS is not polymorphic, and the RHS is
not NULL.

The HLFIR operation used to represent assignments inside FORALL can be
used for pointer assignments to (the only difference being that the LHS
is a descriptor address).

The analysis for intrinsic assignment can be reused, with the
distinction that the RHS data is not read during the assignment.

The logic that is used to save LHS in intrinsic assignments inside
FORALL is extracted to be used for the RHS of pointer assignments when
needed (saving a descriptor value).
Pointer assignment LHS are just descriptor addresses and are saved as
int_ptr values.
2025-03-05 11:24:04 +01:00