9983 Commits

Author SHA1 Message Date
Krzysztof Parzyszek
cd26dd5595
[flang][OpenMP] Use OmpDirectiveSpecification in simple directives (#131162)
The `OmpDirectiveSpecification` contains directive name, the list of
arguments, and the list of clauses. It was introduced to store the
directive specification in METADIRECTIVE, and could be reused everywhere
a directive representation is needed.
In the long term this would unify the handling of common directive
properties, as well as creating actual constructs from METADIRECTIVE by
linking the contained directive specification with any associated user
code.
2025-03-19 11:34:40 -05:00
jeanPerier
cd0a2a3f1b
[flang] add QSORT extension intrinsic to the runtime (#132033)
Add support for legacy Fortran intrinsic QSORT from lib 3f.
This is a thin Fortran wrapper over libc qsort.
2025-03-19 16:14:37 +01:00
Valentin Clement (バレンタイン クレメン)
20feca47c1
[flang][cuda] Allow ieee_arithmetic on the device (#131930)
- Allow ieee_arithmetic on the device
- Add ignore_tkr(d) to ieee_is_finite
2025-03-19 07:20:06 -07:00
Kiran Chandramohan
96b112fb61
Revert "[flang][openmp] Adds Parser and Semantic Support for Interop Construct, and Init and Use Clauses." (#132005)
Reverts llvm/llvm-project#120584

Reverting due to CI failure
https://lab.llvm.org/buildbot/#/builders/157/builds/22946
2025-03-19 11:13:52 +00:00
swatheesh-mcw
ee8a759bfb
[flang][openmp] Adds Parser and Semantic Support for Interop Construct, and Init and Use Clauses. (#120584)
Adds Parser and Semantic Support for the below construct and clauses:
- Interop Construct
- Init Clause
- Use Clause

Note:
The other clauses supported by Interop Construct such as Destroy, Use,
Depend and Device are added already.
2025-03-19 10:49:17 +00:00
Tom Eccles
e7c6e3557b
[flang][OpenMP] Fix threadprivate pointer variable in common block (#131888)
Fixes #112538

The problem was that the host associated symbol for the threadprivate
variable doesn't have all of the symbol attributes (e.g. POINTER). This
caused the lowering code to generate the wrong type, eventually hitting
an assertion.
2025-03-19 10:12:52 +00:00
jeanPerier
b8271ec8b3
[flang] accept character type in fir::changeTypeShape (#131892)
There is no reason for character element type to be forbidden in this
helper.
The assert was firing in character pointer assignment in FORALL after
#130772 added a usage of this helper.
2025-03-19 10:01:24 +01:00
Slava Zakharin
fd0e20a64b
[flang] Generate fir.pack/unpack_array in Lowering. (#131704)
Basic generation of array repacking operations in Lowering.
2025-03-18 21:26:33 -07:00
Slava Zakharin
9ed772cecc
[flang] Fixed computation of position of function's arg in AddDebugInfo. (#131672)
I am working on `-frepack-array` feature (#127147), which produces
non-trivial manipulations with arguments of `fir.declare`.
In this case, we end up with CFG computation of the `fir.declare`
argument, and AddDebugInfo pass incorrectly mapped two dummy arguments
to the same arg index in the debug attributes.
This patch makes sure that we assign the arg index only if we can prove
that we've traced the block argument to the function's entry block.
I believe this problem is not specific to `-frepack-arrays`, e.g.
it may appear due to MLIR inlining as well.
2025-03-18 16:46:59 -07:00
Valentin Clement (バレンタイン クレメン)
b7ed5c8e06
[flang][cuda] Check for ignore_tkr(d) when resolving generic call (#131923) 2025-03-18 15:39:04 -07:00
Slava Zakharin
7d7b58bc5d
[flang-rt] Added ShallowCopy API. (#131702)
This API will be used for copying non-contiguous arrays
into contiguous temporaries to support `-frepack-arrays`.
The builder factory API will be used in the following commits.
2025-03-18 12:58:25 -07:00
Kelvin Li
6c7c660afe
[flang] Use C-style casts to silence message (NFC) (#131796) 2025-03-18 13:02:18 -04:00
Slava Zakharin
e0bcf3aa0b
[flang] Allow no type parameters for fir.pack_array. (#131662)
Arrays with assumed-length types are represented with a box
without explicit length parameters. This patch fixes the verification
to allow it for `fir.pack_array`.
2025-03-18 07:59:04 -07:00
Akash Banerjee
cbc5c11fec
[MLIR][OpenMP] Add Lowering support for implicitly linking to default declare mappers (#131006) 2025-03-18 13:17:10 +00:00
Kareem Ergawy
83658ddb1b
[flang][OpenMP] Enable delayed privatization by default for omp.distribute (#131574)
Switches delayed privatization for `omp.distribute` to be on by default:
controlled by the `-openmp-enable-delayed-privatization` instead of by
`-openmp-enable-delayed-privatization-staging`.

### GFortran & Fujitsu test suite results:

#### gfotran test-suite (this PR):
```
Testing Time: 34.51s
  Passed: 6569
```

#### Fujitsu without changes (commit: 0813c5cf5f52):
```
Testing Time: 155.39s
  Passed            : 88325
  Failed            :   156
  Executable Missing:   408
```

#### Fujitsu with changes (this PR):
```
Testing Time: 158.54s
  Passed            : 88325
  Failed            :   156
  Executable Missing:   408
```
2025-03-18 14:07:41 +01:00
Kareem Ergawy
1094ffcafb
[flang][fir] Add MLIR op for do concurrent (#130893)
Adds new MLIR ops to model `do concurrent`. In order to make `do
concurrent` representation self-contained, a loop is modeled using 2
ops, one wrapper and one that contains the actual body of the loop. For
example, a 2D `do concurrent` loop is modeled as follows:

```mlir
  fir.do_concurrent {
    %i = fir.alloca i32
    %j = fir.alloca i32
    fir.do_concurrent.loop
      (%i_iv, %j_iv) = (%i_lb, %j_lb) to (%i_ub, %j_ub) step (%i_st, %j_st) {
      %0 = fir.convert %i_iv : (index) -> i32
      fir.store %0 to %i : !fir.ref<i32>

      %1 = fir.convert %j_iv : (index) -> i32
      fir.store %1 to %j : !fir.ref<i32>
    }
  }
```

The `fir.do_concurrent` wrapper op encapsulates both the actual loop and
the allocations required for the iteration variables. The
`fir.do_concurrent.loop` op is a multi-dimensional op that contains the
loop control and body. See the ops' docs for more info.
2025-03-18 10:53:44 +01:00
Valentin Clement (バレンタイン クレメン)
e5ec7bb21b
[flang][cuda] Set correct offsets for multiple variables in dynamic shared memory (#131674) 2025-03-17 17:13:06 -07:00
Valentin Clement (バレンタイン クレメン)
74d4fc0a3e
[flang][cuda][NFC] Use ssa value for offset in shared memory op (#131661)
Switch from attribute to a value as we need to support dynamic offset
when multiple variables are used with dynamic shared memory.
2025-03-17 14:23:34 -07:00
Kiran Chandramohan
93e0df07c2
[Flang][OpenMP] Allow zero trait score (#131473) 2025-03-17 09:49:08 +00:00
sharang.12492
7eb8b73178
[Flang][OpenMP][taskloop] Adding missing semantic checks in Taskloop (#128431)
Below semantic checks for Taskloop clause mentioned in OpenMP [5.2]
specification were missing, this patch contains the semantic checks,
corresponding error messages and test cases:
OpenMP standard [5.2]:
[12.6] Taskloop Construct
[Restrictions]
Restrictions to the taskloop construct are as follows: 
• The reduction-modifier must be default.
• The conditional lastprivate-modifier must not be specified.

Authored-by: shkaushi <sharang.kaushik@amd.com>
2025-03-17 12:35:37 +05:30
Valentin Clement (バレンタイン クレメン)
4fde8c341f
[flang][cuda] Lower CUDA shared variable with cuf.shared_memory op (#131399)
Use `cuf.shared_memory` operation instead of `cuf.alloc` for CUDA shared
variable. These variables do not need free operations.
2025-03-16 17:44:56 -07:00
Valentin Clement (バレンタイン クレメン)
e86081b6c2
[flang][cuda] Convert cuf.shared_memory operation to LLVM ops (#131396)
Convert the operation to `llvm.addressof` operation with
`llvm.getelementptr` with the appropriate offset.
2025-03-14 19:34:55 -07:00
Valentin Clement (バレンタイン クレメン)
4fb20b85fd
[flang][cuda] Compute offset on cuf.shared_memory ops (#131395)
Add a pass to compute the size of the shared memory (static shared
memory) and the offsets of each variables to be placed in shared memory.
The global representing the shared memory is also created during this
pass.

In case of dynamic shared memory, the global as a type of
`!fir.array<0xi8>` and the size of the memory is set at kernel launch.
2025-03-14 19:34:35 -07:00
Valentin Clement (バレンタイン クレメン)
4818623924
[flang][cuda] Add cuf.shared_memory operation (#131392)
Introduce `cuf.shared_memory` operation. The operation is used to get
the pointer in shared memory for a specific variable. The shared memory
is materialized as a global in address space 3 and the different
variables are pointing to it at different offset.

Follow up patches will add lowering and conversion of this operation.
2025-03-14 15:43:25 -07:00
Valentin Clement (バレンタイン クレメン)
a862b6deae
[flang][cuda] Lower shared global to the correct NVVM address space (#131368)
Global with the CUDA shared data attribute needs to be lowered to llvm
globals with the correct address space (3). Address space is set from
the `mlir::NVVM::NVVMMemorySpace::kSharedMemorySpace` enum from
`mlir/Dialect/LLVMIR/NVVMDialect.h`
2025-03-14 15:28:32 -07:00
Shilei Tian
dccc0a836c
[NFC][AMDGPU] Replace more direct arch comparison with isAMDGCN() (#131379)
This is an extension of #131357. Hopefully this would be the last one.
2025-03-14 17:02:15 -04:00
Slava Zakharin
00f9c855fb
[flang] Added fir.is_contiguous_box and fir.box_total_elements ops. (#131047)
These are helper operations to aid with expanding of fir.pack_array.
2025-03-14 08:25:05 -07:00
jeanPerier
3ff3b29dd6
[flang] lower remaining cases of pointer assignments inside forall (#130772)
Implement handling of `NULL()` RHS, polymorphic pointers, as well as
lower bounds or bounds remapping in pointer assignment inside FORALL.

These cases eventually do not require updating hlfir.region_assign,
lowering can simply prepare the new descriptor for the LHS inside the
RHS region.

Looking more closely at the polymorphic cases, there is not need to call
the runtime, fir.rebox and fir.embox do handle the dynamic type setting
correctly.

After this patch, the last remaining TODO is the allocatable assignment
inside FORALL, which like some cases here, is more likely an accidental
feature given FORALL was deprecated in F2003 at the same time than
allocatable components where added.
2025-03-14 10:51:46 +01:00
Michael Kruse
bddf24ddbd
[Flang] Add omp_lib dependency to check-flang (#130975)
With `LLVM_ENABLE_RUNTIMES=openmp`, flang enables the OpenMP regression
tests, but `check-flang` was not ensuring that the OpenMP requirements
are built first. Fix by adding a `libomp-mod` to `flang-test-depends`.

Adding `libomp-mod` to extra_targets is necessary because there is no
target from openmp/ that is reachable from the parent
bootstrapping-build. `ninja openmp` fails because openmp/ has no
`openmp` target. `check-openmp` would also run the OpenMP tests and does
not even build `omp_lib.mod`. `runtimes` would build all the runtimes,
not just OpenMP.

Also fix the misleading CMake configure status messages that suggest the
only way to build omp_lib.mod/.h is `LLVM_ENABLE_PROJECTS=openmp`.
2025-03-14 09:24:28 +01:00
Valentin Clement (バレンタイン クレメン)
57d87ed7f0
[flang][NFC] Add parenthesis to avoid warning (#131219)
Remove warning introduced in 369da8421c2f7
2025-03-13 14:28:35 -07:00
Kelvin Li
c2b66ce655
[flang][OpenMP] Silence unused-but-set-variable message (NFC) (#130979) 2025-03-13 14:09:47 -04:00
Valentin Clement (バレンタイン クレメン)
369da8421c
[flang][cuda] Allow assumed-size declaration for SHARED variable (#130833)
Avoid triggering an assertion for shared variable using the assumed-size
syntax.

```
attributes(global) subroutine sharedstar()
  real, shared :: s(*) ! ok. dynamic shared memory.
end subroutine
```
2025-03-13 11:06:17 -07:00
Michael Kruse
d06937aea3
[Flang][NFC] Fix typo (#130960)
This was mainly a test of the pre-merge CI, but merging it since it fixes an actual typo.
2025-03-13 17:55:07 +01:00
Tom Eccles
01aca42363
[flang] Add support for -f[no-]verbose-asm (#130788)
This flag provides extra commentary in the assembly output.
2025-03-13 15:22:13 +00:00
Kareem Ergawy
b003face11
[flang][OpenMP] Add OutlineableOpenMPOpInterface to omp.teams (#131109)
Given the following input:
```fortran
program rep_loopbind
  implicit none
  integer :: i
  real :: priv_val

  !$omp teams private(priv_val)
    !$omp distribute
    do i=1,1000
    end do
  !$omp end teams
end program
```
the `AllocaOpConversion` pattern in `FIRToLLVMLowering` would **move**
the private allocations that belong to the `teams` directive (i.e. the
allocations needed for the private copies of `priv_val` and the loop's
iteration variable) from the the `omp.teams` op to the outside scope.

This is not correct since these allocations should be eventually emitted
inside the outlined region for the `teams` directive. Without this fix,
these allocation would be emitted in the parent function (or the parent
scope whatever it is).
2025-03-13 16:03:19 +01:00
Michael Klemm
28ffa7f6a4
[flang][OpenMP] Fix missing missing inode issue (#130798)
When outlining an offload region, Flang creates a unique name by
querying an inode ID. However, when the name of the actual source file
does not match the logical file in a `#line` preprocessor directive,
code-gen was failing as it could not determine the inode ID. This PR
checks for this condition and if the logical file name does not exist,
the inode is replaced with a hash value created from the source code
itself.
2025-03-13 15:50:37 +01:00
Kajetan Puchalski
0c5eb4d68b
[flang] Use precompiled parsing headers (#130600)
Most of the high memory usage and compilation time in the frontend units
is due to including large parsing headers.
This commit moves out several of the largest parsing headers into a new
precompiled header linked to flangFrontend.
The new compilation metrics for FrontendActions.cpp are as follows:

User time (seconds): 38.40
System time (seconds): 2.00
Maximum resident set size (kbytes): 2710964 (2.58 GB) (vs 3.78 GB)

ParserActions.cpp:

User time (seconds): 69.37
System time (seconds): 1.81
Maximum resident set size (kbytes): 2599456 (2.47 GB) (vs 4 GB)

Alongside the new precompiled header compilation unit

User time (seconds): 41.61
System time (seconds): 2.72
Maximum resident set size (kbytes): 3107644 (2.96 GB)

---------

Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
2025-03-13 10:28:31 +00:00
Mats Petersson
d0188ebcc2
[flang][OpenMP]Add symbls omp_in, omp_out and omp_priv in DECLARE RED… (#129908)
…UCTION

This patch allows better parsing of the reduction and initializer
components, including supporting derived types in both those places.

There is more work needed here, but this is a definite improvement in
what can be handled through parser and semantics.

Note that declare reduction is still not supported in lowering, so any
attempt to compile DECLARE REDUCTION code will end with a TODO aka "Not
yet implemented" abort in the compiler.

Note that this version of the code does not cover declaring multiple
reductions using the same name with different types. This is will be
fixed in a future patch. [This was also the case before this change].

One existing test modified to actually compile (as it didn't in the
original form).
2025-03-13 09:39:45 +00:00
Krzysztof Parzyszek
f4fc2d731c
[flang][OpenMP] Map ByRef if size/alignment exceed that of a pointer (#130832)
Improve the check for whether a type can be passed by copy. Currently,
passing by copy is done via the OMP_MAP_LITERAL mapping, which can only
transfer as much data as can be contained in a pointer representation.
2025-03-12 19:41:11 -05:00
Slava Zakharin
c542991703
[flang-rt] Fixed HAVE_LDBL_MANT_DIG_113 detection. (#131010)
I thought I guessed a fix in #130836, but I was wrong.
We actually had the same code in
`flang/cmake/modules/FlangCommon.cmake`.
The check does not pass in flang-rt bootstrap build, because
`-nostdinc++` is added for all `runtimes` checks.
I decided to make the check with the C header, though, I am still
unsure whether it is reliable with a clang that has not been
installed (it is taken from the build structure during flang-rt
configure step).
I verified that this PR enables REAL(16) math entries on aarch64.
2025-03-12 16:50:01 -07:00
Nikita Popov
1a626e63b5 [flang] Fix deprecation warning
Adjust for #130940.
2025-03-12 18:00:06 +01:00
Nikita Popov
f137c3d592
[TargetRegistry] Accept Triple in createTargetMachine() (NFC) (#130940)
This avoids doing a Triple -> std::string -> Triple round trip in lots
of places, now that the Module stores a Triple.
2025-03-12 17:35:09 +01:00
Iñaki Amatria Barral
bdbe8fa1f3
[flang] Align -x language modes with gfortran (#130268)
This PR addresses some of the issues described in
https://github.com/llvm/llvm-project/issues/127617. Key changes:

- Stop assuming fixed-form for `-x f95` unless the input is a `.i` file.
This change ensures compatibility with `-save-temps` workflows while
preventing unintended fixed-form assumptions.
- Ensure `-x f95-cpp-input` enables `-cpp` by default, aligning Flang's
behavior with `gfortran`.
2025-03-12 16:45:33 +01:00
Asher Mancinelli
982527eef0
[flang] Use saturated intrinsics for floating point to integer conversions (#130686)
The saturated floating point conversion intrinsics match the semantics in the standard more closely than the fptosi/fptoui instructions.

Case 2 of 16.9.100 is

> INT (A [, KIND])
> If A is of type real, there are two cases: if |A| < 1, INT (A) has the
value 0; if |A| ≥ 1, INT (A) is the integer whose magnitude is the
largest integer that does not exceed the magnitude of A and whose sign
is the same as the sign of A.

Currently, converting a floating point value into an integer type too
small to hold the constant will be converted to poison in opt, leaving
us with garbage:

```
> cat t.f90
program main
  real(kind=16)   :: f
  integer(kind=4) :: i
  f=huge(f)
  i=f
  print *, i
end program main

# current upstream
> for i in `seq 10`; do; ./a.out; done
 -862156992
 -1497393344
 -739096768
 -1649494208
 1761228608
 -1959270592
 -746244288
 -1629194432
 -231217344
 382322496
```

With the saturated fptoui/fptosi intrinsics, we get the appropriate
values

```
# mine
> flang -O2 ./t.f90 && ./a.out
 2147483647

> perl -e 'printf "%d\n", (2 ** 31) - 1'
2147483647
```

One notable difference: NaNs being converted to ints will become zero, unlike current flang (and some other compilers). Newer versions of GCC have this behavior.
2025-03-12 08:14:46 -07:00
Michael Kruse
cbeae3e117 [Flang] Fix libquadmath in non-LLVM_ENABLE_RUNTIMES build.
The LLVM_ENABLE_RUNTIMES build introduced a new configure-time header
quadmath_wrapper.h. Also create the header in non-LLVM_ENABLE_RUNTIMES
builds.
2025-03-12 14:55:09 +01:00
Sergio Afonso
cf68c9378b
[Flang][OpenMP] Move declare mapper sym creation outside loop, NFC (#130794)
This patch simplifies the definition of
`ClauseProcessor::processMapObjects` by hoisting the creation of the
MLIR symbol associated to an existing `omp.declare_mapper` operation
outside of the loop processing all mapped objects.

That change removes some inter-iteration dependencies that made the
implementation more difficult to follow.
2025-03-12 11:54:29 +00:00
Tom Eccles
c851ee38ad
[flang][OpenMP] catch namelist access through equivalence (#130804)
The standard prohibits privatising namelist variables. We also decided
in #110671 to prohibit reductions of namelist variables.

This commit prevents this rule from being circumvented through the use
of equivalence statements.

Fixes #122824
2025-03-12 11:45:15 +00:00
jeanPerier
15e335f04f
[flang] also set llvm ABI argument attributes on direct calls (#130736)
So far, flang was not setting argument attributes on direct calls
assuming that putting them on the function operation was enough.

It was clarified in
38565da525
that they must be set on both call and functions, even for direct calls.

Crashes have been observed because of the lack of the attribute when
compiling `abs(x)` at `O2` and above on X86-64 for complex(16).
2025-03-12 09:55:05 +01:00
Slava Zakharin
74eba972ca
[flang] Definitions of fir.pack/unpack_array operations. (#130698)
As defined in #127147.
2025-03-11 14:15:29 -07:00
jeanPerier
356bf3fa2d
Reland " [flang] Rely on global initialization for simpler derived types" (#130290)
Currently, all derived types are initialized through `_FortranAInitialize`, which is functionally correct, but bears poor runtime performance. This patch falls back on global initialization for "simpler" derived types to speed up the initialization.

Note: this relands #114002 with the fix for the LLVM timeout regressions that have been seen. The fix is to use the added fir.copy to avoid aggregate load/store.

Co-authored-by: NimishMishra <42909663+NimishMishra@users.noreply.github.com>
2025-03-11 15:19:43 +01:00