657 Commits

Author SHA1 Message Date
Valentin Clement (バレンタイン クレメン)
49f8ccd1eb
[flang][cuda] Pass stream information to kernel launch functions (#135246) 2025-04-10 13:50:50 -07:00
Valentin Clement (バレンタイン クレメン)
ca53463137
[flang][cuda] Propagate stream information to gpu.launch_func op (#135227)
Use the information from `cuf.kernel_launch` to `gpu.launch_func`
2025-04-10 11:58:18 -07:00
Asher Mancinelli
8f23d4296c
Reland "[flang][nfc] Support volatility in Fir ops" (#135039)
#134858 had an extraneous include which caused the shared library builds
to break.
2025-04-09 12:45:55 -07:00
David Spickett
fb73086dd2
Revert "[flang][nfc] Support volatility in Fir ops" (#135034)
Reverts llvm/llvm-project#134858

Fails to build when shared libraries are enabled:
https://lab.llvm.org/buildbot/#/builders/80/builds/12361
```
: && /usr/local/bin/c++ -fPIC -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Wno-deprecated-copy -Wno-string-conversion -Wno-ctad-maybe-unsupported -Wno-unused-command-line-argument -Wstring-conversion           -Wcovered-switch-default -Wno-nested-anon-types -O3 -DNDEBUG  -Wl,-z,defs -Wl,-z,nodelete   -Wl,-rpath-link,/home/tcwg-buildbot/worker/flang-aarch64-sharedlibs/build/./lib  -Wl,--gc-sections -shared -Wl,-soname,libFIRDialect.so.21.0git -o lib/libFIRDialect.so.21.0git tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FIRAttr.cpp.o tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FIRDialect.cpp.o tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FIROps.cpp.o tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FIRType.cpp.o tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FirAliasTagOpInterface.cpp.o tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FortranVariableInterface.cpp.o tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/Inliner.cpp.o  -Wl,-rpath,"\$ORIGIN/../lib:/home/tcwg-buildbot/worker/flang-aarch64-sharedlibs/build/lib:"  lib/libCUFAttrs.so.21.0git  lib/libFIRDialectSupport.so.21.0git  lib/libLLVMAsmPrinter.so.21.0git  lib/libMLIRBuiltinToLLVMIRTranslation.so.21.0git  lib/libMLIROpenMPToLLVM.so.21.0git  lib/libMLIRLLVMToLLVMIRTranslation.so.21.0git  lib/libMLIRFuncToLLVM.so.21.0git  lib/libMLIRArithToLLVM.so.21.0git  lib/libMLIRArithAttrToLLVMConversion.so.21.0git  lib/libMLIRArithTransforms.so.21.0git  lib/libMLIRBufferizationTransforms.so.21.0git  lib/libMLIRBufferizationDialect.so.21.0git  lib/libMLIRSparseTensorDialect.so.21.0git  lib/libMLIRSCFDialect.so.21.0git  lib/libMLIRFuncTransforms.so.21.0git  lib/libMLIRShardingInterface.so.21.0git  lib/libMLIRMeshDialect.so.21.0git  lib/libMLIRVectorDialect.so.21.0git  lib/libMLIRTensorDialect.so.21.0git  lib/libMLIRParallelCombiningOpInterface.so.21.0git  lib/libMLIRMaskableOpInterface.so.21.0git  lib/libMLIRMaskingOpInterface.so.21.0git  lib/libMLIRVectorInterfaces.so.21.0git  lib/libMLIRControlFlowToLLVM.so.21.0git  lib/libMLIRControlFlowDialect.so.21.0git  lib/libMLIRMemRefToLLVM.so.21.0git  lib/libMLIRLLVMCommonConversion.so.21.0git  lib/libMLIRMemRefUtils.so.21.0git  lib/libMLIRAffineDialect.so.21.0git  lib/libMLIRMemRefDialect.so.21.0git  lib/libMLIRArithUtils.so.21.0git  lib/libMLIRComplexDialect.so.21.0git  lib/libMLIRArithDialect.so.21.0git  lib/libMLIRCastInterfaces.so.21.0git  lib/libMLIRInferIntRangeCommon.so.21.0git  lib/libMLIRShapedOpInterfaces.so.21.0git  lib/libMLIRDialect.so.21.0git  lib/libMLIRDialectUtils.so.21.0git  lib/libMLIROpenMPDialect.so.21.0git  lib/libMLIROpenACCMPCommon.so.21.0git  lib/libMLIRTargetLLVMIRExport.so.21.0git  lib/libMLIRDLTIDialect.so.21.0git  lib/libMLIRLLVMIRTransforms.so.21.0git  lib/libMLIRTransforms.so.21.0git  lib/libMLIRUBDialect.so.21.0git  lib/libMLIRRuntimeVerifiableOpInterface.so.21.0git  lib/libMLIRFuncDialect.so.21.0git  lib/libMLIRNVVMDialect.so.21.0git  lib/libMLIRTranslateLib.so.21.0git  lib/libMLIRParser.so.21.0git  lib/libMLIRBytecodeReader.so.21.0git  lib/libMLIRAsmParser.so.21.0git  lib/libMLIRTransformUtils.so.21.0git  lib/libMLIRSubsetOpInterface.so.21.0git  lib/libMLIRValueBoundsOpInterface.so.21.0git  lib/libMLIRDestinationStyleOpInterface.so.21.0git  lib/libMLIRRewrite.so.21.0git  lib/libMLIRRewritePDL.so.21.0git  lib/libMLIRPDLToPDLInterp.so.21.0git  lib/libMLIRPass.so.21.0git  lib/libMLIRAnalysis.so.21.0git  lib/libMLIRInferIntRangeInterface.so.21.0git  lib/libMLIRLoopLikeInterface.so.21.0git  lib/libMLIRPresburger.so.21.0git  lib/libMLIRViewLikeInterface.so.21.0git  lib/libMLIRPDLInterpDialect.so.21.0git  lib/libMLIRPDLDialect.so.21.0git  lib/libLLVMFrontendOpenMP.so.21.0git  lib/libLLVMTransformUtils.so.21.0git  lib/libMLIRLLVMDialect.so.21.0git  lib/libMLIRInferTypeOpInterface.so.21.0git  lib/libMLIRControlFlowInterfaces.so.21.0git  lib/libMLIRDataLayoutInterfaces.so.21.0git  lib/libMLIRFunctionInterfaces.so.21.0git  lib/libMLIRCallInterfaces.so.21.0git  lib/libMLIRMemorySlotInterfaces.so.21.0git  lib/libMLIRSideEffectInterfaces.so.21.0git  lib/libMLIRIR.so.21.0git  lib/libLLVMBitWriter.so.21.0git  lib/libLLVMAnalysis.so.21.0git  lib/libLLVMAsmParser.so.21.0git  lib/libLLVMBitReader.so.21.0git  lib/libMLIRSupport.so.21.0git  lib/libLLVMCore.so.21.0git  lib/libLLVMRemarks.so.21.0git  lib/libLLVMBinaryFormat.so.21.0git  lib/libLLVMTargetParser.so.21.0git  lib/libLLVMSupport.so.21.0git  -Wl,-rpath-link,/home/tcwg-buildbot/worker/flang-aarch64-sharedlibs/build/lib && :
/usr/bin/ld: tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FIROps.cpp.o: in function `fir::CharBoxValue::dump() const':
FIROps.cpp:(.text._ZNK3fir12CharBoxValue4dumpEv[_ZNK3fir12CharBoxValue4dumpEv]+0x20): undefined reference to `fir::operator<<(llvm::raw_ostream&, fir::CharBoxValue const&)'
/usr/bin/ld: tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FIROps.cpp.o: in function `fir::PolymorphicValue::dump() const':
FIROps.cpp:(.text._ZNK3fir16PolymorphicValue4dumpEv[_ZNK3fir16PolymorphicValue4dumpEv]+0x20): undefined reference to `fir::operator<<(llvm::raw_ostream&, fir::PolymorphicValue const&)'
/usr/bin/ld: tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FIROps.cpp.o: in function `fir::ArrayBoxValue::dump() const':
FIROps.cpp:(.text._ZNK3fir13ArrayBoxValue4dumpEv[_ZNK3fir13ArrayBoxValue4dumpEv]+0x20): undefined reference to `fir::operator<<(llvm::raw_ostream&, fir::ArrayBoxValue const&)'
/usr/bin/ld: tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FIROps.cpp.o: in function `fir::CharArrayBoxValue::dump() const':
FIROps.cpp:(.text._ZNK3fir17CharArrayBoxValue4dumpEv[_ZNK3fir17CharArrayBoxValue4dumpEv]+0x20): undefined reference to `fir::operator<<(llvm::raw_ostream&, fir::CharArrayBoxValue const&)'
/usr/bin/ld: tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FIROps.cpp.o: in function `fir::ProcBoxValue::dump() const':
FIROps.cpp:(.text._ZNK3fir12ProcBoxValue4dumpEv[_ZNK3fir12ProcBoxValue4dumpEv]+0x20): undefined reference to `fir::operator<<(llvm::raw_ostream&, fir::ProcBoxValue const&)'
/usr/bin/ld: tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FIROps.cpp.o: in function `fir::BoxValue::dump() const':
FIROps.cpp:(.text._ZNK3fir8BoxValue4dumpEv[_ZNK3fir8BoxValue4dumpEv]+0x20): undefined reference to `fir::operator<<(llvm::raw_ostream&, fir::BoxValue const&)'
/usr/bin/ld: tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FIROps.cpp.o: in function `fir::MutableBoxValue::dump() const':
FIROps.cpp:(.text._ZNK3fir15MutableBoxValue4dumpEv[_ZNK3fir15MutableBoxValue4dumpEv]+0x20): undefined reference to `fir::operator<<(llvm::raw_ostream&, fir::MutableBoxValue const&)'
/usr/bin/ld: tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FIROps.cpp.o: in function `fir::ExtendedValue::dump() const':
FIROps.cpp:(.text._ZNK3fir13ExtendedValue4dumpEv[_ZNK3fir13ExtendedValue4dumpEv]+0x18): undefined reference to `fir::operator<<(llvm::raw_ostream&, fir::ExtendedValue const&)'
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
```
2025-04-09 15:41:45 +01:00
Asher Mancinelli
e42f860985
[flang][nfc] Support volatility in Fir ops (#134858)
Part two of merging #132486. Support volatility in fir ops.

* Introduce a new operation fir.volatile_cast, whose only purpose is to
add or take away the volatility of an SSA value's type. The types must
be otherwise identical, and any other type conversions must be handled
by fir.convert. fir.convert will give an error if the volatility of the
inputs does not match, such that all changes to volatility must be
handled explicitly through fir.volatile_cast.
* Add memory effects to ops that read from or write to memory. The
precedent for this comes from the LLVM dialect (feb7beaf70) where
llvm.load/store ops with the volatile attribute report read/write
effects to a generic memory resource. This change is similar in spirit
but different in two ways: the volatility of an operation is determined
by the type of its memref, not an attribute on the op, and the memory
effects of a load- or store-like operation on a volatile reference type
are reported against a particular memory resource,
`VolatileMemoryResource`. This is so MLIR optimizations are able to
reorder operations that are not volatile around operations that are,
which we believe more precisely models LLVM's volatile memory semantics.
@vzakhari suggested this in #132486 citing LangRef. See
https://llvm.org/docs/LangRef.html#volatile-memory-accesses

Changes needed to generate IR with volatile types are not included in
this change, so it should be non-functional, containing only the changes
to Fir ops and op utilities that will be needed once we enable lowering
to generate volatile types.
2025-04-09 05:55:24 -07:00
Asher Mancinelli
b2711e1526
[flang][nfc] Support volatile on ref, box, and class types (#134386)
Part one of merging #132486. Add support for representing volatility in
the type system for reference, box, and class types. Don't do anything
with volatile just yet, only support and test their representation and
utility functions.

The naming convention is a little goofy - `fir::isa_volatile_type` and
`fir::updateTypeWithVolatility` use different capitalization, but I put
them near similar functions and tried to match the surrounding
conventions and [the
docs](https://github.com/llvm/llvm-project/blob/main/flang/docs/C%2B%2Bstyle.md#naming)
best I could.
2025-04-07 06:51:02 -07:00
Valentin Clement (バレンタイン クレメン)
18ff8df958
[flang][cuda] Register managed variables with double descriptor (#134444)
Allocatable or pointer module variables with the CUDA managed attribute
are defined with a double descriptor. One on the host and one on the
device. Only the data pointed to by the descriptor will be allocated in
managed memory.
Allow the registration of any allocatable or pointer module variables
like device or constant.
2025-04-04 14:38:01 -07:00
Slava Zakharin
5f268d04f9
[flang] Code generation for fir.pack/unpack_array. (#132080)
The code generation relies on `ShallowCopyDirect` runtime
to copy data between the original and the temporary arrays
(both directions). The allocations are done by the compiler
generated code. The heap allocations could have been passed
to `ShallowCopy` runtime, but I decided to expose the allocations
so that the temporary descriptor passed to `ShallowCopyDirect`
has `nocapture` - maybe this will be better for LLVM optimizations.
2025-03-31 11:42:17 -07:00
Bruno Cardoso Lopes
7c3ecffe9b
[MLIR][LLVMIR] Add support for the full form of global_{ctor,dtor} (#133176)
Currently only ctor/dtor list and their priorities are supported. This
PR adds support for the missing data field.

Few implementation notes:
- The assembly printer has a fixed form because previous `attr_dict`
will sort the dict by key name, making global_dtor and global_ctor
differ in the order of printed arguments.
- LLVM's `ptr null` is being converted to `#llvm.zero` otherwise we'd
have to create a region to use the default operation conversion from
`ptr null`, which is silly given that the field only support null or a
symbol.
2025-03-27 14:11:05 -07:00
Valentin Clement (バレンタイン クレメン)
e6dda9c23a
[flang][cuda] Only create shared memory global when needed (#132999) 2025-03-26 09:26:50 -07:00
Valentin Clement (バレンタイン クレメン)
5be9082fed
[flang][cuda] Carry over the dynamic shared memory size to gpu.launch_func (#132837) 2025-03-24 18:37:19 -07:00
Sergio Afonso
b231f6f862
[MLIR][OpenMP] Improve omp.map.info verification (#132066)
This patch makes the `map_type` and `map_capture_type` arguments of the
`omp.map.info` operation required, which was already an invariant being
verified by its users via `verifyMapClause()`. This makes it clearer, as
getters no longer return misleading `std::optional` values.

Checks for the `mapper_id` argument are moved to a verifier for the
operation, rather than being checked by users.

Functionally NFC, but not marked as such due to a reordering of
arguments in the assembly format of `omp.map.info`.
2025-03-20 15:48:45 +00:00
Slava Zakharin
e0bcf3aa0b
[flang] Allow no type parameters for fir.pack_array. (#131662)
Arrays with assumed-length types are represented with a box
without explicit length parameters. This patch fixes the verification
to allow it for `fir.pack_array`.
2025-03-18 07:59:04 -07:00
Kareem Ergawy
1094ffcafb
[flang][fir] Add MLIR op for do concurrent (#130893)
Adds new MLIR ops to model `do concurrent`. In order to make `do
concurrent` representation self-contained, a loop is modeled using 2
ops, one wrapper and one that contains the actual body of the loop. For
example, a 2D `do concurrent` loop is modeled as follows:

```mlir
  fir.do_concurrent {
    %i = fir.alloca i32
    %j = fir.alloca i32
    fir.do_concurrent.loop
      (%i_iv, %j_iv) = (%i_lb, %j_lb) to (%i_ub, %j_ub) step (%i_st, %j_st) {
      %0 = fir.convert %i_iv : (index) -> i32
      fir.store %0 to %i : !fir.ref<i32>

      %1 = fir.convert %j_iv : (index) -> i32
      fir.store %1 to %j : !fir.ref<i32>
    }
  }
```

The `fir.do_concurrent` wrapper op encapsulates both the actual loop and
the allocations required for the iteration variables. The
`fir.do_concurrent.loop` op is a multi-dimensional op that contains the
loop control and body. See the ops' docs for more info.
2025-03-18 10:53:44 +01:00
Valentin Clement (バレンタイン クレメン)
e5ec7bb21b
[flang][cuda] Set correct offsets for multiple variables in dynamic shared memory (#131674) 2025-03-17 17:13:06 -07:00
Valentin Clement (バレンタイン クレメン)
74d4fc0a3e
[flang][cuda][NFC] Use ssa value for offset in shared memory op (#131661)
Switch from attribute to a value as we need to support dynamic offset
when multiple variables are used with dynamic shared memory.
2025-03-17 14:23:34 -07:00
Valentin Clement (バレンタイン クレメン)
e86081b6c2
[flang][cuda] Convert cuf.shared_memory operation to LLVM ops (#131396)
Convert the operation to `llvm.addressof` operation with
`llvm.getelementptr` with the appropriate offset.
2025-03-14 19:34:55 -07:00
Valentin Clement (バレンタイン クレメン)
4fb20b85fd
[flang][cuda] Compute offset on cuf.shared_memory ops (#131395)
Add a pass to compute the size of the shared memory (static shared
memory) and the offsets of each variables to be placed in shared memory.
The global representing the shared memory is also created during this
pass.

In case of dynamic shared memory, the global as a type of
`!fir.array<0xi8>` and the size of the memory is set at kernel launch.
2025-03-14 19:34:35 -07:00
Valentin Clement (バレンタイン クレメン)
4818623924
[flang][cuda] Add cuf.shared_memory operation (#131392)
Introduce `cuf.shared_memory` operation. The operation is used to get
the pointer in shared memory for a specific variable. The shared memory
is materialized as a global in address space 3 and the different
variables are pointing to it at different offset.

Follow up patches will add lowering and conversion of this operation.
2025-03-14 15:43:25 -07:00
Valentin Clement (バレンタイン クレメン)
a862b6deae
[flang][cuda] Lower shared global to the correct NVVM address space (#131368)
Global with the CUDA shared data attribute needs to be lowered to llvm
globals with the correct address space (3). Address space is set from
the `mlir::NVVM::NVVMMemorySpace::kSharedMemorySpace` enum from
`mlir/Dialect/LLVMIR/NVVMDialect.h`
2025-03-14 15:28:32 -07:00
Slava Zakharin
00f9c855fb
[flang] Added fir.is_contiguous_box and fir.box_total_elements ops. (#131047)
These are helper operations to aid with expanding of fir.pack_array.
2025-03-14 08:25:05 -07:00
Kareem Ergawy
b003face11
[flang][OpenMP] Add OutlineableOpenMPOpInterface to omp.teams (#131109)
Given the following input:
```fortran
program rep_loopbind
  implicit none
  integer :: i
  real :: priv_val

  !$omp teams private(priv_val)
    !$omp distribute
    do i=1,1000
    end do
  !$omp end teams
end program
```
the `AllocaOpConversion` pattern in `FIRToLLVMLowering` would **move**
the private allocations that belong to the `teams` directive (i.e. the
allocations needed for the private copies of `priv_val` and the loop's
iteration variable) from the the `omp.teams` op to the outside scope.

This is not correct since these allocations should be eventually emitted
inside the outlined region for the `teams` directive. Without this fix,
these allocation would be emitted in the parent function (or the parent
scope whatever it is).
2025-03-13 16:03:19 +01:00
Asher Mancinelli
982527eef0
[flang] Use saturated intrinsics for floating point to integer conversions (#130686)
The saturated floating point conversion intrinsics match the semantics in the standard more closely than the fptosi/fptoui instructions.

Case 2 of 16.9.100 is

> INT (A [, KIND])
> If A is of type real, there are two cases: if |A| < 1, INT (A) has the
value 0; if |A| ≥ 1, INT (A) is the integer whose magnitude is the
largest integer that does not exceed the magnitude of A and whose sign
is the same as the sign of A.

Currently, converting a floating point value into an integer type too
small to hold the constant will be converted to poison in opt, leaving
us with garbage:

```
> cat t.f90
program main
  real(kind=16)   :: f
  integer(kind=4) :: i
  f=huge(f)
  i=f
  print *, i
end program main

# current upstream
> for i in `seq 10`; do; ./a.out; done
 -862156992
 -1497393344
 -739096768
 -1649494208
 1761228608
 -1959270592
 -746244288
 -1629194432
 -231217344
 382322496
```

With the saturated fptoui/fptosi intrinsics, we get the appropriate
values

```
# mine
> flang -O2 ./t.f90 && ./a.out
 2147483647

> perl -e 'printf "%d\n", (2 ** 31) - 1'
2147483647
```

One notable difference: NaNs being converted to ints will become zero, unlike current flang (and some other compilers). Newer versions of GCC have this behavior.
2025-03-12 08:14:46 -07:00
jeanPerier
15e335f04f
[flang] also set llvm ABI argument attributes on direct calls (#130736)
So far, flang was not setting argument attributes on direct calls
assuming that putting them on the function operation was enough.

It was clarified in
38565da525
that they must be set on both call and functions, even for direct calls.

Crashes have been observed because of the lack of the attribute when
compiling `abs(x)` at `O2` and above on X86-64 for complex(16).
2025-03-12 09:55:05 +01:00
Slava Zakharin
74eba972ca
[flang] Definitions of fir.pack/unpack_array operations. (#130698)
As defined in #127147.
2025-03-11 14:15:29 -07:00
jeanPerier
1ddf18057a
[flang] introduce fir.copy to avoid load store of aggregates (#130289)
Introduce a FIR operation to do memcopy/memmove of compile time constant size types.

This is to avoid requiring derived type copies to done with load/store
which is badly supported in LLVM when the aggregate type is "big" (no
threshold can easily be defined here, better to always avoid them for
fir.type).

This was the root cause of the regressions caused by #114002 which introduced a
load/store of fir.type<> which caused hand/asserts to fire in LLVM on
several benchmarks.

See https://llvm.org/docs/Frontend/PerformanceTips.html#avoid-creating-values-of-aggregate-type
2025-03-11 09:31:03 +01:00
R
1dffe8f364
Reland [flang] In AllocMemOp lowering, convert types for calling malloc on 32-bit (#130386)
Previous PR: https://github.com/llvm/llvm-project/pull/129308

Changes:
* The alloc-32.fir test is now marked as requiring the X86 target.
* Drive-by fixes uncovered when fixing tests involving malloc
2025-03-11 02:01:57 +00:00
R
3121da52aa Revert "[flang] In AllocMemOp lowering, convert types for calling malloc on 32-bit (#129308)"
This reverts commit cf1964af5a461196904b663ede04c26555fcff69.

This causes breakage on all the non-x86 buildbots as they don't have the i686
target enabled. This was missed in pre-commit CI.
2025-03-08 02:42:24 +00:00
R
cf1964af5a
[flang] In AllocMemOp lowering, convert types for calling malloc on 32-bit (#129308)
Although 32-bit targets are currently not officially supported, add a type conversion in the AllocMemOp lowering when calling the `malloc` function on 32-bit targets. This fixes a type mismatch, and this fix makes it easier to potentially support such targets in the future.

This involves making sure the `LLVMTypeConverter` has the necessary information to know the target bit width.

Co-authored-by: Valentin Clement (バレンタイン クレメン) <clementval@gmail.com>
2025-03-08 02:25:17 +00:00
Kelvin Li
83f8721201
[flang] handle passing bind(c) derived type by value for ppc64le and powerpc64-aix (#128780) 2025-03-03 14:43:43 -05:00
jeanPerier
9805854699
[flang][NFC] clean-up fir.field_index legacy usages in tests (#129219)
After #127231, fir.coordinate_of should directly carry the field.

I updated the lowering and codegen tests in #12731, but not the FIR to
FIR tests, which is what this patch is cleaning up.
2025-03-03 10:01:54 +01:00
jeanPerier
a8db1fb9b5
[flang] update fir.coordinate_of to carry the fields (#127231)
This patch updates fir.coordinate_op to carry the field index as
attributes instead of relying on getting it from the fir.field_index
operations defining its operands.

The rational is that FIR currently has a few operations that require
DAGs to be preserved in order to be able to do code generation. This is
the case of fir.coordinate_op, which requires its fir.field operand
producer to be visible.
This makes IR transformation harder/brittle, so I want to update FIR to
get rid if this.

Codegen/printer/parser of fir.coordinate_of and many tests need to be
updated after this change.
2025-02-28 09:50:05 +01:00
Slava Zakharin
0caa8f42be Reland "[flang] Set LLVM specific attributes to fir.call's of Fortran runtime. (#128093)"
This change is inspired by a case in facerec benchmark, where
performance
of scalar code may improve by about 6%@aarch64 due to getting rid of
redundant
loads from Fortran descriptors. These descriptors are corresponding
to subroutine local ALLOCATABLE, SAVE variables. The scalar loop nest
in LocalMove subroutine contains call to Fortran runtime IO functions,
and LLVM globals-aa analysis cannot prove that these calls do not modify
the globalized descriptors with internal linkage.

This patch sets and propagates llvm.memory_effects attribute for
fir.call
operations calling Fortran runtime functions. In particular, it tries
to set the Other memory effect to NoModRef. The Other memory effect
includes accesses to globals and captured pointers, so we cannot set
it for functions taking Fortran descriptors with one exception
for calls where the Fortran descriptor arguments are all null.

As long as different calls to the same Fortran runtime function may have
different attributes, I decided to attach the attributes to the calls
rather than functions. Moreover, attaching the attributes to func.func
will require propagating these attributes to llvm.func, which is not
happening right now.

In addition to llvm.memory_effects, the new pass sets llvm.nosync
and llvm.nocallback attributes that may also help LLVM alias analysis
(e.g. see #127707). These attributes are ignored currently.
I will support them in LLVM IR dialect in a separate patch.

I also added another pass for developers to be able to print
declarations/calls of all Fortran runtime functions that are recognized
by the attributes setting pass. It should help with maintenance
of the LIT tests.
2025-02-24 14:18:17 -08:00
Slava Zakharin
69cc16fb55 Revert "[flang] Set LLVM specific attributes to fir.call's of Fortran runtime. (#128093)"
This reverts commit 36fdeb2aded08a776fcffefa73cb7667e7fc6c2d.
2025-02-24 10:52:53 -08:00
Slava Zakharin
36fdeb2ade
[flang] Set LLVM specific attributes to fir.call's of Fortran runtime. (#128093)
This change is inspired by a case in facerec benchmark, where
performance
of scalar code may improve by about 6%@aarch64 due to getting rid of
redundant
loads from Fortran descriptors. These descriptors are corresponding
to subroutine local ALLOCATABLE, SAVE variables. The scalar loop nest
in LocalMove subroutine contains call to Fortran runtime IO functions,
and LLVM globals-aa analysis cannot prove that these calls do not modify
the globalized descriptors with internal linkage.

This patch sets and propagates llvm.memory_effects attribute for
fir.call
operations calling Fortran runtime functions. In particular, it tries
to set the Other memory effect to NoModRef. The Other memory effect
includes accesses to globals and captured pointers, so we cannot set
it for functions taking Fortran descriptors with one exception
for calls where the Fortran descriptor arguments are all null.

As long as different calls to the same Fortran runtime function may have
different attributes, I decided to attach the attributes to the calls
rather than functions. Moreover, attaching the attributes to func.func
will require propagating these attributes to llvm.func, which is not
happening right now.

In addition to llvm.memory_effects, the new pass sets llvm.nosync
and llvm.nocallback attributes that may also help LLVM alias analysis
(e.g. see #127707). These attributes are ignored currently.
I will support them in LLVM IR dialect in a separate patch.

I also added another pass for developers to be able to print
declarations/calls of all Fortran runtime functions that are recognized
by the attributes setting pass. It should help with maintenance
of the LIT tests.
2025-02-24 09:27:48 -08:00
Valentin Clement (バレンタイン クレメン)
93b2e47f12
[flang][cuda] Avoid assign element mismatch when doing data transfer from a constant (#128252)
Currently when we do a CUDA data transfer from a constant, we embox it
and delegate the assignment to the runtime. When the type of the
constant is not exactly the same as the destination descriptor, the
runtime will emit an assignment mismatch error.

Convert the constant when necessary so the assignment is fine.
2025-02-21 17:46:46 -08:00
David Truby
449f84fea6
[flang] fix AArch64 PCS for struct following pointer (#127802)
Pointers are already handled as taking up a register in the ABI
handling, but the handling for structs was not taking this into account.
This patch changes the struct handling to acknowledge that pointer
arguments take up an integer register.

Fixes #123075
2025-02-21 10:50:52 -08:00
Akash Banerjee
d6ab12c7cc
[MLIR][OpenMP] Add conversion support from FIR to LLVM Dialect for OMP DeclareMapper (#121005)
Add conversion support from FIR to LLVM Dialect for OMP DeclareMapper.

Depends on #121001
2025-02-18 17:47:32 +00:00
Nikita Popov
75c356c488
[MLIR][LLVMIR] Always use TargetFolder in IRBuilder (#126929)
This is a followup to https://github.com/llvm/llvm-project/pull/126745,
generalizing it to always use TargetFolder, including inside function
bodies.

This avoids generating non-canonical constant expressions that can be
folded away.
2025-02-13 08:53:59 +01:00
Razvan Lupusoru
7b473dfe84
[flang][acc] Implement type categorization for FIR types (#126964)
The OpenACC type interfaces have been updated to require that a type
self-identify which type category it belongs to. Ensure that FIR types
are able to provide this self identification.

In addition to implementing the new API, the PointerLikeType interface
attachment was moved to FIROpenACCSupport library like MappableType to
ensure all type interfaces and their implementation are now in the same
spot.
2025-02-12 21:09:59 -08:00
jeanPerier
5836d91845
[flang] add ABI argument attributes in indirect calls (#126896)
Last piece that implements the TODO for sret and byval setting on
indirect calls.

This includes a fix to the codegen last patch. I thought types in in
type attributes were automatically converted in dialect conversion
passes, but that is not the case. The sret and byval type needs to be
converted to llvm types in codegen (mlir FuncOp conversion is doing a
similar conversion).
2025-02-12 17:31:34 +01:00
Nikita Popov
c03325cead
[MLIR][LLVMIR] Use TargetFolder when creating globals (#126745)
The LLVM dialect lowers globals using IRBuilder, relying on it creating
constant expressions where possible. As we remove support for more
constant expressions (per
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179),
this can cause issues for cases where the constant expression is no
longer supported, and the operation cannot be constant folded without
DataLayout being available. In particular, I ran into this issue with
flang and the removal of mul constant expressions.

Address this by using TargetFolder when creating globals, which will
perform DL-aware constant folding. I think it would make sense to also
do this in general, but I'm starting with globals where not doing this
can result in translation failures.

Ideally, globals with these problematic expressions would never be
generated in the first place, but there has been little movement on
fixing this (https://github.com/llvm/llvm-project/issues/96047).
2025-02-12 10:14:00 +01:00
jeanPerier
65075a863b
[flang][FIR] handle argument attributes in fir.call (#126711)
Add pretty printer/parser for fir.call argument/result attributes and
propagate them to llvm.call.

This will allow implementing the TODO about ABI relevant argument
attribute in indirect calls.
2025-02-12 09:49:52 +01:00
klensy
c491cbfe75
[flang][test] Fix filecheck annotation typos (#92387) 2025-02-05 18:24:47 +00:00
Valentin Clement (バレンタイン クレメン)
f1b075df2e
[flang][cuda] Pass the pinned variable in allocate calls (#125310) 2025-02-02 18:05:59 -08:00
Tom Eccles
aeaafce464
[mlir][OpenMP][flang] make private variable allocation implicit in omp.private (#124019)
The intention of this work is to give MLIR->LLVMIR conversion freedom to
control how the private variable is allocated so that it can be
allocated on the stack in ordinary cases or as part of a structure used
to give closure context for tasks which might outlive the current stack
frame. See RFC:

https://discourse.llvm.org/t/rfc-openmp-supporting-delayed-task-execution-with-firstprivate-variables/83084

For example, a privatizer for an integer used to look like
```mlir
  omp.private {type = private} @x.privatizer : !fir.ref<i32> alloc {
  ^bb0(%arg0: !fir.ref<i32>):
    %0 = ... allocate proper memory for the private clone ...
    omp.yield(%0 : !fir.ref<i32>)
  }
```

After this change, allocation become implicit in the operation:
```mlir
  omp.private {type = private} @x.privatizer : i32
```

For more complex types that require initialization after allocation, an
init region can be used:
``` mlir
  omp.private {type = private} @x.privatizer : !some.type init {
  ^bb0(%arg0: !some.pointer<!some.type>, %arg1: !some.pointer<!some.type>):
    // initialize %arg1, using %arg0 as a mold for allocations
    omp.yield(%arg1 : !some.pointer<!some.type>)
  } dealloc {
    ^bb0(%arg0: !some.pointer<!some.type>):
    ... deallocate memory allocated by the init region ...
    omp.yield
  }
```

This patch lays the groundwork for delayed task execution but is not
enough on its own.

After this patch all gfortran tests which previously passed still pass.
There
are the following changes to the Fujitsu test suite:
- 0380_0009 and 0435_0009 are fixed
- 0688_0041 now fails at runtime. This patch is testing firstprivate
variables with tasks. Previously we got lucky with the undefined
behavior and won the race. After these changes we no longer get lucky.
This patch lays the groundwork for a proper fix for this issue.

In flang the lowering re-uses the existing lowering used for reduction
init and dealloc regions.

In flang, before this patch we hit a TODO with the same wording when
generating the copy region for firstprivate polymorphic variables. After
this patch the box-like fir.class is passed by reference into the copy
region, leading to a different path that didn't hit that old TODO but
the generated code still didn't work so I added a new TODO in
DataSharingProcessor.
2025-01-31 09:35:26 +00:00
Valentin Clement (バレンタイン クレメン)
382d3599c2
[flang][cuda] Propagate the data attribute on the converted calls (#124877) 2025-01-29 08:04:22 -08:00
Nikita Popov
29441e4f5f
[IR] Convert from nocapture to captures(none) (#123181)
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.

Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
   reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
   make it easier to use old IR files and somewhat reduce the test churn in
   this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
   attribute. The representation in the LLVM IR dialect should be updated
   separately.
2025-01-29 16:56:47 +01:00
Slava Zakharin
0b80491cd5
[flang] Support non-index shape/shift/slice for CG box operations. (#124625)
That is another problem uncovered during hlfir.reshape inlining,
where the shape bits could be any integer type.
This patch adds explicit convertions to `index` type where needed.
2025-01-28 09:38:33 -08:00
ssijaric-nv
16e9601e19
[Flang] Adjust the trampoline size for AArch64 and PPC (#118678)
Set  the trampoline size to match that in compiler-rt/lib/builtins/trampoline_setup.c
and AArch64 and PPC lowering.
2025-01-27 08:02:18 -08:00