llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-19 02:26:46 +00:00

Author	SHA1	Message	Date
Kareem Ergawy	5d364481e3	[flang][OpenMP] Upstream first part of `do concurrent` mapping (#126026 ) This PR starts the effort to upstream AMD's internal implementation of `do concurrent` to OpenMP mapping. This replaces #77285 since we extended this WIP quite a bit on our fork over the past year. An important part of this PR is a document that describes the current status downstream, the upstreaming status, and next steps to make this pass much more useful. In addition to this document, this PR also contains the skeleton of the pass (no useful transformations are done yet) and some testing for the added command line options. This looks like a huge PR but a lot of the added stuff is documentation. It is also worth noting that the downstream pass has been validated on https://github.com/BerkeleyLab/fiats. For the CPU mapping, this achieved performance speed-ups that match pure OpenMP; for GPU mapping we are still working on extending our support for implicit memory mapping and locality specifiers. PR stack: - https://github.com/llvm/llvm-project/pull/126026 (this PR) - https://github.com/llvm/llvm-project/pull/127595 - https://github.com/llvm/llvm-project/pull/127633 - https://github.com/llvm/llvm-project/pull/127634 - https://github.com/llvm/llvm-project/pull/127635	2025-04-02 09:24:38 +02:00
Valentin Clement (バレンタインクレメン)	ae8dd63681	[flang][cuda] Add interface and lowering for all_sync (#134001 )	2025-04-01 17:59:11 -07:00
Valentin Clement (バレンタインクレメン)	bb179c483a	[flang][rt] Allow ReportFatalUserError to be built on device (#133979 )	2025-04-01 13:50:42 -07:00
Valentin Clement (バレンタインクレメン)	01889de8e9	[flang][device] Enable Stop functions on device build (#133803 ) Update `StopStatement` and `StopStatementText` to be built for the device.	2025-04-01 10:06:45 -07:00
Jean-Didier PAILLEUX	513a91a5f1	[flang/flang-rt] Implement PERROR intrinsic from GNU Extension (#132406 ) Add the implementation of the `PERROR(STRING)` intrinsic from the GNU Extension, which prints to stderr a newline-terminated error message corresponding to the last system error, prefixed by `STRING`. (https://gcc.gnu.org/onlinedocs/gfortran/PERROR.html)	2025-04-01 15:47:54 +02:00
Slava Zakharin	5f268d04f9	[flang] Code generation for fir.pack/unpack_array. (#132080 ) The code generation relies on `ShallowCopyDirect` runtime to copy data between the original and the temporary arrays (both directions). The allocations are done by the compiler generated code. The heap allocations could have been passed to `ShallowCopy` runtime, but I decided to expose the allocations so that the temporary descriptor passed to `ShallowCopyDirect` has `nocapture` - maybe this will be better for LLVM optimizations.	2025-03-31 11:42:17 -07:00
swatheesh-mcw	fe30cf18ab	Revert "Revert "[flang][openmp] Adds Parser and Semantic Support for Interop Construct, and Init and Use Clauses."" (#132343 ) Reverts llvm/llvm-project#132005	2025-03-28 15:21:52 +00:00
Krzysztof Parzyszek	33cd00f8c8	[flang] Use more generic overload for Operation in Traverse (#133305 ) Currently there are two specific overloads: for unary operations, i.e. `Operation<D, R, O>`, and binary ones `Operation<D, R, LO, RO>`. This makes it impossible for a derived class to use a single overload to handle all types of operations: `Operation<D, R, O...>`. Since the base overloads need to be included in the derived class's scope, via `using Base::operator()` either one of the specific overloads will always be a better candidate than the more generic derived one. ``` class MyVisitor : public Traverse<...> { using Traverse<...>::operator(); template <typename D, typename R, typename... O> Result operator()(const Operation<D, R, O...> &op) const { // Will never be used. } }; ``` This patch replaces the two specific overloads for Operation in Traverse with a single generic overload, while preserving the existing functionality, and allowing derived classes to use a single overload as well.	2025-03-28 08:17:31 -05:00
Peter Klausler	3bc8aa7823	[flang] Catch whole assumed-size array as RHS (#132819 ) The right-hand side expression of an intrinsic assignment statement may not be the name of an assumed-size array dummy argument.	2025-03-26 12:09:57 -07:00
Peter Klausler	4ea5aa09de	[flang][NFC] Restore I/O runtime API header name (#132423 ) flang/include/flang/Runtime/io-api.h was changed into io-api-consts.h, then wrapped into a new io-api.h that includes io-api-consts.h, does some redundant includes and declarations, and then declares the prototype of one function, InquiryKeywordHashDecode. Make that function static in io-stmt.cpp prior to its sole call site, then undo the renaming, to reduce confusion and redundancy.	2025-03-26 12:09:16 -07:00
Valentin Clement (バレンタインクレメン)	e6dda9c23a	[flang][cuda] Only create shared memory global when needed (#132999 )	2025-03-26 09:26:50 -07:00
Kajetan Puchalski	529c5b71c6	[flang] Add -f[no-]slp-vectorize flags (#132801 ) Add -f[no-]slp-vectorize to the flang driver. Add corresponding -fvectorize-slp to the flang frontend. Enable -fslp-vectorize at -O2 and higher in flang to match the current behaviour in clang. --------- Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>	2025-03-26 16:10:35 +00:00
Eugene Epshteyn	2c8e26081f	[flang] Add HOSTNM runtime and lowering intrinsics implementation (#131910 ) Implement GNU extension intrinsic HOSTNM, both function and subroutine forms. Add HOSTNM documentation to `flang/docs/Intrinsics.md`. Add lowering and semantic unit tests. (This change is modeled after GETCWD implementation.)	2025-03-25 13:17:17 -04:00
vdonaldson	92e0560347	[flang] ieee_denorm (#132307 ) Add support for the nonstandard ieee_denorm exception for real kinds 3, 4, 8 on x86 processors.	2025-03-25 13:02:43 -04:00
Joseph Huber	ef2735d243	[Flang] Detect endianness in the preprocessor (#132767 ) Summary: Currently we use `TestBigEndian` in CMake to determine endianness. This doesn't work on all platforms and is deprecated since CMake 3.20. Instead of using CMake, we can just use the GNU/Clang preprocessor definitions. The only difficulty is MSVC, mostly because they don't support the same macros. But, as far as I'm aware, MSVC / Windows targets are always little endian, and if not we can just override it for that specific target in the future.	2025-03-24 18:29:05 -05:00
Krzysztof Parzyszek	c221d64206	[flang] Remove mentions of evaluate::Variable<T> (#132805 ) The template itself was not defined anywhere. The closest thing was a forward declaration in flang/include/flang/Evaluate/variable.h.	2025-03-24 18:26:57 -05:00
Kareem Ergawy	6328506536	[flang][fir] Add rewrite pattern to convert `fir.do_concurrent` to `fir.do_loop` (#132207 ) Rewrites `fir.do_concurrent` ops to a corresponding nest of `fir.do_loop ... unordered` ops.	2025-03-24 12:09:32 +01:00
Krzysztof Parzyszek	68180d8d16	[flang][OpenMP] Use OmpDirectiveSpecification in standalone directives (#131163 ) This uses OmpDirectiveSpecification in the rest of the standalone directives.	2025-03-20 06:50:43 -05:00
Peter Klausler	6b9716b7f4	[flang] Catch bad usage case of whole assumed-size array (#132052 ) Whole assumed-size arrays are generally not allowed outside specific contexts, where expression analysis notes that they can appear. But contexts can nest, and in the case of an actual argument that turns out to be an array constructor, the permission to use a whole assumed-size array must be rescinded. Fixes https://github.com/llvm/llvm-project/issues/131909.	2025-03-19 12:02:34 -07:00
Peter Klausler	9f284e1784	[flang] Disabling REAL kinds must also disable their COMPLEX (#131353 ) When disabling kinds of REAL in the TargetCharacteristics, one must also disable the corresponding kinds of COMPLEX. Fixes https://github.com/llvm/llvm-project/issues/131088.	2025-03-19 12:00:51 -07:00
Peter Klausler	587f997db7	[flang] Catch C15104(4) violations when coindexing is present (#130677 ) The value of a structure constructor component can't have a pointer ultimate component if it is a coindexed designator.	2025-03-19 11:58:59 -07:00
Krzysztof Parzyszek	cd26dd5595	[flang][OpenMP] Use OmpDirectiveSpecification in simple directives (#131162 ) The `OmpDirectiveSpecification` contains directive name, the list of arguments, and the list of clauses. It was introduced to store the directive specification in METADIRECTIVE, and could be reused everywhere a directive representation is needed. In the long term this would unify the handling of common directive properties, as well as creating actual constructs from METADIRECTIVE by linking the contained directive specification with any associated user code.	2025-03-19 11:34:40 -05:00
Kiran Chandramohan	96b112fb61	Revert "[flang][openmp] Adds Parser and Semantic Support for Interop Construct, and Init and Use Clauses." (#132005 ) Reverts llvm/llvm-project#120584 Reverting due to CI failure https://lab.llvm.org/buildbot/#/builders/157/builds/22946	2025-03-19 11:13:52 +00:00
swatheesh-mcw	ee8a759bfb	[flang][openmp] Adds Parser and Semantic Support for Interop Construct, and Init and Use Clauses. (#120584 ) Adds Parser and Semantic Support for the below construct and clauses: - Interop Construct - Init Clause - Use Clause Note: The other clauses supported by Interop Construct such as Destroy, Use, Depend and Device are added already.	2025-03-19 10:49:17 +00:00
Slava Zakharin	fd0e20a64b	[flang] Generate fir.pack/unpack_array in Lowering. (#131704 ) Basic generation of array repacking operations in Lowering.	2025-03-18 21:26:33 -07:00
Slava Zakharin	7d7b58bc5d	[flang-rt] Added ShallowCopy API. (#131702 ) This API will be used for copying non-contiguous arrays into contiguous temporaries to support `-frepack-arrays`. The builder factory API will be used in the following commits.	2025-03-18 12:58:25 -07:00
Kareem Ergawy	1094ffcafb	[flang][fir] Add MLIR op for `do concurrent` (#130893 ) Adds new MLIR ops to model `do concurrent`. In order to make `do concurrent` representation self-contained, a loop is modeled using 2 ops, one wrapper and one that contains the actual body of the loop. For example, a 2D `do concurrent` loop is modeled as follows: ```mlir fir.do_concurrent { %i = fir.alloca i32 %j = fir.alloca i32 fir.do_concurrent.loop (%i_iv, %j_iv) = (%i_lb, %j_lb) to (%i_ub, %j_ub) step (%i_st, %j_st) { %0 = fir.convert %i_iv : (index) -> i32 fir.store %0 to %i : !fir.ref<i32> %1 = fir.convert %j_iv : (index) -> i32 fir.store %1 to %j : !fir.ref<i32> } } ``` The `fir.do_concurrent` wrapper op encapsulates both the actual loop and the allocations required for the iteration variables. The `fir.do_concurrent.loop` op is a multi-dimensional op that contains the loop control and body. See the ops' docs for more info.	2025-03-18 10:53:44 +01:00
Valentin Clement (バレンタインクレメン)	74d4fc0a3e	[flang][cuda][NFC] Use ssa value for offset in shared memory op (#131661 ) Switch from attribute to a value as we need to support dynamic offset when multiple variables are used with dynamic shared memory.	2025-03-17 14:23:34 -07:00
Valentin Clement (バレンタインクレメン)	4fb20b85fd	[flang][cuda] Compute offset on cuf.shared_memory ops (#131395 ) Add a pass to compute the size of the shared memory (static shared memory) and the offset of each variable to be placed in shared memory. The global representing the shared memory is also created during this pass. In case of dynamic shared memory, the global has a type of `!fir.array<0xi8>` and the size of the memory is set at kernel launch.	2025-03-14 19:34:35 -07:00
Valentin Clement (バレンタインクレメン)	4818623924	[flang][cuda] Add cuf.shared_memory operation (#131392 ) Introduce `cuf.shared_memory` operation. The operation is used to get the pointer in shared memory for a specific variable. The shared memory is materialized as a global in address space 3 and the different variables point to it at different offsets. Follow-up patches will add lowering and conversion of this operation.	2025-03-14 15:43:25 -07:00
Slava Zakharin	00f9c855fb	[flang] Added fir.is_contiguous_box and fir.box_total_elements ops. (#131047 ) These are helper operations to aid with expanding of fir.pack_array.	2025-03-14 08:25:05 -07:00
jeanPerier	3ff3b29dd6	[flang] lower remaining cases of pointer assignments inside forall (#130772 ) Implement handling of `NULL()` RHS, polymorphic pointers, as well as lower bounds or bounds remapping in pointer assignment inside FORALL. These cases eventually do not require updating hlfir.region_assign; lowering can simply prepare the new descriptor for the LHS inside the RHS region. Looking more closely at the polymorphic cases, there is no need to call the runtime: fir.rebox and fir.embox do handle the dynamic type setting correctly. After this patch, the last remaining TODO is the allocatable assignment inside FORALL, which, like some cases here, is more likely an accidental feature given FORALL was deprecated in F2003 at the same time as allocatable components were added.	2025-03-14 10:51:46 +01:00
Valentin Clement (バレンタインクレメン)	57d87ed7f0	[flang][NFC] Add parenthesis to avoid warning (#131219 ) Remove warning introduced in 369da8421c2f7	2025-03-13 14:28:35 -07:00
Valentin Clement (バレンタインクレメン)	369da8421c	[flang][cuda] Allow assumed-size declaration for SHARED variable (#130833 ) Avoid triggering an assertion for shared variable using the assumed-size syntax. ``` attributes(global) subroutine sharedstar() real, shared :: s(*) ! ok. dynamic shared memory. end subroutine ```	2025-03-13 11:06:17 -07:00
Tom Eccles	01aca42363	[flang] Add support for -f[no-]verbose-asm (#130788 ) This flag provides extra commentary in the assembly output.	2025-03-13 15:22:13 +00:00
Mats Petersson	d0188ebcc2	[flang][OpenMP]Add symbols omp_in, omp_out and omp_priv in DECLARE RED… (#129908 ) …UCTION This patch allows better parsing of the reduction and initializer components, including supporting derived types in both those places. There is more work needed here, but this is a definite improvement in what can be handled through parser and semantics. Note that declare reduction is still not supported in lowering, so any attempt to compile DECLARE REDUCTION code will end with a TODO aka "Not yet implemented" abort in the compiler. Note that this version of the code does not cover declaring multiple reductions using the same name with different types. This will be fixed in a future patch. [This was also the case before this change]. One existing test modified to actually compile (as it didn't in the original form).	2025-03-13 09:39:45 +00:00
Slava Zakharin	74eba972ca	[flang] Definitions of fir.pack/unpack_array operations. (#130698 ) As defined in #127147.	2025-03-11 14:15:29 -07:00
jeanPerier	356bf3fa2d	Reland " [flang] Rely on global initialization for simpler derived types" (#130290 ) Currently, all derived types are initialized through `_FortranAInitialize`, which is functionally correct, but bears poor runtime performance. This patch falls back on global initialization for "simpler" derived types to speed up the initialization. Note: this relands #114002 with the fix for the LLVM timeout regressions that have been seen. The fix is to use the added fir.copy to avoid aggregate load/store. Co-authored-by: NimishMishra <42909663+NimishMishra@users.noreply.github.com>	2025-03-11 15:19:43 +01:00
jeanPerier	1ddf18057a	[flang] introduce fir.copy to avoid load store of aggregates (#130289 ) Introduce a FIR operation to do memcopy/memmove of compile time constant size types. This is to avoid requiring derived type copies to be done with load/store, which is badly supported in LLVM when the aggregate type is "big" (no threshold can easily be defined here, better to always avoid them for fir.type). This was the root cause of the regressions caused by #114002, which introduced a load/store of fir.type<> that caused hangs/asserts to fire in LLVM on several benchmarks. See https://llvm.org/docs/Frontend/PerformanceTips.html#avoid-creating-values-of-aggregate-type	2025-03-11 09:31:03 +01:00
Peter Klausler	c189852218	[flang] Ignore empty keyword macros before directives (#130333 ) Ignore any keyword macros with empty directives that might appear before a compiler directive. Fixes https://github.com/llvm/llvm-project/issues/126459.	2025-03-10 13:21:10 -07:00
Peter Klausler	d53079055e	[flang] Catch coindexed procedure pointer/binding references (#129931 ) A procedure designator cannot be coindexed, except for cases in which the coindexing doesn't matter (i.e. a binding that can't be overridden).	2025-03-10 13:18:07 -07:00
Peter Klausler	53c3a2c69a	[flang] Static checking for empty coarrays (#129610 ) A coarray must not have a zero extent on a codimension; that would yield an empty coarray. When cobounds are constants, verify them.	2025-03-10 13:16:31 -07:00
مهدي شينون (Mehdi Chinoune)	cf5aa559a8	[flang] Don't redefine pid_t on MinGW-w64. (#130288 )	2025-03-10 17:27:47 +00:00
Krzysztof Parzyszek	5ba7a3bd4c	[flang][OpenMP] Parse cancel-directive-name as clause (#130146 ) The cancellable construct names on CANCEL or CANCELLATION POINT directives are actually clauses (with the same names as the corresponding constructs). Instead of parsing them into a custom structure, parse them as a clause, which will make CANCEL/CANCELLATION POINT follow the same uniform scheme as other constructs (<directive> [(<arguments>)] [clauses]).	2025-03-10 11:58:02 -05:00
Krzysztof Parzyszek	4e453d5292	[flang][OpenMP] Accept old FLUSH syntax in METADIRECTIVE (#130122 ) Accommodate it in OmpDirectiveSpecification, which may become the primary component of the actual FLUSH construct in the future.	2025-03-10 08:12:46 -05:00
Krzysztof Parzyszek	d67947162f	[flang][OpenMP] Implement HAS_DEVICE_ADDR clause (#128568 ) The HAS_DEVICE_ADDR clause indicates that the object(s) listed exist at an address that is a valid device address. Specifically, `has_device_addr(x)` means that (in C/C++ terms) `&x` is a device address. When entering a target region, `x` does not need to be allocated on the device, or have its contents copied over (in the absence of additional mapping clauses). Passing its address verbatim to the region for use is sufficient, and is the intended goal of the clause. Some Fortran objects use descriptors in their in-memory representation. If `x` had a descriptor, both the descriptor and the contents of `x` would be located in the device memory. However, the descriptors are managed by the compiler, and can be regenerated at various points as needed. The address of the effective descriptor may change, hence it's not safe to pass the address of the descriptor to the target region. Instead, the descriptor itself is always copied, but for objects like `x`, no further mapping takes place (as this keeps the storage pointer in the descriptor unchanged). --------- Co-authored-by: Sergio Afonso <safonsof@amd.com>	2025-03-10 08:11:01 -05:00
Kajetan Puchalski	0c7e895de3	[flang] Move parser invocations into ParserActions (#130309 ) FrontendActions.cpp is currently one of the biggest compilation units in all of flang. Measuring its compilation gives the following metrics: User time (seconds): 139.21 System time (seconds): 4.65 Maximum resident set size (kbytes): 5891440 (5.61 GB) This commit separates out explicit invocations of the parser into a separate compilation unit - ParserActions.cpp - through helper functions in order to decrease the maximum compilation time and memory usage of a single unit. After the split, the measurements of FrontendActions.cpp are as follows: User time (seconds): 70.08 System time (seconds): 3.16 Maximum resident set size (kbytes): 3961492 (3.7 GB) While the ones for the newly created ParserActions.cpp as follows: User time (seconds): 104.33 System time (seconds): 3.37 Maximum resident set size (kbytes): 4185600 (3.99 GB) --------- Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>	2025-03-10 11:33:47 +00:00
Valentin Clement (バレンタインクレメン)	ae42f07103	[flang][cuda] Allow array pointer for atomicexch and atomiccas (#130363 )	2025-03-07 15:36:08 -08:00
Valentin Clement (バレンタインクレメン)	829e8993e5	[flang][cuda] Lower __LDCA, __LDCS, __LDLU, __LDCV, __LDCG with arrays (#130357 )	2025-03-07 15:35:52 -08:00
Valentin Clement (バレンタインクレメン)	dcda314b6c	[flang][cuda] Fix atomicxor lowering to accept arrays (#130331 ) The first argument can be the address of a scalar, an array element, or even just the address of the first element of an array. Update lowering to not trigger elemental lowering.	2025-03-07 13:05:42 -08:00


2529 Commits