llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-23 23:46:05 +00:00

Author	SHA1	Message	Date
vdonaldson	8a0f694381	[flang] Legacy ASSIGN statement target processing (#133737 ) Like other target statements, the statement associated with the label in a legacy ASSIGN statement could be inside a construct. Constructs containing such a target must therefore be marked as unstructured, fairly similar to how targets are processed in `markBranchTarget`.	2025-04-02 09:52:13 -04:00
Kareem Ergawy	de6c9096ba	[flang][OpenMP] Handle "loop-local values" in `do concurrent` nests (#127635 ) Extends `do concurrent` mapping to handle "loop-local values". A loop-local value is one that is used exclusively inside the loop but allocated outside of it. This usually corresponds to temporary values that are used inside the loop body for initializing other variables, for example. After collecting these values, the pass localizes them to the loop nest by moving their allocations. PR stack: - https://github.com/llvm/llvm-project/pull/126026 - https://github.com/llvm/llvm-project/pull/127595 - https://github.com/llvm/llvm-project/pull/127633 - https://github.com/llvm/llvm-project/pull/127634 - https://github.com/llvm/llvm-project/pull/127635 (this PR)	2025-04-02 15:43:19 +02:00
مهدي شينون (Mehdi Chinoune)	666df54ea6	[flang] Fold double bessel functions on Windows. (#130253 ) There are no functions for `float`. see: https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/bessel-functions-j0-j1-jn-y0-y1-yn	2025-04-02 14:43:09 +01:00
Jean-Didier PAILLEUX	c309abd925	[flang] Implement !DIR$ NOVECTOR and !DIR$ NOUNROLL[_AND_JAM] (#133885 ) Hi, This patch implements support for the following directives : - `!DIR$ NOUNROLL_AND_JAM` to disable unrolling and jamming on a DO LOOP. - `!DIR$ NOUNROLL` to disable unrolling on a DO LOOP. - `!DIR$ NOVECTOR` to disable vectorization on a DO LOOP.	2025-04-02 14:30:01 +02:00
Kareem Ergawy	ef56b53712	[flang][OpenMP] Extend `do concurrent` mapping to multi-range loops (#127634 ) Adds support for converting multi-range loops to OpenMP (on the host only for now). The changes here "prepare" a loop nest for collapsing by sinking iteration variables to the innermost `fir.do_loop` op in the nest. PR stack: - https://github.com/llvm/llvm-project/pull/126026 - https://github.com/llvm/llvm-project/pull/127595 - https://github.com/llvm/llvm-project/pull/127633 - https://github.com/llvm/llvm-project/pull/127634 (this PR) - https://github.com/llvm/llvm-project/pull/127635	2025-04-02 12:43:04 +02:00
Tom Eccles	9b2fd1a6ec	[flang][OpenMP] Bump default OpenMP version to 3.1 (#133745 ) Precise OpenMP standards support information is being documented in #132707 Flang now has good support for OpenMP Version 3.1 and earlier.	2025-04-02 10:43:48 +01:00
Kareem Ergawy	3f8bfc9f7f	[flang][OpenMP] Map simple `do concurrent` loops to OpenMP host constructs (#127633 ) Upstreams one more part of the ROCm `do concurrent` to OpenMP mapping pass. This PR adds support for converting simple loops to the equivalent OpenMP constructs on the host: `omp parallel do`. Towards that end, we have to collect more information about loop nests, for which we add new utils in the `looputils` namespace. PR stack: - https://github.com/llvm/llvm-project/pull/126026 - https://github.com/llvm/llvm-project/pull/127595 - https://github.com/llvm/llvm-project/pull/127633 (this PR) - https://github.com/llvm/llvm-project/pull/127634 - https://github.com/llvm/llvm-project/pull/127635	2025-04-02 11:26:58 +02:00
Kareem Ergawy	41d718b1cf	[flang][OpenMP] Upstream `do concurrent` loop-nest detection. (#127595 ) Upstreams the next part of the `do concurrent` to OpenMP mapping pass (from AMD's ROCm implementation). See https://github.com/llvm/llvm-project/pull/126026 for more context. This PR adds loop nest detection logic. This enables us to discover multi-range `do concurrent` loops and then map them as "collapsed" loop nests to OpenMP. This is a follow-up to https://github.com/llvm/llvm-project/pull/126026; only the latest commit is relevant. This is a replacement for https://github.com/llvm/llvm-project/pull/127478 using a `/user/<username>/<branchname>` branch. PR stack: - https://github.com/llvm/llvm-project/pull/126026 - https://github.com/llvm/llvm-project/pull/127595 (this PR) - https://github.com/llvm/llvm-project/pull/127633 - https://github.com/llvm/llvm-project/pull/127634 - https://github.com/llvm/llvm-project/pull/127635	2025-04-02 10:12:52 +02:00
Kareem Ergawy	5d364481e3	[flang][OpenMP] Upstream first part of `do concurrent` mapping (#126026 ) This PR starts the effort to upstream AMD's internal implementation of `do concurrent` to OpenMP mapping. This replaces #77285 since we extended this WIP quite a bit on our fork over the past year. An important part of this PR is a document that describes the current status downstream, the upstreaming status, and next steps to make this pass much more useful. In addition to this document, this PR also contains the skeleton of the pass (no useful transformations are done yet) and some testing for the added command line options. This looks like a huge PR but a lot of the added stuff is documentation. It is also worth noting that the downstream pass has been validated on https://github.com/BerkeleyLab/fiats. For the CPU mapping, this achieved performance speed-ups that match pure OpenMP; for GPU mapping, we are still working on extending our support for implicit memory mapping and locality specifiers. PR stack: - https://github.com/llvm/llvm-project/pull/126026 (this PR) - https://github.com/llvm/llvm-project/pull/127595 - https://github.com/llvm/llvm-project/pull/127633 - https://github.com/llvm/llvm-project/pull/127634 - https://github.com/llvm/llvm-project/pull/127635	2025-04-02 09:24:38 +02:00
Valentin Clement (バレンタインクレメン)	ae8dd63681	[flang][cuda] Add interface and lowering for all_sync (#134001 )	2025-04-01 17:59:11 -07:00
Andre Kuhlenschmidt	b6edd25f17	[flang][intrinsics] NFC: make comment consistent (#133972 ) Just makes this named argument comment consistent with all the others in the file.	2025-04-01 14:30:10 -07:00
Valentin Clement (バレンタインクレメン)	bb179c483a	[flang][rt] Allow ReportFatalUserError to be built on device (#133979 )	2025-04-01 13:50:42 -07:00
Valentin Clement (バレンタインクレメン)	01889de8e9	[flang][device] Enable Stop functions on device build (#133803 ) Update `StopStatement` and `StopStatementText` to be built for the device.	2025-04-01 10:06:45 -07:00
Slava Zakharin	58551faaf1	[flang] Inline fir.is_contiguous_box in some cases. (#133812 ) Added inlining for `rank == 1` and `innermost` cases.	2025-04-01 08:41:11 -07:00
Jean-Didier PAILLEUX	513a91a5f1	[flang/flang-rt] Implement PERROR intrinsic from GNU Extension (#132406 ) Add the implementation of the `PERROR(STRING)` intrinsic from the GNU Extension, which prints to stderr a newline-terminated error message corresponding to the last system error, prefixed by `STRING`. (https://gcc.gnu.org/onlinedocs/gfortran/PERROR.html)	2025-04-01 15:47:54 +02:00
Jeremy Morse	1ebc308bba	[DebugInfo][RemoveDIs] Remove debug-intrinsic printing cmdline options (#131855 ) During the transition from debug intrinsics to debug records, we used several different command line options to customise handling: the printing of debug records to bitcode and textual could be independent of how the debug-info was represented inside a module, whether the autoupgrader ran could be customised. This was all valuable during development, but now that totally removing debug intrinsics is coming up, this patch removes those options in favour of a single flag (experimental-debuginfo-iterators), which enables autoupgrade, in-memory debug records, and debug record printing to bitcode and textual IR. We need to do this ahead of removing the experimental-debuginfo-iterators flag, to reduce the amount of test-juggling that happens at that time. There are quite a number of weird test behaviours related to this -- some of which I simply delete in this commit. Things like print-non-instruction-debug-info.ll, the test suite now checks for debug records in all tests, and we don't want to check we can print as intrinsics. Or the update_test_checks tests -- these are duplicated with write-experimental-debuginfo=false to ensure file writing for intrinsics is correct, but that's something we're imminently going to delete. A short survey of curious test changes: * free-intrinsics.ll: we don't need to test that debug-info is a zero cost intrinsic, because we won't be using intrinsics in the future. * undef-dbg-val.ll: apparently we pinned this to non-RemoveDIs in-memory mode while we sorted something out; it works now either way. * salvage-cast-debug-info.ll: was testing intrinsics-in-memory get salvaged, isn't necessary now * localize-constexpr-debuginfo.ll: was producing "dead metadata" intrinsics for optimised-out variable values, dbg-records takes the (correct) representation of poison/undef as an operand. Looks like we didn't update this in the past to avoid spurious test differences. * Transforms/Scalarizer/dbginfo.ll: this test was explicitly testing that debug-info affected codegen, and we deferred updating the tests until now. This is just one of those silent gnochange issues that get fixed by RemoveDIs. Finally: I've added a bitcode test, dbg-intrinsics-autoupgrade.ll.bc, that checks we can autoupgrade debug intrinsics that are in bitcode into the new debug records.	2025-04-01 14:27:11 +01:00
Tom Eccles	e17d864f55	[flang][OpenMP][Lower] lower array subscripts for task depend (#132994 ) The OpenMP standard says that all dependencies in the same set of inter-dependent tasks must be non-overlapping. This simplification means that the OpenMP only needs to keep track of the base addresses of dependency variables. This can be seen in kmp_taskdeps.cpp, which stores task dependency information in a hash table, using the base address as a key. This patch generates a rebox operation to slice boxed arrays, but only the box data address is used for the task dependency. The extra box is optimized away by LLVM at O3. Vector subscripts are TODO (I will address in my next patch). This also fixes a bug for ordinary subscripts when the symbol was mapped to a box: Fixes #132647	2025-04-01 10:26:14 +01:00
Jean-Didier PAILLEUX	bae3577002	[flang] Define ERF, ERFC and ERFC_SCALED intrinsics with Q and D prefix (#125217 ) `ERF`, `ERFC` and `ERFC_SCALED` intrinsics prefixed by `Q` and `D` are missing. Codes such as `CP2K`(https://github.com/cp2k/cp2k) and `TurboRVB`(https://github.com/sissaschool/turborvb) use these intrinsics just as defined in the GNU standard and here: https://www.ibm.com/docs/fr/xl-fortran-aix/16.1.0?topic=reference-intrinsic-procedures These intrinsics are based on the existing intrinsics but apply a restriction on the type kind. - `DERF`, `DERFC` and `DERFC_SCALED` are for double precision only. - `QERF`, `QERFC` and `QERFC_SCALED` are for quad precision only.	2025-04-01 08:07:26 +02:00
Thirumalai Shaktivel	091dcb8fc2	[Flang] Make a private copy for the common block variables in copyin clause (#111359 ) Fixes: https://github.com/llvm/llvm-project/issues/82949	2025-04-01 11:35:44 +05:30
Paul Osmialowski	cb7c223625	[clang][driver] Fix -fveclib=ArmPL issue: with -nostdlib do not link against libm (#133578 ) Although combining -fveclib=ArmPL with -nostdlib is a rare situation, it should still be supported correctly and should result in not linking against libm.	2025-03-31 21:55:58 +01:00
Slava Zakharin	5f268d04f9	[flang] Code generation for fir.pack/unpack_array. (#132080 ) The code generation relies on `ShallowCopyDirect` runtime to copy data between the original and the temporary arrays (both directions). The allocations are done by the compiler generated code. The heap allocations could have been passed to `ShallowCopy` runtime, but I decided to expose the allocations so that the temporary descriptor passed to `ShallowCopyDirect` has `nocapture` - maybe this will be better for LLVM optimizations.	2025-03-31 11:42:17 -07:00
Slava Zakharin	0ac8cb1b3d	[flang] Recognize fir.pack_array in LoopVersioning. (#133191 ) This change enables LoopVersioning when `fir.pack_array` is met in the def-use chain. It fixes a couple of huge performance regressions caused by enabling `-frepack-arrays`.	2025-03-31 11:41:43 -07:00
Thirumalai Shaktivel	374a5bea52	[Flang][OpenMP] Add PointerAssociateScalar to Cray Pointer used in the DSA (#133232 ) Issue: Cray Pointer is not associated to Cray Pointee, leading to Segmentation fault Fix: GetUltimate retrieves the base symbol in the current scope, which gets past all the references and returns the original symbol --------- Co-authored-by: Michael Klemm <michael.klemm@amd.com>	2025-03-29 15:39:12 +01:00
swatheesh-mcw	fe30cf18ab	Revert "Revert "[flang][openmp] Adds Parser and Semantic Support for Interop Construct, and Init and Use Clauses."" (#132343 ) Reverts llvm/llvm-project#132005	2025-03-28 15:21:52 +00:00
Nick Sarnie	48b7530273	[clang][flang][Triple][llvm] Add isOffload function to LangOpts and isGPU function to Triple (#126956 ) I'm adding support for SPIR-V, so let's consolidate these checks. --------- Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>	2025-03-28 14:19:20 +00:00
Krzysztof Parzyszek	33cd00f8c8	[flang] Use more generic overload for Operation in Traverse (#133305 ) Currently there are two specific overloads: for unary operations, i.e. `Operation<D, R, O>`, and binary ones `Operation<D, R, LO, RO>`. This makes it impossible for a derived class to use a single overload to handle all types of operations: `Operation<D, R, O...>`. Since the base overloads need to be included in the derived class's scope, via `using Base::operator()` either one of the specific overloads will always be a better candidate than the more generic derived one. ``` class MyVisitor : public Traverse<...> { using Traverse<...>::operator(); template <typename D, typename R, typename... O> Result operator()(const Operation<D, R, O...> &op) const { // Will never be used. } }; ``` This patch replaces the two specific overloads for Operation in Traverse with a single generic overload, while preserving the existing functionality, and allowing derived classes to use a single overload as well.	2025-03-28 08:17:31 -05:00
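The overload-resolution problem described in the commit above can be reproduced in isolation. The following is a minimal sketch, not flang's actual code: `Operation`, `OldBase`, `NewBase`, `OldVisitor`, `NewVisitor`, `Neg` and `Add` are hypothetical stand-ins, and only the shape of the template parameter lists mirrors the commit message. It shows that arity-specific base overloads win over a variadic derived overload, while a variadic base overload with the same signature is hidden by the derived one.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an operation node: derived type, result type,
// then any number of operand types. Not flang's real class.
template <typename D, typename R, typename... O> struct Operation {};

struct Neg {};
struct Add {};
using Unary = Operation<Neg, int, int>;
using Binary = Operation<Add, int, int, int>;

// Old style: the base visitor has one overload per arity.
struct OldBase {
  template <typename D, typename R, typename O>
  std::string operator()(const Operation<D, R, O> &) const {
    return "base unary";
  }
  template <typename D, typename R, typename LO, typename RO>
  std::string operator()(const Operation<D, R, LO, RO> &) const {
    return "base binary";
  }
};

// New style: the base visitor has a single variadic overload.
struct NewBase {
  template <typename D, typename R, typename... O>
  std::string operator()(const Operation<D, R, O...> &) const {
    return "base generic";
  }
};

// The non-variadic base overloads are more specialized than this variadic
// one, so partial ordering always prefers them.
struct OldVisitor : OldBase {
  using OldBase::operator();
  template <typename D, typename R, typename... O>
  std::string operator()(const Operation<D, R, O...> &) const {
    return "derived generic";
  }
};

// Same signature as the base overload, so the derived declaration hides the
// one brought in by the using-declaration.
struct NewVisitor : NewBase {
  using NewBase::operator();
  template <typename D, typename R, typename... O>
  std::string operator()(const Operation<D, R, O...> &) const {
    return "derived generic";
  }
};
```

Calling `OldVisitor{}(Unary{})` selects the base unary overload, whereas `NewVisitor{}(Unary{})` and `NewVisitor{}(Binary{})` both reach the derived overload.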
Joseph Huber	772173f548	[Clang][AMDGPU] Remove special handling for COV4 libraries (#132870 ) Summary: When we were first porting to COV5, this led to some ABI issues due to a change in how we looked up the work group size. Bitcode libraries relied on the builtins to emit code, but this was changed between versions. This prevented the bitcode libraries, like OpenMP or libc, from being used for both COV4 and COV5. The solution was to have this 'none' functionality which effectively emitted code that branched off of a global to resolve to either version. This isn't a great solution because it forced every TU to have this variable in it. The patch in https://github.com/llvm/llvm-project/pull/131033 removed support for COV4 from OpenMP, which was the only consumer of this functionality. Other users like HIP and OpenCL did not use this because they linked the ROCm Device Library directly which has its own handling (The name was borrowed from it after all). So, now that we don't need to worry about backward compatibility with COV4, we can remove this special handling. Users can still emit COV4 code, this simply removes the special handling used to make the OpenMP device runtime bitcode version agnostic.	2025-03-28 07:35:16 -05:00
Jean-Didier PAILLEUX	5b36835df0	[flang] Expose -m64 option (#132409 ) Exposes `-m64` option for Flang. These options can be used to build libraries or tools (e.g. OpenBlas).	2025-03-28 08:15:01 +01:00
Bruno Cardoso Lopes	7c3ecffe9b	[MLIR][LLVMIR] Add support for the full form of global_{ctor,dtor} (#133176 ) Currently only ctor/dtor list and their priorities are supported. This PR adds support for the missing data field. A few implementation notes: - The assembly printer has a fixed form because previous `attr_dict` will sort the dict by key name, making global_dtor and global_ctor differ in the order of printed arguments. - LLVM's `ptr null` is being converted to `#llvm.zero` otherwise we'd have to create a region to use the default operation conversion from `ptr null`, which is silly given that the field only supports null or a symbol.	2025-03-27 14:11:05 -07:00
Andre Kuhlenschmidt	077940621d	[flang][openacc] Make OpenACC block construct parse errors less verbose. (#131042 ) This PR reduces the verbosity of parser errors for OpenACC block constructs that do not parse correctly because they are missing their trailing end block directive by: - Removing the redundant error messages created by parsing 3 different styles of directive tokens. - Providing a general mechanism for configuring the max number of contexts printed for every syntax error. - Not printing less specific contexts that are at the same location. Prior to the changes: ``` $ flang -fc1 -fopenacc -fsyntax-only flang/test/Parser/acc-data-statement.f90 2>&1 | tee acc-data-statement.prior.log | wc -l 262 ``` [acc-data-statement.prior.log](https://github.com/user-attachments/files/19298165/acc-data-statement.prior.log) After the changes: ``` $ flang -fc1 -fopenacc -fsyntax-only flang/test/Parser/acc-data-statement.f90 2>&1 | tee acc-data-statement.prior.log | wc -l 73 ``` [acc-data-statement.post.log](https://github.com/user-attachments/files/19298181/acc-data-statement.post.log)	2025-03-26 12:36:04 -07:00
Andre Kuhlenschmidt	3ab70e3f90	[Flang] Change sizeof argument name to "x" (#130189 ) This closes #128610 by fixing the name of the argument to the sizeof function to be "x" and adds a test.	2025-03-26 12:34:36 -07:00
Peter Klausler	3bc8aa7823	[flang] Catch whole assumed-size array as RHS (#132819 ) The right-hand side expression of an intrinsic assignment statement may not be the name of an assumed-size array dummy argument.	2025-03-26 12:09:57 -07:00
Peter Klausler	6df27dd42d	[flang] Fix missed case of symbol renaming in module file generation (#132475 ) The map of symbols requiring new local aliases for USE association needs to use the symbols' ultimate resolutions to avoid missing cases that can arise in convoluted codes with lots of confusing renamings. Fixes https://github.com/llvm/llvm-project/issues/132435.	2025-03-26 12:09:38 -07:00
Peter Klausler	4ea5aa09de	[flang][NFC] Restore I/O runtime API header name (#132423 ) flang/include/flang/Runtime/io-api.h was changed into io-api-consts.h, then wrapped into a new io-api.h that includes io-api-consts.h, does some redundant includes and declarations, and then declares the prototype of one function, InquiryKeywordHashDecode. Make that function static in io-stmt.cpp prior to its sole call site, then undo the renaming, to reduce confusion and redundancy.	2025-03-26 12:09:16 -07:00
Peter Klausler	38207a52a7	[flang] Test SYNC IMAGES, increase checking (#132279 ) Add a test for the SYNC IMAGES statement, and add a check for invalid image numbers.	2025-03-26 12:08:48 -07:00
Peter Klausler	f3991e10bb	[flang] Allow macro replacement in numeric kind suffix (#132120 ) When a numeric value has a kind suffix containing an identifier, allow macro replacement for that identifier by treating it as its own token. Fixes https://github.com/llvm/llvm-project/issues/131548.	2025-03-26 12:08:26 -07:00
Michael Kruse	27539c3f90	Revert "[Flang] Remove FLANG_INCLUDE_RUNTIME (#124126 )" The production buildbot master apparently has not yet been restarted since https://github.com/llvm/llvm-zorg/pull/393 landed. This reverts commit 96d1baedefc3581b53bc4389bb171760bec6f191.	2025-03-26 19:02:13 +01:00
Michael Kruse	96d1baedef	[Flang] Remove FLANG_INCLUDE_RUNTIME (#124126 ) Remove the FLANG_INCLUDE_RUNTIME option which was replaced by LLVM_ENABLE_RUNTIMES=flang-rt. The FLANG_INCLUDE_RUNTIME option was added in #122336 which disables the non-runtimes build instructions for the Flang runtime so they do not conflict with the LLVM_ENABLE_RUNTIMES=flang-rt option added in #110217. In order to not maintain multiple build instructions for the same thing, this PR completely removes the old build instructions (effectively forcing FLANG_INCLUDE_RUNTIME=OFF). As per discussion in https://discourse.llvm.org/t/buildbot-changes-with-llvm-enable-runtimes-flang-rt/83571/2 we now implicitly add LLVM_ENABLE_RUNTIMES=flang-rt whenever Flang is compiled in a bootstrapping (non-standalone) build. Because it is possible to build Flang-RT separately, this behavior can be disabled using `-DFLANG_ENABLE_FLANG_RT=OFF`. Also see the discussion on implicitly adding runtimes/projects in #123964.	2025-03-26 18:50:41 +01:00
Kajetan Puchalski	4cabee35b7	[flang] Fix slp-vectorize.f90 test (#133128 ) The test was missing "-o /dev/null" and inadvertently generating a .ll file in the test directory. Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>	2025-03-26 17:29:15 +00:00
Valentin Clement (バレンタインクレメン)	e6dda9c23a	[flang][cuda] Only create shared memory global when needed (#132999 )	2025-03-26 09:26:50 -07:00
Kajetan Puchalski	529c5b71c6	[flang] Add -f[no-]slp-vectorize flags (#132801 ) Add -f[no-]slp-vectorize to the flang driver. Add corresponding -fvectorize-slp to the flang frontend. Enable -fslp-vectorize at -O2 and higher in flang to match the current behaviour in clang. --------- Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>	2025-03-26 16:10:35 +00:00
Slava Zakharin	b022f676fc	[flang] Include needed CMake files. (#133012 ) `FlangCommon.cmake` uses some CMake macros without including the corresponding modules. This change makes it self-sufficient.	2025-03-25 15:39:05 -07:00
Slava Zakharin	613a077b05	[flang] Generate quadmath_wrapper.h for Flang Evaluate. (#132817 ) When building Flang with Clang, we need to do the same quadmath.h wrapping as we do for flang-rt. I extracted the CMake code into FlangCommon.cmake, and cleaned up the arguments passing to execute_process (note that `-###` was treated as `-` in the original code, because `#` starts a comment). I believe the Clang command does not require the input source file, so I removed it as well.	2025-03-25 12:08:38 -07:00
Eugene Epshteyn	2c8e26081f	[flang] Add HOSTNM runtime and lowering intrinsics implementation (#131910 ) Implement GNU extension intrinsic HOSTNM, both function and subroutine forms. Add HOSTNM documentation to `flang/docs/Intrinsics.md`. Add lowering and semantic unit tests. (This change is modeled after GETCWD implementation.)	2025-03-25 13:17:17 -04:00
vdonaldson	92e0560347	[flang] ieee_denorm (#132307 ) Add support for the nonstandard ieee_denorm exception for real kinds 3, 4, 8 on x86 processors.	2025-03-25 13:02:43 -04:00
Valentin Clement (バレンタインクレメン)	5be9082fed	[flang][cuda] Carry over the dynamic shared memory size to gpu.launch_func (#132837 )	2025-03-24 18:37:19 -07:00
Joseph Huber	ef2735d243	[Flang] Detect endianness in the preprocessor (#132767 ) Summary: Currently we use `TestBigEndian` in CMake to determine endianness. This doesn't work on all platforms and is deprecated since CMake 3.20. Instead of using CMake, we can just use the GNU/Clang preprocessor definitions. The only difficulty is MSVC, mostly because they don't support the same macros. But, as far as I'm aware, MSVC / Windows targets are always little endian, and if not we can just override it for that specific target in the future.	2025-03-24 18:29:05 -05:00
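As an illustration of the approach described in the commit above, here is a minimal sketch of preprocessor-based endianness detection. It is not the code from the patch: the constant `kHostIsBigEndian` and the helper `runtimeIsBigEndian` are hypothetical names. The macros `__BYTE_ORDER__`, `__ORDER_BIG_ENDIAN__` and `__ORDER_LITTLE_ENDIAN__` are predefined by GCC and Clang, and the MSVC branch encodes the assumption stated in the commit message that MSVC targets are little endian.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Compile-time detection from GNU/Clang predefined macros.
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
constexpr bool kHostIsBigEndian = true;
#elif defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
constexpr bool kHostIsBigEndian = false;
#elif defined(_MSC_VER)
// Assumption: MSVC / Windows targets are little endian.
constexpr bool kHostIsBigEndian = false;
#else
#error "Cannot determine host endianness from the preprocessor"
#endif

// Runtime cross-check: inspect the first byte of a known 32-bit pattern.
inline bool runtimeIsBigEndian() {
  const std::uint32_t probe = 0x01020304u;
  unsigned char bytes[sizeof probe];
  std::memcpy(bytes, &probe, sizeof probe);
  return bytes[0] == 0x01;
}
```

The runtime helper exists only to sanity-check the compile-time constant; code that depends on byte order would normally branch on `kHostIsBigEndian` directly.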
Krzysztof Parzyszek	c221d64206	[flang] Remove mentions of evaluate::Variable<T> (#132805 ) The template itself was not defined anywhere. The closest thing was a forward declaration in flang/include/flang/Evaluate/variable.h.	2025-03-24 18:26:57 -05:00
Joseph Huber	85974a0537	[flang-rt] Add experimental support for GPU build (#131826 ) Summary: This patch adds initial support for compiling `flang-rt` directly for the GPU. The method used here matches what's already done for `libc` and `libc++` for the GPU and builds off of those projects. Mainly this requires setting up some flags and setting the sources that currently work. This will deposit the resulting library in the appropriate directory. These files are then intended to be linked via `-Xoffload-linker` support in the offloading driver. ``` lib/clang/21/lib/nvptx64-nvidia-cuda/libflang_rt.runtime.a lib/clang/21/lib/amdgcn-amd-amdhsa/libflang_rt.runtime.a ``` This is obviously missing a lot of functions, mainly the `io` support. Most of what we cannot support is due to using POSIX things that just don't make sense on the GPU. Stuff like `pthreads` or `sema`. Getting unit tests to run on this will also be a challenge. We could run tests the same way we do with `libc`, but the problem there is that the `libc` test suite is freestanding while `gtest` currently doesn't compile on the GPU because it uses a lot of weird stuff. If the unit tests were simply `int main` then it would work. I don't understand the actual runtime code very well, I'd appreciate some guidance on how to actually support Fortran IO from this interface. As I understand it, Fortran IO requires a stack-like operation, which conflicts with the SIMT model GPUs use. Worst case scenario we could burn some LDS to keep a stack, or serialize it somehow since we can always just iterate over all the active lanes. Building this right now looks like this, which depends on the arguments added in https://github.com/llvm/llvm-project/pull/131695.
``` -DRUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES=compiler-rt;libc;libcxx;libcxxabi;flang-rt \ -DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=compiler-rt;libc;libcxx;libcxxabi;flang-rt \ -DRUNTIMES_nvptx64-nvidia-cuda_FLANG_RT_LIBC_PROVIDER=llvm \ -DRUNTIMES_nvptx64-nvidia-cuda_FLANG_RT_LIBCXX_PROVIDER=llvm \ -DRUNTIMES_amdgcn-amd-amdhsa_FLANG_RT_LIBC_PROVIDER=llvm \ -DRUNTIMES_amdgcn-amd-amdhsa_FLANG_RT_LIBCXX_PROVIDER=llvm ```	2025-03-24 08:31:42 -05:00
Leandro Lupori	ef56f4b5a0	[flang][OpenMP] Fix reduction of arrays with non-default lower bounds (#132228 ) Using LoopNest's indices with ShapeShifts that have non-default lower bounds results in accesses to incorrect array elements. To avoid having to adjust each index, a ShapeShift with default lower bounds can be used instead. Fixes #131751	2025-03-24 09:48:41 -03:00
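The indexing mismatch behind the fix above can be shown with a small arithmetic sketch. This is an illustration only, not flang code: `elementOffset` and `firstLoopIndex` are hypothetical names, and the sketch assumes the loop nest produces indices starting at 1 regardless of the array's declared bounds.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative only: zero-based memory offset of element `index` in a
// one-dimensional array declared with lower bound `lowerBound`.
constexpr std::ptrdiff_t elementOffset(std::ptrdiff_t index,
                                       std::ptrdiff_t lowerBound) {
  return index - lowerBound;
}

// Assumption: the loop nest hands out indices 1..extent.
constexpr std::ptrdiff_t firstLoopIndex = 1;
```

With a default lower bound of 1, `elementOffset(firstLoopIndex, 1)` is 0, the first element. With a lower bound of 5, the same loop index gives an offset of -4, which is outside the array; either every index must be shifted by `lowerBound - 1`, or the access must be described with default lower bounds, which is the option the commit chooses.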


10055 Commits