* DescribeIEEESignaledExceptions() is unused on the device - warning.
* StopStatementText() could return while marked noreturn - warning.
* Including cuda/std/complex only in the device compilation
may cause nvcc to try to register variables in `cuda` namespace,
while they are not defined in the host compilation - error.
I decided to include cuda/std/complex always under RT_USE_LIBCUDACXX.
Make some minor tweaks (inlining, caching) to the formatting input path
to improve integer input in a SPEC code. (None of the I/O library has
been tuned yet for performance, and there are some easy optimizations
for common cases.) Input integer values are now calculated with native
C/C++ 128-bit integers.
A benchmark that only reads about 5M lines of three integer values each
speeds up from over 8 seconds to under 3 in my environment with these
changeds.
If this works out, the code here can be used to optimize the formatted
input paths for real and character data, too.
Fixes https://github.com/llvm/llvm-project/issues/134026.
Use `cudaMallocAsync` in the `CUFAllocDevice` allocator when asyncId is
provided.
More work is needed to be able to call `cudaFreeAsync` since the
allocated address and stream needs to be tracked.
For Fujitsu test case 0561/0561_0168.f90, adjust both input and output
sides of the extension I (and G) edit descriptors with no width (as
distinct from I0/G0). On input, be sure to halt on a separator character
rather than complaining about an invalid character; on output, be sure
to emit a leading space.
Implement extended intrinsic PUTENV, both function and subroutine forms.
Add PUTENV documentation to flang/docs/Intrinsics.md. Add functional and
semantic unit tests.
The RUNTIME_CHECK in question doesn't allow for the possibility that an
allocatable or pointer component could be processed by defined I/O.
Remove it in favor of a dynamic allocation check.
Fortran::runtime::Descriptor::BytesFor() only works for Fortran
intrinsic types for which a C++ type counterpart exists, so it crashes
on some types that are legitimate Fortran types like REAL(2). Move some
logic from Evaluate into a new header in flang/Common, then use it to
avoid this needless dependence on C++.
Add function and subroutine forms of FSEEK and FTELL as intrinsic
procedures. Accept common aliases from legacy compilers as well.
A separate patch to llvm-test-suite will enable tests for these
procedures once this patch has merged.
Depends on https://github.com/llvm/llvm-project/pull/132423; CI builds
will likely fail until that patch is merged and this PR is rebased.
This PR implements the nonstandard intrinsic time.
In addition to running the unit tests, I also double checked that the
example code works by manually compiling and running it.
This PR adds the intrinsic `unlink` to flang.
## Test plan
- Added two codegen unit tests and ensured flang-check continues to
pass.
- Manually compiled and ran the example from the documentation.
This PR is to improve the driver code to build `flang-rt` path by
re-using the logic and code of `compiler-rt`.
1. Moved `addFortranRuntimeLibraryPath` and `addFortranRuntimeLibs` to
`ToolChain.h` and made them virtual so that they can be overridden if
customization is needed. The current implementation of those two
procedures is moved to `ToolChain.cpp` as the base implementation to
default to.
2. Both AIX and PPCLinux now override `addFortranRuntimeLibs`.
The overriding function of `addFortranRuntimeLibs` for both AIX and
PPCLinux calls `getCompilerRTArgString` => `getCompilerRT` =>
`buildCompilerRTBasename` to get the path to `flang-rt`. This code
handles `LLVM_ENABLE_PER_TARGET_RUNTIME_DIR` setting. As shown in
`PPCLinux.cpp`, `FT_static` is the default. If not found, it will search
and build for `FT_shared`. To differentiate `flang-rt` from `clang-rt`,
a boolean flag `IsFortran` is passed to the chain of functions in order
to reach `buildCompilerRTBasename`.
It happened in https://lab.llvm.org/buildbot/#/builders/152/builds/1131
when the buildbot was switched from CTK12.3 to CTK12.8.
The logs are gone by now, so the above link is useless.
The error was:
error: ‘auto’ not permitted in template argument
This workaround helps, but I also reported the issue to NVCC devs.
Add the implementation of the `PERROR(STRING) ` intrinsic from the GNU
Extension to prints on the stderr a newline-terminated error message
corresponding to the last system error prefixed by `STRING`.
(https://gcc.gnu.org/onlinedocs/gfortran/PERROR.html)
flang/include/flang/Runtime/io-api.h was changed into io-api-consts.h,
then wrapped into a new io-api.h that includes io-api-consts.h, does
some redundant includes and declarations, and then declares the
prototype of one function, InquiryKeywordHashDecode.
Make that function static in io-stmt.cpp prior to its sole call site,
then undo the renaming, to reduce confusion and redundancy.
The production buildbot master apparently has not yet been restarted
since https://github.com/llvm/llvm-zorg/pull/393 landed.
This reverts commit 96d1baedefc3581b53bc4389bb171760bec6f191.
Remove the FLANG_INCLUDE_RUNTIME option which was replaced by
LLVM_ENABLE_RUNTIMES=flang-rt.
The FLANG_INCLUDE_RUNTIME option was added in #122336 which disables the
non-runtimes build instructions for the Flang runtime so they do not
conflict with the LLVM_ENABLE_RUNTIMES=flang-rt option added in #110217.
In order to not maintain multiple build instructions for the same thing,
this PR completely removes the old build instructions (effectively
forcing FLANG_INCLUDE_RUNTIME=OFF).
As per discussion in
https://discourse.llvm.org/t/buildbot-changes-with-llvm-enable-runtimes-flang-rt/83571/2
we now implicitly add LLVM_ENABLE_RUNTIMES=flang-rt whenever Flang is
compiled in a bootstrapping (non-standalone) build. Because it is
possible to build Flang-RT separately, this behavior can be disabled
using `-DFLANG_ENABLE_FLANG_RT=OFF`. Also see the discussion an
implicitly adding runtimes/projects in #123964.
When building Flang with Clang, we need to do the same quadmath.h
wrapping as we do for flang-rt. I extracted the CMake code
into FlangCommon.cmake, and cleaned up the arguments passing
to execute_process (note that `-###` was treated as `-` in the original
code, because `#` starts a comment). I believe the Clang command
does not require the input source file, so I removed it as well.
Implement GNU extension intrinsic HOSTNM, both function and subroutine
forms. Add HOSTNM documentation to `flang/docs/Intrinsics.md`. Add
lowering and semantic unit tests.
(This change is modeled after GETCWD implementation.)
Summary:
This patch adds initial support for compiling `flang-rt` directly for
the GPU. The method used here matches what's already done for `libc` and
`libc++` for the GPU and builds off of those projects.
Mainly this requires setting up some flags and setting the sources that
currently work. This will deposit the resulting library in the
appropriate directory. These files are then intended to be linked via
`-Xoffload-linker` support in the offloading driver.
```
lib/clang/21/lib/nvptx64-nvidia-cuda/libflang_rt.runtime.a
lib/clang/21/lib/amdgcn-amd-amdhsa/libflang_rt.runtime.a
```
This is obviously missing a lot of functions, mainly the `io` support.
Most of what we cannot support is due to using POSIX things that just
don't make sense on the GPU. Stuff like `pthreads` or `sema`.
Getting unit tests to run on this will also be a challenge. We could run
tests the same way we do with `libc`, but the problem there is that the
`libc` test suite is freestanding while `gtest` currently doesn't
compile on the GPU bcause it uses a lot of weird stuff. If the unit
tests were simply `int main` then it would work.
I don't understand the actual runtime code very well, I'd appreciate
some guidance on how to actually support Fortran IO from this interface.
As I understand it, Fortran IO requires a stack-like operation, which
conflicts with the SIMT model GPUs use. Worst case scenario we could
burn some LDS to keep a stack, or serialize it somehow since we can
always just iterate over all the active lanes.
Building this right now looks like this, which depends on the arguments
added in https://github.com/llvm/llvm-project/pull/131695.
```
-DRUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES=compiler-rt;libc;libcxx;libcxxabi;flang-rt \
-DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=compiler-rt;libc;libcxx;libcxxabi;flang-rt \
-DRUNTIMES_nvptx64-nvidia-cuda_FLANG_RT_LIBC_PROVIDER=llvm \
-DRUNTIMES_nvptx64-nvidia-cuda_FLANG_RT_LIBCXX_PROVIDER=llvm \
-DRUNTIMES_amdgcn-amd-amdhsa_FLANG_RT_LIBC_PROVIDER=llvm \
-DRUNTIMES_amdgcn-amd-amdhsa_FLANG_RT_LIBCXX_PROVIDER=llvm
```
Summary:
This patch adds an interface that uses an in-tree build of LLVM's libc
and libc++.
This is done using the `-DFLANG_RT_LIBC_PROVIDER=llvm` and
`-DFLANG_RT_LIBCXX_PROVIDER=llvm` options. Using `libc` works in terms
of CMake, but the LLVM libc is not yet complete enough to compile all
the files.
The flang runtime will complain when the number of elements in the two
descriptors involved in the data transfer are not matching.
In some cases, we can still perform the data transfer to match the
behavior of the reference compiler.
When the RHS elements count is bigger than the LHS elements count and
both descriptors are contiguous, we can perform the data transfer with
the bare pointers and the number of bytes from the LHS.
We don't really have unit tests set up for data transfer, this is why I
didn't include one here.
This reverts commit 95d28fe503cc3d2bc0bb980442d3defaf199ea5a.
I did not fully realize the implications of this change when reviewing.
With how it is set up currently, it causes clang and all of the runtimes
to be built and tested everytime a change to MLIR is made. This is a
large regression in build/test time, which seems to have been causing
large queueing delays.
Reverting for now. Once we rework the runtimes build for premerge (which
I hope to do soon, ideally in the next week), I will make sure flang-rt
gets added in.
This API will be used for copying non-contiguous arrays
into contiguous temporaries to support `-frepack-arrays`.
The builder factory API will be used in the following commits.
I keep getting these warnings when building with clang-17:
`warning: 'StaticDescriptor' may not intend to support class template
argument deduction [-Wctad-maybe-unsupported]`
This change should help avoiding them.
I want to be able to check if the storage is contiguous
in the innermost dimension, so I decided to add an entry point
that takes `dim` as the number of leading dimensions to check.
It seems that a runtime call might result in less code size
even when `dim` is 1, so here it is.
For opt-for-speed I am going to inline it in FIR.
Depends on #131047.
Flang's runtime can now be built using LLVM's LLVM_ENABLE_RUNTIMES
mechanism, with the intent to remove the old mechanism in #124126.
Update the pre-merge builders to use the new mechanism.
In the current form, #124126 actually will add
LLVM_ENABLE_RUNTIMES=flang-rt implicitly, so no change is strictly
needed. I still think it is a good idea to do it explicitly and in
advance.
On Windows, flang-rt also requires compiler-rt, but which is not
building on Windows anyway.
I thought I guessed a fix in #130836, but I was wrong.
We actually had the same code in
`flang/cmake/modules/FlangCommon.cmake`.
The check does not pass in flang-rt bootstrap build, because
`-nostdinc++` is added for all `runtimes` checks.
I decided to make the check with the C header, though, I am still
unsure whether it is reliable with a clang that has not been
installed (it is taken from the build structure during flang-rt
configure step).
I verified that this PR enables REAL(16) math entries on aarch64.