* DescribeIEEESignaledExceptions() is unused on the device - warning.
* StopStatementText() could return while marked noreturn - warning.
* Including cuda/std/complex only in the device compilation
may cause nvcc to try to register variables in `cuda` namespace,
while they are not defined in the host compilation - error.
I decided to include cuda/std/complex always under RT_USE_LIBCUDACXX.
Make some minor tweaks (inlining, caching) to the formatted input path
to improve integer input in a SPEC code. (None of the I/O library has
been tuned yet for performance, and there are some easy optimizations
for common cases.) Input integer values are now calculated with native
C/C++ 128-bit integers.
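A minimal sketch of the 128-bit accumulation idea, not the runtime's actual code: `ParseDecimalInt64` and the type aliases are hypothetical names, and the `__int128` types assume a GCC- or Clang-compatible compiler. Accumulating in 128 bits lets a 64-bit result be range-checked with one comparison per digit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical aliases; __int128 is a GCC/Clang extension, not standard C++.
using Int128 = __int128;
using UInt128 = unsigned __int128;

// Parses an optionally signed decimal field into a 64-bit integer.
// Returns nullopt on an empty field, a non-digit character, or overflow.
std::optional<std::int64_t> ParseDecimalInt64(std::string_view field) {
  std::size_t at = 0;
  while (at < field.size() && field[at] == ' ') {
    ++at; // skip leading blanks
  }
  bool negative = false;
  if (at < field.size() && (field[at] == '+' || field[at] == '-')) {
    negative = field[at] == '-';
    ++at;
  }
  if (at == field.size()) {
    return std::nullopt; // no digits at all
  }
  // Largest magnitude that still fits in int64_t for the given sign.
  const UInt128 limit =
      negative ? (UInt128{1} << 63) : (UInt128{1} << 63) - 1;
  UInt128 value = 0;
  for (; at < field.size(); ++at) {
    const char ch = field[at];
    if (ch < '0' || ch > '9') {
      return std::nullopt;
    }
    // value <= 2^63 here, so 10 * value + 9 cannot wrap in 128 bits.
    value = 10 * value + static_cast<unsigned>(ch - '0');
    if (value > limit) {
      return std::nullopt; // would not fit in 64 bits
    }
  }
  const Int128 result =
      negative ? -static_cast<Int128>(value) : static_cast<Int128>(value);
  return static_cast<std::int64_t>(result);
}
```

Because the check runs after every digit, the accumulator never grows past the 64-bit limit by more than one decimal step, so the 128-bit type itself cannot overflow.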
A benchmark that only reads about 5M lines of three integer values each
speeds up from over 8 seconds to under 3 seconds in my environment with
these changes.
If this works out, the code here can be used to optimize the formatted
input paths for real and character data, too.
Fixes https://github.com/llvm/llvm-project/issues/134026.
Use `cudaMallocAsync` in the `CUFAllocDevice` allocator when asyncId is
provided.
More work is needed to be able to call `cudaFreeAsync`, since the
allocated address and stream need to be tracked.
For Fujitsu test case 0561/0561_0168.f90, adjust both input and output
sides of the extension I (and G) edit descriptors with no width (as
distinct from I0/G0). On input, be sure to halt on a separator character
rather than complaining about an invalid character; on output, be sure
to emit a leading space.
Implement extended intrinsic PUTENV, both function and subroutine forms.
Add PUTENV documentation to flang/docs/Intrinsics.md. Add functional and
semantic unit tests.
The RUNTIME_CHECK in question doesn't allow for the possibility that an
allocatable or pointer component could be processed by defined I/O.
Remove it in favor of a dynamic allocation check.
Fortran::runtime::Descriptor::BytesFor() only works for Fortran
intrinsic types for which a C++ type counterpart exists, so it crashes
on some types that are legitimate Fortran types like REAL(2). Move some
logic from Evaluate into a new header in flang/Common, then use it to
avoid this needless dependence on C++.
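As a rough illustration of sizing an element from its Fortran category and kind rather than from a C++ type, here is a hedged sketch. `TypeCategory` and `ElementBytes` are hypothetical names, not the helper added in flang/Common, and the kind table (REAL(3) as a 2-byte bfloat16, REAL(10) padded to 16 bytes) is an assumption about one target.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical category enum for this sketch only.
enum class TypeCategory { Integer, Real, Complex, Logical };

// Storage size in bytes of one element, computed from the kind alone,
// so that kinds with no C++ counterpart (such as REAL(2)) still work.
// Returns nullopt for kinds this sketch does not know about.
std::optional<std::size_t> ElementBytes(TypeCategory category, int kind) {
  // Assumed REAL kinds: 2 (half), 3 (bfloat16), 4, 8, 10 (padded), 16.
  auto realBytes = [](int k) -> std::optional<std::size_t> {
    switch (k) {
    case 2:
    case 3:
      return std::size_t{2};
    case 4:
      return std::size_t{4};
    case 8:
      return std::size_t{8};
    case 10:
    case 16:
      return std::size_t{16};
    default:
      return std::nullopt;
    }
  };
  switch (category) {
  case TypeCategory::Integer:
    if (kind == 1 || kind == 2 || kind == 4 || kind == 8 || kind == 16) {
      return static_cast<std::size_t>(kind);
    }
    return std::nullopt;
  case TypeCategory::Logical:
    if (kind == 1 || kind == 2 || kind == 4 || kind == 8) {
      return static_cast<std::size_t>(kind);
    }
    return std::nullopt;
  case TypeCategory::Real:
    return realBytes(kind);
  case TypeCategory::Complex:
    // A complex value is stored as two reals of the same kind.
    if (auto part = realBytes(kind)) {
      return 2 * *part;
    }
    return std::nullopt;
  }
  return std::nullopt;
}
```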
Add function and subroutine forms of FSEEK and FTELL as intrinsic
procedures. Accept common aliases from legacy compilers as well.
A separate patch to llvm-test-suite will enable tests for these
procedures once this patch has merged.
Depends on https://github.com/llvm/llvm-project/pull/132423; CI builds
will likely fail until that patch is merged and this PR is rebased.
This PR implements the nonstandard intrinsic time.
In addition to running the unit tests, I also double checked that the
example code works by manually compiling and running it.
This PR adds the intrinsic `unlink` to flang.
## Test plan
- Added two codegen unit tests and ensured flang-check continues to
pass.
- Manually compiled and ran the example from the documentation.
It happened in https://lab.llvm.org/buildbot/#/builders/152/builds/1131
when the buildbot was switched from CTK12.3 to CTK12.8.
The logs are gone by now, so the above link is useless.
The error was:
error: ‘auto’ not permitted in template argument
This workaround helps, but I also reported the issue to NVCC devs.
Add the implementation of the `PERROR(STRING)` intrinsic, a GNU
extension that prints to stderr a newline-terminated error message
corresponding to the last system error, prefixed by `STRING`.
(https://gcc.gnu.org/onlinedocs/gfortran/PERROR.html)
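For reference, the message shape matches what C's `perror` produces: the prefix, a colon and a space, the system error text, then a newline. The sketch below only illustrates that format; `FormatSystemError` and `PrintSystemError` are hypothetical helpers, not the runtime entry point added by this patch.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <string>

// Builds "<prefix>: <system error text>" for a given errno value.
std::string FormatSystemError(const std::string &prefix, int errorNumber) {
  return prefix + ": " + std::strerror(errorNumber);
}

// Prints the message for the current errno to stderr, newline-terminated.
void PrintSystemError(const std::string &prefix) {
  const int saved = errno; // capture before any call that may change it
  std::fprintf(stderr, "%s\n", FormatSystemError(prefix, saved).c_str());
}
```

Capturing `errno` first matters because building the string may itself call library functions that modify it.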
flang/include/flang/Runtime/io-api.h was changed into io-api-consts.h,
then wrapped into a new io-api.h that includes io-api-consts.h, does
some redundant includes and declarations, and then declares the
prototype of one function, InquiryKeywordHashDecode.
Make that function static in io-stmt.cpp prior to its sole call site,
then undo the renaming, to reduce confusion and redundancy.
When building Flang with Clang, we need to do the same quadmath.h
wrapping as we do for flang-rt. I extracted the CMake code
into FlangCommon.cmake and cleaned up the argument passing
to execute_process (note that `-###` was treated as `-` in the original
code, because `#` starts a comment). I believe the Clang command
does not require the input source file, so I removed it as well.
Implement GNU extension intrinsic HOSTNM, both function and subroutine
forms. Add HOSTNM documentation to `flang/docs/Intrinsics.md`. Add
lowering and semantic unit tests.
(This change is modeled after GETCWD implementation.)
Summary:
This patch adds initial support for compiling `flang-rt` directly for
the GPU. The method used here matches what's already done for `libc` and
`libc++` for the GPU and builds off of those projects.
Mainly this requires setting up some flags and setting the sources that
currently work. This will deposit the resulting library in the
appropriate directory. These files are then intended to be linked via
`-Xoffload-linker` support in the offloading driver.
```
lib/clang/21/lib/nvptx64-nvidia-cuda/libflang_rt.runtime.a
lib/clang/21/lib/amdgcn-amd-amdhsa/libflang_rt.runtime.a
```
This is obviously missing a lot of functions, mainly the `io` support.
Most of what we cannot support is due to using POSIX things that just
don't make sense on the GPU. Stuff like `pthreads` or `sema`.
Getting unit tests to run on this will also be a challenge. We could run
tests the same way we do with `libc`, but the problem there is that the
`libc` test suite is freestanding while `gtest` currently doesn't
compile on the GPU because it uses a lot of weird stuff. If the unit
tests were simply `int main` then it would work.
I don't understand the actual runtime code very well, so I'd appreciate
some guidance on how to actually support Fortran IO from this interface.
As I understand it, Fortran IO requires a stack-like operation, which
conflicts with the SIMT model GPUs use. Worst case scenario we could
burn some LDS to keep a stack, or serialize it somehow since we can
always just iterate over all the active lanes.
Building this right now looks like this, which depends on the arguments
added in https://github.com/llvm/llvm-project/pull/131695.
```
-DRUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES=compiler-rt;libc;libcxx;libcxxabi;flang-rt \
-DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=compiler-rt;libc;libcxx;libcxxabi;flang-rt \
-DRUNTIMES_nvptx64-nvidia-cuda_FLANG_RT_LIBC_PROVIDER=llvm \
-DRUNTIMES_nvptx64-nvidia-cuda_FLANG_RT_LIBCXX_PROVIDER=llvm \
-DRUNTIMES_amdgcn-amd-amdhsa_FLANG_RT_LIBC_PROVIDER=llvm \
-DRUNTIMES_amdgcn-amd-amdhsa_FLANG_RT_LIBCXX_PROVIDER=llvm
```
The flang runtime will complain when the number of elements in the two
descriptors involved in the data transfer does not match.
In some cases, we can still perform the data transfer to match the
behavior of the reference compiler.
When the RHS element count is bigger than the LHS element count and
both descriptors are contiguous, we can perform the data transfer with
the bare pointers and the number of bytes from the LHS.
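The fallback can be sketched roughly as follows. This is host-only illustration code: `ArrayView` and `CopyTruncatedToLhs` are hypothetical stand-ins for the runtime descriptors and the transfer routine, `std::memcpy` stands in for the device copy, and the equal-element-size check is an added assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical, simplified view of a descriptor.
struct ArrayView {
  void *base;               // bare pointer to the first element
  std::size_t elements;     // total element count
  std::size_t elementBytes; // size of one element
  bool contiguous;          // true when storage has no gaps
};

// Copies only as many bytes as the LHS holds when the RHS is larger and
// both sides are contiguous. Returns false when this fallback does not
// apply, in which case the caller would report the mismatch as before.
bool CopyTruncatedToLhs(const ArrayView &lhs, const ArrayView &rhs) {
  if (rhs.elements <= lhs.elements) {
    return false; // not the "RHS bigger than LHS" case
  }
  if (!lhs.contiguous || !rhs.contiguous) {
    return false; // bare-pointer copy needs contiguous storage
  }
  if (lhs.elementBytes != rhs.elementBytes) {
    return false; // assumption of this sketch: same element type
  }
  std::memcpy(lhs.base, rhs.base, lhs.elements * lhs.elementBytes);
  return true;
}
```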
We don't really have unit tests set up for data transfer, which is why I
didn't include one here.
This API will be used for copying non-contiguous arrays
into contiguous temporaries to support `-frepack-arrays`.
The builder factory API will be used in the following commits.
I want to be able to check if the storage is contiguous
in the innermost dimension, so I decided to add an entry point
that takes `dim` as the number of leading dimensions to check.
It seems that a runtime call might result in less code size
even when `dim` is 1, so here it is.
For opt-for-speed I am going to inline it in FIR.
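A hedged sketch of what such a check could look like, using a simplified dimension record instead of the real descriptor. `Dim` and `IsContiguousUpTo` are hypothetical names; dimensions are listed innermost first, following Fortran's column-major layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified per-dimension information.
struct Dim {
  std::int64_t extent;     // number of elements along this dimension
  std::int64_t byteStride; // distance in bytes between consecutive elements
};

// Returns true when the first `dim` dimensions (innermost first) are laid
// out without gaps. Passing dim == 1 checks only the innermost dimension.
bool IsContiguousUpTo(const std::vector<Dim> &dims, std::int64_t elementBytes,
                      std::size_t dim) {
  for (const Dim &d : dims) {
    if (d.extent == 0) {
      return true; // an empty array is trivially contiguous
    }
  }
  const std::size_t checked = dim < dims.size() ? dim : dims.size();
  std::int64_t expectedStride = elementBytes;
  for (std::size_t j = 0; j < checked; ++j) {
    // The stride of an extent-1 dimension is never used, so skip it.
    if (dims[j].extent != 1 && dims[j].byteStride != expectedStride) {
      return false;
    }
    expectedStride *= dims[j].extent;
  }
  return true;
}
```

For a 3x4 array of 4-byte elements whose columns are padded to 24 bytes, the check passes for `dim` = 1 and fails for `dim` = 2, which is the distinction `-frepack-arrays` would care about.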
Depends on #131047.
When compiling Flang-RT with Clang, query Clang for the GCC installation
it uses. If found, create `quadmath_wrapper.h` that points to the
`quadmath.h` of that GCC installation.
`quadmath.h` is only available when compiling with gcc, and Clang has no
equivalent even though gcc's version compiles fine with Clang (at least
up to and including gcc 13). It is installed into gcc's
installation resource dir (in contrast to a system-wide include directory
such as `/usr/include` or `/usr/local/include`) and is therefore not
available to any compiler other than the gcc of that installation.
quadmath may also be a different OS package than gcc itself, so it is
not necessarily present.
Clang actually already appropriates a GCC installation for its libraries
such that `libquadmath.a` is already found, but it does not do so for
the include paths. Because adding that directory to the header search
path may have wide-reaching consequences, we create only a wrapper header
that points to the real `quadmath.h` in the same GCC installation that
Clang uses.
Pointer allocation is done through `AllocateValidatedPointerPayload`.
This function was not updated to use the allocators registered in the
descriptor to perform the allocation. This patch makes use of the
allocator.
The footer word is neither set nor checked for allocators other than the
default one. Support will likely come in a follow-up patch, but this
will require more functions to be registered in order to set and
get the footer value when the allocation is on the device.
When reading an unformatted sequential file with variable-length
records, detect byte order reversal problems with the first record's
header and footer words, and emit a more detailed error message.
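One way such a check could work is sketched below. It assumes each record is framed by a 4-byte length word before and after the payload; that framing, and all names here (`HeaderCheck`, `CheckFirstRecord`, `MakeRecord`), are assumptions of this sketch rather than the runtime's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

enum class HeaderCheck { Ok, ByteSwapped, Corrupt };

// Reverses the byte order of a 32-bit word.
std::uint32_t ByteSwap32(std::uint32_t x) {
  return (x >> 24) | ((x >> 8) & 0x0000FF00u) | ((x << 8) & 0x00FF0000u) |
      (x << 24);
}

// Helper for experiments: one record with the same marker word stored
// (in native byte order) before and after a zero-filled payload.
std::vector<unsigned char> MakeRecord(
    std::uint32_t marker, std::size_t payloadBytes) {
  std::vector<unsigned char> file(payloadBytes + 8, 0);
  std::memcpy(file.data(), &marker, 4);
  std::memcpy(file.data() + 4 + payloadBytes, &marker, 4);
  return file;
}

// Classifies the first record: consistent as read, consistent only when the
// length is byte-swapped (likely written with the other endianness), or
// neither.
HeaderCheck CheckFirstRecord(const std::vector<unsigned char> &file) {
  auto readWord = [&file](std::uint64_t offset, std::uint32_t &word) {
    if (offset + 4 > file.size()) {
      return false; // word would run past the end of the file
    }
    std::memcpy(&word, file.data() + offset, 4);
    return true;
  };
  std::uint32_t header = 0;
  if (!readWord(0, header)) {
    return HeaderCheck::Corrupt;
  }
  // The footer sits right after the payload and must repeat the header.
  auto footerMatches = [&](std::uint32_t length) {
    std::uint32_t footer = 0;
    return readWord(4 + std::uint64_t{length}, footer) && footer == header;
  };
  if (footerMatches(header)) {
    return HeaderCheck::Ok;
  }
  if (footerMatches(ByteSwap32(header))) {
    return HeaderCheck::ByteSwapped;
  }
  return HeaderCheck::Corrupt;
}
```

A `ByteSwapped` result is the case where a more specific message (for example, suggesting a byte-order conversion setting) is more helpful than a generic "bad record" error.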
On non-Windows platforms, also create a dynamic library version of
the runtime. The build of each version of the library can be switched on
using FLANG_RT_ENABLE_STATIC=ON and FLANG_RT_ENABLE_SHARED=ON, respectively.
Default is to build only the static library, consistent with previous
behaviour. This is because the way the flang driver invokes the linker,
most linkers choose the dynamic library by default, if available.
Building the dynamic library therefore causes flang-built executables to
depend on `libflang_rt.so`, unless explicitly told otherwise.
Extract Flang's runtime library to use the LLVM_ENABLE_RUNTIMES
mechanism. It will only become active when
`LLVM_ENABLE_RUNTIMES=flang-rt` is used, which also changes the
`FLANG_INCLUDE_RUNTIME` to `OFF` so the old runtime build rules do not
conflict. This also means that unless `LLVM_ENABLE_RUNTIMES=flang-rt` is
passed, nothing changes with the current build process.
Motivation:
* Consistency with LLVM's other runtime libraries (compiler-rt, libc,
libcxx, openmp offload, ...)
* Allows compiling the runtime for multiple targets at once using the
LLVM_RUNTIME_TARGETS configuration options
* Installs the runtime into the compiler's per-target resource directory
so it can be automatically found even when cross-compiling
Also see RFC discussion at
https://discourse.llvm.org/t/rfc-use-llvm-enable-runtimes-for-flangs-runtime/80826
There seem to be multiple declarations of __libcpp_verbose_abort, some
with noexcept and some without. Revert to the previous
forward declaration (without noexcept), which seems to have worked
before.
Mostly mechanical changes in preparation of extracting the Flang-RT
"subproject" in #110217. This PR intends to only move pre-existing files
to the new folder structure, with no behavioral change. Common files
(headers, testing, cmake) shared by Flang-RT and Flang remain in
`flang/`.
Some cosmetic changes and file path updates were necessary:
* Relative paths to the new path for the source files and
`add_subdirectory`.
* Add the new location's include directory to `include_directories`
* The unittest/Evaluate directory has unit tests for flang-rt and Flang. A
new `CMakeLists.txt` was introduced for the flang-rt tests.
* Change the `#include` paths relative to the include directive
* clang-format on the `#include` directives
* Since the paths are part of the copyright header and include guards, a
script was used to canonicalize those
* `test/Runtime` and runtime tests in `test/Driver` are moved, but the
lit.cfg.py mechanism to execute them will only be added in #110217.