Split some headers into headers for public and private declarations in
preparation for #110217. Moving the runtime-private headers in
runtime-private include directory will occur in #110298.
* Do not use `sizeof(Descriptor)` in the compiler. The size of the
descriptor is target-dependent while `sizeof(Descriptor)` is the size of
the Descriptor for the host platform which might be too small when
cross-compiling to a different platform. Another problem is that the
emitted assembly ((cross-)compiling to the same target) is not identical
between Flang's running on different systems. Moving the declaration of
`class Descriptor` out of the included header will also reduce the
amount of #included sources.
* Do not use `sizeof(ArrayConstructorVector)` and
`alignof(ArrayConstructorVector)` in the compiler. Same reason as with
`Descriptor`.
* Compute the descriptor's extra flags without instantiating a
Descriptor. `Fortran::runtime::Descriptor` is defined in the runtime
source, but not the compiler source.
* Move `InquiryKeywordHashDecode` into runtime-private header. The
function is defined in the runtime sources and trying to call it in the
compiler would lead to a link-error.
* Move allocator-kind magic numbers into common header. They are the
only declarations out of `allocator-registry.h` in the compiler as well.
This does not make Flang cross-compile ready yet, the main goal is to
avoid transitive header dependencies from Flang to clang-rt. There are
more assumptions that host platform is the same as the target platform.
Fortran.h and target.h are defining symbols where some are used by both, the Fortran runtime (Flang-RT) and Fortran compiler (Flang), and others are used by Flang only. With the upcoming refactoring of the Fortran runtime into its own subproject (#110217), move the declarations that are used by both into new headers to minimize the amount of code that will need to be shared by Flang-RT and Flang.
Details:
* `Fortran.h`: Flang-RT only uses some enum definitions out of this file, but not `AsFortran` which is defined in `Fortran.cpp`. Moving the enums into `Fortran-consts.h` allows keeping `Fortran.cpp` within Flang.
* `target.h`: Contains some floating-point definitions that is used by the non-GTest unittests in `fp-testing.h`. Flang-RT also uses some non-GTest as well. Moving those definitions avoids the dependence on the entire FortranEvaluate library.
This is a patch in preparation for the support stream ordered memory
allocator in CUDA Fortran.
This patch adds an asynchronous id to the AllocatableAllocate runtime
function and to Descriptor::Allocate so it can be passed down to the
registered allocator. It is up to the allocator to use this value or
not.
A follow up patch will implement that asynchronous allocator for CUDA
Fortran.
One of the execute_command_line tests currently runs `cat` on an invalid
file and checks its return value, but since we don't control `cat` or
the user's path, the return value might not be reliably stable on a
per-platform basis. For example, if `git` is installed on Windows in
certain configurations it adds a directory to the path containing a
`cat` with a different set of error codes to the default Windows one.
This patch changes the test to use the `not` binary built by LLVM for
testing purposes, which should always return 1 on any platform
regardless of the user's environment.
GETUID and GETGID are non-standard intrinsics supported by a number of
other Fortran compilers. On supported platforms these intrinsics simply
call the POSIX getuid() and getgid() functions and return the result.
The only platform we support that does not have these is Windows.
Windows does not have the same concept of UIDs and GIDs, so on Windows
we issue a warning indicating this and return 1 from both functions.
Co-authored-by: Yi Wu <yi.wu2@arm.com>
GETUID and GETGID are non-standard intrinsics supported by a number of
other Fortran compilers. On supported platforms these intrinsics simply
call the POSIX getuid() and getgid() functions and return the result.
The only platform we support that does not have these is Windows.
Windows does not have the same concept of UIDs and GIDs, so on Windows
we issue a warning indicating this and return 1 from both functions.
Co-authored-by: Yi Wu <yi.wu2@arm.com>
---------
Co-authored-by: Yi Wu <yi.wu2@arm.com>
When compiling on aarch64 some `LDBL_MANT_DIG == 113` entries
end up trying to use `complex<long double>` for which there are
no certain specializations in `libcudacxx`. This change-set
includes a clean-up for `LDBL_MANT_DIG == 113` usage, which is replaced
with `HAS_LDBL128` that is set in `float128.h`.
This patch adds new runtime entry points that perform the simple
allocation/deallocation of module allocatable variable with cuda
attributes.
When the allocation is initiated on the host, the descriptor on the
device is synchronized. Both descriptors point to the same data on the
device.
This is the first PR of a stack.
`std::complex` operators do not work for the CUDA device compilation
of F18 runtime. This change makes use of `cuda::std::complex` from
`libcudacxx`.
`cuda::std::complex` does not have specializations for `long double`,
so the change is accompanied with a clean-up for `long double` usage.
Additional change on top of #109078 is to use `cuda::std::complex`
only for the device compilation, otherwise the host compilation
fails because `libcudacxx` may not support `long double` specialization
at all (depending on the compiler).
`std::complex` operators do not work for the CUDA device compilation
of F18 runtime. This change makes use of `cuda::std::complex` from
`libcudacxx`.
`cuda::std::complex` does not have specializations for `long double`,
so the change is accompanied with a clean-up for `long double` usage.
As specified in the docs,
1) raw_string_ostream is always unbuffered and
2) the underlying buffer may be used directly
( 65b13610a5226b84889b923bae884ba395ad084d for further reference )
Avoid unneeded calls to raw_string_ostream::str(), to avoid excess indirection.
A few other Fortran compilers silently accept real values for integer
variables in NAMELIST input. Handling an exponent would be difficult,
but it's easy to skip and ignore a fractional part when one is present.
Add runtime APIs for the intrinsic function SPACING for REAL kinds 2 & 3
in two ways: Spacing2 (& 3) for build environments with std::float16_t,
and Spacing2By4 (& 3By4) variants (for any build environment) which
compute SPACING for those types but accept and return their values as
32-bit floats.
SPACING for REAL(2) is needed by HDF5.
While experimenting with some more recent C++ features, I ran into
trouble with warnings from GCC 12.3.0 and 14.2.0. These warnings looked
legitimate, so I've tweaked the code to avoid them.
CUDA Fortran is meant to be an equivalent to the runtime API. Therefore,
it makes more sense to use the cuda rt API in the allocators for CUF.
@bdudleback
This patch adds entry point in the runtime to be able to allocate
descriptors in managed memory. These entry points currently only call
`CUFAllocManaged` and `CUFFreeManaged` but could be more complicated in
the future.
`cuf.alloc` and `cuf.free` related to local descriptors are converted
into runtime calls.
Add allocators for CUDA fortran allocation on the device. 3 allocators
are added for pinned, device and managed/unified memory allocation.
`CUFRegisterAllocator()` is called to register the allocators in the
allocator registry added in #100690.
Since this require CUDA, a cmake option `FLANG_CUF_RUNTIME` is added to
conditionally build these.
IEEE_ARITHMETIC intrinsic module procedures IEEE_NEXT_AFTER,
IEEE_NEXT_DOWN, and IEEE_NEXT_UP, and intrinsic NEAREST return larger or
smaller values adjacent to their primary REAL argument. The four
procedures vary in how the direction is chosen, in how special cases are
treated, and in what exceptions are generated. Implement the three
IEEE_ARITHMETIC procedures. Update the NEAREST implementation to support
all six REAL kinds 2,3,4,8,10,16, and fix several bugs.
IEEE_NEXT_AFTER(X,Y) returns a NaN when Y is a NaN as that seems to be
the universal choice of other compilers.
Change the front end compile time implementation of these procedures to
return normal (HUGE) values for infinities when applicable, rather than
always returning the input infinity.
Device compilation is much faster for separate MATMUL[_TRANPOSE]
entries than for a single one that covers all data types.
The lowering changes and the removal of the generic entries will follow.
The fix broke llvm-test-suite, so it was reverted previously. With test
fixes added in https://github.com/llvm/llvm-test-suite/pull/137, it
should now pass the tests
This reverts commit 435635652fd226fa292abcff6a10d3df9dbd74e3.
This PR adds -mtune as a valid flang flag and passes the information
through to LLVM IR as an attribute on all functions. No specific
architecture optimizations are added at this time.
…ne continuation
Allow preprocessing directives to appear between a source line and its
continuation, including conditional compilation directives (#if, #ifdef,
&c.).
Fixes https://github.com/llvm/llvm-project/issues/95476.
Accommodate operations with VALUE dummy arguments in the runtime support
for the REDUCE intrinsic function by splitting most entry points into
Reduce...Ref and Reduce...Value variants.
Further work will be needed in lowering to call the ...Value entry
points.