Summary:
Previously, we removed the special handling for the code object version
global. I erroneously thought that this meant we cold get rid of this
weird `-Xclang` option. However, this also emits an LLVM IR module flag,
which will then cause linking issues.
The try-compile mechanism requires that `CMAKE_REQUIRED_FLAGS` is a
space-separated string instead of a list of flags. The original code
expanded `BUILTIN_FLAGS` into `CMAKE_REQUIRED_FLAGS` as a
space-separated string and then would overwrite `CMAKE_REQUIRED_FLAGS`
with `TARGET_${arch}_CFLAGS` prepended to the unexpanded
`BUILTIN_CFLAGS_${arch}`. This resulted in the first two arguments being
passed into the try-compile invocation, but dropping the other arguments
listed in `BUILTIN_CFLAGS_${arch}`.
This patch appends `TARGET_${arch}_CFLAGS` and `BUILTIN_CFLAGS_${arch}` to
`CMAKE_REQUIRED_FLAGS` before expanding CMAKE_REQUIRED_FLAGS as a
space-separated string. This passes any pre-set required flags, in addition to
all of the builtin and target flags to the Float16 detection.
Summary:
When we were first porting to COV5, this lead to some ABI issues due to
a change in how we looked up the work group size. Bitcode libraries
relied on the builtins to emit code, but this was changed between
versions. This prevented the bitcode libraries, like OpenMP or libc,
from being used for both COV4 and COV5. The solution was to have this
'none' functionality which effectively emitted code that branched off of
a global to resolve to either version.
This isn't a great solution because it forced every TU to have this
variable in it. The patch in
https://github.com/llvm/llvm-project/pull/131033 removed support for
COV4 from OpenMP, which was the only consumer of this functionality.
Other users like HIP and OpenCL did not use this because they linked the
ROCm Device Library directly which has its own handling (The name was
borrowed from it after all).
So, now that we don't need to worry about backward compatibility with
COV4, we can remove this special handling. Users can still emit COV4
code, this simply removes the special handling used to make the OpenMP
device runtime bitcode version agnostic.
Our platform has some constraints that allow us to make assumptions that
aren't generally applicable to other platforms. We keep an entirely separate
.s file for the routines.
According to the conversation
[here](https://github.com/llvm/llvm-project/pull/119414#issuecomment-2536495859),
some platforms don't enable `__arm_cpu_features` with a global
constructor, but rather do so lazily when called from the FMV resolver.
PR #119414 removed the CMake guard to check to see if the targetted
platform is baremetal or supports sys/auxv. Without this check, the
routines rely on `__arm_cpu_features` being initialised when they may
not be, depending on the platform.
This PR simply avoids building the SME routines for those platforms for
now.
When #92921 added the `__arm_get_current_vg` functionality, it used the
FMV feature bits mechanism rather than the mechanism that was previously
added for SME which called `getauxval` on Linux platforms or
`__aarch64_sme_accessible` required for baremetal libraries. It is
better to always use `__aarch64_cpu_features`.
For baremetal we still need to rely on `__arm_sme_accessible` to
initialise the struct.
Given that FMV support is required for the SME builtins to be built, the
FMV constructor as defined in:
compiler-rt/lib/builtins/cpu_model/aarch64.c
already initialises the feature bits, so there's no need to create
another one.
It turns out we were not passing -m32 to the check_c_source_compiles()
invocation since CMAKE_REQUIRE_FLAGS needs to be string separated list
and
we were passing a ;-separated CMake list which appears to be parsed by
CMake as 'ignore all arguments beyond the first'.
Fix this by transforming the list to a command line first.
With this change, Clang 17 no longer claims to support _Float16 for
i386.
Issue: #105181
extendhfxf2 calls extendhfXfy to convert _Float16 to double, then type
casts this converted value to long double.
__uint128_t may not be available on all architectures. Thus I din't use
extendhfXfy to widen precision to 128 bits.
This fixes#108936, but the calling convention doesn't match with GCC. I
doubt we have such a lib function for now, so leave the calling
convention as is.
The CMake docs state that `check_c_source_compiles()` checks whether the
supplied code "can be compiled as a C source file and linked as an
executable (so it must contain at least a `main()` function)."
https://cmake.org/cmake/help/v3.30/module/CheckCSourceCompiles.html
In practice, this command is a wrapper around `try_compile()`:
- 2904ce00d2/Modules/CheckCSourceCompiles.cmake (L54)
- 2904ce00d2/Modules/Internal/CheckSourceCompiles.cmake (L101)
When `CMAKE_SOURCE_DIR` is compiler-rt/lib/builtins/,
`CMAKE_TRY_COMPILE_TARGET_TYPE` is set to `STATIC_LIBRARY`, so the
checks for `float16` and `bfloat16` support work as intended in a
Clang + compiler-rt runtime build for example, as it runs CMake
recursively from that directory.
However, when using llvm/ or compiler-rt/ as CMake source directory, as
`CMAKE_TRY_COMPILE_TARGET_TYPE` defaults to `EXECUTABLE`, these checks
will indeed fail if the code doesn't have a `main()` function. This
results in LLVM using x86 SIMD registers when generating calls to
builtins that, with Arch Linux's compiler-rt package for example,
actually use a GPR for their argument or return value as they use
`uint16_t` instead of `_Float16`.
This had been caught in post-commit review:
https://reviews.llvm.org/D145237#4521152. Use of the internal
`CMAKE_C_COMPILER_WORKS` variable is not what hides the issue, however.
PR #69842 tried to fix this by unconditionally setting
`CMAKE_TRY_COMPILE_TARGET_TYPE` to `STATIC_LIBRARY`, but it apparently
caused other issues, so it was reverted. This PR just adds a `main()`
function in the checks, as per the CMake docs.
Changes included:
- Adding CONSTRUCTOR_ATTRIBUTE so that the static data is setup early on
in process lifetime. This is required by gcc docs for
__builtin_cpu_supports which we hope to implement in terms of this.
- Move the length initialization outside of the #if defined(linux) block
so that the length field always reflects the size of the structures even
if non of the feature bits are non-zero.
- Change the __riscv_vendor_feature_bits.length field to match the
length of the actual structure.
Note: Copy from https://github.com/llvm/llvm-project/pull/99958
---------
Co-authored-by: Philip Reames <preames@rivosinc.com>
A number of streaming-compatible versions of standard C functions
were added to compiler-rt, however there are already optimised
versions of most of these in libc which are valid in streaming-SVE
mode. This patch replaces the implementations of __arm_sc_mem* with
these versions where possible.
Base on https://github.com/riscv-non-isa/riscv-c-api-doc/pull/74, this
patch defines the `__riscv_feature_bits` and
`__riscv_vendor_feature_bits` structures to store the enabled feature
bits at runtime.
It also introduces the `__init_riscv_feature_bits` function to update
these structures based on the platform query mechanism.
Additionally, the groupid/bitmask definitions from
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/74 are declared
and used to update the `__riscv_feature_bits` and
`__riscv_vendor_feature_bits` structures.
---------
Co-authored-by: Kito Cheng <kito.cheng@gmail.com>
Summary:
This patch adds initial support to build the `builtins` library for GPU
targets. Primarily this requires adding a few new architectures for
`amdgcn` and `nvptx64`. I built this using the following invocations.
```console
$ cmake ../compiler-rt -DCMAKE_C_COMPILER=clang
-DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Release -GNinja
-DCMAKE_C_COMPILER_TARGET=<nvptx64-nvidia-cuda|amdgcn-amd-amdhsa>
-DCMAKE_CXX_COMPILER_TARGET=<nvptx64-nvidia-cuda|amdgcn-amd-amdhsa>
-DCMAKE_C_COMPILER_WORKS=1 -DCMAKE_CXX_COMPILER_WORKS=1
-DLLVM_CMAKE_DIR=../cmake/Modules -DCOMPILER_RT_DEFAULT_TARGET_ONLY=ON
-C ../compiler-rt/cmake/caches/GPU.cmake
```
Some pointers would be appreciated for how to test this using a standard
(non-default target only) build.
GPU builds are somewhat finnicky. We only expect this to be built with a
sufficiently new clang, as it's the only compiler that supports the
target and output we distribute. Distribution is done as LLVM-IR blobs
for now.
GPUs have little backward compatibility, so linking object files is
left to a future patch.
More work is necessary to build correctly for all targets and ship into
the correct clang resource directory. Additionally we need to use the
`libc` project's support for running unit tests.
When an uninstrumented libatomic is used with a TSan instrumented
memcpy, TSan may report a data race in circumstances where writes are
arguably safe.
This occurs because __atomic_compare_exchange won't be instrumented in
an uninstrumented libatomic, so TSan doesn't know that the subsequent
memcpy is race-free.
On the other hand, pthread_mutex_(un)lock will be intercepted by TSan,
meaning an uninstrumented libatomic will not report this false-positive.
pthread_mutexes also may try a number of different strategies to acquire
the lock, which may bound the amount of time a thread has to wait for a
lock during contention.
While pthread_mutex_lock has a larger overhead (due to the function call
and some dispatching), a dispatch to libatomic already predicates a lack
of performance guarantees.
Update the folder titles for targets in the monorepository that have not
seen taken care of for some time. These are the folders that targets are
organized in Visual Studio and XCode
(`set_property(TARGET <target> PROPERTY FOLDER "<title>")`)
when using the respective CMake's IDE generator.
* Ensure that every target is in a folder
* Use a folder hierarchy with each LLVM subproject as a top-level folder
* Use consistent folder names between subprojects
* When using target-creating functions from AddLLVM.cmake, automatically
deduce the folder. This reduces the number of
`set_property`/`set_target_property`, but are still necessary when
`add_custom_target`, `add_executable`, `add_library`, etc. are used. A
LLVM_SUBPROJECT_TITLE definition is used for that in each subproject's
root CMakeLists.txt.
Most of GCC's Linux targets have a link spec
`%{!static|static-pie:--eh-frame-hdr}` that doesn't pass --eh-frame-hdr
for `-static` links. `-static` links are supposed to utilize
`__register_frame_info` (called by `crtbeginT.o`, not by crtbegin.o or
crtbeginS.o) as a replacement.
compiler-rt crtbegin (not used with GCC) has some ehframe code, which is
not utilized because Clang driver unconditionally passes --eh-frame-hdr
for Linux targets, even for -static. In addition, LLVM libunwind
implements `__register_frame_info` as an empty stub.
Furthermore, in a non-static link, the `__register_frame_info`
references can cause an undesired weak dynamic symbol.
For now, just disable the config by default.
This variable was explicitly removed from the cache to ease transition
from existing build directories but that breaks passing
COMPILER_RT_BUILD_CRT=OFF on the command line. I was surprised to see
the CRT builds being run for my builtins-only build config (I noticed
because one of the tests was failing despite having `REQUIRES: crt`).
If I pass `-DCOMPILER_RT_BUILD_CRT=OFF` to cmake and add some prints
around the `unset` statement it shows the following:
```
-- before unset(): COMPILER_RT_BUILD_CRT=OFF
-- after unset: COMPILER_RT_BUILD_CRT=
-- after cmake_dependent_option COMPILER_RT_BUILD_CRT=ON
```
Drop this temporary workaround now that over 6 months have passed.
Whenever linking with -nodefaultlibs for a MinGW target, we manually
need to specify a bunch of libraries - listed in ${MINGW_LIBRARIES}; the
same is already done for sanitizers and libunwind/libcxxabi/libcxx.
Practically speaking, linking with -nodefaultlibs but manually passing
the libraries in ${MINGW_LIBRARIES} restores most of the libraries that
are linked by default, except for the potential compiler builtins and
unwind library; i.e. it has essentially the same effect as linking with
"--unwindlib=none -rtlib=none", except that -rtlib doesn't accept such a
value.
When building only compiler-rt/lib/builtins, not all of compiler-rt,
${MINGW_LIBRARIES} is unset - set it manually here for that case. This
matches what is set in
compiler-rt/cmake/config-ix.cmake, except that the builtins (libgcc or
compiler-rt builtins) is omitted; the only use within lib/buitlins is
for the standalone libatomic, which explicitly already links against the
just-built builtins.
GCC is able to check that the signatures of the builtins are as expected
and this shows some incorrect signatures on ld80 platforms (i.e. x86).
The *tf* functions should take 128-bit arguments but until the latest fixes
they used 80-bit long double.
Differential Revision: https://reviews.llvm.org/D153814
compiler-rt/builtins doesn't depend on anything from the CRT but
currently links against it and embeds a `/defaultlib:msvcrt` in the
`.lib` file, forcing anyone linking against it to also link against that
specific CRT. This isn't necessary as the end user can just choose which
CRT they want to use independently.
GCC provides these functions (e.g. __addtf3, etc.) in libgcc on x86_64.
Since Clang supports float128, we can also enable the existing code by
using float128 for fp_t if either __FLOAT128__ or __SIZEOF_FLOAT128__ is
defined instead of only supporting these builtins for platforms with
128-bit IEEE long doubles.
This commit defines a new tf_float typedef that matches a float with
attribute((mode(TF)) on each given architecture.
There are more tests that could be enabled for x86, but to keep the diff
smaller, I restricted test changes to ones that started failing as part
of this refactoring.
This change has been tested on x86 (natively) and
aarch64,powerpc64,riscv64 and sparc64 via qemu-user.
This supersedes https://reviews.llvm.org/D98261 and should also cover
the changes from https://github.com/llvm/llvm-project/pull/68041.
This patch implements __extendxftf2 (long double -> f128) and
__trunctfxf2 (f128 -> long double) on x86_64.
This is a preparation to unblock https://reviews.llvm.org/D53608,
We intentionally do not modify compiler-rt/lib/builtins/fp_lib.h in this
PR
(in particular, to limit the scope and avoid exposing other functions on
X86_64 in this PR).
Instead, TODOs were added to use fp_lib.h once it is available.
Test plan:
1. ninja check-compiler-rt (verified on X86_64 and on Aarch64)
In particular, new tests (extendxftf2_test.c and trunctfxf2_test.c) were
added.
2. compared the results of conversions with what other compilers (gcc)
produce.
Resolved issue with green dragon build by fixing relocations for
MachO/Darwin which doesn't compile without @page/@pageoff directives.
Also silenced a warning about constructor(90) priority being < 101,
which is reserved for the implementation. In this case, we're compiling
the implementation so we should be able to use 90.
This reverts commit 072713add4408199d4bce7b3b02cc74a4a382ee0.