Summary:
The GPU build needs to be able to inline stuff in LTO. Builtin
transformations cause problems on the functions that the optimizer does
heavy libcall recognition on. Previously we moved to using
`-fno-builtin-*` to allow us to only disable the problematic ones.
However, this still didn't allow inlining because each function had the
attribute that told the inliner not to inlining a nobuiltin function
into a non-nobuiltin function
This patch fixes that by only applying these attributes to the
entrypoints that define them. That is enough to prevent recursive calls
within the definitoins themselves.
(cherry picked from commit 8e43acbfedf53ded43ec693ddaaf518cb7416c1c)
Summary:
The NVPTX backend optimizes the ABI for functions that are internal,
however, this is not legal for indirect call prototypes. Previously, we
would modify the ABI on an aggregate byval type passed to an indirect
call prototype, which would make PTXAS error. This patch just passes the
function as a nullptr to force strict ABI compliance without
modification in the helper function.
Fixes https://github.com/llvm/llvm-project/issues/100055
(cherry picked from commit e0649a5dfc6b859d652318f578bc3d49674787a4)
Summary:
Currently we have several hacks to work around the fact that the NVPTX
linker, 'nvlink', does not support static libraries or LTO linking.
The patch in https://github.com/llvm/llvm-project/pull/96561 introduces
a wrapper in the toolchain that allows us to use a standard `ld.lld`
like interface. This means all the divergence with this target can be
removed.
Depends on https://github.com/llvm/llvm-project/pull/96561
This patch reverts #99781 and part of #99771 since `epoll_pwait2` is not
in fact available on all supported systems. It is my opinion that we
shouldn't provide a version of a function that doesn't perform as
expected, which is why this revert needs to happen.
The `epoll_pwait2` function can be reenabled when we have a way to check
if it is available on the target system, tracking bug for that is #80060
While looking through the list of includes for #99693 I found these
includes that also need to be cleaned up. I removed the extra includes
in expm1.cpp, but the include in thread.h needs more attention so I just
marked it with a todo.
Summary:
This file is an object library, but uses the `LIBC_COPT_PUBLIC_PACKAING`
option. This will always be undefined which leads to a type mismatch
when uses actually try to link against it. This patch simply removes
this and turns it into a header only library. This means that the
implementations of the callback lists and the mutexes need to live in
their respective files. The result is that `atexit` needs to be defined
for `at_quick_exit` to be valid.
This patch removes the recursion in fcntl introduced by PR #99675 as it is not required and may be dangerous in some cases: some toolchains define F_GETLK == F_GETLK64 causing infinite recursion.
Summary:
This value is a uint32_t but is printed as a uint64_t, leading to
invalid offsets when done on AMDGPU due to its packed format extending
past the buffer.
In some systems like rv32, only fcntl64 is available and it employs a different structure for file locking and the correspoding F_GETLK64, F_SETLK64, and F_SETLKW64 commands.
So if we use fcntl64, the F_GETLK, F_SETLK, and F_SETLKW commands need to be changed to their 64 versions. This patch adds new cases to the swich(cmd) in our implementation of fcntl to do that.
The default case was moved to outside the switch, so we don't need to change anything, the F_GETLK, F_SETLK, and F_SETLKW commands will just go through the old implementation.
In 32-bit systems with 64-bit offsets, both fsfilcnt_t and fsblkcnt_t are 64-bit long, just like 64-bit systems. This patch changes both types to be 64-bit long for all platforms and follows the reasoning used to change off_t: the standard only requires it to be an unsigned int, so making it 64-bit long doesn't violate this property.
It should be NFC for 64-bit systems.
This PR adds a `BENCHMARK_N_THREADS()` helper to register benchmarks
with a specific number of threads. This PR replaces the flags used
originally to allow any amount of threads.
When SYS_statfs64 is used, struct statfs64 is used instead of struct statfs. This patch adds a define to select the appropriate struct, similar to how it's done internally.
This patch also enables fstatvfs and statvfs on riscv, which would not be compiled without this change.
Summary:
We get the `strlen` to know how much memory to allocate here, but it
wasn't taking into account if the padding was larger than the string
itself. This patch sets it to an empty string so we always add the
minimum size. This implementation is slightly wasteful with memory, but
I am not concerned with a few extra bytes here and there for some memory
that gets immediately free'd.
The fail test case only makes sense if SYS_epoll_create is used internally to implement epoll_create, since the only argument to epoll_create (size) is dropped if SYS_epoll_create1 is used.
This patch implements pwait2 using pwait. The implementation is an
approximation of pwait2, since pwait only only supports timeouts in
milliseconds, not nanoseconds, as required by pwait2.
This patch enables most of the libc entrypoints for riscv, except for fstatvfs, statvfs, dmull and fmull which are currently failing compilation. float16 is also not added, as rv32 doesn't seem to support it yet.
This patch also fixes the call to seek, which should take an off_t, and was missed in PR #68269.
This patch fixes:
randomness.h and getauxval.cpp were passing ssize_t as size_t
kernel_statx.h was assigning an uint64_t to uintptr_t
fopencookie.cpp was trying to create a FileIOResult using ssize_t but the constructor expected a size_t
thread.h was trying to call free_stack (which takes a size_t) with an unsigned long long. free_stack does the calculations using uintptr_t, so I changed the passing values to size_t
This addresses an issue introduced in #98826 where static_assert was
made defined only when NDEBUG is not set which is different from all
other C libraries and breaks any code that uses static_assert and
doesn't guard it with NDEBUG.
This patch increases the timeout of the libc test from 1s to 10s, mostly because rv32 runs in a qemu image and sometimes 1s is just not enough for the test to finish when there are other tests running in parallel
When crosscompiling, we need to search for the linux kernel headers in the sysroot but since #97486 the linux kernel headers were always searched in /usr/include.
This patch fixes this behaviour by prepending a '=' to where we search for the kernel headers. As per the gcc/clang's documentation a '=' before the path is replaced by the sysroot.
This patch also includes a fix for rv32, that fails to compile due to a missing definition of CLOCK_REALTIME after this change.
`libc/benchmarks/gpu/timing/CMakeLists.txt` did not correctly build
`amdgpu` utils. This PR fixes that issue by adding `amdgpu` to the loop
that adds the correct sub directories.
The DECLS_FILE_PATH property is supposed to be set on the targets for
the generated headers for the GPU build installation. It got missed when
creating the cmake rule for new headergen.