513 Commits

Author SHA1 Message Date
Joseph Huber
89614ceb40
[libc] Move RPC interface to libc/shared to export it (#117034)
Summary:
Previous patches have made the `rpc.h` header independent of the `libc`
internals. This allows us to include it directly rather than providing
an indirect C API. This patch only does the work to move the header. A
future patch will pull out the `rpc_server` interface and simply replace
it with a single function that handles the opcodes.
2024-11-22 15:32:25 -06:00
Joseph Huber
676a1e6643
[AMDGPU] Remove uses of deprecated HSA executable functions (#117241)
Summary:
These functions were deprecated in ROCR 1.3 which was released quite
some time ago. The main functionality that was lost was modifying and
inspecting the code object independently of the executable, however we do
all of that ourselves through our ELF API. This should be within the
versions of other functions we use.
2024-11-22 07:16:40 -06:00
Joseph Huber
27d25d1c12
[libc] Increase RPC opcode to 32-bit and use a class byte (#116905)
Summary:
Currently, the RPC interface uses a basic opcode to communicate with the
server. This currently is 16 bits. There's no reason for this to be 16
bits, because on the GPU a 32-bit write is the same as a 16-bit write
performance wise.

Additionally, I am now making all the `libc` based opcodes qualified
with the 'c' type, mimicking how Linux handles `ioctls` all coming from
the same driver. This will make it easier to extend the interface when
it's exported directly.
2024-11-19 21:56:10 -06:00
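As a rough illustration of the "class byte" idea in #116905, the sketch below packs a class character and a per-class number into one 32-bit opcode, in the spirit of `ioctl` request numbers. The names `make_opcode`, `opcode_class`, and `opcode_number`, and the choice of putting the class byte in the top 8 bits, are assumptions made for this example and are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout, for illustration only: the class byte occupies the top
// 8 bits and the per-class opcode number occupies the low 24 bits.
constexpr std::uint32_t kClassShift = 24;
constexpr std::uint32_t kNumberMask = (1u << kClassShift) - 1;

// Pack a class byte (e.g. 'c' for libc) and an opcode number into 32 bits.
constexpr std::uint32_t make_opcode(char cls, std::uint32_t number) {
  assert(number <= kNumberMask && "opcode number must fit in 24 bits");
  return (static_cast<std::uint32_t>(static_cast<unsigned char>(cls))
          << kClassShift) |
         number;
}

// Recover the class byte so a server could route the request by owner.
constexpr char opcode_class(std::uint32_t opcode) {
  return static_cast<char>(opcode >> kClassShift);
}

// Recover the opcode number within its class.
constexpr std::uint32_t opcode_number(std::uint32_t opcode) {
  return opcode & kNumberMask;
}
```

With a scheme like this, two users of the interface can both define an opcode number 1 without colliding, as long as they pick different class bytes.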
wldfngrs
f7bb12901e
[libc][math][c23] Add tanpif16 function (#115183)
- Implementation of `tan` for 16-bit floating point inputs scaled by pi.
i.e., `tanpif16()`
- Implementation of Tanpi in MPFRWrapper for MPFR versions < 4.2
- Exhaustive tests for `tanpif16()`
2024-11-08 09:56:31 -05:00
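For readers unfamiliar with the "scaled by pi" functions in #115183, `tanpi(x)` denotes `tan(pi * x)`. The double-precision sketch below only illustrates that definition and is not intended to reflect the `libc` implementation. The name `tanpi_ref`, the constant `kPi`, and the sign chosen at the poles are assumptions of this example.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

constexpr double kPi = 3.14159265358979323846;

// Reference-style sketch of tanpi(x) = tan(pi * x) in double precision.
double tanpi_ref(double x) {
  if (std::isnan(x) || std::isinf(x))
    return std::numeric_limits<double>::quiet_NaN();
  // tanpi has period 1. std::remainder is exact and yields r in [-0.5, 0.5],
  // so the reduction happens before any multiplication by pi.
  double r = std::remainder(x, 1.0);
  assert(std::fabs(r) <= 0.5);
  if (r == 0.0)
    return 0.0; // Integers map to zero (the sign of zero is not modelled).
  if (std::fabs(r) == 0.5) // Half-integers are poles; the sign is an assumption.
    return std::copysign(std::numeric_limits<double>::infinity(), r);
  return std::tan(kPi * r);
}
```

Reducing the argument before scaling is the main point: `tanpi_ref(1.25)` and `tanpi_ref(0.25)` go through the same final `std::tan` call, which a naive `std::tan(kPi * x)` would not guarantee.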
lntue
88a0a318e8
[libc] Use relative inclusion for public headers. (#114324)
We are finalizing the header inclusion policy, and for our public
headers in the `libc/include` folder, they must use relative path in
`"..."` when including each other.

This PR does the cleanup making sure that all the public header
inclusions in `libc/include` folder use relative paths.

---------

Co-authored-by: Nick Desaulniers <nickdesaulniers@users.noreply.github.com>
2024-11-01 14:33:30 -04:00
lntue
296a9ba77d
[libc] Fix memory leak in MPFRWrapper cospif with MPFR pre 4.2. (#114415) 2024-10-31 11:19:54 -04:00
wldfngrs
7395ef5419
[libc][math][c23] Add cospif16 function (#113001)
Implementation of `cos` for half precision floating point inputs scaled
by pi (i.e., `cospi`), correctly rounded for all rounding modes.

---------

Co-authored-by: OverMighty <its.overmighty@gmail.com>
2024-10-29 03:39:57 -07:00
OverMighty
95c24cb9de
[libc][math][c23] Add exp10m1f16 C23 math function (#105706)
Part of #95250.
2024-10-16 16:33:13 +02:00
wldfngrs
ddc3f2dd26
[libc] Add sinpif16 function (#110994)
Half-precision floating point (16-bit) implementation of the
trigonometric function Sin for inputs scaled by pi
2024-10-15 18:40:08 -04:00
Joseph Huber
be0c67c90e
[libc] Remove dependency on cpp::function in rpc.h (#112422)
Summary:
I'm going to attempt to move the `rpc.h` header to a separate folder
that we can install and include outside of `libc`. Before doing this I'm
going to try to trim up the file so there aren't as many things I need to
copy to make it work. This dependency on `cpp::functional` is a low
hanging fruit. I only did it so that I could overload the argument of
the work function so that passing the id was optional in the lambda,
that's not a *huge* deal and it makes it more explicit I suppose.
2024-10-15 12:31:06 -07:00
Joseph Huber
ee57a685fa
[libc] Make a dedicated thread for the RPC server (#111210)
Summary:
Make a separate thread to run the server when we launch. This is
required by CUDA, which you can force with `export
CUDA_LAUNCH_BLOCKING=1`. I figured I might as well be consistent and do
it for the AMD implementation as well even though I believe it's not
necessary.
2024-10-07 05:30:44 -07:00
Ivan Butygin
26ca8ef836
[libc] GPU RPC interface: add return value to rpc_host_call (#111288) 2024-10-06 20:22:07 +03:00
Rahul Joshi
a140931be5
[TableGen] Change getValueAsListOfDefs to return const pointer vector (#110713)
Change `getValueAsListOfDefs` to return a vector of const Record
pointer, and remove `getValueAsListOfConstDefs` that was added as a
transition aid.

This is a part of effort to have better const correctness in TableGen
backends:


https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089
2024-10-01 14:30:38 -07:00
Rahul Joshi
a86e966a20
[TableGen] Change TableGenMain to use const RecordKeeper (#110578)
Change TableGenMain's `MainFn` argument to be a function that accepts a
const reference to RecordKeeper.

This is a part of effort to have better const correctness in TableGen
backends:


https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089
2024-10-01 06:51:07 -07:00
Rahul Joshi
005f815313
[LIBC] Fix build failure caused by #110032 (#110539)
Fix LibC TableGen build failure caused by
https://github.com/llvm/llvm-project/pull/110032
2024-09-30 10:36:01 -07:00
Joseph Huber
6558e5615a
[libc] Update HSA queues to use the maximum size and set the barrier bit (#110034)
Summary:
It's safer to use the maximum size, as this prevents the runtime from
oversubscribing with multiple producers. Additionally we should set the
barrier bit to ensure that the queue entries block if multiple are
submitted (Which shouldn't happen for this tool).
2024-09-28 16:49:28 -05:00
Ivan Butygin
bbe79a803c
[libc] Use RAII alloc in gpu rpc printf impl (#110352) 2024-09-28 15:44:01 +03:00
Ivan Butygin
ef390b36ca
[libc] Use RAII based alloc in gpu rpc_server instead of manual new/delete (#110341)
Co-authored-by: Joseph Huber <huberjn@outlook.com>
2024-09-28 11:53:21 +03:00
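The general pattern behind #110341 and #110352, replacing manual `new`/`delete` with an owning object, might look like the following sketch. The helper `copy_to_string` is hypothetical and merely stands in for server code that needs a temporary buffer. It is not the code from either patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical helper: copy `size` bytes into a temporary NUL-terminated
// buffer and return them as a string.
std::string copy_to_string(const char *data, std::size_t size) {
  assert(data != nullptr || size == 0);
  // unique_ptr<char[]> runs delete[] on every exit path, including early
  // returns and exceptions, so no manual cleanup is needed.
  std::unique_ptr<char[]> buffer(new char[size + 1]);
  if (size != 0)
    std::memcpy(buffer.get(), data, size);
  buffer[size] = '\0';
  return std::string(buffer.get());
}
```

The buffer is freed when `buffer` goes out of scope, which removes the class of leaks where a later edit adds a `return` before the matching `delete[]`.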
Joseph Huber
b712a1445b [libc] Fix memory leak and accidentally ignoring dimensions in loader
Summary:
The loader had a bug where we weren't setting the dimensions correctly,
also I forgot to delete the paths for this RPC call.
2024-09-27 09:57:44 -05:00
Joseph Huber
fe6a3d46aa
[libc] Implement the 'rename' function on the GPU (#109814)
Summary:
Straightforward implementation like the other `stdio.h` functions.
2024-09-24 09:32:42 -07:00
Joseph Huber
16d11e26f3
[libc] Add GPU support for the 'system' function (#109687)
Summary:
This function can easily be implemented by forwarding it to the host
process. This shows up in a few places that we might want to test the
GPU so it should be provided. Also, I find the idea of the GPU
offloading work to the CPU via `system` very funny.
2024-09-23 14:04:28 -07:00
OverMighty
127349fcba
[libc][math] Add floating-point cast independent of compiler runtime (#105152)
Fixes build and tests with compiler-rt on x86.
2024-09-23 19:35:39 +02:00
Michael Jones
010c0d36e1
[libc][AMDGPU] Disable %m in RPC server (#109317)
The RPC server directly includes the printf code, but doesn't support
errno, so the %m conversion needs to be disabled there as well. This
patch does that.
2024-09-19 13:33:23 -05:00
Rahul Joshi
98563b19c2
[libc][TableGen] Migrate libc-hdrgen backend to use const RecordKeeper (#107542)
Migrate libc-hdrgen backend to use const RecordKeeper
2024-09-07 15:14:07 -07:00
lntue
fc7a893620
[libc] Remove -ffreestanding when building MPFR wrapper. (#107637)
MPFR/GMP headers do not work with -ffreestanding flags.
2024-09-06 16:54:36 -04:00
lntue
80cf21dad1
[libc] Fix unit test compile flags propagation. (#106128)
With this change, I was able to build and test for aarch64 & riscv64 on
x86-64 host as follows:

Pre-requisite:
- cross build toolchain for aarch64
```
$ sudo apt install binutils-aarch64-linux-gnu gcc-aarch64-linux-gnu g++-aarch64-linux-gnu
```
- cross build toolchain for riscv64
```
$ sudo apt install binutils-riscv64-linux-gnu gcc-riscv64-linux-gnu g++-riscv64-linux-gnu
```
- qemu user:
```
$ sudo apt install qemu qemu-user qemu-user-static
```

CMake invocation:
```
$ cmake ../runtimes -GNinja -DLLVM_ENABLE_RUNTIMES=libc -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLIBC_TARGET_TRIPLE=<aarch64-linux-gnu/riscv64-linux-gnu> -DCMAKE_BUILD_TYPE=Release -DLIBC_TEST_COMPILE_OPTIONS_DEFAULT="-static"
$ ninja libc
$ ninja check-libc
```
2024-09-06 11:56:07 -04:00
lntue
54c6b93bcb
[libc][NFC] Add sollya script to compute worst case range reduction. (#104803) 2024-08-19 17:58:46 -04:00
Schrodinger ZHU Yifan
b7c7dbd473
Revert "libc: Remove extern "C" from main declarations" (#102827)
Reverts llvm/llvm-project#102825
2024-08-11 13:40:50 -07:00
David Blaikie
1b71c471c7
libc: Remove extern "C" from main declarations (#102825)
This is invalid in C++, and clang recently started warning on it as of
#101853
2024-08-11 13:17:27 -07:00
Joseph Huber
f126bc984c [libc] Fix conflict values from internal limits.h when used externally 2024-08-07 10:09:02 -05:00
Joseph Huber
06a808c4f4 [libc] Fix bot accidentally picking up conflicting MB_LEN_MAX 2024-08-07 09:19:53 -05:00
Joseph Huber
2e9f15e1df [libc] Fix index into argument vector 2024-08-06 14:06:51 -05:00
Joseph Huber
3983bf6040 [libc] Fix GPU argument vector writing nullptr to string
Summary:
The intention behind this code was to null terminate the `envp` string,
but it accidentally went into the string data.
2024-08-06 13:03:06 -05:00
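The distinction in the entry above, terminating the pointer array rather than writing into the string data, can be sketched as follows. The function `make_arg_vector` is a hypothetical stand-in, not the loader's code, and it assumes the source strings outlive the returned vector.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Build an argv/envp-style vector: one pointer per string, followed by a
// terminating nullptr entry. The strings themselves are left untouched.
std::vector<const char *>
make_arg_vector(const std::vector<std::string> &args) {
  std::vector<const char *> out;
  out.reserve(args.size() + 1);
  for (const std::string &arg : args)
    out.push_back(arg.c_str());
  // The terminator is an extra slot in the pointer array, at index
  // args.size(); it must not be written over any byte of the string data.
  out.push_back(nullptr);
  assert(out.size() == args.size() + 1 && out.back() == nullptr);
  return out;
}
```

Consumers such as `main(int argc, char **argv, char **envp)` walk the array until they see the `nullptr` slot, so an off-by-one in where it is written either truncates the list or corrupts a string.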
aaryanshukla
0395bf7636
[libc][math][c23] Add ffma{,l,f128} and fdiv{,l,f128} C23 math functions #101089 (#101253)
- added all variations of ffma and fdiv
- will add all new headers into the YAML in the next patch
- only fsub is left; after that, all basic operations for float are complete

---------

Co-authored-by: OverMighty <its.overmighty@gmail.com>
2024-08-06 10:19:54 -07:00
Joseph Huber
8c6a6f1a70 [libc] Make RPC malloc implementation return 'nullptr' on alloc failure
Summary:
`malloc` is supposed to return `nullptr` if it fails, not exit with an
error code.
2024-08-06 11:03:40 -05:00
Joseph Huber
d1b2940290
[libc] Add loader option to force serial execution of GPU region (#101601)
Summary:
The loader is used as a test utility to run traditionally CPU based unit
tests on the GPU. This has issues when used with something like
`llvm-lit` because the GPU runtimes have a nasty habit of either running
out of resources or hanging when they are overloaded. To combat this, I
added this option to force each process to perform the GPU part
serially.

This is done right now with a simple file lock on the executing file. I
was originally thinking about using more complex IPC to allow N
processes to share execution, but that seemed overly complicated given
the incredibly large number of failure modes it introduces. File locks
are nice here because if the process crashes or is killed it will
release the lock automatically (at least on Linux). This is in contrast
to something like POSIX shared memory which will stick around until it's
unlinked, meaning that if someone sent `SIGKILL` to the program it would
never get cleaned up and other threads might wait on a mutex that is
never released.

Restricting this to one thread isn't overly ideal, given the fact that
the runtime can likely handle at least a *few* separate processes, but
this was easy and it works, so might as well start here. This will
hopefully unblock me on running `libcxx` tests, as those ran with so
much parallelism spurious failures were very common.
2024-08-05 14:49:15 -05:00
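A minimal sketch of serializing processes with a file lock, as described in #101601, is shown below using the POSIX/Linux `flock` call. The `ScopedFileLock` class is hypothetical and written for this example. It is not the loader's code, and the choice of `flock` over other locking APIs is an assumption.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

// Hypothetical RAII wrapper around flock(2).
class ScopedFileLock {
public:
  // Opens `path` and takes an exclusive lock on it. With `blocking` set, the
  // call waits until every other holder has released the lock.
  explicit ScopedFileLock(const char *path, bool blocking = true) : fd_(-1) {
    assert(path != nullptr);
    fd_ = open(path, O_RDONLY);
    if (fd_ < 0)
      return;
    int op = LOCK_EX | (blocking ? 0 : LOCK_NB);
    if (flock(fd_, op) != 0) {
      close(fd_);
      fd_ = -1;
    }
  }

  // Closing the descriptor drops the lock. The kernel does the same when the
  // process exits or is killed, so no stale lock is left behind.
  ~ScopedFileLock() {
    if (fd_ >= 0)
      close(fd_);
  }

  ScopedFileLock(const ScopedFileLock &) = delete;
  ScopedFileLock &operator=(const ScopedFileLock &) = delete;

  bool held() const { return fd_ >= 0; }

private:
  int fd_;
};
```

A tool could construct `ScopedFileLock lock(argv[0]);` around its GPU work so that concurrent invocations of the same binary queue up behind each other. Whether to lock the executable itself or a separate lock file is a design choice left open here.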
Joseph Huber
5e326983b6
[libc] Use LLVM CommandLine for loader tool (#101501)
Summary:
This patch removes the ad-hoc parsing that I used previously and
replaces it with the LLVM CommandLine interface. This doesn't change any
functionality, but makes it easier to maintain.
2024-08-01 14:07:28 -05:00
Joseph Huber
097a1d28ed [libc] Remove extra parens 2024-08-01 07:16:44 -05:00
Joseph Huber
feeb8335a0
[libc] Change the GPU loaders to LLVM executables (#101442)
Summary:
I am going to rework these tools to just be LLVM tools. This patch is
pretty much NFC to set up the CMake for that.
2024-08-01 07:13:41 -05:00
aaryanshukla
30b5d4a763
[libc][math][c23] Add dfma{l,f128} and dsub{l,f128} C23 math functions (#101089)
Co-authored-by: OverMighty <its.overmighty@gmail.com>
2024-07-31 13:07:03 -07:00
Job Henandez Lara
c1562374c8
[libc][math][c23] Add entrypoints and tests for dsqrt{l,f128} (#99815) 2024-07-21 15:55:11 -04:00
Job Henandez Lara
af0f58cf14
[libc][math][c23] Add entrypoints and tests for fsqrt{,l,f128} (#99669) 2024-07-21 11:17:41 -04:00
Joseph Huber
c8e69fa4a0 [libc] Fix GPU 'printf' on strings with padding
Summary:
We get the `strlen` to know how much memory to allocate here, but it
wasn't taking into account if the padding was larger than the string
itself. This patch sets it to an empty string so we always add the
minimum size. This implementation is slightly wasteful with memory, but
I am not concerned with a few extra bytes here and there for some memory
that gets immediately free'd.
2024-07-20 22:36:12 -05:00
OverMighty
f61c9a9485
[libc][CMake] Set library type of libcMPFRWrapper to STATIC (#99527)
Fixes linker errors due to hidden symbols when running CMake with
-DBUILD_SHARED_LIBS=ON.
2024-07-18 23:16:48 +02:00
OverMighty
9fb049c8c6
[libc][math][c23] Add {f,d}mul{l,f128} and f16mul{,f,l,f128} C23 math functions (#98972)
Part of #93566.

Fixes #94833.
2024-07-18 19:50:49 +02:00
Joseph Huber
10b4834b76 [libc] Fix wrong printf usage in AMDGPU loader 2024-07-17 16:34:47 -05:00
jameshu15869
1ecffdaf27
[libc] Add Kernel Resource Usage to nvptx-loader (#97503)
This PR allows `nvptx-loader` to read the resource usage of `_start`,
`_begin`, and `_end` when executing CUDA binaries.

Example output:
```
$ nvptx-loader --print-resource-usage libc/benchmarks/gpu/src/ctype/libc.benchmarks.gpu.src.ctype.isalnum_benchmark.__build__
[ RUN      ] LlvmLibcIsAlNumGpuBenchmark.IsAlnumWrapper
[       OK ] LlvmLibcIsAlNumGpuBenchmark.IsAlnumWrapper: 93 cycles, 76 min, 470 max, 23 iterations, 78000 ns, 80 stddev
_begin registers: 25
_start registers: 80
_end registers: 62
```

---------

Co-authored-by: Joseph Huber <huberjn@outlook.com>
2024-07-17 16:07:12 -05:00
Joseph Huber
40effc7af5
[libc] Implement (v|f)printf on the GPU (#96369)
Summary:
This patch implements the `printf` family of functions on the GPU using
the new variadic support. This patch adapts the old handling in the
`rpc_fprintf` placeholder, but adds an extra RPC call to get the size of
the buffer to copy. This prevents the GPU from needing to parse the
string. While it's theoretically possible for the pass to know the size
of the struct, it's prohibitively difficult to do while maintaining ABI
compatibility with NVIDIA's varargs.

Depends on https://github.com/llvm/llvm-project/pull/96015.
2024-07-12 19:36:13 -05:00
Petr Hosek
5ff3ff33ff
[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration (#98597)
This is a part of #97655.
2024-07-12 09:28:41 -07:00
Mehdi Amini
ce9035f5bd
Revert "[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration" (#98593)
Reverts llvm/llvm-project#98075

bots are broken
2024-07-12 09:12:13 +02:00