138 Commits

Author SHA1 Message Date
Joseph Huber
f6f4744176
[libc] Install RPC server to shared/rpc.h (#120170)
Summary:
This installs the shared header to the user's installation. I couldn't
decide if this should be a standalone thing or use the existing support
in `include/` mostly because this is completely separate from hdrgen
stuff and it's C++.
2024-12-17 07:45:13 -06:00
Jinsong Ji
e85a9f5540
libc: Prefix RPC Status code to avoid conflict in windows build (#119991)
This somehow conflicts with a define in wingdi.h.

Fix build failures:

[ 52%] Building CXX object
projects/offload/plugins-nextgen/common/CMakeFiles/PluginCommon.dir/src/RPC.cpp.obj
In file included from
...llvm\offload\plugins-nextgen\common\src\RPC.cpp:16:
...\llvm\libc\shared\rpc.h(48,3): error: expected identifier
   48 |   ERROR = 0x1000,
      |   ^
c:\Program files (x86)\Windows
Kits\10\include\10.0.22000.0\um\wingdi.h(118,29): note: expanded from
macro 'ERROR'
  118 | #define ERROR               0
      |                             ^
...\llvm\offload\plugins-nextgen\common\src\RPC.cpp(75,17): error:
expected unqualified-id
   75 |     return rpc::ERROR;
      |                 ^
c:\Program files (x86)\Windows
Kits\10\include\10.0.22000.0\um\wingdi.h(118,29): note: expanded from
macro 'ERROR'
  118 | #define ERROR               0
      |                             ^
2 errors generated.
2024-12-15 09:35:44 -05:00
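The collision fixed in #119991 can be reproduced without any Windows headers. A minimal sketch, assuming a local `#define` as a stand-in for the one in `wingdi.h`; the `demo_rpc` namespace, the `RPC_`-prefixed enumerator names, and `status_name` are hypothetical and not copied from `rpc.h`:

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the macro that wingdi.h defines (see the build log above).
#define ERROR 0

namespace demo_rpc {
// An enumerator spelled plain `ERROR` would be rewritten by the preprocessor
// to `0 = 0x1000`, which is the "expected identifier" error in the log.
// A prefixed name is left alone by the macro. These names are hypothetical.
enum Status : std::uint32_t {
  RPC_SUCCESS = 0x0,
  RPC_ERROR = 0x1000,
};

// Maps a status to a printable name; only the two codes above are valid.
inline const char *status_name(Status s) {
  assert((s == RPC_SUCCESS || s == RPC_ERROR) && "unknown status");
  return s == RPC_SUCCESS ? "success" : "error";
}

// `return demo_rpc::ERROR;` would not parse here, the prefixed name does.
inline Status fail() { return RPC_ERROR; }
} // namespace demo_rpc
```

Prefixing sidesteps the macro without requiring `#undef ERROR`, which could break other code that relies on the Windows definition.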
Joseph Huber
a2fc276ed2 [libc] Remove complicated header guards on HSA include
Summary:
This is much more standard now, we already require new HSA with what we
use, so no point checking for this.
2024-12-04 16:28:13 -06:00
Joseph Huber
a6ef0debb1 [libc][NFC] Rename RPC opcodes to better reflect their usage
Summary:
RPC_ is a generic prefix here, use LIBC_ to indicate that these are
opcodes used to implement the C library
2024-12-02 15:35:08 -06:00
Joseph Huber
1d810ece2b
[libc] Move libc server handlers to a shared header (#117908)
Summary:
We can simply include this header from the shared directory now and do
not need to have this level of indirection. Simply stash it with the
other libc opcode handlers.

If we were able to move the printf handlers to the shared directory then
this could just be a header as well, which would HEAVILY simplify the
mess associated with building the RPC server first in the projects
build, then copying it to the runtimes build.
2024-11-27 14:57:52 -06:00
Joseph Huber
89d8e70031
[libc] Export a pointer to the RPC client directly (#117913)
Summary:
We currently have an unnecessary level of indirection when initializing
the RPC client. This is a holdover from when the RPC client was not
trivially copyable and simply makes it more complicated. Here we use the
`asm` syntax to give the C++ variable a valid name so that we can just
copy to it directly.

Another advantage of this is that if users want to piggy-back on the
same RPC interface they need only declare theirs as extern with the same
symbol name, or make it weak to optionally use it if LIBC isn't
available.
2024-11-27 14:57:38 -06:00
Joseph Huber
38049dc8ee
[libc] Handle differing wavefront sizes correctly in the AMDHSA loader (#117788)
Summary:
The AMDGPU backend can handle wavefront sizes of 32 and 64, with the
native hardware preferring one or the other. The user can override the
hardware with `-mwavefrontsize64` or `-mwavefrontsize32` which
previously wasn't handled. We need to know the wavefront size to know
how much memory to allocate and how to index the RPC buffer. There isn't
a good way to do this with ROCm so we just use the LLVM support for
offloading to check this from the image.
2024-11-27 10:04:00 -06:00
Joseph Huber
d7c20a6f0c [libc][NFC] Move RPC opcodes to the 'shared/' directory as well 2024-11-25 12:04:10 -06:00
Joseph Huber
b4d49fb52e
[libc] Remove RPC server API and use the header directly (#117075)
Summary:
This patch removes much of the `llvmlibc_rpc_server` interface. This
pretty much deletes all of this code and just replaces it with including
`rpc.h` directly. We still maintain the file to let `libc` handle the
opcodes, since those depend on the `printf` implementation.

This will need to be cleaned up more, but I don't want to put too much
into a single patch.
2024-11-25 07:13:28 -06:00
Joseph Huber
89614ceb40
[libc] Move RPC interface to libc/shared to export it (#117034)
Summary:
Previous patches have made the `rpc.h` header independent of the `libc`
internals. This allows us to include it directly rather than providing
an indirect C API. This patch only does the work to move the header. A
future patch will pull out the `rpc_server` interface and simply replace
it with a single function that handles the opcodes.
2024-11-22 15:32:25 -06:00
Joseph Huber
676a1e6643
[AMDGPU] Remove uses of deprecated HSA executable functions (#117241)
Summary:
These functions were deprecated in ROCR 1.3 which was released quite
some time ago. The main functionality that was lost was modifying and
inspecting the code object independently of the executable; however, we do
all of that ourselves through our ELF API. This should be within the
versions of other functions we use.
2024-11-22 07:16:40 -06:00
Joseph Huber
27d25d1c12
[libc] Increase RPC opcode to 32-bit and use a class byte (#116905)
Summary:
Currently, the RPC interface uses a basic opcode to communicate with the
server. This currently is 16 bits. There's no reason for this to be 16
bits, because on the GPU a 32-bit write is the same as a 16-bit write
performance wise.

Additionally, I am now making all the `libc` based opcodes qualified
with the 'c' type, mimicking how Linux handles `ioctls` all coming from
the same driver. This will make it easier to extend the interface when
it's exported directly.
2024-11-19 21:56:10 -06:00
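The class-byte scheme from #116905 can be sketched as follows. The bit layout (class byte in the top 8 bits, number in the low 24) and every name here are assumptions chosen for illustration; the commit message only states that opcodes are 32 bits wide and that the `libc` ones are qualified with the 'c' type.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding: class byte in bits 31..24, opcode number in 23..0.
constexpr std::uint32_t make_opcode(char cls, std::uint32_t number) {
  assert(number <= 0x00FFFFFFu && "opcode number must fit in 24 bits");
  return (static_cast<std::uint32_t>(static_cast<unsigned char>(cls)) << 24) |
         number;
}

// Extracts the class byte, e.g. 'c' for opcodes owned by the C library.
constexpr char opcode_class(std::uint32_t opcode) {
  return static_cast<char>(opcode >> 24);
}

// Extracts the per-class opcode number.
constexpr std::uint32_t opcode_number(std::uint32_t opcode) {
  return opcode & 0x00FFFFFFu;
}

// Illustrative opcodes: the same number in two classes never collides.
constexpr std::uint32_t DEMO_LIBC_WRITE = make_opcode('c', 1);
constexpr std::uint32_t DEMO_USER_PING = make_opcode('u', 1);
```

With a layout like this, a server can route on `opcode_class` first and let each class own its number space, which is what makes the interface easy to extend once it is exported.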
lntue
88a0a318e8
[libc] Use relative inclusion for public headers. (#114324)
We are finalizing the header inclusion policy, and for our public
headers in the `libc/include` folder, they must use relative path in
`"..."` when including each other.

This PR does the cleanup making sure that all the public header
inclusions in `libc/include` folder use relative paths.

---------

Co-authored-by: Nick Desaulniers <nickdesaulniers@users.noreply.github.com>
2024-11-01 14:33:30 -04:00
Joseph Huber
be0c67c90e
[libc] Remove dependency on cpp::function in rpc.h (#112422)
Summary:
I'm going to attempt to move the `rpc.h` header to a separate folder
that we can install and include outside of `libc`. Before doing this I'm
going to try to trim up the file so there's not as many things I need to
copy to make it work. This dependency on `cpp::function` is
low-hanging fruit. I only did it so that I could overload the argument of
the work function so that passing the id was optional in the lambda,
that's not a *huge* deal and it makes it more explicit I suppose.
2024-10-15 12:31:06 -07:00
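The trade-off described in #112422 can be illustrated with a template parameter in place of a type-erased function wrapper. This is a sketch only: `DemoSlot` and `for_each_lane` are hypothetical names, not the `rpc.h` interface.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one lane's slot in an RPC buffer.
struct DemoSlot {
  std::uint64_t data;
};

// Accepts any callable through a template parameter, so no function wrapper
// type is needed. The callable must take the lane id explicitly; there is no
// overload that lets a lambda omit it.
template <typename Fn>
void for_each_lane(DemoSlot *slots, std::uint32_t lanes, Fn &&fn) {
  assert((slots != nullptr || lanes == 0) && "null buffer with active lanes");
  for (std::uint32_t id = 0; id < lanes; ++id)
    fn(slots[id], id);
}
```

A caller writes `for_each_lane(slots, n, [](DemoSlot &s, std::uint32_t id) { ... })` and simply ignores `id` when it is not needed.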
Joseph Huber
ee57a685fa
[libc] Make a dedicated thread for the RPC server (#111210)
Summary:
Make a separate thread to run the server when we launch. This is
required by CUDA, which you can force with `export
CUDA_LAUNCH_BLOCKING=1`. I figured I might as well be consistent and do
it for the AMD implementation as well even though I believe it's not
necessary.
2024-10-07 05:30:44 -07:00
Ivan Butygin
26ca8ef836
[libc] GPU RPC interface: add return value to rpc_host_call (#111288) 2024-10-06 20:22:07 +03:00
Joseph Huber
6558e5615a
[libc] Update HSA queues to use the maximum size and set the barrier bit (#110034)
Summary:
It's safer to use the maximum size, as this prevents the runtime from
oversubscribing with multiple producers. Additionally we should set the
barrier bit to ensure that the queue entries block if multiple are
submitted (Which shouldn't happen for this tool).
2024-09-28 16:49:28 -05:00
Ivan Butygin
bbe79a803c
[libc] Use RAII alloc in gpu rpc printf impl (#110352) 2024-09-28 15:44:01 +03:00
Ivan Butygin
ef390b36ca
[libc] Use RAII based alloc in gpu rpc_server instead of manual new/delete (#110341)
Co-authored-by: Joseph Huber <huberjn@outlook.com>
2024-09-28 11:53:21 +03:00
Joseph Huber
b712a1445b [libc] Fix memory leak and accidentally ignoring dimensions in loader
Summary:
The loader had a bug where we weren't setting the dimensions correctly,
also I forgot to delete the paths for this RPC call.
2024-09-27 09:57:44 -05:00
Joseph Huber
fe6a3d46aa
[libc] Implement the 'rename' function on the GPU (#109814)
Summary:
Straightforward implementation like the other `stdio.h` functions.
2024-09-24 09:32:42 -07:00
Joseph Huber
16d11e26f3
[libc] Add GPU support for the 'system' function (#109687)
Summary:
This function can easily be implemented by forwarding it to the host
process. This shows up in a few places that we might want to test the
GPU so it should be provided. Also, I find the idea of the GPU
offloading work to the CPU via `system` very funny.
2024-09-23 14:04:28 -07:00
Michael Jones
010c0d36e1
[libc][AMDGPU] Disable %m in RPC server (#109317)
The RPC server directly includes the printf code, but doesn't support
errno, so the %m conversion needs to be disabled there as well. This
patch does that.
2024-09-19 13:33:23 -05:00
Joseph Huber
f126bc984c [libc] Fix conflict values from internal limits.h when used externally 2024-08-07 10:09:02 -05:00
Joseph Huber
06a808c4f4 [libc] Fix bot accidentally picking up conflicting MB_LEN_MAX 2024-08-07 09:19:53 -05:00
Joseph Huber
2e9f15e1df [libc] Fix index into argument vector 2024-08-06 14:06:51 -05:00
Joseph Huber
3983bf6040 [libc] Fix GPU argument vector writing nullptr to string
Summary:
The intention behind this code was to null terminate the `envp` string,
but it accidentally went into the string data.
2024-08-06 13:03:06 -05:00
Joseph Huber
8c6a6f1a70 [libc] Make RPC malloc implementation return 'nullptr' on alloc failure
Summary:
`malloc` is supposed to return `nullptr` if it fails, not exit with an
error code.
2024-08-06 11:03:40 -05:00
Joseph Huber
d1b2940290
[libc] Add loader option to force serial execution of GPU region (#101601)
Summary:
The loader is used as a test utility to run traditionally CPU based unit
tests on the GPU. This has issues when used with something like
`llvm-lit` because the GPU runtimes have a nasty habit of either running
out of resources or hanging when they are overloaded. To combat this, I
added this option to force each process to perform the GPU part
serially.

This is done right now with a simple file lock on the executing file. I
was originally thinking about using more complex IPC to allow N
processes to share execution, but that seemed overly complicated given
the incredibly large number of failure modes it introduces. File locks
are nice here because if the process crashes or is killed it will
release the lock automatically (at least on Linux). This is in contrast
to something like POSIX shared memory which will stick around until it's
unlinked, meaning that if someone sent `SIGKILL` to the program it would
never get cleaned up and other processes might wait on a mutex that is
never released.

Restricting this to one thread isn't overly ideal, given the fact that
the runtime can likely handle at least a *few* separate processes, but
this was easy and it works, so might as well start here. This will
hopefully unblock me on running `libcxx` tests, as those ran with so
much parallelism spurious failures were very common.
2024-08-05 14:49:15 -05:00
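The file-lock approach from #101601 can be sketched with POSIX `flock`. Using `flock` is an assumption made for illustration, since the commit message only says "a simple file lock on the executing file"; `run_serialized` is a hypothetical helper, not the loader's code.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

// Runs `work` while holding an exclusive advisory lock on `path`.
// Returns false if the file cannot be opened or locked.
template <typename Fn> bool run_serialized(const char *path, Fn &&work) {
  assert(path != nullptr && "a file to lock is required");
  int fd = open(path, O_RDONLY);
  if (fd < 0)
    return false;

  // Blocks until no other process holds the lock. The lock belongs to the
  // open file description, so it goes away when the descriptor is closed,
  // including when the process crashes or is killed.
  if (flock(fd, LOCK_EX) != 0) {
    close(fd);
    return false;
  }

  work();

  flock(fd, LOCK_UN);
  close(fd);
  return true;
}
```

A loader could pass its own executable path and wrap the kernel launch in `work`, so that concurrent test processes take turns on the GPU.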
Joseph Huber
5e326983b6
[libc] Use LLVM CommandLine for loader tool (#101501)
Summary:
This patch removes the ad-hoc parsing that I used previously and
replaces it with the LLVM CommandLine interface. This doesn't change any
functionality, but makes it easier to maintain.
2024-08-01 14:07:28 -05:00
Joseph Huber
097a1d28ed [libc] Remove extra parens 2024-08-01 07:16:44 -05:00
Joseph Huber
feeb8335a0
[libc] Change the GPU loaders to LLVM executables (#101442)
Summary:
I am going to rework these tools to just be LLVM tools. This patch is
pretty much NFC to set up the CMake for that.
2024-08-01 07:13:41 -05:00
Joseph Huber
c8e69fa4a0 [libc] Fix GPU 'printf' on strings with padding
Summary:
We get the `strlen` to know how much memory to allocate here, but it
wasn't taking into account if the padding was larger than the string
itself. This patch sets it to an empty string so we always add the
minimum size. This implementation is slightly wasteful with memory, but
I am not concerned with a few extra bytes here and there for some memory
that gets immediately free'd.
2024-07-20 22:36:12 -05:00
Joseph Huber
10b4834b76 [libc] Fix wrong printf usage in AMDGPU loader 2024-07-17 16:34:47 -05:00
jameshu15869
1ecffdaf27
[libc] Add Kernel Resource Usage to nvptx-loader (#97503)
This PR allows `nvptx-loader` to read the resource usage of `_start`,
`_begin`, and `_end` when executing CUDA binaries.

Example output:
```
$ nvptx-loader --print-resource-usage libc/benchmarks/gpu/src/ctype/libc.benchmarks.gpu.src.ctype.isalnum_benchmark.__build__
[ RUN      ] LlvmLibcIsAlNumGpuBenchmark.IsAlnumWrapper
[       OK ] LlvmLibcIsAlNumGpuBenchmark.IsAlnumWrapper: 93 cycles, 76 min, 470 max, 23 iterations, 78000 ns, 80 stddev
_begin registers: 25
_start registers: 80
_end registers: 62
```

---------

Co-authored-by: Joseph Huber <huberjn@outlook.com>
2024-07-17 16:07:12 -05:00
Joseph Huber
40effc7af5
[libc] Implement (v|f)printf on the GPU (#96369)
Summary:
This patch implements the `printf` family of functions on the GPU using
the new variadic support. This patch adapts the old handling in the
`rpc_fprintf` placeholder, but adds an extra RPC call to get the size of
the buffer to copy. This prevents the GPU from needing to parse the
string. While it's theoretically possible for the pass to know the size
of the struct, it's prohibitively difficult to do while maintaining ABI
compatibility with NVIDIA's varargs.

Depends on https://github.com/llvm/llvm-project/pull/96015.
2024-07-12 19:36:13 -05:00
Joseph Huber
ec0e6ef09b
[libc] Implement the 'remove' function on the GPU (#97096)
Summary:
Straightforward RPC implementation of the `remove` function for the GPU.
Copies over the string and calls `remove` on it, passing the result
back. This is required for building some `libc++` functionality.
2024-07-01 06:29:48 -05:00
Joseph Huber
5849cbad0f
[libc] Add line numbers to libc utility error messages (#94010)
Summary:
Currently we just print the error as seen, this makes it difficult if
something goes wrong to know where it failed. This patch just adds in
line numbers to all the error handling routines so you can trace it
back.
2024-05-31 14:34:37 -05:00
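Attaching line numbers as in #94010 is usually done with a macro that captures the call site. A minimal sketch, assuming hypothetical names (`format_error`, `DEMO_FORMAT_ERROR`) and an invented message format rather than the utilities' actual ones:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: prefixes a message with the file and line it came from.
inline std::string format_error(const char *file, int line, const char *msg) {
  assert(file != nullptr && msg != nullptr);
  return std::string(file) + ":" + std::to_string(line) + ": Error: " + msg;
}

// __FILE__ and __LINE__ expand where the macro is used, so the message points
// at the failing call rather than at format_error itself.
#define DEMO_FORMAT_ERROR(msg) format_error(__FILE__, __LINE__, msg)
```

A plain function call could not do this on its own, because inside the function `__LINE__` would always name a line of the helper.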
Joseph Huber
7327014b49
[libc] Implement temporary printf on the GPU (#85331)
Summary:
This patch adds a temporary implementation that uses a struct-based
interface in lieu of varargs support. Once varargs support exists we
will move this implementation to the "real" printf implementation.

Conceptually, this patch has the client copy over its format string and
arguments to the server. The server will then scan the format string
searching for any specifiers that are actually a string. If it is a
string then we will send the pointer back to the client to tell it to
copy it back. This copied value will then replace the pointer when the
final formatting is done.

This will require a built-in extension to the varargs support to get
access to the underlying struct. The varargs used on the GPU will simply
be a struct wrapped in a varargs ABI.
2024-04-02 16:25:18 -05:00
Joseph Huber
a1a8bb1d3a
[libc] Change RPC interface to not use device ids (#87087)
Summary:
The current implementation of RPC tied everything to device IDs and
forced us to do init / shutdown to manage some global state. This turned
out to be a bad idea in situations where we want to track multiple
heterogeneous devices that may report the same device ID in the same
process.

This patch changes the interface to instead create an opaque handle to
the internal device and simply allocates it via `new`. The user will
then take this device and store it to interface with the attached
device. This interface puts the burden of tracking the device identifier
to mapped devices onto the user, but in return heavily simplifies the
implementation.
2024-03-29 12:49:16 -05:00
Marc Auberer
77118536b5
[libc] Remove obsolete LIBC_HAS_BUILTIN macro (#86554)
Fixes #86546 and removes the macro `LIBC_HAS_BUILTIN`. This was
necessary to support older compilers that did not support
`__has_builtin`. All of the compilers we support already have this
builtin.
See: https://libc.llvm.org/compiler_support.html
All uses now use `__has_builtin` directly

cc @nickdesaulniers
2024-03-27 17:22:41 +01:00
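After #86554, feature checks use `__has_builtin` directly instead of going through a wrapper macro. A small sketch of that pattern; `log2_floor` is a hypothetical example function, and the code assumes a compiler that provides `__has_builtin`, as the commit says all supported compilers do:

```cpp
#include <cassert>

// Floor of log2(value), using the compiler builtin when it is available.
inline int log2_floor(unsigned value) {
  assert(value != 0 && "log2_floor(0) is undefined");
#if __has_builtin(__builtin_clz)
  // __builtin_clz counts leading zero bits of a non-zero unsigned int.
  return static_cast<int>(sizeof(unsigned) * 8 - 1) - __builtin_clz(value);
#else
  // Portable fallback: shift until the value is exhausted.
  int result = 0;
  while (value >>= 1)
    ++result;
  return result;
#endif
}
```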
Gheorghe-Teodor Bercea
c25e77436e
Revert "[libomptarget][nextgen-plugin] Use SCRELEASE/SCACQUIRE in packet headers" (#85950)
Reverts llvm/llvm-project#85678
2024-03-20 11:40:12 -04:00
Gheorghe-Teodor Bercea
927308a52b
[libomptarget][nextgen-plugin] Use SCRELEASE/SCACQUIRE in packet headers (#85678)
This patch updates the construction of packet headers to replace the
usage of ACQUIRE/RELEASE with SCACQUIRE/SCRELEASE which is now
recommended.
The patch also ensures consistency across kernel dispatches.
2024-03-20 11:22:01 -04:00
Joseph Huber
9bc294f9be
[libc] Build the GPU during the projects setup like libc-hdrgen (#84667)
Summary:
The libc build has a few utilities that need to be built before we can do
everything in the full build. The one requirement currently is the
`libc-hdrgen` binary. If we are doing a full build runtimes mode we
first add `libc` to the projects list and then only use the `projects`
portion to build the `libc` portion. We also use utilities for the GPU
build, namely the loader utilities. Previously we would build these
tools on-demand inside of the cross-build, which took some hacky
workarounds for the dependency finding and target triple. This patch
instead just builds them similarly to libc-hdrgen and then passes them
in. We now either pass it manually if it was built, or just look it up
like we do with the other `clang` tools.

Depends on https://github.com/llvm/llvm-project/pull/84664
2024-03-11 09:18:47 -05:00
Joseph Huber
8a79003307 [libc] Move RPC opcodes include out of the header
Summary:
This header isn't strictly necessary, and is currently broken because we
install these to separate locations.
2024-03-10 14:07:47 -05:00
Joseph Huber
033dbbe4f1 [libc][NFC] Clean up stray ';' and default enum warning
Summary:
Cleans up two warnings I get locally while building.
2024-03-10 09:32:12 -05:00
Joseph Huber
29762e3722
[libc][NFCI] Remove lane size template argument on RPC server (#84557)
Summary:
We previously changed the data layout for the RPC buffer to make it lane
size agnostic. I put off changing the size for the server case to make
the patch smaller. This patch simply reorganizes code by making the lane
size an argument to the port rather than a template argument. Heavily
simplifies a lot of code, no more `std::variant`.
2024-03-08 15:02:19 -06:00
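The simplification in #84557 comes from turning a compile-time lane size into a runtime value. A sketch under stated assumptions: `DemoPort` and its members are hypothetical and far simpler than the real RPC port. With a `template <uint32_t LaneSize>` design, a 32-wide and a 64-wide port would be distinct types, and a server handling both would need something like `std::variant`; with a runtime field one type covers both.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical port whose lane count is chosen at construction time.
class DemoPort {
public:
  explicit DemoPort(std::uint32_t lane_size) : lanes(lane_size, 0) {
    assert(lane_size > 0 && "a port needs at least one lane");
  }

  std::uint32_t lane_size() const {
    return static_cast<std::uint32_t>(lanes.size());
  }

  // Invokes `fn(slot, id)` once for every lane in the port.
  template <typename Fn> void for_each(Fn &&fn) {
    for (std::uint32_t id = 0; id < lane_size(); ++id)
      fn(lanes[id], id);
  }

private:
  std::vector<std::uint64_t> lanes; // one slot per lane
};
```

Ports of different widths can then sit in the same container, for example `std::vector<DemoPort>`, with no type dispatch.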
Joseph Huber
06ac828dc1
[libc] Fix flipped AMDGPU kernel launch arguments (#83648)
Summary:
These values were incorrectly flipped, setting the size of the blocks to
the threads and vice-versa. When I originally wrote the thread utilities
it was using COV4 which used an implicit format. Then when I updated I
accidentally flipped them and never noticed because nothing depended on
the size of the threads until I checked it manually.
2024-03-01 20:56:07 -06:00
lntue
73aab2f697
[libc] Revert https://github.com/llvm/llvm-project/pull/83199 since it broke Fuchsia. (#83374)
With some header fix forward for GPU builds.
2024-02-29 14:43:53 -05:00
Joseph Huber
04e8653f18
[libc] Add "include/" to the LLVM include directories (#83199)
Summary:
Recent changes added an include path in the float128 type that used the
internal `libc` path to find the macro. This doesn't work once it's
installed because we need to search from the root of the install dir.
This patch adds "include/" to the include path so that our inclusion
of installed headers always match the internal use.
2024-02-27 17:45:15 -06:00