llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-26 05:06:06 +00:00

Author	SHA1	Message	Date
Joseph Huber	89614ceb40	[libc] Move RPC interface to `libc/shared` to export it (#117034 ) Summary: Previous patches have made the `rpc.h` header independent of the `libc` internals. This allows us to include it directly rather than providing an indirect C API. This patch only does the work to move the header. A future patch will pull out the `rpc_server` interface and simply replace it with a single function that handles the opcodes.	2024-11-22 15:32:25 -06:00
Joseph Huber	676a1e6643	[AMDGPU] Remove uses of deprecreated HSA executable functions (#117241 ) Summary: These functions were deprecated in ROCR 1.3 which was released quite some time ago. The main functionality that was lost was modifying and inspecting the code object indepedently of the executable, however we do all of that custom through our ELF API. This should be within the versions of other functions we use.	2024-11-22 07:16:40 -06:00
Joseph Huber	27d25d1c12	[libc] Increase RPC opcode to 32-bit and use a class byte (#116905 ) Summary: Currently, the RPC interface uses a basic opcode to communicate with the server. This currently is 16 bits. There's no reason for this to be 16 bits, because on the GPU a 32-bit write is the same as a 16-bit write performance wise. Additionally, I am now making all the `libc` based opcodes qualified with the 'c' type, mimiciing how Linux handles `ioctls` all coming from the same driver. This will make it easier to extend the interface when it's exported directly.	2024-11-19 21:56:10 -06:00
lntue	88a0a318e8	[libc] Use relative inclusion for public headers. (#114324 ) We are finalizing the header inclusion policy, and for our public headers in the `libc/include` folder, they must use relative path in `"..."` when including each other. This PR does the cleanup making sure that all the public header inclusions in `libc/include` folder use relative paths. --------- Co-authored-by: Nick Desaulniers <nickdesaulniers@users.noreply.github.com>	2024-11-01 14:33:30 -04:00
Joseph Huber	be0c67c90e	[libc] Remove dependency on `cpp::function` in `rpc.h` (#112422 ) Summary: I'm going to attempt to move the `rpc.h` header to a separate folder that we can install and include outside of `libc`. Before doing this I'm going to try to trim up the file so there's not as many things I need to copy to make it work. This dependency on `cpp::functional` is a low hanging fruit. I only did it so that I could overload the argument of the work function so that passing the id was optional in the lambda, that's not a huge deal and it makes it more explicit I suppose.	2024-10-15 12:31:06 -07:00
Joseph Huber	ee57a685fa	[libc] Make a dedicated thread for the RPC server (#111210 ) Summary: Make a separate thread to run the server when we launch. This is required by CUDA, which you can force with `export CUDA_LAUNCH_BLOCKING=1`. I figured I might as well be consistent and do it for the AMD implementation as well even though I believe it's not necessary.	2024-10-07 05:30:44 -07:00
Ivan Butygin	26ca8ef836	[libc] GPU RPC interface: add return value to `rpc_host_call` (#111288 )	2024-10-06 20:22:07 +03:00
Joseph Huber	6558e5615a	[libc] Update HSA queues to use the maximum size and set the barrier bit (#110034 ) Summary: It's safer to use the maximum size, as this prevents the runtime from oversubscribing with multiple producers. Additionally we should set the barrier bit to ensure that the queue entries block if multiple are submitted (Which shouldn't happen for this tool).	2024-09-28 16:49:28 -05:00
Ivan Butygin	bbe79a803c	[libc] Use RAII alloc in gpu rpc printf impl (#110352 )	2024-09-28 15:44:01 +03:00
Ivan Butygin	ef390b36ca	[libc] Use RAII based alloc in gpu rpc_server instead of manual new/delete (#110341 ) Co-authored-by: Joseph Huber <huberjn@outlook.com>	2024-09-28 11:53:21 +03:00
Joseph Huber	b712a1445b	[libc] Fix memory leak and accidentally ignoring dimensions in loader Summary: The loader had a bug where we weren't setting the dimensions correctly, also I forgot to delete the paths for this RPC call.	2024-09-27 09:57:44 -05:00
Joseph Huber	fe6a3d46aa	[libc] Implement the 'rename' function on the GPU (#109814 ) Summary: Straightforward implementation like the other `stdio.h` functions.	2024-09-24 09:32:42 -07:00
Joseph Huber	16d11e26f3	[libc] Add GPU support for the 'system' function (#109687 ) Summary: This function can easily be implemented by forwarding it to the host process. This shows up in a few places that we might want to test the GPU so it should be provided. Also, I find the idea of the GPU offloading work to the CPU via `system` very funny.	2024-09-23 14:04:28 -07:00
Michael Jones	010c0d36e1	[libc][AMDGPU] Disable %m in RPC server (#109317 ) The RPC server directly includes the printf code, but doesn't support errno, so the %m conversion needs to be disabled there as well. This patch does that.	2024-09-19 13:33:23 -05:00
Joseph Huber	f126bc984c	[libc] Fix conflict values from internal `limits.h` when used externally	2024-08-07 10:09:02 -05:00
Joseph Huber	06a808c4f4	[libc] Fix bot accidentally picking up conflicting MB_LEN_MAX	2024-08-07 09:19:53 -05:00
Joseph Huber	2e9f15e1df	[libc] Fix index into argument vector	2024-08-06 14:06:51 -05:00
Joseph Huber	3983bf6040	[libc] Fix GPU argument vector writing `nullptr` to string Summary: The intention behind this code was to null terminate the `envp` string, but it accidentally went into the string data.	2024-08-06 13:03:06 -05:00
Joseph Huber	8c6a6f1a70	[libc] Make RPC malloc implementation return 'nullptr' on alloc failure Summary: `malloc` is supposed to return `nullptr` if it fails, not exit with an error code.	2024-08-06 11:03:40 -05:00
Joseph Huber	d1b2940290	[libc] Add loader option to force serial execution of GPU region (#101601 ) Summary: The loader is used as a test utility to run traditionally CPU based unit tests on the GPU. This has issues when used with something like `llvm-lit` because the GPU runtimes have a nasty habit of either running out of resources or hanging when they are overloaded. To combat this, I added this option to force each process to perform the GPU part serially. This is done right now with a simple file lock on the executing file. I was originally thinking about using more complex IPC to allow N processes to share execution, but that seemed overly complicated given the incredibly large number of failure modes it introduces. File locks are nice here because if the process crashes or is killed it will release the lock automatically (at least on Linux). This is in contrast to something like POSIX shared memory which will stick around until it's unlinked, meaning that if someone did `sigkill` on the program it would never get cleaned up and other threads might wait on a mutex that never occurs. Restricting this to one thread isn't overly ideal, given the fact that the runtime can likely handle at least a few separate processes, but this was easy and it works, so might as well start here. This will hopefully unblock me on running `libcxx` tests, as those ran with so much parallelism spurious failures were very common.	2024-08-05 14:49:15 -05:00
Joseph Huber	5e326983b6	[libc] Use LLVM CommandLine for loader tool (#101501 ) Summary: This patch removes the ad-hoc parsing that I used previously and replaces it with the LLVM CommnadLine interface. This doesn't change any functionality, but makes it easier to maintain.	2024-08-01 14:07:28 -05:00
Joseph Huber	097a1d28ed	[libc] Remove extra parens	2024-08-01 07:16:44 -05:00
Joseph Huber	feeb8335a0	[libc] Change the GPU loaders to LLVM executables (#101442 ) Summary: I am going to rework these tools to just me LLVM tools. This patch is pretty much NFC to set up the CMake for that.	2024-08-01 07:13:41 -05:00
Joseph Huber	c8e69fa4a0	[libc] Fix GPU 'printf' on strings with padding Summary: We get the `strlen` to know how much memory to allocate here, but it wasn't taking into account if the padding was larger than the string itself. This patch sets it to an empty string so we always add the minimum size. This implementation is slightly wasteful with memory, but I am not concerned with a few extra bytes here and there for some memory that gets immediately free'd.	2024-07-20 22:36:12 -05:00
Joseph Huber	10b4834b76	[libc] Fix wrong printf usage in AMDGPU loader	2024-07-17 16:34:47 -05:00
jameshu15869	1ecffdaf27	[libc] Add Kernel Resource Usage to nvptx-loader (#97503 ) This PR allows `nvptx-loader` to read the resource usage of `_start`, `_begin`, and `_end` when executing CUDA binaries. Example output: ``` $ nvptx-loader --print-resource-usage libc/benchmarks/gpu/src/ctype/libc.benchmarks.gpu.src.ctype.isalnum_benchmark.__build__ [ RUN ] LlvmLibcIsAlNumGpuBenchmark.IsAlnumWrapper [ OK ] LlvmLibcIsAlNumGpuBenchmark.IsAlnumWrapper: 93 cycles, 76 min, 470 max, 23 iterations, 78000 ns, 80 stddev _begin registers: 25 _start registers: 80 _end registers: 62 ``` --------- Co-authored-by: Joseph Huber <huberjn@outlook.com>	2024-07-17 16:07:12 -05:00
Joseph Huber	40effc7af5	[libc] Implement (v\|f)printf on the GPU (#96369 ) Summary: This patch implements the `printf` family of functions on the GPU using the new variadic support. This patch adapts the old handling in the `rpc_fprintf` placeholder, but adds an extra RPC call to get the size of the buffer to copy. This prevents the GPU from needing to parse the string. While it's theoretically possible for the pass to know the size of the struct, it's prohibitively difficult to do while maintaining ABI compatibility with NVIDIA's varargs. Depends on https://github.com/llvm/llvm-project/pull/96015.	2024-07-12 19:36:13 -05:00
Joseph Huber	ec0e6ef09b	[libc] Implement the 'remove' function on the GPU (#97096 ) Summary: Straightforward RPC implementation of the `remove` function for the GPU. Copies over the string and calls `remove` on it, passing the result back. This is required for building some `libc++` functionality.	2024-07-01 06:29:48 -05:00
Joseph Huber	5849cbad0f	[libc] Add line numbers to libc utility error messages (#94010 ) Summary: Currently we just print the error as seen, this makes it difficult if something goes wrong to know where it failed. This patch just adds in line numbers to all the error handling routines so you can trace it back.	2024-05-31 14:34:37 -05:00
Joseph Huber	7327014b49	[libc] Implement temporary `printf` on the GPU (#85331 ) Summary: This patch adds a temporary implementation that uses a struct-based interface in lieu of varargs support. Once varargs support exists we will move this implementation to the "real" printf implementation. Conceptually, this patch has the client copy over its format string and arguments to the server. The server will then scan the format string searching for any specifiers that are actually a string. If it is a string then we will send the pointer back to the server to tell it to copy it back. This copied value will then replace the pointer when the final formatting is done. This will require a built-in extension to the varargs support to get access to the underlying struct. The varargs used on the GPU will simply be a struct wrapped in a varargs ABI.	2024-04-02 16:25:18 -05:00
Joseph Huber	a1a8bb1d3a	[libc] Change RPC interface to not use device ids (#87087 ) Summary: The current implementation of RPC tied everything to device IDs and forced us to do init / shutdown to manage some global state. This turned out to be a bad idea in situations where we want to track multiple hetergeneous devices that may report the same device ID in the same process. This patch changes the interface to instead create an opaque handle to the internal device and simply allocates it via `new`. The user will then take this device and store it to interface with the attached device. This interface puts the burden of tracking the device identifier to mapped d evices onto the user, but in return heavily simplifies the implementation.	2024-03-29 12:49:16 -05:00
Marc Auberer	77118536b5	[libc] Remove obsolete LIBC_HAS_BUILTIN macro (#86554 ) Fixes #86546 and removes the macro `LIBC_HAS_BUILTIN`. This was necessary to support older compilers that did not support `__has_builtin`. All of the compilers we support already have this builtin. See: https://libc.llvm.org/compiler_support.html All uses now use `__has_builtin` directly cc @nickdesaulniers	2024-03-27 17:22:41 +01:00
Gheorghe-Teodor Bercea	c25e77436e	Revert "[libomptarget][nextgen-plugin] Use SCRELEASE/SCACQUIRE in packet headers" (#85950 ) Reverts llvm/llvm-project#85678	2024-03-20 11:40:12 -04:00
Gheorghe-Teodor Bercea	927308a52b	[libomptarget][nextgen-plugin] Use SCRELEASE/SCACQUIRE in packet headers (#85678 ) This patch updates the construction of packet headers to replace the usage of ACQUIRE/RELEASE with SCACQUIRE/SCRELEASE which is now recommended. The patch also ensures consistency across kernel dispatches.	2024-03-20 11:22:01 -04:00
Joseph Huber	9bc294f9be	[libc] Build the GPU during the projects setup like libc-hdrgen (#84667 ) Summary: The libc build has a few utilties that need to be built before we can do everything in the full build. The one requirement currently is the `libc-hdrgen` binary. If we are doing a full build runtimes mode we first add `libc` to the projects list and then only use the `projects` portion to buld the `libc` portion. We also use utilities for the GPU build, namely the loader utilities. Previously we would build these tools on-demand inside of the cross-build, which tool some hacky workarounds for the dependency finding and target triple. This patch instead just builds them similarly to libc-hdrgen and then passses them in. We now either pass it manually it it was built, or just look it up like we do with the other `clang` tools. Depends on https://github.com/llvm/llvm-project/pull/84664	2024-03-11 09:18:47 -05:00
Joseph Huber	8a79003307	[libc] Move RPC opcodes include out of the header Summary: This header isn't strictly necessary, and is currently broken because we install these to separate locations.	2024-03-10 14:07:47 -05:00
Joseph Huber	033dbbe4f1	[libc][NFC] Clean up stray ';' and default enum warning Summary: Cleans up two warnings I get locally while building.	2024-03-10 09:32:12 -05:00
Joseph Huber	29762e3722	[libc][NFCI] Remove lane size template argument on RPC server (#84557 ) Summary: We previously changed the data layout for the RPC buffer to make it lane size agnostic. I put off changing the size for the server case to make the patch smaller. This patch simply reorganizes code by making the lane size an argument to the port rather than a templtae size. Heavily simplifies a lot of code, no more `std::variant`.	2024-03-08 15:02:19 -06:00
Joseph Huber	06ac828dc1	[libc] Fix flipped AMDGPU kernel launch arguments (#83648 ) Summary: These values were incorrectly flipped, setting the size of the blocks to the threads and vice-versa. When I originally wrote the thread utilities it was using COV4 which used an implicit format. Then when I updated I accidentally flipped them and never noticed because nothing depended on the size of the threads until I checked it manually.	2024-03-01 20:56:07 -06:00
lntue	73aab2f697	[libc] Revert https://github.com/llvm/llvm-project/pull/83199 since it broke Fuchsia. (#83374 ) With some header fix forward for GPU builds.	2024-02-29 14:43:53 -05:00
Joseph Huber	04e8653f18	[libc] Add "include/" to the LLVM include directories (#83199 ) Summary: Recent changes added an include path in the float128 type that used the internal `libc` path to find the macro. This doesn't work once it's installed because we need to search from the root of the install dir. This patch adds "include/" to the include path so that our inclusion of installed headers always match the internal use.	2024-02-27 17:45:15 -06:00
Joseph Huber	69c0b2febe	[libc][NFC] Remove all trailing spaces from libc (#82831 ) Summary: There are a lot of random training spaces on various lines. This patch just got rid of all of them with `sed 's/\ \+$//g'.	2024-02-23 16:34:00 -06:00
Joseph Huber	1a2ecbb398	[libc] Remove 'llvm-gpu-none' directory from build (#82816 ) Summary: This directory is leftover from when we handled both AMDGPU and NVPTX in the same build and merged them into a pseudo triple. Now the only thing it contains is the RPC server header. This gets rid of it, but now that it's in the base install directory we should make it clear that it's an LLVM libc header.	2024-02-23 14:11:31 -06:00
Joseph Huber	47b7c91abe	[libc] Rework the GPU build to be a regular target (#81921 ) Summary: This is a massive patch because it reworks the entire build and everything that depends on it. This is not split up because various bots would fail otherwise. I will attempt to describe the necessary changes here. This patch completely reworks how the GPU build is built and targeted. Previously, we used a standard runtimes build and handled both NVPTX and AMDGPU in a single build via multi-targeting. This added a lot of divergence in the build system and prevented us from doing various things like building for the CPU / GPU at the same time, or exporting the startup libraries or running tests without a full rebuild. The new appraoch is to handle the GPU builds as strict cross-compiling runtimes. The first step required https://github.com/llvm/llvm-project/pull/81557 to allow the `LIBC` target to build for the GPU without touching the other targets. This means that the GPU uses all the same handling as the other builds in `libc`. The new expected way to build the GPU libc is with `LLVM_LIBC_RUNTIME_TARGETS=amdgcn-amd-amdhsa;nvptx64-nvidia-cuda`. The second step was reworking how we generated the embedded GPU library by moving it into the library install step. Where we previously had one `libcgpu.a` we now have `libcgpu-amdgpu.a` and `libcgpu-nvptx.a`. This patch includes the necessary clang / OpenMP changes to make that not break the bots when this lands. We unfortunately still require that the NVPTX target has an `internal` target for tests. This is because the NVPTX target needs to do LTO for the provided version (The offloading toolchain can handle it) but cannot use it for the native toolchain which is used for making tests. This approach is vastly superior in every way, allowing us to treat the GPU as a standard cross-compiling target. We can now install the GPU utilities to do things like use the offload tests and other fun things. Some certain utilities need to be built with `--target=${LLVM_HOST_TRIPLE}` as well. I think this is a fine workaround as we will always assume that the GPU `libc` is a cross-build with a functioning host. Depends on https://github.com/llvm/llvm-project/pull/81557	2024-02-22 15:29:29 -06:00
Joseph Huber	f879ac0385	[libc] Rework the RPC interface to accept runtime wave sizes (#80914 ) Summary: The RPC interface needs to handle an entire warp or wavefront at once. This is currently done by using a compile time constant indicating the size of the buffer, which right now defaults to some value on the client (GPU) side. However, there are currently attempts to move the `libc` library to a single IR build. This is problematic as the size of the wave fronts changes between ISAs on AMDGPU. The builitin `__builtin_amdgcn_wavefrontsize()` will return the appropriate value, but it is only known at runtime now. In order to support this, this patch restructures the packet. Now instead of having an array of arrays, we simply have a large array of buffers and slice it according to the runtime value if we don't know it ahead of time. This also somewhat has the advantage of making the buffer contiguous within a page now that the header has been moved out of it.	2024-02-13 10:45:43 -06:00
Joseph Huber	5470ea4e36	[libc] Change the starting port index to use the SMID (#79200 ) Summary: The RPC interface uses several ports to provide parallel access. Right now we begin the search at the beginning, which heavily contests the early ports. Using the SMID allows us to stagger the starting index based off of the cluster identifier that is executing the current warp. Multiple warps can share an SM, but it will guaruntee that the contention for the low indices is lower. This also increases the maximum port size to around 4096, this is because 512 isn't enough to cover the full hardare parallelism needed to guarantee this doesdn't deadlock.	2024-01-30 13:06:58 -06:00
Petr Hosek	b86d02375e	[libc] Redo the install targets (#78795 ) Prior to this change, we wouldn't build headers that aren't referenced by other parts of the libc which would result in a build error during installation. To address this, we make the header target a dependency of the libc archive. Additionally, we also redo the install targets, moving the install targets closer to build targets and simplifying the hierarchy and generally matching what we do for other runtimes.	2024-01-19 15:45:22 -08:00
Joseph Huber	9553e156cb	[libc] Allocate fine-grained memory for the RPC host symbol Summary: This pointer has been causing issues. Allocating and reading from coarse memory on the CPU is not guaranteed and varies depending on the kernel version and support. Previously we attempted to pin the memory but this caused unexpected failures. This should be a legal operation and work around the problem as fine-grained memory should be always legal to write to by both sides.	2023-12-01 13:47:33 -06:00
Joseph Huber	8c1d476db0	Revert "[libc] Explicitly pin memory for the client symbol lookup (#73988 )" Summary: This caused the bots to begin failing. Revert for now to get the bot green. This reverts commit 8bea804923a1b028e86b177caccb3258708ca01c. This reverts commit e1395c7bdbe74b632ba7fbd90e2be2b4d82ee09e.	2023-12-01 13:04:49 -06:00
Joseph Huber	8bea804923	[libc] Move the pointer to pin off the stack to the heap (#74118 ) Summary: This may be problematic to pin a stack pointer. Allocate it via the OS allocator instead as the documentation suggests. For some reason, if you attempt to free this pointer after the memory region has been unlocked, it will return an invalid pointer.	2023-12-01 12:31:34 -06:00

1 2 3

129 Commits