We are finalizing the header inclusion policy: public headers in the
`libc/include` folder must use relative paths in `"..."` form when
including each other.
This PR does the cleanup, making sure that all public header inclusions
in the `libc/include` folder use relative paths.
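For example, a hypothetical public header would refer to a sibling header
like this (the file names below are only illustrative):
```
/* In a hypothetical libc/include/foo.h: use a relative "..." include */
#include "llvm-libc-types/size_t.h"
/* rather than a system-style include: */
/* #include <llvm-libc-types/size_t.h> */
```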
---------
Co-authored-by: Nick Desaulniers <nickdesaulniers@users.noreply.github.com>
Summary:
This patch removes the ad-hoc argument parsing I used previously and
replaces it with the LLVM CommandLine interface. This doesn't change any
functionality, but it makes the code easier to maintain.
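For reference, a minimal sketch of the `cl::` interface this refers to;
the options here are illustrative, not the tool's actual flags:
```
#include "llvm/Support/CommandLine.h"

// Illustrative options only.
static llvm::cl::opt<std::string>
    InputFile(llvm::cl::Positional, llvm::cl::Required,
              llvm::cl::desc("<gpu image>"));
static llvm::cl::opt<unsigned>
    Threads("threads", llvm::cl::desc("Threads in the x dimension"),
            llvm::cl::init(1));

int main(int argc, char **argv) {
  // Replaces the previous ad-hoc scanning of argv.
  llvm::cl::ParseCommandLineOptions(argc, argv, "GPU loader\n");
  // ... use InputFile and Threads ...
}
```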
Summary:
Currently we just print the error as-is, which makes it difficult to
know where a failure occurred when something goes wrong. This patch adds
line numbers to all the error handling routines so that failures can be
traced back to their source.
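A minimal sketch of the idea, with a hypothetical `handle_error` macro
(the real routines differ):
```
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: report where the failure happened, not just what.
#define handle_error(msg)                                                  \
  do {                                                                     \
    std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, msg);          \
    std::exit(EXIT_FAILURE);                                               \
  } while (0)
```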
Summary:
The current implementation of RPC tied everything to device IDs and
forced us to do init / shutdown to manage some global state. This turned
out to be a bad idea in situations where we want to track multiple
heterogeneous devices that may report the same device ID in the same
process.
This patch changes the interface to instead create an opaque handle to
the internal device, which is simply allocated via `new`. The user then
stores this handle and uses it to interface with the attached device.
This puts the burden of mapping device identifiers to devices onto the
user, but in return it heavily simplifies the implementation.
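A rough sketch of the opaque-handle pattern described here; the names and
fields are illustrative, not the actual interface:
```
#include <cstdint>

// Definition hidden from users; they only ever hold a `Device *`.
struct Device {
  uint32_t hardware_id; // may collide across heterogeneous devices
};

// The implementation simply allocates the handle via `new`; the user
// stores it and passes it back for every subsequent operation.
Device *device_open(uint32_t hardware_id) { return new Device{hardware_id}; }
void device_close(Device *handle) { delete handle; }
```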
Summary:
Recent changes added an include path in the float128 type that used the
internal `libc` path to find the macro. This doesn't work once the
header is installed, because we need to search from the root of the
install directory. This patch adds "include/" to the include path so
that our inclusion of installed headers always matches the internal use.
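Roughly, the idea is that with the install (or source) root on the
search path, the same directive resolves in both trees; the path below
is only illustrative:
```
/* Resolves against -I<root> both in-tree and once installed. */
#include "include/llvm-libc-macros/float-macros.h"
```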
Summary:
This directory is left over from when we handled both AMDGPU and NVPTX
in the same build and merged them into a pseudo triple. Now the only
thing it contains is the RPC server header. This patch gets rid of the
directory, and since the header now lives in the base install directory,
we should make it clear that it is an LLVM libc header.
Summary:
This patch removes the `rpc_reset` function. It was previously used to
initialize the RPC client on the device by setting up the pointers used
to communicate with the server, the goal being to make it easier to
initialize the device for testing. However, it prevented us from
enforcing the invariant that the buffers are all read-only from the
client side.
The expected way to initialize the client is now to copy it from the
host runtime. This allows us to keep the RPC client in the constant
address space on the GPU, potentially through inference, which improves
caching behaviour.
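A sketch of the intended flow, assuming a `runtime_memcpy` callback that
stands in for the vendor copy (e.g. `cudaMemcpyToSymbol` on CUDA) and a
purely illustrative `Client` layout:
```
#include <cstddef>
#include <cstdint>

// Illustrative layout only; the real client holds the RPC buffer pointers.
struct Client {
  void *shared_buffer;
  uint64_t port_count;
};

void init_rpc_client(const Client &host_client, void *device_client_addr,
                     void (*runtime_memcpy)(void *, const void *, size_t)) {
  // The host fully initializes the client and copies it into the device
  // image; the device treats it as read-only from then on, so it can live
  // in the constant address space.
  runtime_memcpy(device_client_addr, &host_client, sizeof(Client));
}
```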
This `MAX_LANE_SIZE` was a hack from the days when we used a single
instance of the server and had some GPU state handle it. Now that
everything is templated, it really shouldn't be used. This patch removes
its uses and replaces them with template arguments.
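An illustrative before/after of the change (the names and sizes here are
not the real ones):
```
#include <cstdint>

// Before: one global upper bound sized every buffer.
//   constexpr uint32_t MAX_LANE_SIZE = 64;
//   uint64_t slots[MAX_LANE_SIZE];

// After: the lane count is a template argument, sized per target.
template <uint32_t lane_size> struct Process {
  uint64_t slots[lane_size]; // e.g. 32 on NVPTX warps, 64 on GCN wavefronts
};

Process<64> amdgpu_process; // instantiated for the target's wavefront size
```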
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D158633
This patch does the noisy work of moving the test opcodes out of the
exported interface and into an interface that is only visible in `libc`.
The benefit of this is that we both test the exported RPC registration
more directly, and we do not need to expose this interface to users.
I have decided to mark any opcode that is not a "core" libc feature by
setting the MSB of the opcode. We can think of these as non-libc
"extensions".
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154848
This patch prepares the RPC interface to be installed. We place this in
the existing `llvm-gpu-none` directory as it will also give us access to
the generated `libc` headers for the opcodes.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D153040
This patch begins providing a generic static library that wraps around
the raw `rpc.h` interface. As discussed in the corresponding RFC,
https://discourse.llvm.org/t/rfc-libc-exporting-the-rpc-interface-for-the-gpu-libc/71030,
we want to begin exporting RPC services to external users. To do this,
we decided not to expose the `rpc.h` header itself but to wrap around
its functionality. The wrapper uses a C interface, since we make heavy
use of callbacks, and this allows us to provide a predictable interface.
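A rough sketch of what such a C interface might look like; these
declarations are illustrative rather than the shipped header:
```
#include <stdint.h>

// An opaque handle plus callbacks keeps the rpc.h templates out of the ABI.
typedef void (*rpc_callback_t)(void *buffer, void *data);

int rpc_server_init(uint32_t device_id);
int rpc_register_callback(uint32_t device_id, uint16_t opcode,
                          rpc_callback_t callback, void *data);
int rpc_handle_server(uint32_t device_id);
```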
Reviewed By: JonChesterfield, sivachandra
Differential Revision: https://reviews.llvm.org/D147054
Allows moving the pointer swap between server and client into reset.
A single allocation simplifies whatever allocates the client/server,
currently the libc loaders.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D150337
The GPU has a different execution model from standard `_start`
implementations. On the GPU, all threads are active at the start of a
kernel. In order to correctly initialize and call the constructors, we
want single-threaded semantics. Previously, this was done using a
makeshift global barrier with atomics. However, it is easier to simply
put the portions of the code that must be single-threaded into separate
kernels and then call those with only one thread. Generally, mixing
global state between kernel launches makes optimizations more difficult,
similar to calling a function outside of the TU, but for testing it is
better to be correct.
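A sketch of the scheme, with a hypothetical `launch_kernel` standing in
for the loader's vendor-specific launch plumbing:
```
#include <cstdint>

// Hypothetical stand-in for the HSA / CUDA driver launch code.
void launch_kernel(const char *name, uint32_t blocks, uint32_t threads) {
  /* look up `name` in the image and enqueue it with these dimensions */
}

void run_image(uint32_t blocks, uint32_t threads) {
  // Single-threaded phases get their own 1x1 launches instead of an
  // in-kernel global barrier.
  launch_kernel("_begin", 1, 1);             // constructors, initialization
  launch_kernel("_start", blocks, threads);  // the user's main()
  launch_kernel("_end", 1, 1);               // destructors, cleanup
}
```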
Depends on D149527 D148943
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D149581
The execution model of the GPU expects that groups of threads will
execute in lock-step in SIMD fashion. It is important for both
correctness and performance that we treat this as the smallest possible
granularity for an RPC operation. Thus, we map multiple threads to a
single larger buffer and ship that across the wire.
This patch makes the necessary changes to support executing the RPC on
the GPU with multiple threads. This requires some workarounds to mimic
the model when handling the protocol from the CPU. I'm not completely
happy with some of the workarounds required, but I think it should work.
Uses some of the implementation details from D148191.
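An illustrative layout for this mapping (field names are hypothetical):
```
#include <cstdint>

constexpr uint32_t lane_size = 32; // e.g. 32 for NVPTX warps, 64 for AMDGPU

struct Buffer {
  uint64_t data[8]; // payload for a single lane
};

// One packet serves a whole thread group: a single RPC operation ships
// every active lane's buffer across the wire together.
struct Packet {
  uint64_t activemask;       // which lanes take part in this call
  Buffer payload[lane_size]; // one slot per lane
};
```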
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D148943
We will want to test the GPU `libc` with multiple threads in the future.
This patch adds the `--threads` and `--blocks` options to set the `x`
dimension of the kernel. We use CUDA terminology instead of OpenCL's for
familiarity.
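An illustrative invocation (the loader name and values are examples only):
```
./amdhsa_loader --threads 64 --blocks 2 image args to kernel
```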
Depends on D148288 D148342
Reviewed By: jdoerfert, sivachandra, tra
Differential Revision: https://reviews.llvm.org/D148485
Summary:
This implementation was buggy and inefficient. Fine-grained memory can
only be allocated at page-level granularity, which means that each
allocated string used about 4096 bytes. This is wasteful in general and
also allowed for buggy behaviour: the previous copying of the
environment vector only worked because the large buffer size meant that
we would typically have a null byte after the allocated memory, and it
would break if the vector were larger than a page. This patch allocates
everything into a single buffer, which is easier to free and use, and
more correct.
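A sketch of the single-buffer scheme, using `malloc` in place of the
loader's fine-grained device allocator:
```
#include <cstdlib>
#include <cstring>

char **copy_environment(char **envp) {
  // First pass: count the strings and their total size, null bytes included.
  std::size_t count = 0, bytes = 0;
  for (char **env = envp; *env; ++env, ++count)
    bytes += std::strlen(*env) + 1;

  // One allocation holds the pointer array (plus terminator) followed by
  // every string packed back to back, so a single free() releases it all.
  char **pointers = static_cast<char **>(
      std::malloc((count + 1) * sizeof(char *) + bytes));
  char *strings = reinterpret_cast<char *>(pointers + count + 1);

  for (std::size_t i = 0; i < count; ++i) {
    std::size_t length = std::strlen(envp[i]) + 1;
    std::memcpy(strings, envp[i], length);
    pointers[i] = strings;
    strings += length;
  }
  pointers[count] = nullptr;
  return pointers;
}
```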
This patch adds a loader utility targeting the CUDA driver API to launch
NVPTX images called `nvptx_loader`. This takes a GPU image on the
command line and launches the `_start` kernel with the appropriate
arguments. The `_start` kernel is provided by the already implemented
`nvptx/start.cpp`. So, an application with a `main` function can be
compiled and run as follows.
```
clang++ --target=nvptx64-nvidia-cuda main.cpp crt1.o -march=sm_70 -o image
./nvptx_loader image args to kernel
```
This implementation is not tested and does not yet support RPC, which
requires further development to work around NVIDIA-specific limitations
in atomics and linking.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D146681
This patch performs the same operation used for the `argv` array to
copy over the `envp` array. This allows the GPU tests to use environment
variables.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D146322
This is the first attempt to get some testing support for GPUs in LLVM's
libc. We want to be able to compile for and call generic code while on
the device. This is difficult as most GPU applications also require the
support of large runtimes that may contain their own bugs (e.g. CUDA /
HIP / OpenMP / OpenCL / SYCL). The proposed solution is to provide a
"loader" utility that allows us to execute a "main" function on the GPU.
This patch implements a simple loader utility targeting the AMDHSA
runtime called `amdhsa_loader` that takes a GPU program as its first
argument. It will then attempt to load a predetermined `_start` kernel
inside that image and launch execution. The `_start` symbol is provided
by a `start` utility function that will be linked alongside the
application. Thus, this should allow us to run arbitrary code on the
user's GPU with the following steps for testing.
```
clang++ Start.cpp --target=amdgcn-amd-amdhsa -mcpu=<arch> -ffreestanding -nogpulib -nostdinc -nostdlib -c
clang++ Main.cpp --target=amdgcn-amd-amdhsa -mcpu=<arch> -nogpulib -nostdinc -nostdlib -c
clang++ Start.o Main.o --target=amdgcn-amd-amdhsa -o image
amdhsa_loader image <args, ...>
```
We determine the `-mcpu` value using the `amdgpu-arch` utility provided
either by `clang` or `rocm`. If `amdgpu-arch` isn't found or returns an
error, we don't run the tests, as the machine does not have a valid
HSA-compatible GPU. Alternatively, we could bring this utility in-source
to avoid the external dependency.
This patch provides a single test for this utility that simply checks
that we can compile an application containing a simple `main` function
and execute it.
The proposed solution in the future is to create an alternate
implementation of the LibcTest.cpp source that can be compiled and
launched using this utility. This approach should allow us to use the
same test sources as the other applications.
This is primarily a prototype; suggestions for how to better integrate
this with the existing LibC infrastructure would be greatly appreciated.
The loader code should also be cleaned up somewhat, and an
implementation for NVPTX will need to be written as well.
Reviewed By: sivachandra, JonChesterfield
Differential Revision: https://reviews.llvm.org/D139839