514 Commits

Author SHA1 Message Date
Joseph Huber
69c0b2febe
[libc][NFC] Remove all trailing spaces from libc (#82831)
Summary:
There are a lot of random training spaces on various lines. This patch
just got rid of all of them with `sed 's/\ \+$//g'.
2024-02-23 16:34:00 -06:00
Joseph Huber
1a2ecbb398
[libc] Remove 'llvm-gpu-none' directory from build (#82816)
Summary:
This directory is leftover from when we handled both AMDGPU and NVPTX in
the same build and merged them into a pseudo triple. Now the only thing
it contains is the RPC server header. This gets rid of it, but now that
it's in the base install directory we should make it clear that it's an
LLVM libc header.
2024-02-23 14:11:31 -06:00
Joseph Huber
47b7c91abe
[libc] Rework the GPU build to be a regular target (#81921)
Summary:
This is a massive patch because it reworks the entire build and
everything that depends on it. This is not split up because various bots
would fail otherwise. I will attempt to describe the necessary changes
here.

This patch completely reworks how the GPU build is built and targeted.
Previously, we used a standard runtimes build and handled both NVPTX and
AMDGPU in a single build via multi-targeting. This added a lot of
divergence in the build system and prevented us from doing various
things like building for the CPU / GPU at the same time, or exporting
the startup libraries or running tests without a full rebuild.

The new appraoch is to handle the GPU builds as strict cross-compiling
runtimes. The first step required
https://github.com/llvm/llvm-project/pull/81557 to allow the `LIBC`
target to build for the GPU without touching the other targets. This
means that the GPU uses all the same handling as the other builds in
`libc`.

The new expected way to build the GPU libc is with
`LLVM_LIBC_RUNTIME_TARGETS=amdgcn-amd-amdhsa;nvptx64-nvidia-cuda`.

The second step was reworking how we generated the embedded GPU library
by moving it into the library install step. Where we previously had one
`libcgpu.a` we now have `libcgpu-amdgpu.a` and `libcgpu-nvptx.a`. This
patch includes the necessary clang / OpenMP changes to make that not
break the bots when this lands.

We unfortunately still require that the NVPTX target has an `internal`
target for tests. This is because the NVPTX target needs to do LTO for
the provided version (The offloading toolchain can handle it) but cannot
use it for the native toolchain which is used for making tests.

This approach is vastly superior in every way, allowing us to treat the
GPU as a standard cross-compiling target. We can now install the GPU
utilities to do things like use the offload tests and other fun things.

Some certain utilities need to be built with 
`--target=${LLVM_HOST_TRIPLE}` as well. I think this is a fine
workaround as we
will always assume that the GPU `libc` is a cross-build with a
functioning host.

Depends on https://github.com/llvm/llvm-project/pull/81557
2024-02-22 15:29:29 -06:00
Schrodinger ZHU Yifan
6cfbf8ccce
[libc] add support for function level attributes (#79891)
See discussion at
https://discourse.llvm.org/t/rfc-support-function-attributes-in-user-interface/76624

Demo macro:
```c++
#if !defined(__LIBC_CONST_ATTR) && defined(__cplusplus) && defined(__GNUC__)
#if __has_attribute(const)
#define __LIBC_CONST_ATTR [[gnu::const]]
#endif
#endif
#if !defined(__LIBC_CONST_ATTR) && defined(__GNUC__)
#if __has_attribute(const)
#define __LIBC_CONST_ATTR __attribute__((const))
#endif
#endif
#if !defined(__LIBC_CONST_ATTR)
#define __LIBC_CONST_ATTR
#endif
```
2024-02-15 12:20:53 -05:00
Joseph Huber
f879ac0385
[libc] Rework the RPC interface to accept runtime wave sizes (#80914)
Summary:
The RPC interface needs to handle an entire warp or wavefront at once.
This is currently done by using a compile time constant indicating the
size of the buffer, which right now defaults to some value on the client
(GPU) side. However, there are currently attempts to move the `libc`
library to a single IR build. This is problematic as the size of the
wave fronts changes between ISAs on AMDGPU. The builitin
`__builtin_amdgcn_wavefrontsize()` will return the appropriate value,
but it is only known at runtime now.

In order to support this, this patch restructures the packet. Now
instead of having an array of arrays, we simply have a large array of
buffers and slice it according to the runtime value if we don't know it
ahead of time. This also somewhat has the advantage of making the buffer
contiguous within a page now that the header has been moved out of it.
2024-02-13 10:45:43 -06:00
lntue
fbf43b0121
[libc] Only declare float128 math functions in the generated math.h if float128 type is supported. (#81010) 2024-02-08 20:13:27 -05:00
Joseph Huber
5470ea4e36
[libc] Change the starting port index to use the SMID (#79200)
Summary:
The RPC interface uses several ports to provide parallel access. Right
now we begin the search at the beginning, which heavily contests the
early ports. Using the SMID allows us to stagger the starting index
based off of the cluster identifier that is executing the current warp.
Multiple warps can share an SM, but it will guaruntee that the
contention for the low indices is lower.

This also increases the maximum port size to around 4096, this is
because 512 isn't enough to cover the full hardare parallelism needed to
guarantee this doesdn't deadlock.
2024-01-30 13:06:58 -06:00
Guillaume Chatelet
2856db0d3b
[libc][NFC] Remove FPBits cast operator (#79142)
The semantics for casting can range from "bitcast" (same representation)
to "different representation", to "type promotion". Here we remove the
cast operator and force usage of `get_val` as the only function to get
the floating point value, making the intent clearer and more consistent.
2024-01-23 17:30:19 +01:00
Guillaume Chatelet
6b02d2f863
[reland][libc] Remove unnecessary FPBits functions and properties (#79128)
- reland #79113
- Fix aarch64 RISC-V build
2024-01-23 13:48:03 +01:00
Guillaume Chatelet
b524eed925
Revert "[libc] Remove unnecessary FPBits functions and properties" (#79118)
Reverts llvm/llvm-project#79113
It broke aarch64 build bot machines.
2024-01-23 11:51:18 +01:00
Guillaume Chatelet
3bc86bf3bf
[libc] Remove unnecessary FPBits functions and properties (#79113)
This patch reduces the surface of `FPBits`.
2024-01-23 11:48:28 +01:00
Schrodinger ZHU Yifan
be0fa319f9
[libc] fix unit tests in fullbuild (#78864)
fixes https://github.com/llvm/llvm-project/issues/78743

- For normal objects, the patch removes `RTTI` and exceptions in `fullbuild`
- For FP tests, the patch adds links to `stdc++` and `gcc_s` if `MPFR` is used.
2024-01-21 21:37:17 -05:00
Petr Hosek
b86d02375e
[libc] Redo the install targets (#78795)
Prior to this change, we wouldn't build headers that aren't referenced
by other parts of the libc which would result in a build error during
installation. To address this, we make the header target a dependency of
the libc archive. Additionally, we also redo the install targets, moving
the install targets closer to build targets and simplifying the
hierarchy and generally matching what we do for other runtimes.
2024-01-19 15:45:22 -08:00
Guillaume Chatelet
14f0c06f48
[libc] Fix is_subnormal for Intel Extended Precision (#78592)
Also turn a set of `get_biased_exponent() == 0` into `is_subnormal()`
which is clearer.
2024-01-19 09:36:03 +01:00
Petr Hosek
c20811b659
[libc] Fix libc-hdrgen crosscompiling (#78227)
The support introduced in 675702f356b0c3a540fa2e8af4192f7d658b2988 is
not working correctly in all scenarios. Instead of setup_host_tool
function, we can use the existing targets introduced by add_tablegen
macro.
2024-01-16 07:39:06 -06:00
Petr Hosek
b348126b21
[libc] Build native libc-hdrgen when crosscompiling (#77848)
When crosscompiling tools for a different architecture, we need to build
native libc-hdrgen which can be achieved using the existing CMake
support for crosscompiling tablegen tools.
2024-01-12 11:36:01 -08:00
Guillaume Chatelet
c09e690556
[libc][NFC] Remove FloatProperties (#76508)
Access is now done through `FPBits` exclusively.
This patch also renames a few internal structs and uses `T` instead of
`FP` as a template parameter.
2024-01-03 09:51:58 +01:00
Guillaume Chatelet
c23991478a
[libc][NFC] Integrate FloatProperties into FPBits (#76506)
`FloatProperties` is always included when `FPBits` is. This will help
further refactoring.
2023-12-28 15:42:47 +01:00
Kazu Hirata
b8f89b84bc Use StringRef::{starts,ends}_with (NFC)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-16 15:02:17 -08:00
Guillaume Chatelet
3546f4da19
[libc][NFC] Rename MANTISSA_WIDTH in FRACTION_LEN (#75489)
This one might be a bit controversial since the terminology has been
introduced from the start but I think `FRACTION_LEN` is a better name
here. AFAICT it really is "the number of bits after the decimal dot when
the number is in normal form."

`MANTISSA_WIDTH` is less precise as it's unclear whether we take the
leading bit into account.
This patch also renames most of the properties to use the `_LEN` suffix
and fixes useless casts or variables.
2023-12-15 13:57:35 +01:00
Guillaume Chatelet
493cc71d72
[libc][NFC] Remove MantissaWidth traits (#75458)
Same as #75362, the traits does not bring a lot of value over
`FloatProperties::MANTISSA_WIDTH` (or `FPBits::MANTISSA_WIDTH`).
2023-12-14 15:07:09 +01:00
Guillaume Chatelet
7b387d2758
[libc][NFC] Fix mixed up biased/unbiased exponent (#75037)
According to [wikipedia](https://en.wikipedia.org/wiki/Exponent_bias)
the "biased exponent" is the encoded form that is always positive
whereas the unbiased form is the actual "real" exponent that can be
positive or negative.
`FPBits` seems to be using `unbiased_exponent` to describe the encoded
form (unsigned). This patch simply use `biased` instead of `unbiased`.
2023-12-11 17:06:48 +01:00
Joseph Huber
9553e156cb [libc] Allocate fine-grained memory for the RPC host symbol
Summary:
This pointer has been causing issues. Allocating and reading from coarse
memory on the CPU is not guaranteed and varies depending on the kernel
version and support. Previously we attempted to pin the memory but this
caused unexpected failures. This should be a legal operation and work
around the problem as fine-grained memory should be always legal to
write to by both sides.
2023-12-01 13:47:33 -06:00
Joseph Huber
8c1d476db0 Revert "[libc] Explicitly pin memory for the client symbol lookup (#73988)"
Summary:
This caused the bots to begin failing. Revert for now to get the bot
green.

This reverts commit 8bea804923a1b028e86b177caccb3258708ca01c.
This reverts commit e1395c7bdbe74b632ba7fbd90e2be2b4d82ee09e.
2023-12-01 13:04:49 -06:00
Joseph Huber
8bea804923
[libc] Move the pointer to pin off the stack to the heap (#74118)
Summary:
This may be problematic to pin a stack pointer. Allocate it via the OS
allocator instead as the documentation suggests.

For some reason, if you attempt to free this pointer after the memory
region has been unlocked, it will return an invalid pointer.
2023-12-01 12:31:34 -06:00
Joseph Huber
e1395c7bdb
[libc] Explicitly pin memory for the client symbol lookup (#73988)
Summary:
Previously, we determined that coarse grained memory cannot be used in
the general case. That removed the buffer used to transfer the memory,
however we still had this lookup. Though we do not access the symbol
directly, it still conflicts with the agents apparently. Pin this as
well.

This resolves the problems @lntue was having with the `libc` GPU build.
2023-11-30 15:35:33 -06:00
Joseph Huber
0584e6c166
[libc] Explicitly pin memory for the HSA memory transfer (#73973)
Summary:
This portion of code handles mapping the RPC client memory over to the
device. HSA copies need to be between two slices of memory that HSA has
allocated. Previously we used coarse-grained memory to act as the host
source. However, the support for this varies depending on the kernel and
version and should not be relied upon. This patch changes that handling
to use the `hsa_amd_memory_lock` API to explicitly pin memory to a
location sufficient for a DMA transfer to the GPU.
2023-11-30 13:46:52 -06:00
Joseph Huber
bf02c84cb8
[libc] Use file lock to join newline on RPC puts call (#73373)
Summary:
The puts call appends a newline. With multiple threads, this can be done
out of order such that another thread puts something before we finish
appending the newline. Add a flockfile and funlockfile to ensure that
the whole string is printed before another string can appear.
2023-11-27 08:41:15 -06:00
Guillaume Chatelet
d924c5d721
[libc][NFC] Sink "PlatformDefs.h" into "FloatProperties.h" (#73226)
`PlatformDefs.h` does not bring a lot of value as a separate file.
It is transitively included in `FloatProperties.h` and `FPBits.h`. This
patch sinks it into `FloatProperties.h` and removes the associated build
targets.
2023-11-23 11:23:18 +01:00
Joseph Huber
8341a40ec1
[libc] Update the AMDGPU implementation to use code object 5 (#72580)
Summary:
This patch includes the necessary changes to make the `libc` tests
running on AMD GPUs run using the newer code object version. The 'code
object version' is AMD's internal ABI for making kernel calls. The move
from 4 to 5 changed how we handle arguments for builtins such as
obtaining the grid size or setting up the size of the private stack.

Fixes: https://github.com/llvm/llvm-project/issues/72517
2023-11-21 07:14:10 -06:00
Joseph Huber
dc30fa6aca [libc][fix] Call GPU destructors in the correct order
Summary:
I was mistakenly iterating the list backwards. Regular semantics puts
both arrays in priority order but the destructors are called backwards.
2023-11-09 09:22:41 -06:00
lntue
bc7a3bd864
[libc][math] Implement powf function correctly rounded to all rounding modes. (#71188)
We compute `pow(x, y)` using the formula
```
  pow(x, y) = x^y = 2^(y * log2(x))
```
We follow similar steps as in `log2f(x)` and `exp2f(x)`, by breaking
down into `hi + mid + lo` parts, in which `hi` parts are computed using
the exponent field directly, `mid` parts will use look-up tables, and
`lo` parts are approximated by polynomials.

We add some speedup for common use-cases:
```
  pow(2, y) = exp2(y)
  pow(10, y) = exp10(y)
  pow(x, 2) = x * x
  pow(x, 1/2) = sqrt(x)
  pow(x, -1/2) = rsqrt(x) - to be added
```
2023-11-06 16:54:25 -05:00
Jon Chesterfield
f0e100a05a
[amdgpu][openmp] Treat missing TIMESTAMP_FREQUENCY as non-fatal (#70987)
If you build with dynamic_hsa, the symbol is known and compilation
succeeds. If you then run with a slightly older libhsa, this argument is
not recognised and an error returned. I'd rather the program runs with a
misleading omp wtime than refuses to run at all.
2023-11-01 22:43:34 +00:00
Joseph Huber
9e390a1408 [libc][Obvious] Fix missing semicolon in AMDGPU loader implementation
Summary:
Title
2023-10-30 14:58:46 -05:00
Jon Chesterfield
896749aa0d
[amdgpu][openmp] Avoiding writing to packet header twice (#70695)
I think it follows from the HSA spec that a write to the first byte is
deemed significant to the GPU in which case writing to the second short
and reading back from it later would be safe. However, the examples for
this all involve an atomic write to the first 32 bits and it seems a
credible risk that the occasional CI errors abound invalid packets have
as their root cause that the firmware notices the early write to
packet->setup and treats that as a sign that the packet is ready to go.

That was overly-paranoid, however in passing noticed the code in libc is
genuinely invalid. The memset writes a zero to the header byte, changing
it from type_invalid (1) to type_vendor (0), at which point the GPU is
free to read the 64 byte packet and interpret it as a vendor packet,
which is probably why libc CI periodically errors about invalid packets.

Also a drive by change to do the atomic store on a uint32_t
consistently. I'm not sure offhand what __atomic_store_n on a uint16_t*
and an int resolves to, seems better to be unambiguous there.
2023-10-30 18:35:52 +00:00
Hans Wennborg
e2fc68c3db Typos: 'maxium', 'minium' 2023-10-23 10:42:28 +02:00
Joseph Huber
a39215768b
[libc] Rework the 'fgets' implementation on the GPU (#69635)
Summary:
The `fgets` function as implemented is not functional currently when
called with multiple threads. This is because we rely on reapeatedly
polling the character to detect EOF. This doesn't work when there are
multiple threads that may with to poll the characters. this patch pulls
out the logic into a standalone RPC call to handle this in a single
operation such that calling it from multiple threads functions as
expected. It also makes it less slow because we no longer make N RPC
calls for N characters.
2023-10-19 17:00:01 -04:00
alfredfo
f350532099
[libc] Fix accidental LIBC_NAMESPACE_clock_freq (#69620)
See-also: https://github.com/llvm/llvm-project/pull/69548
2023-10-19 19:39:02 +02:00
Joseph Huber
ddc30ff802
[libc] Implement the 'ungetc' function on the GPU (#69248)
Summary:
This function follows closely with the pattern of all the other
functions. That is, making a new opcode and forwarding the call to the
host. However, this also required modifying the test somewhat. It seems
that not all `libc` implementations follow the same error rules as are
tested here, and it is not explicit in the standard, so we simply
disable these EOF checks when targeting the GPU.
2023-10-17 13:02:31 -05:00
Mikhail R. Gadelha
714b4c82bb
[libc][NFC] Fix -Wdangling-else when compiling libc with gcc >= 7 (#67833)
Explicit braces were added to fix the "suggest explicit braces to avoid
ambiguous ‘else’" warning since the current solution (switch (0) case 0:
default:) doesn't work since gcc 7 (see
https://github.com/google/googletest/issues/1119)

gcc 13 generates about 5000 of these warnings when building libc without
this patch.
2023-10-04 11:44:42 -04:00
Mikhail R. Gadelha
8fc87f54a8
[libc][NFC] Couple of small warning fixes (#67847)
This patch fixes a couple of warnings when compiling with gcc 13:

* CPP/type_traits_test.cpp: 'apply' overrides a member function but is
not marked 'override'
* UnitTest/LibcTest.cpp:98: control reaches end of non-void function
* MPFRWrapper/MPFRUtils.cpp:75: control reaches end of non-void function
* smoke/FrexpTest.h:92: backslash-newline at end of file
* __support/float_to_string.h:118: comparison of unsigned expression in ‘>= 0’ is always true
* test/src/__support/CPP/bitset_test.cpp:197: comparison of unsigned expression in ‘>= 0’ is always true

---------

Signed-off-by: Mikhail R. Gadelha <mikhail@igalia.com>
2023-10-02 19:29:26 -04:00
Joseph Huber
1a5d3b6cda
[libc] Scan the ports more fairly in the RPC server (#66680)
Summary:
Currently, we use the RPC server to respond to different ports which
each contain a request from some client thread wishing to do work on the
server. This scan starts at zero and continues until its checked all
ports at which point it resets. If we find an active port, we service it
and then restart the search.

This is bad for two reasons. First, it means that we will always bias
the lower ports. If a thread grabs a high port it will be stuck for a
very long time until all the other work is done. Second, it means that
the `handle_server` function can technically run indefinitely as long as
the client is always pushing new work. Because the OpenMP implementation
uses the user thread to service the kernel, this means that it could be
stalled with another asyncrhonous device's kernels.

This patch addresses this by making the server restart at the next port
over. This means we will always do a full scan of the ports before
quitting.
2023-09-26 16:09:48 -05:00
Joseph Huber
2b7227db1e [libc] Fix RPC server global after mass replace of __llvm_libc
Summary:
This variable needs a reserved name starting with `__`. It was
mistakenly changed with a mass replace. It happened to work because the
tests still picked up the associated symbol, but it just became a bad
name because it's not reserved anymore.
2023-09-26 14:28:48 -05:00
Joseph Huber
7ac8e26fc7
[libc] Implement fseek, fflush, and ftell on the GPU (#67160)
Summary:
This patch adds the necessary entrypoints to handle the `fseek`,
`fflush`, and `ftell` functions. These are all very straightfoward, we
simply make RPC calls to the associated function on the other end.
Implementing it this way allows us to more or less borrow the state of
the stream from the server as we intentionally maintain no internal
state on the GPU device. However, this does not implement the `errno`
functinality so that must be ignored.
2023-09-26 09:46:46 -05:00
Guillaume Chatelet
b6bc9d72f6
[libc] Mass replace enclosing namespace (#67032)
This is step 4 of
https://discourse.llvm.org/t/rfc-customizable-namespace-to-allow-testing-the-libc-when-the-system-libc-is-also-llvms-libc/73079
2023-09-26 11:45:04 +02:00
Joseph Huber
791b279924
[libc] Change the puts implementation on the GPU (#67189)
Summary:
Normally, the implementation of `puts` simply writes a second newline
charcter after printing the first string. However, because the GPU does
everything in batches of the SIMT group size, this will end up with very
poor output where you get the strings printed and then 1-64 newline
characters all in a row. Optimizations like to turn `printf` calls into
`puts` so it's a good idea to make this produce the expected output.

The least invasive way I could do this was to add a new opcode. It's a
little bloated, but it avoids an unneccessary and slow send operation to
configure this.
2023-09-25 11:17:22 -05:00
Joseph Huber
f548d19fc8
[libc] Fix and simplify the implementation of 'fread' on the GPU (#66948)
Summary:
Previously, the `fread` operation was wrong in cases when we read less
data than was requested. That is, if we tried to read N bytes while the
file was in EOF, it would still copy N bytes of garbage. This is fixed
by only copying over the sizes we got from locally opening it rather
than just using the provided size.

Additionally, this patch simplifies the interface. The output functions
have special variants for writing to stdout / stderr. This is primarily
an optimization for these common cases so we can avoid sending the
stream as an argument which has a high delay. Because for input, we
already need to start with a `send` to tell the server how much data to
read, it costs us nothing to send the file along with it so this is
redundant. Re-use the file encoding scheme from the other
implementations, the one that stores the stream type in the LSBs of the
FILE pointer.
2023-09-21 14:28:06 -05:00
michaelrj-google
5bd34e0a55
[libc] Fix Off By One Errors In Printf Long Double (#66957)
Two major off-by-one errors are fixed in this patch. The first is in
float_to_string.h with length_for_num, which wasn't accounting for the
implicit leading bit when calculating the length of a number, causing
a missing digit on 80 bit float max. The other off-by-one is the
ryu_long_double_constants.h (a.k.a the Mega Table) not having any
entries for the last POW10_OFFSET in POW10_SPLIT. This was also found on
80 bit float max. Finally, the integer calculation mode was using a
slightly too short integer, again on 80 bit float max, not accounting
for the mantissa width. All of these are fixed in this patch.
2023-09-21 11:43:29 -07:00
Joseph Huber
e2bc0f9266 [libc][NFC] Remove unused function from the RPC server
Summary:
I missed removing this now-unused function in the previous patch. Remove
it to clean up the interface.
2023-09-21 11:56:48 -05:00
Joseph Huber
59896c168a
[libc] Remove the 'rpc_reset' routine from the RPC implementation (#66700)
Summary:
This patch removes the `rpc_reset` function. This was previously used to
initialize the RPC client on the device by setting up the pointers to
communicate with the server. The purpose of this was to make it easier
to initialize the device for testing. However, this prevented us from
enforcing an invariant that the buffers are all read-only from the
client side.

The expected way to initialize the server is now to copy it from the
host runtime. This will allow us to maintain that the RPC client is in
the constant address space on the GPU, potentially through inference,
and improving caching behaviour.
2023-09-21 11:07:09 -05:00