Update string_utils' string_length to work with char* or wchar_t*, so that it
may be reusable when implementing wmemchr, wcspbrk, wcsrchr, wcsstr.
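
A minimal sketch of what a character-type-generic length helper could look like. The name `string_length_sketch` is hypothetical and this is not the actual `string_utils` code; it only illustrates sharing one loop between `char*` and `wchar_t*`:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a generic string_length; not the libc source.
template <typename CharT>
std::size_t string_length_sketch(const CharT *src) {
  assert(src != nullptr && "caller must pass a null-terminated string");
  std::size_t length = 0;
  // Compare against CharT(0) so the same loop handles '\0' and L'\0'.
  while (src[length] != CharT(0))
    ++length;
  return length;
}
```

With this shape, `string_length_sketch("abc")` and `string_length_sketch(L"abc")` instantiate the same template for the two character types.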
Link: #121183
Link: #124027
Co-authored-by: Nick Desaulniers <ndesaulniers@google.com>
---------
Co-authored-by: Tristan Ross <tristan.ross@midstall.com>
Summary:
This is a holdover from when these targets were merged. They're
basically the same but there's no reason they should be treated as
identical. I think we will live with a little duplication.
Summary:
This was originally a hacked together function that served to just
implement some features for OpenMP. That has been moved into OpenMP
itself now that we have exported RPC properly. This can now be deleted.
docgen relies on the convention that we have a file foo.cpp in
libc/src/\<header\>/. Because the above functions weren't in libc/src/strings/
but rather libc/src/string/, docgen could not find that we had implemented
these.
Rather than add special carve outs to docgen, let's fix up our sources for
these 7 functions to stick with the existing conventions the rest of the
codebase follows.
Link: #118860
Fixes: #118875
Thanks to the effort of @RoseZhang03 and @aaryanshukla under the
guidance of @michaelrj-google and @amykhuang, we now have newhdrgen and no
longer have a dependency on TableGen and thus LLVM in order to start
bootstrapping a full build.
This PR removes:
- LIBC_HDRGEN_EXE; the in tree newhdrgen is the only hdrgen that can be
  used.
- LIBC_USE_NEW_HEADER_GEN; newhdrgen is the default and only option.
- LIBC_HDRGEN_ONLY; there is no need to have a distinct build step for
  old hdrgen.
- libc-api-test and libc-api-test-tidy build targets.
- All .td files.
It does not rename newhdrgen to just hdrgen. Will follow up with a
distinct PR for that.
Link: #117209
Link: #117254
Fixes: #117208
Summary:
This function can easily be implemented by forwarding it to the host
process. This shows up in a few places that we might want to test the
GPU so it should be provided. Also, I find the idea of the GPU
offloading work to the CPU via `system` very funny.
This patch adds the %m conversion to printf, which prints the
strerror(errno). The rationale is explained below. This patch also updates
the docs, tests, and build system to accommodate this.
The POSIX standard for syslog specifies that it uses the same format as
printf, but adds %m which prints the error message string for the
current value of errno. For ease of implementation, it's standard
practice for libc implementers to just add %m to printf instead of
creating a separate parser for syslog.
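
To illustrate the semantics only, here is a small sketch that expands a bare `%m` into the `strerror` text for a given error number. `expand_percent_m` is a hypothetical helper, not part of the printf parser this patch touches, and it deliberately ignores flags and widths such as `%5m`:

```cpp
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: replace each bare "%m" with strerror(saved_errno).
// Every other "%x" pair (including "%%") is copied through unchanged.
std::string expand_percent_m(const std::string &format, int saved_errno) {
  std::string out;
  for (std::size_t i = 0; i < format.size(); ++i) {
    if (format[i] == '%' && i + 1 < format.size()) {
      if (format[i + 1] == 'm') {
        out += std::strerror(saved_errno);
      } else {
        out += format[i];
        out += format[i + 1];
      }
      ++i; // Skip the character that followed '%'.
      continue;
    }
    out += format[i];
  }
  return out;
}
```

A real implementation handles `%m` inside the conversion parser itself, so it needs no second pass over the format string.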
Summary:
This adds the locale variants of the string functions. As previously,
these do not use the locale information at all and simply copy the
non-locale version which expects the "C" locale.
Summary:
This provides the `_l` variants for the `stdlib.h` functions. These are
just copies of the same entrypoint and don't do anything with the locale
information.
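
As a rough illustration of what "copies of the same entrypoint" means, a locale variant can accept the extra argument and ignore it. The names `sketch_locale` and `strtol_l_sketch` are made up for this example and are not the libc entrypoints:

```cpp
#include <cstdlib>

// Opaque, hypothetical stand-in for locale_t.
struct sketch_locale;

// The locale parameter is accepted but unused; the call behaves exactly
// like the non-locale function (which here assumes the "C" locale).
long strtol_l_sketch(const char *str, char **str_end, int base,
                     sketch_locale * /*unused_locale*/) {
  return std::strtol(str, str_end, base);
}
```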
Summary:
This patch adds all the libc ctype variants. These ignore the locale
information completely, so they're pretty much just stubs. Because these
use locale information, which is system scope, we do not enable building
them outside of full build mode.
Summary:
This patch adds the macros and entrypoints associated with the
`locale.h` entrypoints. These are mostly stubs, as we (for now and the
foreseeable future) only expect to support the C and maybe C.UTF-8
locales in the LLVM libc.
Summary:
The `scanf` function has a "system file" configuration, which is pretty
much what the GPU implementation does at this point. So we should be
able to use it in much the same way.
Summary:
Simply copies the x64 versions to the GPU directory. Ignoring f128 for
now, but adding long double entrypoints which are identical to `double`
on the target.
Summary:
This patch implements 'getenv'. I was torn on how to implement this,
since realistically we only have access to this environment pointer in
the "loader" interface. An alternative would be to use an RPC call every
time, but I think that's overkill for what this will be used for. A
better solution is just to emit a common `DataEnvironment` that contains
all of the host visible resources to initialize. Right now this is the
`env_ptr`, `clock_freq`, and `rpc_client`.
I did this by making the `app.h` interface that Linux uses more general.
This could possibly move into a separate patch, but I figured it's
easier to see with the usage.
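
A sketch of how a lookup over such an environment pointer might work, assuming `env_ptr` is a null-terminated array of `NAME=value` strings as on the host. `DataEnvironmentSketch` and `getenv_sketch` are illustrative names and only model the `env_ptr` field mentioned above:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical, reduced version of the shared environment struct.
struct DataEnvironmentSketch {
  char **env_ptr; // Null-terminated array of "NAME=value" strings.
};

// Linear scan for "name=" and return a pointer to the value, or nullptr.
const char *getenv_sketch(const DataEnvironmentSketch &env, const char *name) {
  if (env.env_ptr == nullptr || name == nullptr || *name == '\0')
    return nullptr;
  const std::size_t len = std::strlen(name);
  for (char **entry = env.env_ptr; *entry != nullptr; ++entry) {
    // The '=' check prevents "PAT" from matching "PATH=...".
    if (std::strncmp(*entry, name, len) == 0 && (*entry)[len] == '=')
      return *entry + len + 1;
  }
  return nullptr;
}
```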
Summary:
These functions are used by the <random> implementation in libc++ and
cause a lot of tests to fail. For now we provide these through the
vendor abstraction until we have a real version. The NVPTX version
doesn't even update the output correctly so these are just temporary.
Summary:
This header is practically useless, but we provide it mostly for the
macros so that applications can compile. I'm only doing this for the
`libc++` unittests that want it, and it is part of the C standard
technically. I just made an RPC call to do `raise`. Anything more isn't
going to work since it'd be way too annoying to make the CPU call into
some signal handler the GPU registered.
Summary:
These functions are needed for `libc++` to link successfully. We can't
implement them well currently, so simply provide some stand-in
implementations. `realloc` will currently copy garbage and potentially
fault and `aligned_alloc` will work unless the requested alignment is more
than 4K. However, these should work in practice to get tests
running. I will write a real allocator soon™.
Summary:
Currently there are several layers to handle `printf`. Since we now have
varargs and an implementation of `printf` this can be heavily
simplified.
1. The frontend renames `printf` into `omp_vprintf` and gives it an
argument buffer.
Removing 1. triggered some code in the AMDGPU backend meant for HIP /
OpenCL, so I added an exception to it.
2. Forward this to CUDA vprintf or ignore it.
We no longer need special handling for it since we have varargs. So now
we just forward this to CUDA vprintf if we have libc, otherwise just
leave `printf` as an external function and expect that `libc` will be
linked in.
Summary:
We can enable the sscanf function on the GPU now. This required adding
the configs to the scanf list so that the GPU build didn't do float
conversions.
Summary:
The NVPTX backend optimizes the ABI for functions that are internal,
however, this is not legal for indirect call prototypes. Previously, we
would modify the ABI on an aggregate byval type passed to an indirect
call prototype, which would make PTXAS error. This patch just passes the
function as a nullptr to force strict ABI compliance without
modification in the helper function.
Fixes https://github.com/llvm/llvm-project/issues/100055
Summary:
Currently we have several hacks to work around the fact that the NVPTX
linker, 'nvlink', does not support static libraries or LTO linking.
The patch in https://github.com/llvm/llvm-project/pull/96561 introduces
a wrapper in the toolchain that allows us to use a standard `ld.lld`
like interface. This means all the divergence with this target can be
removed.
Depends on https://github.com/llvm/llvm-project/pull/96561
Division-less Newton iterations algorithm for cube roots.
1. **Range reduction**
For `x = (-1)^s * 2^e * (1.m)`, we get 2 reduced arguments `x_r` and `a`
as:
```
x_r = 1.m
a = (-1)^s * 2^(e % 3) * (1.m)
```
Then `cbrt(x) = x^(1/3)` can be computed as:
```
x^(1/3) = 2^(e / 3) * a^(1/3).
```
In order to avoid division, we compute `a^(-2/3)` using Newton's method and
then multiply the result by `a`:
```
a^(1/3) = a * a^(-2/3).
```
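
The range reduction above can be sketched with standard `<cmath>` calls. This is an illustration under the assumption that `x` is finite and nonzero; the struct and function names are hypothetical, and the real implementation works on the bit representation rather than calling `std::ilogb` / `std::scalbn`:

```cpp
#include <cmath>

// Hypothetical result type for the sketch.
struct ReducedCbrtArg {
  double x_r;  // 1.m, in [1, 2)
  double a;    // (-1)^s * 2^(e % 3) * 1.m
  int e_div_3; // exponent to apply to a^(1/3) at the end
};

ReducedCbrtArg reduce_for_cbrt(double x) {
  const int e = std::ilogb(x);                      // unbiased exponent
  const double x_r = std::scalbn(std::fabs(x), -e); // strip sign and exponent
  // Floor-style remainder so that e_mod_3 is in {0, 1, 2} for negative e too.
  const int e_mod_3 = ((e % 3) + 3) % 3;
  const int e_div_3 = (e - e_mod_3) / 3;
  const double a = std::copysign(std::scalbn(x_r, e_mod_3), x);
  return {x_r, a, e_div_3};
}
```

For example, `x = -40 = -1.25 * 2^5` gives `x_r = 1.25`, `a = -5`, and `e_div_3 = 1`, so `cbrt(-40) = 2 * cbrt(-5)`.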
2. **First approximation to a^(-2/3)**
First, we use a degree-7 minimax polynomial generated by Sollya to
approximate `x_r^(-2/3)` for `1 <= x_r < 2`.
```
p = P(x_r) ~ x_r^(-2/3),
```
with relative errors bounded by:
```
| p / x_r^(-2/3) - 1 | < 1.16 * 2^-21.
```
Then we multiply by `2^(-2*(e % 3)/3)` from a small lookup table to get:
```
x_0 = 2^(-2*(e % 3)/3) * p
    ~ 2^(-2*(e % 3)/3) * x_r^(-2/3)
    = a^(-2/3)
```
with relative errors:
```
| x_0 / a^(-2/3) - 1 | < 1.16 * 2^-21.
```
This step is done in double precision.
3. **First Newton iteration**
We follow the method described in:
Sibidanov, A. and Zimmermann, P., "Correctly rounded cubic root evaluation
in double precision", https://core-math.gitlabpages.inria.fr/cbrt64.pdf
to derive multiplicative Newton iterations as below:
Let `x_n` be the nth approximation to `a^(-2/3)`. Define the nth error as:
```
h_n = x_n^3 * a^2 - 1
```
Then:
```
a^(-2/3) = x_n / (1 + h_n)^(1/3)
         = x_n * (1 - (1/3) * h_n + (2/9) * h_n^2 - (14/81) * h_n^3 + ...)
```
using the Taylor series expansion of `(1 + h_n)^(-1/3)`.
Apply to `x_0` above:
```
h_0 = x_0^3 * a^2 - 1
    = a^2 * (x_0 - a^(-2/3)) * (x_0^2 + x_0 * a^(-2/3) + a^(-4/3)),
```
it's bounded by:
```
|h_0| < 4 * 3 * 1.16 * 2^-21 * 4 < 2^-17.
```
So in the first iteration step, we use:
```
x_1 = x_0 * (1 - (1/3) * h_0 + (2/9) * h_0^2 - (14/81) * h_0^3)
```
Its relative error is bounded by:
```
| x_1 / a^(-2/3) - 1 | < 35/243 * |h_0|^4 < 2^-70.
```
Then we perform Ziv's rounding test and check if the answer is exact.
This step is done in double-double precision.
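
The multiplicative update can be sketched in plain `double` to show the error shrinking without any division. This is only an illustration: it does not use double-double arithmetic, it takes the starting guess from the caller instead of the minimax polynomial, and `refine_inv_cbrt_sq` / `cbrt_from_guess` are hypothetical names:

```cpp
#include <cassert>
#include <cmath>

// One division-free refinement of x_n ~ a^(-2/3), using the truncated
// Taylor series of (1 + h)^(-1/3) up to the h^3 term.
double refine_inv_cbrt_sq(double a, double x_n) {
  assert(a != 0.0 && "a^(-2/3) is undefined at zero");
  const double h = x_n * x_n * x_n * a * a - 1.0;
  // Horner form of 1 - h/3 + (2/9) h^2 - (14/81) h^3.
  const double correction =
      1.0 + h * (-1.0 / 3.0 + h * (2.0 / 9.0 - h * (14.0 / 81.0)));
  return x_n * correction;
}

// a^(1/3) = a * a^(-2/3), so one multiply recovers the cube root.
double cbrt_from_guess(double a, double x_0) {
  return a * refine_inv_cbrt_sq(a, x_0);
}
```

Starting from a guess with relative error around `1e-3`, one step should leave a relative error on the order of `h^4`, which matches the quartic bound stated above.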
4. **Second Newton iteration**
If Ziv's rounding test from the previous step fails, we define the error
term:
```
h_1 = x_1^3 * a^2 - 1,
```
And perform another iteration:
```
x_2 = x_1 * (1 - h_1 / 3)
```
whose relative error is below what double-double precision can represent.
We then perform Ziv's accuracy test with relative errors < 2^-102 to
compensate for rounding errors.
5. **Final iteration**
If Ziv's accuracy test from the previous step fails, we perform another
iteration in 128-bit precision and check for exact outputs.