The config currently includes ctype, math, stdlib, inttypes and string
functions.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D140378
Implement a high-precision floating point class using UInt<> as its
mantissa. This will be used in the accurate pass for double precision
math functions.
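As a rough illustration of the idea, a dyadic float stores a sign, a
binary exponent and an integer mantissa, with value
`(-1)^sign * mantissa * 2^exponent`. The sketch below is not the class
added by this patch: `DyadicFloatSketch` and its members are hypothetical
names, and a 64-bit integer stands in for `UInt<>` to keep it short.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical, simplified dyadic float:
//   value = (-1)^sign * mantissa * 2^exponent.
// A uint64_t mantissa is an assumption of this sketch; the patch uses UInt<>.
struct DyadicFloatSketch {
  bool sign;
  int exponent;
  uint64_t mantissa;

  // Shift a nonzero mantissa left until its top bit is set, adjusting the
  // exponent so the represented value does not change.
  void normalize() {
    if (mantissa == 0)
      return;
    while ((mantissa >> 63) == 0) {
      mantissa <<= 1;
      --exponent;
    }
  }

  // Convert to double. The result is rounded if the mantissa has more than
  // 53 significant bits.
  double to_double() const {
    const double r = std::ldexp(static_cast<double>(mantissa), exponent);
    return sign ? -r : r;
  }
};
```

Keeping the mantissa normalized makes the full integer width available
for intermediate results, which is the reason such a type is useful in a
slower, more accurate evaluation path.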
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D136799
Implement full multiplication `UInt<A> * UInt<B> -> UInt<A + B>` and
`quick_mul_hi` that returns the higher half of the product `UInt<A> * UInt<A>`.
These two functions will be used for the dyadic floating point class.
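A minimal sketch of the two operations for a single 64-bit word, built
from 32-bit limbs. The names `U128Sketch`, `full_mul_sketch` and
`quick_mul_hi_sketch` are hypothetical and do not mirror the UInt<> API.
The "quick" variant here skips the lowest partial product and the carries
out of the low half, so it may be smaller than the exact high half by at
most 2; treating "quick" that way is an assumption of this sketch.

```cpp
#include <cstdint>

// Hypothetical 128-bit result type for the sketch.
struct U128Sketch {
  uint64_t lo;
  uint64_t hi;
};

// Exact 64 x 64 -> 128 bit product using four 32 x 32 -> 64 partial products.
inline U128Sketch full_mul_sketch(uint64_t a, uint64_t b) {
  const uint64_t a_lo = a & 0xffffffffu, a_hi = a >> 32;
  const uint64_t b_lo = b & 0xffffffffu, b_hi = b >> 32;
  const uint64_t ll = a_lo * b_lo;
  const uint64_t lh = a_lo * b_hi;
  const uint64_t hl = a_hi * b_lo;
  const uint64_t hh = a_hi * b_hi;
  // Sum of the middle column; at most 3 * (2^32 - 1), so it cannot overflow.
  const uint64_t mid = (ll >> 32) + (lh & 0xffffffffu) + (hl & 0xffffffffu);
  U128Sketch r;
  r.lo = (mid << 32) | (ll & 0xffffffffu);
  r.hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
  return r;
}

// Approximate high half: ignores a_lo * b_lo and the carry out of the middle
// column, so the result is the exact high half minus 0, 1 or 2.
inline uint64_t quick_mul_hi_sketch(uint64_t a, uint64_t b) {
  const uint64_t a_lo = a & 0xffffffffu, a_hi = a >> 32;
  const uint64_t b_lo = b & 0xffffffffu, b_hi = b >> 32;
  return a_hi * b_hi + ((a_lo * b_hi) >> 32) + ((a_hi * b_lo) >> 32);
}
```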
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D138541
This patch contains the initial support for building LLVM's libc as a
target for the GPU. Currently this only supports a handful of very basic
functions that can be implemented without an operating system. The GPU
code is built using the existing OpenMP toolchain. This allows us to
minimally change the existing codebase and get a functioning static
library. This patch allows users to create a static library called
`libcgpu.a` that contains fat binaries containing device IR.
Current limitations are the lack of test support and the fact that only
one target OS can be built at a time. That is, the user cannot get a
`libc` for Linux and one for the GPU simultaneously.
This introduces two new CMake variables to control the behavior.
`LLVM_LIBC_TARGET_OS` is exported so the user can now set it to
`"gpu"`. `LLVM_LIBC_GPU_ARCHITECTURES` is used to configure which
GPU targets to build for at once.
Depends on D138607
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D138608
When using `LLVM_ENABLE_RUNTIMES=libc` we need to perform a few extra
steps to include LLVM utilities, much as we would for a standalone
build. Libc depends on the tablegen utilities and the LLVM libraries
when performing a full build. When using an
`LLVM_ENABLE_PROJECTS=libc` build these are included as a part of the
greater LLVM build, but here we need to include them manually. This
patch should allow using `LLVM_LIBC_FULL_BUILD=ON` when building with
runtimes.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D138040
This is the first piece of scanf. It's very similar in design to printf,
and so much of the code is copied from that. There were potential issues
with conflicting macros so I've also renamed the "ASSERT_FORMAT_EQ"
macro for printf to "ASSERT_PFORMAT_EQ".
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D136288
Adding `EXPECT_MPFR_MATCH_ROUNDING_SILENTLY` macro that does not call
`explainError` when the tests fail. This is useful to check the passing or
failing rates, such as hitting percentages of fast passes in math
implementations.
Reviewed By: michaelrj, sivachandra
Differential Revision: https://reviews.llvm.org/D136731
The implementation currently ignores all spawn attributes. Support for
them will be added in future changes.
A simple allocator for integration tests has been added so that the
integration test for posix_spawn can use the
posix_spawn_file_actions_add* functions.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D135752
Implement exp10f function correctly rounded to all rounding modes.
Algorithm: perform range reduction to reduce
```
10^x = 2^(hi + mid) * 10^lo
```
where:
```
hi is an integer,
0 <= mid * 2^5 < 2^5
-log10(2) / 2^6 <= lo <= log10(2) / 2^6
```
Then `2^mid` is stored in a table of 32 entries and the product `2^hi * 2^mid` is
performed by adding `hi` into the exponent field of `2^mid`.
`10^lo` is then approximated by a degree-5 minimax polynomial generated by Sollya with:
```
> P = fpminimax((10^x - 1)/x, 4, [|D...|], [-log10(2)/64, log10(2)/64]);
```
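The range reduction above can be sketched in double precision as
follows. This is an illustration only, not the code in this patch:
`exp10_sketch` is a hypothetical name, `std::exp2` stands in for the
32-entry table of `2^mid`, `std::ldexp` stands in for adding `hi` to the
exponent field, and a degree-5 Taylor series stands in for the Sollya
polynomial, whose coefficients are not reproduced here.

```cpp
#include <cmath>

// Illustrative only: evaluates 10^x with the hi/mid/lo split described above.
inline double exp10_sketch(double x) {
  constexpr double LOG2_10 = 3.321928094887362;  // log2(10)
  constexpr double LOG10_2 = 0.3010299956639812; // log10(2)
  constexpr double LN_10 = 2.302585092994046;    // ln(10)

  // kd = round(32 * x * log2(10)) = 32 * (hi + mid).
  const double kd = std::round(x * (32.0 * LOG2_10));
  const double hi_d = std::floor(kd / 32.0);
  const int hi = static_cast<int>(hi_d);
  const int mid_index = static_cast<int>(kd - 32.0 * hi_d); // in [0, 31]

  // lo = x - (hi + mid) * log10(2), so |lo| is about log10(2) / 64 at most.
  const double lo = x - kd * (LOG10_2 / 32.0);

  // Stand-in for the lookup of 2^mid in a 32-entry table.
  const double exp2_mid = std::exp2(mid_index / 32.0);

  // Stand-in for the minimax polynomial: Taylor series of e^t, t = lo * ln(10).
  const double t = lo * LN_10;
  const double p =
      1.0 + t * (1.0 + t * (1.0 / 2 + t * (1.0 / 6 + t * (1.0 / 24 + t * (1.0 / 120)))));

  // 2^hi * 2^mid * 10^lo; ldexp replaces the exponent-field addition.
  return std::ldexp(exp2_mid * p, hi);
}
```

The sketch does not handle overflow, underflow, NaN or rounding modes,
all of which a correctly rounded implementation has to treat explicitly.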
Performance benchmark using perf tool from the CORE-MATH project on Ryzen 1700:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh exp10f
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput : 10.215
System LIBC reciprocal throughput : 7.944
LIBC reciprocal throughput : 38.538
LIBC reciprocal throughput : 12.175 (with `-msse4.2` flag)
LIBC reciprocal throughput : 9.862 (with `-mfma` flag)
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh exp10f --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency : 40.744
System LIBC latency : 37.546
BEFORE
LIBC latency : 48.989
LIBC latency : 44.486 (with `-msse4.2` flag)
LIBC latency : 40.221 (with `-mfma` flag)
```
This patch relies on https://reviews.llvm.org/D134002
Reviewed By: orex, zimmermann6
Differential Revision: https://reviews.llvm.org/D134104
Implement acosf function correctly rounded for all rounding modes.
We perform range reduction as follows:
- When `|x| < 2^(-10)`, we use cubic Taylor polynomial:
```
acos(x) = pi/2 - asin(x) ~ pi/2 - x - x^3 / 6.
```
- When `2^(-10) <= |x| <= 0.5`, we use the same approximation that is used for `asinf(x)` when `|x| <= 0.5`:
```
acos(x) = pi/2 - asin(x) ~ pi/2 - x - x^3 * P(x^2).
```
- When `0.5 < x <= 1`, we use the double angle formula: `cos(2y) = 1 - 2 * sin^2 (y)` to reduce to:
```
acos(x) = 2 * asin( sqrt( (1 - x)/2 ) )
```
- When `-1 <= x < -0.5`, we reduce to the positive case above using the formula:
```
acos(x) = pi - acos(-x)
```
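The four cases above can be sketched as below. This is only an outline
of the case split, not the implementation in this patch: `acos_sketch`
is a hypothetical name, and `std::asin` stands in for the polynomial
approximation `x + x^3 * P(x^2)`, whose coefficients are not given here.

```cpp
#include <cmath>

// Illustrative case split for acos(x); std::asin replaces the polynomial P.
inline double acos_sketch(double x) {
  constexpr double PI = 3.141592653589793;
  const double ax = std::fabs(x);

  // |x| < 2^(-10): cubic Taylor polynomial of pi/2 - asin(x).
  if (ax < 1.0 / 1024.0)
    return PI / 2 - x - x * x * x / 6.0;

  // 2^(-10) <= |x| <= 0.5: acos(x) = pi/2 - asin(x).
  if (ax <= 0.5)
    return PI / 2 - std::asin(x);

  // -1 <= x < -0.5: reflect to the positive case.
  if (x < 0)
    return PI - acos_sketch(-x);

  // 0.5 < x <= 1: acos(x) = 2 * asin(sqrt((1 - x) / 2)).
  // For x > 1 or NaN the square root yields NaN, which propagates.
  return 2.0 * std::asin(std::sqrt((1.0 - x) / 2.0));
}
```

The last case is useful because `sqrt((1 - x)/2)` maps `(0.5, 1]` into
`[0, 0.5)`, so the same small-argument `asin` approximation can be reused.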
Performance benchmark using perf tool from the CORE-MATH project on Ryzen 1700:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh acosf
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput : 28.613
System LIBC reciprocal throughput : 29.204
LIBC reciprocal throughput : 24.271
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh asinf --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency : 55.554
System LIBC latency : 76.879
LIBC latency : 62.118
```
Reviewed By: orex, zimmermann6
Differential Revision: https://reviews.llvm.org/D133550
The libc.src.__support.FPUtil.fputil target encompassed many unrelated
files, and provided a lot of hidden dependencies. This patch splits out
all of these files into component parts and cleans up the cmake files
that used them. It does not touch any source files for simplicity, but
there may be changes made to them in future patches.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D132980
Performance by core-math (core-math/glibc 2.31/current llvm-14):
10.845/43.174/13.467
The review is done on top of D132809.
Differential Revision: https://reviews.llvm.org/D132811
The FormatSection and the writer functions both previously took a char*
and a length to represent a string. Now they use the StringView class to
represent that more succinctly. This change also required fixing
everywhere these were used, so it touches a lot of files.
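The shape of the change can be illustrated with `std::string_view`
standing in for the libc-internal StringView class. `WriterSketch` and
its methods are hypothetical and are not the printf writer's interface.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical writer used only to contrast the two calling conventions.
struct WriterSketch {
  std::string out;

  // Before: pointer and length travel as two separate arguments.
  void write_raw(const char *data, size_t length) { out.append(data, length); }

  // After: one view object carries both.
  void write(std::string_view sv) { out.append(sv.data(), sv.size()); }
};
```

With a view, a call site such as `w.write("def")` no longer has to
compute or pass a length by hand.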
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D131994
To use the FILE data structure, LLVM-libc must be in fullbuild mode
since it expects its own implementation. This means that (f)printf can't
be used without fullbuild, but s(n)printf only uses strings. This patch
adjusts the CMake to allow for this.
Reviewed By: sivachandra, lntue
Differential Revision: https://reviews.llvm.org/D131913
Previously, the integer_to_string tests used EXPECT_TRUE(.equals)
which doesn't have useful error messages. Now they properly check
equality with the EXPECT_EQ macro, which allows for comparing the
strings more naturally.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D131300
Migrating all private STL code to the standard STL case but keeping it under the CPP namespace to avoid confusion. Starting with the type_traits header.
Differential Revision: https://reviews.llvm.org/D130727
The specified rounding mode will be used and restored
to what it was before the test ran.
Additionally, it moves ForceRoundingMode and RoundingMode
out of MPFRUtils to be used in more places.
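A scoped guard of this kind can be sketched with `<cfenv>` as below.
`ForceRoundingModeSketch` is a hypothetical name, and the sketch takes a
raw `FE_*` constant instead of the `RoundingMode` enum mentioned above.

```cpp
#include <cfenv>

// Sets the requested rounding mode on construction and restores the previous
// one on destruction. Sketch only; not the class from this patch.
class ForceRoundingModeSketch {
  int old_mode_;
  bool changed_;

public:
  explicit ForceRoundingModeSketch(int mode)
      : old_mode_(std::fegetround()), changed_(false) {
    // fesetround returns 0 on success.
    if (old_mode_ != mode)
      changed_ = (std::fesetround(mode) == 0);
  }

  ~ForceRoundingModeSketch() {
    if (changed_)
      std::fesetround(old_mode_);
  }

  ForceRoundingModeSketch(const ForceRoundingModeSketch &) = delete;
  ForceRoundingModeSketch &operator=(const ForceRoundingModeSketch &) = delete;
};
```

Tying the restore to the destructor means the original mode comes back
even when a test returns early.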
Differential Revision: https://reviews.llvm.org/D129685
This is an implementation of fmod, the floating-point remainder function
from the standard libm. I developed the underlying algorithm myself,
although it has probably been invented before.
Some features of the implementation:
1. The code is written in more-or-less modern C++.
2. One general implementation for both float and double precision numbers.
3. Platform/architecture dependent and independent code and tests are split.
4. Tests cover 100% of the code for both float and double numbers. Test cases with NaN/Inf etc. are copied from glibc.
5. The new implementation is in general 2-4 times faster for “regular” x,y values. It can be 20 times faster when x/y is huge, but can also be 2 times slower in the double denormal range (according to the perf tests provided).
6. Two different implementations of the division loop are provided. On some platforms division can be a very time-consuming operation. Depending on the platform it can be 3-10 times slower than multiplication.
Performance tests:
The test is based on the core-math project (https://gitlab.inria.fr/core-math/core-math). At Tue Ly's suggestion I took the hypot function and used it as a template for fmod, preserving all test cases.
`./check.sh <--special|--worst> fmodf` passed.
`CORE_MATH_PERF_MODE=rdtsc ./perf.sh fmodf` results are
```
GNU libc version: 2.35
GNU libc release: stable
21.166 <-- FPU
51.031 <-- current glibc
37.659 <-- this fmod version.
```
This is mostly a mechanical change. In a future pass, all tests from
pthread which create threads will also be converted to integration tests.
Some thread-related features are tightly coupled with the loader. So,
they can only be tested with the in-house loader. Hence, going forward, all
tests which create threads will have to be integration tests.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D128381