6933 Commits

Author SHA1 Message Date
Piotr Fusik
1c1b8c20c2
[ELFAttributeParser][NFC] Make string arrays const (#101460) 2024-08-02 18:37:05 +02:00
David Blaikie
45ef0d492f Add llvm::Error C API, LLVMCantFail
It's barely testable - the test does exercise the code, but wouldn't
fail on an empty implementation. It would cause a memory leak though
(because the error handle wouldn't be unwrapped/reowned) which could be
detected by asan and other leak detectors.
2024-07-31 16:40:26 +00:00
Alexandre Ganea
39e192b379
[Support] Silence warnings when retrieving exported functions (#97905)
Since functions exported from DLLs are type-erased, before this patch I
was seeing the new Clang 19 warning `-Wcast-function-type-mismatch`.

This happens when building LLVM on Windows.

Following discussion in
593f708118 (commitcomment-143905744)
2024-07-30 19:06:03 -04:00
Alexander Pivovarov
abc2fe31fc
[APFloat] Add support for f8E3M4 IEEE 754 type (#99698)
This PR adds `f8E4M3` type to APFloat.

`f8E3M4` type  follows IEEE 754 convention

```c
f8E3M4 (IEEE 754)
- Exponent bias: 3
- Maximum stored exponent value: 6 (binary 110)
- Maximum unbiased exponent value: 6 - 3 = 3
- Minimum stored exponent value: 1 (binary 001)
- Minimum unbiased exponent value: 1 − 3 = −2
- Precision specifies the total number of bits used for the significand (mantissa), 
    including implicit leading integer bit = 4 + 1 = 5
- Follows IEEE 754 conventions for representation of special values
- Has Positive and Negative zero
- Has Positive and Negative infinity
- Has NaNs

Additional details:
- Max exp (unbiased): 3
- Min exp (unbiased): -2
- Infinities (+/-): S.111.0000
- Zeros (+/-): S.000.0000
- NaNs: S.111.{0,1}⁴ except S.111.0000
- Max normal number: S.110.1111 = +/-2^(6-3) x (1 + 15/16) = +/-2^3 x 31 x 2^(-4) = +/-15.5
- Min normal number: S.001.0000 = +/-2^(1-3) x (1 + 0) = +/-2^(-2)
- Max subnormal number: S.000.1111 = +/-2^(-2) x 15/16 = +/-2^(-2) x 15 x 2^(-4) = +/-15 x 2^(-6)
- Min subnormal number: S.000.0001 = +/-2^(-2) x 1/16 =  +/-2^(-2) x 2^(-4) = +/-2^(-6)
```

Related PRs:
- [PR-97179](https://github.com/llvm/llvm-project/pull/97179) [APFloat]
Add support for f8E4M3 IEEE 754 type
2024-07-30 00:11:10 -07:00
Daniel Bertalan
90569e02e6
[Support] Add Arm NEON implementation for llvm::xxh3_64bits (#99634)
Compared to the generic scalar code, using Arm NEON instructions yields
a ~11x speedup: 31 vs 339.5 ms to hash 1 GiB of random data on the Apple
M1.

This follows the upstream implementation closely, with some
simplifications made:
- Removed workarounds for suboptimal codegen on older GCC
- Removed instruction reordering barriers which seem to have a
negligible impact according to my measurements
- We do not support WebAssembly's mostly NEON-compatible API
- There is no configurable mixing of SIMD and scalar code; according to
the upstream comments, this is only relevant for smaller Cortex cores
which can dispatch relatively few NEON micro-ops per cycle.

This commit intends to use only standard ACLE intrinsics and datatypes,
so it should build with all supported versions of GCC, Clang and MSVC.

This feature is enabled by default when targeting AArch64, but the
`LLVM_XXH_USE_NEON=0` macro can be set to explicitly disable it.

XXH3 is used for ICF, string deduplication and computing the UUID in
ld64.lld; this commit results in a -1.77% +/- 0.59% speed improvement
for a `--threads=8` link of Chromium.framework.
2024-07-22 19:06:43 +02:00
Connor Sughrue
76321b9f08
[llvm][Support] Implement raw_socket_stream::read with optional timeout (#92308)
This PR implements `raw_socket_stream::read`, which overloads the base
class `raw_fd_stream::read`. `raw_socket_stream::read` provides a way to
timeout the underlying `::read`. The timeout functionality was not added
to `raw_fd_stream::read` to avoid needlessly increasing compile times
and allow for convenient code reuse with `raw_socket_stream::accept`,
which also requires timeout functionality. This PR supports the module
build daemon and will help guarantee it never becomes a zombie process.
2024-07-21 23:50:28 -04:00
Utkarsh Saxena
ecaacd14c3
Reapply "Add source file name for template instantiations in -ftime-trace" (#99757)
Reverts https://github.com/llvm/llvm-project/pull/99731

Remove accidentally added temporary file.
Also, fix the uninitialized read of line number.
2024-07-21 20:55:34 +02:00
Jorge Gorbe Moya
abaf13ad58
Revert "Reapply "Add source file name for template instantiations in -ftime-trace"" (#99731)
Reverts llvm/llvm-project#99545

There were a couple of issues reported in the PR: a sanitizer warning
(https://lab.llvm.org/buildbot/#/builders/164/builds/1246/steps/14/logs/stdio)
and a tmp file accidentally included in the commit.
2024-07-19 18:58:25 -07:00
Utkarsh Saxena
f1c27a9b26
Reapply "Add source file name for template instantiations in -ftime-trace" (#99545)
Fix the Windows test.
2024-07-19 16:40:28 +02:00
Utkarsh Saxena
04bcd74df7
Revert "Add source file name for template instantiations in -ftime-trace" (#99534)
Reverts llvm/llvm-project#98320

Breaks windows tests:

```
Step 8 (test-build-unified-tree-check-clang-unit) failure: test (failure)
******************** TEST 'Clang-Unit :: Support/./ClangSupportTests.exe/1/3' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:C:\buildbot\as-builder-3\llvm-clang-x86_64-win-fast\build\tools\clang\unittests\Support\.\ClangSupportTests.exe-Clang-Unit-4296-1-3.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=3 GTEST_SHARD_INDEX=1 C:\buildbot\as-builder-3\llvm-clang-x86_64-win-fast\build\tools\clang\unittests\Support\.\ClangSupportTests.exe
--

Script:
--
C:\buildbot\as-builder-3\llvm-clang-x86_64-win-fast\build\tools\clang\unittests\Support\.\ClangSupportTests.exe --gtest_filter=TimeProfilerTest.TemplateInstantiations
--
C:\buildbot\as-builder-3\llvm-clang-x86_64-win-fast\llvm-project\clang\unittests\Support\TimeProfilerTest.cpp(278): error: Expected equality of these values:
  R"(
Frontend
| ParseFunctionDefinition (fooB)
| ParseFunctionDefinition (fooMTA)
| ParseFunctionDefinition (fooA)
| ParseDeclarationOrFunctionDefinition (test.cc:3:5)
| | ParseFunctionDefinition (user)
| PerformPendingInstantiations
| | InstantiateFunction (fooA<int>, ./a.h:7)
| | | InstantiateFunction (fooB<int>, ./b.h:3)
| | | InstantiateFunction (fooMTA<int>, ./a.h:4)
)"
    Which is: "\nFrontend\n| ParseFunctionDefinition (fooB)\n| ParseFunctionDefinition (fooMTA)\n| ParseFunctionDefinition (fooA)\n| ParseDeclarationOrFunctionDefinition (test.cc:3:5)\n| | ParseFunctionDefinition (user)\n| PerformPendingInstantiations\n| | InstantiateFunction (fooA<int>, ./a.h:7)\n| | | InstantiateFunction (fooB<int>, ./b.h:3)\n| | | InstantiateFunction (fooMTA<int>, ./a.h:4)\n"
  buildTraceGraph(Json)
    Which is: "\nFrontend\n| ParseFunctionDefinition (fooB)\n| ParseFunctionDefinition (fooMTA)\n| ParseFunctionDefinition (fooA)\n| ParseDeclarationOrFunctionDefinition (test.cc:3:5)\n| | ParseFunctionDefinition (user)\n| PerformPendingInstantiations\n| | InstantiateFunction (fooA<int>, .\\a.h:7)\n| | | InstantiateFunction (fooB<int>, .\\b.h:3)\n| | | InstantiateFunction (fooMTA<int>, .\\a.h:4)\n"
With diff:
@@ -7,5 +7,5 @@
 | | ParseFunctionDefinition (user)
 | PerformPendingInstantiations
-| | InstantiateFunction (fooA<int>, ./a.h:7)
-| | | InstantiateFunction (fooB<int>, ./b.h:3)
-| | | InstantiateFunction (fooMTA<int>, ./a.h:4)\n
+| | InstantiateFunction (fooA<int>, .\\a.h:7)
+| | | InstantiateFunction (fooB<int>, .\\b.h:3)
+| | | InstantiateFunction (fooMTA<int>, .\\a.h:4)\n



C:\buildbot\as-builder-3\llvm-clang-x86_64-win-fast\llvm-project\clang\unittests\Support\TimeProfilerTest.cpp:278
Expected equality of these values:
  R"(
Frontend
| ParseFunctionDefinition (fooB)
| ParseFunctionDefinition (fooMTA)
| ParseFunctionDefinition (fooA)
| ParseDeclarationOrFunctionDefinition (test.cc:3:5)
| | ParseFunctionDefinition (user)
| PerformPendingInstantiations
| | InstantiateFunction (fooA<int>, ./a.h:7)

```
2024-07-18 19:49:01 +02:00
Utkarsh Saxena
cd495d2cdd
Add source file name for template instantiations in -ftime-trace (#98320)
This is helpful in identifying file and location which contain the particular template declaration.
2024-07-18 16:18:34 +02:00
Alexander Pivovarov
f363317702
[APFloat] Add support for f8E4M3 IEEE 754 type (#97179)
This PR adds `f8E4M3` type to APFloat.

`f8E4M3` type  follows IEEE 754 convention

```c
f8E4M3 (IEEE 754)
- Exponent bias: 7
- Maximum stored exponent value: 14 (binary 1110)
- Maximum unbiased exponent value: 14 - 7 = 7
- Minimum stored exponent value: 1 (binary 0001)
- Minimum unbiased exponent value: 1 − 7 = −6
- Precision specifies the total number of bits used for the significand (mantisa), 
    including implicit leading integer bit = 3 + 1 = 4
- Follows IEEE 754 conventions for representation of special values
- Has Positive and Negative zero
- Has Positive and Negative infinity
- Has NaNs

Additional details:
- Max exp (unbiased): 7
- Min exp (unbiased): -6
- Infinities (+/-): S.1111.000
- Zeros (+/-): S.0000.000
- NaNs: S.1111.{001, 010, 011, 100, 101, 110, 111}
- Max normal number: S.1110.111 = +/-2^(7) x (1 + 0.875) = +/-240
- Min normal number: S.0001.000 = +/-2^(-6)
- Max subnormal number: S.0000.111 = +/-2^(-6) x 0.875 = +/-2^(-9) x 7
- Min subnormal number: S.0000.001 = +/-2^(-6) x 0.125 = +/-2^(-9)
```

Related PRs:
- [PR-97118](https://github.com/llvm/llvm-project/pull/97118) Add f8E4M3
IEEE 754 type to mlir
2024-07-17 23:33:52 -07:00
Rainer Orth
0fc4e30524
[Support] Don't use StringRef::equals in Path.inc (#98839)
The removal of StringRef::equals in
3fa409f2318ef790cc44836afe9a72830715ad84 broke the
[Solaris/sparcv9](https://lab.llvm.org/buildbot/#/builders/13/builds/724)
and
[Solaris/amd64](https://lab.llvm.org/staging/#/builders/94/builds/5176)
buildbots:
```
In file included from /vol/llvm/src/llvm-project/git/llvm/lib/Support/Path.cpp:1200:
/vol/llvm/src/llvm-project/git/llvm/lib/Support/Unix/Path.inc:519:18: error: no member named 'equals' in 'llvm::StringRef'
  519 |   return !fstype.equals("nfs");
      |           ~~~~~~ ^
```

Fixed by switching to `operator!=` instead.

Tested on sparcv9-sun-solaris2.11 and amd64-pc-solaris2.11.
2024-07-14 23:50:21 +02:00
Craig Topper
13ed3b472b [DivisionByConstantInfo] Use APInt::getLowBitsSet instead of getAllOnes+lshr. NFC 2024-07-07 22:32:14 -07:00
Kazu Hirata
75bc20ff89
[llvm] Remove redundant calls to std::unique_ptr<T>::get (NFC) (#97914) 2024-07-07 08:23:41 +09:00
Alexandre Ganea
57b76b4210 Revert "[Support] Silence function cast warning when building with Clang ToT targetting Windows"
This reverts commit 593f708118aef792f434185547f74fedeaf51dd4.
2024-07-06 12:00:41 -04:00
Ramkumar Ramachandra
4a9aef683d
DynamicAPInt: optimize size of structure (#97831)
Reuse the APInt::BitWidth to eliminate DynamicAPInt::HoldsLarge, cutting
the size of DynamicAPInt by four bytes. This is implemented by making
DynamicAPInt a friend of SlowDynamicAPInt and APInt, so it can directly
access SlowDynamicAPInt::Val and APInt::BitWidth.

We get a speedup of 4% with this patch.
2024-07-06 13:16:07 +01:00
Alexandre Ganea
593f708118 [Support] Silence function cast warning when building with Clang ToT targetting Windows 2024-07-05 20:49:40 -04:00
Kazu Hirata
c1a5f1ea36
[Support] Use range-based for loops (NFC) (#97657) 2024-07-05 05:47:00 +09:00
Ariel-Burton
bbc6504b3d
[NFC] [APFloat] Refactor IEEEFloat::toString (#97117)
This PR lifts the body of IEEEFloat::toString out to a standalone
function. We do this to facilitate code sharing with other floating
point types, e.g., the forthcoming support for HexFloat.

There is no change in functionality.
2024-07-04 08:43:45 -04:00
Youngsuk Kim
82f9a5ba96 [llvm] Avoid 'raw_string_ostream::str' (NFC)
Since `raw_string_ostream` doesn't own the string buffer, it is
desirable (in terms of memory safety) for users to directly reference
the string buffer rather than use `raw_string_ostream::str()`.

Work towards TODO comment to remove `raw_string_ostream::str()`.
2024-07-03 06:37:48 -05:00
Alexis Engelke
d8c07342c0
[Support] Move raw_ostream::tie to raw_fd_ostream (#97396)
Originally, tie was introduced by D81156 to flush stdout before writing
to stderr. 030897523 reverted this due to race conditions. Nonetheless,
it does cost performance, causing an extra check in the "cold" path,
which is actually the hot path for raw_svector_ostream. Given that this
feature is only used for errs(), move it to raw_fd_ostream so that it no
longer affects performance of other stream classes.
2024-07-03 11:15:02 +02:00
Matt Arsenault
e852725e5d Support: Fix typo in comment 2024-07-02 19:55:42 +02:00
Paul Kirth
b146a57f67
Reapply "[RISCV] Support RISCV Atomics ABI attributes (#84597)"
This patch adds support for the atomic_abi attribute, specifid in
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.adoc#tag_riscv_atomic_abi-14-uleb128version.

This was previously reverted due to ld.bfd segfaulting w/ unknown riscv
attributes. Attribute emission is now guarded by a backend flag
`--riscv-abi-attributes`, which is off by default. Linker support in
LLD for attribute merging is now in a standalone patch.

Reviewers: kito-cheng, MaskRay, asb

Reviewed By: MaskRay

Pull Request: https://github.com/llvm/llvm-project/pull/90266
2024-07-02 08:23:03 -07:00
Balazs Benics
b3b0d09cce
Reland "[analyzer][NFC] Reorganize Z3 report refutation" (#97265)
This is exactly as originally landed in #95128,
but now the minimal Z3 version was increased to meet this change in #96682.

https://discourse.llvm.org/t/bump-minimal-z3-requirements-from-4-7-1-to-4-8-9/79664/4

---

This change keeps existing behavior, namely that if we hit a Z3 timeout
we will accept the report as "satisfiable".

This prepares for the commit "Harden safeguards for Z3 query times".
https://discourse.llvm.org/t/analyzer-rfc-taming-z3-query-times/79520

(cherry picked from commit 89c26f6c7b0a6dfa257ec090fcf5b6e6e0c89aab)
2024-07-01 16:03:18 +02:00
Alexander Pivovarov
2628a5fd24
Rename f8E4M3 to f8E4M3FN in mlir.extras.types py package (#97102)
Currently `f8E4M3` is mapped to `Float8E4M3FNType`.

This PR renames `f8E4M3` to `f8E4M3FN` to accurately reflect the actual
type.

This PR is needed to avoid names conflict in upcoming PR which will add
IEEE 754 `Float8E4M3Type`.
https://github.com/llvm/llvm-project/pull/97118 Add f8E4M3 IEEE 754 type

Maksim, can you review this PR? @makslevental ?
2024-06-29 12:48:11 -07:00
Fangrui Song
b15fcdaf79 [JSON] Export sortedElements and fix CLANGD_TRACE non-determinism
clangd/test/trace.test might fail as llvm::hash_value(StringRef) is
non-deterministic per process (#96282).
2024-06-28 22:10:15 -07:00
Fangrui Song
7082536840 [JSON] Prefer static to namespace {}. NFC 2024-06-28 22:06:14 -07:00
Fangrui Song
ce80c80dca
[Hashing] Use a non-deterministic seed if LLVM_ENABLE_ABI_BREAKING_CHECKS
Hashing.h provides hash_value/hash_combine/hash_combine_range, which are
primarily used by `DenseMap<StringRef, X>`

Users shouldn't rely on specific hash values due to size_t differences
on 32-bit/64-bit platforms and potential algorithm changes.
`set_fixed_execution_hash_seed` is provided but it has never been used.

In LLVM_ENABLE_ABI_BREAKING_CHECKS builds, take the the address of a
static storage duration variable as the seed like
absl/hash/internal/hash.h `kSeed`. (See https://reviews.llvm.org/D93931
for workaround for older Clang. Mach-O x86-64 forces PIC, so absl's
`__apple_build_version__` check is unnecessary.)

LLVM_ENABLE_ABI_BREAKING_CHECKS defaults to `WITH_ASSERTS` and is
enabled in an assertion build.

In a non-assertion build, `get_execution_seed` returns the fixed value
regardless of `NDEBUG`. Removing a variable load yields noticeable
size/performance improvement.

A few users relying on the iteration order of `DenseMap<StringRef, X>`
have been fixed (e.g., f8f4235612b9 c025bd1fdbbd 89e8e63f47ff
86eb6bf6715c eb8d03656549 0ea6b8e476c2 58d7a6e0e636 8ea31db27211
592abf29f9f7 664497557ae7).
From my experience fixing [`StringMap`](https://discourse.llvm.org/t/reverse-iteration-bots/72224)
iteration order issues, the scale of issues is similar.

Pull Request: https://github.com/llvm/llvm-project/pull/96282
2024-06-28 15:30:19 -07:00
Fangrui Song
72055622e9
[Support] Fix xxh3_128bits for Win32 builds after #95863
`__emulu` is used without including `intrin.h`. Actually, it's better to
rely on compiler optimizations. In this LLVM copy, we try to eliminate
unneceeded workarounds for old compilers.

Pull Request: https://github.com/llvm/llvm-project/pull/96931
2024-06-28 00:19:46 -07:00
Chelsea Cassanova
1abe22cab5
Revert "[ADT] Always use 32-bit size type for SmallVector with 16-bit elements" (#96826)
Reverts llvm/llvm-project#95536, this is breaking macOS GreenDragon
buildbots on arm64 and x86_64
https://green.lab.llvm.org/job/llvm.org/view/LLDB/job/as-lldb-cmake/6522/console,
also breaks the Darwin LLVM buildbots:
https://lab.llvm.org/buildbot/#/builders/23/builds/398
2024-06-26 15:24:51 -07:00
Jay Foad
2582d11f1a
[ADT] Always use 32-bit size type for SmallVector with 16-bit elements (#95536)
`SmallVector` has a special case to allow vector of char to exceed 4 GB
in
size on 64-bit hosts. Apply this special case only for 8-bit element
types, instead of all element types < 32 bits.

This makes `SmallVector<MCPhysReg>` more compact because `MCPhysReg` is
`uint16_t`.

---------

Co-authored-by: Nikita Popov <github@npopov.com>
2024-06-26 21:48:38 +01:00
Kyle Huey
0f24a46238
[llvm-config] Make llvm-config --system-libs obey LLVM_USE_STATIC_ZSTD (#93754)
LLVM's build system does the right thing but LLVM_SYSTEM_LIBS ends up
containing the shared library. Emit the static library instead when
appropriate.

With LLVM_USE_STATIC_ZSTD, before:

khuey@zhadum:~/dev/llvm-project/build$ ./bin/llvm-config --system-libs
-lrt -ldl -lm -lz -lzstd -lxml2

after:

khuey@zhadum:~/dev/llvm-project/build$ ./bin/llvm-config --system-libs
-lrt -ldl -lm -lz /usr/local/lib/libzstd.a -lxml2
2024-06-26 11:16:29 -07:00
Nikita Popov
60bdcc02ad [GraphWriter] Add missing ManagedStatic.h include (NFC)
This include is only necessary on __APPLE__.
2024-06-21 16:16:04 +02:00
Nikita Popov
30299b8717 [CommandLine] Avoid ManagedStatic.h include (NFC)
The two variables using ManagedStatic that are exported by this
header are not actually used anywhere -- they are used through
SubCommand::getTopLevel() and SubCommand::getAll() instead.
Drop the extern declarations and the include.
2024-06-21 15:45:17 +02:00
Nikita Popov
48ef912e2b [VFS] Avoid <stack> include (NFC)
Directly use a vector instead of wrapping it in a stack, like we
do in most places.
2024-06-21 15:17:41 +02:00
Alexandre Ganea
67226bad15
[Support] Vendor rpmalloc in-tree and use it for the Windows 64-bit release (#91862)
### Context

We have a longstanding performance issue on Windows, where to this day,
the default heap allocator is still lockfull. With the number of cores
increasing, building and using LLVM with the default Windows heap
allocator is sub-optimal. Notably, the ThinLTO link times with LLD are
extremely long, and increase proportionally with the number of cores in
the machine.

In
a6a37a2fcd,
I introduced the ability build LLVM with several popular lock-free
allocators. Downstream users however have to build their own toolchain
with this option, and building an optimal toolchain is a bit tedious and
long. Additionally, LLVM is now integrated into Visual Studio, which
AFAIK re-distributes the vanilla LLVM binaries/installer. The point
being that many users are impacted and might not be aware of this
problem, or are unable to build a more optimal version of the toolchain.

The symptom before this PR is that most of the CPU time goes to the
kernel (darker blue) when linking with ThinLTO:


![16c_ryzen9_windows_heap](https://github.com/llvm/llvm-project/assets/37383324/86c3f6b9-6028-4c1a-ba60-a2fa3876fba7)

With this PR, most time is spent in user space (light blue):


![16c_ryzen9_rpmalloc](https://github.com/llvm/llvm-project/assets/37383324/646b88f3-5b6d-485d-a2e4-15b520bdaf5b)

On higher core count machines, before this PR, the CPU usage becomes
pretty much flat because of contention:

<img width="549" alt="VM_176_windows_heap"
src="https://github.com/llvm/llvm-project/assets/37383324/f27d5800-ee02-496d-a4e7-88177e0727f0">


With this PR, similarily most CPU time is now used:

<img width="549" alt="VM_176_with_rpmalloc"
src="https://github.com/llvm/llvm-project/assets/37383324/7d4785dd-94a7-4f06-9b16-aaa4e2e505c8">

### Changes in this PR

The avenue I've taken here is to vendor/re-licence rpmalloc in-tree, and
use it when building the Windows 64-bit release. Given the permissive
rpmalloc licence, prior discussions with the LLVM foundation and
@lattner suggested this vendoring. Rpmalloc's author (@mjansson) kindly
agreed to ~~donate~~ re-licence the rpmalloc code in LLVM (please do
correct me if I misinterpreted our past communications).

I've chosen rpmalloc because it's small and gives the best value
overall. The source code is only 4 .c files. Rpmalloc is statically
replacing the weak CRT alloc symbols at link time, and has no dynamic
patching like mimalloc. As an alternative, there were several
unsuccessfull attempts made by Russell Gallop to use SCUDO in the past,
please see thread in https://reviews.llvm.org/D86694. If later someone
comes up with a PR of similar performance that uses SCUDO, we could then
delete this vendored rpmalloc folder.

I've added a new cmake flag `LLVM_ENABLE_RPMALLOC` which essentialy sets
`LLVM_INTEGRATED_CRT_ALLOC` to the in-tree rpmalloc source.

### Performance

The most obvious test is profling a ThinLTO linking step with LLD. I've
used a Clang compilation as a testbed, ie.
```
set OPTS=/GS- /D_ITERATOR_DEBUG_LEVEL=0 -Xclang -O3 -fstrict-aliasing -march=native -flto=thin -fwhole-program-vtables -fuse-ld=lld
cmake -G Ninja %ROOT%/llvm -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=TRUE -DLLVM_ENABLE_PROJECTS="clang" -DLLVM_ENABLE_PDB=ON -DLLVM_OPTIMIZED_TABLEGEN=ON -DCMAKE_C_COMPILER=clang-cl.exe -DCMAKE_CXX_COMPILER=clang-cl.exe -DCMAKE_LINKER=lld-link.exe -DLLVM_ENABLE_LLD=ON -DCMAKE_CXX_FLAGS="%OPTS%" -DCMAKE_C_FLAGS="%OPTS%" -DLLVM_ENABLE_LTO=THIN
```
I've profiled the linking step with no LTO cache, with Powershell, such
as:
```
Measure-Command { lld-link /nologo @CMakeFiles\clang.rsp /out:bin\clang.exe /implib:lib\clang.lib /pdb:bin\clang.pdb /version:0.0 /machine:x64 /STACK:10000000 /DEBUG /OPT:REF /OPT:ICF /INCREMENTAL:NO /subsystem:console /MANIFEST:EMBED,ID=1 }`
```

Timings:

| Machine | Allocator | Time to link |
|--------|--------|--------|
| 16c/32t AMD Ryzen 9 5950X | Windows Heap | 10 min 38 sec |
|  | **Rpmalloc** | **4 min 11 sec** |
| 32c/64t AMD Ryzen Threadripper PRO 3975WX | Windows Heap | 23 min 29
sec |
|  | **Rpmalloc** | **2 min 11 sec** |
|  | **Rpmalloc + /threads:64** | **1 min 50 sec** |
| 176 vCPU (2 socket) Intel Xeon Platinium 8481C (fixed clock 2.7 GHz) |
Windows Heap | 43 min 40 sec |
|  | **Rpmalloc** | **1 min 45 sec** |

This also improves the overall performance when building with clang-cl.
I've profiled a regular compilation of clang itself, ie:
```
set OPTS=/GS- /D_ITERATOR_DEBUG_LEVEL=0 /arch:AVX -fuse-ld=lld
cmake -G Ninja %ROOT%/llvm -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=TRUE -DLLVM_ENABLE_PROJECTS="clang;lld" -DLLVM_ENABLE_PDB=ON -DLLVM_OPTIMIZED_TABLEGEN=ON -DCMAKE_C_COMPILER=clang-cl.exe -DCMAKE_CXX_COMPILER=clang-cl.exe -DCMAKE_LINKER=lld-link.exe -DLLVM_ENABLE_LLD=ON -DCMAKE_CXX_FLAGS="%OPTS%" -DCMAKE_C_FLAGS="%OPTS%"
```
This saves approx. 30 sec when building on the Threadripper PRO 3975WX:
```
(default Windows Heap)
C:\src\git\llvm-project>hyperfine -r 5 -p "make_llvm.bat stage1_test2" "ninja clang -C stage1_test2"
Benchmark 1: ninja clang -C stage1_test2
  Time (mean ± σ):     392.716 s ±  3.830 s    [User: 17734.025 s, System: 1078.674 s]
  Range (min … max):   390.127 s … 399.449 s    5 runs

(rpmalloc)
C:\src\git\llvm-project>hyperfine -r 5 -p "make_llvm.bat stage1_test2" "ninja clang -C stage1_test2"
Benchmark 1: ninja clang -C stage1_test2
  Time (mean ± σ):     360.824 s ±  1.162 s    [User: 15148.637 s, System: 905.175 s]
  Range (min … max):   359.208 s … 362.288 s    5 runs
```
2024-06-20 10:54:02 -04:00
Brendan Duke
f991ebbb46
[Support] Add llvm::xxh3_128bits (#95863)
Add a 128-bit xxhash function, following the existing
`llvm::xxh3_64bits` and `llvm::xxHash` implementations. Previously,
48e93f57f1ee914ca29aa31bf2ccd916565a3610 added support for
`llvm::xxh3_64bits`, which closely follows the upstream implementation
at https://github.com/Cyan4973/xxHash, with simplifications from Devin
Hussey's xxhash-clean.
However, it is desirable to have a larger 128-bit hash key for use cases
such as filesystem checksums where chance of collision needs to be
negligible.

So to that end this also ports over the 128-bit xxh3_128bits as
`llvm::xxh3_128bits`.

Testing:
- Add a test based on xsum_sanity_check.c in upstream xxhash.
2024-06-19 15:24:54 -04:00
Xuan Zhang
d9a00ed366
[MachineOutliner] Leaf Descendants (#90275)
This PR  depends on https://github.com/llvm/llvm-project/pull/90264

In the current implementation, only leaf children of each internal node
in the suffix tree are included as candidates for outlining. But all
leaf descendants are outlining candidates, which we include in the new
implementation. This is enabled on a flag `outliner-leaf-descendants`
which is default to be true.

The reason for _enabling this on a flag_ is because machine outliner is
not the only pass that uses suffix tree.

The reason for _having this default to be true_ is because including all
leaf descendants show consistent size win.
* For Clang/LLD, it shows around 3% reduction in text segment size when
compared to the baseline `-Oz` linker binary.
 * For selected benchmark tests in LLVM test suite 
 
| run (CTMark/) | only leaf children | all leaf descendants | reduction
% |

|------------------|--------------------|----------------------|-------------|
| lencod | 349624 | 348564 | -0.2004% |
| SPASS | 219672 | 218440 | -0.4738% |
| kc | 271956 | 250068 | -0.4506% |
| sqlite3 | 223920 | 222484 | -0.5471% |
| 7zip-benchmark | 405364 | 401244 | -0.3428% |
| bullet | 139820 | 138340 | -0.8315% |
| consumer-typeset | 295684 | 286628 | -1.2295% |
| pairlocalalign | 72236 | 71936 | -0.2164% |
| tramp3d-v4 | 189572 | 183676 | -2.9668% |

This is part of an enhanced version of machine outliner -- see
[RFC](https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-1-fulllto-part-2-thinlto-nolto-to-come/78732).
2024-06-18 07:13:05 -07:00
Balazs Benics
8fc9c03cde
[analyzer] Revert Z3 changes (#95916)
Requested in:
https://github.com/llvm/llvm-project/pull/95128#issuecomment-2176008007

Revert "[analyzer] Harden safeguards for Z3 query times"
Revert "[analyzer][NFC] Reorganize Z3 report refutation"

This reverts commit eacc3b3504be061f7334410dd0eb599688ba103a.
This reverts commit 89c26f6c7b0a6dfa257ec090fcf5b6e6e0c89aab.
2024-06-18 14:59:28 +02:00
Balazs Benics
89c26f6c7b
[analyzer][NFC] Reorganize Z3 report refutation
This change keeps existing behavior, namely that if we hit a Z3 timeout
we will accept the report as "satisfiable".

This prepares for the commit "Harden safeguards for Z3 query times".
https://discourse.llvm.org/t/analyzer-rfc-taming-z3-query-times/79520

Reviewers: NagyDonat, haoNoQ, Xazax-hun, mikhailramalho, Szelethus

Reviewed By: NagyDonat

Pull Request: https://github.com/llvm/llvm-project/pull/95128
2024-06-18 09:42:29 +02:00
Ahmed Bougacha
61069bd5a3
[Support] Add SipHash-based 16-bit ptrauth ABI-stable hash. (#93902)
This finally wraps the now-lightly-modified SipHash C reference
implementation, for the main interface we need (16-bit ptrauth
discriminators).

The exact algorithm is the little-endian interpretation of the
non-doubled (i.e. 64-bit) result of applying a SipHash-2-4 using the
constant seed `b5d4c9eb79104a796fec8b1b428781d4` (big-endian), with the
result reduced by modulo to the range of non-zero discriminators (i.e.
`(rawHash % 65535) + 1`).

By "stable" we mean that the result of this hash algorithm will the same
across different compiler versions and target platforms.

The 16-bit hashes are used extensively for the AArch64 ptrauth ABI,
because AArch64 can efficiently load a 16-bit immediate into the high
bits of a register without disturbing the remainder of the value, which
serves as a nice blend operation.

16 bits is also sufficiently compact to not inflate a loader relocation.
We disallow zero to guarantee a different discriminator from the places
in the ABI that use a constant zero.

Co-authored-by: John McCall <rjmccall@apple.com>
2024-06-14 17:57:12 -07:00
Ahmed Bougacha
577c3f114b
[Support] Integrate SipHash.cpp into libSupport. (#94394)
Start building it as part of the library, with some minor
tweaks compared to the reference implementation:
- clang-format to match libSupport
- remove tracing support
- add file header
- templatize cROUNDS/dROUNDS, as well as 8B/16B result length
- replace assert with static_assert
- use LLVM_FALLTHROUGH

This also exports interfaces for SipHash-2-4-64/-128, and
tests them using the reference test vectors.
2024-06-14 17:07:41 -07:00
Ahmed Bougacha
cfbed2c0e6
[Support] Import SipHash c reference implementation. (#94393)
This brings the unmodified SipHash reference implementation:
  https://github.com/veorq/SipHash
which has been very graciously licensed under our llvm license
(Apache-2.0 WITH LLVM-exception) by Jean-Philippe Aumasson.

SipHash is a lightweight hash function optimized for speed on short
messages. We use it as part of the AArch64 ptrauth ABI (in arm64e and
ELF PAuth) to generate discriminators based on language identifiers and
mangled names.

This commit brings the unmodified reference implementation and tests
as of f26d35e, specifically siphash.c and vectors.h, as SipHash.cpp and
SipHashTest.cpp.

Next, we will integrate it properly into libSupport, with a wrapping API
suited for the ptrauth use-case.
2024-06-14 16:56:04 -07:00
Jeremy Day
cc7a18c180
Set Support system_libs if WIN32, not just MSVC or MINGW (#95505)
The previous check was false when compiling with `clang++`, which
prevented `ntdll` from being specified as a link library, causing an
undefined symbol error when trying to resolve `RtlGetLastNtStatus`.
Since we always want to link these libraries on Windows, the check can
be simplified to just `if( WIN32 )`.
2024-06-14 22:17:39 +03:00
Nikita Popov
44df1167f8
[Error] Add non-consuming toString (#95375)
There are some places that want to convert an Error to string, but still
retain the original Error object, for example to emit a non-fatal
warning.

This currently isn't possible, because the entire Error infra is
move-based. And what people end up doing in this case is to move the
Error... twice.

This patch introduces a toStringWithoutConsuming() function to
accommodate this use case. This also requires some infrastructure that
allows visiting Errors without consuming them.
2024-06-14 11:35:27 +02:00
Durgadoss R
880d37038c
[APFloat] Add APFloat support for FP4 data type (#95392)
This patch adds APFloat type support for the E2M1
FP4 datatype. The definitions for this format are
detailed in section 5.3.3 of the OCP specification,
which can be accessed here:

https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2024-06-14 14:17:37 +05:30
Jay Foad
d4a0154902
[llvm-project] Fix typo "seperate" (#95373) 2024-06-13 20:20:27 +01:00
Simon Pilgrim
fa9301f35b [KnownBits] avgCompute - don't create on-the-fly Carry. NFC.
Use the internal computeForAddCarry directly since we know the exact values of the carry bit.
2024-06-13 10:17:15 +01:00
Tom Stellard
58df646714
[Support/BLAKE3] Remove some dead stores (#95120)
These were caught by the clang static analyzer.
2024-06-12 22:00:11 -07:00