This is part of a series of patches that tries to improve DILocation bug
detection in Debugify. This first patch adds the necessary CMake flag to
LLVM and a variable defined by that flag to LLVM's config header, allowing
the next patch to track information without affecting normal builds.
This series of patches adds a "DebugLoc coverage tracking" feature, that
inserts conditionally-compiled tracking information into DebugLocs (and
by extension, to Instructions), which is used by Debugify to provide
more accurate and detailed coverage reports. When enabled, this features
tracks whether and why we have intentionally dropped a DebugLoc,
allowing Debugify to ignore false positives. An optional additional
feature allows also storing a stack trace of the point where a DebugLoc
was unintentionally dropped/not generated, which is used to make fixing
detected errors significantly easier. The goal of these features is to
provide useful tools for developers to fix existing DebugLoc errors and
allow reliable detection of regressions by either manual inspection or
an automated script.
...and add shared libs as a suggestion.
* Mark options, option values and program names as plain text.
* Add a blank line between the option and the explanatory text
so that it doesn't get printed on the same line.
(this seems to be the original intent of the rst source anyway)
* Update the phrasing of a couple of the options.
* Add BUILD_SHARED_LIBS to suggestions.
* Add line breaks so it's clear what should be passed to CMake.
* Make the note into an RST note block.
* Fix a couple of markdown style plain text markers.
### Context
We have a longstanding performance issue on Windows, where to this day,
the default heap allocator is still lockfull. With the number of cores
increasing, building and using LLVM with the default Windows heap
allocator is sub-optimal. Notably, the ThinLTO link times with LLD are
extremely long, and increase proportionally with the number of cores in
the machine.
In
a6a37a2fcd,
I introduced the ability build LLVM with several popular lock-free
allocators. Downstream users however have to build their own toolchain
with this option, and building an optimal toolchain is a bit tedious and
long. Additionally, LLVM is now integrated into Visual Studio, which
AFAIK re-distributes the vanilla LLVM binaries/installer. The point
being that many users are impacted and might not be aware of this
problem, or are unable to build a more optimal version of the toolchain.
The symptom before this PR is that most of the CPU time goes to the
kernel (darker blue) when linking with ThinLTO:

With this PR, most time is spent in user space (light blue):

On higher core count machines, before this PR, the CPU usage becomes
pretty much flat because of contention:
<img width="549" alt="VM_176_windows_heap"
src="https://github.com/llvm/llvm-project/assets/37383324/f27d5800-ee02-496d-a4e7-88177e0727f0">
With this PR, similarily most CPU time is now used:
<img width="549" alt="VM_176_with_rpmalloc"
src="https://github.com/llvm/llvm-project/assets/37383324/7d4785dd-94a7-4f06-9b16-aaa4e2e505c8">
### Changes in this PR
The avenue I've taken here is to vendor/re-licence rpmalloc in-tree, and
use it when building the Windows 64-bit release. Given the permissive
rpmalloc licence, prior discussions with the LLVM foundation and
@lattner suggested this vendoring. Rpmalloc's author (@mjansson) kindly
agreed to ~~donate~~ re-licence the rpmalloc code in LLVM (please do
correct me if I misinterpreted our past communications).
I've chosen rpmalloc because it's small and gives the best value
overall. The source code is only 4 .c files. Rpmalloc is statically
replacing the weak CRT alloc symbols at link time, and has no dynamic
patching like mimalloc. As an alternative, there were several
unsuccessfull attempts made by Russell Gallop to use SCUDO in the past,
please see thread in https://reviews.llvm.org/D86694. If later someone
comes up with a PR of similar performance that uses SCUDO, we could then
delete this vendored rpmalloc folder.
I've added a new cmake flag `LLVM_ENABLE_RPMALLOC` which essentialy sets
`LLVM_INTEGRATED_CRT_ALLOC` to the in-tree rpmalloc source.
### Performance
The most obvious test is profling a ThinLTO linking step with LLD. I've
used a Clang compilation as a testbed, ie.
```
set OPTS=/GS- /D_ITERATOR_DEBUG_LEVEL=0 -Xclang -O3 -fstrict-aliasing -march=native -flto=thin -fwhole-program-vtables -fuse-ld=lld
cmake -G Ninja %ROOT%/llvm -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=TRUE -DLLVM_ENABLE_PROJECTS="clang" -DLLVM_ENABLE_PDB=ON -DLLVM_OPTIMIZED_TABLEGEN=ON -DCMAKE_C_COMPILER=clang-cl.exe -DCMAKE_CXX_COMPILER=clang-cl.exe -DCMAKE_LINKER=lld-link.exe -DLLVM_ENABLE_LLD=ON -DCMAKE_CXX_FLAGS="%OPTS%" -DCMAKE_C_FLAGS="%OPTS%" -DLLVM_ENABLE_LTO=THIN
```
I've profiled the linking step with no LTO cache, with Powershell, such
as:
```
Measure-Command { lld-link /nologo @CMakeFiles\clang.rsp /out:bin\clang.exe /implib:lib\clang.lib /pdb:bin\clang.pdb /version:0.0 /machine:x64 /STACK:10000000 /DEBUG /OPT:REF /OPT:ICF /INCREMENTAL:NO /subsystem:console /MANIFEST:EMBED,ID=1 }`
```
Timings:
| Machine | Allocator | Time to link |
|--------|--------|--------|
| 16c/32t AMD Ryzen 9 5950X | Windows Heap | 10 min 38 sec |
| | **Rpmalloc** | **4 min 11 sec** |
| 32c/64t AMD Ryzen Threadripper PRO 3975WX | Windows Heap | 23 min 29
sec |
| | **Rpmalloc** | **2 min 11 sec** |
| | **Rpmalloc + /threads:64** | **1 min 50 sec** |
| 176 vCPU (2 socket) Intel Xeon Platinium 8481C (fixed clock 2.7 GHz) |
Windows Heap | 43 min 40 sec |
| | **Rpmalloc** | **1 min 45 sec** |
This also improves the overall performance when building with clang-cl.
I've profiled a regular compilation of clang itself, ie:
```
set OPTS=/GS- /D_ITERATOR_DEBUG_LEVEL=0 /arch:AVX -fuse-ld=lld
cmake -G Ninja %ROOT%/llvm -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=TRUE -DLLVM_ENABLE_PROJECTS="clang;lld" -DLLVM_ENABLE_PDB=ON -DLLVM_OPTIMIZED_TABLEGEN=ON -DCMAKE_C_COMPILER=clang-cl.exe -DCMAKE_CXX_COMPILER=clang-cl.exe -DCMAKE_LINKER=lld-link.exe -DLLVM_ENABLE_LLD=ON -DCMAKE_CXX_FLAGS="%OPTS%" -DCMAKE_C_FLAGS="%OPTS%"
```
This saves approx. 30 sec when building on the Threadripper PRO 3975WX:
```
(default Windows Heap)
C:\src\git\llvm-project>hyperfine -r 5 -p "make_llvm.bat stage1_test2" "ninja clang -C stage1_test2"
Benchmark 1: ninja clang -C stage1_test2
Time (mean ± σ): 392.716 s ± 3.830 s [User: 17734.025 s, System: 1078.674 s]
Range (min … max): 390.127 s … 399.449 s 5 runs
(rpmalloc)
C:\src\git\llvm-project>hyperfine -r 5 -p "make_llvm.bat stage1_test2" "ninja clang -C stage1_test2"
Benchmark 1: ninja clang -C stage1_test2
Time (mean ± σ): 360.824 s ± 1.162 s [User: 15148.637 s, System: 905.175 s]
Range (min … max): 359.208 s … 362.288 s 5 runs
```
Building the Apple way turns off plugin support, meaning we don't need
to export unloadable symbols from all executables. While deadstripping
effects aren't expected to change, enabling this across all tools
prevents the creation of export tries. This saves us ~3.5 MB in just the
universal build of `clang`.
Add the ability to set the number of tablegen jobs that can run in
parallel
similar to the LLVM_PARALLEL_[COMPILE|LINK]_JOBS options that already
exist.
Currently `add_lit_target` sets the `USES_TERMINAL` CMake option. When
using Ninja, this forces all lit testsuite targets into the
single-threaded `console` pool.
This PR adds a new option `LLVM_PARALLEL_LIT` which drops the
`USES_TERMINAL` flag, allowing Ninja to run them in parallel.
The default setting (`LLVM_PARALLEL_LIT=OFF`) retains the existing
behavior of serial testsuite execution.
Both LLVM_LINK_LLVM_DYLIB and LLVM_PARALLEL_LINK_JOBS help with some
common gotchas. It seems worth documenting them here explicitly.
Based on a review comment, also "refactor" the documentation to avoid duplication.
This patch makes it so that specifying an invalid value for
CMAKE_BUILD_TYPE is a fatal error. Having this simply as a warning has
caused me (and probably others) a decent amount of headache. The check
was present before, but was proposed to be modified to a warning in
https://github.com/llvm/llvm-project/issues/60975 and changed to a
warning in c75dbeda15c10424910ddc83a9ff7669776c19ac. This patch
reenables that behavior to hopefully reduce frustration for people
building LLVM in the common case while still allowing for alternative
build types to be setup without needing to perform source modification
through the addition of a CMake flag.
This PR adds options to let CMake calculate the ninja job pools
depending on free memory and available cores.
You can provide memory requirements for each compile and link job which
is checked against CMake AVAILABLE_PHYSICAL_MEMORY and
NUMBER_OF_LOGICAL_CORES. [This information are available since CMake
3.0](https://cmake.org/cmake/help/v3.0/command/cmake_host_system_information.html).
This is very helpful in CI environments with multiple jobs per
environment or a VM with multiple users.
Its different to LLVM_PARALLEL_LINK_JOBS / LLVM_PARALLEL_COMPILE_JOBS
(or ninja -j 1) because it tries to use the resources more efficient
without being terminated. Only downside currently is that compile and
link jobs can run at the same time so there is an offset for link job
memory suggested which is added to the documentation.
The definitions aren't added as cache because if I understand it
correctly this would break it because values could be outdated.
`CMAKE_{C/CXX}_FLAGS` affects all targets in LLVM. This can
be undesirable in situations, like the case of enabling thinLTO,
where `-flto` is added to every source file. In reality, we only
care about optimizing a select few of binaries, such as clang or lld,
that dominate the compilation pipeline. Auxiliary binaries in a
distribution and not on the critical path can be kept non-optimized.
This PR adds support of per-target linker flags, which can solve the
thinLTO problem by negating the effects of LTO via targeted linker
flags on the targets. The example of negating thinLTO
above can be done by doing the following:
```
set(LLVM_llvm-dwarfdump_LINKER_FLAGS "-Wl,--lto-O0" CACHE STRING "Custom linker flags to llvm-dwarfdump")
set(LLVM_lldb_LINKER_FLAGS "-Wl,--lto-O0" CACHE STRING "Custom linker flags to lldb")
```
There's other applications where this could be used (e.g. avoid
optimizing host tools for build speed improvement etc.).
I've generalized this so that users can apply their desired flags to
targets that are generated by `llvm_add_library` or
`add_llvm_executable`.
Internally, our toolchain builds were on average 1.4x faster when
selectively choosing the binaries that we want optimized.
When custom install names and rpaths setups are used they may not
work in the build tree as-is (namely when using absolute paths for
install names in order to avoid rpath juggling in downstream projects).
Add a flag for opting out of this behaviour.
See: https://reviews.llvm.org/D42463
This patch adds a LLVM_FORCE_VC_REVISION option to force a custom
VC revision to be included instead of trying to fetch one from a
git command. This is helpful in environments where git is not
available or is non-functional but the vc revision is available
through some other means.
This has been deprecated in favor of CMake's CMAKE_MSVC_RUNTIME_LIBRARY
in c6bd873403a8ac6538a3fe3b3c2fe39c16b146a6 .
Current release branch still contains it in deprecated status so no
existing end-users will be affected.
This patch is the first part of https://llvm.org/OpenProjects.html#llvm_patch_coverage.
We have first define a new variable LLVM_TEST_COVERAGE which when set, pass --per-test-coverage option to
llvm-lit which will help in setting a unique value to LLVM_PROFILE_FILE for each RUN. So for example
coverage data for test case llvm/test/Analysis/AliasSet/memtransfer.ll will be emitted as
build/test/Analysis/AliasSet/memtransfer0.profraw
Reviewed By: hnrklssn
Differential Revision: https://reviews.llvm.org/D154280
This patch is the first part of https://llvm.org/OpenProjects.html#llvm_patch_coverage.
We have first define a new variable LLVM_TEST_COVERAGE which when set, pass --emit-coverage option to
llvm-lit which will help in setting a unique value to LLVM_PROFILE_FILE for each RUN. So for example
coverage data for test case llvm/test/Analysis/AliasSet/memtransfer.ll will be emitted as
build/test/Analysis/AliasSet/memtransfer.profraw
Reviewed By: hnrklssn
Differential Revision: https://reviews.llvm.org/D154280
With the new behaviour, the /MD or similar options aren't added to
e.g. CMAKE_CXX_FLAGS_RELEASE, but are added separately by CMake.
They can be changed by the cmake variable
CMAKE_MSVC_RUNTIME_LIBRARY or with the target property
MSVC_RUNTIME_LIBRARY.
LLVM has had its own custom CMake flags, e.g. LLVM_USE_CRT_RELEASE,
which affects which CRT is used for release mode builds. Deprecate
these and direct users to use CMAKE_MSVC_RUNTIME_LIBRARY directly
instead (and do a best effort attempt at setting CMAKE_MSVC_RUNTIME_LIBRARY
based on the existing LLVM_USE_CRT_ flags). This only handles the
simple cases, it doesn't handle multi-config generators with
different LLVM_USE_CRT_* variables for different configs though,
but that's probably fine - we should move over to the new upstream
CMake mechanism anyway, and push users towards that.
Change code in compiler-rt, that previously tried to override the
CRT choice to /MT, to set CMAKE_MSVC_RUNTIME_LIBRARY instead of
meddling in the old variables.
This resolves the policy issue in
https://github.com/llvm/llvm-project/issues/63286, and should
handle the issues that were observed originally when the
minimum CMake version was bumped, in
https://github.com/llvm/llvm-project/issues/62719 and
https://github.com/llvm/llvm-project/issues/62739.
Differential Revision: https://reviews.llvm.org/D155233
This patch adds in CMake option LLVM_ENABLE_LLVM_LIBC which when set to
true automatically builds LLVM libc in overlay mode and links all
generated executables against the libc overlay. This is intended to
somewhat mirror the LLVM_ENABLE_LIBCXX flag.
Differential Revision: https://reviews.llvm.org/D151013
This patch adds LLVM_ENABLE_HTTPLIB to the list of CMake options to make
it more clear exactly what it does and also provide clarity on which
specific project it is referring to/installation.
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D152060
This reverts commit d763c6e5e2d0a6b34097aa7dabca31e9aff9b0b6.
Adds the patch by @hans from
https://github.com/llvm/llvm-project/issues/62719
This patch fixes the Windows build.
d763c6e5e2d0a6b34097aa7dabca31e9aff9b0b6 reverted the reviews
D144509 [CMake] Bumps minimum version to 3.20.0.
This partly undoes D137724.
This change has been discussed on discourse
https://discourse.llvm.org/t/rfc-upgrading-llvms-minimum-required-cmake-version/66193
Note this does not remove work-arounds for older CMake versions, that
will be done in followup patches.
D150532 [OpenMP] Compile assembly files as ASM, not C
Since CMake 3.20, CMake explicitly passes "-x c" (or equivalent)
when compiling a file which has been set as having the language
C. This behaviour change only takes place if "cmake_minimum_required"
is set to 3.20 or newer, or if the policy CMP0119 is set to new.
Attempting to compile assembly files with "-x c" fails, however
this is workarounded in many cases, as OpenMP overrides this with
"-x assembler-with-cpp", however this is only added for non-Windows
targets.
Thus, after increasing cmake_minimum_required to 3.20, this breaks
compiling the GNU assembly for Windows targets; the GNU assembly is
used for ARM and AArch64 Windows targets when building with Clang.
This patch unbreaks that.
D150688 [cmake] Set CMP0091 to fix Windows builds after the cmake_minimum_required bump
The build uses other mechanism to select the runtime.
Fixes#62719
Reviewed By: #libc, Mordante
Differential Revision: https://reviews.llvm.org/D151344
This reverts commit 65429b9af6a2c99d340ab2dcddd41dab201f399c.
Broke several projects, see https://reviews.llvm.org/D144509#4347562 onwards.
Also reverts follow-up commit "[OpenMP] Compile assembly files as ASM, not C"
This reverts commit 4072c8aee4c89c4457f4f30d01dc9bb4dfa52559.
Also reverts fix attempt "[cmake] Set CMP0091 to fix Windows builds after the cmake_minimum_required bump"
This reverts commit 7d47dac5f828efd1d378ba44a97559114f00fb64.
Funnel fetching and building LLVM instructions into GettingStarted.
Modernize the build steps a little.
Remove comments saying CMAKE_BUILD_TYPE defaults to Debug as that's not true anymore (must explicitly pass it).
Reviewed By: MaskRay, hans
Differential Revision: https://reviews.llvm.org/D145413
Some build bots have not been updated to the new minimal CMake version.
Reverting for now and ping the buildbot owners.
This reverts commit 44c6b905f8527635e49bb3ea97dea315f92d38ec.
This partly undoes D137724.
This change has been discussed on discourse
https://discourse.llvm.org/t/rfc-upgrading-llvms-minimum-required-cmake-version/66193
Note this does not remove work-arounds for older CMake versions, that
will be done in followup patches.
Reviewed By: mehdi_amini, MaskRay, ChuanqiXu, to268, thieta, tschuett, phosek, #libunwind, #libc_vendors, #libc, #libc_abi, sivachandra, philnik, zibi
Differential Revision: https://reviews.llvm.org/D144509
CMake supports CMAKE_CXX_COMPILER_LAUNCHER since CMake 3.4
so this custom CMake logic we had in LLVM can now be removed.
The only downside with this is that we can't set ccache
options from LLVM CMake, but it's arguable that this doesn't
belong in LLVM but should be done in a script calling the
build.
This was discussed in the forums here:
https://discourse.llvm.org/t/tips-for-incremental-building/67289/4?u=tobiashieta
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D143468