- _Float16 is now accepted by Clang.
- The half IR type is fully handled by the backend.
- These values are passed in FP registers and converted to/from float around
each operation.
- Compiler-rt conversion functions are now built for s390x including the missing
extendhfdf2 which was added.
Fixes#50374
[compiler-rt] The test `addtf3_test.c` is currently guarded by `#if
defined(CRT_HAS_IEEE_TF)`, a macro that is declared in `int_lib.h`.
However, `int_lib.h` is included *after* the preprocessor check, which
results in the macro not being defined in time and causes the test to
always be skipped.
This patch moves the includes of `fp_test.h` and `int_lib.h` to the top
of the file so that `CRT_HAS_IEEE_TF` is defined before it is checked.
Co-authored-by: Kostiantyn Lazukin <koslaz01@ip-10-252-21-142.eu-west-1.compute.internal>
### Description
This PR resolves a deadlock between AddressSanitizer (ASan) and
LeakSanitizer (LSan)
that occurs when both sanitizers attempt to acquire locks in conflicting
orders across
threads. The fix ensures safe lock acquisition ordering by preloading
module information
before error reporting.
---
### Issue Details
**Reproducer**
```cpp
// Thread 1: ASan error path
int arr[1] = {0};
std::thread t([&]() {
arr[1] = 1; // Triggers ASan OOB error
});
// Thread 2: LSan check path
__lsan_do_leak_check();
```
**Lock Order Conflict**:
- Thread 1 (ASan error reporting):
1. Acquires ASan thread registry lock (B)
1. Attempts to acquire libdl lock (A) via `dl_iterate_phdr`
- Thread 2 (LSan leak check):
1. Acquires libdl lock (A) via `dl_iterate_phdr`
1. Attempts to acquire ASan thread registry lock (B)
This creates a circular wait condition (A -> B -> A) meeting all four
Coffman deadlock criteria.
---
### Fix Strategy
The root cause lies in ASan's error reporting path needing
`dl_iterate_phdr` (requiring lock A)
while already holding its thread registry lock (B). The solution:
1. **Preload Modules Early**: Force module list initialization _before_
acquiring ASan's thread lock
2. **Avoid Nested Locking**: Ensure symbolization (via dl_iterate_phdr)
completes before error reporting locks
Key code change:
```cpp
// Before acquiring ASan's thread registry lock:
Symbolizer::GetOrInit()->GetRefreshedListOfModules();
```
This guarantees module information is cached before lock acquisition,
eliminating
the need for `dl_iterate_phdr` calls during error reporting.
---
### Testing
Added **asan_lsan_deadlock.cpp** test case:
- Reproduces deadlock reliably without fix **under idle system
conditions**
- Uses watchdog thread to detect hangs
- Verifies ASan error reports correctly without deadlock
**Note**: Due to the inherent non-determinism of thread scheduling and
lock acquisition timing,
this test may not reliably reproduce the deadlock on busy systems (e.g.,
during parallel
`ninja check-asan` runs).
---
### Impact
- Fixes rare but severe deadlocks in mixed ASan+LSan environments
- Maintains thread safety guarantees for both sanitizers
- No user-visible behavior changes except deadlock elimination
---
### Relevant Buggy Code
- Code in ASan's asan_report.cpp
```cpp
explicit ScopedInErrorReport(bool fatal = false)
: halt_on_error_(fatal || flags()->halt_on_error) {
// Acquire lock B
asanThreadRegistry().Lock();
}
~ScopedInErrorReport() {
...
// Try to acquire lock A under holding lock B via the following path
// #4 0x000071a353d83e93 in __GI___dl_iterate_phdr (
// callback=0x5d1a07a39580 <__sanitizer::dl_iterate_phdr_cb(dl_phdr_info*, unsigned long, void*)>,
// data=0x6da3510fd3f0) at ./elf/dl-iteratephdr.c:39
// #5 0x00005d1a07a39574 in __sanitizer::ListOfModules::init (this=0x71a353ebc080)
// at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp:784
// #6 0x00005d1a07a429e3 in __sanitizer::Symbolizer::RefreshModules (this=0x71a353ebc058)
// at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cpp:188
// #7 __sanitizer::Symbolizer::FindModuleForAddress (this=this@entry=0x71a353ebc058,
// address=address@entry=102366378805727)
// at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cpp:214
// #8 0x00005d1a07a4291b in __sanitizer::Symbolizer::SymbolizePC (this=0x71a353ebc058, addr=102366378805727)
// at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cpp:88
// #9 0x00005d1a07a40df7 in __sanitizer::(anonymous namespace)::StackTraceTextPrinter::ProcessAddressFrames (
// this=this@entry=0x6da3510fd520, pc=102366378805727)
// at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace_libcdep.cpp:37
// #10 0x00005d1a07a40d27 in __sanitizer::StackTrace::PrintTo (this=this@entry=0x6da3510fd5e8,
// output=output@entry=0x6da3510fd588)
// at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace_libcdep.cpp:110
// #11 0x00005d1a07a410a1 in __sanitizer::StackTrace::Print (this=0x6da3510fd5e8)
// at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace_libcdep.cpp:133
// #12 0x00005d1a0798758d in __asan::ErrorGeneric::Print (
// this=0x5d1a07aa4e08 <__asan::ScopedInErrorReport::current_error_+8>)
// at llvm-project/compiler-rt/lib/asan/asan_errors.cpp:617
current_error_.Print();
...
}
```
- Code in LSan's lsan_common_linux.cpp
```cpp
void LockStuffAndStopTheWorld(StopTheWorldCallback callback,
CheckForLeaksParam *argument) {
// Acquire lock A
dl_iterate_phdr(LockStuffAndStopTheWorldCallback, ¶m);
}
static int LockStuffAndStopTheWorldCallback(struct dl_phdr_info *info,
size_t size, void *data) {
// Try to acquire lock B under holding lock A via the following path
// #3 0x000055555562b34a in __sanitizer::ThreadRegistry::Lock (this=<optimized out>)
// at llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_thread_registry.h:99
// #4 __lsan::LockThreads () at llvm-project/compiler-rt/lib/asan/asan_thread.cpp:484
// #5 0x0000555555652629 in __lsan::ScopedStopTheWorldLock::ScopedStopTheWorldLock (this=<optimized out>)
// at llvm-project/compiler-rt/lib/lsan/lsan_common.h:164
// #6 __lsan::LockStuffAndStopTheWorldCallback (info=<optimized out>, size=<optimized out>, data=0x0,
// data@entry=0x7fffffffd158) at llvm-project/compiler-rt/lib/lsan/lsan_common_linux.cpp:120
ScopedStopTheWorldLock lock;
DoStopTheWorldParam *param = reinterpret_cast<DoStopTheWorldParam *>(data);
StopTheWorld(param->callback, param->argument);
return 1;
}
```
Because of the nature of this test (mmap stress-test) it tests too large
of memory allocations to be stable on watchos. WatchOS has memory limits
that can lead to the termination of this process before it reaches the
limit set by the flag. Typical only on devices that are under heavy
load. disable on watchOS.
rdar://147222346
This adds an experimental flag that will keep track of where the manual memory poisoning (`__asan_poison_memory_region`) is called from, and print the stack trace if the poisoned region is accessed. (Absent this flag, ASan will tell you what code accessed a poisoned region, but not which code set the poison.)
This implementation performs best-effort record keeping using ring buffers, as suggested by Vitaly. The size of each ring buffer is set by the `poison_history_size` flag.
TestCases/Linux/asan_rt_confict_test-2.cpp started failing in https://lab.llvm.org/buildbot/#/builders/66/builds/12265/steps/9/logs/stdio
The only change is "[LLD][ELF] Allow merging XO and RX sections, and add --[no-]xosegment flag (#132412)" (2c1bdd4a08). Based on the test case (which deliberately tries to mix static and dynamically linked ASan), I suspect it's actually the test case that needs to be fixed (probably with a different error message check).
This patch disables TestCases/Linux/asan_rt_confict_test-2.cpp to make the buildbots green while I investigate.
This is an optional mechanism that automatically detects roots. It's a best-effort mechanism, and its main goal is to *avoid* pointing at the message pump function as a root. This is the function that polls message queue(s) in an infinite loop, and is thus a bad root (it never exits).
High-level, when collection is requested - which should happen when a server has already been set up and handing requests - we spend a bit of time sampling all the server's threads. Each sample is a stack which we insert in a `PerThreadCallsiteTrie`. After a while, we run for each `PerThreadCallsiteTrie` the root detection logic. We then traverse all the `FunctionData`, find the ones matching the detected roots, and allocate a `ContextRoot` for them. From here, we special case `FunctionData` objects, in `__llvm_ctx_profile_get_context, that have a `CtxRoot` and route them to `__llvm_ctx_profile_start_context`.
For this to work, on the llvm side, we need to have all functions call `__llvm_ctx_profile_release_context` because they _might_ be roots. This comes at a slight (percentages) penalty during collection - which we can afford since the overall technique is ~5x faster than normal instrumentation. We can later explore conditionally enabling autoroot detection and avoiding this penalty, if desired.
Note that functions that `musttail call` can't have their return instrumented this way, and a subsequent patch will harden the mechanism against this case.
The mechanism could be used in combination with explicit root specification, too.
…ncorrect name
Clang needs variables to be represented with unique names. This means
that if a variable shadows another, its given a different name
internally to ensure it has a unique name. If ASan tries to use this
name when printing an error, it will print the modified unique name,
rather than the variable's source code name
Fixes#47326
The malloc_zone.cpp test currently fails on Darwin hosts, in
SanitizerCommon
tests with lsan enabled.
Need to XFAIL this test to buy time to investigate this failure. Also
we're trying to bring the number of test failing on Darwin bots to 0, to
get clearer signal of any new failures.
rdar://145873843
Co-authored-by: Mariusz Borsa <m_borsa@apple.com>
`env -u` is not supported by the system `env` utility on AIX.
`/opt/freeware/bin/env` is the standard path for the GNU coreutils `env`
utility as distributed by the AIX Toolbox for Open Source Software.
Adding `/opt/freeware/bin` to `PATH` causes issues by picking up other
utilities that are less capable, in an AIX context, than the system
ones.
This patch modifies the relevant usage of `env` to use (on AIX) the full
path to `/opt/freeware/bin/env`.
Introduced a cmake option that is disabled by default that suppresses
searching via the PATH variable for a symbolizer. The option will be
enabled for downstream builds where the user will need to specify the
symbolizer path more explicitly, e.g., by using ASAN_SYMBOLIZER_PATH.
guard_acquire is a helper function used to implement TSan's
__cxa_guard_acquire and pthread_once interceptors.
https://reviews.llvm.org/D54664 introduced optional hooks to support
cooperative multi-threading. It worked by marking the entire
guard_acquire call as a potentially blocking region.
In principle, only the contended case needs to be a potentially blocking
region. This didn't matter for __cxa_guard_acquire because the compiler
emits an inline fast path before calling __cxa_guard_acquire. That is,
once we call __cxa_guard_acquire at all, we know we're in the contended
case.
https://reviews.llvm.org/D107359 then unified the __cxa_guard_acquire
and pthread_once interceptors, adding the hooks to pthread_once.
However, unlike __cxa_guard_acquire, pthread_once callers are not
expected to have an inline fast path. The fast path is inside the
function.
As a result, TSan unnecessarily calls into the cooperative
multi-threading engine on every pthread_once call, despite applications
generally expecting pthread_once to be fast after initialization. Fix
this by deferring the hooks to the contended case inside guard_acquire.
CFStringCreateWithBytes may not always appear on stack due to
optimizations.
Create a wrapper function for the purposes of testing suppression files
that will always appear on stack for test stability.
Test should be suppressing ASan for a function outside of sanitized
code.
Update function to be extern "C" to match function decoration in
original framework and avoid the leak caused by DemangleCXXABI.
rdar://144800068
Relands #130976 with adjustments to test requirements.
Calls to __noreturn__ functions result in region termination for
coverage mapping. But this creates incorrect coverage results when
__noreturn__ functions (or other constructs that result in region
termination) occur within [GNU statement expressions][1].
In this scenario an extra gap region is introduced within VisitStmt,
such that if the following line does not introduce a new region it
is unconditionally counted as uncovered.
This change adjusts the mapping such that terminate statements
within statement expressions do not propagate that termination
state after the statement expression is processed.
[1]: https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.htmlFixes#124296
On AIX, when accessing mmap'ed memory associated to a file on NFS, a
SIGBUS might be raised at random.
The problem is still in open state with the OS team.
This PR teaches the profile runtime, under certain conditions, to avoid
the mmap when reading the profile file during online merging.
This PR has no effect on any platform other than AIX because I'm not
aware of this problem on other platforms.
Other platforms can easily opt-in to this functionality in the future.
The logic in function `is_local_filesystem` was copied from
[llvm/lib/Support/Unix/Path.inc](f388ca3d9d/llvm/lib/Support/Unix/Path.inc (L515))
(https://reviews.llvm.org/D58801), because it seems that the
compiler-rt/profile cannot reuse code from llvm except through
`InstrProfData.inc`.
Thanks to @hubert-reinterpretcast for substantial feedback downstream.
---------
Co-authored-by: Wael Yehia <wyehia@ca.ibm.com>
Co-authored-by: Hubert Tong <hubert.reinterpretcast@gmail.com>
Collect profiles for functions we encounter when collecting a contextual profile, that are not associated with a call site. This is expected to happen for signal handlers, but it also - problematically - currently happens for mem{memset|copy|move|set}, which are currently inserted after profile instrumentation.
Collecting a "regular" flat profile in these cases would hide the problem - that we loose better profile opportunities.
Calls to __noreturn__ functions result in region termination for
coverage mapping. But this creates incorrect coverage results when
__noreturn__ functions (or other constructs that result in region
termination) occur within [GNU statement expressions][1].
In this scenario an extra gap region is introduced within VisitStmt,
such that if the following line does not introduce a new region it
is unconditionally counted as uncovered.
This change adjusts the mapping such that terminate statements
within statement expressions do not propagate that termination
state after the statement expression is processed.
[1]: https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.htmlFixes#124296
`detect_leaks` option for asan does not work well on Apple Silicon
(arm64) MacOS devices and results in hundreds of ASan test failures when
run with this option set for all tests.
We should not add this option for tests unless we are targeting an
x86_64 device for Darwin, where this seems to be tested and working
well.
rdar://147069153
On AIX, when accessing mmap'ed memory associated to a file on NFS, a
SIGBUS might be raised at random.
The problem is still in open state with the OS team.
This PR teaches the profile runtime, under certain conditions, to avoid
the mmap when reading the profile file during online merging.
This PR has no effect on any platform other than AIX because I'm not
aware of this problem on other platforms.
Other platforms can easily opt-in to this functionality in the future.
The logic in function `is_local_filesystem` was copied from
[llvm/lib/Support/Unix/Path.inc](f388ca3d9d/llvm/lib/Support/Unix/Path.inc (L515))
(https://reviews.llvm.org/D58801), because it seems that the
compiler-rt/profile cannot reuse code from llvm except through
`InstrProfData.inc`.
Thanks to @hubert-reinterpretcast for substantial feedback downstream.
---------
Co-authored-by: Wael Yehia <wyehia@ca.ibm.com>
Co-authored-by: Hubert Tong <hubert.reinterpretcast@gmail.com>
ld64 issues a warning about section alignment which was counted as an
unexpected exported symbol and the test failed.
Fixed by disabling all linker warnings using -Wl,-w.
PR #124353 introduced the clang option `-fprofile-continuous` to enable
continuous mode. Use this option in all compiler-rt tests, where applicable.
Changes can be summarized as follows:
1) tests that use `-fprofile-instr-generate` (`%clang_profgen`), which
is an option that takes profile file name, are changed like so:
```
-// RUN: %clang_profgen_cont <SOME-OPTIONS> -o %t.exe %s
-// RUN: env LLVM_PROFILE_FILE="%c%t.profraw" %run %t.exe
+// RUN: %clang_profgen=%t.profraw -fprofile-continuous <SOME-OPTIONS> -o %t.exe %s
+// RUN: %run %t.exe
```
2) tests that use `-fprofile-generate` (`%clang_pgogen`), which is an
option that takes a profile directory, are on case-by-case basis. Where
the default name "default_%m.profraw" works, those tests were changed to
use `%clang_pgogen=<dir>`, and the rest (`set-filename.c` and
`get-filename.c`) continued to use the `LLVM_PROFILE_FILE` environment
variable .
3) `set-file-object.c` uses different filename for different run of the
same executable, so it continued to use the `LLVM_PROFILE_FILE`
environment variable.
4) `pid-substitution.c` add a clang_profgen variation.
---------
Co-authored-by: Wael Yehia <wyehia@ca.ibm.com>
When we collect a contextual profile, we sample the threads entering its root and only collect on one at a time (see `ContextRoot::Taken`). If we want to compare profiles between contextual profiles, and/or flat profiles, we have a problem: we don't know how to compare the counter values relative to each other. To that end, we add `ContextRoot::TotalEntries`, which is incremented every time a root is entered and serves as multiplier for the counter values collected under that root.
We expose this in the profile and leave the normalization to the user of the profile, for a few reasons:
* it's only needed if reasoning about all profiles in aggregate.
* the goal, in compiler_rt, is to flush out the profile as quickly as possible, and performing multiplications adds an overhead that may not even be necessary if the consumer of the profile doesn't care about combining profiles
* the information itself may be interesting as an indication of relative sampling of various contexts.
CFStringCreateWithBytes may not always appear on stack due to
optimizations. Create a wrapper function for the purposes of testing
suppression files that will always appear on stack for test stability.
rdar://144800068
Collect flat profiles. We only do this for function activations that aren't otherwise collectible under a context root are encountered.
This allows us to reason about the full profile without concerning ourselves wether we are double-counting. For example we can combine (during profile use) flattened contextual profiles with flat profiles.
The profile format has now a separate section called "Contexts" - there will be a corresponding one for flat profiles. The root has a separate tag because, in addition to not having a callsite ID as all the other context nodes have under it, it will have additional fields in subsequent patches.
The rest of this patch amounts to a bit of refactorings in the reader/writer (for better reuse later) and tests fixups.
Introduce a `ProfileWriter` abstraction to replace the callback passed to `__llvm_ctx_profile_fetch`. Subsequent changes will add support for flat profile collection (as in, collection of non-contextual profile for those functions not under a contextual root), which require also a change in the profile format. The abstraction makes it easy to add "write flat" - related capabilities without constantly complicating the signature of `__llvm_ctx_profile_fetch`.