On AIX, when accessing mmap'ed memory associated to a file on NFS, a
SIGBUS might be raised at random.
The problem is still in open state with the OS team.
This PR teaches the profile runtime, under certain conditions, to avoid
the mmap when reading the profile file during online merging.
This PR has no effect on any platform other than AIX because I'm not
aware of this problem on other platforms.
Other platforms can easily opt-in to this functionality in the future.
The logic in function `is_local_filesystem` was copied from
[llvm/lib/Support/Unix/Path.inc](f388ca3d9d/llvm/lib/Support/Unix/Path.inc (L515))
(https://reviews.llvm.org/D58801), because it seems that the
compiler-rt/profile cannot reuse code from llvm except through
`InstrProfData.inc`.
Thanks to @hubert-reinterpretcast for substantial feedback downstream.
---------
Co-authored-by: Wael Yehia <wyehia@ca.ibm.com>
Co-authored-by: Hubert Tong <hubert.reinterpretcast@gmail.com>
ld64 issues a warning about section alignment which was counted as an
unexpected exported symbol and the test failed.
Fixed by disabling all linker warnings using -Wl,-w.
PR #124353 introduced the clang option `-fprofile-continuous` to enable
continuous mode. Use this option in all compiler-rt tests, where applicable.
Changes can be summarized as follows:
1) tests that use `-fprofile-instr-generate` (`%clang_profgen`), which
is an option that takes profile file name, are changed like so:
```
-// RUN: %clang_profgen_cont <SOME-OPTIONS> -o %t.exe %s
-// RUN: env LLVM_PROFILE_FILE="%c%t.profraw" %run %t.exe
+// RUN: %clang_profgen=%t.profraw -fprofile-continuous <SOME-OPTIONS> -o %t.exe %s
+// RUN: %run %t.exe
```
2) tests that use `-fprofile-generate` (`%clang_pgogen`), which is an
option that takes a profile directory, are on case-by-case basis. Where
the default name "default_%m.profraw" works, those tests were changed to
use `%clang_pgogen=<dir>`, and the rest (`set-filename.c` and
`get-filename.c`) continued to use the `LLVM_PROFILE_FILE` environment
variable .
3) `set-file-object.c` uses different filename for different run of the
same executable, so it continued to use the `LLVM_PROFILE_FILE`
environment variable.
4) `pid-substitution.c` add a clang_profgen variation.
---------
Co-authored-by: Wael Yehia <wyehia@ca.ibm.com>
When we collect a contextual profile, we sample the threads entering its root and only collect on one at a time (see `ContextRoot::Taken`). If we want to compare profiles between contextual profiles, and/or flat profiles, we have a problem: we don't know how to compare the counter values relative to each other. To that end, we add `ContextRoot::TotalEntries`, which is incremented every time a root is entered and serves as multiplier for the counter values collected under that root.
We expose this in the profile and leave the normalization to the user of the profile, for a few reasons:
* it's only needed if reasoning about all profiles in aggregate.
* the goal, in compiler_rt, is to flush out the profile as quickly as possible, and performing multiplications adds an overhead that may not even be necessary if the consumer of the profile doesn't care about combining profiles
* the information itself may be interesting as an indication of relative sampling of various contexts.
CFStringCreateWithBytes may not always appear on stack due to
optimizations. Create a wrapper function for the purposes of testing
suppression files that will always appear on stack for test stability.
rdar://144800068
Currently, the code model check is always performed even if there are no
globals, because:
1) the HWASan compiler pass always leaves a note
2) the HWASan runtime always performs the check if there is a HWASan
globals note.
This unnecessarily adds a 2**32 byte size limit.
This patch elides the check if the globals note doesn't actually contain
globals, thus allowing larger libraries to be successfully instrumented
without globals.
Sent from my iPhone
Follows the discussion here:
https://github.com/llvm/llvm-project/pull/129309
Recently, the test
`TestRtsan.AccessingALargeAtomicVariableDiesWhenRealtime` has been
failing on newer MacOS versions, because the internal locking mechanism
in `std::atomic<T>::load` (for types `T` that are larger than the
hardware lock-free limit), has changed to a function that wasn't being
intercepted by rtsan.
This PR introduces an interceptor for `_os_nospin_lock_lock`, which is
the new internal locking mechanism.
_Note: we'd probably do well to introduce interceptors for
`_os_nospin_lock_unlock` (and `os_unfair_lock_unlock`) too, which also
appear to have blocking implementations. This can follow in a separate
PR._
Collect flat profiles. We only do this for function activations that aren't otherwise collectible under a context root are encountered.
This allows us to reason about the full profile without concerning ourselves wether we are double-counting. For example we can combine (during profile use) flattened contextual profiles with flat profiles.
The condition to stop iterating so far was to look for load command cmd
field == 0. The iteration would continue past the commands area, and
would finally find lc->cmd ==0, if lucky. Or crash with bus error, if
out of luck.
Correcting this by limiting the number of iterations to the count
specified in mach_header(_64) ncmds field.
rdar://143903403
---------
Co-authored-by: Mariusz Borsa <m_borsa@apple.com>
The profile format has now a separate section called "Contexts" - there will be a corresponding one for flat profiles. The root has a separate tag because, in addition to not having a callsite ID as all the other context nodes have under it, it will have additional fields in subsequent patches.
The rest of this patch amounts to a bit of refactorings in the reader/writer (for better reuse later) and tests fixups.
Introduce a `ProfileWriter` abstraction to replace the callback passed to `__llvm_ctx_profile_fetch`. Subsequent changes will add support for flat profile collection (as in, collection of non-contextual profile for those functions not under a contextual root), which require also a change in the profile format. The abstraction makes it easy to add "write flat" - related capabilities without constantly complicating the signature of `__llvm_ctx_profile_fetch`.
This PR cleans up cast-overflow.cpp, more specifically:
1. avoid using undefined value as an exit code (old case `9`)
2. narrowing conversions are allowed to produce `inf` and they are
well-defined. Remove dead code (old case `8`)
3. the same applies to the conversion int -> float16. Remove dead code
(old case `7`)
See also
https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#:~:text=%2Dfsanitize%3Dfloat%2Dcast,to%20integer%20types.
Currently ubsan doesn't properly detect UB on float16 -> int casts, I
have a fix for that (will send as a separate PR).
Fix#126541
Since ```t->Destroy``` cannot be called after ```start_routine```(When
calling standard thread_start in crt)
Intercept `ExitThread` and free the memory created by `VirtualAlloc'
Disabled in 2ab51bf13a1f6ca96823b755c036227dfd0892f9, doesn't hang for
me on AArch64 (Graviton 3, tested 1000 iterations). May still be an
issue, but hard to know unless we enable it again to find out.
n.b. test was also disabled on PowerPC in
467afc5f847f72221a42d9142c5b4733b44b52dc for same reason and it has also
been observed on x86:
https://lists.llvm.org/pipermail/llvm-dev/2016-January/094607.html
Fixes: https://github.com/llvm/llvm-project/issues/24763
With the goal of eventually being able to make `-Wreturn-type` default to an
error in all language modes, this is a follow-up to #123464 and updates even
more tests, mainly clang-tidy and clangd tests.
Updates JITLinkRedirectableSymbolManager to take alias flags into account when
setting the scope and linkage of the created stubs (weak aliases get now get weak
linkage, hidden stubs get hidden visibility).
Updates lazyReexports to propagate alias flags (rather than trampoline flags)
when building the initial destinations map for the redirectable symbols manager.
Together these changes allow the LazyObjectLinkingLayer to link objects
containing weak and hidden symbols.
This pull request fixes an issue that was introduced in #93365.
`__llvm_write_custom_profile` visibility was causing issues on Darwin.
This function needs to be publicly accessible in order to be accessed by
libomptarget, so this pull request makes `__llvm_write_custom_profile`
an explicitly exported symbol on Darwin. Tested on M3 and X86 macs.
In soft floating-point ABI, this function takes the double argument as a
pair of registers r0 and r1.
The ordering of these two registers follow the endianness rules,
therefore the register on which the bit flipping must happen depends on
the endianness.
This pull request is the second part of an ongoing effort to extends PGO
instrumentation to GPU device code and depends on #76587. This PR makes
the following changes:
- Introduces `__llvm_write_custom_profile` to PGO compiler-rt library.
This is an external function that can be used to write profiles with
custom data to target-specific files.
- Adds `__llvm_write_custom_profile` as weak symbol to libomptarget so
that it can write the collected data to a profraw file.
- Adds `PGODump` debug flag and only displays dump when the
aforementioned flag is set
Update the error message to be explicit that this is likely due to
memory corruption.
In addition, check if the chunk header is all zero, which could mean
corruption or an attempt to free a pointer after the memory has been
released to the kernel. This case results in a slightly different error
message to also indicate this could still be a double free.