33 Commits

Author SHA1 Message Date
Mircea Trofin
e7aed23d32
[ctxprof] Handle instrumenting functions with musttail calls (#135121)
Functions with `musttail` calls can't be roots because we can't instrument their `ret` to release the context. This patch tags their `CtxRoot` field in their `FunctionData`. In compiler-rt we then know not to allow such functions become roots, and also not confuse `CtxRoot == 0x1` with there being a context root.

Currently we also lose the context tree under such cases. We can, in a subsequent patch, have the root detector search past these functions.
2025-04-14 10:01:25 -07:00
Mircea Trofin
b2dea4fd22
[ctxprof] root autodetection mechanism (#133147)
This is an optional mechanism that automatically detects roots. It's a best-effort mechanism, and its main goal is to *avoid* pointing at the message pump function as a root. This is the function that polls message queue(s) in an infinite loop, and is thus a bad root (it never exits).

High-level, when collection is requested - which should happen when a server has already been set up and handing requests - we spend a bit of time sampling all the server's threads. Each sample is a stack which we insert in a `PerThreadCallsiteTrie`. After a while, we run for each `PerThreadCallsiteTrie` the root detection logic. We then traverse all the `FunctionData`, find the ones matching the detected roots, and allocate a `ContextRoot` for them. From here, we special case `FunctionData` objects, in `__llvm_ctx_profile_get_context, that have a `CtxRoot` and route them to `__llvm_ctx_profile_start_context`.

For this to work, on the llvm side, we need to have all functions call `__llvm_ctx_profile_release_context` because they _might_ be roots. This comes at a slight (percentages) penalty during collection - which we can afford since the overall technique is ~5x faster than normal instrumentation. We can later explore conditionally enabling autoroot detection and avoiding this penalty, if desired. 

Note that functions that `musttail call` can't have their return instrumented this way, and a subsequent patch will harden the mechanism against this case.

The mechanism could be used in combination with explicit root specification, too.
2025-04-08 06:59:38 -07:00
Mircea Trofin
225f6ddb32
[ctxprof][nfc] Remove redundant SANITIZER_NO_THREAD_SAFETY_ANALYSIS (#133784)
With the refactoring in PR #133744, `__llvm_ctx_profile_start_context` doesn't need to be marked `SANITIZER_NO_THREAD_SAFETY_ANALYSIS` because `tryStartContextGivenRoot` (where the bulk of the logic went) is.
2025-03-31 12:49:54 -07:00
Mircea Trofin
b01e5b23dd
[ctxprof][nfc] Refactor __llvm_ctx_profile_start_context (#133744)
Most of the functionality will be reused with the auto-root detection mechanism (which is introduced subsequently in PR #133147).
2025-03-31 12:26:25 -07:00
Mircea Trofin
8e1d9f2d84
[ctxprof][nfc] Move 2 implementation functions up in CtxInstrProfiling.cpp (#133146) 2025-03-28 20:53:50 -07:00
Mircea Trofin
63bb0078f8
[ctxprof] Auto root detection: trie for stack samples (#133106)
An initial patch for supporting automated root detection. The auto-detector is introduced subsequently, but this patch introduces a datastructure for capturing sampled stacks, per thread, in a trie, and inferring from such samples which functions are reasonable roots.
2025-03-28 20:08:05 -07:00
Mircea Trofin
dd191d3d4f
[ctxprof][nfc] Share the definition of FunctionData between compiler-rt and llvm (#132136)
Mechanism to keep the compiler-rt and llvm view of `FunctionData` in sync. Since CtxInstrContextNode.h is exactly the same on both sides (there's an existing test, `compiler-rt/test/ctx_profile/TestCases/check-same-ctx-node.test`, checking that), we capture the structure in a macro that is then generated as `struct` fields on the compiler-rt side, and as `Type` objects on the llvm side. The macro needs to be told how to render a few kinds of fields. 

If we add more fields to FunctionData that can be described by the current known types of fields, then the llvm side would automatically be updated. If we need to add more kinds of fields, which we do by adding parameters to the macro, the llvm side (if not updated) would trigger a compilation error.
2025-03-20 12:48:18 -07:00
Mircea Trofin
0668bb28cc
[ctxprof] Track unhandled call targets (#131417)
Collect profiles for functions we encounter when collecting a contextual profile, that are not associated with a call site. This is expected to happen for signal handlers, but it also - problematically - currently happens for mem{memset|copy|move|set}, which are currently inserted after profile instrumentation.

Collecting a "regular" flat profile in these cases would hide the problem - that we loose better profile opportunities.
2025-03-19 13:51:22 -07:00
Mircea Trofin
1757a235e3
[ctxprof] Make ContextRoot an implementation detail (#131416)
`ContextRoot` `FunctionData` are currently known by the llvm side, which has to instantiate and zero-initialize them. 

This patch makes `FunctionData` the only global value that needs to be known and instantiated by the compiler. On the compiler-rt side, `ContextRoot`s are hung off `FunctionData`, when applicable.

This is for two reasons. First, it is a step towards root autodetection (in a subsequent patch). An autodetection mechanism would instantiate the `ContextRoot` for the detected roots, and then `__llvm_ctx_profile_get_context` would detect that and route to `__llvm_ctx_profile_start_context`.

The second reason is that we will hang off `ContextRoot` more complex datatypes (next patch), and we want to avoid too deep of a coupling between llvm and compiler-rt. Acting as a place to hang related data, `FunctionData` can stay simple - pointers and an (atomic) int (the mutex).
2025-03-18 22:03:26 -07:00
Mircea Trofin
b034905c82
[ctxprof] Capture sampling info for context roots (#131201)
When we collect a contextual profile, we sample the threads entering its root and only collect on one at a time (see `ContextRoot::Taken`). If we want to compare profiles between contextual profiles, and/or flat profiles, we have a problem: we don't know how to compare the counter values relative to each other. To that end, we add `ContextRoot::TotalEntries`, which is incremented every time a root is entered and serves as multiplier for the counter values collected under that root.

We expose this in the profile and leave the normalization to the user of the profile, for a few reasons:

* it's only needed if reasoning about all profiles in aggregate.
* the goal, in compiler_rt, is to flush out the profile as quickly as possible, and performing multiplications adds an overhead that may not even be necessary if the consumer of the profile doesn't care about combining profiles
* the information itself may be interesting as an indication of relative sampling of various contexts.
2025-03-14 21:10:22 -07:00
Mircea Trofin
8aa835c2b5
[ctxprof] Fix warnings post PR #130655 (#131198) 2025-03-13 12:54:48 -07:00
Mircea Trofin
07d86d25c9
[ctxprof] Flat profile collection (#130655)
Collect flat profiles. We only do this for function activations that aren't otherwise collectible under a context root are encountered. 

This allows us to reason about the full profile without concerning ourselves wether we are double-counting. For example we can combine (during profile use) flattened contextual profiles with flat profiles.
2025-03-12 07:47:58 -07:00
Mircea Trofin
5223ddd83f
[ctxprof] Prepare profile format for flat profiles (#129626)
The profile format has now a separate section called "Contexts" - there will be a corresponding one for flat profiles. The root has a separate tag because, in addition to not having a callsite ID as all the other context nodes have under it, it will have additional fields in subsequent patches.

The rest of this patch amounts to a bit of refactorings in the reader/writer (for better reuse later) and tests fixups.
2025-03-05 07:22:35 -08:00
Mircea Trofin
1b46db7776
[ctxprof] ProfileWriter abstraction (#129590)
Introduce a `ProfileWriter` abstraction to replace the callback passed to `__llvm_ctx_profile_fetch`. Subsequent changes will add support for flat profile collection (as in, collection of non-contextual profile for those functions not under a contextual root), which require also a change in the profile format. The abstraction makes it easy to add "write flat" - related capabilities without constantly complicating the signature of `__llvm_ctx_profile_fetch`.
2025-03-04 12:41:16 -08:00
c8ef
59770a4382
[NFC] Correct imprecise file location in the comment. (#115630) 2024-11-10 15:23:58 +08:00
c8ef
b57b3f6425
[NFC] Simple typo correction. (#114548) 2024-11-02 00:40:57 +08:00
Mircea Trofin
f32e5bdcef
[NFC] Rename the Nr abbreviation to Num (#107151)
It's more clear. (This isn't exhaustive).
2024-09-05 12:34:47 -07:00
NAKAMURA Takumi
9d15fc0060 Quick fix for a waning in clang_rt.ctx_profile [-Wgnu-anonymous-struct]
`__sanitizer_siginfo` has been introduced in D142117.
(llvmorg-16-init-17950-ged9ef9b4f248)
It is incompatible to -pedantic.

`clang_rt.ctx_profile` has been introduced in #92456.
2024-05-19 15:52:40 +09:00
NAKAMURA Takumi
f87ed54e49 Reformat 2024-05-19 15:51:47 +09:00
Mircea Trofin
58c778565c [nfc][ctx_profile] Fix printf - related -Wformat-pedantic 2024-05-18 08:46:26 -07:00
Mircea Trofin
cfe9deb135 Reapply "[ctx_profile] Integration test (#92456)"
This reverts commit 881f20e958e80bd30463fc57d2d3e891bcb8a571.

Passing -ldl -lpthread explicitly
2024-05-17 21:55:39 -07:00
Aiden Grossman
881f20e958 Revert "[ctx_profile] Integration test (#92456)"
This reverts commit 487d5af6482ea5f074c12d29d7e376d3fc697706.

This was causing failures on some buildbots.
https://lab.llvm.org/buildbot/#/builders/247/builds/18559
2024-05-17 23:59:28 +00:00
Mircea Trofin
487d5af648
[ctx_profile] Integration test (#92456)
Compile with clang a program that's instrumented for contextual profiling and verify a profile can be collected.
2024-05-17 11:08:14 -07:00
Mircea Trofin
77a59c3210
[ctx_profile] Fix signed-ness in CtxInstrProfilingTest.cpp
Follow-up from `265953c`
2024-05-10 11:27:44 -07:00
Mircea Trofin
265953cc26 [ctx_profile] Arena should zero-initialize its allocation area. 2024-05-10 10:43:22 -07:00
Mircea Trofin
0fd017ce43 [nfc][ctx_profile] Move CtxInstrContextNode.h in include
Follow-up from PR #91669.
2024-05-09 17:30:46 -07:00
Mircea Trofin
e4763ca83b
[ctx_profile] Pull ContextNode in a .inc file (#91669)
This pulls out `ContextNode` as we need to use it pretty much as-is to implement a writer. The writer will be implemented on the LLVM side because it takes a dependency on BitStreamWriter.

Since we can't reuse a header between compiler-rt and llvm, we use a header file which is copied on both sides, and test that the 2 copies are identical.

The changes adds the necessary other stuff for compiler-rt/ctx_profile testing.
2024-05-09 16:58:40 -07:00
Mircea Trofin
8755d24cb3 [compiler-rt][ctx_profile] Fix signed-ness warnings in test
Follow-up from PR ##89838. Some build bots warn-as-error
about signed/unsigned comparison in CtxInstrProfilingTest.

Example: https://lab.llvm.org/buildbot/#/builders/37/builds/34610
2024-05-07 23:27:54 -07:00
Mircea Trofin
ccf765cfd5
[compiler-rt][ctx_profile] Add the instrumented contextual profiling APIs (#89838)
APIs for contextual profiling. `ContextNode` is the call context-specific counter buffer. `ContextRoot` is associated to those functions that constitute roots into interesting call graphs, and is the object on which we hang off `Arena`s for allocating `ContextNode`s, as well as the `ContextNode` corresponding to such functions. Graphs of `ContextNode`s are accessible by one thread at a time.

(Tracking Issue: #89287, more details in the RFC referenced there)
2024-05-07 15:01:15 -07:00
Mircea Trofin
579efe011b Temporarily remove clang_rt.ctx_profile target
Trying to address the build failure on the `clang-ve-ninja`bot, which
appears hard to repro locally. The target isn't needed currently (there
are unit tests exercising the new functionality). Removing it for now
to green-ify the build bot.
2024-04-23 00:46:52 +02:00
Mircea Trofin
a3e7a125e1 Reapply "[compiler-rt][ctx_instr] Add ctx_profile component" (#89625)
This reverts commit 8b2ba6a144e728ee4116e2804e9b5aed8824e726.

The uild errors (see below) were likely due to the same issue PR #88074 fixed. Addressed by following that PR.

https://lab.llvm.org/buildbot/#/builders/165/builds/52789
https://lab.llvm.org/buildbot/#/builders/91/builds/25273
2024-04-22 22:33:09 +02:00
Mircea Trofin
8b2ba6a144
Revert "[compiler-rt][ctx_instr] Add ctx_profile component" (#89625)
Reverts llvm/llvm-project#89304

Some build bot failures - will fix and reland.

Example: https://lab.llvm.org/buildbot/#/builders/165/builds/52789
2024-04-22 09:35:49 -07:00
Mircea Trofin
6ad22c879a
[compiler-rt][ctx_instr] Add ctx_profile component (#89304)
Add the component structure for contextual instrumented PGO and the bump allocator + test.

(Tracking Issue: #89287, RFC referenced there)
2024-04-22 09:24:22 -07:00