The module currently stores the target triple as a string. This means
that any code that wants to actually use the triple first has to
instantiate a Triple, which is somewhat expensive. The change in #121652
caused a moderate compile-time regression due to this. While it would be
easy enough to work around, I think that architecturally, it makes more
sense to store the parsed Triple in the module, so that it can always be
directly queried.
For this change, I've opted not to add any magic conversions between
std::string and Triple for backwards-compatibility purposes, and instead
write out needed Triple()s or str()s explicitly. This is because I think
a decent number of them should be changed to work on Triple as well, to
avoid unnecessary conversions back and forth.
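As a rough sketch of what the explicit conversions look like at a call site (the accessor names below are used for illustration and assume Module now hands back the parsed Triple; this is not a definitive diff from the patch):

```
#include "llvm/IR/Module.h"
#include "llvm/TargetParser/Triple.h"
#include <string>
using namespace llvm;

// Minimal sketch, assuming Module::getTargetTriple() now returns the parsed
// Triple.
static bool isELFModule(const Module &M) {
  // Before: every query re-parsed the string.
  //   return Triple(M.getTargetTriple()).isOSBinFormatELF();
  // After: the Module already stores a parsed Triple, so query it directly.
  return M.getTargetTriple().isOSBinFormatELF();
}

static std::string tripleString(const Module &M) {
  // Code that still needs the string spells out str() explicitly.
  return M.getTargetTriple().str();
}
```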
The only interesting part in this patch is that the default triple is
Triple("") instead of Triple() to preserve existing behavior. The former
defaults to the ELF object format rather than the unknown object format.
We should fix that as well.
ICP builds a symtab from the symbols in the module, allowing VP metadata
GUIDs to be mapped back to the Function. MemProf uses this same symtab
handling for its ICP during cloning. When symbols are added to the
symtab, the handling adds both a GUID computed from the function name
(or from the attached PGOFuncName metadata for locals) and a GUID
computed from the "canonicalized" name, which strips all "." suffixes
other than ".__uniq". This was originally meant to remove the ".llvm.*"
suffix added to promoted locals (done earlier in the ThinLTO backend).
In theory, it should no longer be needed as locals should have
PGOFuncName metadata.
However, this was causing a linker unsat in code that used coroutines.
For an original coroutine function, several additional functions were
created with the same name but different "." suffixes. The canonical
names of these additional functions therefore had the same GUID as that
of the original function, leading to extra entries in the symtab and to
selecting the wrong function for promotion. For regular ICP this can
happen too, but it is only a performance issue. For MemProf, however,
the promoted direct call targets a memprof clone, and because we
selected the wrong function, which in this case had no memprof clone,
we got a linker unsat.
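To make the collision concrete, here is a rough illustration of the canonicalization described above (a sketch, not the actual InstrProfSymtab code): a suffixed coroutine helper such as `foo.resume` canonicalizes to `foo`, which then shares a GUID with the original `foo`.

```
#include "llvm/ADT/StringRef.h"
using llvm::StringRef;

// Illustrative sketch only: drop "." suffixes other than ".__uniq.<hash>".
static StringRef canonicalName(StringRef Name) {
  size_t Start = 0;
  StringRef UniqMarker = ".__uniq.";
  if (size_t UniqPos = Name.find(UniqMarker); UniqPos != StringRef::npos) {
    Start = UniqPos + UniqMarker.size();
    while (Start < Name.size() && Name[Start] >= '0' && Name[Start] <= '9')
      ++Start; // keep the uniquing hash digits
  }
  size_t Dot = Name.find('.', Start);
  // "foo.resume" and "foo.destroy" both canonicalize to "foo" here, which
  // is exactly the duplicate-GUID situation described above.
  return Dot == StringRef::npos ? Name : Name.take_front(Dot);
}
```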
We may be able to remove the canonical name handling for ICP in general,
but for now disable it for MemProf. At worst this could lead to not
finding a GUID in the symtab and not performing an ICP, so it should be
conservatively correct.
This pull request is a revised version of #76587; it fixes some build
issues that were present in the previous version of this change.
> This pull request is the first part of an ongoing effort to extend
PGO instrumentation to GPU device code. This PR makes the following
changes:
>
> - Adds blank registration functions to device RTL
> - Gives PGO globals protected visibility when targeting a supported
GPU
> - Handles any addrspace casts for PGO calls
> - Implements PGO global extraction in GPU plugins (currently only
dumps info)
>
> These changes can be tested by supplying `-fprofile-instrument=clang`
while targeting a GPU.
This patch fixes another place in ProfileData where we have a pointer
to an array of InstrProfValueData and its length separately.
addValueData is a bit unique in that it remaps incoming values in
place before adding them to ValueSites. AFAICT, no caller of
addValueData uses the updated incoming values. With this patch, we add
the value data to ValueSites first and then remap the values there. This
way, we can take ArrayRef<InstrProfValueData> as a parameter.
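A hypothetical before/after of the signature (the names `addValueDataOld`/`addValueDataNew` and the exact parameter list are illustrative, not copied from the header):

```
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ProfileData/InstrProf.h"
using namespace llvm;

// Before: pointer plus a separate length; incoming values were remapped in
// place before being added to ValueSites.
void addValueDataOld(uint32_t ValueKind, uint32_t Site,
                     InstrProfValueData *VData, uint32_t N,
                     InstrProfSymtab *SymTab);

// After: ArrayRef carries its own length; remapping happens on the copy
// already stored in ValueSites, so callers never observe remapped values.
void addValueDataNew(uint32_t ValueKind, uint32_t Site,
                     ArrayRef<InstrProfValueData> VData,
                     InstrProfSymtab *SymTab);
```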
I've migrated uses of the old version of getValueProfDataFromInst to
the one that returns SmallVector<InstrProfValueData, 4>. This patch
removes the old version.
Clang's `-fwhole-program-vtables` is required for this optimization to
take place. If `-fwhole-program-vtables` is not enabled, this change is
a no-op.
* Function-comparison (before):
```
%vtable = load ptr, ptr %obj
%vfn = getelementptr inbounds ptr, ptr %vtable, i64 1
%func = load ptr, ptr %vfn
%cond = icmp eq ptr %func, @callee
br i1 %cond, label %bb1, label %bb2
bb1:
call @callee
bb2:
call %func
```
* VTable-comparison (after):
```
%vtable = load ptr, ptr %obj
%cond = icmp eq ptr %vtable, @vtable-address-point
br i1 %cond, label %bb1, label %bb2
bb1:
call @callee
bb2:
%vfn = getelementptr inbounds ptr, ptr %vtable, i64 1
%func = load ptr, ptr %vfn
call %func
```
Key changes:
1. Find the virtual calls and the vtables they come from.
   - The ICP pass relies on the type intrinsic `llvm.type.test` to find
     virtual calls and the compatible vtables, and relies on type metadata
     to find the address point for comparison.
2. The ICP pass does a cost-benefit analysis and compares vtables only when
   the number of vtables for a function candidate is within an
   (option-specified) threshold.
3. Sink the function-address and vtable load instructions to the indirect
   fallback.
   - The sink helper functions are simplified versions of
     `InstCombinerImpl::tryToSinkInstruction`. Currently debug intrinsics
     are not handled. Ideally, `InstCombinerImpl::tryToSinkInstructionDbgValues`
     and `InstCombinerImpl::tryToSinkInstructionDbgVariableRecords` could be
     moved into Transforms/Utils/Local.cpp (or another util cpp file) to
     handle debug intrinsics when moving instructions across basic blocks.
4. Keep value profiles updated:
   1) Update vtable value profiles after inlining.
   2) For either function-based or vtable-based comparison, update both the
      vtable and indirect-call value profiles.
This pull request is the first part of an ongoing effort to extend PGO
instrumentation to GPU device code. This PR makes the following changes:
- Adds blank registration functions to device RTL
- Gives PGO globals protected visibility when targeting a supported GPU
- Handles any addrspace casts for PGO calls
- Implements PGO global extraction in GPU plugins (currently only dumps
info)
These changes can be tested by supplying `-fprofile-instrument=clang`
while targeting a GPU.
This patch adds a variant of getValueProfDataFromInst that returns
std::vector<InstrProfValueData> instead of
std::unique_ptr<InstrProfValueData[]>. The new return type carries the
length with it, so we can drop the out parameter ActualNumValueData.
Also, the caller can directly feed the return value into a range-based
for loop as shown in the patch.
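A minimal sketch of what a migrated call site could look like (the argument names and order are assumptions for illustration, and the surrounding setup is elided):

```
#include "llvm/IR/Instruction.h"
#include "llvm/ProfileData/InstrProf.h"
using namespace llvm;

static void visitIndirectCallTargets(const Instruction &I) {
  uint64_t TotalCount = 0;
  // The returned container knows its own length, so there is no separate
  // ActualNumValueData out-parameter to thread through.
  for (const InstrProfValueData &V : getValueProfDataFromInst(
           I, IPVK_IndirectCallTarget, /*MaxNumValueData=*/8, TotalCount)) {
    (void)V.Value; // profiled target (e.g. callee hash)
    (void)V.Count; // how often it was seen
  }
}
```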
I'm planning to migrate other callers of getValueProfDataFromInst to
the new variant in follow-up patches.
This patch migrates uses of getValueForSite to getValueArrayForSite.
Each hunk is self-contained, meaning that each one can be applied
independently of the others.
In the unit test, there are cases where the array length check is
performed a lot earlier than the array content check. For now, I'm
leaving the length checks where they are. I'll consider moving them
when I migrate uses of getNumValueDataForSite to getValueArrayForSite
in a follow-up patch.
This patch changes the type of ValueData to
std::vector<InstrProfValueData> so that, in a follow-up patch, we can
teach getValueForSite to return ArrayRef<InstrProfValueData>.
Currently, a typical traversal over the value data looks like:
```
uint32_t NV = Func.getNumValueDataForSite(VK, I);
std::unique_ptr<InstrProfValueData[]> VD = Func.getValueForSite(VK, I);
for (uint32_t V = 0; V < NV; V++)
  // Do something with VD[V].Value and/or VD[V].Count
```
Note that we need to call getNumValueDataForSite and getValueForSite
separately. If getValueForSite returns ArrayRef<InstrProfValueData>
in the future, then we'll be able to do something like:
```
for (const auto &V : Func.getValueForSite(VK, I))
  // Do something with V.Value and/or V.Count
```
If ArrayRef<InstrProfValueData> directly points to ValueData, then
getValueForSite won't need to allocate memory with std::make_unique.
Now, switching to std::vector requires us to update several places:
- sortByTargetValues switches to llvm::sort because we don't need to
worry about sort stability.
- sortByCount retains sort stability because std::list::sort also
performs stable sort.
- merge builds another array and moves it back to ValueData to avoid a
potential quadratic behavior with std::vector::insert into the
middle of a vector.
getValueForSite computes the total count -- the total number of times
a given value site is visited. The problem is that, excluding tests,
annotateValueSite is the only place that needs the total count.
This patch moves the total count computation to annotateValueSite.
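The computation itself is just a sum of the per-value counts at the site; a sketch of what moves into annotateValueSite (names are illustrative):

```
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ProfileData/InstrProf.h"

// Sketch: the total count for a value site is the sum of its value counts.
static uint64_t siteTotalCount(llvm::ArrayRef<llvm::InstrProfValueData> VD) {
  uint64_t Total = 0;
  for (const llvm::InstrProfValueData &V : VD)
    Total += V.Count;
  return Total;
}
```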
This patch removes swapToHostOrder in favor of
llvm::support::endian::readNext as swapToHostOrder is too thin a
wrapper around readNext.
Note that there are two variants of readNext:
- readNext<type, endian, align>(ptr)
- readNext<type, align>(ptr, endian)
swapToHostOrder uses the former, but this patch switches to the latter.
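For illustration, the two forms look roughly like this (the buffer type and the little-endian choice are assumptions for the example, not taken from the patch):

```
#include "llvm/Support/Endian.h"
using namespace llvm::support;

static uint64_t readBoth(const unsigned char *&Ptr) {
  // Variant 1: endianness as a template parameter (what swapToHostOrder used).
  uint64_t A =
      endian::readNext<uint64_t, llvm::endianness::little, unaligned>(Ptr);
  // Variant 2: endianness as a runtime argument; with this patch the
  // alignment template parameter defaults to unaligned.
  uint64_t B = endian::readNext<uint64_t>(Ptr, llvm::endianness::little);
  return A + B;
}
```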
While we are at it, this patch teaches readNext to default to
unaligned just as I did in:
commit 568368a43e5b4adb3c5d105a0eff3e0c13c0af8c
Author: Kazu Hirata <kazu@google.com>
Date: Mon Apr 15 19:05:30 2024 -0700
- In `Header::readFromBuffer`, read the buffer in the forward direction by using `readNext`.
- When computing the header size, spell out the constant.
With the changes above, we can remove `offsetOf` in InstrProf.cpp.
---------
Co-authored-by: Kazu Hirata <kazu@google.com>
The `llvm-profdata order` command is used to compute a function order
using traces from the input profile. Add the `--num-test-traces` flag to
keep aside N traces to evaluate this order. These test traces are assumed
to be the actual function execution order in some experiment. The output
is a number that represents how many page faults we got. Lower is
better.
I tested on a large profile I already had.
```
llvm-profdata order default.profdata --num-test-traces=30
# Ordered 149103 functions
# Total area under the page fault curve: 2.271827e+09
...
```
I also improved `TemporalProfTraceTy::createBPFunctionNodes()` in a few
ways:
* Simplified how `UN`s are computed
* Changed how the initial `Node` order is computed
* Filtered out rare and common `UN`s
* Made the output vector an aliased argument instead of a return value
These changes slightly improved the evaluation in my test.
```
llvm-profdata order default.profdata --num-test-traces=30
# Ordered 149103 functions
# Total area under the page fault curve: 2.268586e+09
...
```
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of
70 under llvm/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially when
comparing !Long.Expression.equals("str") with Long.Expression != "str".
* The motivation is to move the indirect callee profile update inside the
function-based speculative indirect-call promotion, so that there are
fewer diffs when the vtable-based transformation and profile update are
implemented in a follow-up patch.
* The parent patch is https://github.com/llvm/llvm-project/pull/79381
Add annotated vtable GUID as referenced variables in per function
summary, and update bitcode writer to create value-ids for these
referenced vtables.
- This is part 3 of the type profiling work, described in the "Virtual
Table Definition Import" [1] section of the RFC.
[1] https://github.com/llvm/llvm-project/pull/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
(The profile format change is split into a standalone change: https://github.com/llvm/llvm-project/pull/81691.)
* For InstrFDO value profiling, implement instrumentation and lowering for virtual table addresses.
* This is controlled by `-enable-vtable-value-profiling` and off by default.
* When the option is on, raw profiles will carry serialized `VTableProfData` structs and compressed vtables as payloads.
* Implement profile reader and writer support
* The raw profile reader is used by `llvm-profdata` but not the compiler. It constructs an InstrProfSymtab with symbol names and maps profiled runtime addresses to vtable symbols.
* The indexed profile reader is used by `llvm-profdata` and the compiler. When initialized, the reader stores a pointer to the beginning of the in-memory compressed vtable names and the length of that string. When used in `llvm-profdata`, the reader decompresses the string to show the symbols of a profiled site. When used in the compiler, string decompression doesn't
happen since the IR is used to construct the InstrProfSymtab.
* The indexed profile writer collects the list of vtable names and stores it in the indexed profile.
* Text profile reader and writer support are added but mostly follow the implementation for indirect-call value type.
* `llvm-profdata show -show-vtables <args> <profile>` is implemented.
RFC:
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600#pick-instrumentation-points-and-instrument-runtime-types-7
New changes on top of the [reviewed
patch](https://github.com/llvm/llvm-project/pull/81691) are [in commits
after this one](d0757f46b3).
Previous commits are restored from the remote branch with timestamps.
1. Fix build breakage for non-ELF platforms, by defining the missing
functions {`__llvm_profile_begin_vtables`, `__llvm_profile_end_vtables`,
`__llvm_profile_begin_vtabnames`, `__llvm_profile_end_vtabnames`}
everywhere.
* Tested on a Mac laptop (for Darwin) and on Windows. Specifically, the
functions in `InstrProfilingPlatformWindows.c` return `NULL` to make it
more explicit that type profiling isn't supported; see the comments for
the reason.
* For the rest (AIX and others), mostly follow existing examples (like
this [one](f95b2f1acf))
2. Rename `__llvm_prf_vtabnames` -> `__llvm_prf_vns` for a shorter
section name, and make the returned pointers
[const](a825d2a4ec (diff-4de780ce726d76b7abc9d3353aef95013e7b21e7bda01be8940cc6574fb0b5ffR120-R121))
**Original Description**
* Raw profile format
- Header: records the byte size of compressed vtable names, and the
number of profiled vtable entries (call it `VTableProfData`). Header
also records padded bytes of each section.
- Payload: adds a section for compressed vtable names, and a section to
store `VTableProfData`. Both sections are padded so the size is a
multiple of 8.
* Indexed profile format
- Header: records the byte offset of compressed vtable names.
- Payload: adds a section to store compressed vtable names. This section
is used by `llvm-profdata` to show the list of vtables profiled for an
instrumented site.
[The originally reviewed
patch](https://github.com/llvm/llvm-project/pull/66825) will have the
profile reader/writer changes and the llvm-profdata changes.
- To ensure this PR has all the necessary profile format changes along
with the profile version bump, a copy of the originally reviewed patch
was created in https://github.com/llvm/llvm-project/pull/80761. The copy
doesn't have the profile format changes, but it has the set of tests
that covers type profile generation, profile reading, and profile
merging. Tests pass there.
RFC:
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600
---------
Co-authored-by: modiking <modiking213@gmail.com>
* Raw profile format
- Header: records the byte size of compressed vtable names, and the
number of profiled vtable entries (call it `VTableProfData`). Header
also records padded bytes of each section.
- Payload: adds a section for compressed vtable names, and a section to
store `VTableProfData`. Both sections are padded so the size is a
multiple of 8.
* Indexed profile format
- Header: records the byte offset of compressed vtable names.
- Payload: adds a section to store compressed vtable names. This section
is used by `llvm-profdata` to show the list of vtables profiled for an
instrumented site.
[The originally reviewed
patch](https://github.com/llvm/llvm-project/pull/66825) will have the
profile reader/writer changes and the llvm-profdata changes.
- To ensure this PR has all the necessary profile format changes along
with the profile version bump, a copy of the originally reviewed patch
was created in https://github.com/llvm/llvm-project/pull/80761. The copy
doesn't have the profile format changes, but it has the set of tests
that covers type profile generation, profile reading, and profile
merging. Tests pass there.
RFC:
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600
---------
Co-authored-by: modiking <modiking213@gmail.com>
- Function getParsedIRPGOFuncName splits the name at the delimiter. The `[filename;]mangled-name` format could be generalized to non-function global values (e.g., vtables for type profiling), so rename the function (illustrated in the sketch after this list).
- Use kGlobalIdentifierDelimiter rather than semicolon directly for defragmentation.
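As an illustration of the `[filename;]mangled-name` split (a sketch assuming `;` is the global identifier delimiter; `parseIRPGOName` here is a stand-in, not the actual implementation):

```
#include "llvm/ADT/StringRef.h"
#include <utility>

// Returns {filename, mangled-name}; filename is empty when no delimiter is
// present. Illustrative only.
static std::pair<llvm::StringRef, llvm::StringRef>
parseIRPGOName(llvm::StringRef IRPGOName) {
  auto [Prefix, Rest] = IRPGOName.split(';');
  if (Rest.empty()) // no delimiter: the whole string is the mangled name
    return {llvm::StringRef(), IRPGOName};
  return {Prefix, Rest};
}
```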
Add weights to utility nodes in BP so that we can give more importance
to certain utilities. This is useful when we optimize several objectives
jointly.
When LargestTraceSize is a power of two, createBPFunctionNodes does not
allocate a group ID for Trace[LargestTraceSize-1] (as N is off by one).
Fix this and change floor+log2 to Log2_64.
BalancedPartitioning::bisect can use unstable sort because `Nodes`
contains distinct `InputOrderIndex`s.
BalancedPartitioning::runIterations: use one DenseMap and simplify the
node renumbering code.
Change the format of IRPGO counter names to
`[<filepath>;]<mangled-name>` which is computed by
`GlobalValue::getGlobalIdentifier()` to fix #74565.
In fe051934cbb0aaf25d960d7d45305135635d650b
(https://reviews.llvm.org/D156569) the format of IRPGO counter names was
changed to be `[<filepath>;]<linkage-name>` where `<linkage-name>` is
basically `F.getName()` with some prefix, e.g., `_` or `l_` on Mach-O
(yes, it is confusing that `<linkage-name>` is computed with
`Mangler().getNameWithPrefix()` while `<mangled-name>` is just
`F.getName()`). We discovered in #74565 that this causes some missed
import issues on some targets and #74008 is a partial fix.
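To illustrate the two spellings (a sketch; the setup is assumed and the Mach-O prefixes in the comments merely restate the description above):

```
#include "llvm/ADT/SmallString.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Mangler.h"
using namespace llvm;

static void showBothSpellings(const Function &F) {
  // <linkage-name>: what Mangler().getNameWithPrefix() produces, e.g.
  // "_foo" or "l_foo" on Mach-O.
  SmallString<64> LinkageName;
  Mangler().getNameWithPrefix(LinkageName, &F, /*CannotUsePrivateLabel=*/false);
  // <mangled-name>: simply the IR-level name, e.g. "foo".
  StringRef MangledName = F.getName();
  (void)MangledName;
}
```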
Since `<mangled-name>` may not match the `<linkage-name>` on some
targets like Mach-O, we will need to post-process the output of
`llvm-profdata order` before passing it to the linker via `-order_file`.
Profiles generated after fe051934cbb0aaf25d960d7d45305135635d650b will
become stale after this diff, but I think this is acceptable since that
patch landed after the LLVM 18 cut which hasn't been released yet.
Simplify the compiler-rt test to make it more general for different
platforms, and use `*DAG` matchers for lines that may be emitted
out-of-order.
- The compiler-rt test passed on a Windows machine. Previously the name
matchers didn't work with MSVC mangling
(https://lab.llvm.org/buildbot/#/builders/127/builds/59907)
- `*DAG` matchers fixed the error in
https://lab.llvm.org/buildbot/#/builders/94/builds/17924
This is the second reland; it fixes errors caught in the first reland
(https://github.com/llvm/llvm-project/pull/75860).
**Original commit message**
Since commit fe05193 (phab D156569), IRPGO names use the format
`[<filepath>;]<linkage-name>`, while the prior format was
`[<filepath>:]<mangled-name>`. The format change would break the use case
demonstrated in (updated)
`llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` and
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
This patch changes `GlobalValue::getGlobalIdentifier` to use the
semicolon.
To elaborate on how things break without this PR:
1. IRPGO raw profiles store (compressed) IRPGO names of functions in
one section, and per-function profile data in another section. The
[NameRef](fc715e4cd9/compiler-rt/include/profile/InstrProfData.inc (L72))
field in per-function profile data is the MD5 hash of IRPGO names.
2. When raw profiles are converted to indexed format profiles, the
profiled address is
[mapped](fc715e4cd9/llvm/lib/ProfileData/InstrProf.cpp (L876-L885))
to the MD5 hash of the callee.
3. In the `pgo-instr-use` ThinLTO prelink pipeline, MD5 hashes of IRPGO names
will be
[annotated](fc715e4cd9/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (L1707))
as value profiles, and used to import indirect-call-prom candidates. If
the annotated MD5 hash is computed from the new format while import uses
the prior format, the callee cannot be imported.
*
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
is added to have an end-to-end test.
* `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll`
is updated to have better test coverage from another aspect (as runtime
tests are more sensitive to the environment and may be skipped by some
contributors)
Fixed build-bot failures caught by post-submit tests:
1) Added the list of command line tools needed by the new compiler-rt test to the dependencies.
2) Used `starts_with` to replace the deprecated `startswith`.
**Original commit message**
Since commit fe05193 (phab D156569), IRPGO names use the format
`[<filepath>;]<linkage-name>`, while the prior format was
`[<filepath>:]<mangled-name>`. The format change would break the use case
demonstrated in (updated)
`llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` and
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
This patch changes `GlobalValue::getGlobalIdentifier` to use the
semicolon.
To elaborate on how things break without this PR:
1. IRPGO raw profiles store (compressed) IRPGO names of functions in
one section, and per-function profile data in another section. The
[NameRef](fc715e4cd9/compiler-rt/include/profile/InstrProfData.inc (L72))
field in per-function profile data is the MD5 hash of IRPGO names.
2. When raw profiles are converted to indexed format profiles, the
profiled address is
[mapped](fc715e4cd9/llvm/lib/ProfileData/InstrProf.cpp (L876-L885))
to the MD5 hash of the callee.
3. In the `pgo-instr-use` ThinLTO prelink pipeline, MD5 hashes of IRPGO names
will be
[annotated](fc715e4cd9/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (L1707))
as value profiles, and used to import indirect-call-prom candidates. If
the annotated MD5 hash is computed from the new format while import uses
the prior format, the callee cannot be imported.
*
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
is added to have an end-to-end test.
* `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll`
is updated to have better test coverage from another aspect (as runtime
tests are more sensitive to the environment and may be skipped by some
contributors)
Since commit fe05193 (phab D156569), IRPGO names use the format
`[<filepath>;]<linkage-name>`, while the prior format was
`[<filepath>:]<mangled-name>`. The format change would break the use case
demonstrated in (updated)
`llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` and
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
This patch changes `GlobalValue::getGlobalIdentifier` to use the
semicolon.
To elaborate on how things break without this PR:
1. IRPGO raw profiles store (compressed) IRPGO names of functions in
one section, and per-function profile data in another section. The
[NameRef](fc715e4cd9/compiler-rt/include/profile/InstrProfData.inc (L72))
field in per-function profile data is the MD5 hash of IRPGO names.
2. When raw profiles are converted to indexed format profiles, the
profiled address is
[mapped](fc715e4cd9/llvm/lib/ProfileData/InstrProf.cpp (L876-L885))
to the MD5 hash of the callee.
3. In the `pgo-instr-use` ThinLTO prelink pipeline, MD5 hashes of IRPGO names
will be
[annotated](fc715e4cd9/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (L1707))
as value profiles, and used to import indirect-call-prom candidates. If
the annotated MD5 hash is computed from the new format while import uses
the prior format, the callee cannot be imported.
* `compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
is added to have an end-to-end test.
* `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll`
is updated to have better test coverage from another aspect (as runtime
tests are more sensitive to the environment and may be skipped by some
contributors)
## Motivation
Since we don't need the metadata sections at runtime, we can somehow
offload them from memory at runtime. Initially, I explored [debug info
correlation](https://discourse.llvm.org/t/instrprofiling-lightweight-instrumentation/59113),
which is used for PGO with value profiling disabled. However, it
currently only works with DWARF, and it would be hard to add such
artificial debug info for every function into CodeView, which is used on
Windows. So, offloading profile metadata sections at runtime seems to be
a platform-independent option.
## Design
The idea is to use new section names for profile name and data sections
and mark them as metadata sections. Under this mode, the new sections
are non-SHF_ALLOC in ELF. So, they are not loaded into memory at runtime
and can be stripped away as a post-linking step. After the process
exits, the generated raw profiles will contain only headers + counters.
llvm-profdata can be used to correlate raw profiles with the unstripped
binary to generate an indexed profile.
## Data
For chromium base_unittests with code coverage on linux, the binary size
overhead due to instrumentation was reduced from 64M to 38.8M (39.4%) and
the raw profile file size was reduced from 128M to 68M (46.9%):
```
$ bloaty out/cov/base_unittests.stripped -- out/no-cov/base_unittests.stripped
FILE SIZE VM SIZE
-------------- --------------
+121% +30.4Mi +121% +30.4Mi .text
[NEW] +14.6Mi [NEW] +14.6Mi __llvm_prf_data
[NEW] +10.6Mi [NEW] +10.6Mi __llvm_prf_names
[NEW] +5.86Mi [NEW] +5.86Mi __llvm_prf_cnts
+95% +1.75Mi +95% +1.75Mi .eh_frame
+108% +400Ki +108% +400Ki .eh_frame_hdr
+9.5% +211Ki +9.5% +211Ki .rela.dyn
+9.2% +95.0Ki +9.2% +95.0Ki .data.rel.ro
+5.0% +87.3Ki +5.0% +87.3Ki .rodata
[ = ] 0 +13% +47.0Ki .bss
+40% +1.78Ki +40% +1.78Ki .got
+12% +1.49Ki +12% +1.49Ki .gcc_except_table
[ = ] 0 +65% +1.23Ki .relro_padding
+62% +1.20Ki [ = ] 0 [Unmapped]
+13% +448 +19% +448 .init_array
+8.8% +192 [ = ] 0 [ELF Section Headers]
+0.0% +136 +0.0% +80 [7 Others]
+0.1% +96 +0.1% +96 .dynsym
+1.2% +96 +1.2% +96 .rela.plt
+1.5% +80 +1.2% +64 .plt
[ = ] 0 -99.2% -3.68Ki [LOAD #5 [RW]]
+195% +64.0Mi +194% +64.0Mi TOTAL
$ bloaty out/cov-cor/base_unittests.stripped -- out/no-cov/base_unittests.stripped
FILE SIZE VM SIZE
-------------- --------------
+121% +30.4Mi +121% +30.4Mi .text
[NEW] +5.86Mi [NEW] +5.86Mi __llvm_prf_cnts
+95% +1.75Mi +95% +1.75Mi .eh_frame
+108% +400Ki +108% +400Ki .eh_frame_hdr
+9.5% +211Ki +9.5% +211Ki .rela.dyn
+9.2% +95.0Ki +9.2% +95.0Ki .data.rel.ro
+5.0% +87.3Ki +5.0% +87.3Ki .rodata
[ = ] 0 +13% +47.0Ki .bss
+40% +1.78Ki +40% +1.78Ki .got
+12% +1.49Ki +12% +1.49Ki .gcc_except_table
+13% +448 +19% +448 .init_array
+0.1% +96 +0.1% +96 .dynsym
+1.2% +96 +1.2% +96 .rela.plt
+1.2% +64 +1.2% +64 .plt
+2.9% +64 [ = ] 0 [ELF Section Headers]
+0.0% +40 +0.0% +40 .data
+1.2% +32 +1.2% +32 .got.plt
+0.0% +24 +0.0% +8 [5 Others]
[ = ] 0 -22.9% -872 [LOAD #5 [RW]]
-74.5% -1.44Ki [ = ] 0 [Unmapped]
[ = ] 0 -76.5% -1.45Ki .relro_padding
+118% +38.8Mi +117% +38.8Mi TOTAL
```
A few things to note:
1. llvm-profdata doesn't support filtering raw profiles by binary id yet,
so when a raw profile doesn't belong to the binary being digested by
llvm-profdata, merging will fail. Once this is implemented,
llvm-profdata should be able to merge only the raw profiles with the
same binary id as the binary and discard the rest (with
mismatched/missing binary id). The workflow I have in mind is to have
scripts invoke llvm-profdata to get all binary ids for all raw profiles,
and then selectively pass the raw profiles with a matching binary id,
together with the binary, to llvm-profdata for merging.
2. In COFF, the metadata sections are currently still loaded into memory
but not used. I didn't address that in this patch because I noticed that
`.lcovmap` and `.lcovfunc` are loaded into memory. A separate patch will
address it.
3. This should work with PGO when value profiling is disabled, as debug
info correlation currently does, though I haven't tested this yet.
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
Refactor this function to take a callback for each decoded string, rename it, and make it a static function in the .cpp file. Move its (sole) caller's definition from the header to the .cpp file.
- This is split out of patch https://github.com/llvm/llvm-project/pull/66825 to minimize the diff created in a big PR.
Part 1 of 3. This includes the LLVM back-end processing and profile
reading/writing components. compiler-rt changes are included.
Differential Revision: https://reviews.llvm.org/D138846
- For instance, `collectPGOFuncNameStrings` is reused a lot in https://github.com/llvm/llvm-project/pull/66825 to get the compressed vtable names, and at some added call sites it's confusing to see 'func' when the context clearly shows the value isn't a function. This function currently just takes a list of strings as input, so rename it to `collectGlobalObjectNameStrings`.
- Do the rename in a standalone patch since the method is used in a non-LLVM codebase. It's easier to roll back this NFC in case the rename in that codebase takes longer.