292 Commits

Author SHA1 Message Date
Nikita Popov
979c275097
[IR] Store Triple in Module (NFC) (#129868)
The module currently stores the target triple as a string. This means
that any code that wants to actually use the triple first has to
instantiate a Triple, which is somewhat expensive. The change in #121652
caused a moderate compile-time regression due to this. While it would be
easy enough to work around, I think that architecturally, it makes more
sense to store the parsed Triple in the module, so that it can always be
directly queried.

For this change, I've opted not to add any magic conversions between
std::string and Triple for backwards-compatibilty purses, and instead
write out needed Triple()s or str()s explicitly. This is because I think
a decent number of them should be changed to work on Triple as well, to
avoid unnecessary conversions back and forth.

The only interesting part in this patch is that the default triple is
Triple("") instead of Triple() to preserve existing behavior. The former
defaults to using the ELF object format instead of unknown object
format. We should fix that as well.
2025-03-06 10:27:47 +01:00
Kazu Hirata
e264b0e856
[ProfileData] Avoid repeated hash lookups (NFC) (#128829) 2025-02-26 00:57:28 -08:00
Teresa Johnson
594e11ce42
[MemProf] Avoid incorrect ICP symtab canonicalization (#115419)
ICP builds a symtab from the symbols in the module allowing mapping from
the VP metadata GUIDs to the Function. MemProf uses this same symtab
handling for its ICP during cloning. When symbols are added to the
symtab, the handling adds both a GUID computed from the function name,
or from the attached PGOFuncName metadata for locals, as well as a GUID
computed from the "canonicalized" name, which strips all "." suffixes
other than ".__uniq". This was originally meant to remove the ".llvm.*"
suffix added to promoted locals (done earlier in the ThinLTO backend).
In theory, it should no longer be needed as locals should have
PGOFuncName metadata.

However, this was causing a linker unsat, in code that used coroutines.
For an original coroutine function, there were several additional
functions created that had the same name, but different "." suffixes.
Therefore the canonical name for these additional functions had the same
GUID as that of the original function, leading to extra entries in the
symtab, and to selecting the wrong function for promotion. For regular
ICP this can happen, but is just a performance issue. However, for
memprof the promoted direct call calls a memprof clone, and because we
called the wrong function, in this case it didn't have a memprof clone
and we got a linker unsat.

We may be able to remove the canonical name handling for ICP in general,
but for now disable it for MemProf. At worst this could lead to not
finding a GUID in the symtab and not performing an ICP, so should be
conservatively correct.
2024-11-07 21:00:42 -08:00
Ethan Luis McDonough
fde2d23ee2
[PGO][OpenMP] Instrumentation for GPU devices (Revision of #76587) (#102691)
This pull request is a revised version of #76587. This pull request
fixes some build issues that were present in the previous version of
this change.

> This pull request is the first part of an ongoing effort to extends
PGO instrumentation to GPU device code. This PR makes the following
changes:
>
> - Adds blank registration functions to device RTL
> - Gives PGO globals protected visibility when targeting a supported
GPU
> - Handles any addrspace casts for PGO calls
> - Implements PGO global extraction in GPU plugins (currently only
dumps info)
>
> These changes can be tested by supplying `-fprofile-instrument=clang`
while targeting a GPU.
2024-08-22 01:10:54 -05:00
Kazu Hirata
6c8ff4cbb8
[ProfileData] Take ArrayRef<InstrProfValueData> in addValueData (NFC) (#97363)
This patch fixes another place in ProfileData where we have a pointer
to an array of InstrProfValueData and its length separately.

addValueData is a bit unique in that it remaps incoming values in
place before adding them to ValueSites.  AFAICT, no caller of
addValueData uses updated incoming values.  With this patch, we add
value data to ValueSites first and then remaps values there.  This
way, we can take ArrayRef<InstrProfValueData> as a parameter.
2024-07-11 16:38:44 -07:00
Mircea Trofin
afbd7d1e7c
[NFC] Coding style: drop k in kGlobalIdentifierDelimiter (#98230) 2024-07-09 15:44:55 -07:00
Mircea Trofin
e291f31f89
[NFC] Coding style fixes in InstrProf.cpp (#98211) 2024-07-09 13:28:35 -07:00
Kazu Hirata
b8eaa5bb10
[ProfileData] Remove the old version of getValueProfDataFromInst (#97374)
I've migrated uses of the old version of getValueProfDataFromInst to
the one that returns SmallVector<InstrProfValueData, 4>.  This patch
removes the old version.
2024-07-02 11:46:31 -07:00
Mingming Liu
1518b260ce
[TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (#81442)
Clang's `-fwhole-program-vtables` is required for this optimization to
take place. If `-fwhole-program-vtables` is not enabled, this change is
no-op.
    
* Function-comparison (before):

```
%vtable = load ptr, ptr %obj
%vfn = getelementptr inbounds ptr, ptr %vtable, i64 1
%func = load ptr, ptr %vfn
%cond = icmp eq ptr %func, @callee
br i1 %cond, label bb1, label bb2:

bb1:
   call @callee

bb2:
   call %func
```

* VTable-comparison (after):

```
%vtable = load ptr, ptr %obj
%cond = icmp eq ptr %vtable, @vtable-address-point
br i1 %cond, label bb1, label bb2:

bb1:
   call @callee

bb2:
  %vfn = getelementptr inbounds ptr, ptr %vtable, i64 1
  %func = load ptr, ptr %vfn
  call %func
```
    
Key changes:
1. Find out virtual calls and the vtables they come from.
- The ICP relies on type intrinsic `llvm.type.test` to find out virtual
calls and the
compatible vtables, and relies on type metadata to find the address
point for comparison.
2. ICP pass does cost-benefit analysis and compares vtable only when the
number of vtables for a function candidate is within (option specified)
threshold.
3. Sink the function addressing and vtable load instruction to indirect
fallback.
- The sink helper functions are simplified versions of
`InstCombinerImpl::tryToSinkInstruction`. Currently debug intrinsics are
not handled. Ideally `InstCombinerImpl::tryToSinkInstructionDbgValues`
and `InstCombinerImpl::tryToSinkInstructionDbgVariableRecords` could be
moved into Transforms/Utils/Local.cpp (or another util cpp file) to
handle debug intrinsics when moving instructions across basic blocks.
4. Keep value profiles updated
     1) Update vtable value profiles after inline
     2) For either function-based comparison or vtable-based comparison,
          update both vtable and indirect call value profiles.
2024-06-29 23:21:33 -07:00
Ethan Luis McDonough
2c8b912f63
Revert "[PGO][OpenMP] Instrumentation for GPU devices (#76587)"
This reverts commit 5fd2af38e461445c583d7ffc2fe23858966eee76. It caused build issues and broke the buildbot.
2024-06-28 12:30:45 -05:00
Ethan Luis McDonough
5fd2af38e4
[PGO][OpenMP] Instrumentation for GPU devices (#76587)
This pull request is the first part of an ongoing effort to extends PGO
instrumentation to GPU device code. This PR makes the following changes:

- Adds blank registration functions to device RTL
- Gives PGO globals protected visibility when targeting a supported GPU
- Handles any addrspace casts for PGO calls
- Implements PGO global extraction in GPU plugins (currently only dumps
info)

These changes can be tested by supplying `-fprofile-instrument=clang`
while targeting a GPU.
2024-06-28 10:42:19 -05:00
Kazu Hirata
b0ae923ada
[ProfileData] Add a variant of getValueProfDataFromInst (#95993)
This patch adds a variant of getValueProfDataFromInst that returns
std::vector<InstrProfValueData> instead of
std::unique<InstrProfValueData[]>.  The new return type carries the
length with it, so we can drop out parameter ActualNumValueData.
Also, the caller can directly feed the return value into a range-based
for loop as shown in the patch.

I'm planning to migrate other callers of getValueProfDataFromInst to
the new variant in follow-up patches.
2024-06-22 00:40:36 -07:00
Kazu Hirata
beba2e7385
[ProfileData] Teach addValueData to honor parameter Site (#96233)
This patch teaches addValueData to honor Site for verification
purposes.  It does not affect the profile data in any manner.
2024-06-20 22:25:19 -07:00
Kazu Hirata
d6b0b7acf3
[ProfileData] Remove getValueProfDataFromInst (#95617)
I've migrated all uses to the new version of getValueProfDataFromInst
that returns std::unique_ptr<InstrProfValueData[]>.
2024-06-17 18:50:08 -07:00
Kazu Hirata
9ad102f03b
[ProfileData] Migrate to getValueArrayForSite (#95493)
This patch migrates uses of getValueForSite to getValueArrayForSite.
Each hunk is self-contained, meaning that each one can be applied
independently of the others.

In the unit test, there are cases where the array length check is
performed a lot earlier than the array content check.  For now, I'm
leaving the length checks where they are.  I'll consider moving them
when I migrate uses of getNumValueDataForSite to getValueArrayForSite
in a follow-up patch.
2024-06-14 06:38:48 -07:00
Kazu Hirata
31440738bd
[ProfileData] Use std::vector for ValueData (NFC) (#95194)
This patch changes the type of ValueData to
std::vector<InstrProfValueData> so that, in a follow-up patch, we can
teach getValueForSite to return ArrayRef<InstrProfValueData>.

Currently, a typical traversal over the value data looks like:

  uint32_t NV = Func.getNumValueDataForSite(VK, I);
std::unique_ptr<InstrProfValueData[]> VD = Func.getValueForSite(VK, I);
  for (uint32_t V = 0; V < NV; V++)
    Do something with VD[V].Value and/or VD[V].Count;

Note that we need to call getNumValueDataForSite and getValueForSite
separately.  If getValueForSite returns ArrayRef<InstrProfValueData>
in the future, then we'll be able to do something like:

  for (const auto &V : Func.getValueForSite(VK, I))
    Do something with V.Value and/or V.Count;

If ArrayRef<InstrProfValueData> directly points to ValueData, then
getValueForSite won't need to allocate memory with std::make_unique.

Now, switching to std::vector requires us to update several places:

- sortByTargetValues switches to llvm::sort because we don't need to
  worry about sort stability.

- sortByCount retains sort stability because std::list::sort also
  performs stable sort.

- merge builds another array and move it back to ValueData to avoid a
  potential quadratic behavior with std::vector::insert into the
  middle of a vector.
2024-06-12 11:22:49 -07:00
Kazu Hirata
00fa3fbfb8
[ProfileData] Compute sum in annotateValueSite (NFC) (#95199)
getValueForSite computes the total count -- the total number of times
a given value site is visited.  The problem is that, excluding tests,
annotateValueSite is the only place that needs the total count.

This patch moves the total count computation to annotateValueSite.
2024-06-12 10:14:33 -07:00
Kazu Hirata
bfa937a487
[ProfileData] Add const to a few places (NFC) (#94803) 2024-06-07 15:06:04 -07:00
Kazu Hirata
7476c20c48
[ProfileData] Remove swapToHostOrder (#94665)
This patch removes swapToHostOrder in favor of
llvm::support::endian::readNext as swapToHostOrder is too thin a
wrapper around readNext.

Note that there are two variants of readNext:

- readNext<type, endian, align>(ptr)
- readNext<type, align>(ptr, endian)

swapToHostOrder uses the former, but this patch switches to the latter.

While we are at it, this patch teaches readNext to default to
unaligned just as I did in:

  commit 568368a43e5b4adb3c5d105a0eff3e0c13c0af8c
  Author: Kazu Hirata <kazu@google.com>
  Date:   Mon Apr 15 19:05:30 2024 -0700
2024-06-06 13:25:52 -07:00
Mingming Liu
c803c29039
[nfc][InstrProf]Remove 'offsetOf' when parsing indexed profiles (#93346)
- In `Header::readFromBuffer`, read the buffer in the forward direction by using `readNext`.
- When compute the header size, spell out the constant.

With the changes above, we can remove `offsetOf` in InstrProf.cpp

---------

Co-authored-by: Kazu Hirata <kazu@google.com>
2024-05-30 12:44:29 -07:00
Mingming Liu
737a3018e8
[nfc][InstrFDO] Add Header::getIndexedProfileVersion and use it to decide profile version. (#93613)
This is a split of https://github.com/llvm/llvm-project/pull/93346 as
discussed.
2024-05-29 10:15:17 -07:00
Ellis Hoag
73eb9b3314
[InstrProf] Evaluate function order using test traces (#92451)
The `llvm-profdata order` command is used to compute a function order
using traces from the input profile. Add the `--num-test-traces` flag to
keep aside N traces to evalute this order. These test traces are assumed
to be the actual function execution order in some experiment. The output
is a number that represents how many page faults we got. Lower is
better.

I tested on a large profile I already had.
```
llvm-profdata order default.profdata --num-test-traces=30
# Ordered 149103 functions
# Total area under the page fault curve: 2.271827e+09
...
```

I also improved `TemporalProfTraceTy::createBPFunctionNodes()` in a few
ways:
* Simplified how `UN`s are computed
* Change how the initial `Node` order is computed
* Filter out rare and common `UN`s
* Output vector is an aliased argument instead of a return

These changes slightly improved the evaluation in my test.
```
llvm-profdata order default.profdata --num-test-traces=30
# Ordered 149103 functions
# Total area under the page fault curve: 2.268586e+09
...
```
2024-05-23 11:19:29 -07:00
Mingming Liu
b66779b5bf
[nfc][InstrProfReader]Store header fields in native endianness (#92947)
- Use `Header.Version` directly and remove Header::formatVersion

---------

Co-authored-by: Kazu Hirata <kazu@google.com>
2024-05-21 21:25:12 -07:00
Mingming Liu
98c1ba460a
[InstrProf] Add vtables with type metadata into symtab (#81051)
The indirect-call-promotion pass will look up the vtable to find out
the virtual function [1],
and add vtable-derived information in icall
candidate [2] for cost-benefit analysis.

[1] https://github.com/llvm/llvm-project/pull/81442/files#diff-a95d1ac8a0da69713fcb3346135d4b219f0a73920318d2549495620ea215191bR395-R416
[2] https://github.com/llvm/llvm-project/pull/81442/files#diff-a95d1ac8a0da69713fcb3346135d4b219f0a73920318d2549495620ea215191bR195-R199
2024-05-09 10:41:23 -07:00
Kazu Hirata
bb6df0804b
[llvm] Use StringRef::operator== instead of StringRef::equals (NFC) (#91441)
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.

- StringRef::operator==/!= outnumber StringRef::equals by a factor of
  70 under llvm/ in terms of their usage.

- The elimination of StringRef::equals brings StringRef closer to
  std::string_view, which has operator== but not equals.

- S == "foo" is more readable than S.equals("foo"), especially for
  !Long.Expression.equals("str") vs Long.Expression != "str".
2024-05-08 10:33:53 -07:00
Kazu Hirata
f430e37446
[llvm] Drop unaligned from calls to readNext (NFC) (#88841)
Now readNext defaults to unaligned accesses.  This patch drops
unaligned to improve readability.
2024-04-16 12:47:02 -07:00
Mingming Liu
08e210c6af
[NFC][IndirectCallProm] Refactor function-based conditional devirtualization and indirect call value profile update into one helper function (#80762)
* The motivation is to move indirect callee profile update inside the
function-based speculative indirect-call promotion, so that there are
fewer diffs the vtable-based transformation and profile update is
implemented in a follow-up patch.
* The Parent patch is https://github.com/llvm/llvm-project/pull/79381
2024-04-11 13:28:20 -07:00
Mingming Liu
1e15371dd8
[ThinLTO][TypeProf] Implement vtable def import (#79381)
Add annotated vtable GUID as referenced variables in per function
summary, and update bitcode writer to create value-ids for these
referenced vtables.

- This is the part3 of type profiling work, and described in the "Virtual Table Definition Import" [1] section of the
RFC.

[1] https://github.com/llvm/llvm-project/pull/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
2024-04-01 15:14:49 -07:00
Mingming Liu
1351d17826
[InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (#66825)
(The profile format change is split into a standalone change into https://github.com/llvm/llvm-project/pull/81691)

* For InstrFDO value profiling, implement instrumentation and lowering for virtual table address.
* This is controlled by `-enable-vtable-value-profiling` and off by default.
* When the option is on, raw profiles will carry serialized `VTableProfData` structs and compressed vtables as payloads.
 
* Implement profile reader and writer support 
  * Raw profile reader is used by `llvm-profdata` but not compiler. Raw profile reader will construct InstrProfSymtab with symbol names, and map profiled runtime address to vtable symbols.
  * Indexed profile reader is used by `llvm-profdata` and compiler. When initialized, the reader stores a pointer to the beginning of in-memory compressed vtable names and the length of string. When used in `llvm-profdata`, reader decompress the string to show symbols of a profiled site. When used in compiler, string decompression doesn't
happen since IR is used to construct InstrProfSymtab.
  * Indexed profile writer collects the list of vtable names, and stores that to index profiles.
  * Text profile reader and writer support are added but mostly follow the implementation for indirect-call value type.
* `llvm-profdata show -show-vtables <args> <profile>` is implemented.

rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600#pick-instrumentation-points-and-instrument-runtime-types-7
2024-04-01 08:52:35 -07:00
wanglei
f439c71373 [InstrProf][NFC] Fix -Wimplicit-fallthrough warning in InstrProf.cpp after #82711 2024-03-06 10:20:30 +08:00
Mingming Liu
16e74fd489
Reland "[TypeProf][InstrPGO] Introduce raw and instr profile format change for type profiling." (#82711)
New change on top of [reviewed
patch](https://github.com/llvm/llvm-project/pull/81691) are [in commits
after this
one](d0757f46b3).
Previous commits are restored from the remote branch with timestamps.

1. Fix build breakage for non-ELF platforms, by defining the missing
functions {`__llvm_profile_begin_vtables`, `__llvm_profile_end_vtables`,
`__llvm_profile_begin_vtabnames `, `__llvm_profile_end_vtabnames`}
everywhere.
* Tested on mac laptop (for darwins) and Windows. Specifically,
functions in `InstrProfilingPlatformWindows.c` returns `NULL` to make it
more explicit that type prof isn't supported; see comments for the
reason.
* For the rest (AIX, other), mostly follow existing examples (like this
[one](f95b2f1acf))
   
2. Rename `__llvm_prf_vtabnames` -> `__llvm_prf_vns` for shorter section
name, and make returned pointers
[const](a825d2a4ec (diff-4de780ce726d76b7abc9d3353aef95013e7b21e7bda01be8940cc6574fb0b5ffR120-R121))

**Original Description**

* Raw profile format
- Header: records the byte size of compressed vtable names, and the
number of profiled vtable entries (call it `VTableProfData`). Header
also records padded bytes of each section.
- Payload: adds a section for compressed vtable names, and a section to
store `VTableProfData`. Both sections are padded so the size is a
multiple of 8.
* Indexed profile format
  - Header: records the byte offset of compressed vtable names.
- Payload: adds a section to store compressed vtable names. This section
is used by `llvm-profdata` to show the list of vtables profiled for an
instrumented site.
  
[The originally reviewed
patch](https://github.com/llvm/llvm-project/pull/66825) will have
profile reader/write change and llvm-profdata change.
- To ensure this PR has all the necessary profile format change along
with profile version bump, created a copy of the originally reviewed
patch in https://github.com/llvm/llvm-project/pull/80761. The copy
doesn't have profile format change, but it has the set of tests which
covers type profile generation, profile read and profile merge. Tests
pass there.
  
rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600

---------

Co-authored-by: modiking <modiking213@gmail.com>
2024-02-27 11:07:40 -08:00
Mingming Liu
0e8d1877cd
Revert type profiling change as compiler-rt test break on Windows. (#82583)
Examples
https://lab.llvm.org/buildbot/#/builders/127/builds/62532/steps/8/logs/stdio
2024-02-21 21:41:33 -08:00
Mingming Liu
db7e9e6841
[TypeProf][InstrPGO] Introduce raw and instr profile format change for type profiling. (#81691)
* Raw profile format
- Header: records the byte size of compressed vtable names, and the
number of profiled vtable entries (call it `VTableProfData`). Header
also records padded bytes of each section.
- Payload: adds a section for compressed vtable names, and a section to
store `VTableProfData`. Both sections are padded so the size is a
multiple of 8.
* Indexed profile format
  - Header: records the byte offset of compressed vtable names.
- Payload: adds a section to store compressed vtable names. This section
is used by `llvm-profdata` to show the list of vtables profiled for an
instrumented site.
  
[The originally reviewed
patch](https://github.com/llvm/llvm-project/pull/66825) will have
profile reader/write change and llvm-profdata change.
- To ensure this PR has all the necessary profile format change along
with profile version bump, created a copy of the originally reviewed
patch in https://github.com/llvm/llvm-project/pull/80761. The copy
doesn't have profile format change, but it has the set of tests which
covers type profile generation, profile read and profile merge. Tests
pass there.
  
rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600

---------

Co-authored-by: modiking <modiking213@gmail.com>
2024-02-21 20:59:42 -08:00
Mingming Liu
2422e969bf
[NFC][InstrProf]Factor out getCanonicalName to compute the canonical name given a pgo name. (#81547)
- Also update the `InstrProf::addFuncWithName` to call the newly added
`getCanonicalName`.
2024-02-13 10:49:35 -08:00
Mingming Liu
05091aa3ac
[NFC][InstrProf]Generalize getParsedIRPGOFuncName to getParsedIRPGOName (#81054)
- Function getParsedIRPGOFuncName splits name by delimiter. The `[filename;]mangled-name` format could be generalized for non-function global values (e.g., vtables for type profiling). So rename the
function.
- Use kGlobalIdentifierDelimiter rather than semicolon directly for defragmentation.
2024-02-07 20:03:44 -08:00
spupyrev
30aa9fb4c1 Revert "[InstrProf] Adding utility weights to BalancedPartitioning (#72717)"
This reverts commit 5954b9dca21bb0c69b9e991b2ddb84c8b05ecba3
due to broken Windows build
2024-01-19 15:13:47 -08:00
spupyrev
5954b9dca2
[InstrProf] Adding utility weights to BalancedPartitioning (#72717)
Adding weights to utility nodes in BP so that we can give more
importance to
certain utilities. This is useful when we optimize several objectives
jointly.
2024-01-19 13:36:59 -08:00
Fangrui Song
0c6dc80531
BalancedPartitioning: minor updates (#77568)
When LargestTraceSize is a power of two, createBPFunctionNodes does not
allocate a group ID for Trace[LargestTraceSize-1] (as N is off by 1).
Fix
this and change floor+log2 to Log2_64.

BalancedPartitioning::bisect can use unstable sort because `Nodes`
contains distinct `InputOrderIndex`s.

BalancedPartitioning::runIterations: use one DenseMap and simplify the
node renumbering code.
2024-01-17 10:46:34 -08:00
Ellis Hoag
9a2df55f47
[InstrProf] No linkage prefixes in IRPGO names (#76994)
Change the format of IRPGO counter names to
`[<filepath>;]<mangled-name>` which is computed by
`GlobalValue::getGlobalIdentifier()` to fix #74565.

In fe051934cbb0aaf25d960d7d45305135635d650b
(https://reviews.llvm.org/D156569) the format of IRPGO counter names was
changed to be `[<filepath>;]<linkage-name>` where `<linkage-name>` is
basically `F.getName()` with some prefix, e.g., `_` or `l_` on Mach-O
(yes, it is confusing that `<linkage-name>` is computed with
`Mangler().getNameWithPrefix()` while `<mangled-name>` is just
`F.getName()`). We discovered in #74565 that this causes some missed
import issues on some targets and #74008 is a partial fix.

Since `<mangled-name>` may not match the `<linkage-name>` on some
targets like Mach-O, we will need to post-process the output of
`llvm-profdata order` before passing to the linker via `-order_file`.

Profiles generated after fe051934cbb0aaf25d960d7d45305135635d650b will
become stale after this diff, but I think this is acceptable since that
patch landed after the LLVM 18 cut which hasn't been released yet.
2024-01-04 16:13:57 -08:00
Mingming Liu
78a195e100
Reland the reland "[PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. " (#75954)
Simplify the compiler-rt test to make it more general for different
platforms, and use `*DAG` matchers for lines that may be emitted
out-of-order.
- The compiler-rt test passed on a Windows machine. Previously name
matchers don't work for MSVC mangling
(https://lab.llvm.org/buildbot/#/builders/127/builds/59907)
- `*DAG` matchers fixed the error in
https://lab.llvm.org/buildbot/#/builders/94/builds/17924

This is the second reland and fixed errors caught in first reland
(https://github.com/llvm/llvm-project/pull/75860)

**Original commit message**
Commit fe05193 (phab D156569), IRPGO names uses format
`[<filepath>;]<linkage-name>` while prior format is
`[<filepath>:<mangled-name>`. The format change would break the use case
demonstrated in (updated)
`llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` and
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`

This patch changes `GlobalValues::getGlobalIdentifer` to use the
semicolon.

To elaborate on the scenario how things break without this PR
1. IRPGO raw profiles stores (compressed) IRPGO names of functions in
one section, and per-function profile data in another section. The
[NameRef](fc715e4cd9/compiler-rt/include/profile/InstrProfData.inc (L72))
field in per-function profile data is the MD5 hash of IRPGO names.
2. When raw profiles are converted to indexed format profiles, the
profiled address is
[mapped](fc715e4cd9/llvm/lib/ProfileData/InstrProf.cpp (L876-L885))
to the MD5 hash of the callee.
3. In `pgo-instr-use` thin-lto prelink pipeline, MD5 hash of IRPGO names
will be
[annotated](fc715e4cd9/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (L1707))
as value profiles, and used to import indirect-call-prom candidates. If
the annotated MD5 hash is computed from the new format while import uses
the prior format, the callee cannot be imported.

*
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
is added to have an end-to-end test.
* `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll`
is updated to have better test coverage from another aspect (as runtime
tests are more sensitive to the environment and may be skipped by some
contributors)
2023-12-19 12:25:56 -08:00
Mingming Liu
6ce23ea0ab
Revert "Reland "[PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. "" (#75888)
Reverts llvm/llvm-project#75860
- Mangled name mismatch on Windows
(https://lab.llvm.org/buildbot/#/builders/127/builds/59907/steps/8/logs/stdio)
2023-12-18 19:31:18 -08:00
Mingming Liu
c5871712ae
Reland "[PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. " (#75860)
Fixed build-bot failures caught by post-submit tests
1) Add the list of command line tools needed by new compiler-rt test into dependency.
2) Use `starts_with` to replace deprecated `startswith`.

**Original commit message**
Commit fe05193 (phab D156569), IRPGO names uses format
`[<filepath>;]<linkage-name>` while prior format is
`[<filepath>:<mangled-name>`. The format change would break the use case
demonstrated in (updated)
`llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` and
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`

This patch changes `GlobalValues::getGlobalIdentifer` to use the
semicolon.

To elaborate on the scenario how things break without this PR
1. IRPGO raw profiles stores (compressed) IRPGO names of functions in
one section, and per-function profile data in another section. The
[NameRef](fc715e4cd9/compiler-rt/include/profile/InstrProfData.inc (L72))
field in per-function profile data is the MD5 hash of IRPGO names.
2. When raw profiles are converted to indexed format profiles, the
profiled address is
[mapped](fc715e4cd9/llvm/lib/ProfileData/InstrProf.cpp (L876-L885))
to the MD5 hash of the callee.
3. In `pgo-instr-use` thin-lto prelink pipeline, MD5 hash of IRPGO names
will be
[annotated](fc715e4cd9/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (L1707))
as value profiles, and used to import indirect-call-prom candidates. If
the annotated MD5 hash is computed from the new format while import uses
the prior format, the callee cannot be imported.

*
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
is added to have an end-to-end test.
* `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll`
is updated to have better test coverage from another aspect (as runtime
tests are more sensitive to the environment and may be skipped by some
contributors)
2023-12-18 17:43:40 -08:00
Mingming Liu
3aa5d71127
Revert "[PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles." (#75835)
Reverts llvm/llvm-project#74008

The compiler-rt test failed due to `llvm-dis` not found
(https://lab.llvm.org/buildbot/#/builders/127/builds/59884)
Will revert and investigate how to require the proper dependency.
2023-12-18 09:39:55 -08:00
Mingming Liu
245cddae70
[PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. (#74008)
Commit fe05193 (phab D156569), IRPGO names uses format
`[<filepath>;]<linkage-name>` while prior format is
`[<filepath>:<mangled-name>`. The format change would break the use case
demonstrated in (updated)
`llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` and
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`

This patch changes `GlobalValues::getGlobalIdentifer` to use the
semicolon.

To elaborate on the scenario how things break without this PR
1. IRPGO raw profiles stores (compressed) IRPGO names of functions in
one section, and per-function profile data in another section. The
[NameRef](fc715e4cd9/compiler-rt/include/profile/InstrProfData.inc (L72))
field in per-function profile data is the MD5 hash of IRPGO names.
2. When raw profiles are converted to indexed format profiles, the
profiled address is
[mapped](fc715e4cd9/llvm/lib/ProfileData/InstrProf.cpp (L876-L885))
to the MD5 hash of the callee.
3. In `pgo-instr-use` thin-lto prelink pipeline, MD5 hash of IRPGO names
will be
[annotated](fc715e4cd9/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (L1707))
as value profiles, and used to import indirect-call-prom candidates. If
the annotated MD5 hash is computed from the new format while import uses
the prior format, the callee cannot be imported.

*`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
is added to have an end-to-end test.
* `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll`
is updated to have better test coverage from another aspect (as runtime
tests are more sensitive to the environment and may be skipped by some
contributors)
2023-12-18 09:10:39 -08:00
Zequan Wu
ab3430f891
[Profile] Add binary profile correlation for code coverage. (#69493)
## Motivation
Since we don't need the metadata sections at runtime, we can somehow
offload them from memory at runtime. Initially, I explored [debug info
correlation](https://discourse.llvm.org/t/instrprofiling-lightweight-instrumentation/59113),
which is used for PGO with value profiling disabled. However, it
currently only works with DWARF and it's be hard to add such artificial
debug info for every function in to CodeView which is used on Windows.
So, offloading profile metadata sections at runtime seems to be a
platform independent option.

## Design
The idea is to use new section names for profile name and data sections
and mark them as metadata sections. Under this mode, the new sections
are non-SHF_ALLOC in ELF. So, they are not loaded into memory at runtime
and can be stripped away as a post-linking step. After the process
exits, the generated raw profiles will contains only headers + counters.
llvm-profdata can be used correlate raw profiles with the unstripped
binary to generate indexed profile.

## Data
For chromium base_unittests with code coverage on linux, the binary size
overhead due to instrumentation reduced from 64M to 38.8M (39.4%) and
the raw profile files size reduce from 128M to 68M (46.9%)
```
$ bloaty out/cov/base_unittests.stripped -- out/no-cov/base_unittests.stripped
    FILE SIZE        VM SIZE
 --------------  --------------
  +121% +30.4Mi  +121% +30.4Mi    .text
  [NEW] +14.6Mi  [NEW] +14.6Mi    __llvm_prf_data
  [NEW] +10.6Mi  [NEW] +10.6Mi    __llvm_prf_names
  [NEW] +5.86Mi  [NEW] +5.86Mi    __llvm_prf_cnts
   +95% +1.75Mi   +95% +1.75Mi    .eh_frame
  +108%  +400Ki  +108%  +400Ki    .eh_frame_hdr
  +9.5%  +211Ki  +9.5%  +211Ki    .rela.dyn
  +9.2% +95.0Ki  +9.2% +95.0Ki    .data.rel.ro
  +5.0% +87.3Ki  +5.0% +87.3Ki    .rodata
  [ = ]       0   +13% +47.0Ki    .bss
   +40% +1.78Ki   +40% +1.78Ki    .got
   +12% +1.49Ki   +12% +1.49Ki    .gcc_except_table
  [ = ]       0   +65% +1.23Ki    .relro_padding
   +62% +1.20Ki  [ = ]       0    [Unmapped]
   +13%    +448   +19%    +448    .init_array
  +8.8%    +192  [ = ]       0    [ELF Section Headers]
  +0.0%    +136  +0.0%     +80    [7 Others]
  +0.1%     +96  +0.1%     +96    .dynsym
  +1.2%     +96  +1.2%     +96    .rela.plt
  +1.5%     +80  +1.2%     +64    .plt
  [ = ]       0 -99.2% -3.68Ki    [LOAD #5 [RW]]
  +195% +64.0Mi  +194% +64.0Mi    TOTAL
$ bloaty out/cov-cor/base_unittests.stripped -- out/no-cov/base_unittests.stripped
    FILE SIZE        VM SIZE
 --------------  --------------
  +121% +30.4Mi  +121% +30.4Mi    .text
  [NEW] +5.86Mi  [NEW] +5.86Mi    __llvm_prf_cnts
   +95% +1.75Mi   +95% +1.75Mi    .eh_frame
  +108%  +400Ki  +108%  +400Ki    .eh_frame_hdr
  +9.5%  +211Ki  +9.5%  +211Ki    .rela.dyn
  +9.2% +95.0Ki  +9.2% +95.0Ki    .data.rel.ro
  +5.0% +87.3Ki  +5.0% +87.3Ki    .rodata
  [ = ]       0   +13% +47.0Ki    .bss
   +40% +1.78Ki   +40% +1.78Ki    .got
   +12% +1.49Ki   +12% +1.49Ki    .gcc_except_table
   +13%    +448   +19%    +448    .init_array
  +0.1%     +96  +0.1%     +96    .dynsym
  +1.2%     +96  +1.2%     +96    .rela.plt
  +1.2%     +64  +1.2%     +64    .plt
  +2.9%     +64  [ = ]       0    [ELF Section Headers]
  +0.0%     +40  +0.0%     +40    .data
  +1.2%     +32  +1.2%     +32    .got.plt
  +0.0%     +24  +0.0%      +8    [5 Others]
  [ = ]       0 -22.9%    -872    [LOAD #5 [RW]]
 -74.5% -1.44Ki  [ = ]       0    [Unmapped]
  [ = ]       0 -76.5% -1.45Ki    .relro_padding
  +118% +38.8Mi  +117% +38.8Mi    TOTAL
```

A few things to note:
1. llvm-profdata doesn't support filter raw profiles by binary id yet,
so when a raw profile doesn't belongs to the binary being digested by
llvm-profdata, merging will fail. Once this is implemented,
llvm-profdata should be able to only merge raw profiles with the same
binary id as the binary and discard the rest (with mismatched/missing
binary id). The workflow I have in mind is to have scripts invoke
llvm-profdata to get all binary ids for all raw profiles, and
selectively choose the raw pnrofiles with matching binary id and the
binary to llvm-profdata for merging.
2. Note: In COFF, currently they are still loaded into memory but not
used. I didn't do it in this patch because I noticed that `.lcovmap` and
`.lcovfunc` are loaded into memory. A separate patch will address it.
3. This should works with PGO when value profiling is disabled as debug
info correlation currently doing, though I haven't tested this yet.
2023-12-14 14:16:38 -05:00
Kazu Hirata
586ecdf205
[llvm] Use StringRef::{starts,ends}_with (NFC) (#74956)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-11 21:01:36 -08:00
Mingming Liu
bb642f8b94
[NFC][InstrProf]Refactor readPGOFuncNameStrings (#71566)
Refactor this function to take a callback for each decoded string, rename it and change it to a static function in cpp. Move its (sole) caller definition from header to cpp.
- This is a split of patch https://github.com/llvm/llvm-project/pull/66825; to minimize the diff created in a big PR.
2023-11-09 10:47:44 -08:00
Zequan Wu
89a2e70159
[llvm-profdata] Emit warning when counter value is greater than 2^56. (#69513)
Fixes #65416
2023-10-31 16:40:51 -04:00
Alan Phipps
f95b2f1acf Reland "[InstrProf][compiler-rt] Enable MC/DC Support in LLVM Source-based Code Coverage (1/3)"
Part 1 of 3. This includes the LLVM back-end processing and profile
reading/writing components. compiler-rt changes are included.

Differential Revision: https://reviews.llvm.org/D138846
2023-10-30 11:15:02 -05:00
Mingming Liu
a1e9777b76
[NFC] In InstrProf, generalize helper functions to take 'GlobalObject'. They currently take 'Functions' as function parameters or have 'Func' in the name. (#70287)
- For instance, `collectPGOFuncNameStrings` is reused a lot in https://github.com/llvm/llvm-project/pull/66825 to get the compressed vtable names; and in some added callsites it's just confusing to see 'func' since context clearly shows it's not. This function currently just takes a list of strings as input so name it to `collectGlobalObjectNameStrings`
- Do the rename in a standalone patch since the method is used in non-llvm codebase. It's easier to rollback this NFC in case rename in that codebase takes longer.
2023-10-26 14:48:36 -07:00