259 Commits

Author SHA1 Message Date
Kazu Hirata
99d2b3b0aa
[llvm-profgen] Avoid repeated hash lookups (NFC) (#130466) 2025-03-09 00:49:37 -08:00
Kazu Hirata
4bda95304f
[llvm-profgen] Avoid repeated hash lookups (NFC) (#127028) 2025-02-13 09:12:33 -08:00
Kazu Hirata
6228379a6c
[llvm-profgen] Avoid repeated hash lookups (NFC) (#126467) 2025-02-10 07:50:57 -08:00
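The pattern behind these hash-lookup cleanups is to do a single lookup and reuse its result instead of querying the same table twice. A minimal generic sketch of the idea (illustrative code, not the actual llvm-profgen change):

```
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical counter map, for illustration only.
using CounterMap = std::unordered_map<std::string, uint64_t>;

// Before: two hash lookups (count() and then at()).
uint64_t getCountTwice(const CounterMap &M, const std::string &Key) {
  if (M.count(Key))      // first lookup
    return M.at(Key);    // second lookup
  return 0;
}

// After: one find() whose iterator is reused.
uint64_t getCountOnce(const CounterMap &M, const std::string &Key) {
  auto It = M.find(Key); // single lookup
  return It != M.end() ? It->second : 0;
}
```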
Lei Wang
6c3c90b5a8
[CSSPGO] Add a flag to limit unsymbolized context depth (#121531)
Add a new flag (`--csprof-max-unsymbolized-context-depth`) that limits only
the unsymbolized context depth. Currently, `--csprof-max-context-depth`
applies to both symbolized and unsymbolized profile contexts, and there are
scenarios where it is not flexible enough: e.g., if we want to limit the
context but still keep all the inlinings from the leaf frame, we can set
`--csprof-max-unsymbolized-context-depth` >= 1.
2025-01-07 10:29:52 -08:00
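A minimal sketch of what limiting only the unsymbolized context depth could look like, with purely illustrative names (not the actual llvm-profgen code): truncate the address-based call stack to the requested depth while leaving the inlinings later recovered from the leaf frame untouched.

```
#include <cstdint>
#include <vector>

// Illustrative only: cap the unsymbolized (address-based) stack at
// MaxUnsymbolizedDepth frames counted from the leaf; inline frames
// recovered from the leaf frame's debug info are not affected.
std::vector<uint64_t>
truncateUnsymbolizedContext(std::vector<uint64_t> CallStack,
                            size_t MaxUnsymbolizedDepth) {
  if (CallStack.size() > MaxUnsymbolizedDepth)
    CallStack.resize(MaxUnsymbolizedDepth);
  return CallStack;
}
```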
Amir Ayupov
ee09f7d1fc
[MC][NFC] Reduce Address2ProbesMap size
Replace the map from addresses to list of probes with a flat vector
containing probe references sorted by their addresses.

Reduces pseudo probe parsing time from 9.56s to 8.59s and peak RSS from
9.66 GiB to 9.08 GiB as part of perf2bolt processing a large binary.

Test Plan:
```
bin/llvm-lit -sv test/tools/llvm-profgen
```

Reviewers: maksfb, rafaelauler, dcci, ayermolo, wlei-llvm

Reviewed By: wlei-llvm

Pull Request: https://github.com/llvm/llvm-project/pull/102904
2024-08-26 09:14:35 -07:00
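A sketch of the data-structure change described above, using illustrative types rather than the real MC classes: a single flat vector of (address, probe) pairs kept sorted by address, with lookups done by binary search instead of a per-address map of lists.

```
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

struct Probe; // opaque placeholder for a decoded probe

// Flat, address-sorted vector replacing a map from address to probe list.
using AddressToProbes = std::vector<std::pair<uint64_t, const Probe *>>;

void sortByAddress(AddressToProbes &V) {
  std::sort(V.begin(), V.end(),
            [](const auto &A, const auto &B) { return A.first < B.first; });
}

// The probes attached to Address form one contiguous run after sorting.
auto probesAt(const AddressToProbes &V, uint64_t Address) {
  return std::equal_range(
      V.begin(), V.end(), std::make_pair(Address, (const Probe *)nullptr),
      [](const auto &A, const auto &B) { return A.first < B.first; });
}
```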
Amir Ayupov
04ebd1907c
[MC][NFC] Statically allocate storage for decoded pseudo probes and function records
Use #102774 to allocate storage for decoded probes (`PseudoProbeVec`)
and function records (`InlineTreeVec`).

Leverage that to also shrink sizes of `MCDecodedPseudoProbe`:
- Drop Guid since it's accessible via `InlineTree`.

`MCDecodedPseudoProbeInlineTree`:
- Keep track of probes and inlinees using `ArrayRef`s now that probes
  and function records belonging to the same function are allocated
  contiguously.

This reduces peak RSS from 13.7 GiB to 9.7 GiB and pseudo probe parsing
time (as part of perf2bolt) from 15.3s to 9.6s for a large binary with
400MiB .pseudo_probe section containing 43M probes and 25M function
records.

Depends on:
#102774
#102787
#102788

Reviewers: maksfb, rafaelauler, dcci, ayermolo, wlei-llvm

Reviewed By: wlei-llvm

Pull Request: https://github.com/llvm/llvm-project/pull/102789
2024-08-26 09:09:13 -07:00
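A much-simplified sketch of the allocation scheme described above (illustrative types, not the actual MCPseudoProbe classes): all decoded probes and function records live in two big contiguous vectors, and each inline-tree node only keeps ArrayRef views into them.

```
#include "llvm/ADT/ArrayRef.h"
#include <cstdint>
#include <vector>

// Placeholders for illustration only.
struct DecodedProbe { uint64_t Address; uint32_t Index; };
struct FunctionRecord { uint64_t Guid; };

struct InlineTreeNode {
  // Views into the shared storage below; the node owns nothing itself.
  llvm::ArrayRef<DecodedProbe> Probes;
  llvm::ArrayRef<FunctionRecord> Inlinees;
};

struct Decoder {
  std::vector<DecodedProbe> PseudoProbeVec;   // all probes, one allocation
  std::vector<FunctionRecord> InlineTreeVec;  // all function records
  std::vector<InlineTreeNode> Nodes;          // each node slices into the above
};
```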
Kazu Hirata
ca53611c90
[llvm] Use range-based for loops (NFC) (#105861) 2024-08-23 16:56:27 -07:00
Amir Ayupov
cd15d12f47
[MC][profgen][NFC] Expand auto for MCDecodedPseudoProbe
Expand `auto` in select places in preparation for #102789.

Reviewers: dcci, maksfb, WenleiHe, rafaelauler, ayermolo, wlei-llvm

Reviewed By: WenleiHe, wlei-llvm

Pull Request: https://github.com/llvm/llvm-project/pull/102788
2024-08-10 23:48:40 -07:00
Amir Ayupov
242f4e85eb
[profgen][NFC] Pass parameter as const_ref
Pass `ProbeNode` parameter of `trackInlineesOptimizedAway` as const
reference.

Reviewers: wlei-llvm, WenleiHe

Reviewed By: WenleiHe

Pull Request: https://github.com/llvm/llvm-project/pull/102787
2024-08-10 23:45:47 -07:00
Tim Creech
23609a383c
[llvm-profgen] Revert #99826 and #99026 (#100147)
Revert #99826 and #99026 to allow for additional input.
2024-08-02 21:16:48 +08:00
Tim Creech
01d783643a
[llvm-profgen] Add --sample-period to estimate absolute counts (#99826)
Without `--sample-period`, no assumptions are made about perf profile
sample frequencies. This is useful for comparing relative hotness of
different program locations within the same profile.

With `--sample-period`, LBR- and IP-based profile hit counts are
adjusted to estimate the absolute total event count for each program
location. This makes it reasonable to compare hit counts between
different profiles, e.g., between two LBR-based execution frequency
profiles with different sampling periods or between LBR-based execution
frequency profiles and IP-based branch mispredict profiles.

This functionality is in support of HWPGO[^1], which aims to enable
feedback from a wider range of hardware events.

[^1]:
https://llvm.org/devmtg/2024-04/slides/TechnicalTalks/Xiao-EnablingHW-BasedPGO.pdf
2024-07-22 17:22:36 +08:00
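The adjustment itself is simple arithmetic; a hedged sketch of the idea (not the actual implementation): with a fixed sampling period P, each recorded sample stands for roughly P occurrences of the event.

```
#include <cstdint>

// Illustrative only: estimate the absolute event count for a location
// from its sample hit count and the configured sampling period.
uint64_t estimateAbsoluteCount(uint64_t SampleHits, uint64_t SamplePeriod) {
  return SampleHits * SamplePeriod;
}
```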
Tim Creech
0caf0c93e7
[llvm-profgen] Support creating profiles of arbitrary events (#99026)
This change introduces two options which may be used to create profiles
of arbitrary PMU events.

1. `--leading-ip-only` provides a simple sample-IP-based profile mode.
This is not useful for building a profile of execution frequency, but it
is useful for building new types of profiles.

   For example, to build a profile of unpredictable branches:

```
perf record -b -e branch-misses:upp -o perf.data ...
llvm-profgen --perfdata perf.data --leading-ip-only ...
```

2. `--perf-event=event` enables the creation of a profile concerned with
a specific event or set of events. The names given should match the
"event" field as emitted by perf-script(1).

This option has two spellings: `--perf-event` and `--perf-events`. The
plural spelling accepts a comma-separated list. The singular spelling
appends a single event name to the set of events which will be used.
This is meant to accommodate event names containing commas.

Combined, these options allow generating multiple kinds of profiles from
a single `perf record` collection. For example, to generate both
execution frequency and branch mispredict profiles:

```
perf record -c 1000003 -b -e br_inst_retired.near_taken:upp,br_misp_retired.all_branches:upp ...
llvm-profgen --output execution.prof --perf-event=br_inst_retired.near_taken:upp ...
llvm-profgen --leading-ip-only --output unpredictable.prof --perf-event=br_misp_retired.all_branches:upp ...
```

These additions are in support of more general HWPGO[^1], allowing
feedback from a wider range of hardware events.

[^1]:
https://llvm.org/devmtg/2024-04/slides/TechnicalTalks/Xiao-EnablingHW-BasedPGO.pdf

---------

Co-authored-by: Tim Creech <tcreech@tcreech.com>
2024-07-21 15:04:27 +08:00
Mircea Trofin
ce03155a1b
[NFC] Coding style fixes: SampleProf (#98208)
Also some control flow simplifications.

Notably, this doesn't address `sampleprof_error`. I *think* the style
there tries to match `std::error_category`.

Also left `hash_value` as-is, because it matches what we do in Hashing.h
2024-07-09 14:35:49 -07:00
Amir Ayupov
1d5ac72cd2
[MC][NFC] Fix typo in MCPseudoProbeFrameLocation (#98090) 2024-07-08 20:40:07 -07:00
Kazu Hirata
75bc20ff89
[llvm] Remove redundant calls to std::unique_ptr<T>::get (NFC) (#97914) 2024-07-07 08:23:41 +09:00
Jay Foad
d4a0154902
[llvm-project] Fix typo "seperate" (#95373) 2024-06-13 20:20:27 +01:00
xur-llvm
2fa6eaf93b
[llvm-profgen] Add support for Linux kernel profile (#92831)
Add support for handling Linux kernel perf files. The functionality is
under the option -kernel. Note that currently only the main kernel (in
vmlinux) is handled; kernel modules are not handled.

---------

Co-authored-by: Han Shen <shenhan@google.com>
2024-06-13 10:21:46 -07:00
Lei Wang
b9d40a7ae4
[llvm-profgen] Improve sample profile density (#92144)
The profile density feature (the amount of samples in the profile
relative to the program size) is used to identify insufficient-sample
issues and provide hints for users to increase the sample count. A
low-density profile can be inaccurate due to statistical noise, which can
hurt FDO performance.

This change introduces two improvements to the current density work.
1. The density calculation/definition is changed. Previously, the
density of a profile was calculated as the minimum density over all warm
functions (a function was considered warm if its total samples were
within the top N percent of the profile). However, a profile with a high
total sample count can still have a very low density, which makes the
density value unstable.
- Instead, we want to find a density number such that if a function's
density is below this value, it is considered a low-density function. We
consider the whole profile bad if a group of low-density functions has a
sum of samples that exceeds the N percent cut-off of the total samples.

- In the implementation, we sort the function profiles by density,
iterate over them in descending order, and keep accumulating the body
samples until the sum exceeds (100% - N) of the total samples; the
profile density is then the last (minimum) function density among the
processed functions. We introduce a flag (`--profile-density-threshold`)
for this percentage threshold.

2. The density is now calculated based on the final (compiler-used)
profiles instead of the merged context-less profiles.
2024-05-24 14:37:24 -04:00
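A sketch of the new density computation described in point 1, with illustrative names only: sort function profiles by density in descending order, accumulate body samples until they exceed (100% - N) of the total, and report the density of the last function processed.

```
#include <algorithm>
#include <cstdint>
#include <vector>

struct FuncProfileStats {
  double Density;       // samples relative to the function's size
  uint64_t BodySamples;
};

// Illustrative only: returns the profile-wide density value.
double computeProfileDensity(std::vector<FuncProfileStats> Funcs,
                             uint64_t TotalSamples,
                             double ThresholdPercent /* the N% cut-off */) {
  std::sort(Funcs.begin(), Funcs.end(),
            [](const auto &A, const auto &B) { return A.Density > B.Density; });
  const double Target = TotalSamples * (100.0 - ThresholdPercent) / 100.0;
  uint64_t Accumulated = 0;
  double ProfileDensity = 0.0;
  for (const auto &F : Funcs) {
    Accumulated += F.BodySamples;
    ProfileDensity = F.Density; // density of the last processed function
    if (Accumulated >= Target)
      break;
  }
  return ProfileDensity;
}
```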
Haohai Wen
8f5a2325c3
[llvm-profgen] Trim tail CR+LF for LBR record line (#93210)
On Windows, a perf script generated by SEP contains CR+LF at the end of
each LBR record line. The '\r' is treated as an LBR record when running
llvm-profgen on Linux and then generates a warning.
2024-05-24 09:03:53 +08:00
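The fix amounts to stripping trailing carriage returns before parsing; a minimal sketch of that step, assuming a StringRef-based line (illustrative helper, not the actual parser code):

```
#include "llvm/ADT/StringRef.h"

// Illustrative only: drop a trailing CR (and LF) so a Windows-generated
// "\r" is not mistaken for an extra LBR entry on the line.
llvm::StringRef trimTrailingCRLF(llvm::StringRef Line) {
  return Line.rtrim("\r\n");
}
```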
Haohai Wen
3f7f446d38
[llvm-profgen] Remove temporary perf script files (#86668)
The temporary perf script files converted from perf data can occupy a
lot of space for large projects. This patch removes them when
llvm-profgen exits normally or receives signals.
2024-04-11 15:28:32 +08:00
Haohai Wen
8c03f400a8
[llvm-profgen] Support COFF binary (#83972)
Intel VTune/SEP supports collecting LBR on Windows and generating a
perf-script file in the same format as a Linux perf script. This patch
teaches llvm-profgen to disassemble COFF binaries so that we can do
sampling-based PGO on Windows.
2024-03-15 09:02:26 +08:00
Matthias Braun
8466ab98ca
llvm-profgen: Fix race condition (#83489)
Fix a race condition when multiple instances of `llvm-profgen` read from
the same inputs.
2024-02-29 14:53:11 -08:00
Lei Wang
24f025175d
[llvm-profgen] Filter out ambiguous cold profiles during profile generation (#81803)
For the built-in local initialization functions (the
`__cxx_global_var_init` and `__tls_init` prefixes), there can be multiple
versions of a function in the final binary. For example,
`__cxx_global_var_init`, which is a wrapper for global variable ctors, can
be given suffixes like `__cxx_global_var_init.N` by the compiler for
different ctors.
However, during profile generation we call `getCanonicalFnName` to
canonicalize the names, which strips the suffixes. Therefore, samples from
different functions query the same profile (only `__cxx_global_var_init`)
and the counts are merged. As the functions are essentially different, the
entries of the merged profile are ambiguous. In sample loading, for each
version of this function, the IR of that version is attributed towards the
merged entries, which is inaccurate. This is especially bad for fuzzy
profile matching: it gets multiple callsites (from different functions)
that are used to match one callsite, which misleads the matching and
reports a lot of false positives.
Hence, we want to filter them out of the profile map during profile
generation. These profiles are all cold functions, so this has no
performance impact.
2024-02-16 14:29:24 -08:00
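A rough sketch of the filtering idea, using a plain std::map and hypothetical names rather than the real SampleProfileMap: drop profiles whose canonical names collide across the suffixed local-initialization functions.

```
#include <iterator>
#include <map>
#include <string>

struct FunctionSamplesStub {}; // stand-in for illustration

// Illustrative only: erase ambiguous merged profiles for
// __cxx_global_var_init / __tls_init; they are cold anyway.
void filterAmbiguousProfiles(std::map<std::string, FunctionSamplesStub> &Profiles) {
  for (auto It = Profiles.begin(); It != Profiles.end();) {
    const std::string &Name = It->first;
    bool Ambiguous = Name.rfind("__cxx_global_var_init", 0) == 0 ||
                     Name.rfind("__tls_init", 0) == 0;
    It = Ambiguous ? Profiles.erase(It) : std::next(It);
  }
}
```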
Nathan Lanza
7ff2dc3b49
[profgen] Use a 64bit integer for &'ing the loadable address (#79930)
For the Linux kernel, the loadable segments start at 0xffff..., so the
32-bit integer here was truncating all the meaningful bits. Grow it to
64 bits.
2024-01-30 13:10:22 -05:00
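A small sketch of the truncation bug being fixed (illustrative values, not the actual code): kernel segments load at addresses above 2^32, so any 32-bit intermediate in the masking drops the meaningful high bits.

```
#include <cstdint>

// Illustrative only: e.g. Addr = 0xffffffff81000000.
uint64_t pageAlign(uint64_t Addr) {
  // uint32_t Aligned = Addr & ~0xfffU;       // bug: truncates to 32 bits
  uint64_t Aligned = Addr & ~uint64_t(0xfff); // fix: stay in 64-bit arithmetic
  return Aligned;
}
```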
Benjamin Kramer
9423e45987 [ProfileData] Copy CallTargetMaps a bit less. NFCI 2023-12-24 17:48:18 +01:00
Kazu Hirata
586ecdf205
[llvm] Use StringRef::{starts,ends}_with (NFC) (#74956)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-11 21:01:36 -08:00
Kazu Hirata
92c2529ccd [llvm] Stop including vector (NFC)
Identified with clangd.
2023-12-03 22:32:21 -08:00
Kazu Hirata
a7627f721f [llvm] Stop including list (NFC)
Identified with clangd.
2023-12-02 10:47:26 -08:00
Hongtao Yu
7a3db658d9
[llvm-profgen] More tweaks to warnings (#68608)
Tweak warnings further to avoid flooding the user log.
2023-10-22 17:00:14 -07:00
William Junda Huang
ef0e0adccd
[llvm-profdata] Do not create numerical strings for MD5 function names read from a Sample Profile. (#66164)
This is phase 2 of the MD5 refactoring on Sample Profile following
https://reviews.llvm.org/D147740
    
In the previous implementation, when an MD5 sample profile is read, the
reader first converts the MD5 values to strings and then creates a
StringRef as if the numerical strings were regular function names; later,
IPO transformation passes perform string comparisons over these numerical
strings for profile matching. This is inefficient since it causes many
small heap allocations.
In this patch I created a class `ProfileFuncRef` that is similar to
`StringRef` but can represent a hash value directly without any
conversion, and it is more efficient (I will attach some benchmark
results later) when used in associative containers.

ProfileFuncRef guarantees that the same function name in string form or
in MD5 form has the same hash value, which also fixes a few issues in IPO
passes where function matching/lookup only checks the function name
string and returns a no-match if the profile is MD5.

When testing on an internal large profile (> 1 GB, with more than 10
million functions), the full profile load time is reduced from 28 s to
25 s on average, and reading the function offset table improves from
0.78 s to 0.7 s.
2023-10-17 21:09:39 +00:00
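A greatly simplified sketch of the idea behind `ProfileFuncRef` (not the actual class): hold either a real name or a bare MD5 value and hash both forms identically, so string-based and MD5-based profiles agree in associative containers without allocating numerical strings.

```
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/MD5.h"
#include <cstdint>

class FuncRefSketch {
  llvm::StringRef Name; // empty when only the MD5 value is known
  uint64_t Hash = 0;

public:
  explicit FuncRefSketch(llvm::StringRef N) : Name(N), Hash(llvm::MD5Hash(N)) {}
  explicit FuncRefSketch(uint64_t MD5Value) : Hash(MD5Value) {}

  // The same function yields the same key whether it came from a string
  // profile or an MD5 profile.
  uint64_t getHashCode() const { return Hash; }
  bool operator==(const FuncRefSketch &O) const { return Hash == O.Hash; }
};
```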
Hongtao Yu
967767830d
[llvm-profgen] Print DWP related warnings under show-detailed-warning (#68019)
Print DWP-related warnings under show-detailed-warning so that they
won't flood the user log.
2023-10-03 22:23:12 -07:00
Tom Stellard
e7247f1010 [profiling] Move option declarations into headers
This will make it possible to add visibility attributes to these
variables.  This also fixes some type mismatches between the
declaration and the definition.

Reviewed By: bogner, huangjd

Differential Revision: https://reviews.llvm.org/D156599
2023-09-30 18:51:28 -07:00
Hongtao Yu
47669af47f
[llvm-profgen] Ignore inline frames with an empty function name (#66678)
Broken debug information can give empty names for an inlined frame, e.g.:
```

0x1d605c68: ryKeyINS7_17SmartCounterTypesEEESt10shared_ptrINS7_15AsyncCacheValueIS9_EEESaIhESt6atomicEEE9fetch_subElSt12memory_order at   Filename: edata.h
  Function start filename: edata.h
  Function start line: 266
  Function start address: 0x1d605c68
  Line: 267
  Column: 0
 (inlined by)  at   Filename: edata.h
  Function start filename: edata.h
  Function start line: 274
  Function start address: 0x1d605c68
  Line: 275
  Column: 0
 (inlined by) _EEEmmEv at   Filename: arena.c
  Function start filename: arena.c
  Function start line: 1303
  Line: 1308
  Column: 0
```

This patch avoids creating a sample context with an empty function name
by stopping tracking at that frame. This prevents a hash failure that
leads to an ICE, where an empty context serves as an empty key for the
underlying MapVector:
7624de5bea/llvm/lib/ProfileData/SampleProfWriter.cpp (L261)
2023-09-18 12:40:06 -07:00
Fangrui Song
111fcb0df0 [llvm] Fix duplicate word typos. NFC
Those fixes were taken from https://reviews.llvm.org/D137338
2023-09-01 18:25:16 -07:00
Takuya Shimizu
01b88dd66d [NFC] Remove unused variables declared in conditions
D152495 makes clang warn on unused variables that are declared in conditions like `if (int var = init) {}`.
This patch is an NFC fix to suppress the new warning in llvm, clang, and lld builds so CI passes for the above patch.

Differential Revision: https://reviews.llvm.org/D158016
2023-08-30 10:05:06 +09:00
William Huang
7624de5bea [llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map
This is phase 1 of multiple planned improvements to the sample profile loader. The major change is to use the MD5 hash code (instead of the function name itself) as the key to look up the function offset table and the profiles, which significantly reduces the time it takes to construct the map.

The optimization is based on the fact that many practical sample profiles use MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use that as the map's key, because doing so is extremely slow.

Several changes to note:

(1) For a non-CS SampleContext, if it is already an MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent an MD5 function (without converting it to a string) and regular function names.

(2) SampleProfileMap is a wrapper around a map<uint64_t, FunctionSamples>, while providing an interface that allows using SampleContext as the key, so that existing code still works. It checks for MD5 collisions (unlikely but not too unlikely, since we only take the lower 64 bits) and handles them to at least guarantee compilation correctness (the conflicting old profile is dropped, instead of returning an old profile with an inconsistent context). Other code should not try to use MD5 as the key to access the map directly, because it will not be able to handle MD5 collisions at all (see the exception at (5)).

(3) Any SampleProfileMap::emplace() followed by a SampleContext assignment if newly inserted should be replaced with SampleProfileMap::Create(), which does the same thing.

(4) Previously we ensured an invariant that in SampleProfileMap the key equals the Context of the value, for profile maps that are eventually used for output (as in llvm-profdata/llvm-profgen). Since the key became an MD5 hash, only the value keeps the context now; in several places where an intermediate SampleProfileMap is created, each new FunctionSamples' context is set immediately after insertion, which is necessary to "remember" the context that would otherwise be irretrievable.

(5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (once to index into FuncOffsetTable, once into SampleProfileMap, more if there are additional sections); in this case the SampleProfileMap is accessed directly with the MD5 value so that we don't recalculate it each time (expensive).

Performance impact:
When reading a ~1GB extbinary profile (fixed-length MD5, not compressed) with 10 million function names and 2.5 million top-level functions (non-CS functions, each with a nesting level varying from 0 to 20), this patch improves the function offset table loading time by 20% and the full profile read time by 5%.

Reviewed By: davidxl, snehasish

Differential Revision: https://reviews.llvm.org/D147740
2023-08-17 20:10:45 +00:00
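A much-reduced sketch of the wrapper described in point (2), with stand-in types (not the real SampleContext/FunctionSamples): the map is keyed by the 64-bit hash, the value remembers its context, and Create() replaces the old emplace-then-assign pattern from point (3).

```
#include <cstdint>
#include <unordered_map>

struct ContextStub {
  uint64_t Hash = 0; // stand-in for a real SampleContext
  uint64_t getHashCode() const { return Hash; }
};
struct SamplesStub { ContextStub Context; };

class HashKeyedProfileMap {
  std::unordered_map<uint64_t, SamplesStub> Impl;

public:
  SamplesStub &Create(const ContextStub &Ctx) {
    SamplesStub &S = Impl[Ctx.getHashCode()]; // hash is the real key
    S.Context = Ctx;                          // value keeps the context
    return S;
  }
  SamplesStub *find(const ContextStub &Ctx) {
    auto It = Impl.find(Ctx.getHashCode());
    return It == Impl.end() ? nullptr : &It->second;
  }
};
```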
Aaron Ballman
1a53b5c367 Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map"
This reverts commit 66ba71d913df7f7cd75e92c0c4265932b7c93292.

Addressing issues found by:
https://lab.llvm.org/buildbot/#/builders/245/builds/11732
https://lab.llvm.org/buildbot/#/builders/187/builds/12251
https://lab.llvm.org/buildbot/#/builders/186/builds/11099
https://lab.llvm.org/buildbot/#/builders/182/builds/6976
2023-07-28 09:41:38 -04:00
William Huang
66ba71d913 [llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map
This is phase 1 of multiple planned improvements to the sample profile loader. The major change is to use the MD5 hash code (instead of the function name itself) as the key to look up the function offset table and the profiles, which significantly reduces the time it takes to construct the map.

The optimization is based on the fact that many practical sample profiles use MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use that as the map's key, because doing so is extremely slow.

Several changes to note:

(1) For a non-CS SampleContext, if it is already an MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent an MD5 function (without converting it to a string) and regular function names.

(2) SampleProfileMap is a wrapper around a map<uint64_t, FunctionSamples>, while providing an interface that allows using SampleContext as the key, so that existing code still works. It checks for MD5 collisions (unlikely but not too unlikely, since we only take the lower 64 bits) and handles them to at least guarantee compilation correctness (the conflicting old profile is dropped, instead of returning an old profile with an inconsistent context). Other code should not try to use MD5 as the key to access the map directly, because it will not be able to handle MD5 collisions at all (see the exception at (5)).

(3) Any SampleProfileMap::emplace() followed by a SampleContext assignment if newly inserted should be replaced with SampleProfileMap::Create(), which does the same thing.

(4) Previously we ensured an invariant that in SampleProfileMap the key equals the Context of the value, for profile maps that are eventually used for output (as in llvm-profdata/llvm-profgen). Since the key became an MD5 hash, only the value keeps the context now; in several places where an intermediate SampleProfileMap is created, each new FunctionSamples' context is set immediately after insertion, which is necessary to "remember" the context that would otherwise be irretrievable.

(5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (once to index into FuncOffsetTable, once into SampleProfileMap, more if there are additional sections); in this case the SampleProfileMap is accessed directly with the MD5 value so that we don't recalculate it each time (expensive).

Performance impact:
When reading a ~1GB extbinary profile (fixed-length MD5, not compressed) with 10 million function names and 2.5 million top-level functions (non-CS functions, each with a nesting level varying from 0 to 20), this patch improves the function offset table loading time by 20% and the full profile read time by 5%.

Reviewed By: davidxl, snehasish

Differential Revision: https://reviews.llvm.org/D147740
2023-07-27 23:08:27 +00:00
Hongtao Yu
4bdc938ce3 [CSSPGO][Preinliner] Always inline zero-sized functions.
Zero-sized functions should be cost-free in terms of size budget, so they should be considered during inlining even if we run out of size budget.

This appears to give a 0.5% win for one of our internal services.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D153820
2023-06-27 17:06:24 -07:00
Haojian Wu
58056ae299 Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map"
This reverts commit 12e9c7aaa66b7624b5d7666ce2794d912bf9e4b7.

The commit broke the buildbot; see the comment at https://reviews.llvm.org/D147740#4451540
2023-06-27 15:19:35 +02:00
Hongtao Yu
5cbcaf1678 [CSSPGO][Preinliner] Bump up the threshold to favor previous compiler inline decision.
The compiler has more insight and knowledge about functions based on their IR and attributes, and should make better inline decisions than the offline preinliner, which is based purely on callsite hotness and code size. Therefore I'm making changes to favor the previous compiler inline decisions by bumping up the callsite allowance.

This should improve performance by more than 1% according to testing on Meta services.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D153797
2023-06-26 17:21:22 -07:00
William Huang
12e9c7aaa6 [llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map
This is phase 1 of multiple planned improvements to the sample profile loader. The major change is to use the MD5 hash code (instead of the function name itself) as the key to look up the function offset table and the profiles, which significantly reduces the time it takes to construct the map.

The optimization is based on the fact that many practical sample profiles use MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use that as the map's key, because doing so is extremely slow.

Several changes to note:

(1) For a non-CS SampleContext, if it is already an MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent an MD5 function (without converting it to a string) and regular function names.

(2) SampleProfileMap is a wrapper around a map<uint64_t, FunctionSamples>, while providing an interface that allows using SampleContext as the key, so that existing code still works. It checks for MD5 collisions (unlikely but not too unlikely, since we only take the lower 64 bits) and handles them to at least guarantee compilation correctness (the conflicting old profile is dropped, instead of returning an old profile with an inconsistent context). Other code should not try to use MD5 as the key to access the map directly, because it will not be able to handle MD5 collisions at all (see the exception at (5)).

(3) Any SampleProfileMap::emplace() followed by a SampleContext assignment if newly inserted should be replaced with SampleProfileMap::Create(), which does the same thing.

(4) Previously we ensured an invariant that in SampleProfileMap the key equals the Context of the value, for profile maps that are eventually used for output (as in llvm-profdata/llvm-profgen). Since the key became an MD5 hash, only the value keeps the context now; in several places where an intermediate SampleProfileMap is created, each new FunctionSamples' context is set immediately after insertion, which is necessary to "remember" the context that would otherwise be irretrievable.

(5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (once to index into FuncOffsetTable, once into SampleProfileMap, more if there are additional sections); in this case the SampleProfileMap is accessed directly with the MD5 value so that we don't recalculate it each time (expensive).

Performance impact:
When reading a ~1GB extbinary profile (fixed-length MD5, not compressed) with 10 million function names and 2.5 million top-level functions (non-CS functions, each with a nesting level varying from 0 to 20), this patch improves the function offset table loading time by 20% and the full profile read time by 5%.

Reviewed By: davidxl, snehasish

Differential Revision: https://reviews.llvm.org/D147740
2023-06-27 00:06:05 +00:00
Wenlei He
1a0d23efe1 [NFC] Generalize llvm-profgen message to cover both AutoFDO and CSSPGO
Update llvm-profgen profile density message to cover both AutoFDO and CSSPGO.

Differential Revision: https://reviews.llvm.org/D153730
2023-06-26 09:47:56 -07:00
Douglas Yung
c9a8a0e8a9 Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map"
This reverts commit 31af18bccea95fe1ae8aa2c51cf7c8e92a1c208e.

This change is causing build failures on many Windows build bots:

https://lab.llvm.org/buildbot/#/builders/216/builds/22833
https://lab.llvm.org/buildbot/#/builders/123/builds/19602
https://lab.llvm.org/buildbot/#/builders/172/builds/28315
https://lab.llvm.org/buildbot/#/builders/119/builds/13870
https://lab.llvm.org/buildbot/#/builders/233/builds/794
https://lab.llvm.org/buildbot/#/builders/235/builds/387
https://lab.llvm.org/buildbot/#/builders/13/builds/36921
https://lab.llvm.org/buildbot/#/builders/127/builds/50510
2023-06-23 17:58:22 -07:00
William Huang
31af18bcce [llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map
This is phase 1 of multiple planned improvements to the sample profile loader. The major change is to use the MD5 hash code (instead of the function name itself) as the key to look up the function offset table and the profiles, which significantly reduces the time it takes to construct the map.

The optimization is based on the fact that many practical sample profiles use MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use that as the map's key, because doing so is extremely slow.

Several changes to note:

(1) For a non-CS SampleContext, if it is already an MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent an MD5 function (without converting it to a string) and regular function names.

(2) SampleProfileMap is a wrapper around a map<uint64_t, FunctionSamples>, while providing an interface that allows using SampleContext as the key, so that existing code still works. It checks for MD5 collisions (unlikely but not too unlikely, since we only take the lower 64 bits) and handles them to at least guarantee compilation correctness (the conflicting old profile is dropped, instead of returning an old profile with an inconsistent context). Other code should not try to use MD5 as the key to access the map directly, because it will not be able to handle MD5 collisions at all (see the exception at (5)).

(3) Any SampleProfileMap::emplace() followed by a SampleContext assignment if newly inserted should be replaced with SampleProfileMap::Create(), which does the same thing.

(4) Previously we ensured an invariant that in SampleProfileMap the key equals the Context of the value, for profile maps that are eventually used for output (as in llvm-profdata/llvm-profgen). Since the key became an MD5 hash, only the value keeps the context now; in several places where an intermediate SampleProfileMap is created, each new FunctionSamples' context is set immediately after insertion, which is necessary to "remember" the context that would otherwise be irretrievable.

(5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (once to index into FuncOffsetTable, once into SampleProfileMap, more if there are additional sections); in this case the SampleProfileMap is accessed directly with the MD5 value so that we don't recalculate it each time (expensive).

Performance impact:
When reading a ~1GB extbinary profile (fixed-length MD5, not compressed) with 10 million function names and 2.5 million top-level functions (non-CS functions, each with a nesting level varying from 0 to 20), this patch improves the function offset table loading time by 20% and the full profile read time by 5%.

Reviewed By: davidxl, snehasish

Differential Revision: https://reviews.llvm.org/D147740
2023-06-23 21:48:52 +00:00
Hongtao Yu
09742be818 [llvm-profgen] Remove target triple check to allow for more targets
llvm-profgen internally uses the LLVM libraries and the MCDesc interface to do disassembly and symbolization, and it never checks against target-specific instruction operators. This makes it quite transparent to targets, and a first attempt on an AArch64 binary just works. Therefore I'm removing the unnecessary triple check to unblock new targets.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D153449
2023-06-23 10:16:24 -07:00
Kazu Hirata
a3b9c1533e [tools] Use llvm::is_contained (NFC) 2023-06-19 23:36:14 -07:00
Mark Santaniello
27c37327da Avoid pointless canonicalize when using Dwarf names
A CPU profile indicated that memcmp was hot due to the two rfind calls in
getCanonicalFnName. If UseSymbolTable is false, we can avoid the cost entirely.

For CSSPGO profiles I've measured ~5% speedup with this change.

Profile similarity before/after matches 100%.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D151441
2023-05-25 08:14:11 -07:00
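A hedged sketch of the shortcut being described, with a hypothetical helper name (the actual getCanonicalFnName logic differs): when names come from DWARF rather than the symbol table, the suffix-stripping rfind work can be skipped outright.

```
#include "llvm/ADT/StringRef.h"

// Illustrative only: skip canonicalization entirely for DWARF names.
llvm::StringRef maybeCanonicalize(llvm::StringRef FnName, bool UseSymbolTable) {
  if (!UseSymbolTable)
    return FnName;                      // DWARF names need no stripping
  size_t Pos = FnName.rfind(".llvm.");  // suffix stripping, as one example
  if (Pos != llvm::StringRef::npos)
    FnName = FnName.substr(0, Pos);
  return FnName;
}
```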
Hongtao Yu
345fd0c10e [FS-AFDO] Generate pseudo-probe-based profiles with FS-discriminators.
This change enables generating pseudo-probe-based FS-AFDO profiles. The change is straightforward, based on the previous change {D147651}, by just injecting FS-discriminators into the various profile generation spots.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D147957
2023-05-10 11:28:54 -07:00
William Huang
d38d6ca179 [llvm-profdata] Deprecate Compact Binary Sample Profile Format
Remove support for compact binary sample profile format

Reviewed By: davidxl, wenlei

Differential Revision: https://reviews.llvm.org/D149400
2023-05-01 17:10:08 +00:00