llvm-project/bolt/test/X86/reader-stale-yaml.test
Shaw Young 9a9af0a23f
[BOLT] Match blocks with pseudo probes (#99891)
Match inline trees first between profile and the binary: by GUID,
checksum, parent, and inline site for inlined functions. Map profile
probes to binary probes via matched inline tree nodes. Each binary probe
has an associated binary basic block. If all probes from one profile
basic block map to the same binary basic block, it’s an exact match,
otherwise the block is determined by majority vote and reported as loose
match.

Pseudo probe matching happens between exact hash matching and call/loose
matching.

Introduce ProbeMatchSpec - a mechanism to match probes belonging to
another binary function. For example, given functions foo and bar:
```
void foo() {
  bar();
}
```
profiled binary: bar is not inlined => have top-level function bar
new binary where the profile is applied to: bar is inlined into foo.

Currently, BOLT does 1:1 matching between profile functions and binary
functions based on the name. #100446 will extend this to N:M where
multiple profiles can be matched to one binary function (as in the
example above where binary function foo would use profiles for foo and
bar), and one profile can be matched to multiple binary functions (e.g.
if bar was inlined into multiple functions).

In this diff, ProbeMatchSpecs would only have one BinaryFunctionProfile
(existing name-based matching). 

Test Plan: Added match-blocks-with-pseudo-probes.test

Performance test:
- Setup:
  - Baseline no-BOLT: Clang with pseudo probes, ThinLTO + CSSPGO
  (#79942)
  - BOLT fresh: BOLTed Clang using fresh profile,
  - BOLT stale (hash): BOLTed Clang using stale profile (collected on
    Clang 10K commits back), `-infer-stale-profile` (hash+call block
    matching)
  - BOLT stale (+probe): BOLTed Clang using stale profile,
    `-infer-stale-profile` with `-stale-matching-with-pseudo-probes`
    (hash+call+pseudo probe block matching)
  - 2S Intel SKX Xeon 6138 with 40C/80T and 256GB RAM, using 20C/40T for
    build,
  - BOLT profiles are collected on Clang compiling large preprocessed
    C++ file.
- Benchmark: building Clang (average of 5 runs), see driver in
  aaupov/llvm-devmtg-2022
- Results, wall time, lower is better:
  - Baseline no-BOLT: 429.52 +- 2.61s,
  - BOLT stale (hash): 413.21 +- 2.19s,
  - BOLT stale (+probe): 409.69 +- 1.41s,
  - BOLT fresh: 384.50 +- 1.80s.

---------

Co-authored-by: Amir Ayupov <aaupov@fb.com>
2024-11-12 07:21:03 -08:00

111 lines
5.9 KiB
Plaintext

## This script checks that YamlProfileReader in llvm-bolt is reading data
## correctly and stale data is corrected by profile inference.
REQUIRES: asserts
RUN: yaml2obj %p/Inputs/blarge.yaml &> %t.exe
RUN: llvm-bolt %t.exe -o %t.null --b %p/Inputs/blarge_profile_stale.yaml \
RUN: --infer-stale-profile=0 --profile-ignore-hash=1 --profile-use-dfs=0 \
RUN: 2>&1 | FileCheck %s -check-prefix=CHECK0
## Testing "usqrt"
RUN: llvm-bolt %t.exe -o %t.null --b %p/Inputs/blarge_profile_stale.yaml \
RUN: --print-cfg --print-only=usqrt --infer-stale-profile=1 \
RUN: --profile-ignore-hash=1 --profile-use-dfs=0 --debug-only=bolt-prof 2>&1 | FileCheck %s -check-prefix=CHECK1
## Testing "SolveCubic"
RUN: llvm-bolt %t.exe -o %t.null --b %p/Inputs/blarge_profile_stale.yaml \
RUN: --print-cfg --print-only=SolveCubic --infer-stale-profile=1 \
RUN: --profile-ignore-hash=1 --profile-use-dfs=0 --debug-only=bolt-prof 2>&1 | FileCheck %s -check-prefix=CHECK2
## Testing skipped function
RUN: llvm-bolt %t.exe -o %t.null --b %p/Inputs/blarge_profile_stale.yaml \
RUN: --print-cfg --print-only=usqrt --infer-stale-profile=1 --skip-funcs=usqrt \
RUN: --profile-ignore-hash=1 --profile-use-dfs=0
CHECK0: BOLT-INFO: 2 out of 7 functions in the binary (28.6%) have non-empty execution profile
CHECK0: BOLT-WARNING: 2 (100.0% of all profiled) functions have invalid (possibly stale) profile
CHECK0: BOLT-WARNING: 1192 out of 1192 samples in the binary (100.0%) belong to functions with invalid (possibly stale) profile
## Function "usqrt" has stale profile, since the number of blocks in the profile
## (nblocks=6) does not match the size of the CFG in the binary. The entry
## block (bid=0) has an incorrect (missing) count, which should be inferred by
## the algorithm.
## Verify inference details.
CHECK1: pre-processing profile using YAML profile reader
CHECK1: applying profile inference for "usqrt"
CHECK1: Matched yaml block (bid = 0) with hash 1111111111111111 to BB (index = 0) with hash 36007ba1d80c0000
CHECK1-NEXT: loose match
CHECK1: Matched yaml block (bid = 1) with hash d70d7a64320e0010 to BB (index = 1) with hash d70d7a64320e0010
CHECK1-NEXT: exact match
CHECK1: Matched yaml block (bid = 3) with hash 5c06705524800039 to BB (index = 3) with hash 5c06705524800039
CHECK1-NEXT: exact match
## Verify that yaml reader works as expected.
CHECK1: Binary Function "usqrt" after building cfg {
CHECK1: State : CFG constructed
CHECK1: Address : 0x401170
CHECK1: Size : 0x43
CHECK1: Section : .text
CHECK1: IsSimple : 1
CHECK1: BB Count : 5
CHECK1: Exec Count : 20
CHECK1: Branch Count: 640
CHECK1: }
## Verify block counts.
CHECK1: .LBB01 (4 instructions, align : 1)
CHECK1: Successors: .Ltmp[[#BB13:]] (mispreds: 0, count: 20)
CHECK1: .Ltmp[[#BB13:]] (9 instructions, align : 1)
CHECK1: Successors: .Ltmp[[#BB12:]] (mispreds: 0, count: 320), .LFT[[#BB0:]] (mispreds: 0, count: 0)
CHECK1: .LFT[[#BB0:]] (2 instructions, align : 1)
CHECK1: Successors: .Ltmp[[#BB12:]] (mispreds: 0, count: 0)
CHECK1: .Ltmp[[#BB12:]] (2 instructions, align : 1)
CHECK1: Successors: .Ltmp[[#BB13:]] (mispreds: 0, count: 300), .LFT[[#BB1:]] (mispreds: 0, count: 20)
CHECK1: .LFT[[#BB1:]] (2 instructions, align : 1)
## Check the overall inference stats.
CHECK1: 2 out of 7 functions in the binary (28.6%) have non-empty execution profile
CHECK1: BOLT-WARNING: 2 (100.0% of all profiled) functions have invalid (possibly stale) profile
CHECK1: BOLT-WARNING: 1192 out of 1192 samples in the binary (100.0%) belong to functions with invalid (possibly stale) profile
CHECK1: inferred profile for 2 (100.00% of profiled, 100.00% of stale) functions responsible for {{.*}} samples ({{.*}} out of {{.*}})
## Function "SolveCubic" has stale profile, since there is one jump in the
## profile (from bid=13 to bid=2) which is not in the CFG in the binary. The test
## verifies that the inference is able to match two blocks (bid=1 and bid=13)
## using "loose" hashes and then correctly propagate the counts.
## Verify inference details.
CHECK2: pre-processing profile using YAML profile reader
CHECK2: applying profile inference for "SolveCubic"
CHECK2: Matched yaml block (bid = 0) with hash 4600940a609c0000 to BB (index = 0) with hash 4600940a609c0000
CHECK2-NEXT: exact match
CHECK2: Matched yaml block (bid = 13) with hash a8d50000f81902a7 to BB (index = 13) with hash a8d5aa43f81902a7
CHECK2-NEXT: loose match
CHECK2: Matched yaml block (bid = 1) with hash 167a1f084f130088 to BB (index = 1) with hash 167a1f084f130088
CHECK2-NEXT: exact match
CHECK2: Matched yaml block (bid = 3) with hash c516000073dc00a0 to BB (index = 3) with hash c516b1c973dc00a0
CHECK2-NEXT: loose match
CHECK2: Matched yaml block (bid = 5) with hash 6446e1ea500111 to BB (index = 5) with hash 6446e1ea500111
CHECK2-NEXT: exact match
## Verify that yaml reader works as expected.
CHECK2: Binary Function "SolveCubic" after building cfg {
CHECK2: State : CFG constructed
CHECK2: Address : 0x400e00
CHECK2: Size : 0x368
CHECK2: Section : .text
CHECK2: IsSimple : 1
CHECK2: BB Count : 18
CHECK2: Exec Count : 151
CHECK2: Branch Count: 552
## Verify block counts.
CHECK2: .LBB00 (43 instructions, align : 1)
CHECK2: Successors: .Ltmp[[#BB7:]] (mispreds: 0, count: 0), .LFT[[#BB1:]] (mispreds: 0, count: 151)
CHECK2: .LFT[[#BB1:]] (5 instructions, align : 1)
CHECK2: Successors: .Ltmp[[#BB13:]] (mispreds: 0, count: 151), .LFT[[#BB2:]] (mispreds: 0, count: 0)
CHECK2: .Ltmp[[#BB3:]] (26 instructions, align : 1)
CHECK2: Successors: .Ltmp[[#BB5:]] (mispreds: 0, count: 151), .LFT[[#BB4:]] (mispreds: 0, count: 0)
CHECK2: .Ltmp[[#BB5:]] (9 instructions, align : 1)
CHECK2: .Ltmp[[#BB13:]] (12 instructions, align : 1)
CHECK2: Successors: .Ltmp[[#BB3:]] (mispreds: 0, count: 151)
CHECK2: 2 out of 7 functions in the binary (28.6%) have non-empty execution profile