mirror of
https://github.com/llvm/llvm-project.git
synced 2025-04-17 16:06:40 +00:00

Match inline trees first between profile and the binary: by GUID,
checksum, parent, and inline site for inlined functions. Map profile
probes to binary probes via matched inline tree nodes. Each binary probe
has an associated binary basic block. If all probes from one profile
basic block map to the same binary basic block, it’s an exact match,
otherwise the block is determined by majority vote and reported as loose
match.

Pseudo probe matching happens between exact hash matching and call/loose
matching.

Introduce ProbeMatchSpec - a mechanism to match probes belonging to
another binary function. For example, given functions foo and bar:
```
void foo() {
  bar();
}
```
profiled binary: bar is not inlined => have top-level function bar
new binary where the profile is applied to: bar is inlined into foo.

Currently, BOLT does 1:1 matching between profile functions and binary
functions based on the name. #100446 will extend this to N:M where
multiple profiles can be matched to one binary function (as in the
example above where binary function foo would use profiles for foo and
bar), and one profile can be matched to multiple binary functions (e.g.
if bar was inlined into multiple functions).

In this diff, ProbeMatchSpecs would only have one BinaryFunctionProfile
(existing name-based matching).

Test Plan: Added match-blocks-with-pseudo-probes.test

Performance test:
- Setup:
  - Baseline no-BOLT: Clang with pseudo probes, ThinLTO + CSSPGO (#79942)
  - BOLT fresh: BOLTed Clang using fresh profile,
  - BOLT stale (hash): BOLTed Clang using stale profile (collected on
    Clang 10K commits back), `-infer-stale-profile` (hash+call block
    matching)
  - BOLT stale (+probe): BOLTed Clang using stale profile,
    `-infer-stale-profile` with `-stale-matching-with-pseudo-probes`
    (hash+call+pseudo probe block matching)
  - 2S Intel SKX Xeon 6138 with 40C/80T and 256GB RAM, using 20C/40T for
    build,
  - BOLT profiles are collected on Clang compiling large preprocessed C++
    file.
- Benchmark: building Clang (average of 5 runs), see driver in
  aaupov/llvm-devmtg-2022
- Results, wall time, lower is better:
  - Baseline no-BOLT: 429.52 +- 2.61s,
  - BOLT stale (hash): 413.21 +- 2.19s,
  - BOLT stale (+probe): 409.69 +- 1.41s,
  - BOLT fresh: 384.50 +- 1.80s.

---------

Co-authored-by: Amir Ayupov <aaupov@fb.com>
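
The exact/loose rule from the first paragraph of the commit message can be illustrated with a small Python sketch. This is not BOLT's implementation: the function name `match_profile_block`, the dictionary `probe_to_binary_block`, and the tie-breaking rule (smallest block index wins) are all hypothetical and chosen only to make the example deterministic.

```python
from collections import Counter


def match_profile_block(probe_ids, probe_to_binary_block):
    """Map one profile basic block to a binary basic block.

    probe_ids: probes that belong to the profile block.
    probe_to_binary_block: hypothetical mapping from a probe to the index of
        the binary block that contains it (assumed to come from inline-tree
        matching, which is not modelled here).
    Returns (block_index, "exact" | "loose"), or (None, None) if no probe
    of the profile block could be mapped.
    """
    votes = Counter(
        probe_to_binary_block[p] for p in probe_ids if p in probe_to_binary_block
    )
    if not votes:
        return None, None
    # Majority vote. Ties go to the smallest block index (an assumption of
    # this sketch, not something the commit message specifies).
    block, _ = min(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    # All probes agreeing on one binary block is an exact match.
    kind = "exact" if len(votes) == 1 else "loose"
    return block, kind


probe_to_binary_block = {1: 0, 2: 0, 3: 4, 4: 4, 5: 5}
print(match_profile_block([1, 2], probe_to_binary_block))     # (0, 'exact')
print(match_profile_block([3, 4, 5], probe_to_binary_block))  # (4, 'loose')
```

In the second call two probes land in block 4 and one in block 5, so block 4 wins the vote and the match is reported as loose.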
111 lines
5.9 KiB
Plaintext
## This script checks that YamlProfileReader in llvm-bolt is reading data
## correctly and stale data is corrected by profile inference.

REQUIRES: asserts
RUN: yaml2obj %p/Inputs/blarge.yaml &> %t.exe
RUN: llvm-bolt %t.exe -o %t.null --b %p/Inputs/blarge_profile_stale.yaml \
RUN: --infer-stale-profile=0 --profile-ignore-hash=1 --profile-use-dfs=0 \
RUN: 2>&1 | FileCheck %s -check-prefix=CHECK0
## Testing "usqrt"
RUN: llvm-bolt %t.exe -o %t.null --b %p/Inputs/blarge_profile_stale.yaml \
RUN: --print-cfg --print-only=usqrt --infer-stale-profile=1 \
RUN: --profile-ignore-hash=1 --profile-use-dfs=0 --debug-only=bolt-prof 2>&1 | FileCheck %s -check-prefix=CHECK1
## Testing "SolveCubic"
RUN: llvm-bolt %t.exe -o %t.null --b %p/Inputs/blarge_profile_stale.yaml \
RUN: --print-cfg --print-only=SolveCubic --infer-stale-profile=1 \
RUN: --profile-ignore-hash=1 --profile-use-dfs=0 --debug-only=bolt-prof 2>&1 | FileCheck %s -check-prefix=CHECK2
## Testing skipped function
RUN: llvm-bolt %t.exe -o %t.null --b %p/Inputs/blarge_profile_stale.yaml \
RUN: --print-cfg --print-only=usqrt --infer-stale-profile=1 --skip-funcs=usqrt \
RUN: --profile-ignore-hash=1 --profile-use-dfs=0

CHECK0: BOLT-INFO: 2 out of 7 functions in the binary (28.6%) have non-empty execution profile
CHECK0: BOLT-WARNING: 2 (100.0% of all profiled) functions have invalid (possibly stale) profile
CHECK0: BOLT-WARNING: 1192 out of 1192 samples in the binary (100.0%) belong to functions with invalid (possibly stale) profile

## Function "usqrt" has stale profile, since the number of blocks in the profile
## (nblocks=6) does not match the size of the CFG in the binary. The entry
## block (bid=0) has an incorrect (missing) count, which should be inferred by
## the algorithm.

## Verify inference details.
CHECK1: pre-processing profile using YAML profile reader
CHECK1: applying profile inference for "usqrt"
CHECK1: Matched yaml block (bid = 0) with hash 1111111111111111 to BB (index = 0) with hash 36007ba1d80c0000
CHECK1-NEXT: loose match
CHECK1: Matched yaml block (bid = 1) with hash d70d7a64320e0010 to BB (index = 1) with hash d70d7a64320e0010
CHECK1-NEXT: exact match
CHECK1: Matched yaml block (bid = 3) with hash 5c06705524800039 to BB (index = 3) with hash 5c06705524800039
CHECK1-NEXT: exact match

## Verify that yaml reader works as expected.
CHECK1: Binary Function "usqrt" after building cfg {
CHECK1: State : CFG constructed
CHECK1: Address : 0x401170
CHECK1: Size : 0x43
CHECK1: Section : .text
CHECK1: IsSimple : 1
CHECK1: BB Count : 5
CHECK1: Exec Count : 20
CHECK1: Branch Count: 640
CHECK1: }

## Verify block counts.
CHECK1: .LBB01 (4 instructions, align : 1)
CHECK1: Successors: .Ltmp[[#BB13:]] (mispreds: 0, count: 20)
CHECK1: .Ltmp[[#BB13:]] (9 instructions, align : 1)
CHECK1: Successors: .Ltmp[[#BB12:]] (mispreds: 0, count: 320), .LFT[[#BB0:]] (mispreds: 0, count: 0)
CHECK1: .LFT[[#BB0:]] (2 instructions, align : 1)
CHECK1: Successors: .Ltmp[[#BB12:]] (mispreds: 0, count: 0)
CHECK1: .Ltmp[[#BB12:]] (2 instructions, align : 1)
CHECK1: Successors: .Ltmp[[#BB13:]] (mispreds: 0, count: 300), .LFT[[#BB1:]] (mispreds: 0, count: 20)
CHECK1: .LFT[[#BB1:]] (2 instructions, align : 1)
## Check the overall inference stats.
CHECK1: 2 out of 7 functions in the binary (28.6%) have non-empty execution profile
CHECK1: BOLT-WARNING: 2 (100.0% of all profiled) functions have invalid (possibly stale) profile
CHECK1: BOLT-WARNING: 1192 out of 1192 samples in the binary (100.0%) belong to functions with invalid (possibly stale) profile
CHECK1: inferred profile for 2 (100.00% of profiled, 100.00% of stale) functions responsible for {{.*}} samples ({{.*}} out of {{.*}})


## Function "SolveCubic" has stale profile, since there is one jump in the
## profile (from bid=13 to bid=2) which is not in the CFG in the binary. The test
## verifies that the inference is able to match two blocks (bid=1 and bid=13)
## using "loose" hashes and then correctly propagate the counts.

## Verify inference details.
CHECK2: pre-processing profile using YAML profile reader
CHECK2: applying profile inference for "SolveCubic"
CHECK2: Matched yaml block (bid = 0) with hash 4600940a609c0000 to BB (index = 0) with hash 4600940a609c0000
CHECK2-NEXT: exact match
CHECK2: Matched yaml block (bid = 13) with hash a8d50000f81902a7 to BB (index = 13) with hash a8d5aa43f81902a7
CHECK2-NEXT: loose match
CHECK2: Matched yaml block (bid = 1) with hash 167a1f084f130088 to BB (index = 1) with hash 167a1f084f130088
CHECK2-NEXT: exact match
CHECK2: Matched yaml block (bid = 3) with hash c516000073dc00a0 to BB (index = 3) with hash c516b1c973dc00a0
CHECK2-NEXT: loose match
CHECK2: Matched yaml block (bid = 5) with hash 6446e1ea500111 to BB (index = 5) with hash 6446e1ea500111
CHECK2-NEXT: exact match

## Verify that yaml reader works as expected.
CHECK2: Binary Function "SolveCubic" after building cfg {
CHECK2: State : CFG constructed
CHECK2: Address : 0x400e00
CHECK2: Size : 0x368
CHECK2: Section : .text
CHECK2: IsSimple : 1
CHECK2: BB Count : 18
CHECK2: Exec Count : 151
CHECK2: Branch Count: 552

## Verify block counts.
CHECK2: .LBB00 (43 instructions, align : 1)
CHECK2: Successors: .Ltmp[[#BB7:]] (mispreds: 0, count: 0), .LFT[[#BB1:]] (mispreds: 0, count: 151)
CHECK2: .LFT[[#BB1:]] (5 instructions, align : 1)
CHECK2: Successors: .Ltmp[[#BB13:]] (mispreds: 0, count: 151), .LFT[[#BB2:]] (mispreds: 0, count: 0)
CHECK2: .Ltmp[[#BB3:]] (26 instructions, align : 1)
CHECK2: Successors: .Ltmp[[#BB5:]] (mispreds: 0, count: 151), .LFT[[#BB4:]] (mispreds: 0, count: 0)
CHECK2: .Ltmp[[#BB5:]] (9 instructions, align : 1)
CHECK2: .Ltmp[[#BB13:]] (12 instructions, align : 1)
CHECK2: Successors: .Ltmp[[#BB3:]] (mispreds: 0, count: 151)
CHECK2: 2 out of 7 functions in the binary (28.6%) have non-empty execution profile
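
As a sanity check on what "corrected by profile inference" means for the "usqrt" counts above, the Python sketch below copies the edge counts from the CHECK1 "Successors" lines and verifies that every interior block receives as much flow as it sends out. The block names are made up from the FileCheck capture names (for example `Ltmp_BB13` for `.Ltmp[[#BB13:]]`), the helper `flow_imbalance` is hypothetical, and treating `LFT_BB1` as the only exit block is an assumption based on the test listing no successors for it.

```python
from collections import defaultdict

# Edge counts copied from the CHECK1 "Successors" lines for "usqrt".
edges = {
    ("LBB01", "Ltmp_BB13"): 20,
    ("Ltmp_BB13", "Ltmp_BB12"): 320,
    ("Ltmp_BB13", "LFT_BB0"): 0,
    ("LFT_BB0", "Ltmp_BB12"): 0,
    ("Ltmp_BB12", "Ltmp_BB13"): 300,
    ("Ltmp_BB12", "LFT_BB1"): 20,
}


def flow_imbalance(edges, entry, exits):
    """Return {block: inflow - outflow} for interior blocks that do not balance."""
    inflow, outflow = defaultdict(int), defaultdict(int)
    for (src, dst), count in edges.items():
        outflow[src] += count
        inflow[dst] += count
    blocks = set(inflow) | set(outflow)
    return {
        b: inflow[b] - outflow[b]
        for b in blocks
        if b != entry and b not in exits and inflow[b] != outflow[b]
    }


# Assumption: LFT_BB1 is the only exit block of the function.
print(flow_imbalance(edges, entry="LBB01", exits={"LFT_BB1"}))  # {}
```

The empty result means the checked counts form a consistent flow: 20 executions leave the entry block, which agrees with the `Exec Count : 20` line, and 20 reach the assumed exit block.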