[AArch64][BOLT] Ensure tentative code layout for cold BBs runs. (#96609)

When function splitting is used, BOLT may skip tentative code layout
estimation in some cases, for example:
- when there is no profile data for some blocks (i.e. cold blocks)
- when there are cold functions in lite mode
- when function skipping is used
     
However, when rewriting the binary we still need to compute PC-relative
distances between hot and cold basic blocks. Without cold layout
estimation, BOLT uses '0x0' as the address of the first cold block,
leading to incorrect estimations of any PC-relative addresses.
 
This affects large binaries: the relaxStub method expands more branches
than necessary into the short-jump sequence, because it wrongly believes
they exceed the branch distance boundary.
 
The short-jump sequence is both larger and slower than a direct branch,
so code size increases; however, the performance regression is expected
to be minimal, since only cold code that is actually called is affected.
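
The effect of the '0x0' placeholder on the range check can be sketched as follows. This is a minimal illustration, not BOLT's code: `fitsDirectBranch` and all three addresses are hypothetical names and values chosen for the example. The only fact relied on is that an AArch64 `b` encodes a signed 26-bit word offset, which gives a reach of roughly +/-128 MiB.

```cpp
#include <cassert>
#include <cstdint>

// AArch64 `b` holds a signed 26-bit word offset: targets must lie within
// [-2^27, 2^27) bytes of the branch.
constexpr int64_t BranchRange = int64_t(1) << 27;

// Hypothetical helper: true if a direct `b` at Src can reach Dst.
constexpr bool fitsDirectBranch(uint64_t Src, uint64_t Dst) {
  return static_cast<int64_t>(Dst) - static_cast<int64_t>(Src) >= -BranchRange &&
         static_cast<int64_t>(Dst) - static_cast<int64_t>(Src) < BranchRange;
}

// Illustrative addresses only (assumptions, not taken from a real binary).
constexpr uint64_t HotBranch = 0x10000000;   // branch site in the hot part
constexpr uint64_t ColdBlock = 0x10400000;   // real cold target, 4 MiB away
constexpr uint64_t Unestimated = 0x0;        // placeholder when layout is skipped

// With the real address the direct branch is fine...
static_assert(fitsDirectBranch(HotBranch, ColdBlock), "4 MiB is in range");
// ...but measured against 0x0 the same branch looks 256 MiB away.
static_assert(!fitsDirectBranch(HotBranch, Unestimated), "appears out of range");
```

In this sketch the branch is only 4 MiB from its real target, yet it appears out of range once the target address is taken to be 0x0, which is the kind of misjudgment that triggers the relaxation.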
 
Example of such an unnecessary relaxation:
from:
```armasm
b       .Ltmp1234
```
 
to:
```armasm
adrp    x16, .Ltmp1234
add     x16, x16, :lo12:.Ltmp1234
br      x16
```
Paschalis Mpeis 2024-10-17 10:59:05 +03:00 committed by GitHub
parent 1cc5290a30
commit cb9bacf57d
2 changed files with 41 additions and 9 deletions

@@ -324,9 +324,8 @@ uint64_t LongJmpPass::tentativeLayoutRelocColdPart(
 uint64_t LongJmpPass::tentativeLayoutRelocMode(
     const BinaryContext &BC, std::vector<BinaryFunction *> &SortedFunctions,
     uint64_t DotAddress) {
   // Compute hot cold frontier
-  uint32_t LastHotIndex = -1u;
+  int64_t LastHotIndex = -1u;
   uint32_t CurrentIndex = 0;
   if (opts::HotFunctionsAtEnd) {
     for (BinaryFunction *BF : SortedFunctions) {
@@ -351,19 +350,20 @@ uint64_t LongJmpPass::tentativeLayoutRelocMode(
   // Hot
   CurrentIndex = 0;
   bool ColdLayoutDone = false;
+  auto runColdLayout = [&]() {
+    DotAddress = tentativeLayoutRelocColdPart(BC, SortedFunctions, DotAddress);
+    ColdLayoutDone = true;
+    if (opts::HotFunctionsAtEnd)
+      DotAddress = alignTo(DotAddress, opts::AlignText);
+  };
   for (BinaryFunction *Func : SortedFunctions) {
     if (!BC.shouldEmit(*Func)) {
       HotAddresses[Func] = Func->getAddress();
       continue;
     }
-    if (!ColdLayoutDone && CurrentIndex >= LastHotIndex) {
-      DotAddress =
-          tentativeLayoutRelocColdPart(BC, SortedFunctions, DotAddress);
-      ColdLayoutDone = true;
-      if (opts::HotFunctionsAtEnd)
-        DotAddress = alignTo(DotAddress, opts::AlignText);
-    }
+    if (!ColdLayoutDone && CurrentIndex >= LastHotIndex)
+      runColdLayout();
     DotAddress = alignTo(DotAddress, Func->getMinAlignment());
     uint64_t Pad =
@@ -382,6 +382,11 @@ uint64_t LongJmpPass::tentativeLayoutRelocMode(
     DotAddress += Func->estimateConstantIslandSize();
     ++CurrentIndex;
   }
+  // Ensure that tentative code layout always runs for cold blocks.
+  if (!ColdLayoutDone)
+    runColdLayout();
   // BBs
   for (BinaryFunction *Func : SortedFunctions)
     tentativeBBLayout(*Func);
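
The control flow introduced by this change (run the cold step at the hot/cold frontier if the loop reaches it, otherwise once after the loop) can be sketched in isolation. The snippet below is a simplified model under stated assumptions: `Fn`, `LayoutResult`, `layout`, the start address, and the sizes are all hypothetical and are not BOLT types or values; only the shape of the `runCold` lambda and the post-loop fallback mirrors the diff.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a function being laid out; not a BOLT type.
struct Fn {
  bool Emit;     // whether the function is emitted at all
  uint64_t Size; // estimated size of its hot part
};

struct LayoutResult {
  uint64_t EndAddress;
  int ColdRuns; // how many times the cold layout step ran
};

// Lays out hot parts in order and runs the cold step exactly once: at the
// frontier if the loop reaches it, otherwise after the loop.
LayoutResult layout(const std::vector<Fn> &Fns, int64_t LastHotIndex,
                    uint64_t ColdSize) {
  uint64_t Dot = 0x1000; // arbitrary start address for the example
  int ColdRuns = 0;
  bool ColdDone = false;
  auto runCold = [&]() {
    Dot += ColdSize; // stands in for laying out all cold parts
    ++ColdRuns;
    ColdDone = true;
  };

  uint32_t Index = 0;
  for (const Fn &F : Fns) {
    if (!F.Emit)
      continue; // skipped functions never advance Index
    if (!ColdDone && Index >= LastHotIndex)
      runCold();
    Dot += F.Size;
    ++Index;
  }
  if (!ColdDone) // fallback: the frontier was never reached inside the loop
    runCold();
  return {Dot, ColdRuns};
}
```

In this model, calling `layout` with every function skipped, or with a frontier index that is never reached, still runs the cold step once through the fallback, so the cold part always receives a non-placeholder address.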

@@ -0,0 +1,27 @@
+# This test checks that tentative code layout for cold blocks always runs.
+# It commonly happens when using lite mode with split functions.
+# REQUIRES: system-linux, asserts
+# RUN: %clang %cflags -o %t %s
+# RUN: %clang %s %cflags -Wl,-q -o %t
+# RUN: link_fdata --no-lbr %s %t %t.fdata
+# RUN: llvm-bolt %t -o %t.bolt --data %t.fdata -split-functions \
+# RUN:   -debug 2>&1 | FileCheck %s
+  .text
+  .globl foo
+  .type foo, %function
+foo:
+.entry_bb:
+# FDATA: 1 foo #.entry_bb# 10
+    cmp x0, #0
+    b.eq .Lcold_bb1
+    ret
+.Lcold_bb1:
+    ret
+## Force relocation mode.
+.reloc 0, R_AARCH64_NONE
+# CHECK: foo{{.*}} cold tentative: {{.*}}