llvm-project/bolt/src/BinaryContext.cpp

2173 lines
76 KiB
C++
Raw Normal View History

//===--- BinaryContext.cpp - Interface for machine-level context ---------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
//===----------------------------------------------------------------------===//
#include "BinaryContext.h"
#include "BinaryEmitter.h"
Update subroutine address ranges in binary. Summary: [WIP] Update DWARF info for function address ranges. This diff currently does not work for unknown reasons, but I'm describing here what's the current state. According to both llvm-dwarf and readelf our output seems correct, but GDB does not interpret it as expected. All details go below in hope I missed something. I couldn't actually track the whole change that introduced support for what we need in gdb yet, but I think I can get to it (2007-12-04: Support lexical bocks and function bodies that occupy non-contiguous address ranges). I have reasons to believe gdb at least at some nges). The set of introduced changes was basically this: - After disassembly, iterate over the DIEs in .debug_info and find the ones that correspond to each BinaryFunction. - Refactor DebugArangesWriter to also write addresses of functions to .debug_ranges and track the offsets of function address ranges there - Add some infrastructure to facilitate patching the binary in simple ways (BinaryPatcher.h) - In RewriteInstance, after writing .debug_ranges already with function address ranges, for each function do: -- Find the abbreviation corresponding to the function -- Patch .debug_abbrev to replace DW_AT_low_pc with DW_AT_ranges and DW_AT_high_pc with DW_AT_producer (I'll explain this hack below). Also patch the corresponding forms to DW_FORM_sec_offset and DW_FORM_string (null-terminated in-place string). -- Patch debug_info with the .debug_ranges offset in place of the first 4 bytes of DW_AT_low_pc (DW_AT_ranges only occupies 4 bytes whereas low_pc occupies 8), and write an arbitrary string in-place in the other 12 bytes that were the 4 MSB of low_pc and the 8 bytes of high_pc before the patch. This depends on low_pc and high_pc being put consecutively by the compiler, but it serves to validate the idea. I tried another way of doing it that does not rely on this but it didn't work either and I believe the reason for either not working is the same (and still unknown, but unrelated to them. I might be wrong though, and if I find yet another way of doing it I may try it). The other way was to use a form of DW_FORM_data8 for the section offset. This is disallowed by the specification, but I doubt gdb validates this, as it's just easier to store it as 64-bit anyway as this is even necessary to support 64-bit DWARF (which is not what gcc generates by default apparently). I still need to make changes to the diff to make it production-ready, but first I want to figure out why it doesn't work as expected. By looking at the output of llvm-dwarfdump or readelf, all of .debug_ranges, .debug_abbrev and .debug_info seem to have been correctly updated. However, gdb seems to have serious problems with what we write. (In fact, readelf --debug-dump=Ranges shows some funny warning messages of the form ("Warning: There is a hole [0x100 - 0x120] in .debug_ranges"), but I played around with this and it seems it's just because no compile unit was using these ranges. Changing .debug_info apparently changes these warnings, so they seem to be unrelated to the section itself. Also looking at the hex dump of the section doesn't help, as everything seems fine. llvm-dwarfdump doesn't say anything. So I think .debug_ranges is fine.) The result is that gdb not only doesn't show the function name as we wanted, but it also stops showing line number information. Apparently it's not reading/interpreting the address ranges at all, and so the functions now have no associated address ranges, only the symbol value which allows one to put a breakpoint in the function, but not to show source code. As this left me without more ideas of what to try to feed gdb with, I believe the most promising next trial is to try to debug gdb itself, unless someone spots anything I missed. I found where the interesting part of the code lies for this case (gdb/dwarf2read.c and some other related files, but mainly that one). It seems in some parts gdb uses DW_AT_ranges for only getting its lowest and highest addresses and setting that as low_pc and high_pc (see dwarf2_get_pc_bounds in gdb's code and where it's called). I really hope this is not actually the case for function address ranges. I'll investigate this further. Otherwise I don't think any changes we make will make it work as initially intended, as we'll simply need gdb to support it and in that case it doesn't. (cherry picked from FBD3073641)
2016-03-16 18:08:29 -07:00
#include "BinaryFunction.h"
#include "NameResolver.h"
[BOLT] Support for lite mode with relocations Summary: Add '-lite' support for relocations for improved processing time, memory consumption, and more resilient processing of binaries with embedded assembly code. In lite relocation mode, BOLT will skip full processing of functions without a profile. It will run scanExternalRefs() on such functions to discover external references and to create internal relocations to update references to optimized functions. Note that we could have relied on the compiler/linker to provide relocations for function references. However, there's no assurance that all such references are reported. E.g., the compiler can resolve inter-procedural references internally, leaving no relocations for the linker. The scan process takes about <10 seconds per 100MB of code on modern hardware. It's a reasonable overhead to live with considering the flexibility it provides. If BOLT fails to scan or disassemble a function, .e.g., due to a data object embedded in code, or an unsupported instruction, it enables a patching mode to guarantee that the failed function will call optimized/moved versions of functions. The patching happens at original function entry points. '-skip=<func1,func2,...>' option now can be used to skip processing of arbitrary functions in the relocation mode. With '-use-old-text' or '-strict' we require all functions to be processed. As such, it is incompatible with '-lite' option, and '-skip' option will only disable optimizations of listed functions, not their disassembly and emission. (cherry picked from FBD22040717)
2020-06-15 00:15:47 -07:00
#include "Utils.h"
#include "llvm/ADT/Twine.h"
#include "llvm/DebugInfo/DWARF/DWARFFormValue.h"
Update DWARF lexical blocks address ranges. Summary: Updates DWARF lexical blocks address ranges in the output binary after optimizations. This is similar to updating function address ranges except that the ranges representation needs to be more general, since address ranges can begin or end in the middle of a basic block. The following changes were made: - Added a data structure for iterating over the basic blocks that intersect an address range: BasicBlockTable.h - Added some more bookkeeping in BinaryBasicBlock. Basically, I needed to keep track of the block's size in the input binary as well as its address in the output binary. This information is mostly set by BinaryFunction after disassembly. - Added a representation for address ranges relative to basic blocks (BasicBlockOffsetRanges.h). Will also serve for location lists. - Added a representation for Lexical Blocks (LexicalBlock.h) - Small refactorings in DebugArangesWriter: -- Renamed to DebugRangesSectionsWriter since it also writes .debug_ranges -- Refactored it not to depend on BinaryFunction but instead on anything that can be assined an aoffset in .debug_ranges (added an interface for that) - Iterate over the DIE tree during initialization to find lexical blocks in .debug_info (BinaryContext.cpp) - Added patches to .debug_abbrev and .debug_info in RewriteInstance to update lexical blocks attributes (in fact, this part is very similar to what was done to function address ranges and I just refactored/reused that code) - Added small test case (lexical_blocks_address_ranges_debug.test) (cherry picked from FBD3113181)
2016-03-28 17:45:22 -07:00
#include "llvm/DebugInfo/DWARF/DWARFUnit.h"
#include "llvm/MC/MCAsmLayout.h"
#include "llvm/MC/MCAssembler.h"
#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCDisassembler/MCDisassembler.h"
#include "llvm/MC/MCInstPrinter.h"
#include "llvm/MC/MCObjectStreamer.h"
[BOLT rebase] Rebase fixes on top of LLVM Feb2018 Summary: This commit includes all code necessary to make BOLT working again after the rebase. This includes a redesign of the EHFrame work, cherry-pick of the 3dnow disassembly work, compilation error fixes, and port of the debug_info work. The macroop fusion feature is not ported yet. The rebased version has minor changes to the "executed instructions" dynostats counter because REP prefixes are considered a part of the instruction it applies to. Also, some X86 instructions had the "mayLoad" tablegen property removed, which BOLT uses to identify and account for loads, thus reducing the total number of loads reported by dynostats. This was observed in X86::MOVDQUmr. TRAP instructions are not terminators anymore, changing our CFG. This commit adds compensation to preserve this old behavior and minimize tests changes. debug_info sections are now slightly larger. The discriminator field in the line table is slightly different due to a change upstream. New profiles generated with the other bolt are incompatible with this version because of different hash values calculated for functions, so they will be considered 100% stale. This commit changes the corresponding test to XFAIL so it can be updated. The hash function changes because it relies on raw opcode values, which change according to the opcodes described in the X86 tablegen files. When processing HHVM, bolt was observed to be using about 800MB more memory in the rebased version and being about 5% slower. (cherry picked from FBD7078072)
2018-02-06 15:00:23 -08:00
#include "llvm/MC/MCObjectWriter.h"
#include "llvm/MC/MCSectionELF.h"
#include "llvm/MC/MCStreamer.h"
#include "llvm/MC/MCSymbol.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Regex.h"
#include <functional>
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
#include <iterator>
using namespace llvm;
#undef DEBUG_TYPE
#define DEBUG_TYPE "bolt"
namespace opts {
extern cl::OptionCategory BoltCategory;
extern cl::opt<bool> AggregateOnly;
extern cl::opt<bool> HotText;
extern cl::opt<bool> HotData;
extern cl::opt<bool> StrictMode;
[BOLT] Support for lite mode with relocations Summary: Add '-lite' support for relocations for improved processing time, memory consumption, and more resilient processing of binaries with embedded assembly code. In lite relocation mode, BOLT will skip full processing of functions without a profile. It will run scanExternalRefs() on such functions to discover external references and to create internal relocations to update references to optimized functions. Note that we could have relied on the compiler/linker to provide relocations for function references. However, there's no assurance that all such references are reported. E.g., the compiler can resolve inter-procedural references internally, leaving no relocations for the linker. The scan process takes about <10 seconds per 100MB of code on modern hardware. It's a reasonable overhead to live with considering the flexibility it provides. If BOLT fails to scan or disassemble a function, .e.g., due to a data object embedded in code, or an unsupported instruction, it enables a patching mode to guarantee that the failed function will call optimized/moved versions of functions. The patching happens at original function entry points. '-skip=<func1,func2,...>' option now can be used to skip processing of arbitrary functions in the relocation mode. With '-use-old-text' or '-strict' we require all functions to be processed. As such, it is incompatible with '-lite' option, and '-skip' option will only disable optimizations of listed functions, not their disassembly and emission. (cherry picked from FBD22040717)
2020-06-15 00:15:47 -07:00
extern cl::opt<bool> UseOldText;
extern cl::opt<unsigned> Verbosity;
extern cl::opt<unsigned> ExecutionCountThreshold;
[BOLT] Support for lite mode with relocations Summary: Add '-lite' support for relocations for improved processing time, memory consumption, and more resilient processing of binaries with embedded assembly code. In lite relocation mode, BOLT will skip full processing of functions without a profile. It will run scanExternalRefs() on such functions to discover external references and to create internal relocations to update references to optimized functions. Note that we could have relied on the compiler/linker to provide relocations for function references. However, there's no assurance that all such references are reported. E.g., the compiler can resolve inter-procedural references internally, leaving no relocations for the linker. The scan process takes about <10 seconds per 100MB of code on modern hardware. It's a reasonable overhead to live with considering the flexibility it provides. If BOLT fails to scan or disassemble a function, .e.g., due to a data object embedded in code, or an unsupported instruction, it enables a patching mode to guarantee that the failed function will call optimized/moved versions of functions. The patching happens at original function entry points. '-skip=<func1,func2,...>' option now can be used to skip processing of arbitrary functions in the relocation mode. With '-use-old-text' or '-strict' we require all functions to be processed. As such, it is incompatible with '-lite' option, and '-skip' option will only disable optimizations of listed functions, not their disassembly and emission. (cherry picked from FBD22040717)
2020-06-15 00:15:47 -07:00
extern bool processAllFunctions();
cl::opt<bool>
NoHugePages("no-huge-pages",
cl::desc("use regular size pages for code alignment"),
cl::ZeroOrMore,
cl::Hidden,
cl::cat(BoltCategory));
static cl::opt<bool>
PrintDebugInfo("print-debug-info",
cl::desc("print debug info when printing functions"),
cl::Hidden,
cl::ZeroOrMore,
cl::cat(BoltCategory));
cl::opt<bool>
[BOLT] Improve ICP for virtual method calls and jump tables using value profiling. Summary: Use value profiling data to remove the method pointer loads from vtables when doing ICP at virtual function and jump table callsites. The basic process is the following: 1. Work backwards from the callsite to find the most recent def of the call register. 2. Work back from the call register def to find the instruction where the vtable is loaded. 3. Find out of there is any value profiling data associated with the vtable load. If so, record all these addresses as potential vtables + method offsets. 4. Since the addresses extracted by #3 will be vtable + method offset, we need to figure out the method offset in order to determine the actual vtable base address. At this point I virtually execute all the instructions that occur between #3 and #2 that touch the method pointer register. The result of this execution should be the method offset. 5. Fetch the actual method address from the appropriate data section containing the vtable using the computed method offset. Make sure that this address maps to an actual function symbol. 6. Try to associate a vtable pointer with each target address in SymTargets. If every target has a vtable, then this is almost certainly a virtual method callsite. 7. Use the vtable address when generating the promoted call code. It's basically the same as regular ICP code except that the compare is against the vtable and not the method pointer. Additionally, the instructions to load up the method are dumped into the cold call block. For jump tables, the basic idea is the same. I use the memory profiling data to find the hottest slots in the jumptable and then use that information to compute the indices of the hottest entries. We can then compare the index register to the hot index values and avoid the load from the jump table. Note: I'm assuming the whole call is in a single BB. According to @rafaelauler, this isn't always the case on ARM. This also isn't always the case on X86 either. If there are non-trivial arguments that are passed by value, there could be branches in between the setup and the call. I'm going to leave fixing this until later since it makes things a bit more complicated. I've also fixed a bug where ICP was introducing a conditional tail call. I made sure that SCTC fixes these up afterwards. I have no idea why I made it introduce a CTC in the first place. (cherry picked from FBD6120768)
2017-10-20 12:11:34 -07:00
PrintRelocations("print-relocations",
cl::desc("print relocations when printing functions/objects"),
[BOLT] Improve ICP for virtual method calls and jump tables using value profiling. Summary: Use value profiling data to remove the method pointer loads from vtables when doing ICP at virtual function and jump table callsites. The basic process is the following: 1. Work backwards from the callsite to find the most recent def of the call register. 2. Work back from the call register def to find the instruction where the vtable is loaded. 3. Find out of there is any value profiling data associated with the vtable load. If so, record all these addresses as potential vtables + method offsets. 4. Since the addresses extracted by #3 will be vtable + method offset, we need to figure out the method offset in order to determine the actual vtable base address. At this point I virtually execute all the instructions that occur between #3 and #2 that touch the method pointer register. The result of this execution should be the method offset. 5. Fetch the actual method address from the appropriate data section containing the vtable using the computed method offset. Make sure that this address maps to an actual function symbol. 6. Try to associate a vtable pointer with each target address in SymTargets. If every target has a vtable, then this is almost certainly a virtual method callsite. 7. Use the vtable address when generating the promoted call code. It's basically the same as regular ICP code except that the compare is against the vtable and not the method pointer. Additionally, the instructions to load up the method are dumped into the cold call block. For jump tables, the basic idea is the same. I use the memory profiling data to find the hottest slots in the jumptable and then use that information to compute the indices of the hottest entries. We can then compare the index register to the hot index values and avoid the load from the jump table. Note: I'm assuming the whole call is in a single BB. According to @rafaelauler, this isn't always the case on ARM. This also isn't always the case on X86 either. If there are non-trivial arguments that are passed by value, there could be branches in between the setup and the call. I'm going to leave fixing this until later since it makes things a bit more complicated. I've also fixed a bug where ICP was introducing a conditional tail call. I made sure that SCTC fixes these up afterwards. I have no idea why I made it introduce a CTC in the first place. (cherry picked from FBD6120768)
2017-10-20 12:11:34 -07:00
cl::Hidden,
cl::ZeroOrMore,
[BOLT] Improve ICP for virtual method calls and jump tables using value profiling. Summary: Use value profiling data to remove the method pointer loads from vtables when doing ICP at virtual function and jump table callsites. The basic process is the following: 1. Work backwards from the callsite to find the most recent def of the call register. 2. Work back from the call register def to find the instruction where the vtable is loaded. 3. Find out of there is any value profiling data associated with the vtable load. If so, record all these addresses as potential vtables + method offsets. 4. Since the addresses extracted by #3 will be vtable + method offset, we need to figure out the method offset in order to determine the actual vtable base address. At this point I virtually execute all the instructions that occur between #3 and #2 that touch the method pointer register. The result of this execution should be the method offset. 5. Fetch the actual method address from the appropriate data section containing the vtable using the computed method offset. Make sure that this address maps to an actual function symbol. 6. Try to associate a vtable pointer with each target address in SymTargets. If every target has a vtable, then this is almost certainly a virtual method callsite. 7. Use the vtable address when generating the promoted call code. It's basically the same as regular ICP code except that the compare is against the vtable and not the method pointer. Additionally, the instructions to load up the method are dumped into the cold call block. For jump tables, the basic idea is the same. I use the memory profiling data to find the hottest slots in the jumptable and then use that information to compute the indices of the hottest entries. We can then compare the index register to the hot index values and avoid the load from the jump table. Note: I'm assuming the whole call is in a single BB. According to @rafaelauler, this isn't always the case on ARM. This also isn't always the case on X86 either. If there are non-trivial arguments that are passed by value, there could be branches in between the setup and the call. I'm going to leave fixing this until later since it makes things a bit more complicated. I've also fixed a bug where ICP was introducing a conditional tail call. I made sure that SCTC fixes these up afterwards. I have no idea why I made it introduce a CTC in the first place. (cherry picked from FBD6120768)
2017-10-20 12:11:34 -07:00
cl::cat(BoltCategory));
static cl::opt<bool>
PrintMemData("print-mem-data",
cl::desc("print memory data annotations when printing functions"),
cl::Hidden,
cl::ZeroOrMore,
[BOLT] Improve ICP for virtual method calls and jump tables using value profiling. Summary: Use value profiling data to remove the method pointer loads from vtables when doing ICP at virtual function and jump table callsites. The basic process is the following: 1. Work backwards from the callsite to find the most recent def of the call register. 2. Work back from the call register def to find the instruction where the vtable is loaded. 3. Find out of there is any value profiling data associated with the vtable load. If so, record all these addresses as potential vtables + method offsets. 4. Since the addresses extracted by #3 will be vtable + method offset, we need to figure out the method offset in order to determine the actual vtable base address. At this point I virtually execute all the instructions that occur between #3 and #2 that touch the method pointer register. The result of this execution should be the method offset. 5. Fetch the actual method address from the appropriate data section containing the vtable using the computed method offset. Make sure that this address maps to an actual function symbol. 6. Try to associate a vtable pointer with each target address in SymTargets. If every target has a vtable, then this is almost certainly a virtual method callsite. 7. Use the vtable address when generating the promoted call code. It's basically the same as regular ICP code except that the compare is against the vtable and not the method pointer. Additionally, the instructions to load up the method are dumped into the cold call block. For jump tables, the basic idea is the same. I use the memory profiling data to find the hottest slots in the jumptable and then use that information to compute the indices of the hottest entries. We can then compare the index register to the hot index values and avoid the load from the jump table. Note: I'm assuming the whole call is in a single BB. According to @rafaelauler, this isn't always the case on ARM. This also isn't always the case on X86 either. If there are non-trivial arguments that are passed by value, there could be branches in between the setup and the call. I'm going to leave fixing this until later since it makes things a bit more complicated. I've also fixed a bug where ICP was introducing a conditional tail call. I made sure that SCTC fixes these up afterwards. I have no idea why I made it introduce a CTC in the first place. (cherry picked from FBD6120768)
2017-10-20 12:11:34 -07:00
cl::cat(BoltCategory));
} // namespace opts
[BOLT] Support for lite mode with relocations Summary: Add '-lite' support for relocations for improved processing time, memory consumption, and more resilient processing of binaries with embedded assembly code. In lite relocation mode, BOLT will skip full processing of functions without a profile. It will run scanExternalRefs() on such functions to discover external references and to create internal relocations to update references to optimized functions. Note that we could have relied on the compiler/linker to provide relocations for function references. However, there's no assurance that all such references are reported. E.g., the compiler can resolve inter-procedural references internally, leaving no relocations for the linker. The scan process takes about <10 seconds per 100MB of code on modern hardware. It's a reasonable overhead to live with considering the flexibility it provides. If BOLT fails to scan or disassemble a function, .e.g., due to a data object embedded in code, or an unsupported instruction, it enables a patching mode to guarantee that the failed function will call optimized/moved versions of functions. The patching happens at original function entry points. '-skip=<func1,func2,...>' option now can be used to skip processing of arbitrary functions in the relocation mode. With '-use-old-text' or '-strict' we require all functions to be processed. As such, it is incompatible with '-lite' option, and '-skip' option will only disable optimizations of listed functions, not their disassembly and emission. (cherry picked from FBD22040717)
2020-06-15 00:15:47 -07:00
namespace llvm {
namespace bolt {
BinaryContext::BinaryContext(std::unique_ptr<MCContext> Ctx,
std::unique_ptr<DWARFContext> DwCtx,
std::unique_ptr<Triple> TheTriple,
const Target *TheTarget,
std::string TripleName,
std::unique_ptr<MCCodeEmitter> MCE,
std::unique_ptr<MCObjectFileInfo> MOFI,
std::unique_ptr<const MCAsmInfo> AsmInfo,
std::unique_ptr<const MCInstrInfo> MII,
std::unique_ptr<const MCSubtargetInfo> STI,
std::unique_ptr<MCInstPrinter> InstPrinter,
std::unique_ptr<const MCInstrAnalysis> MIA,
std::unique_ptr<MCPlusBuilder> MIB,
std::unique_ptr<const MCRegisterInfo> MRI,
std::unique_ptr<MCDisassembler> DisAsm)
: Ctx(std::move(Ctx)),
DwCtx(std::move(DwCtx)),
TheTriple(std::move(TheTriple)),
TheTarget(TheTarget),
TripleName(TripleName),
MCE(std::move(MCE)),
MOFI(std::move(MOFI)),
AsmInfo(std::move(AsmInfo)),
MII(std::move(MII)),
STI(std::move(STI)),
InstPrinter(std::move(InstPrinter)),
MIA(std::move(MIA)),
MIB(std::move(MIB)),
MRI(std::move(MRI)),
DisAsm(std::move(DisAsm)) {
Relocation::Arch = this->TheTriple->getArch();
PageAlign = opts::NoHugePages ? RegularPageSize : HugePageSize;
}
BinaryContext::~BinaryContext() {
for (BinarySection *Section : Sections) {
delete Section;
}
for (BinaryFunction *InjectedFunction : InjectedBinaryFunctions) {
delete InjectedFunction;
}
for (std::pair<const uint64_t, JumpTable *> JTI : JumpTables) {
delete JTI.second;
}
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
clearBinaryData();
}
Update DWARF lexical blocks address ranges. Summary: Updates DWARF lexical blocks address ranges in the output binary after optimizations. This is similar to updating function address ranges except that the ranges representation needs to be more general, since address ranges can begin or end in the middle of a basic block. The following changes were made: - Added a data structure for iterating over the basic blocks that intersect an address range: BasicBlockTable.h - Added some more bookkeeping in BinaryBasicBlock. Basically, I needed to keep track of the block's size in the input binary as well as its address in the output binary. This information is mostly set by BinaryFunction after disassembly. - Added a representation for address ranges relative to basic blocks (BasicBlockOffsetRanges.h). Will also serve for location lists. - Added a representation for Lexical Blocks (LexicalBlock.h) - Small refactorings in DebugArangesWriter: -- Renamed to DebugRangesSectionsWriter since it also writes .debug_ranges -- Refactored it not to depend on BinaryFunction but instead on anything that can be assined an aoffset in .debug_ranges (added an interface for that) - Iterate over the DIE tree during initialization to find lexical blocks in .debug_info (BinaryContext.cpp) - Added patches to .debug_abbrev and .debug_info in RewriteInstance to update lexical blocks attributes (in fact, this part is very similar to what was done to function address ranges and I just refactored/reused that code) - Added small test case (lexical_blocks_address_ranges_debug.test) (cherry picked from FBD3113181)
2016-03-28 17:45:22 -07:00
extern MCPlusBuilder *createX86MCPlusBuilder(const MCInstrAnalysis *,
const MCInstrInfo *,
const MCRegisterInfo *);
extern MCPlusBuilder *createAArch64MCPlusBuilder(const MCInstrAnalysis *,
const MCInstrInfo *,
const MCRegisterInfo *);
namespace {
MCPlusBuilder *createMCPlusBuilder(const Triple::ArchType Arch,
const MCInstrAnalysis *Analysis,
const MCInstrInfo *Info,
const MCRegisterInfo *RegInfo) {
#ifdef X86_AVAILABLE
if (Arch == Triple::x86_64)
return createX86MCPlusBuilder(Analysis, Info, RegInfo);
#endif
#ifdef AARCH64_AVAILABLE
if (Arch == Triple::aarch64)
return createAArch64MCPlusBuilder(Analysis, Info, RegInfo);
#endif
llvm_unreachable("architecture unsupport by MCPlusBuilder");
}
} // anonymous namespace
/// Create BinaryContext for a given architecture \p ArchName and
/// triple \p TripleName.
std::unique_ptr<BinaryContext>
BinaryContext::createBinaryContext(const ObjectFile *File, bool IsPIC,
std::unique_ptr<DWARFContext> DwCtx) {
StringRef ArchName = "";
StringRef FeaturesStr = "";
switch (File->getArch()) {
case llvm::Triple::x86_64:
ArchName = "x86-64";
FeaturesStr = "+nopl";
break;
case llvm::Triple::aarch64:
ArchName = "aarch64";
FeaturesStr = "+fp-armv8,+neon,+crypto,+dotprod,+crc,+lse,+ras,+rdm,"
"+fullfp16,+spe,+fuse-aes,+rcpc";
break;
default:
errs() << "BOLT-ERROR: Unrecognized machine in ELF file.\n";
return nullptr;
}
auto TheTriple = std::make_unique<Triple>(File->makeTriple());
const std::string TripleName = TheTriple->str();
std::string Error;
const Target *TheTarget =
TargetRegistry::lookupTarget(std::string(ArchName), *TheTriple, Error);
if (!TheTarget) {
errs() << "BOLT-ERROR: " << Error;
return nullptr;
}
std::unique_ptr<const MCRegisterInfo> MRI(
TheTarget->createMCRegInfo(TripleName));
if (!MRI) {
errs() << "BOLT-ERROR: no register info for target " << TripleName << "\n";
return nullptr;
}
// Set up disassembler.
std::unique_ptr<const MCAsmInfo> AsmInfo(
TheTarget->createMCAsmInfo(*MRI, TripleName, MCTargetOptions()));
if (!AsmInfo) {
errs() << "BOLT-ERROR: no assembly info for target " << TripleName << "\n";
return nullptr;
}
std::unique_ptr<const MCSubtargetInfo> STI(
TheTarget->createMCSubtargetInfo(TripleName, "", FeaturesStr));
if (!STI) {
errs() << "BOLT-ERROR: no subtarget info for target " << TripleName << "\n";
return nullptr;
}
std::unique_ptr<const MCInstrInfo> MII(TheTarget->createMCInstrInfo());
if (!MII) {
errs() << "BOLT-ERROR: no instruction info for target " << TripleName
<< "\n";
return nullptr;
}
std::unique_ptr<MCContext> Ctx(
new MCContext(*TheTriple, AsmInfo.get(), MRI.get(), STI.get()));
std::unique_ptr<MCObjectFileInfo> MOFI(
TheTarget->createMCObjectFileInfo(*Ctx, IsPIC));
Ctx->setObjectFileInfo(MOFI.get());
// We do not support X86 Large code model. Change this in the future.
bool Large = false;
if (TheTriple->getArch() == llvm::Triple::aarch64)
Large = true;
unsigned LSDAEncoding =
Large ? dwarf::DW_EH_PE_absptr : dwarf::DW_EH_PE_udata4;
unsigned TTypeEncoding =
Large ? dwarf::DW_EH_PE_absptr : dwarf::DW_EH_PE_udata4;
if (IsPIC) {
LSDAEncoding = dwarf::DW_EH_PE_pcrel |
(Large ? dwarf::DW_EH_PE_sdata8 : dwarf::DW_EH_PE_sdata4);
TTypeEncoding = dwarf::DW_EH_PE_indirect | dwarf::DW_EH_PE_pcrel |
(Large ? dwarf::DW_EH_PE_sdata8 : dwarf::DW_EH_PE_sdata4);
}
std::unique_ptr<MCDisassembler> DisAsm(
TheTarget->createMCDisassembler(*STI, *Ctx));
if (!DisAsm) {
errs() << "BOLT-ERROR: no disassembler for target " << TripleName << "\n";
return nullptr;
}
std::unique_ptr<const MCInstrAnalysis> MIA(
TheTarget->createMCInstrAnalysis(MII.get()));
if (!MIA) {
errs() << "BOLT-ERROR: failed to create instruction analysis for target"
<< TripleName << "\n";
return nullptr;
}
std::unique_ptr<MCPlusBuilder> MIB(createMCPlusBuilder(
TheTriple->getArch(), MIA.get(), MII.get(), MRI.get()));
if (!MIB) {
errs() << "BOLT-ERROR: failed to create instruction builder for target"
<< TripleName << "\n";
return nullptr;
}
int AsmPrinterVariant = AsmInfo->getAssemblerDialect();
std::unique_ptr<MCInstPrinter> InstructionPrinter(
TheTarget->createMCInstPrinter(*TheTriple, AsmPrinterVariant, *AsmInfo,
*MII, *MRI));
if (!InstructionPrinter) {
errs() << "BOLT-ERROR: no instruction printer for target " << TripleName
<< '\n';
return nullptr;
}
InstructionPrinter->setPrintImmHex(true);
std::unique_ptr<MCCodeEmitter> MCE(
TheTarget->createMCCodeEmitter(*MII, *MRI, *Ctx));
// Make sure we don't miss any output on core dumps.
outs().SetUnbuffered();
errs().SetUnbuffered();
dbgs().SetUnbuffered();
auto BC = std::make_unique<BinaryContext>(
std::move(Ctx), std::move(DwCtx), std::move(TheTriple), TheTarget,
std::string(TripleName), std::move(MCE), std::move(MOFI),
std::move(AsmInfo), std::move(MII), std::move(STI),
std::move(InstructionPrinter), std::move(MIA), std::move(MIB),
std::move(MRI), std::move(DisAsm));
BC->TTypeEncoding = TTypeEncoding;
BC->LSDAEncoding = LSDAEncoding;
[BOLT] Support for lite mode with relocations Summary: Add '-lite' support for relocations for improved processing time, memory consumption, and more resilient processing of binaries with embedded assembly code. In lite relocation mode, BOLT will skip full processing of functions without a profile. It will run scanExternalRefs() on such functions to discover external references and to create internal relocations to update references to optimized functions. Note that we could have relied on the compiler/linker to provide relocations for function references. However, there's no assurance that all such references are reported. E.g., the compiler can resolve inter-procedural references internally, leaving no relocations for the linker. The scan process takes about <10 seconds per 100MB of code on modern hardware. It's a reasonable overhead to live with considering the flexibility it provides. If BOLT fails to scan or disassemble a function, .e.g., due to a data object embedded in code, or an unsupported instruction, it enables a patching mode to guarantee that the failed function will call optimized/moved versions of functions. The patching happens at original function entry points. '-skip=<func1,func2,...>' option now can be used to skip processing of arbitrary functions in the relocation mode. With '-use-old-text' or '-strict' we require all functions to be processed. As such, it is incompatible with '-lite' option, and '-skip' option will only disable optimizations of listed functions, not their disassembly and emission. (cherry picked from FBD22040717)
2020-06-15 00:15:47 -07:00
BC->MAB = std::unique_ptr<MCAsmBackend>(
BC->TheTarget->createMCAsmBackend(*BC->STI, *BC->MRI, MCTargetOptions()));
BC->setFilename(File->getFileName());
BC->HasFixedLoadAddress = !IsPIC;
return BC;
}
bool BinaryContext::forceSymbolRelocations(StringRef SymbolName) const {
if (opts::HotText && (SymbolName == "__hot_start" ||
SymbolName == "__hot_end"))
return true;
if (opts::HotData && (SymbolName == "__hot_data_start" ||
SymbolName == "__hot_data_end"))
return true;
if (SymbolName == "_end")
return true;
return false;
}
[BOLT rebase] Rebase fixes on top of LLVM Feb2018 Summary: This commit includes all code necessary to make BOLT working again after the rebase. This includes a redesign of the EHFrame work, cherry-pick of the 3dnow disassembly work, compilation error fixes, and port of the debug_info work. The macroop fusion feature is not ported yet. The rebased version has minor changes to the "executed instructions" dynostats counter because REP prefixes are considered a part of the instruction it applies to. Also, some X86 instructions had the "mayLoad" tablegen property removed, which BOLT uses to identify and account for loads, thus reducing the total number of loads reported by dynostats. This was observed in X86::MOVDQUmr. TRAP instructions are not terminators anymore, changing our CFG. This commit adds compensation to preserve this old behavior and minimize tests changes. debug_info sections are now slightly larger. The discriminator field in the line table is slightly different due to a change upstream. New profiles generated with the other bolt are incompatible with this version because of different hash values calculated for functions, so they will be considered 100% stale. This commit changes the corresponding test to XFAIL so it can be updated. The hash function changes because it relies on raw opcode values, which change according to the opcodes described in the X86 tablegen files. When processing HHVM, bolt was observed to be using about 800MB more memory in the rebased version and being about 5% slower. (cherry picked from FBD7078072)
2018-02-06 15:00:23 -08:00
std::unique_ptr<MCObjectWriter>
BinaryContext::createObjectWriter(raw_pwrite_stream &OS) {
return MAB->createObjectWriter(OS);
}
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
bool BinaryContext::validateObjectNesting() const {
auto Itr = BinaryDataMap.begin();
auto End = BinaryDataMap.end();
bool Valid = true;
while (Itr != End) {
auto Next = std::next(Itr);
while (Next != End &&
Itr->second->getSection() == Next->second->getSection() &&
Itr->second->containsRange(Next->second->getAddress(),
Next->second->getSize())) {
if (Next->second->Parent != Itr->second) {
errs() << "BOLT-WARNING: object nesting incorrect for:\n"
<< "BOLT-WARNING: " << *Itr->second << "\n"
<< "BOLT-WARNING: " << *Next->second << "\n";
Valid = false;
}
++Next;
}
Itr = Next;
}
return Valid;
}
bool BinaryContext::validateHoles() const {
bool Valid = true;
for (BinarySection &Section : sections()) {
for (const Relocation &Rel : Section.relocations()) {
uint64_t RelAddr = Rel.Offset + Section.getAddress();
const BinaryData *BD = getBinaryDataContainingAddress(RelAddr);
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
if (!BD) {
errs() << "BOLT-WARNING: no BinaryData found for relocation at address"
<< " 0x" << Twine::utohexstr(RelAddr) << " in "
<< Section.getName() << "\n";
Valid = false;
} else if (!BD->getAtomicRoot()) {
errs() << "BOLT-WARNING: no atomic BinaryData found for relocation at "
<< "address 0x" << Twine::utohexstr(RelAddr) << " in "
<< Section.getName() << "\n";
Valid = false;
}
}
}
return Valid;
}
void BinaryContext::updateObjectNesting(BinaryDataMapType::iterator GAI) {
const uint64_t Address = GAI->second->getAddress();
const uint64_t Size = GAI->second->getSize();
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
auto fixParents =
[&](BinaryDataMapType::iterator Itr, BinaryData *NewParent) {
BinaryData *OldParent = Itr->second->Parent;
Itr->second->Parent = NewParent;
++Itr;
while (Itr != BinaryDataMap.end() && OldParent &&
Itr->second->Parent == OldParent) {
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
Itr->second->Parent = NewParent;
++Itr;
}
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
};
// Check if the previous symbol contains the newly added symbol.
if (GAI != BinaryDataMap.begin()) {
BinaryData *Prev = std::prev(GAI)->second;
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
while (Prev) {
if (Prev->getSection() == GAI->second->getSection() &&
Prev->containsRange(Address, Size)) {
fixParents(GAI, Prev);
} else {
fixParents(GAI, nullptr);
}
Prev = Prev->Parent;
}
}
// Check if the newly added symbol contains any subsequent symbols.
if (Size != 0) {
BinaryData *BD = GAI->second->Parent ? GAI->second->Parent : GAI->second;
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
auto Itr = std::next(GAI);
while (Itr != BinaryDataMap.end() &&
BD->containsRange(Itr->second->getAddress(),
Itr->second->getSize())) {
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
Itr->second->Parent = BD;
++Itr;
}
}
}
[BOLT] Static data reordering pass. Summary: Enable BOLT to reorder data sections in a binary based on memory profiling data. This diff adds a new pass to BOLT that can reorder data sections for better locality based on memory profiling data. For now, the algorithm to order data is primitive and just relies on the frequency of loads to order the contents of a section. We could probably do a lot better by looking at what functions use the hot data and grouping together hot data that is used by a single function (or cluster of functions). Block ordering might give some hints on how to order the data better as well. The new pass has two basic modes: inplace and split (when inplace is false). The default is split since inplace hasn't really been tested much. When splitting is on, the cold data is copied to a "cold" version of the section while the hot data is kept in the original section, e.g. for .rodata, .rodata will contain the hot data and .bolt.org.rodata will contain the cold bits. In inplace mode, the section contents are reordered inplace. In either mode, all relocations to data within that section are updated to reflect new data locations. Things to improve: - The current algorithm is really dumb and doesn't seem to lead to any wins. It certainly could use some improvement. - Private symbols can have data that leaks over to an adjacent symbol, e.g. a string that has a common suffix can start in one symbol and leak over (with the common suffix) into the next. For now, we punt on adjacent private symbols. - Handle ambiguous relocations better. Section relocations that point to the boundary of two symbols will prevent the adjacent symbols from being moved because we can't tell which symbol the relocation is for. - Handle jump tables. Right now jump table support must be basic if data reordering is enabled. - Being able to handle TLS. A good amount of data access in some binaries are happening in TLS. It would be worthwhile to be able to reorder any TLS sections too. - Handle sections with writeable data. This hasn't been tested so probably won't work. We could try to prevent false sharing in writeable sections as well. - A pie in the sky goal would be to use DWARF info to reorder types. (cherry picked from FBD6792876)
2018-04-20 20:03:31 -07:00
iterator_range<BinaryContext::binary_data_iterator>
BinaryContext::getSubBinaryData(BinaryData *BD) {
auto Start = std::next(BinaryDataMap.find(BD->getAddress()));
auto End = Start;
while (End != BinaryDataMap.end() &&
BD->isAncestorOf(End->second)) {
++End;
}
return make_range(Start, End);
}
std::pair<const MCSymbol *, uint64_t>
BinaryContext::handleAddressRef(uint64_t Address, BinaryFunction &BF,
bool IsPCRel) {
uint64_t Addend = 0;
if (isAArch64()) {
// Check if this is an access to a constant island and create bookkeeping
// to keep track of it and emit it later as part of this function.
if (MCSymbol *IslandSym = BF.getOrCreateIslandAccess(Address))
return std::make_pair(IslandSym, Addend);
// Detect custom code written in assembly that refers to arbitrary
// constant islands from other functions. Write this reference so we
// can pull this constant island and emit it as part of this function
// too.
auto IslandIter = AddressToConstantIslandMap.lower_bound(Address);
if (IslandIter != AddressToConstantIslandMap.end()) {
if (MCSymbol *IslandSym =
IslandIter->second->getOrCreateProxyIslandAccess(Address, BF)) {
/// Make this function depend on IslandIter->second because we have
/// a reference to its constant island. When emitting this function,
/// we will also emit IslandIter->second's constants. This only
/// happens in custom AArch64 assembly code.
BF.Islands.Dependency.insert(IslandIter->second);
BF.Islands.ProxySymbols[IslandSym] = IslandIter->second;
return std::make_pair(IslandSym, Addend);
}
}
}
// Note that the address does not necessarily have to reside inside
// a section, it could be an absolute address too.
ErrorOr<BinarySection &> Section = getSectionForAddress(Address);
if (Section && Section->isText()) {
if (BF.containsAddress(Address, /*UseMaxSize=*/ isAArch64())) {
if (Address != BF.getAddress()) {
// The address could potentially escape. Mark it as another entry
// point into the function.
if (opts::Verbosity >= 1) {
outs() << "BOLT-INFO: potentially escaped address 0x"
<< Twine::utohexstr(Address) << " in function "
<< BF << '\n';
}
BF.HasInternalLabelReference = true;
return std::make_pair(
BF.addEntryPointAtOffset(Address - BF.getAddress()),
Addend);
}
} else {
[BOLT] Support for lite mode with relocations Summary: Add '-lite' support for relocations for improved processing time, memory consumption, and more resilient processing of binaries with embedded assembly code. In lite relocation mode, BOLT will skip full processing of functions without a profile. It will run scanExternalRefs() on such functions to discover external references and to create internal relocations to update references to optimized functions. Note that we could have relied on the compiler/linker to provide relocations for function references. However, there's no assurance that all such references are reported. E.g., the compiler can resolve inter-procedural references internally, leaving no relocations for the linker. The scan process takes about <10 seconds per 100MB of code on modern hardware. It's a reasonable overhead to live with considering the flexibility it provides. If BOLT fails to scan or disassemble a function, .e.g., due to a data object embedded in code, or an unsupported instruction, it enables a patching mode to guarantee that the failed function will call optimized/moved versions of functions. The patching happens at original function entry points. '-skip=<func1,func2,...>' option now can be used to skip processing of arbitrary functions in the relocation mode. With '-use-old-text' or '-strict' we require all functions to be processed. As such, it is incompatible with '-lite' option, and '-skip' option will only disable optimizations of listed functions, not their disassembly and emission. (cherry picked from FBD22040717)
2020-06-15 00:15:47 -07:00
BF.InterproceduralReferences.insert(Address);
}
}
// With relocations, catch jump table references outside of the basic block
// containing the indirect jump.
if (HasRelocations) {
const MemoryContentsType MemType = analyzeMemoryAt(Address, BF);
if (MemType == MemoryContentsType::POSSIBLE_PIC_JUMP_TABLE && IsPCRel) {
const MCSymbol *Symbol =
getOrCreateJumpTable(BF, Address, JumpTable::JTT_PIC);
return std::make_pair(Symbol, Addend);
}
}
if (BinaryData *BD = getBinaryDataContainingAddress(Address)) {
return std::make_pair(BD->getSymbol(), Address - BD->getAddress());
}
// TODO: use DWARF info to get size/alignment here?
MCSymbol *TargetSymbol = getOrCreateGlobalSymbol(Address, "DATAat");
LLVM_DEBUG(dbgs() << "Created symbol " << TargetSymbol->getName() << '\n');
return std::make_pair(TargetSymbol, Addend);
}
MemoryContentsType
BinaryContext::analyzeMemoryAt(uint64_t Address, BinaryFunction &BF) {
if (!isX86())
return MemoryContentsType::UNKNOWN;
ErrorOr<BinarySection &> Section = getSectionForAddress(Address);
if (!Section) {
// No section - possibly an absolute address. Since we don't allow
// internal function addresses to escape the function scope - we
// consider it a tail call.
if (opts::Verbosity > 1) {
errs() << "BOLT-WARNING: no section for address 0x"
<< Twine::utohexstr(Address) << " referenced from function "
<< BF << '\n';
}
return MemoryContentsType::UNKNOWN;
}
if (Section->isVirtual()) {
// The contents are filled at runtime.
return MemoryContentsType::UNKNOWN;
}
// No support for jump tables in code yet.
if (Section->isText())
return MemoryContentsType::UNKNOWN;
// Start with checking for PIC jump table. We expect non-PIC jump tables
// to have high 32 bits set to 0.
if (analyzeJumpTable(Address, JumpTable::JTT_PIC, BF))
return MemoryContentsType::POSSIBLE_PIC_JUMP_TABLE;
if (analyzeJumpTable(Address, JumpTable::JTT_NORMAL, BF))
return MemoryContentsType::POSSIBLE_JUMP_TABLE;
return MemoryContentsType::UNKNOWN;
}
bool BinaryContext::analyzeJumpTable(const uint64_t Address,
const JumpTable::JumpTableType Type,
BinaryFunction &BF,
const uint64_t NextJTAddress,
JumpTable::OffsetsType *Offsets) {
// Is one of the targets __builtin_unreachable?
bool HasUnreachable = false;
// Number of targets other than __builtin_unreachable.
uint64_t NumRealEntries = 0;
constexpr uint64_t INVALID_OFFSET = std::numeric_limits<uint64_t>::max();
auto addOffset = [&](uint64_t Offset) {
if (Offsets)
Offsets->emplace_back(Offset);
};
auto isFragment = [](BinaryFunction &Fragment,
BinaryFunction &Parent) -> bool {
// Check if <fragment restored name> == <parent restored name>.cold(.\d+)?
for (StringRef BFName : Parent.getNames()) {
std::string BFNamePrefix = Regex::escape(NameResolver::restore(BFName));
std::string BFNameRegex =
Twine(BFNamePrefix, "\\.cold(\\.[0-9]+)?").str();
if (Fragment.hasRestoredNameRegex(BFNameRegex))
return true;
}
return false;
};
auto doesBelongToFunction = [&](const uint64_t Addr,
BinaryFunction *TargetBF) -> bool {
if (BF.containsAddress(Addr))
return true;
// Nothing to do if we failed to identify the containing function.
if (!TargetBF)
return false;
// Case 1: check if BF is a fragment and TargetBF is its parent.
if (BF.isFragment()) {
// BF is a fragment, but parent function is not registered.
// This means there's no direct jump between parent and fragment.
// Set parent link here in jump table analysis, based on function name
// matching heuristic.
if (!BF.getParentFragment() && isFragment(BF, *TargetBF))
registerFragment(BF, *TargetBF);
return BF.getParentFragment() == TargetBF;
}
// Case 2: check if TargetBF is a fragment and BF is its parent.
if (TargetBF->isFragment()) {
// TargetBF is a fragment, but parent function is not registered.
// This means there's no direct jump between parent and fragment.
// Set parent link here in jump table analysis, based on function name
// matching heuristic.
if (!TargetBF->getParentFragment() && isFragment(*TargetBF, BF))
registerFragment(*TargetBF, BF);
return TargetBF->getParentFragment() == &BF;
}
return false;
};
ErrorOr<BinarySection &> Section = getSectionForAddress(Address);
if (!Section)
return false;
// The upper bound is defined by containing object, section limits, and
// the next jump table in memory.
uint64_t UpperBound = Section->getEndAddress();
const BinaryData *JumpTableBD = getBinaryDataAtAddress(Address);
if (JumpTableBD && JumpTableBD->getSize()) {
assert(JumpTableBD->getEndAddress() <= UpperBound &&
"data object cannot cross a section boundary");
UpperBound = JumpTableBD->getEndAddress();
}
if (NextJTAddress) {
UpperBound = std::min(NextJTAddress, UpperBound);
}
LLVM_DEBUG(dbgs() << "BOLT-DEBUG: analyzeJumpTable in " << BF.getPrintName()
<< '\n');
const uint64_t EntrySize = getJumpTableEntrySize(Type);
for (uint64_t EntryAddress = Address; EntryAddress <= UpperBound - EntrySize;
EntryAddress += EntrySize) {
LLVM_DEBUG(dbgs() << " * Checking 0x" << Twine::utohexstr(EntryAddress)
<< " -> ");
// Check if there's a proper relocation against the jump table entry.
if (HasRelocations) {
[BOLT] Debug logging in analyzeJumpTable Summary: Added debug logging in/around `analyzeJumpTable`: - Dump jump table entries as they are being processed: ```BOLT-DEBUG: analyzeJumpTable in read_encoded_value_with_base/2(*2) * Checking 0x428ff40 -> OK: real entry * Checking 0x428ff44 -> OK: real entry * Checking 0x428ff48 -> OK: real entry * Checking 0x428ff4c -> OK: real entry * Checking 0x428ff50 -> OK: real entry * Checking 0x428ff54 -> OK: address in split fragment * Checking 0x428ff58 -> OK: address in split fragment * Checking 0x428ff5c -> OK: address in split fragment * Checking 0x428ff60 -> OK: address in split fragment * Checking 0x428ff64 -> OK: real entry * Checking 0x428ff68 -> OK: real entry * Checking 0x428ff6c -> OK: real entry * Checking 0x428ff70 -> OK: real entry BOLT-DEBUG: analyzeJumpTable in classify_object_over_fdes/1(*2) * Checking 0x428ff74 -> OK: real entry ... ``` - Dump skipped functions: ``` Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2) family Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2) Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2.cold.3/1(*2) Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode family Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.cold.4/1(*2) ``` - Dump values of unclaimed PC-relative relocations in data. (cherry picked from FBD24898172)
2020-11-12 11:54:38 -08:00
if (Type == JumpTable::JTT_PIC &&
!DataPCRelocations.count(EntryAddress)) {
LLVM_DEBUG(
[BOLT] Debug logging in analyzeJumpTable Summary: Added debug logging in/around `analyzeJumpTable`: - Dump jump table entries as they are being processed: ```BOLT-DEBUG: analyzeJumpTable in read_encoded_value_with_base/2(*2) * Checking 0x428ff40 -> OK: real entry * Checking 0x428ff44 -> OK: real entry * Checking 0x428ff48 -> OK: real entry * Checking 0x428ff4c -> OK: real entry * Checking 0x428ff50 -> OK: real entry * Checking 0x428ff54 -> OK: address in split fragment * Checking 0x428ff58 -> OK: address in split fragment * Checking 0x428ff5c -> OK: address in split fragment * Checking 0x428ff60 -> OK: address in split fragment * Checking 0x428ff64 -> OK: real entry * Checking 0x428ff68 -> OK: real entry * Checking 0x428ff6c -> OK: real entry * Checking 0x428ff70 -> OK: real entry BOLT-DEBUG: analyzeJumpTable in classify_object_over_fdes/1(*2) * Checking 0x428ff74 -> OK: real entry ... ``` - Dump skipped functions: ``` Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2) family Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2) Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2.cold.3/1(*2) Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode family Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.cold.4/1(*2) ``` - Dump values of unclaimed PC-relative relocations in data. (cherry picked from FBD24898172)
2020-11-12 11:54:38 -08:00
dbgs() << "FAIL: JTT_PIC table, no relocation for this address\n");
break;
[BOLT] Debug logging in analyzeJumpTable Summary: Added debug logging in/around `analyzeJumpTable`: - Dump jump table entries as they are being processed: ```BOLT-DEBUG: analyzeJumpTable in read_encoded_value_with_base/2(*2) * Checking 0x428ff40 -> OK: real entry * Checking 0x428ff44 -> OK: real entry * Checking 0x428ff48 -> OK: real entry * Checking 0x428ff4c -> OK: real entry * Checking 0x428ff50 -> OK: real entry * Checking 0x428ff54 -> OK: address in split fragment * Checking 0x428ff58 -> OK: address in split fragment * Checking 0x428ff5c -> OK: address in split fragment * Checking 0x428ff60 -> OK: address in split fragment * Checking 0x428ff64 -> OK: real entry * Checking 0x428ff68 -> OK: real entry * Checking 0x428ff6c -> OK: real entry * Checking 0x428ff70 -> OK: real entry BOLT-DEBUG: analyzeJumpTable in classify_object_over_fdes/1(*2) * Checking 0x428ff74 -> OK: real entry ... ``` - Dump skipped functions: ``` Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2) family Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2) Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2.cold.3/1(*2) Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode family Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.cold.4/1(*2) ``` - Dump values of unclaimed PC-relative relocations in data. (cherry picked from FBD24898172)
2020-11-12 11:54:38 -08:00
}
if (Type == JumpTable::JTT_NORMAL && !getRelocationAt(EntryAddress)) {
LLVM_DEBUG(
dbgs()
<< "FAIL: JTT_NORMAL table, no relocation for this address\n");
break;
[BOLT] Debug logging in analyzeJumpTable Summary: Added debug logging in/around `analyzeJumpTable`: - Dump jump table entries as they are being processed: ```BOLT-DEBUG: analyzeJumpTable in read_encoded_value_with_base/2(*2) * Checking 0x428ff40 -> OK: real entry * Checking 0x428ff44 -> OK: real entry * Checking 0x428ff48 -> OK: real entry * Checking 0x428ff4c -> OK: real entry * Checking 0x428ff50 -> OK: real entry * Checking 0x428ff54 -> OK: address in split fragment * Checking 0x428ff58 -> OK: address in split fragment * Checking 0x428ff5c -> OK: address in split fragment * Checking 0x428ff60 -> OK: address in split fragment * Checking 0x428ff64 -> OK: real entry * Checking 0x428ff68 -> OK: real entry * Checking 0x428ff6c -> OK: real entry * Checking 0x428ff70 -> OK: real entry BOLT-DEBUG: analyzeJumpTable in classify_object_over_fdes/1(*2) * Checking 0x428ff74 -> OK: real entry ... ``` - Dump skipped functions: ``` Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2) family Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2) Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2.cold.3/1(*2) Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode family Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.cold.4/1(*2) ``` - Dump values of unclaimed PC-relative relocations in data. (cherry picked from FBD24898172)
2020-11-12 11:54:38 -08:00
}
}
const uint64_t Value = (Type == JumpTable::JTT_PIC)
? Address + *getSignedValueAtAddress(EntryAddress, EntrySize)
: *getPointerAtAddress(EntryAddress);
// __builtin_unreachable() case.
if (Value == BF.getAddress() + BF.getSize()) {
addOffset(Value - BF.getAddress());
HasUnreachable = true;
LLVM_DEBUG(dbgs() << "OK: __builtin_unreachable\n");
continue;
}
// Function or one of its fragments.
BinaryFunction *TargetBF = getBinaryFunctionContainingAddress(Value);
// We assume that a jump table cannot have function start as an entry.
[BOLT] Debug logging in analyzeJumpTable Summary: Added debug logging in/around `analyzeJumpTable`: - Dump jump table entries as they are being processed: ```BOLT-DEBUG: analyzeJumpTable in read_encoded_value_with_base/2(*2) * Checking 0x428ff40 -> OK: real entry * Checking 0x428ff44 -> OK: real entry * Checking 0x428ff48 -> OK: real entry * Checking 0x428ff4c -> OK: real entry * Checking 0x428ff50 -> OK: real entry * Checking 0x428ff54 -> OK: address in split fragment * Checking 0x428ff58 -> OK: address in split fragment * Checking 0x428ff5c -> OK: address in split fragment * Checking 0x428ff60 -> OK: address in split fragment * Checking 0x428ff64 -> OK: real entry * Checking 0x428ff68 -> OK: real entry * Checking 0x428ff6c -> OK: real entry * Checking 0x428ff70 -> OK: real entry BOLT-DEBUG: analyzeJumpTable in classify_object_over_fdes/1(*2) * Checking 0x428ff74 -> OK: real entry ... ``` - Dump skipped functions: ``` Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2) family Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2) Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2.cold.3/1(*2) Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode family Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.cold.4/1(*2) ``` - Dump values of unclaimed PC-relative relocations in data. (cherry picked from FBD24898172)
2020-11-12 11:54:38 -08:00
if (!doesBelongToFunction(Value, TargetBF) || Value == BF.getAddress()) {
LLVM_DEBUG({
[BOLT] Debug logging in analyzeJumpTable Summary: Added debug logging in/around `analyzeJumpTable`: - Dump jump table entries as they are being processed: ```BOLT-DEBUG: analyzeJumpTable in read_encoded_value_with_base/2(*2) * Checking 0x428ff40 -> OK: real entry * Checking 0x428ff44 -> OK: real entry * Checking 0x428ff48 -> OK: real entry * Checking 0x428ff4c -> OK: real entry * Checking 0x428ff50 -> OK: real entry * Checking 0x428ff54 -> OK: address in split fragment * Checking 0x428ff58 -> OK: address in split fragment * Checking 0x428ff5c -> OK: address in split fragment * Checking 0x428ff60 -> OK: address in split fragment * Checking 0x428ff64 -> OK: real entry * Checking 0x428ff68 -> OK: real entry * Checking 0x428ff6c -> OK: real entry * Checking 0x428ff70 -> OK: real entry BOLT-DEBUG: analyzeJumpTable in classify_object_over_fdes/1(*2) * Checking 0x428ff74 -> OK: real entry ... ``` - Dump skipped functions: ``` Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2) family Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2) Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2.cold.3/1(*2) Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode family Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.cold.4/1(*2) ``` - Dump values of unclaimed PC-relative relocations in data. (cherry picked from FBD24898172)
2020-11-12 11:54:38 -08:00
if (!BF.containsAddress(Value)) {
dbgs() << "FAIL: function doesn't contain this address\n";
if (TargetBF) {
dbgs() << " ! function containing this address: "
<< TargetBF->getPrintName() << '\n';
if (TargetBF->isFragment())
dbgs() << " ! is a fragment\n";
BinaryFunction *TargetParent = TargetBF->getParentFragment();
[BOLT] Debug logging in analyzeJumpTable Summary: Added debug logging in/around `analyzeJumpTable`: - Dump jump table entries as they are being processed: ```BOLT-DEBUG: analyzeJumpTable in read_encoded_value_with_base/2(*2) * Checking 0x428ff40 -> OK: real entry * Checking 0x428ff44 -> OK: real entry * Checking 0x428ff48 -> OK: real entry * Checking 0x428ff4c -> OK: real entry * Checking 0x428ff50 -> OK: real entry * Checking 0x428ff54 -> OK: address in split fragment * Checking 0x428ff58 -> OK: address in split fragment * Checking 0x428ff5c -> OK: address in split fragment * Checking 0x428ff60 -> OK: address in split fragment * Checking 0x428ff64 -> OK: real entry * Checking 0x428ff68 -> OK: real entry * Checking 0x428ff6c -> OK: real entry * Checking 0x428ff70 -> OK: real entry BOLT-DEBUG: analyzeJumpTable in classify_object_over_fdes/1(*2) * Checking 0x428ff74 -> OK: real entry ... ``` - Dump skipped functions: ``` Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2) family Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2) Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2.cold.3/1(*2) Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode family Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.cold.4/1(*2) ``` - Dump values of unclaimed PC-relative relocations in data. (cherry picked from FBD24898172)
2020-11-12 11:54:38 -08:00
dbgs() << " ! its parent is "
<< (TargetParent ? TargetParent->getPrintName() : "(none)")
<< '\n';
}
}
if (Value == BF.getAddress())
dbgs() << "FAIL: jump table cannot have function start as an entry\n";
});
break;
[BOLT] Debug logging in analyzeJumpTable Summary: Added debug logging in/around `analyzeJumpTable`: - Dump jump table entries as they are being processed: ```BOLT-DEBUG: analyzeJumpTable in read_encoded_value_with_base/2(*2) * Checking 0x428ff40 -> OK: real entry * Checking 0x428ff44 -> OK: real entry * Checking 0x428ff48 -> OK: real entry * Checking 0x428ff4c -> OK: real entry * Checking 0x428ff50 -> OK: real entry * Checking 0x428ff54 -> OK: address in split fragment * Checking 0x428ff58 -> OK: address in split fragment * Checking 0x428ff5c -> OK: address in split fragment * Checking 0x428ff60 -> OK: address in split fragment * Checking 0x428ff64 -> OK: real entry * Checking 0x428ff68 -> OK: real entry * Checking 0x428ff6c -> OK: real entry * Checking 0x428ff70 -> OK: real entry BOLT-DEBUG: analyzeJumpTable in classify_object_over_fdes/1(*2) * Checking 0x428ff74 -> OK: real entry ... ``` - Dump skipped functions: ``` Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2) family Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2) Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2.cold.3/1(*2) Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode family Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.cold.4/1(*2) ``` - Dump values of unclaimed PC-relative relocations in data. (cherry picked from FBD24898172)
2020-11-12 11:54:38 -08:00
}
// Check there's an instruction at this offset.
if (TargetBF->getState() == BinaryFunction::State::Disassembled &&
!TargetBF->getInstructionAtOffset(Value - TargetBF->getAddress())) {
LLVM_DEBUG(dbgs() << "FAIL: no instruction at this offset\n");
break;
}
++NumRealEntries;
if (TargetBF == &BF) {
// Address inside the function.
addOffset(Value - TargetBF->getAddress());
LLVM_DEBUG(dbgs() << "OK: real entry\n");
} else {
// Address in split fragment.
BF.setHasSplitJumpTable(true);
// Add invalid offset for proper identification of jump table size.
addOffset(INVALID_OFFSET);
LLVM_DEBUG(dbgs() << "OK: address in split fragment\n");
}
}
// It's a jump table if the number of real entries is more than 1, or there's
// one real entry and "unreachable" targets. If there are only multiple
// "unreachable" targets, then it's not a jump table.
return NumRealEntries + HasUnreachable >= 2;
}
void BinaryContext::populateJumpTables() {
std::vector<BinaryFunction *> FuncsToSkip;
LLVM_DEBUG(dbgs() << "DataPCRelocations: " << DataPCRelocations.size()
<< '\n');
for (auto JTI = JumpTables.begin(), JTE = JumpTables.end(); JTI != JTE;
++JTI) {
JumpTable *JT = JTI->second;
BinaryFunction &BF = *JT->Parent;
if (!BF.isSimple())
continue;
uint64_t NextJTAddress = 0;
auto NextJTI = std::next(JTI);
if (NextJTI != JTE) {
NextJTAddress = NextJTI->second->getAddress();
}
const bool Success = analyzeJumpTable(JT->getAddress(), JT->Type, BF,
NextJTAddress, &JT->OffsetEntries);
if (!Success) {
dbgs() << "failed to analyze jump table in function " << BF << '\n';
JT->print(dbgs());
if (NextJTI != JTE) {
dbgs() << "next jump table at 0x"
<< Twine::utohexstr(NextJTI->second->getAddress())
<< " belongs to function " << *NextJTI->second->Parent << '\n';
NextJTI->second->print(dbgs());
}
llvm_unreachable("jump table heuristic failure");
}
for (uint64_t EntryOffset : JT->OffsetEntries) {
if (EntryOffset == BF.getSize())
BF.IgnoredBranches.emplace_back(EntryOffset, BF.getSize());
else
BF.registerReferencedOffset(EntryOffset);
}
// In strict mode, erase PC-relative relocation record. Later we check that
// all such records are erased and thus have been accounted for.
if (opts::StrictMode && JT->Type == JumpTable::JTT_PIC) {
for (uint64_t Address = JT->getAddress();
Address < JT->getAddress() + JT->getSize();
Address += JT->EntrySize) {
DataPCRelocations.erase(DataPCRelocations.find(Address));
}
}
// Mark to skip the function and all its fragments.
if (BF.hasSplitJumpTable())
FuncsToSkip.push_back(&BF);
}
[BOLT] Debug logging in analyzeJumpTable Summary: Added debug logging in/around `analyzeJumpTable`: - Dump jump table entries as they are being processed: ```BOLT-DEBUG: analyzeJumpTable in read_encoded_value_with_base/2(*2) * Checking 0x428ff40 -> OK: real entry * Checking 0x428ff44 -> OK: real entry * Checking 0x428ff48 -> OK: real entry * Checking 0x428ff4c -> OK: real entry * Checking 0x428ff50 -> OK: real entry * Checking 0x428ff54 -> OK: address in split fragment * Checking 0x428ff58 -> OK: address in split fragment * Checking 0x428ff5c -> OK: address in split fragment * Checking 0x428ff60 -> OK: address in split fragment * Checking 0x428ff64 -> OK: real entry * Checking 0x428ff68 -> OK: real entry * Checking 0x428ff6c -> OK: real entry * Checking 0x428ff70 -> OK: real entry BOLT-DEBUG: analyzeJumpTable in classify_object_over_fdes/1(*2) * Checking 0x428ff74 -> OK: real entry ... ``` - Dump skipped functions: ``` Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2) family Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2) Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2.cold.3/1(*2) Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode family Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.cold.4/1(*2) ``` - Dump values of unclaimed PC-relative relocations in data. (cherry picked from FBD24898172)
2020-11-12 11:54:38 -08:00
if (opts::StrictMode && DataPCRelocations.size()) {
LLVM_DEBUG({
[BOLT] Debug logging in analyzeJumpTable Summary: Added debug logging in/around `analyzeJumpTable`: - Dump jump table entries as they are being processed: ```BOLT-DEBUG: analyzeJumpTable in read_encoded_value_with_base/2(*2) * Checking 0x428ff40 -> OK: real entry * Checking 0x428ff44 -> OK: real entry * Checking 0x428ff48 -> OK: real entry * Checking 0x428ff4c -> OK: real entry * Checking 0x428ff50 -> OK: real entry * Checking 0x428ff54 -> OK: address in split fragment * Checking 0x428ff58 -> OK: address in split fragment * Checking 0x428ff5c -> OK: address in split fragment * Checking 0x428ff60 -> OK: address in split fragment * Checking 0x428ff64 -> OK: real entry * Checking 0x428ff68 -> OK: real entry * Checking 0x428ff6c -> OK: real entry * Checking 0x428ff70 -> OK: real entry BOLT-DEBUG: analyzeJumpTable in classify_object_over_fdes/1(*2) * Checking 0x428ff74 -> OK: real entry ... ``` - Dump skipped functions: ``` Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2) family Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2) Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2.cold.3/1(*2) Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode family Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.cold.4/1(*2) ``` - Dump values of unclaimed PC-relative relocations in data. (cherry picked from FBD24898172)
2020-11-12 11:54:38 -08:00
dbgs() << DataPCRelocations.size()
<< " unclaimed PC-relative relocations left in data:\n";
for (uint64_t Reloc : DataPCRelocations)
[BOLT] Debug logging in analyzeJumpTable Summary: Added debug logging in/around `analyzeJumpTable`: - Dump jump table entries as they are being processed: ```BOLT-DEBUG: analyzeJumpTable in read_encoded_value_with_base/2(*2) * Checking 0x428ff40 -> OK: real entry * Checking 0x428ff44 -> OK: real entry * Checking 0x428ff48 -> OK: real entry * Checking 0x428ff4c -> OK: real entry * Checking 0x428ff50 -> OK: real entry * Checking 0x428ff54 -> OK: address in split fragment * Checking 0x428ff58 -> OK: address in split fragment * Checking 0x428ff5c -> OK: address in split fragment * Checking 0x428ff60 -> OK: address in split fragment * Checking 0x428ff64 -> OK: real entry * Checking 0x428ff68 -> OK: real entry * Checking 0x428ff6c -> OK: real entry * Checking 0x428ff70 -> OK: real entry BOLT-DEBUG: analyzeJumpTable in classify_object_over_fdes/1(*2) * Checking 0x428ff74 -> OK: real entry ... ``` - Dump skipped functions: ``` Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2) family Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2) Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2.cold.3/1(*2) Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode family Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.cold.4/1(*2) ``` - Dump values of unclaimed PC-relative relocations in data. (cherry picked from FBD24898172)
2020-11-12 11:54:38 -08:00
dbgs() << Twine::utohexstr(Reloc) << '\n';
});
assert(0 && "unclaimed PC-relative relocations left in data\n");
}
clearList(DataPCRelocations);
// Functions containing split jump tables need to be skipped with all
// fragments.
for (BinaryFunction *BF : FuncsToSkip) {
BinaryFunction *ParentBF =
const_cast<BinaryFunction *>(BF->getTopmostFragment());
LLVM_DEBUG(dbgs() << "Skipping " << ParentBF->getPrintName()
<< " family\n");
ParentBF->setIgnored();
ParentBF->ignoreFragments();
}
}
MCSymbol *BinaryContext::getOrCreateGlobalSymbol(uint64_t Address,
Twine Prefix,
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
uint64_t Size,
uint16_t Alignment,
[BOLT] Static data reordering pass. Summary: Enable BOLT to reorder data sections in a binary based on memory profiling data. This diff adds a new pass to BOLT that can reorder data sections for better locality based on memory profiling data. For now, the algorithm to order data is primitive and just relies on the frequency of loads to order the contents of a section. We could probably do a lot better by looking at what functions use the hot data and grouping together hot data that is used by a single function (or cluster of functions). Block ordering might give some hints on how to order the data better as well. The new pass has two basic modes: inplace and split (when inplace is false). The default is split since inplace hasn't really been tested much. When splitting is on, the cold data is copied to a "cold" version of the section while the hot data is kept in the original section, e.g. for .rodata, .rodata will contain the hot data and .bolt.org.rodata will contain the cold bits. In inplace mode, the section contents are reordered inplace. In either mode, all relocations to data within that section are updated to reflect new data locations. Things to improve: - The current algorithm is really dumb and doesn't seem to lead to any wins. It certainly could use some improvement. - Private symbols can have data that leaks over to an adjacent symbol, e.g. a string that has a common suffix can start in one symbol and leak over (with the common suffix) into the next. For now, we punt on adjacent private symbols. - Handle ambiguous relocations better. Section relocations that point to the boundary of two symbols will prevent the adjacent symbols from being moved because we can't tell which symbol the relocation is for. - Handle jump tables. Right now jump table support must be basic if data reordering is enabled. - Being able to handle TLS. A good amount of data access in some binaries are happening in TLS. It would be worthwhile to be able to reorder any TLS sections too. - Handle sections with writeable data. This hasn't been tested so probably won't work. We could try to prevent false sharing in writeable sections as well. - A pie in the sky goal would be to use DWARF info to reorder types. (cherry picked from FBD6792876)
2018-04-20 20:03:31 -07:00
unsigned Flags) {
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
auto Itr = BinaryDataMap.find(Address);
if (Itr != BinaryDataMap.end()) {
assert(Itr->second->getSize() == Size || !Size);
return Itr->second->getSymbol();
}
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
std::string Name = (Prefix + "0x" + Twine::utohexstr(Address)).str();
assert(!GlobalSymbols.count(Name) && "created name is not unique");
[BOLT] Static data reordering pass. Summary: Enable BOLT to reorder data sections in a binary based on memory profiling data. This diff adds a new pass to BOLT that can reorder data sections for better locality based on memory profiling data. For now, the algorithm to order data is primitive and just relies on the frequency of loads to order the contents of a section. We could probably do a lot better by looking at what functions use the hot data and grouping together hot data that is used by a single function (or cluster of functions). Block ordering might give some hints on how to order the data better as well. The new pass has two basic modes: inplace and split (when inplace is false). The default is split since inplace hasn't really been tested much. When splitting is on, the cold data is copied to a "cold" version of the section while the hot data is kept in the original section, e.g. for .rodata, .rodata will contain the hot data and .bolt.org.rodata will contain the cold bits. In inplace mode, the section contents are reordered inplace. In either mode, all relocations to data within that section are updated to reflect new data locations. Things to improve: - The current algorithm is really dumb and doesn't seem to lead to any wins. It certainly could use some improvement. - Private symbols can have data that leaks over to an adjacent symbol, e.g. a string that has a common suffix can start in one symbol and leak over (with the common suffix) into the next. For now, we punt on adjacent private symbols. - Handle ambiguous relocations better. Section relocations that point to the boundary of two symbols will prevent the adjacent symbols from being moved because we can't tell which symbol the relocation is for. - Handle jump tables. Right now jump table support must be basic if data reordering is enabled. - Being able to handle TLS. A good amount of data access in some binaries are happening in TLS. It would be worthwhile to be able to reorder any TLS sections too. - Handle sections with writeable data. This hasn't been tested so probably won't work. We could try to prevent false sharing in writeable sections as well. - A pie in the sky goal would be to use DWARF info to reorder types. (cherry picked from FBD6792876)
2018-04-20 20:03:31 -07:00
return registerNameAtAddress(Name, Address, Size, Alignment, Flags);
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
}
BinaryFunction *BinaryContext::createBinaryFunction(
const std::string &Name, BinarySection &Section, uint64_t Address,
[BOLT] Support for lite mode with relocations Summary: Add '-lite' support for relocations for improved processing time, memory consumption, and more resilient processing of binaries with embedded assembly code. In lite relocation mode, BOLT will skip full processing of functions without a profile. It will run scanExternalRefs() on such functions to discover external references and to create internal relocations to update references to optimized functions. Note that we could have relied on the compiler/linker to provide relocations for function references. However, there's no assurance that all such references are reported. E.g., the compiler can resolve inter-procedural references internally, leaving no relocations for the linker. The scan process takes about <10 seconds per 100MB of code on modern hardware. It's a reasonable overhead to live with considering the flexibility it provides. If BOLT fails to scan or disassemble a function, .e.g., due to a data object embedded in code, or an unsupported instruction, it enables a patching mode to guarantee that the failed function will call optimized/moved versions of functions. The patching happens at original function entry points. '-skip=<func1,func2,...>' option now can be used to skip processing of arbitrary functions in the relocation mode. With '-use-old-text' or '-strict' we require all functions to be processed. As such, it is incompatible with '-lite' option, and '-skip' option will only disable optimizations of listed functions, not their disassembly and emission. (cherry picked from FBD22040717)
2020-06-15 00:15:47 -07:00
uint64_t Size, uint64_t SymbolSize, uint16_t Alignment) {
auto Result = BinaryFunctions.emplace(
[BOLT] Support for lite mode with relocations Summary: Add '-lite' support for relocations for improved processing time, memory consumption, and more resilient processing of binaries with embedded assembly code. In lite relocation mode, BOLT will skip full processing of functions without a profile. It will run scanExternalRefs() on such functions to discover external references and to create internal relocations to update references to optimized functions. Note that we could have relied on the compiler/linker to provide relocations for function references. However, there's no assurance that all such references are reported. E.g., the compiler can resolve inter-procedural references internally, leaving no relocations for the linker. The scan process takes about <10 seconds per 100MB of code on modern hardware. It's a reasonable overhead to live with considering the flexibility it provides. If BOLT fails to scan or disassemble a function, .e.g., due to a data object embedded in code, or an unsupported instruction, it enables a patching mode to guarantee that the failed function will call optimized/moved versions of functions. The patching happens at original function entry points. '-skip=<func1,func2,...>' option now can be used to skip processing of arbitrary functions in the relocation mode. With '-use-old-text' or '-strict' we require all functions to be processed. As such, it is incompatible with '-lite' option, and '-skip' option will only disable optimizations of listed functions, not their disassembly and emission. (cherry picked from FBD22040717)
2020-06-15 00:15:47 -07:00
Address, BinaryFunction(Name, Section, Address, Size, *this));
assert(Result.second == true && "unexpected duplicate function");
BinaryFunction *BF = &Result.first->second;
registerNameAtAddress(Name, Address, SymbolSize ? SymbolSize : Size,
Alignment);
setSymbolToFunctionMap(BF->getSymbol(), BF);
return BF;
}
const MCSymbol *
BinaryContext::getOrCreateJumpTable(BinaryFunction &Function, uint64_t Address,
JumpTable::JumpTableType Type) {
if (JumpTable *JT = getJumpTableContainingAddress(Address)) {
assert(JT->Type == Type && "jump table types have to match");
assert(JT->Parent == &Function &&
"cannot re-use jump table of a different function");
assert(Address == JT->getAddress() && "unexpected non-empty jump table");
return JT->getFirstLabel();
}
// Re-use the existing symbol if possible.
MCSymbol *JTLabel = nullptr;
if (BinaryData *Object = getBinaryDataAtAddress(Address)) {
if (!isInternalSymbolName(Object->getSymbol()->getName()))
JTLabel = Object->getSymbol();
}
const uint64_t EntrySize = getJumpTableEntrySize(Type);
if (!JTLabel) {
const std::string JumpTableName = generateJumpTableName(Function, Address);
JTLabel = registerNameAtAddress(JumpTableName, Address, 0, EntrySize);
}
LLVM_DEBUG(dbgs() << "BOLT-DEBUG: creating jump table " << JTLabel->getName()
<< " in function " << Function << '\n');
JumpTable *JT = new JumpTable(*JTLabel, Address, EntrySize, Type,
JumpTable::LabelMapType{{0, JTLabel}}, Function,
*getSectionForAddress(Address));
JumpTables.emplace(Address, JT);
// Duplicate the entry for the parent function for easy access.
Function.JumpTables.emplace(Address, JT);
return JTLabel;
}
std::pair<uint64_t, const MCSymbol *>
BinaryContext::duplicateJumpTable(BinaryFunction &Function, JumpTable *JT,
const MCSymbol *OldLabel) {
auto L = scopeLock();
unsigned Offset = 0;
bool Found = false;
for (std::pair<const unsigned, MCSymbol *> Elmt : JT->Labels) {
if (Elmt.second != OldLabel)
continue;
Offset = Elmt.first;
Found = true;
break;
}
assert(Found && "Label not found");
MCSymbol *NewLabel = Ctx->createNamedTempSymbol("duplicatedJT");
JumpTable *NewJT =
new JumpTable(*NewLabel, JT->getAddress(), JT->EntrySize, JT->Type,
JumpTable::LabelMapType{{Offset, NewLabel}}, Function,
*getSectionForAddress(JT->getAddress()));
NewJT->Entries = JT->Entries;
NewJT->Counts = JT->Counts;
uint64_t JumpTableID = ++DuplicatedJumpTables;
// Invert it to differentiate from regular jump tables whose IDs are their
// addresses in the input binary memory space
JumpTableID = ~JumpTableID;
JumpTables.emplace(JumpTableID, NewJT);
Function.JumpTables.emplace(JumpTableID, NewJT);
return std::make_pair(JumpTableID, NewLabel);
}
std::string BinaryContext::generateJumpTableName(const BinaryFunction &BF,
uint64_t Address) {
size_t Id;
uint64_t Offset = 0;
if (const JumpTable *JT = BF.getJumpTableContainingAddress(Address)) {
Offset = Address - JT->getAddress();
auto Itr = JT->Labels.find(Offset);
if (Itr != JT->Labels.end()) {
return std::string(Itr->second->getName());
}
Id = JumpTableIds.at(JT->getAddress());
} else {
Id = JumpTableIds[Address] = BF.JumpTables.size();
}
return ("JUMP_TABLE/" + BF.getOneName().str() + "." + std::to_string(Id) +
(Offset ? ("." + std::to_string(Offset)) : ""));
}
bool BinaryContext::hasValidCodePadding(const BinaryFunction &BF) {
// FIXME: aarch64 support is missing.
if (!isX86())
return true;
if (BF.getSize() == BF.getMaxSize())
return true;
ErrorOr<ArrayRef<unsigned char>> FunctionData = BF.getData();
assert(FunctionData && "cannot get function as data");
uint64_t Offset = BF.getSize();
MCInst Instr;
uint64_t InstrSize = 0;
uint64_t InstrAddress = BF.getAddress() + Offset;
using std::placeholders::_1;
// Skip instructions that satisfy the predicate condition.
auto skipInstructions = [&](std::function<bool(const MCInst &)> Predicate) {
const uint64_t StartOffset = Offset;
for (; Offset < BF.getMaxSize();
Offset += InstrSize, InstrAddress += InstrSize) {
if (!DisAsm->getInstruction(Instr,
InstrSize,
FunctionData->slice(Offset),
InstrAddress,
nulls()))
break;
if (!Predicate(Instr))
break;
}
return Offset - StartOffset;
};
// Skip a sequence of zero bytes.
auto skipZeros = [&]() {
const uint64_t StartOffset = Offset;
for (; Offset < BF.getMaxSize(); ++Offset)
if ((*FunctionData)[Offset] != 0)
break;
return Offset - StartOffset;
};
// Accept the whole padding area filled with breakpoints.
auto isBreakpoint = std::bind(&MCPlusBuilder::isBreakpoint, MIB.get(), _1);
if (skipInstructions(isBreakpoint) && Offset == BF.getMaxSize())
return true;
auto isNoop = std::bind(&MCPlusBuilder::isNoop, MIB.get(), _1);
// Some functions have a jump to the next function or to the padding area
// inserted after the body.
auto isSkipJump = [&](const MCInst &Instr) {
uint64_t TargetAddress = 0;
if (MIB->isUnconditionalBranch(Instr) &&
MIB->evaluateBranch(Instr, InstrAddress, InstrSize, TargetAddress)) {
if (TargetAddress >= InstrAddress + InstrSize &&
TargetAddress <= BF.getAddress() + BF.getMaxSize()) {
return true;
}
}
return false;
};
// Skip over nops, jumps, and zero padding. Allow interleaving (this happens).
while (skipInstructions(isNoop) ||
skipInstructions(isSkipJump) ||
skipZeros())
;
if (Offset == BF.getMaxSize())
return true;
if (opts::Verbosity >= 1) {
errs() << "BOLT-WARNING: bad padding at address 0x"
<< Twine::utohexstr(BF.getAddress() + BF.getSize())
<< " starting at offset "
<< (Offset - BF.getSize()) << " in function "
<< BF << '\n'
<< FunctionData->slice(BF.getSize(), BF.getMaxSize() - BF.getSize())
<< '\n';
}
return false;
}
void BinaryContext::adjustCodePadding() {
for (auto &BFI : BinaryFunctions) {
BinaryFunction &BF = BFI.second;
[BOLT] Support for lite mode with relocations Summary: Add '-lite' support for relocations for improved processing time, memory consumption, and more resilient processing of binaries with embedded assembly code. In lite relocation mode, BOLT will skip full processing of functions without a profile. It will run scanExternalRefs() on such functions to discover external references and to create internal relocations to update references to optimized functions. Note that we could have relied on the compiler/linker to provide relocations for function references. However, there's no assurance that all such references are reported. E.g., the compiler can resolve inter-procedural references internally, leaving no relocations for the linker. The scan process takes about <10 seconds per 100MB of code on modern hardware. It's a reasonable overhead to live with considering the flexibility it provides. If BOLT fails to scan or disassemble a function, .e.g., due to a data object embedded in code, or an unsupported instruction, it enables a patching mode to guarantee that the failed function will call optimized/moved versions of functions. The patching happens at original function entry points. '-skip=<func1,func2,...>' option now can be used to skip processing of arbitrary functions in the relocation mode. With '-use-old-text' or '-strict' we require all functions to be processed. As such, it is incompatible with '-lite' option, and '-skip' option will only disable optimizations of listed functions, not their disassembly and emission. (cherry picked from FBD22040717)
2020-06-15 00:15:47 -07:00
if (!shouldEmit(BF))
continue;
if (!hasValidCodePadding(BF)) {
[BOLT] Support for lite mode with relocations Summary: Add '-lite' support for relocations for improved processing time, memory consumption, and more resilient processing of binaries with embedded assembly code. In lite relocation mode, BOLT will skip full processing of functions without a profile. It will run scanExternalRefs() on such functions to discover external references and to create internal relocations to update references to optimized functions. Note that we could have relied on the compiler/linker to provide relocations for function references. However, there's no assurance that all such references are reported. E.g., the compiler can resolve inter-procedural references internally, leaving no relocations for the linker. The scan process takes about <10 seconds per 100MB of code on modern hardware. It's a reasonable overhead to live with considering the flexibility it provides. If BOLT fails to scan or disassemble a function, .e.g., due to a data object embedded in code, or an unsupported instruction, it enables a patching mode to guarantee that the failed function will call optimized/moved versions of functions. The patching happens at original function entry points. '-skip=<func1,func2,...>' option now can be used to skip processing of arbitrary functions in the relocation mode. With '-use-old-text' or '-strict' we require all functions to be processed. As such, it is incompatible with '-lite' option, and '-skip' option will only disable optimizations of listed functions, not their disassembly and emission. (cherry picked from FBD22040717)
2020-06-15 00:15:47 -07:00
if (HasRelocations) {
if (opts::Verbosity >= 1) {
outs() << "BOLT-INFO: function " << BF
<< " has invalid padding. Ignoring the function.\n";
}
BF.setIgnored();
} else {
BF.setMaxSize(BF.getSize());
}
}
}
}
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
MCSymbol *BinaryContext::registerNameAtAddress(StringRef Name,
uint64_t Address,
uint64_t Size,
[BOLT] Static data reordering pass. Summary: Enable BOLT to reorder data sections in a binary based on memory profiling data. This diff adds a new pass to BOLT that can reorder data sections for better locality based on memory profiling data. For now, the algorithm to order data is primitive and just relies on the frequency of loads to order the contents of a section. We could probably do a lot better by looking at what functions use the hot data and grouping together hot data that is used by a single function (or cluster of functions). Block ordering might give some hints on how to order the data better as well. The new pass has two basic modes: inplace and split (when inplace is false). The default is split since inplace hasn't really been tested much. When splitting is on, the cold data is copied to a "cold" version of the section while the hot data is kept in the original section, e.g. for .rodata, .rodata will contain the hot data and .bolt.org.rodata will contain the cold bits. In inplace mode, the section contents are reordered inplace. In either mode, all relocations to data within that section are updated to reflect new data locations. Things to improve: - The current algorithm is really dumb and doesn't seem to lead to any wins. It certainly could use some improvement. - Private symbols can have data that leaks over to an adjacent symbol, e.g. a string that has a common suffix can start in one symbol and leak over (with the common suffix) into the next. For now, we punt on adjacent private symbols. - Handle ambiguous relocations better. Section relocations that point to the boundary of two symbols will prevent the adjacent symbols from being moved because we can't tell which symbol the relocation is for. - Handle jump tables. Right now jump table support must be basic if data reordering is enabled. - Being able to handle TLS. A good amount of data access in some binaries are happening in TLS. It would be worthwhile to be able to reorder any TLS sections too. - Handle sections with writeable data. This hasn't been tested so probably won't work. We could try to prevent false sharing in writeable sections as well. - A pie in the sky goal would be to use DWARF info to reorder types. (cherry picked from FBD6792876)
2018-04-20 20:03:31 -07:00
uint16_t Alignment,
unsigned Flags) {
// Register the name with MCContext.
MCSymbol *Symbol = Ctx->getOrCreateSymbol(Name);
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
auto GAI = BinaryDataMap.find(Address);
BinaryData *BD;
if (GAI == BinaryDataMap.end()) {
ErrorOr<BinarySection &> SectionOrErr = getSectionForAddress(Address);
BinarySection &Section =
SectionOrErr ? SectionOrErr.get() : absoluteSection();
BD = new BinaryData(*Symbol,
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
Address,
Size,
Alignment ? Alignment : 1,
[BOLT] Static data reordering pass. Summary: Enable BOLT to reorder data sections in a binary based on memory profiling data. This diff adds a new pass to BOLT that can reorder data sections for better locality based on memory profiling data. For now, the algorithm to order data is primitive and just relies on the frequency of loads to order the contents of a section. We could probably do a lot better by looking at what functions use the hot data and grouping together hot data that is used by a single function (or cluster of functions). Block ordering might give some hints on how to order the data better as well. The new pass has two basic modes: inplace and split (when inplace is false). The default is split since inplace hasn't really been tested much. When splitting is on, the cold data is copied to a "cold" version of the section while the hot data is kept in the original section, e.g. for .rodata, .rodata will contain the hot data and .bolt.org.rodata will contain the cold bits. In inplace mode, the section contents are reordered inplace. In either mode, all relocations to data within that section are updated to reflect new data locations. Things to improve: - The current algorithm is really dumb and doesn't seem to lead to any wins. It certainly could use some improvement. - Private symbols can have data that leaks over to an adjacent symbol, e.g. a string that has a common suffix can start in one symbol and leak over (with the common suffix) into the next. For now, we punt on adjacent private symbols. - Handle ambiguous relocations better. Section relocations that point to the boundary of two symbols will prevent the adjacent symbols from being moved because we can't tell which symbol the relocation is for. - Handle jump tables. Right now jump table support must be basic if data reordering is enabled. - Being able to handle TLS. A good amount of data access in some binaries are happening in TLS. It would be worthwhile to be able to reorder any TLS sections too. - Handle sections with writeable data. This hasn't been tested so probably won't work. We could try to prevent false sharing in writeable sections as well. - A pie in the sky goal would be to use DWARF info to reorder types. (cherry picked from FBD6792876)
2018-04-20 20:03:31 -07:00
Section,
Flags);
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
GAI = BinaryDataMap.emplace(Address, BD).first;
GlobalSymbols[Name] = BD;
updateObjectNesting(GAI);
} else {
BD = GAI->second;
if (!BD->hasName(Name)) {
GlobalSymbols[Name] = BD;
BD->Symbols.push_back(Symbol);
}
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
}
return Symbol;
}
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
const BinaryData *
BinaryContext::getBinaryDataContainingAddressImpl(uint64_t Address) const {
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
auto NI = BinaryDataMap.lower_bound(Address);
auto End = BinaryDataMap.end();
if ((NI != End && Address == NI->first) ||
((NI != BinaryDataMap.begin()) && (NI-- != BinaryDataMap.begin()))) {
if (NI->second->containsAddress(Address)) {
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
return NI->second;
}
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
// If this is a sub-symbol, see if a parent data contains the address.
const BinaryData *BD = NI->second->getParent();
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
while (BD) {
if (BD->containsAddress(Address))
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
return BD;
BD = BD->getParent();
}
}
return nullptr;
}
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
bool BinaryContext::setBinaryDataSize(uint64_t Address, uint64_t Size) {
auto NI = BinaryDataMap.find(Address);
assert(NI != BinaryDataMap.end());
if (NI == BinaryDataMap.end())
return false;
// TODO: it's possible that a jump table starts at the same address
// as a larger blob of private data. When we set the size of the
// jump table, it might be smaller than the total blob size. In this
// case we just leave the original size since (currently) it won't really
// affect anything. See T26915981.
assert((!NI->second->Size || NI->second->Size == Size ||
(NI->second->isJumpTable() && NI->second->Size > Size)) &&
"can't change the size of a symbol that has already had its "
"size set");
if (!NI->second->Size) {
NI->second->Size = Size;
updateObjectNesting(NI);
return true;
}
return false;
}
void BinaryContext::generateSymbolHashes() {
auto isPadding = [](const BinaryData &BD) {
StringRef Contents = BD.getSection().getContents();
StringRef SymData = Contents.substr(BD.getOffset(), BD.getSize());
return (BD.getName().startswith("HOLEat") ||
SymData.find_first_not_of(0) == StringRef::npos);
};
uint64_t NumCollisions = 0;
for (auto &Entry : BinaryDataMap) {
BinaryData &BD = *Entry.second;
StringRef Name = BD.getName();
if (!isInternalSymbolName(Name))
continue;
// First check if a non-anonymous alias exists and move it to the front.
if (BD.getSymbols().size() > 1) {
auto Itr = std::find_if(BD.getSymbols().begin(),
BD.getSymbols().end(),
[&](const MCSymbol *Symbol) {
return !isInternalSymbolName(Symbol->getName());
});
if (Itr != BD.getSymbols().end()) {
size_t Idx = std::distance(BD.getSymbols().begin(), Itr);
std::swap(BD.getSymbols()[0], BD.getSymbols()[Idx]);
continue;
}
}
// We have to skip 0 size symbols since they will all collide.
if (BD.getSize() == 0) {
continue;
}
const uint64_t Hash = BD.getSection().hash(BD);
const size_t Idx = Name.find("0x");
std::string NewName = (Twine(Name.substr(0, Idx)) +
"_" + Twine::utohexstr(Hash)).str();
if (getBinaryDataByName(NewName)) {
// Ignore collisions for symbols that appear to be padding
// (i.e. all zeros or a "hole")
if (!isPadding(BD)) {
if (opts::Verbosity) {
errs() << "BOLT-WARNING: collision detected when hashing " << BD
<< " with new name (" << NewName << "), skipping.\n";
}
++NumCollisions;
}
continue;
}
BD.Symbols.insert(BD.Symbols.begin(),
Ctx->getOrCreateSymbol(NewName));
GlobalSymbols[NewName] = &BD;
}
if (NumCollisions) {
errs() << "BOLT-WARNING: " << NumCollisions
<< " collisions detected while hashing binary objects";
if (!opts::Verbosity)
errs() << ". Use -v=1 to see the list.";
errs() << '\n';
}
}
void BinaryContext::registerFragment(BinaryFunction &TargetFunction,
BinaryFunction &Function) const {
// Only a parent function (or a sibling) can reach its fragment.
assert(!Function.IsFragment &&
"only one cold fragment is supported at this time");
if (BinaryFunction *TargetParent = TargetFunction.getParentFragment()) {
assert(TargetParent == &Function && "mismatching parent function");
return;
}
TargetFunction.setParentFragment(Function);
Function.addFragment(TargetFunction);
if (!HasRelocations) {
TargetFunction.setSimple(false);
Function.setSimple(false);
}
if (opts::Verbosity >= 1) {
outs() << "BOLT-INFO: marking " << TargetFunction
<< " as a fragment of " << Function << '\n';
}
}
[BOLT] Support for lite mode with relocations Summary: Add '-lite' support for relocations for improved processing time, memory consumption, and more resilient processing of binaries with embedded assembly code. In lite relocation mode, BOLT will skip full processing of functions without a profile. It will run scanExternalRefs() on such functions to discover external references and to create internal relocations to update references to optimized functions. Note that we could have relied on the compiler/linker to provide relocations for function references. However, there's no assurance that all such references are reported. E.g., the compiler can resolve inter-procedural references internally, leaving no relocations for the linker. The scan process takes about <10 seconds per 100MB of code on modern hardware. It's a reasonable overhead to live with considering the flexibility it provides. If BOLT fails to scan or disassemble a function, .e.g., due to a data object embedded in code, or an unsupported instruction, it enables a patching mode to guarantee that the failed function will call optimized/moved versions of functions. The patching happens at original function entry points. '-skip=<func1,func2,...>' option now can be used to skip processing of arbitrary functions in the relocation mode. With '-use-old-text' or '-strict' we require all functions to be processed. As such, it is incompatible with '-lite' option, and '-skip' option will only disable optimizations of listed functions, not their disassembly and emission. (cherry picked from FBD22040717)
2020-06-15 00:15:47 -07:00
void BinaryContext::processInterproceduralReferences(BinaryFunction &Function) {
for (uint64_t Address : Function.InterproceduralReferences) {
if (!Address)
continue;
BinaryFunction *TargetFunction =
getBinaryFunctionContainingAddress(Address);
if (&Function == TargetFunction)
continue;
if (TargetFunction) {
if (TargetFunction->IsFragment)
registerFragment(*TargetFunction, Function);
if (uint64_t Offset = Address - TargetFunction->getAddress())
TargetFunction->addEntryPointAtOffset(Offset);
continue;
}
// Check if address falls in function padding space - this could be
// unmarked data in code. In this case adjust the padding space size.
ErrorOr<BinarySection &> Section = getSectionForAddress(Address);
assert(Section && "cannot get section for referenced address");
if (!Section->isText())
continue;
// PLT requires special handling and could be ignored in this context.
StringRef SectionName = Section->getName();
if (SectionName == ".plt" || SectionName == ".plt.got")
continue;
if (opts::processAllFunctions()) {
errs() << "BOLT-ERROR: cannot process binaries with unmarked "
<< "object in code at address 0x"
<< Twine::utohexstr(Address) << " belonging to section "
<< SectionName << " in current mode\n";
exit(1);
}
TargetFunction =
getBinaryFunctionContainingAddress(Address,
/*CheckPastEnd=*/false,
/*UseMaxSize=*/true);
// We are not going to overwrite non-simple functions, but for simple
// ones - adjust the padding size.
if (TargetFunction && TargetFunction->isSimple()) {
errs() << "BOLT-WARNING: function " << *TargetFunction
<< " has an object detected in a padding region at address 0x"
<< Twine::utohexstr(Address) << '\n';
TargetFunction->setMaxSize(TargetFunction->getSize());
}
}
[BOLT] Support for lite mode with relocations Summary: Add '-lite' support for relocations for improved processing time, memory consumption, and more resilient processing of binaries with embedded assembly code. In lite relocation mode, BOLT will skip full processing of functions without a profile. It will run scanExternalRefs() on such functions to discover external references and to create internal relocations to update references to optimized functions. Note that we could have relied on the compiler/linker to provide relocations for function references. However, there's no assurance that all such references are reported. E.g., the compiler can resolve inter-procedural references internally, leaving no relocations for the linker. The scan process takes about <10 seconds per 100MB of code on modern hardware. It's a reasonable overhead to live with considering the flexibility it provides. If BOLT fails to scan or disassemble a function, .e.g., due to a data object embedded in code, or an unsupported instruction, it enables a patching mode to guarantee that the failed function will call optimized/moved versions of functions. The patching happens at original function entry points. '-skip=<func1,func2,...>' option now can be used to skip processing of arbitrary functions in the relocation mode. With '-use-old-text' or '-strict' we require all functions to be processed. As such, it is incompatible with '-lite' option, and '-skip' option will only disable optimizations of listed functions, not their disassembly and emission. (cherry picked from FBD22040717)
2020-06-15 00:15:47 -07:00
clearList(Function.InterproceduralReferences);
}
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
void BinaryContext::postProcessSymbolTable() {
fixBinaryDataHoles();
bool Valid = true;
for (auto &Entry : BinaryDataMap) {
BinaryData *BD = Entry.second;
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
if ((BD->getName().startswith("SYMBOLat") ||
BD->getName().startswith("DATAat")) &&
!BD->getParent() &&
!BD->getSize() &&
!BD->isAbsolute() &&
BD->getSection()) {
errs() << "BOLT-WARNING: zero-sized top level symbol: " << *BD << "\n";
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
Valid = false;
}
}
assert(Valid);
generateSymbolHashes();
}
void BinaryContext::foldFunction(BinaryFunction &ChildBF,
BinaryFunction &ParentBF) {
assert(!ChildBF.isMultiEntry() && !ParentBF.isMultiEntry() &&
"cannot merge functions with multiple entry points");
std::unique_lock<std::shared_timed_mutex> WriteCtxLock(CtxMutex,
std::defer_lock);
std::unique_lock<std::shared_timed_mutex> WriteSymbolMapLock(
SymbolToFunctionMapMutex, std::defer_lock);
const StringRef ChildName = ChildBF.getOneName();
// Move symbols over and update bookkeeping info.
for (MCSymbol *Symbol : ChildBF.getSymbols()) {
ParentBF.getSymbols().push_back(Symbol);
WriteSymbolMapLock.lock();
SymbolToFunctionMap[Symbol] = &ParentBF;
WriteSymbolMapLock.unlock();
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
// NB: there's no need to update BinaryDataMap and GlobalSymbols.
}
ChildBF.getSymbols().clear();
// Move other names the child function is known under.
std::move(ChildBF.Aliases.begin(), ChildBF.Aliases.end(),
std::back_inserter(ParentBF.Aliases));
ChildBF.Aliases.clear();
if (HasRelocations) {
// Merge execution counts of ChildBF into those of ParentBF.
// Without relocations, we cannot reliably merge profiles as both functions
// continue to exist and either one can be executed.
ChildBF.mergeProfileDataInto(ParentBF);
std::shared_lock<std::shared_timed_mutex> ReadBfsLock(BinaryFunctionsMutex,
std::defer_lock);
std::unique_lock<std::shared_timed_mutex> WriteBfsLock(BinaryFunctionsMutex,
std::defer_lock);
// Remove ChildBF from the global set of functions in relocs mode.
ReadBfsLock.lock();
auto FI = BinaryFunctions.find(ChildBF.getAddress());
ReadBfsLock.unlock();
assert(FI != BinaryFunctions.end() && "function not found");
assert(&ChildBF == &FI->second && "function mismatch");
WriteBfsLock.lock();
ChildBF.clearDisasmState();
FI = BinaryFunctions.erase(FI);
WriteBfsLock.unlock();
} else {
// In non-relocation mode we keep the function, but rename it.
std::string NewName = "__ICF_" + ChildName.str();
WriteCtxLock.lock();
ChildBF.getSymbols().push_back(Ctx->getOrCreateSymbol(NewName));
WriteCtxLock.unlock();
ChildBF.setFolded(&ParentBF);
}
}
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
void BinaryContext::fixBinaryDataHoles() {
assert(validateObjectNesting() && "object nesting inconsitency detected");
for (BinarySection &Section : allocatableSections()) {
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
std::vector<std::pair<uint64_t, uint64_t>> Holes;
auto isNotHole = [&Section](const binary_data_iterator &Itr) {
BinaryData *BD = Itr->second;
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
bool isHole = (!BD->getParent() &&
!BD->getSize() &&
BD->isObject() &&
(BD->getName().startswith("SYMBOLat0x") ||
BD->getName().startswith("DATAat0x") ||
BD->getName().startswith("ANONYMOUS")));
return !isHole && BD->getSection() == Section && !BD->getParent();
};
auto BDStart = BinaryDataMap.begin();
auto BDEnd = BinaryDataMap.end();
auto Itr = FilteredBinaryDataIterator(isNotHole, BDStart, BDEnd);
auto End = FilteredBinaryDataIterator(isNotHole, BDEnd, BDEnd);
uint64_t EndAddress = Section.getAddress();
while (Itr != End) {
if (Itr->second->getAddress() > EndAddress) {
uint64_t Gap = Itr->second->getAddress() - EndAddress;
Holes.emplace_back(EndAddress, Gap);
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
}
EndAddress = Itr->second->getEndAddress();
++Itr;
}
if (EndAddress < Section.getEndAddress()) {
Holes.emplace_back(EndAddress, Section.getEndAddress() - EndAddress);
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
}
// If there is already a symbol at the start of the hole, grow that symbol
// to cover the rest. Otherwise, create a new symbol to cover the hole.
for (std::pair<uint64_t, uint64_t> &Hole : Holes) {
BinaryData *BD = getBinaryDataAtAddress(Hole.first);
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
if (BD) {
// BD->getSection() can be != Section if there are sections that
// overlap. In this case it is probably safe to just skip the holes
// since the overlapping section will not(?) have any symbols in it.
if (BD->getSection() == Section)
setBinaryDataSize(Hole.first, Hole.second);
} else {
getOrCreateGlobalSymbol(Hole.first, "HOLEat", Hole.second, 1);
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
}
}
}
assert(validateObjectNesting() && "object nesting inconsitency detected");
assert(validateHoles() && "top level hole detected in object map");
}
void BinaryContext::printGlobalSymbols(raw_ostream& OS) const {
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
const BinarySection* CurrentSection = nullptr;
bool FirstSection = true;
for (auto &Entry : BinaryDataMap) {
const BinaryData *BD = Entry.second;
const BinarySection &Section = BD->getSection();
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
if (FirstSection || Section != *CurrentSection) {
uint64_t Address, Size;
StringRef Name = Section.getName();
if (Section) {
Address = Section.getAddress();
Size = Section.getSize();
} else {
Address = BD->getAddress();
Size = BD->getSize();
}
OS << "BOLT-INFO: Section " << Name << ", "
<< "0x" + Twine::utohexstr(Address) << ":"
<< "0x" + Twine::utohexstr(Address + Size) << "/"
<< Size << "\n";
CurrentSection = &Section;
FirstSection = false;
}
OS << "BOLT-INFO: ";
const BinaryData *P = BD->getParent();
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
while (P) {
OS << " ";
P = P->getParent();
}
OS << *BD << "\n";
}
}
unsigned BinaryContext::addDebugFilenameToUnit(const uint32_t DestCUID,
const uint32_t SrcCUID,
unsigned FileIndex) {
DWARFCompileUnit *SrcUnit = DwCtx->getCompileUnitForOffset(SrcCUID);
const DWARFDebugLine::LineTable *LineTable =
DwCtx->getLineTableForUnit(SrcUnit);
const std::vector<DWARFDebugLine::FileNameEntry> &FileNames =
LineTable->Prologue.FileNames;
// Dir indexes start at 1, as DWARF file numbers, and a dir index 0
// means empty dir.
assert(FileIndex > 0 && FileIndex <= FileNames.size() &&
"FileIndex out of range for the compilation unit.");
StringRef Dir = "";
if (FileNames[FileIndex - 1].DirIdx != 0) {
if (Optional<const char *> DirName = dwarf::toString(
LineTable->Prologue
.IncludeDirectories[FileNames[FileIndex - 1].DirIdx - 1])) {
Dir = *DirName;
}
}
StringRef FileName = "";
if (Optional<const char *> FName =
dwarf::toString(FileNames[FileIndex - 1].Name))
FileName = *FName;
assert(FileName != "");
return cantFail(Ctx->getDwarfFile(Dir, FileName, 0, None, None, DestCUID));
}
std::vector<BinaryFunction *> BinaryContext::getSortedFunctions() {
std::vector<BinaryFunction *> SortedFunctions(BinaryFunctions.size());
std::transform(BinaryFunctions.begin(), BinaryFunctions.end(),
SortedFunctions.begin(),
[](std::pair<const uint64_t, BinaryFunction> &BFI) {
return &BFI.second;
});
std::stable_sort(SortedFunctions.begin(), SortedFunctions.end(),
[] (const BinaryFunction *A, const BinaryFunction *B) {
if (A->hasValidIndex() && B->hasValidIndex()) {
return A->getIndex() < B->getIndex();
}
return A->hasValidIndex();
});
return SortedFunctions;
}
std::vector<BinaryFunction *> BinaryContext::getAllBinaryFunctions() {
std::vector<BinaryFunction *> AllFunctions;
AllFunctions.reserve(BinaryFunctions.size() + InjectedBinaryFunctions.size());
std::transform(BinaryFunctions.begin(), BinaryFunctions.end(),
std::back_inserter(AllFunctions),
[](std::pair<const uint64_t, BinaryFunction> &BFI) {
return &BFI.second;
});
std::copy(InjectedBinaryFunctions.begin(), InjectedBinaryFunctions.end(),
std::back_inserter(AllFunctions));
return AllFunctions;
}
Optional<DWARFUnit *> BinaryContext::getDWOCU(uint64_t DWOId) {
auto Iter = DWOCUs.find(DWOId);
if (Iter == DWOCUs.end())
return None;
return Iter->second;
}
DWARFContext *BinaryContext::getDWOContext() {
if (DWOCUs.empty())
return nullptr;
return &DWOCUs.begin()->second->getContext();
}
/// Handles DWO sections that can either be in .o, .dwo or .dwp files.
void BinaryContext::preprocessDWODebugInfo() {
for (const std::unique_ptr<DWARFUnit> &CU : DwCtx->compile_units()) {
DWARFUnit *const DwarfUnit = CU.get();
if (llvm::Optional<uint64_t> DWOId = DwarfUnit->getDWOId()) {
DWARFUnit *DWOCU = DwarfUnit->getNonSkeletonUnitDIE(false).getDwarfUnit();
if (!DWOCU->isDWOUnit()) {
std::string DWOName = dwarf::toString(
DwarfUnit->getUnitDIE().find(
{dwarf::DW_AT_dwo_name, dwarf::DW_AT_GNU_dwo_name}),
"");
outs() << "BOLT-WARNING: Debug Fission: DWO debug information for "
<< DWOName
<< " was not retrieved and won't be updated. Please check "
"relative path.\n";
continue;
}
DWOCUs[*DWOId] = DWOCU;
}
}
}
void BinaryContext::preprocessDebugInfo() {
struct CURange {
uint64_t LowPC;
uint64_t HighPC;
DWARFUnit *Unit;
bool operator<(const CURange &Other) const {
return LowPC < Other.LowPC;
}
};
// Building a map of address ranges to CUs similar to .debug_aranges and use
// it to assign CU to functions.
std::vector<CURange> AllRanges;
AllRanges.reserve(DwCtx->getNumCompileUnits());
for (const std::unique_ptr<DWARFUnit> &CU : DwCtx->compile_units()) {
Expected<DWARFAddressRangesVector> RangesOrError =
CU->getUnitDIE().getAddressRanges();
if (!RangesOrError) {
consumeError(RangesOrError.takeError());
continue;
}
for (DWARFAddressRange &Range : *RangesOrError) {
// Parts of the debug info could be invalidated due to corresponding code
// being removed from the binary by the linker. Hence we check if the
// address is a valid one.
if (containsAddress(Range.LowPC))
AllRanges.emplace_back(CURange{Range.LowPC, Range.HighPC, CU.get()});
}
}
std::sort(AllRanges.begin(), AllRanges.end());
for (auto &KV : BinaryFunctions) {
const uint64_t FunctionAddress = KV.first;
BinaryFunction &Function = KV.second;
auto It = std::partition_point(AllRanges.begin(), AllRanges.end(),
[=](CURange R) { return R.HighPC <= FunctionAddress; });
if (It != AllRanges.end() && It->LowPC <= FunctionAddress) {
Function.setDWARFUnit(It->Unit);
}
}
// Populate MCContext with DWARF files from all units.
StringRef GlobalPrefix = AsmInfo->getPrivateGlobalPrefix();
for (const std::unique_ptr<DWARFUnit> &CU : DwCtx->compile_units()) {
const uint64_t CUID = CU->getOffset();
const DWARFDebugLine::LineTable *LineTable =
DwCtx->getLineTableForUnit(CU.get());
const std::vector<DWARFDebugLine::FileNameEntry> &FileNames =
LineTable->Prologue.FileNames;
// Assign a unique label to every line table, one per CU.
Ctx->getMCDwarfLineTable(CUID).setLabel(
Ctx->getOrCreateSymbol(GlobalPrefix + "line_table_start" + Twine(CUID)));
// Make sure empty debug line tables are registered too.
if (FileNames.empty()) {
cantFail(Ctx->getDwarfFile("", "<unknown>", 0, None, None, CUID));
continue;
}
for (size_t I = 0, Size = FileNames.size(); I != Size; ++I) {
// Dir indexes start at 1, as DWARF file numbers, and a dir index 0
// means empty dir.
StringRef Dir = "";
if (FileNames[I].DirIdx != 0)
if (Optional<const char *> DirName = dwarf::toString(
LineTable->Prologue
.IncludeDirectories[FileNames[I].DirIdx - 1]))
Dir = *DirName;
StringRef FileName = "";
if (Optional<const char *> FName = dwarf::toString(FileNames[I].Name))
FileName = *FName;
assert(FileName != "");
cantFail(Ctx->getDwarfFile(Dir, FileName, 0, None, None, CUID));
}
}
preprocessDWODebugInfo();
}
[BOLT] Support for lite mode with relocations Summary: Add '-lite' support for relocations for improved processing time, memory consumption, and more resilient processing of binaries with embedded assembly code. In lite relocation mode, BOLT will skip full processing of functions without a profile. It will run scanExternalRefs() on such functions to discover external references and to create internal relocations to update references to optimized functions. Note that we could have relied on the compiler/linker to provide relocations for function references. However, there's no assurance that all such references are reported. E.g., the compiler can resolve inter-procedural references internally, leaving no relocations for the linker. The scan process takes about <10 seconds per 100MB of code on modern hardware. It's a reasonable overhead to live with considering the flexibility it provides. If BOLT fails to scan or disassemble a function, .e.g., due to a data object embedded in code, or an unsupported instruction, it enables a patching mode to guarantee that the failed function will call optimized/moved versions of functions. The patching happens at original function entry points. '-skip=<func1,func2,...>' option now can be used to skip processing of arbitrary functions in the relocation mode. With '-use-old-text' or '-strict' we require all functions to be processed. As such, it is incompatible with '-lite' option, and '-skip' option will only disable optimizations of listed functions, not their disassembly and emission. (cherry picked from FBD22040717)
2020-06-15 00:15:47 -07:00
bool BinaryContext::shouldEmit(const BinaryFunction &Function) const {
if (opts::processAllFunctions())
return true;
if (Function.isIgnored())
return false;
// In relocation mode we will emit non-simple functions with CFG.
// If the function does not have a CFG it should be marked as ignored.
return HasRelocations || Function.isSimple();
}
void BinaryContext::printCFI(raw_ostream &OS, const MCCFIInstruction &Inst) {
uint32_t Operation = Inst.getOperation();
switch (Operation) {
case MCCFIInstruction::OpSameValue:
OS << "OpSameValue Reg" << Inst.getRegister();
break;
case MCCFIInstruction::OpRememberState:
OS << "OpRememberState";
break;
case MCCFIInstruction::OpRestoreState:
OS << "OpRestoreState";
break;
case MCCFIInstruction::OpOffset:
OS << "OpOffset Reg" << Inst.getRegister() << " " << Inst.getOffset();
break;
case MCCFIInstruction::OpDefCfaRegister:
OS << "OpDefCfaRegister Reg" << Inst.getRegister();
break;
case MCCFIInstruction::OpDefCfaOffset:
OS << "OpDefCfaOffset " << Inst.getOffset();
break;
case MCCFIInstruction::OpDefCfa:
OS << "OpDefCfa Reg" << Inst.getRegister() << " " << Inst.getOffset();
break;
case MCCFIInstruction::OpRelOffset:
OS << "OpRelOffset Reg" << Inst.getRegister() << " " << Inst.getOffset();
break;
case MCCFIInstruction::OpAdjustCfaOffset:
OS << "OfAdjustCfaOffset " << Inst.getOffset();
break;
case MCCFIInstruction::OpEscape:
OS << "OpEscape";
break;
case MCCFIInstruction::OpRestore:
OS << "OpRestore Reg" << Inst.getRegister();
break;
case MCCFIInstruction::OpUndefined:
OS << "OpUndefined Reg" << Inst.getRegister();
break;
case MCCFIInstruction::OpRegister:
OS << "OpRegister Reg" << Inst.getRegister() << " Reg"
<< Inst.getRegister2();
break;
case MCCFIInstruction::OpWindowSave:
OS << "OpWindowSave";
break;
case MCCFIInstruction::OpGnuArgsSize:
OS << "OpGnuArgsSize";
break;
default:
OS << "Op#" << Operation;
break;
}
}
void BinaryContext::printInstruction(raw_ostream &OS,
const MCInst &Instruction,
uint64_t Offset,
const BinaryFunction* Function,
[BOLT] Improve ICP for virtual method calls and jump tables using value profiling. Summary: Use value profiling data to remove the method pointer loads from vtables when doing ICP at virtual function and jump table callsites. The basic process is the following: 1. Work backwards from the callsite to find the most recent def of the call register. 2. Work back from the call register def to find the instruction where the vtable is loaded. 3. Find out of there is any value profiling data associated with the vtable load. If so, record all these addresses as potential vtables + method offsets. 4. Since the addresses extracted by #3 will be vtable + method offset, we need to figure out the method offset in order to determine the actual vtable base address. At this point I virtually execute all the instructions that occur between #3 and #2 that touch the method pointer register. The result of this execution should be the method offset. 5. Fetch the actual method address from the appropriate data section containing the vtable using the computed method offset. Make sure that this address maps to an actual function symbol. 6. Try to associate a vtable pointer with each target address in SymTargets. If every target has a vtable, then this is almost certainly a virtual method callsite. 7. Use the vtable address when generating the promoted call code. It's basically the same as regular ICP code except that the compare is against the vtable and not the method pointer. Additionally, the instructions to load up the method are dumped into the cold call block. For jump tables, the basic idea is the same. I use the memory profiling data to find the hottest slots in the jumptable and then use that information to compute the indices of the hottest entries. We can then compare the index register to the hot index values and avoid the load from the jump table. Note: I'm assuming the whole call is in a single BB. According to @rafaelauler, this isn't always the case on ARM. This also isn't always the case on X86 either. If there are non-trivial arguments that are passed by value, there could be branches in between the setup and the call. I'm going to leave fixing this until later since it makes things a bit more complicated. I've also fixed a bug where ICP was introducing a conditional tail call. I made sure that SCTC fixes these up afterwards. I have no idea why I made it introduce a CTC in the first place. (cherry picked from FBD6120768)
2017-10-20 12:11:34 -07:00
bool PrintMCInst,
bool PrintMemData,
bool PrintRelocations) const {
if (MIB->isEHLabel(Instruction)) {
OS << " EH_LABEL: " << *MIB->getTargetSymbol(Instruction) << '\n';
return;
}
OS << format(" %08" PRIx64 ": ", Offset);
if (MIB->isCFI(Instruction)) {
uint32_t Offset = Instruction.getOperand(0).getImm();
OS << "\t!CFI\t$" << Offset << "\t; ";
if (Function)
printCFI(OS, *Function->getCFIFor(Instruction));
OS << "\n";
return;
}
InstPrinter->printInst(&Instruction, 0, "", *STI, OS);
if (MIB->isCall(Instruction)) {
if (MIB->isTailCall(Instruction))
OS << " # TAILCALL ";
if (MIB->isInvoke(Instruction)) {
const Optional<MCPlus::MCLandingPad> EHInfo = MIB->getEHInfo(Instruction);
OS << " # handler: ";
if (EHInfo->first)
OS << *EHInfo->first;
else
OS << '0';
OS << "; action: " << EHInfo->second;
const int64_t GnuArgsSize = MIB->getGnuArgsSize(Instruction);
if (GnuArgsSize >= 0)
OS << "; GNU_args_size = " << GnuArgsSize;
}
} else if (MIB->isIndirectBranch(Instruction)) {
if (uint64_t JTAddress = MIB->getJumpTable(Instruction)) {
OS << " # JUMPTABLE @0x" << Twine::utohexstr(JTAddress);
} else {
OS << " # UNKNOWN CONTROL FLOW";
}
}
MIB->printAnnotations(Instruction, OS);
Indirect call promotion optimization. Summary: Perform indirect call promotion optimization in BOLT. The code scans the instructions during CFG creation for all indirect calls. Right now indirect tail calls are not handled since the functions are marked not simple. The offsets of the indirect calls are stored for later use by the ICP pass. The indirect call promotion pass visits each indirect call and examines the BranchData for each. If the most frequent targets from that callsite exceed the specified threshold (default 90%), the call is promoted. Otherwise, it is ignored. By default, only one target is considered at each callsite. When an candiate callsite is processed, we modify the callsite to test for the most common call targets before calling through the original generic call mechanism. The CFG and layout are modified by ICP. A few new command line options have been added: -indirect-call-promotion -indirect-call-promotion-threshold=<percentage> -indirect-call-promotion-topn=<int> The threshold is the minimum frequency of a call target needed before ICP is triggered. The topn option controls the number of targets to consider for each callsite, e.g. ICP is triggered if topn=2 and the total requency of the top two call targets exceeds the threshold. Example of ICP: C++ code: int B_count = 0; int C_count = 0; struct A { virtual void foo() = 0; } struct B : public A { virtual void foo() { ++B_count; }; }; struct C : public A { virtual void foo() { ++C_count; }; }; A* a = ... a->foo(); ... original: 400863: 49 8b 07 mov (%r15),%rax 400866: 4c 89 ff mov %r15,%rdi 400869: ff 10 callq *(%rax) 40086b: 41 83 e6 01 and $0x1,%r14d 40086f: 4d 89 e6 mov %r12,%r14 400872: 4c 0f 44 f5 cmove %rbp,%r14 400876: 4c 89 f7 mov %r14,%rdi ... after ICP: 40085e: 49 8b 07 mov (%r15),%rax 400861: 4c 89 ff mov %r15,%rdi 400864: 49 ba e0 0b 40 00 00 movabs $0x400be0,%r10 40086b: 00 00 00 40086e: 4c 3b 10 cmp (%rax),%r10 400871: 75 29 jne 40089c <main+0x9c> 400873: 41 ff d2 callq *%r10 400876: 41 83 e6 01 and $0x1,%r14d 40087a: 4d 89 e6 mov %r12,%r14 40087d: 4c 0f 44 f5 cmove %rbp,%r14 400881: 4c 89 f7 mov %r14,%rdi ... 40089c: ff 10 callq *(%rax) 40089e: eb d6 jmp 400876 <main+0x76> (cherry picked from FBD3612218)
2016-09-07 18:59:23 -07:00
if (opts::PrintDebugInfo) {
DebugLineTableRowRef RowRef =
DebugLineTableRowRef::fromSMLoc(Instruction.getLoc());
if (RowRef != DebugLineTableRowRef::NULL_ROW) {
const DWARFDebugLine::LineTable *LineTable;
if (Function && Function->getDWARFUnit() &&
Function->getDWARFUnit()->getOffset() == RowRef.DwCompileUnitIndex) {
LineTable = Function->getDWARFLineTable();
} else {
LineTable = DwCtx->getLineTableForUnit(
DwCtx->getCompileUnitForOffset(RowRef.DwCompileUnitIndex));
}
assert(LineTable &&
"line table expected for instruction with debug info");
const DWARFDebugLine::Row &Row = LineTable->Rows[RowRef.RowIndex - 1];
StringRef FileName = "";
if (Optional<const char *> FName =
dwarf::toString(LineTable->Prologue.FileNames[Row.File - 1].Name))
FileName = *FName;
OS << " # debug line " << FileName << ":" << Row.Line;
if (Row.Column)
OS << ":" << Row.Column;
if (Row.Discriminator)
OS << " discriminator:" << Row.Discriminator;
}
}
[BOLT] Improve ICP for virtual method calls and jump tables using value profiling. Summary: Use value profiling data to remove the method pointer loads from vtables when doing ICP at virtual function and jump table callsites. The basic process is the following: 1. Work backwards from the callsite to find the most recent def of the call register. 2. Work back from the call register def to find the instruction where the vtable is loaded. 3. Find out of there is any value profiling data associated with the vtable load. If so, record all these addresses as potential vtables + method offsets. 4. Since the addresses extracted by #3 will be vtable + method offset, we need to figure out the method offset in order to determine the actual vtable base address. At this point I virtually execute all the instructions that occur between #3 and #2 that touch the method pointer register. The result of this execution should be the method offset. 5. Fetch the actual method address from the appropriate data section containing the vtable using the computed method offset. Make sure that this address maps to an actual function symbol. 6. Try to associate a vtable pointer with each target address in SymTargets. If every target has a vtable, then this is almost certainly a virtual method callsite. 7. Use the vtable address when generating the promoted call code. It's basically the same as regular ICP code except that the compare is against the vtable and not the method pointer. Additionally, the instructions to load up the method are dumped into the cold call block. For jump tables, the basic idea is the same. I use the memory profiling data to find the hottest slots in the jumptable and then use that information to compute the indices of the hottest entries. We can then compare the index register to the hot index values and avoid the load from the jump table. Note: I'm assuming the whole call is in a single BB. According to @rafaelauler, this isn't always the case on ARM. This also isn't always the case on X86 either. If there are non-trivial arguments that are passed by value, there could be branches in between the setup and the call. I'm going to leave fixing this until later since it makes things a bit more complicated. I've also fixed a bug where ICP was introducing a conditional tail call. I made sure that SCTC fixes these up afterwards. I have no idea why I made it introduce a CTC in the first place. (cherry picked from FBD6120768)
2017-10-20 12:11:34 -07:00
if ((opts::PrintRelocations || PrintRelocations) && Function) {
const uint64_t Size = computeCodeSize(&Instruction, &Instruction + 1);
[BOLT] Improve ICP for virtual method calls and jump tables using value profiling. Summary: Use value profiling data to remove the method pointer loads from vtables when doing ICP at virtual function and jump table callsites. The basic process is the following: 1. Work backwards from the callsite to find the most recent def of the call register. 2. Work back from the call register def to find the instruction where the vtable is loaded. 3. Find out of there is any value profiling data associated with the vtable load. If so, record all these addresses as potential vtables + method offsets. 4. Since the addresses extracted by #3 will be vtable + method offset, we need to figure out the method offset in order to determine the actual vtable base address. At this point I virtually execute all the instructions that occur between #3 and #2 that touch the method pointer register. The result of this execution should be the method offset. 5. Fetch the actual method address from the appropriate data section containing the vtable using the computed method offset. Make sure that this address maps to an actual function symbol. 6. Try to associate a vtable pointer with each target address in SymTargets. If every target has a vtable, then this is almost certainly a virtual method callsite. 7. Use the vtable address when generating the promoted call code. It's basically the same as regular ICP code except that the compare is against the vtable and not the method pointer. Additionally, the instructions to load up the method are dumped into the cold call block. For jump tables, the basic idea is the same. I use the memory profiling data to find the hottest slots in the jumptable and then use that information to compute the indices of the hottest entries. We can then compare the index register to the hot index values and avoid the load from the jump table. Note: I'm assuming the whole call is in a single BB. According to @rafaelauler, this isn't always the case on ARM. This also isn't always the case on X86 either. If there are non-trivial arguments that are passed by value, there could be branches in between the setup and the call. I'm going to leave fixing this until later since it makes things a bit more complicated. I've also fixed a bug where ICP was introducing a conditional tail call. I made sure that SCTC fixes these up afterwards. I have no idea why I made it introduce a CTC in the first place. (cherry picked from FBD6120768)
2017-10-20 12:11:34 -07:00
Function->printRelocations(OS, Offset, Size);
}
OS << "\n";
[BOLT] Improve ICP for virtual method calls and jump tables using value profiling. Summary: Use value profiling data to remove the method pointer loads from vtables when doing ICP at virtual function and jump table callsites. The basic process is the following: 1. Work backwards from the callsite to find the most recent def of the call register. 2. Work back from the call register def to find the instruction where the vtable is loaded. 3. Find out of there is any value profiling data associated with the vtable load. If so, record all these addresses as potential vtables + method offsets. 4. Since the addresses extracted by #3 will be vtable + method offset, we need to figure out the method offset in order to determine the actual vtable base address. At this point I virtually execute all the instructions that occur between #3 and #2 that touch the method pointer register. The result of this execution should be the method offset. 5. Fetch the actual method address from the appropriate data section containing the vtable using the computed method offset. Make sure that this address maps to an actual function symbol. 6. Try to associate a vtable pointer with each target address in SymTargets. If every target has a vtable, then this is almost certainly a virtual method callsite. 7. Use the vtable address when generating the promoted call code. It's basically the same as regular ICP code except that the compare is against the vtable and not the method pointer. Additionally, the instructions to load up the method are dumped into the cold call block. For jump tables, the basic idea is the same. I use the memory profiling data to find the hottest slots in the jumptable and then use that information to compute the indices of the hottest entries. We can then compare the index register to the hot index values and avoid the load from the jump table. Note: I'm assuming the whole call is in a single BB. According to @rafaelauler, this isn't always the case on ARM. This also isn't always the case on X86 either. If there are non-trivial arguments that are passed by value, there could be branches in between the setup and the call. I'm going to leave fixing this until later since it makes things a bit more complicated. I've also fixed a bug where ICP was introducing a conditional tail call. I made sure that SCTC fixes these up afterwards. I have no idea why I made it introduce a CTC in the first place. (cherry picked from FBD6120768)
2017-10-20 12:11:34 -07:00
if (PrintMCInst) {
Instruction.dump_pretty(OS, InstPrinter.get());
OS << "\n";
}
}
ErrorOr<BinarySection&> BinaryContext::getSectionForAddress(uint64_t Address) {
auto SI = AddressToSection.upper_bound(Address);
if (SI != AddressToSection.begin()) {
--SI;
uint64_t UpperBound = SI->first + SI->second->getSize();
if (!SI->second->getSize())
UpperBound += 1;
if (UpperBound > Address)
return *SI->second;
}
return std::make_error_code(std::errc::bad_address);
}
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
ErrorOr<StringRef>
BinaryContext::getSectionNameForAddress(uint64_t Address) const {
if (ErrorOr<const BinarySection &> Section = getSectionForAddress(Address)) {
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
return Section->getName();
}
return std::make_error_code(std::errc::bad_address);
}
BinarySection &BinaryContext::registerSection(BinarySection *Section) {
auto Res = Sections.insert(Section);
assert(Res.second && "can't register the same section twice.");
// Only register allocatable sections in the AddressToSection map.
if (Section->isAllocatable() && Section->getAddress())
AddressToSection.insert(std::make_pair(Section->getAddress(), Section));
NameToSection.insert(
std::make_pair(std::string(Section->getName()), Section));
LLVM_DEBUG(dbgs() << "BOLT-DEBUG: registering " << *Section << "\n");
return *Section;
}
BinarySection &BinaryContext::registerSection(SectionRef Section) {
[BOLT] Static data reordering pass. Summary: Enable BOLT to reorder data sections in a binary based on memory profiling data. This diff adds a new pass to BOLT that can reorder data sections for better locality based on memory profiling data. For now, the algorithm to order data is primitive and just relies on the frequency of loads to order the contents of a section. We could probably do a lot better by looking at what functions use the hot data and grouping together hot data that is used by a single function (or cluster of functions). Block ordering might give some hints on how to order the data better as well. The new pass has two basic modes: inplace and split (when inplace is false). The default is split since inplace hasn't really been tested much. When splitting is on, the cold data is copied to a "cold" version of the section while the hot data is kept in the original section, e.g. for .rodata, .rodata will contain the hot data and .bolt.org.rodata will contain the cold bits. In inplace mode, the section contents are reordered inplace. In either mode, all relocations to data within that section are updated to reflect new data locations. Things to improve: - The current algorithm is really dumb and doesn't seem to lead to any wins. It certainly could use some improvement. - Private symbols can have data that leaks over to an adjacent symbol, e.g. a string that has a common suffix can start in one symbol and leak over (with the common suffix) into the next. For now, we punt on adjacent private symbols. - Handle ambiguous relocations better. Section relocations that point to the boundary of two symbols will prevent the adjacent symbols from being moved because we can't tell which symbol the relocation is for. - Handle jump tables. Right now jump table support must be basic if data reordering is enabled. - Being able to handle TLS. A good amount of data access in some binaries are happening in TLS. It would be worthwhile to be able to reorder any TLS sections too. - Handle sections with writeable data. This hasn't been tested so probably won't work. We could try to prevent false sharing in writeable sections as well. - A pie in the sky goal would be to use DWARF info to reorder types. (cherry picked from FBD6792876)
2018-04-20 20:03:31 -07:00
return registerSection(new BinarySection(*this, Section));
}
BinarySection &
BinaryContext::registerSection(StringRef SectionName,
const BinarySection &OriginalSection) {
return registerSection(new BinarySection(*this,
SectionName,
OriginalSection));
}
BinarySection &BinaryContext::registerOrUpdateSection(StringRef Name,
unsigned ELFType,
unsigned ELFFlags,
uint8_t *Data,
uint64_t Size,
unsigned Alignment) {
auto NamedSections = getSectionByName(Name);
if (NamedSections.begin() != NamedSections.end()) {
assert(std::next(NamedSections.begin()) == NamedSections.end() &&
"can only update unique sections");
BinarySection *Section = NamedSections.begin()->second;
LLVM_DEBUG(dbgs() << "BOLT-DEBUG: updating " << *Section << " -> ");
const bool Flag = Section->isAllocatable();
Section->update(Data, Size, Alignment, ELFType, ELFFlags);
LLVM_DEBUG(dbgs() << *Section << "\n");
// FIXME: Fix section flags/attributes for MachO.
if (isELF())
assert(Flag == Section->isAllocatable() &&
"can't change section allocation status");
return *Section;
}
[BOLT] Static data reordering pass. Summary: Enable BOLT to reorder data sections in a binary based on memory profiling data. This diff adds a new pass to BOLT that can reorder data sections for better locality based on memory profiling data. For now, the algorithm to order data is primitive and just relies on the frequency of loads to order the contents of a section. We could probably do a lot better by looking at what functions use the hot data and grouping together hot data that is used by a single function (or cluster of functions). Block ordering might give some hints on how to order the data better as well. The new pass has two basic modes: inplace and split (when inplace is false). The default is split since inplace hasn't really been tested much. When splitting is on, the cold data is copied to a "cold" version of the section while the hot data is kept in the original section, e.g. for .rodata, .rodata will contain the hot data and .bolt.org.rodata will contain the cold bits. In inplace mode, the section contents are reordered inplace. In either mode, all relocations to data within that section are updated to reflect new data locations. Things to improve: - The current algorithm is really dumb and doesn't seem to lead to any wins. It certainly could use some improvement. - Private symbols can have data that leaks over to an adjacent symbol, e.g. a string that has a common suffix can start in one symbol and leak over (with the common suffix) into the next. For now, we punt on adjacent private symbols. - Handle ambiguous relocations better. Section relocations that point to the boundary of two symbols will prevent the adjacent symbols from being moved because we can't tell which symbol the relocation is for. - Handle jump tables. Right now jump table support must be basic if data reordering is enabled. - Being able to handle TLS. A good amount of data access in some binaries are happening in TLS. It would be worthwhile to be able to reorder any TLS sections too. - Handle sections with writeable data. This hasn't been tested so probably won't work. We could try to prevent false sharing in writeable sections as well. - A pie in the sky goal would be to use DWARF info to reorder types. (cherry picked from FBD6792876)
2018-04-20 20:03:31 -07:00
return registerSection(new BinarySection(*this, Name, Data, Size, Alignment,
ELFType, ELFFlags));
}
bool BinaryContext::deregisterSection(BinarySection &Section) {
BinarySection *SectionPtr = &Section;
auto Itr = Sections.find(SectionPtr);
if (Itr != Sections.end()) {
auto Range = AddressToSection.equal_range(SectionPtr->getAddress());
while (Range.first != Range.second) {
if (Range.first->second == SectionPtr) {
AddressToSection.erase(Range.first);
break;
}
++Range.first;
}
auto NameRange =
NameToSection.equal_range(std::string(SectionPtr->getName()));
while (NameRange.first != NameRange.second) {
if (NameRange.first->second == SectionPtr) {
NameToSection.erase(NameRange.first);
break;
}
++NameRange.first;
}
Sections.erase(Itr);
delete SectionPtr;
return true;
}
return false;
}
void BinaryContext::printSections(raw_ostream &OS) const {
for (BinarySection *const &Section : Sections) {
OS << "BOLT-INFO: " << *Section << "\n";
}
}
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
BinarySection &BinaryContext::absoluteSection() {
if (ErrorOr<BinarySection &> Section = getUniqueSectionByName("<absolute>"))
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
return *Section;
return registerOrUpdateSection("<absolute>", ELF::SHT_NULL, 0u);
}
ErrorOr<uint64_t>
BinaryContext::getUnsignedValueAtAddress(uint64_t Address,
size_t Size) const {
const ErrorOr<const BinarySection &> Section = getSectionForAddress(Address);
if (!Section)
return std::make_error_code(std::errc::bad_address);
if (Section->isVirtual())
return 0;
DataExtractor DE(Section->getContents(), AsmInfo->isLittleEndian(),
AsmInfo->getCodePointerSize());
auto ValueOffset = static_cast<uint64_t>(Address - Section->getAddress());
return DE.getUnsigned(&ValueOffset, Size);
}
ErrorOr<uint64_t>
BinaryContext::getSignedValueAtAddress(uint64_t Address,
size_t Size) const {
const ErrorOr<const BinarySection &> Section = getSectionForAddress(Address);
if (!Section)
return std::make_error_code(std::errc::bad_address);
if (Section->isVirtual())
return 0;
DataExtractor DE(Section->getContents(), AsmInfo->isLittleEndian(),
[BOLT rebase] Rebase fixes on top of LLVM Feb2018 Summary: This commit includes all code necessary to make BOLT working again after the rebase. This includes a redesign of the EHFrame work, cherry-pick of the 3dnow disassembly work, compilation error fixes, and port of the debug_info work. The macroop fusion feature is not ported yet. The rebased version has minor changes to the "executed instructions" dynostats counter because REP prefixes are considered a part of the instruction it applies to. Also, some X86 instructions had the "mayLoad" tablegen property removed, which BOLT uses to identify and account for loads, thus reducing the total number of loads reported by dynostats. This was observed in X86::MOVDQUmr. TRAP instructions are not terminators anymore, changing our CFG. This commit adds compensation to preserve this old behavior and minimize tests changes. debug_info sections are now slightly larger. The discriminator field in the line table is slightly different due to a change upstream. New profiles generated with the other bolt are incompatible with this version because of different hash values calculated for functions, so they will be considered 100% stale. This commit changes the corresponding test to XFAIL so it can be updated. The hash function changes because it relies on raw opcode values, which change according to the opcodes described in the X86 tablegen files. When processing HHVM, bolt was observed to be using about 800MB more memory in the rebased version and being about 5% slower. (cherry picked from FBD7078072)
2018-02-06 15:00:23 -08:00
AsmInfo->getCodePointerSize());
auto ValueOffset = static_cast<uint64_t>(Address - Section->getAddress());
return DE.getSigned(&ValueOffset, Size);
}
void BinaryContext::addRelocation(uint64_t Address,
MCSymbol *Symbol,
uint64_t Type,
uint64_t Addend,
uint64_t Value) {
ErrorOr<BinarySection &> Section = getSectionForAddress(Address);
assert(Section && "cannot find section for address");
Section->addRelocation(Address - Section->getAddress(),
Symbol,
Type,
Addend,
Value);
}
void BinaryContext::addDynamicRelocation(uint64_t Address,
MCSymbol *Symbol,
uint64_t Type,
uint64_t Addend,
uint64_t Value) {
ErrorOr<BinarySection &> Section = getSectionForAddress(Address);
assert(Section && "cannot find section for address");
Section->addDynamicRelocation(Address - Section->getAddress(),
Symbol,
Type,
Addend,
Value);
}
bool BinaryContext::removeRelocationAt(uint64_t Address) {
ErrorOr<BinarySection &> Section = getSectionForAddress(Address);
assert(Section && "cannot find section for address");
return Section->removeRelocationAt(Address - Section->getAddress());
}
const Relocation *BinaryContext::getRelocationAt(uint64_t Address) {
ErrorOr<BinarySection &> Section = getSectionForAddress(Address);
if (!Section)
return nullptr;
return Section->getRelocationAt(Address - Section->getAddress());
[BOLT] Improve ICP for virtual method calls and jump tables using value profiling. Summary: Use value profiling data to remove the method pointer loads from vtables when doing ICP at virtual function and jump table callsites. The basic process is the following: 1. Work backwards from the callsite to find the most recent def of the call register. 2. Work back from the call register def to find the instruction where the vtable is loaded. 3. Find out of there is any value profiling data associated with the vtable load. If so, record all these addresses as potential vtables + method offsets. 4. Since the addresses extracted by #3 will be vtable + method offset, we need to figure out the method offset in order to determine the actual vtable base address. At this point I virtually execute all the instructions that occur between #3 and #2 that touch the method pointer register. The result of this execution should be the method offset. 5. Fetch the actual method address from the appropriate data section containing the vtable using the computed method offset. Make sure that this address maps to an actual function symbol. 6. Try to associate a vtable pointer with each target address in SymTargets. If every target has a vtable, then this is almost certainly a virtual method callsite. 7. Use the vtable address when generating the promoted call code. It's basically the same as regular ICP code except that the compare is against the vtable and not the method pointer. Additionally, the instructions to load up the method are dumped into the cold call block. For jump tables, the basic idea is the same. I use the memory profiling data to find the hottest slots in the jumptable and then use that information to compute the indices of the hottest entries. We can then compare the index register to the hot index values and avoid the load from the jump table. Note: I'm assuming the whole call is in a single BB. According to @rafaelauler, this isn't always the case on ARM. This also isn't always the case on X86 either. If there are non-trivial arguments that are passed by value, there could be branches in between the setup and the call. I'm going to leave fixing this until later since it makes things a bit more complicated. I've also fixed a bug where ICP was introducing a conditional tail call. I made sure that SCTC fixes these up afterwards. I have no idea why I made it introduce a CTC in the first place. (cherry picked from FBD6120768)
2017-10-20 12:11:34 -07:00
}
const Relocation *BinaryContext::getDynamicRelocationAt(uint64_t Address) {
ErrorOr<BinarySection &> Section = getSectionForAddress(Address);
if (!Section)
return nullptr;
return Section->getDynamicRelocationAt(Address - Section->getAddress());
}
void BinaryContext::markAmbiguousRelocations(BinaryData &BD,
const uint64_t Address) {
auto setImmovable = [&](BinaryData &BD) {
BinaryData *Root = BD.getAtomicRoot();
LLVM_DEBUG(if (Root->isMoveable()) {
dbgs() << "BOLT-DEBUG: setting " << *Root << " as immovable "
<< "due to ambiguous relocation referencing 0x"
<< Twine::utohexstr(Address) << '\n';
});
Root->setIsMoveable(false);
};
if (Address == BD.getAddress()) {
setImmovable(BD);
// Set previous symbol as immovable
BinaryData *Prev = getBinaryDataContainingAddress(Address - 1);
if (Prev && Prev->getEndAddress() == BD.getAddress())
setImmovable(*Prev);
}
if (Address == BD.getEndAddress()) {
setImmovable(BD);
// Set next symbol as immovable
BinaryData *Next = getBinaryDataContainingAddress(BD.getEndAddress());
if (Next && Next->getAddress() == BD.getEndAddress())
setImmovable(*Next);
}
}
BinaryFunction *BinaryContext::getFunctionForSymbol(const MCSymbol *Symbol,
uint64_t *EntryDesc) {
std::shared_lock<std::shared_timed_mutex> Lock(SymbolToFunctionMapMutex);
auto BFI = SymbolToFunctionMap.find(Symbol);
if (BFI == SymbolToFunctionMap.end())
return nullptr;
BinaryFunction *BF = BFI->second;
if (EntryDesc)
[BOLT] Change symbol handling for secondary function entries Summary: Some functions could be called at an address inside their function body. Typically, these functions are written in assembly as C/C++ does not have a multi-entry function concept. The addresses inside a function body that could be referenced from outside are called secondary entry points. In BOLT we support processing functions with secondary/multiple entry points. We used to mark basic blocks representing those entry points with a special flag. There was only one problem - each basic block has exactly one MCSymbol associated with it, and for the most efficient processing we prefer that symbol to be local/temporary. However, in certain scenarios, e.g. when running in non-relocation mode, we need the entry symbol to be global/non-temporary. We could create global symbols for secondary points ahead of time when the entry point is marked in the symbol table. But not all such entries are properly marked. This means that potentially we could discover an entry point only after disassembling the code that references it, and it could happen after a local label was already created at the same location together with all its references. Replacing the local symbol and updating the references turned out to be an error-prone process. This diff takes a different approach. All basic blocks are created with permanently local symbols. Whenever there's a need to add a secondary entry point, we create an extra global symbol or use an existing one at that location. Containing BinaryFunction maps a local symbol of a basic block to the global symbol representing a secondary entry point. This way we can tell if the basic block is a secondary entry point, and we emit both symbols for all secondary entry points. Since secondary entry points are quite rare, the overhead of this approach is minimal. Note that the same location could be referenced via local symbol from inside a function and via global entry point symbol from outside. This is true for both primary and secondary entry points. (cherry picked from FBD21150193)
2020-04-19 22:29:54 -07:00
*EntryDesc = BF->getEntryIDForSymbol(Symbol);
return BF;
}
void BinaryContext::exitWithBugReport(StringRef Message,
const BinaryFunction &Function) const {
errs() << "=======================================\n";
errs() << "BOLT is unable to proceed because it couldn't properly understand "
"this function.\n";
errs() << "If you are running the most recent version of BOLT, you may "
"want to "
"report this and paste this dump.\nPlease check that there is no "
"sensitive contents being shared in this dump.\n";
errs() << "\nOffending function: " << Function.getPrintName() << "\n\n";
ScopedPrinter SP(errs());
SP.printBinaryBlock("Function contents", *Function.getData());
errs() << "\n";
Function.dump();
errs() << "ERROR: " << Message;
errs() << "\n=======================================\n";
exit(1);
}
BinaryFunction *
BinaryContext::createInjectedBinaryFunction(const std::string &Name,
bool IsSimple) {
InjectedBinaryFunctions.push_back(new BinaryFunction(Name, *this, IsSimple));
BinaryFunction *BF = InjectedBinaryFunctions.back();
setSymbolToFunctionMap(BF->getSymbol(), BF);
BF->CurrentState = BinaryFunction::State::CFG;
return BF;
}
std::pair<size_t, size_t>
[BOLT][non-reloc] Change function splitting in non-relocation mode Summary: This diff applies to non-relocation mode mostly. In this mode, we are limited by original function boundaries, i.e. if a function becomes larger after optimizations (e.g. because of the newly introduced branches) then we might not be able to write the optimized version, unless we split the function. At the same time, we do not benefit from function splitting as we do in the relocation mode since we are not moving functions/fragments, and the hot code does not become more compact. For the reasons described above, we used to execute multiple re-write attempts to optimize the binary and we would only split functions that were too large to fit into their original space. After the first attempt, we would know functions that did not fit into their original space. Then we would re-run all our passes again feeding back the function information and forcefully splitting such functions. Some functions still wouldn't fit even after the splitting (mostly because of the branch relaxation for conditional tail calls that does not happen in non-relocation mode). Yet we have emitted debug info as if they were successfully overwritten. That's why we had one more stage to write the functions again, marking failed-to-emit functions non-simple. Sadly, there was a bug in the way 2nd and 3rd attempts interacted, and we were not splitting the functions correctly and as a result we were emitting less optimized code. One of the reasons we had the multi-pass rewrite scheme in place, was that we did not have an ability to precisely estimate the code size before the actual code emission. Recently, BinaryContext obtained such functionality, and now we can use it instead of relying on the multi-pass rewrite. This eliminates redundant work of re-running the same function passes multiple times. Because function splitting runs before a number of optimization passes that run on post-CFG state (those rely on the splitting pass), we cannot estimate the non-split code size with 100% accuracy. However, it is good enough for over 99% of the cases to extract most of the performance gains for the binary. As a result of eliminating the multi-pass rewrite, the processing time in non-relocation mode with `-split-functions=2` is greatly reduced. With debug info update, it is less than half of what it used to be. New semantics for `-split-functions=<n>`: -split-functions - split functions into hot and cold regions =0 - do not split any function =1 - in non-relocation mode only split functions too large to fit into original code space =2 - same as 1 (backwards compatibility) =3 - split all functions (cherry picked from FBD17362607)
2019-09-11 15:42:22 -07:00
BinaryContext::calculateEmittedSize(BinaryFunction &BF, bool FixBranches) {
// Adjust branch instruction to match the current layout.
[BOLT][non-reloc] Change function splitting in non-relocation mode Summary: This diff applies to non-relocation mode mostly. In this mode, we are limited by original function boundaries, i.e. if a function becomes larger after optimizations (e.g. because of the newly introduced branches) then we might not be able to write the optimized version, unless we split the function. At the same time, we do not benefit from function splitting as we do in the relocation mode since we are not moving functions/fragments, and the hot code does not become more compact. For the reasons described above, we used to execute multiple re-write attempts to optimize the binary and we would only split functions that were too large to fit into their original space. After the first attempt, we would know functions that did not fit into their original space. Then we would re-run all our passes again feeding back the function information and forcefully splitting such functions. Some functions still wouldn't fit even after the splitting (mostly because of the branch relaxation for conditional tail calls that does not happen in non-relocation mode). Yet we have emitted debug info as if they were successfully overwritten. That's why we had one more stage to write the functions again, marking failed-to-emit functions non-simple. Sadly, there was a bug in the way 2nd and 3rd attempts interacted, and we were not splitting the functions correctly and as a result we were emitting less optimized code. One of the reasons we had the multi-pass rewrite scheme in place, was that we did not have an ability to precisely estimate the code size before the actual code emission. Recently, BinaryContext obtained such functionality, and now we can use it instead of relying on the multi-pass rewrite. This eliminates redundant work of re-running the same function passes multiple times. Because function splitting runs before a number of optimization passes that run on post-CFG state (those rely on the splitting pass), we cannot estimate the non-split code size with 100% accuracy. However, it is good enough for over 99% of the cases to extract most of the performance gains for the binary. As a result of eliminating the multi-pass rewrite, the processing time in non-relocation mode with `-split-functions=2` is greatly reduced. With debug info update, it is less than half of what it used to be. New semantics for `-split-functions=<n>`: -split-functions - split functions into hot and cold regions =0 - do not split any function =1 - in non-relocation mode only split functions too large to fit into original code space =2 - same as 1 (backwards compatibility) =3 - split all functions (cherry picked from FBD17362607)
2019-09-11 15:42:22 -07:00
if (FixBranches)
BF.fixBranches();
// Create local MC context to isolate the effect of ephemeral code emission.
IndependentCodeEmitter MCEInstance = createIndependentMCCodeEmitter();
MCContext *LocalCtx = MCEInstance.LocalCtx.get();
MCAsmBackend *MAB =
TheTarget->createMCAsmBackend(*STI, *MRI, MCTargetOptions());
SmallString<256> Code;
raw_svector_ostream VecOS(Code);
std::unique_ptr<MCObjectWriter> OW = MAB->createObjectWriter(VecOS);
std::unique_ptr<MCStreamer> Streamer(TheTarget->createMCObjectStreamer(
*TheTriple, *LocalCtx, std::unique_ptr<MCAsmBackend>(MAB), std::move(OW),
std::unique_ptr<MCCodeEmitter>(MCEInstance.MCE.release()), *STI,
/*RelaxAll=*/false,
/*IncrementalLinkerCompatible=*/false,
/*DWARFMustBeAtTheEnd=*/false));
Streamer->initSections(false, *STI);
MCSection *Section = MCEInstance.LocalMOFI->getTextSection();
Section->setHasInstructions(true);
// Create symbols in the LocalCtx so that they get destroyed with it.
MCSymbol *StartLabel = LocalCtx->createTempSymbol();
MCSymbol *EndLabel = LocalCtx->createTempSymbol();
MCSymbol *ColdStartLabel = LocalCtx->createTempSymbol();
MCSymbol *ColdEndLabel = LocalCtx->createTempSymbol();
Streamer->SwitchSection(Section);
Streamer->emitLabel(StartLabel);
emitFunctionBody(*Streamer, BF, /*EmitColdPart=*/false,
/*EmitCodeOnly=*/true);
Streamer->emitLabel(EndLabel);
if (BF.isSplit()) {
MCSectionELF *ColdSection =
LocalCtx->getELFSection(BF.getColdCodeSectionName(), ELF::SHT_PROGBITS,
ELF::SHF_EXECINSTR | ELF::SHF_ALLOC);
ColdSection->setHasInstructions(true);
Streamer->SwitchSection(ColdSection);
Streamer->emitLabel(ColdStartLabel);
emitFunctionBody(*Streamer, BF, /*EmitColdPart=*/true,
/*EmitCodeOnly=*/true);
Streamer->emitLabel(ColdEndLabel);
// To avoid calling MCObjectStreamer::flushPendingLabels() which is private
Streamer->emitBytes(StringRef(""));
Streamer->SwitchSection(Section);
}
// To avoid calling MCObjectStreamer::flushPendingLabels() which is private or
// MCStreamer::Finish(), which does more than we want
Streamer->emitBytes(StringRef(""));
MCAssembler &Assembler =
static_cast<MCObjectStreamer *>(Streamer.get())->getAssembler();
MCAsmLayout Layout(Assembler);
Assembler.layout(Layout);
const uint64_t HotSize =
Layout.getSymbolOffset(*EndLabel) - Layout.getSymbolOffset(*StartLabel);
const uint64_t ColdSize = BF.isSplit()
? Layout.getSymbolOffset(*ColdEndLabel) -
Layout.getSymbolOffset(*ColdStartLabel)
: 0ULL;
// Clean-up the effect of the code emission.
for (const MCSymbol &Symbol : Assembler.symbols()) {
MCSymbol *MutableSymbol = const_cast<MCSymbol *>(&Symbol);
MutableSymbol->setUndefined();
MutableSymbol->setIsRegistered(false);
}
return std::make_pair(HotSize, ColdSize);
}
bool BinaryContext::validateEncoding(const MCInst &Inst,
ArrayRef<uint8_t> InputEncoding) const {
SmallString<256> Code;
SmallVector<MCFixup, 4> Fixups;
raw_svector_ostream VecOS(Code);
MCE->encodeInstruction(Inst, VecOS, Fixups, *STI);
auto EncodedData = ArrayRef<uint8_t>((uint8_t *)Code.data(), Code.size());
if (InputEncoding != EncodedData) {
if (opts::Verbosity > 1) {
errs() << "BOLT-WARNING: mismatched encoding detected\n"
<< " input: " << InputEncoding << '\n'
<< " output: " << EncodedData << '\n';
}
return false;
}
return true;
}
uint64_t BinaryContext::getHotThreshold() const {
static uint64_t Threshold = 0;
if (Threshold == 0) {
Threshold = std::max((uint64_t)opts::ExecutionCountThreshold,
NumProfiledFuncs ? SumExecutionCount / (2 * NumProfiledFuncs) : 1);
}
return Threshold;
}
BinaryFunction *
BinaryContext::getBinaryFunctionContainingAddress(uint64_t Address,
[BOLT] Basic support for split functions Summary: This adds very basic and limited support for split functions. In non-relocation mode, split functions are ignored, while their debug info is properly updated. No support in the relocation mode yet. Split functions consist of a main body and one or more fragments. For fragments, the main part is called their parent. Any fragment could only be entered via its parent or another fragment. The short-term goal is to correctly update debug information for split functions, while the long-term goal is to have a complete support including full optimization. Note that if we don't detect split bodies, we would have to add multiple entry points via tail calls, which we would rather avoid. Parent functions and fragments are represented by a `BinaryFunction` and are marked accordingly. For now they are marked as non-simple, and thus only supported in non-relocation mode. Once we start building a CFG, it should be a common graph (i.e. the one that includes all fragments) in the parent function. The function discovery is unchanged, except for the detection of `\.cold\.` pattern in the function name, which automatically marks the function as a fragment of another function. Because of the local function name ambiguity, we cannot rely on the function name to establish child fragment and parent relationship. Instead we rely on disassembly processing. `BinaryContext::getBinaryFunctionContainingAddress()` now returns a parent function if an address from its fragment is passed. There's no jump table support at the moment. Jump tables can have source and destinations in both fragment and parent. Parent functions that enter their fragments via C++ exception handling mechanism are not yet supported. (cherry picked from FBD14970569)
2019-04-16 10:24:34 -07:00
bool CheckPastEnd,
bool UseMaxSize) {
auto FI = BinaryFunctions.upper_bound(Address);
if (FI == BinaryFunctions.begin())
return nullptr;
--FI;
const uint64_t UsedSize =
UseMaxSize ? FI->second.getMaxSize() : FI->second.getSize();
if (Address >= FI->first + UsedSize + (CheckPastEnd ? 1 : 0))
return nullptr;
[BOLT] Basic support for split functions Summary: This adds very basic and limited support for split functions. In non-relocation mode, split functions are ignored, while their debug info is properly updated. No support in the relocation mode yet. Split functions consist of a main body and one or more fragments. For fragments, the main part is called their parent. Any fragment could only be entered via its parent or another fragment. The short-term goal is to correctly update debug information for split functions, while the long-term goal is to have a complete support including full optimization. Note that if we don't detect split bodies, we would have to add multiple entry points via tail calls, which we would rather avoid. Parent functions and fragments are represented by a `BinaryFunction` and are marked accordingly. For now they are marked as non-simple, and thus only supported in non-relocation mode. Once we start building a CFG, it should be a common graph (i.e. the one that includes all fragments) in the parent function. The function discovery is unchanged, except for the detection of `\.cold\.` pattern in the function name, which automatically marks the function as a fragment of another function. Because of the local function name ambiguity, we cannot rely on the function name to establish child fragment and parent relationship. Instead we rely on disassembly processing. `BinaryContext::getBinaryFunctionContainingAddress()` now returns a parent function if an address from its fragment is passed. There's no jump table support at the moment. Jump tables can have source and destinations in both fragment and parent. Parent functions that enter their fragments via C++ exception handling mechanism are not yet supported. (cherry picked from FBD14970569)
2019-04-16 10:24:34 -07:00
return &FI->second;
[BOLT] Basic support for split functions Summary: This adds very basic and limited support for split functions. In non-relocation mode, split functions are ignored, while their debug info is properly updated. No support in the relocation mode yet. Split functions consist of a main body and one or more fragments. For fragments, the main part is called their parent. Any fragment could only be entered via its parent or another fragment. The short-term goal is to correctly update debug information for split functions, while the long-term goal is to have a complete support including full optimization. Note that if we don't detect split bodies, we would have to add multiple entry points via tail calls, which we would rather avoid. Parent functions and fragments are represented by a `BinaryFunction` and are marked accordingly. For now they are marked as non-simple, and thus only supported in non-relocation mode. Once we start building a CFG, it should be a common graph (i.e. the one that includes all fragments) in the parent function. The function discovery is unchanged, except for the detection of `\.cold\.` pattern in the function name, which automatically marks the function as a fragment of another function. Because of the local function name ambiguity, we cannot rely on the function name to establish child fragment and parent relationship. Instead we rely on disassembly processing. `BinaryContext::getBinaryFunctionContainingAddress()` now returns a parent function if an address from its fragment is passed. There's no jump table support at the moment. Jump tables can have source and destinations in both fragment and parent. Parent functions that enter their fragments via C++ exception handling mechanism are not yet supported. (cherry picked from FBD14970569)
2019-04-16 10:24:34 -07:00
}
BinaryFunction *
BinaryContext::getBinaryFunctionAtAddress(uint64_t Address) {
// First, try to find a function starting at the given address. If the
// function was folded, this will get us the original folded function if it
// wasn't removed from the list, e.g. in non-relocation mode.
auto BFI = BinaryFunctions.find(Address);
if (BFI != BinaryFunctions.end()) {
return &BFI->second;
}
// We might have folded the function matching the object at the given
// address. In such case, we look for a function matching the symbol
// registered at the original address. The new function (the one that the
// original was folded into) will hold the symbol.
if (const BinaryData *BD = getBinaryDataAtAddress(Address)) {
uint64_t EntryID = 0;
BinaryFunction *BF = getFunctionForSymbol(BD->getSymbol(), &EntryID);
if (BF && EntryID == 0)
[BOLT] Basic support for split functions Summary: This adds very basic and limited support for split functions. In non-relocation mode, split functions are ignored, while their debug info is properly updated. No support in the relocation mode yet. Split functions consist of a main body and one or more fragments. For fragments, the main part is called their parent. Any fragment could only be entered via its parent or another fragment. The short-term goal is to correctly update debug information for split functions, while the long-term goal is to have a complete support including full optimization. Note that if we don't detect split bodies, we would have to add multiple entry points via tail calls, which we would rather avoid. Parent functions and fragments are represented by a `BinaryFunction` and are marked accordingly. For now they are marked as non-simple, and thus only supported in non-relocation mode. Once we start building a CFG, it should be a common graph (i.e. the one that includes all fragments) in the parent function. The function discovery is unchanged, except for the detection of `\.cold\.` pattern in the function name, which automatically marks the function as a fragment of another function. Because of the local function name ambiguity, we cannot rely on the function name to establish child fragment and parent relationship. Instead we rely on disassembly processing. `BinaryContext::getBinaryFunctionContainingAddress()` now returns a parent function if an address from its fragment is passed. There's no jump table support at the moment. Jump tables can have source and destinations in both fragment and parent. Parent functions that enter their fragments via C++ exception handling mechanism are not yet supported. (cherry picked from FBD14970569)
2019-04-16 10:24:34 -07:00
return BF;
}
return nullptr;
}
DebugAddressRangesVector BinaryContext::translateModuleAddressRanges(
const DWARFAddressRangesVector &InputRanges) const {
DebugAddressRangesVector OutputRanges;
for (const DWARFAddressRange Range : InputRanges) {
auto BFI = BinaryFunctions.lower_bound(Range.LowPC);
while (BFI != BinaryFunctions.end()) {
const BinaryFunction &Function = BFI->second;
if (Function.getAddress() >= Range.HighPC)
break;
const DebugAddressRangesVector FunctionRanges =
Function.getOutputAddressRanges();
std::move(std::begin(FunctionRanges),
std::end(FunctionRanges),
std::back_inserter(OutputRanges));
std::advance(BFI, 1);
}
}
return OutputRanges;
}
[BOLT] Support for lite mode with relocations Summary: Add '-lite' support for relocations for improved processing time, memory consumption, and more resilient processing of binaries with embedded assembly code. In lite relocation mode, BOLT will skip full processing of functions without a profile. It will run scanExternalRefs() on such functions to discover external references and to create internal relocations to update references to optimized functions. Note that we could have relied on the compiler/linker to provide relocations for function references. However, there's no assurance that all such references are reported. E.g., the compiler can resolve inter-procedural references internally, leaving no relocations for the linker. The scan process takes about <10 seconds per 100MB of code on modern hardware. It's a reasonable overhead to live with considering the flexibility it provides. If BOLT fails to scan or disassemble a function, .e.g., due to a data object embedded in code, or an unsupported instruction, it enables a patching mode to guarantee that the failed function will call optimized/moved versions of functions. The patching happens at original function entry points. '-skip=<func1,func2,...>' option now can be used to skip processing of arbitrary functions in the relocation mode. With '-use-old-text' or '-strict' we require all functions to be processed. As such, it is incompatible with '-lite' option, and '-skip' option will only disable optimizations of listed functions, not their disassembly and emission. (cherry picked from FBD22040717)
2020-06-15 00:15:47 -07:00
} // namespace bolt
} // namespace llvm