Because indirect call tables use static addresses for call sites, but PC
values recorded at runtime may be subject to ASLR in PIE binaries, we
couldn't find indirect call descriptions by their runtime address in PIE.
This resulted in [unknown] entries in the profile for all indirect calls.
We need to subtract the base address of .text from runtime addresses to
get the corresponding static addresses. Here we create a getter for the
base address of .text and subtract its return value from recorded PC
values. This converts them to static addresses, which may then be used to
find the corresponding indirect call descriptions.
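A minimal sketch of the conversion described above, not the actual runtime code. The names `IndCallDescription`, `toStaticAddress`, and `findIndCall` are hypothetical, and the sketch assumes the getter yields a single offset by which .text was shifted at load time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one indirect call site, keyed by its
// static (link-time) address.
struct IndCallDescription {
  uint64_t StaticAddress;
  int Id;
};

// Convert a PC recorded at run time into a static address by subtracting
// the offset at which .text was loaded (assumption: one offset for all
// of .text). For a non-PIE binary the offset is zero.
inline uint64_t toStaticAddress(uint64_t RuntimePC, uint64_t TextLoadOffset) {
  return RuntimePC - TextLoadOffset;
}

// Look up a call site by address in a table sorted by StaticAddress.
// Returns nullptr when there is no exact match.
inline const IndCallDescription *
findIndCall(const std::vector<IndCallDescription> &Table, uint64_t Address) {
  auto It = std::lower_bound(Table.begin(), Table.end(), Address,
                             [](const IndCallDescription &D, uint64_t A) {
                               return D.StaticAddress < A;
                             });
  if (It == Table.end() || It->StaticAddress != Address)
    return nullptr;
  return &*It;
}
```

Looking up the raw runtime PC misses the table, while the converted address finds the entry, which mirrors the [unknown] symptom and its fix.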
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D154121
(cherry picked from commit a86dd9ae60662cfe9f9fb709a33c71d6fec66dfb)
When a binary is instrumented with the --instrumentation-sleep-time and
--instrumentation-wait-forks options and launched, the profile is
periodically written until all the forks die. The problem is that we
cannot wait for the whole process tree, and we have no way to tell when
it's safe to read the profile. However, if we keep the profile open
throughout the life of the process tree, we can use fuser to determine
when writing is finished.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D154436
(cherry picked from commit a799298152e3ef08b4919cdaac7a614f7cca9bc6)
Specify the block order used in the YAML profile. This is needed to ensure
profile backwards compatibility with the pre-D155514 DFS order by default.
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D156176
A jump table in a split function may contain an entry matching the start
address of another fragment of the function. While converting addresses
to labels, we used to ignore such entries, resulting in an underpopulated
jump table. Change that, so we always create one label per address.
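The "one label per address" rule can be sketched as follows. This is an illustration only: `labelsForJumpTable` and the label naming scheme are hypothetical and do not reflect how labels are actually represented.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Produce one label per jump table entry. An entry that points at the
// start of another fragment is treated like any other address, so the
// result always has as many labels as the table has entries.
// LabelAt caches the label created for each address (hypothetical).
inline std::vector<std::string>
labelsForJumpTable(const std::vector<uint64_t> &Entries,
                   std::map<uint64_t, std::string> &LabelAt) {
  std::vector<std::string> Labels;
  for (uint64_t Address : Entries) {
    auto It = LabelAt.find(Address);
    if (It == LabelAt.end())
      It = LabelAt.emplace(Address, "L_" + std::to_string(Address)).first;
    Labels.push_back(It->second);
  }
  return Labels;
}
```

Repeated addresses reuse the same label, and no entry is skipped, so the emitted table keeps its original size.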
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D156013
* Sort ORC entries in the internal table. Older Linux kernels did not
sort them in the file (only during boot time).
* Add an option to dump sorted ORC tables (--dump-orc).
* Associate entries in the internal ORC table with a BinaryFunction
even when we are not changing the function.
* If the function doesn't have ORC entry at the start, propagate ORC
state from a previous entry.
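A simplified sketch of the sorting and propagation steps listed above. `ORCEntry`, `sortORC`, and `stateAt` are hypothetical names, and the single `State` field stands in for the real ORC data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical, simplified ORC entry: State applies from IP up to the
// IP of the next entry.
struct ORCEntry {
  uint64_t IP;
  int State;
};

// Sort entries by address. stable_sort keeps file order for entries
// that share an address.
inline void sortORC(std::vector<ORCEntry> &Table) {
  std::stable_sort(
      Table.begin(), Table.end(),
      [](const ORCEntry &A, const ORCEntry &B) { return A.IP < B.IP; });
}

// Return the entry in effect at Address: the last entry with
// IP <= Address. An address without its own entry inherits the state
// of the preceding one. Returns nullptr if nothing precedes Address.
inline const ORCEntry *stateAt(const std::vector<ORCEntry> &Sorted,
                               uint64_t Address) {
  auto It = std::upper_bound(
      Sorted.begin(), Sorted.end(), Address,
      [](uint64_t A, const ORCEntry &E) { return A < E.IP; });
  if (It == Sorted.begin())
    return nullptr;
  return &*std::prev(It);
}
```

With the table sorted, a function that starts between two entries picks up the state of the earlier one through `stateAt`.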
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D155767
In one of the previous diffs, LocBuffer was changed to be passed by value. This
led to a performance regression when running BOLT on binaries with DWARF4 split
DWARF.
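The cost of passing a large buffer by value can be illustrated with a copy-counting sketch. `CountingBuffer`, `copyCount`, `sizeByValue`, and `sizeByReference` are hypothetical and unrelated to the actual LocBuffer type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Global copy counter kept in a function-local static.
inline int &copyCount() {
  static int Count = 0;
  return Count;
}

// Hypothetical buffer that records every copy made of it.
struct CountingBuffer {
  std::vector<char> Data;
  explicit CountingBuffer(std::size_t Size) : Data(Size) {}
  CountingBuffer(const CountingBuffer &Other) : Data(Other.Data) {
    ++copyCount();
  }
};

// Pass by value: the caller's buffer is copied on every call.
inline std::size_t sizeByValue(CountingBuffer Buffer) {
  return Buffer.Data.size();
}

// Pass by const reference: no copy is made.
inline std::size_t sizeByReference(const CountingBuffer &Buffer) {
  return Buffer.Data.size();
}
```

Each by-value call duplicates the whole buffer, which adds up quickly when the function is called once per location list.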
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D155763
Pass the revision to check out (cmp-rev) as an nfc-check-setup option.
This simplifies comparison against an arbitrary commit, not just the previous one.
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D155657
Use layout order in YAML profile reading/writing. Preserve old behavior (DFS order)
under `-profile-use-dfs` option.
Reviewed By: spupyrev
Differential Revision: https://reviews.llvm.org/D155514
Propagate Linux Kernel ORC information read from the file to the whole
function CFG once the graph has been built. We have a choice to either
attach ORC state annotation to every instruction, or to the first
instruction in the basic block to conserve processing memory. I chose to
attach to every instruction under --print-orc option which is currently
on by default.
Depends on D155153, D154815
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D155156
Read ORC (oops rewind capability) info used by the Linux kernel for
unwinding the stack. The info is stored in the .orc_unwind and
.orc_unwind_ip sections. There is also a related .orc_lookup section that
is populated by the kernel at runtime. Contents of the sections are
sorted for quicker lookup by a post-link objtool.
Unless we modify stack access instructions, we don't have to change ORC
info attributed to instructions in the binary. However, we need to
update instruction addresses and sort both sections based on the new
layout.
For pretty printing, we add "--print-orc" option that prints ORC info
next to instructions in code dumps.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D154815
There are cases in DWARF4 when the skeleton CU has ranges, but the DWO CU doesn't.
The bug was introduced in the new DWARFRewriter, where the DWARF4 case would fall
through to the DWARF5 case.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D155033
The DWO unit DIE doesn't have low_pc/high_pc, so we were printing this error
for valid cases.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D155032
Set the initial offset of each DIE to the offset of the input DIE. This makes
"printf" debugging easier.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D155031
This is a preparatory patch for extending DWARFDebugLine to properly
parse line number programs with maximum_operations_per_instruction > 1
for VLIW targets.
Add some scaffolding for handling op-index in line number programs, and
add printouts for that in the table. As this affects a lot of tests,
this is done in a separate commit to get a cleaner review for the actual
op-index implementation.
Verbose printouts are not present in many tests, and adding op-index to
those will require a bit more code changes, so that is done in the
actual implementation patch.
Reviewed By: StephenTozer
Differential Revision: https://reviews.llvm.org/D152535
We need to explicitly mark DWARFUnitInfo as non-copyable since MSVC's
STL has a `noexcept(false)` move constructor for `unordered_map`; see
the added comment for more details.
An alternative might be using SmallVector instead of std::vector, since
that never tries to copy elements [1]. That would result in a bunch of
API changes though, so I figured a smaller targeted fix was better.
[1] https://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h
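The copy-versus-move choice behind this fix can be sketched with standard type traits. `ThrowingMove`, `Copyable`, `MoveOnly`, and `relocatesByCopy` are hypothetical names; `std::move_if_noexcept` is the standard helper that typical `std::vector` implementations rely on when relocating elements during growth.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Stand-in for a member whose move constructor is noexcept(false),
// like MSVC's std::unordered_map.
struct ThrowingMove {
  ThrowingMove() = default;
  ThrowingMove(const ThrowingMove &) = default;
  ThrowingMove(ThrowingMove &&) noexcept(false) {}
};

// Copyable wrapper: its implicit move constructor may throw, so
// move_if_noexcept falls back to a copy.
struct Copyable {
  ThrowingMove Member;
};

// Explicitly non-copyable wrapper: moving is the only option left.
struct MoveOnly {
  MoveOnly() = default;
  MoveOnly(const MoveOnly &) = delete;
  MoveOnly(MoveOnly &&) = default;
  ThrowingMove Member;
};

// True if std::move_if_noexcept would select the copy constructor.
template <typename T> constexpr bool relocatesByCopy() {
  return std::is_same<decltype(std::move_if_noexcept(std::declval<T &>())),
                      const T &>::value;
}
```

Here `relocatesByCopy<Copyable>()` is true and `relocatesByCopy<MoveOnly>()` is false, which is the effect of marking a type non-copyable.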
Reviewed By: ayermolo, maksfb
Differential Revision: https://reviews.llvm.org/D154924
BOLT used `ToolOutputFile::keep` to make sure the intermediary object
file was written to disk for debugging purposes when `--keep-tmp` was
passed. However, since an intermediary `buffer_ostream` was used to
stream to, and this class only writes to its output stream in its
destructor, the object file was lost whenever its destructor wouldn't
run. This could happen, for example, if there is a crash while linking.
This patch makes sure the object file is written to disk immediately
after we're done creating it. This is very useful while debugging
JITLink crashes. This patch also gets rid of creating a temporary file
when `--keep-tmp` is not passed by streaming the object file directly to
a `SmallString`.
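The failure mode can be reproduced with a small stand-in class. `BufferedWriter` is hypothetical and only mimics the "write on destruction" behavior described above; it is not LLVM's `buffer_ostream`.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a stream wrapper that forwards its contents
// to the underlying stream only when it is destroyed.
class BufferedWriter {
  std::string Buffer;
  std::ostream &Out;

public:
  explicit BufferedWriter(std::ostream &Out) : Out(Out) {}
  ~BufferedWriter() { Out << Buffer; }
  void write(const std::string &Data) { Buffer += Data; }
};
```

Until the wrapper goes out of scope, the underlying stream is empty, so a crash before that point loses the data. Limiting the wrapper's scope to the code that produces the object file is one way to make the write happen immediately afterwards.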
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D154826
To reduce the memory footprint, we now process and write out TUs first, then
reset DIEBuilder and process CUs. CUs are processed in buckets. The first bucket
contains all the CUs with cross-CU references. The rest are processed one at a time.
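The bucketing scheme can be sketched as below. `CUInfo` and `makeBuckets` are hypothetical; the real implementation works on DWARF units rather than plain structs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of a compile unit.
struct CUInfo {
  int Id;
  bool HasCrossCURefs;
};

// Partition CUs into buckets: the first bucket holds every CU with
// cross-CU references (they must be processed together), and each
// remaining CU gets a bucket of its own. The first bucket may be empty.
inline std::vector<std::vector<CUInfo>>
makeBuckets(const std::vector<CUInfo> &CUs) {
  std::vector<std::vector<CUInfo>> Buckets(1);
  for (const CUInfo &CU : CUs)
    if (CU.HasCrossCURefs)
      Buckets.front().push_back(CU);
  for (const CUInfo &CU : CUs)
    if (!CU.HasCrossCURefs)
      Buckets.push_back(std::vector<CUInfo>{CU});
  return Buckets;
}
```

Processing one bucket at a time and releasing its state before the next is what bounds peak memory in this scheme.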
clang-17 build in debug mode, by clang-17.
before
8:25.81 real, 834.37 user, 86.03 sys, 0 amem, 79525064 mmem
8:02.20 real, 820.46 user, 81.81 sys, 0 amem, 79501616 mmem
7:52.69 real, 802.01 user, 83.99 sys, 0 amem, 79534392 mmem
after
7:49.35 real, 822.04 user, 66.19 sys, 0 amem, 34934260 mmem
7:42.16 real, 825.46 user, 63.52 sys, 0 amem, 34951660 mmem
7:46.71 real, 821.11 user, 63.14 sys, 0 amem, 34981164 mmem
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D151909
* Some cleanup and minor fixes for the new debug information rewriter before moving on
to productization.
* The new rewriter wasn't handling binaries with DWARF5 and DWARF4 with
-fdebug-types-section.
* Removed dead cross-CU reference code.
* Added support for DW_AT_sibling.
* With the new rewriter the abbrev number can change, which can lead to the offsets of
Type Units changing. Before, we would just copy raw data. Changed to write out the Type
Unit List. This is generated by gdb-add-index.
* Fixed how BOLT handles gdb-index generated by gdb-11 with types sections.
Simplified the logic that handles variations of gdb-index.
* Clang can generate two type units with the same hash, but different content. LLD
does not de-duplicate when ThinLTO is involved. Changed so that the TU hash and
offset are used to make TUs unique.
* It is possible to have references within a location expression to another DIE.
Fixed it so that the relative offset is updated correctly.
* Removed all the code related to patching.
* Removed dead code. Changed how we handle writing out TUs and the TU Index. It
should now fully work for DWARF4 and DWARF5.
* Removed unused arguments from some APIs, changed return type to void, and other
small cleanups.
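The "TU hash and offset" item above can be illustrated with a composite key. `TUKey` and `TURegistry` are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical key: (type unit hash, unit offset). Using both fields
// keeps two TUs with the same hash but different content distinct.
using TUKey = std::pair<uint64_t, uint64_t>;

struct TURegistry {
  std::map<TUKey, unsigned> Ids;

  // Return the id for this TU, creating a new one on first sight.
  unsigned getOrCreate(uint64_t Hash, uint64_t Offset) {
    auto Result =
        Ids.emplace(TUKey(Hash, Offset), static_cast<unsigned>(Ids.size()));
    return Result.first->second;
  }
};
```

Keying on the hash alone would merge the two units; adding the offset keeps them apart while repeated lookups of the same unit still return the same id.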
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D151906
This revision implements a new mechanism for DWARFRewriter.
The new mechanism takes the same approach as DWARFLinker.
Parsing debug information into an IR lets us handle it more flexibly.
The debug information update process now operates on the IR, and the IR is written out to the binary once updating has finished.
A new class was added: DIEBuilder. This class is responsible for parsing debug information and raising it to the IR level.
This class is also used to write out the .debug_info and .debug_abbrev sections.
Since we output a brand new abbrev section, we no longer need to always convert low_pc/high_pc into ranges.
When conversion does happen, we can also remove the low_pc entry.
Reviewed By: maksfb, ayermolo
Differential Revision: https://reviews.llvm.org/D130315
The issue was caused by the absence of a placement new definition. It
worked for clang and thus passed Phabricator checks, but broke when
compiled with GCC on the buildbot.
Full problem description: https://reviews.llvm.org/D153771#4468239
Original patch description:
In the absence of the instrumentation-file-append-pid option, the
global allocator uses shared pages for allocation. However, since it is a
global variable, it gets COW'd after fork if instrumentation-sleep-time
is used, or if a process forks by itself. This means it hands out the same
pages to every process, which causes hash table corruption. Thus, if we
want shared pages, we need to put the allocator itself in a shared page,
which we do in this commit in __bolt_instr_setup.
I also added a couple of assertions to sanity-check the hash table.
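A POSIX-only sketch of the idea, not the code in __bolt_instr_setup. `BumpAllocator`, `createSharedAllocator`, and `cursorAfterChildAllocates` are hypothetical, the allocator is reduced to a single cursor, and no synchronization is attempted. The sketch includes `<new>` for placement new, which a freestanding runtime would have to define itself.

```cpp
#include <cassert>
#include <new>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical allocator state: just a cursor into some shared pool.
struct BumpAllocator {
  unsigned long Next = 0;
  unsigned long allocate(unsigned long Size) {
    unsigned long Old = Next;
    Next += Size;
    return Old;
  }
};

// Construct the allocator inside a MAP_SHARED page so that its state
// stays common to parent and children after fork.
inline BumpAllocator *createSharedAllocator() {
  void *Page = mmap(nullptr, sizeof(BumpAllocator), PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  if (Page == MAP_FAILED)
    return nullptr;
  return new (Page) BumpAllocator();
}

// Fork a child that allocates Size bytes, wait for it, and return the
// cursor value the parent observes afterwards.
inline unsigned long cursorAfterChildAllocates(BumpAllocator &Alloc,
                                               unsigned long Size) {
  pid_t Child = fork();
  if (Child == 0) {
    Alloc.allocate(Size);
    _exit(0);
  }
  if (Child > 0)
    waitpid(Child, nullptr, 0);
  return Alloc.Next;
}
```

With an allocator in ordinary (copy-on-write) memory the parent never sees the child's allocation, so both would hand out the same pages; with the allocator in a shared page the cursor advances for everyone.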
Reviewed By: rafauler, Amir
Differential Revision: https://reviews.llvm.org/D153771
This reverts commit 460a2244430fae192298a5fd9fa2a269e540e8c1.
It breaks building on macOS, and it was landed with a review URL
pointing to some Facebook-internal service.
Also reverts a bunch of follow-ups:
Revert "[BOLT][DWARF] Don't check string offsets"
This reverts commit f9d6f48c8bf5acaac07502403c41cf0b0d89c8d2.
Revert "[BOLT][DWARF] Change to process and write out TUs first then CUs in batches"
This reverts commit 88e95c1e4bb6e2ad3bfd185b96341ad5c09eff6b.
Revert "[BOLT][DWARF] Output DWO files as they are being processed"
This reverts commit 46ca2e3fcd419b1246357ed3b9cd36630f16e64d.
Revert "[BOLT][DWARF] Don't check string offsets"
This reverts commit cfe4a4b04f219a9dbb4e3fc01883437b6ff0e702.
Revert "[BOLT][DWARF] Numerous fixes for a new DWARFRewriter"
This reverts commit 2701a661daa393ad5901ac88d420d7aa931eda0d.
Summary:
To reduce the memory footprint, we now process and write out TUs first, then
reset DIEBuilder and process CUs. CUs are processed in buckets. The first bucket
contains all the CUs with cross-CU references. The rest are processed one at a time.
clang-17 build in debug mode, by clang-17.
before
8:25.81 real, 834.37 user, 86.03 sys, 0 amem, 79525064 mmem
8:02.20 real, 820.46 user, 81.81 sys, 0 amem, 79501616 mmem
7:52.69 real, 802.01 user, 83.99 sys, 0 amem, 79534392 mmem
after
7:49.35 real, 822.04 user, 66.19 sys, 0 amem, 34934260 mmem
7:42.16 real, 825.46 user, 63.52 sys, 0 amem, 34951660 mmem
7:46.71 real, 821.11 user, 63.14 sys, 0 amem, 34981164 mmem
Differential Revision: https://phabricator.intern.facebook.com/D45883198
Summary:
* Some cleanup and minor fixes for the new debug information rewriter before moving on
to productization.
* The new rewriter wasn't handling binaries with DWARF5 and DWARF4 with
-fdebug-types-section.
* Removed dead cross-CU reference code.
* Added support for DW_AT_sibling.
* With the new rewriter the abbrev number can change, which can lead to the offsets of
Type Units changing. Before, we would just copy raw data. Changed to write out the Type
Unit List. This is generated by gdb-add-index.
* Fixed how BOLT handles gdb-index generated by gdb-11 with types sections.
Simplified the logic that handles variations of gdb-index.
* Clang can generate two type units with the same hash, but different content. LLD
does not de-duplicate when ThinLTO is involved. Changed so that the TU hash and
offset are used to make TUs unique.
* It is possible to have references within a location expression to another DIE.
Fixed it so that the relative offset is updated correctly.
* Removed all the code related to patching.
* Removed dead code. Changed how we handle writing out TUs and the TU Index. It
should now fully work for DWARF4 and DWARF5.
* Removed unused arguments from some APIs, changed return type to void, and other
small cleanups.
Differential Revision: https://phabricator.intern.facebook.com/D46168257
Summary:
This revision implements a new mechanism for DWARFRewriter.
The new mechanism takes the same approach as DWARFLinker.
Parsing debug information into an IR lets us handle it more flexibly.
The debug information update process now operates on the IR, and the IR is written out to the binary once updating has finished.
A new class was added: DIEBuilder. This class is responsible for parsing debug information and raising it to the IR level.
This class is also used to write out the .debug_info and .debug_abbrev sections.
Since we output a brand new abbrev section, we no longer need to always convert low_pc/high_pc into ranges.
When conversion does happen, we can also remove the low_pc entry.
Differential Revision: https://phabricator.intern.facebook.com/D39484421
Tasks: T117448832
There was a bug in the code that pre-populated the line string for the case where
parts of .debug_line are not processed by BOLT, but copied as raw data. We were not
switching sections. This resulted in parts of the binary being overwritten with
debug data.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D154544
Create LinuxKernelRewriter and move kernel-specific code to this class.
Depends on D154023
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D154024
Use the new MetadataRewriter interface to update pseudo probes and move
ProbeDecoder out of BinaryContext into the new PseudoProbeRewriter class.
Depends on D154021
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D154022
Differential Revision: https://reviews.llvm.org/D154023
Migrate SDT markers processing to the new MetadataRewriter interface.
Depends on D154020
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D154021
Introduce the MetadataRewriter interface to handle updates for various
types of auxiliary data stored in a binary file.
To implement metadata processing using this new interface, all metadata
rewriters should derive from the RewriterBase class and implement
one or more of the following methods, depending on the timing of metadata
read and write operations:
* preCFGInitializer()
* postCFGInitializer() // TBD
* preEmitFinalizer() // TBD
* postEmitFinalizer()
By adopting this approach, we aim to simplify the RewriteInstance class
and improve its scalability to accommodate new extensions of file formats,
including various metadata types of the Linux Kernel.
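A simplified sketch of the interface shape described above. Only the class name `RewriterBase` and the hook names come from this description; the `void` return types, `ExampleRewriter`, and `RewriterManager` are assumptions made for illustration, and the two hooks marked TBD are omitted.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Base class with one hook per point in the rewriting pipeline.
// Default implementations do nothing, so a rewriter overrides only
// the hooks it needs.
class RewriterBase {
public:
  virtual ~RewriterBase() = default;
  virtual void preCFGInitializer() {}
  virtual void postEmitFinalizer() {}
};

// Hypothetical rewriter that reads metadata before the CFG is built
// and writes it back after code emission.
class ExampleRewriter : public RewriterBase {
  std::vector<std::string> &Log;

public:
  explicit ExampleRewriter(std::vector<std::string> &Log) : Log(Log) {}
  void preCFGInitializer() override { Log.push_back("read"); }
  void postEmitFinalizer() override { Log.push_back("write"); }
};

// Hypothetical driver that owns the rewriters and runs each hook on
// all of them in registration order.
class RewriterManager {
  std::vector<std::unique_ptr<RewriterBase>> Rewriters;

public:
  void registerRewriter(std::unique_ptr<RewriterBase> R) {
    Rewriters.push_back(std::move(R));
  }
  void runPreCFG() {
    for (auto &R : Rewriters)
      R->preCFGInitializer();
  }
  void runPostEmit() {
    for (auto &R : Rewriters)
      R->postEmitFinalizer();
  }
};
```

With this shape, the main rewriting class only calls the manager at fixed points and does not need to know about individual metadata formats.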
Differential Revision: https://reviews.llvm.org/D154020
This reverts commit c15e9b6814e53bccb0194268a826c1213a84b395.
The issue appears unrelated, as the crash happened in the BOLTed binary, not
the instrumented binary.
In the very rare case that an mmap call fails, we'll at least get a message
instead of a segfault.
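A minimal sketch of such a check using the regular libc wrappers. `mapOrReport` is a hypothetical helper; the instrumentation runtime itself does not use libc in this way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <sys/mman.h>

// Map Size bytes of anonymous memory. On failure, print the reason and
// return nullptr instead of letting the caller dereference MAP_FAILED.
inline void *mapOrReport(std::size_t Size) {
  void *Ptr = mmap(nullptr, Size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (Ptr == MAP_FAILED) {
    std::perror("mmap failed");
    return nullptr;
  }
  return Ptr;
}
```

Note that `mmap` signals failure with `MAP_FAILED`, not a null pointer, so the comparison has to be against that constant.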
Reviewed By: rafauler, Amir
Differential Revision: https://reviews.llvm.org/D154056