2503 Commits

Author SHA1 Message Date
Greg Clayton
e882edd5f1 Fixed an issue where llvm-gsymutil would crash when parsing bad inline ranges.
If a function contains inline function ranges whose address ranges are not contained in the parent scope, then emit an error message and omit them from the final GSYM. Prior to this we would only test if an inline function's address range was within the concrete function's ranges. If we ran into a case where the inline range was within the function's ranges, but not within one of the parent inline function's ranges, then we would fail to produce a GSYM file and exit with an error.

The current code will emit full details on invalid inline ranges as they are being parsed and will omit any bad ranges from the final GSYM file.

Differential Revision: https://reviews.llvm.org/D155254
2023-07-26 11:30:16 -07:00
Fangrui Song
52eab3430a PDBFileBuilder: Switch to xxh3_64bits
Following recent changes switching from xxh64 to xxh32 for better
hashing performance (e.g., D154813). I am not familiar with this use
case, but this change will ensure that the lld executable doesn't need
xxHash64 after wasm-ld migrates.
2023-07-19 15:31:51 -07:00
Maurice Heumann
d3c304fd63 [DWARF] Fix undefined behaviour in dwarf type printer
The value to be formatted here, Val, is an int64_t which cannot be
formatted using %x. This commit adjusts all misuses I was able to find
in the llvm-dwarfdump project.

Failing tests in https://reviews.llvm.org/D153800 lead to the discovery
and analysis of this issue.

Differential Revision: https://reviews.llvm.org/D155093
2023-07-14 12:54:18 -07:00
Jon Roelofs
a01d1831bc
[DebugInfo] Add missing dependency on intrinsics_gen 2023-07-13 17:11:53 -07:00
Eduard Zingerman
c8e055d485 [BPF][DebugInfo] Use .BPF.ext for line info when DWARF is not available
"BTF" is a debug information format used by LLVM's BPF backend.
The format is much smaller in scope than DWARF, the following info is
available:
- full set of C types used in the binary file;
- types for global values;
- line number / line source code information .

BTF information is embedded in ELF as .BTF and .BTF.ext sections.
Detailed format description could be found as a part of Linux Source
tree, e.g. here: [1].

This commit modifies `llvm-objdump` utility to use line number
information provided by BTF if DWARF information is not available.
E.g., the goal is to make the following to print source code lines,
interleaved with disassembly:

    $ clang --target=bpf -g test.c -o test.o
    $ llvm-strip --strip-debug test.o
    $ llvm-objdump -Sd test.o

    test.o:	file format elf64-bpf

    Disassembly of section .text:

    <foo>:
    ; void foo(void) {
    	r1 = 0x1
    ;   consume(1);
    	call -0x1
    	r1 = 0x2
    ;   consume(2);
    	call -0x1
    ; }
    	exit

A common production use case for BPF programs is to:
- compile separate object files using clang with `-g -c` flags;
- link these files as a final "static" binary using bpftool linker ([2]).
The bpftool linker discards most of the DWARF sections
(line information sections as well) but merges .BTF and .BTF.ext sections.
Hence, having `llvm-objdump` capable to print source code using .BTF.ext
is valuable.

The commit consists of the following modifications:

- llvm/lib/DebugInfo/BTF aka `DebugInfoBTF` component is added to host
  the code needed to process BTF (with assumption that BTF support
  would be added to some other tools as well, e.g. `llvm-readelf`):
  - `DebugInfoBTF` provides `llvm::BTFParser` class, that loads information
    from `.BTF` and `.BTF.ext` sections of a given `object::ObjectFile`
    instance and allows to query this information.
    Currently only line number information is loaded.

  - `DebugInfoBTF` also provides `llvm::BTFContext` class, which is an
    implementation of `DIContext` interface, used by `llvm-objdump` to
    query information about line numbers corresponding to specific
    instructions.

- Structure `DILineInfo` is modified with field `LineSource`.

  `DIContext` interface uses `DILineInfo` structure to communicate
  line number and source code information.
  Specifically, `DILineInfo::Source` field encodes full file source code,
  if available. BTF only stores source code for selected lines of the
  file, not a complete source file. Moreover, stored lines are not
  guaranteed to be sorted in a specific order.

  To avoid reconstruction of a file source code from a set of
  available lines, this commit adds `LineSource` field instead.

- `Symbolize` class is modified to use `BTFContext` instead of
  `DWARFContext` when DWARF sections are not available but BTF
  sections are present in the object file.
  (`Symbolize` is instantiated by `llvm-objdump`).

- Integration and unit tests.

Note, that DWARF has a notion of "instruction sequence".
DWARF implementation of `DIContext::getLineInfoForAddress()` provides
inexact responses if exact address information is not available but
address falls within "instruction sequence" with some known line
information (see `DWARFDebugLine::LineTable::findRowInSeq()`).

BTF does not provide instruction sequence groupings, thus
`getLineInfoForAddress()` queries only return exact matches.
This does not seem to be a big issue in practice, but output
of the `llvm-objdump -Sd` might differ slightly when BTF
is used instead of DWARF.

[1] https://www.kernel.org/doc/html/latest/bpf/btf.html
[2] https://github.com/libbpf/bpftool

Depends on https://reviews.llvm.org/D149501

Reviewed By: MaskRay, yonghong-song, nickdesaulniers, #debug-info

Differential Revision: https://reviews.llvm.org/D149058
2023-07-12 09:51:09 -07:00
David Stenberg
fe6cddef20 [DWARF] Allow op-index in line number programs
This extends DWARFDebugLine to properly parse line number programs with
maximum_operations_per_instruction > 1 for VLIW targets.

No functions that use that parsed output to retrieve line information
have been extended to support multiple op-indexes. This means that when
retrieving information for an address with multiple op-indexes, e.g.
when using llvm-addr2line, the penultimate row for that address will be
used, which in most cases is the row for the second largest op-index.
This will be addressed in further changes, but this patch at least
allows us to correctly parse such line number programs, with a warning
saying that the line number information may be incorrect (incomplete).

Reviewed By: StephenTozer

Differential Revision: https://reviews.llvm.org/D152536
2023-07-12 13:46:29 +02:00
David Stenberg
6aa94c64a5 [DWARF] Add printout for op-index
This is a preparatory patch for extending DWARFDebugLine to properly
parse line number programs with maximum_operations_per_instruction > 1
for VLIW targets.

Add some scaffolding for handling op-index in line number programs, and
add printouts for that in the table. As this affects a lot of tests,
this is done in a separate commit to get a cleaner review for the actual
op-index implementation.

Verbose printouts are not present in many tests, and adding op-index to
those will require a bit more code changes, so that is done in the
actual implementation patch.

Reviewed By: StephenTozer

Differential Revision: https://reviews.llvm.org/D152535
2023-07-12 12:03:44 +02:00
Rui Zhong
87fb0ea27e [BOLT][DWARF] Implement new mechanism for DWARFRewriter
This revision implement new mechanism for DWARFRewriter.
In the new mechanism, we adopt the same way with DWARFLinker did.
By parsing Debug information into IR, we are allowed to handle debug information more flexible.
Now the debug information updating process relies on IR and IR will be written out to binary once the updating finished.

A new class was added: DIEBuilder. This class is responsible for parsing debug information and raising it to the IR level.
This class is also used to write out the .debug_info and .debug_abbrev sections.
Since we output brand new Abbrev section we won't need to always convert low_pc/high_pc into ranges.
When conversion does happen we can also remove low_pc entry.

Reviewed By: maksfb, ayermolo

Differential Revision: https://reviews.llvm.org/D130315
2023-07-10 14:42:03 -07:00
Nico Weber
de7781ea42 Revert "[DWARF][BOLT] Implement new mechanism for DWARFRewriter"
This reverts commit 460a2244430fae192298a5fd9fa2a269e540e8c1.
It breaks building on macOS, and it was landed with a review URL
pointing to some Facebook-internal service.

Also reverts a bunch of follow-ups:

Revert "[BOLT][DWARF] Don't check string offsets"
This reverts commit f9d6f48c8bf5acaac07502403c41cf0b0d89c8d2.

Revert "[BOLT][DWARF] Change to process and write out TUs first then CUs in batches"
This reverts commit 88e95c1e4bb6e2ad3bfd185b96341ad5c09eff6b.

Revert "[BOLT][DWARF] Output DWO files as they are being processed"
This reverts commit 46ca2e3fcd419b1246357ed3b9cd36630f16e64d.

Revert "[BOLT][DWARF] Don't check string offsets"
This reverts commit cfe4a4b04f219a9dbb4e3fc01883437b6ff0e702.

Revert "[BOLT][DWARF] Numerous fixes for a new DWARFRewriter"
This reverts commit 2701a661daa393ad5901ac88d420d7aa931eda0d.
2023-07-07 08:07:01 -04:00
Alexander Yermolovich
460a224443 [DWARF][BOLT] Implement new mechanism for DWARFRewriter
Summary:
This revision implement new mechanism for DWARFRewriter.
In the new mechanism, we adopt the same way with DWARFLinker did.
By parsing Debug information into IR, we are allowed to handle debug information more flexible.
Now the debug information updating process relies on IR and IR will be written out to binary once the updating finished.

A new class was added: DIEBuilder. This class is responsible for parsing debug information and raising it to the IR level.
This class is also used to write out the .debug_info and .debug_abbrev sections.
Since we output brand new Abbrev section we won't need to always convert low_pc/high_pc into ranges.
When conversion does happen we can also remove low_pc entry.

Differential Revision: https://phabricator.intern.facebook.com/D39484421

Tasks: T117448832
2023-07-06 14:21:26 -07:00
Daniel Thornburgh
7b31a73ffe [Symbolizer] Ignore unknown additional symbolizer markup fields
The symbolizer markup syntax is structured such that fields require only
previous fields for their interpretation; this was originally intended
to make adding new fields a natural extension mechanism for existing
elements. This codifies this into the spec and makes the behavior of the
llvm-symbolizer match. Extra fields are now warned about, but ignored,
rather than ignoring the whole element.

Reviewed By: mcgrathr

Differential Revision: https://reviews.llvm.org/D153821
2023-06-28 10:39:28 -07:00
Alex Langford
c1b84d9b7c [DebugInfo] Add error handling to DWARFDebugAbbrev::getAbbreviationDeclarationSet
This gives us more meaningful information when
`getAbbreviationDeclarationSet` fails. Right now only
`verifyAbbrevSection` actually uses the error that it returns, but the
other call sites could be rewritten to take advantage of the returned error.

Differential Revision: https://reviews.llvm.org/D153459
2023-06-27 10:30:50 -07:00
Elliot Goodrich
b0abd4893f [llvm] Add missing StringExtras.h includes
In preparation for removing the `#include "llvm/ADT/StringExtras.h"`
from the header to source file of `llvm/Support/Error.h`, first add in
all the missing includes that were previously included transitively
through this header.
2023-06-25 15:42:22 +01:00
Fangrui Song
08be7791e1 [DWARF] Remove remnant .zdebug section recognition
There is a minor behavior difference that is not worth testing for the obsoleted
format. Previously, llvm-dwarfdump considers .zdebug_info as a debug section but
does not decompress it, leading to a warning when the content cannot be parsed.
Now llvm-dwarfdump just ignores the section without a warning.
2023-06-24 11:07:12 -07:00
Fangrui Song
e28b3ea52a [DWARF] Adjust warning condition for .dwo sections with relocations
D106624 added a .dwo warning (when there are relocations) that may fire for
non-debug sections, e.g. `.rodata.dwo` when there is a data symbol foo in
-fdata-sections mode. Adjust it to only warn for .debug sections.

While here, change the diagnostic to be more conventional
https://llvm.org/docs/CodingStandards.html#error-and-warning-messages and use
the relocated section name instead of the relocation section name.

This change does not handle `.zdebug` (support was mostly removed from LLVM) or
`__debug` (Mach-O, no DWO support).

Reviewed By: ayermolo, HaohaiWen

Differential Revision: https://reviews.llvm.org/D153602
2023-06-24 10:06:17 -07:00
Fangrui Song
620bff758d [llvm-addr2line] Replace checkFileExists with getOrCreateModuleInfo
GNU addr2line exits immediately if -e (default to a.out) specifies a file that
cannot be open or a directory. llvm-addr2line used to wait for input on if the
input file cannot be open and addresses are not specified in command line.
Replace the D147652 checkFileExists with getOrCreateModuleInfo to avoid
a separate `sys::fs::status` operation.

Reviewed By: sepavloff

Differential Revision: https://reviews.llvm.org/D153595
2023-06-23 10:04:13 -07:00
Serge Pavlov
1099208b99 [symbolizer] Check existence of input file in GNU mode
GNU addr2line exits immediately if it cannot open the file specified as
executable/relocatable. In contrast llvm-addr2line does not exit and, if
addresses are not specified in command line, waits for input on stdin. This
causes the test compiler-rt/test/asan/TestCases/Posix/asan-symbolize-bad-path.cc to block
forever on Gentoo (see https://reviews.llvm.org/rG27c4777f41d2ab204c1cf84ff1cccd5ba41354da#1190273).
To fix this issue the behavior llvm-addr2line now exits if
executable/relocatable file cannot be found.

It fixes https://github.com/llvm/llvm-project/issues/42099 (llvm-addr2line
does not exit when passed a non-existent file).

Differential Revision: https://reviews.llvm.org/D147652
2023-06-23 17:20:15 +07:00
Felipe de Azevedo Piovezan
15a1f7f6f7 [AppleTables] Implement iterator over all entries in table
This commit adds functionality to the Apple Accelerator table allowing iteration
over all elements in the table.

Our iterators look like streaming iterators: when we increment the iterator we
check if there is still enough data in the "stream" (in our case, the blob of
data of the accelerator table) and extract the next entry. If any failures
occur, we immediately set the iterator to be the end iterator.

Since the ultimate user of this functionality is LLDB, there are roughly two
iteration methods we want support: one that also loads the name of each entry,
and one which does not. Loading names is measurably slower (one order the
magnitude) than only loading DIEs, so we used some template metaprograming to
implement both iteration methods.

Depends on D153066

Differential Revision: https://reviews.llvm.org/D153066
2023-06-21 06:44:56 -04:00
Scott Linder
58669354bf [DebugInfo] Add DW_OP_LLVM_user extension point
The extension codespace for DWARF expressions (DW_OP_LLVM_{lo,hi}_user)
has shrunk over time, as no extension is ever "retired" in practice. To
facilitate future extensions, this patch reserves one open opcode as an extension
point (0xfe), which is followed by a ULEB128-encoded SubOperation, and
then by the subop's operands.

There is some prior-art, namely DW_OP_AARCH64_operation
(see edd7460d87/aadwarf64/aadwarf64.rst (45dwarf-expression-operations)).

This version makes some different tradeoffs, opting to use a ULEB128 for
the subop encoding for future-proofing.

Reviewed By: #debug-info, dblaikie

Differential Revision: https://reviews.llvm.org/D147271
2023-06-19 21:46:24 +00:00
Scott Linder
2e6bb8c9b8 [DebugInfo] Support more than 2 operands in DWARF operations
Update DWARFExpression::Operation and LVOperation to support more than
2 operands.

Take the opportunity to use a SmallVector, which will handle at least 2
operands without allocation anyway, and removes the static limit
completely.

As there is no longer the concept of an "unused operand", remove
Operation::Encoding::SizeNA. Any use of it is now replaced with explicit
checks for how many operands an operation has.

There are still places where the limit remains 2, namely in the
DWARFLinker and in DIExpressions, but these can be updated in later
patches as-needed.

There are no explicit tests as this is nearly NFC: no new operation is
added which makes use of the additional operand capacity yet. A future
patch adding a new DWARF extension point will include operations which
require the support.

Reviewed By: Orlando, CarlosAlbertoEnciso

Differential Revision: https://reviews.llvm.org/D147270
2023-06-19 19:38:26 +00:00
Alex Langford
fecabf4006 [DebugInfo][NFCI] Follow-up to 0356ceedf2e9 2023-06-16 13:32:53 -07:00
Alex Langford
0356ceedf2 [DebugInfo] Change DWARFDebugAbbrev initialization
I plan on adding better error handling to DWARFDebugAbbrev, but first I
want us to be able to better reason about a state of a DWARFDebugAbbrev.
I want us to be able to initialize a DWARFDebugAbbrev in its complete
state instead of creating it and then calling an `extract` method
manually. If its Data field is populated, then parsing is not complete.
If Data is `std::nullopt`, then parsing is done and this section is
"finalized" in some sense.

Additionally, I have removed the `clear()` method. This makes sense for other
classes like DWARFAbbreviationDeclaration where we may want to re-use an object
repeatedly to avoid repeated initializations and allocations, but for
DWARFDebugAbbrev we pretty much create one and stick with it.

Differential Revision: https://reviews.llvm.org/D152947
2023-06-16 11:16:16 -07:00
Felipe de Azevedo Piovezan
0b49e503a5 [nfc][AppleTables] Rename iterator
We will soon need different kinds of iterators
We also use one of LLVM's iterator classes to implement some basic iterator
operations.

Differential Revision: https://reviews.llvm.org/D153065
2023-06-16 11:47:24 -04:00
Felipe de Azevedo Piovezan
bae2e56115 [DebugInfo] Fix detection of hash collision in Apple Accel tables
The current implementation was ignoring the possibility of collisions.

Differential Revision: D152586
2023-06-15 12:46:21 -04:00
Felipe de Azevedo Piovezan
0f11a5f172 Revert "[DebugInfo] Fix detection of hash collision in Apple Accel tables"
Committed a slightly older patch without the proper link to the review.

This reverts commit 42874f66475548541cd871d5790116b2bc68b89a.
2023-06-15 12:36:58 -04:00
Felipe de Azevedo Piovezan
42874f6647 [DebugInfo] Fix detection of hash collision in Apple Accel tables
The current implementation was ignoring the possibility of collisions.
2023-06-15 12:35:48 -04:00
Felipe de Azevedo Piovezan
0af7d97fb5 [AppleAccelTable][NFC] Refactor iterator class
The current implementation of the AppleAcceleratorTable::Entry is problematic
for a few reasons:

1. It is heavyweight. Iterators should be cheap, but the current implementation
tracks 3 different integer values, one "Entry" object, and one pointer to the
actual accelerator table. Most of this information is redundant and can be
removed.

2. It performs "memory reads" outside of the dereference operation. This
violates the usual expectations of iterators, whereby we don't access anything
so long as we don't dereference  the iterator.

3. It doesn't commit to tracking _one_ thing only. It tries to track both an
"index" into a list of HashData entries and a pointer in a blob of data. For
this reason, it allows for multiple "invalid" states, keeps redundant
information around and is difficult to understand.

4. It couples the interpretation of the data with the iterator increment. As
such, if the *interpretation* fails, the iterator will keep on producing garbage
values without ever indicating so to consumers.

The problem this iterator is trying to solve is simple: we have a blob of data
containing many "HashData" entries and we want to iterate over them. As such,
this commit makes the iterator only track a pointer over that data, and it
decouples the iterator increments from the interpretation of this blob of data.

We maintain the already existing assumption that failures never happen, but now
make it explicit with an assert.

Depends on D152158

Differential Revision: https://reviews.llvm.org/D152159
2023-06-09 12:40:08 -04:00
Felipe de Azevedo Piovezan
47755e1ff8 [AppleAccelTable][NFC] Improve code readability
This commit does a few minor NFC cleanups:

* A variable was called "Atom", probably trying to claim it was an AtomType.
This was incorrect, it is actually a FormValue.

* LLVM provides a `zip_equal` to express the intent of asserting ranges with the
same size. We change the lookup method to use that.

* The use of tuples made the code slightly difficult to follow, as such we
unpack the tuple with structure binding to improve readability.

Depends on D152157

Differential Revision: https://reviews.llvm.org/D152158
2023-06-08 16:36:08 -04:00
Felipe de Azevedo Piovezan
c1059dcb1c [AppleAccelTable] Keep track of the size of each hash data entry
In a future patch, it will be desirable to skip over all hash data entries for a
particular string. In order to do so, we must know how many bytes each of those
entries have.

In its full specification, Apple tables allow for variable length entries, which
would make the above impossible without reading the data of each entry. However,
this is largely unsupported today (as a FIXME in the code indicates, there is a
bug with hash collisions exactly because we don't know how to skip over data),
and the documentation[1] states that:

> For the current implementations of the “.apple_names” (all functions +
> globals), the “.apple_types” (names of all types that are defined), and the
> “.apple_namespaces” (all namespaces), we currently set the Atom array to be:
> [...]
> This defines the contents to be the DIE offset (eAtomTypeDIEOffset) that is
> encoded as a 32 bit value (DW_FORM_data4).

In other words, we only produce fixed sized entries.

A few tests containing invalid dwarf had to be updated, as the error is now
detected earlier (when the accelerator table is being parsed, instead of inside
the explicit call to "verify").

[1]: https://llvm.org/docs/SourceLevelDebugging.html#fixed-lookup

Depends on D152156

Differential Revision: https://reviews.llvm.org/D152157
2023-06-08 14:20:54 -04:00
Alex Langford
715726429e [DebugInfo] Add error-handling to DWARFAbbreviationDeclarationSet
This commit aims to improve error handling in the
DWARFAbbreviationDeclarationSet class. Specifically, we change the return type
of DWARFAbbreviationDeclarationSet::extract to an llvm::Error. In doing
so, we propagate the error from DWARFAbbreviationDeclaration::extract
another layer upward.

I have built on the previous unittest for DWARFDebugAbbrev that I
wrote a few days prior.
Namely, I am verifying that the following should give an error:
- An invalid tag following a non-null code
- An invalid attribute with a valid form
- A valid attribute with an invalid form
- An incorrectly terminated DWARFAbbreviationDeclaration

Additionally, I uncovered some invalid DWARF in an unrelated dsymutil
test. Namely the last Abbreviation Decl was missing a code.
This test has been updated accordingly. However, this commit does
not fix the underlying issue: llvm-dwarfdump does not correctly
verify the debug abbreviation section to catch these kinds of
mistakes. I have updated DWARFVerifier to not dereference a
pointer without first checking it and left a FIXME for future
contributors.

Differential Revision: https://reviews.llvm.org/D151353
2023-06-08 11:16:42 -07:00
Felipe de Azevedo Piovezan
a2404bc323 [AppleAccelTable][NFC] Make FormParams member of AppleAccelTable
These are used throughout the class and are recreated every time they are used.
To prevent the risk of it being created incorrectly in different places, we
create it once and in the earliest moment possible: when the table is extracted.

Depends on D151989

Differential Revision: https://reviews.llvm.org/D152156
2023-06-08 11:15:56 -04:00
Felipe de Azevedo Piovezan
0a7980988e [AppleAccelTable][NFC] Refactor equal_range code
The current implementation of AppleAcceleratorTable::equal_range has a couple of
drawbacks:

1. Assumptions about how the hash table is structured are hard-coded throughout
the code. Unless you are familiar with the data layout of the table, it becomes
tricky to follow the code.
2. We currently load strings from the string table even for hashes that don't
match our current search hash. This is wasteful.
3. There is no error checking in most DataExtractor calls that can fail.

This patch cleans up (1) by making helper methods that hide the details of the
data layout from the algorithms relying on them. This should also help us in
future patches, where we want to expand the interface to allow iterating over
_all_ entries in the table, and potentially clean up the existing Iterator
class.

The changes above also fix (2), as the problem "just vanishes" when you have a
function called "idxOfHashInBucket(SearchHash)".

The changes above also fix (3), as having individual functions allow us to
expose the points in which reading data can fail. This is particularly important
as we would like to share this implementation with LLDB, which requires robust
error handling.

The changes above are also a step towards addressing a comment left by the
original author:
```
  /// TODO: Generalize the rest of the AppleAcceleratorTable interface and move it
  /// to this class.
```
I suspect a lot of these helper functions created also apply to DWARF 5's
accelerator table, so they could be moved to the base class.

The changes above also expose a bug in this implementation: the previous
algorithm looks at _one_ string inside the bucket, but never bothers checking
for collisions. When the main search loop is written as it is with this patch,
the problem becomes evident. We do not fix the issue in this patch, as it is
intended to be NFC.

Differential Revision: https://reviews.llvm.org/D151989
2023-06-08 10:26:58 -04:00
Alex Langford
94935c0d9a Re-apply "Revert "[DebugInfo] Add error checking around data extraction in DWARFAbbreviationDeclaration::extract""
This reverts commit 11d61c079d4b4927efea42a38a27d4586887b764 to re-apply
6836a47b7e6b57927664ec6ec750ae37bb951129 with modifications.

Specifically, the errors in DWARFAbbreviationDeclaration::extract needed
to be moved as they are returned to ensure the right Error constructor
is selected.
2023-06-06 11:13:31 -07:00
Alex Langford
11d61c079d Revert "[DebugInfo] Add error checking around data extraction in DWARFAbbreviationDeclaration::extract"
This reverts commit 6836a47b7e6b57927664ec6ec750ae37bb951129.
This breaks some bots, need to investigate.
2023-06-06 10:59:59 -07:00
Alex Langford
6836a47b7e [DebugInfo] Add error checking around data extraction in DWARFAbbreviationDeclaration::extract
In trying to hoist errors further up this callstack, I discovered that
if the data in the debug_abbrev section is invalid entirely, the code
that parses the debug_abbrev section may do strange or unpredictable
things. The underlying issue is that DataExtractor will return a value
of 0 when it encounters an error in extracting a LEB128 value. It's thus
difficult to determine if there was an error just by looking at the
return value. This patch aims to bail at the first sight of an error in
the debug_abbrev parsing code.

Differential Revision: https://reviews.llvm.org/D151755
2023-06-06 10:50:54 -07:00
Nick Desaulniers
8abbc17ff3 reland: [Demangle] make llvm::demangle take std::string_view rather than const std::string&
As suggested by @erichkeane in
https://reviews.llvm.org/D141451#inline-1429549

There's potential for a lot more cleanups around these APIs. This is
just a start.

Callers need to be more careful about sub-expressions producing strings
that don't outlast the expression using `llvm::demangle`. Add a
release note.

Differential Revision: https://reviews.llvm.org/D149104
2023-06-06 10:18:06 -07:00
Nick Desaulniers
db98ac0827 [Demangle] convert microsoftDemangle to take a std::string_view
This should be last of the "bottom-up conversions" of various demanglers
to accept std::string_view.  After this, D149104 may be revisited.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D152176
2023-06-05 13:00:20 -07:00
David Blaikie
5e74b2e8bb llvm-dwarfdump --verify: Add support for .debug_str_offsets[.dwo]
Had a couple of issues lately causing corrupted strings due to
problematic str_offsets (overflow due to >4GB .debug_str.dwo section in
a dwp and the dwp tool silently overflowing the 32 bit offsets updated
in the .debug_str_offsets.dwo section, and then more recently two CUs in
a dwo caused the dwp tool to reapply the offset adjustment twice
corrupting str_offsets.dwo as well) - so let's check that the offsets
are valid.

This assumes no suffix merging - if anyone implements that, then this
checking should just be removed for the most part (we could still check
the offsets are within the bounds of .debug_str[.dwo], but nothing more
- any offset in the range would be valid, the offsets wouldn't have to
land at the start of a string)
2023-06-05 19:59:37 +00:00
Nick Desaulniers
61e1c3d80d [Demangle] convert itaniumDemangle and nonMicrosoftDemangle to use std::string_view
D149104 converted llvm::demangle to use std::string_view. Enabling
"expensive checks" (via -DLLVM_ENABLE_EXPENSIVE_CHECKS=ON) causes
lld/test/wasm/why-extract.s to fail. The reason for this is obscure:

Reason #10007 why std::string_view is dangerous:
Consider the following pattern:

  std::string_view s = ...;
  const char *c = s.data();
  std::strlen(c);

Is c a NUL-terminated C style string? It depends; but if it's not then
it's not safe to call std::strlen on the std::string_view::data().
std::string_view::length() should be used instead.

Fixing this fixes the one lone test that caught this.

microsoftDemangle, rustDemangle, and dlangDemangle should get this same
treatment, too. I will do that next.

Reviewed By: MaskRay, efriedma

Differential Revision: https://reviews.llvm.org/D149675
2023-06-02 14:53:49 -07:00
David Blaikie
3c5e3b70a3 llvm-symbolizer: access the base address from the skeleton CU, not the split unit
In Split DWARF, if the unit had a non-trivial base address (a real
low_pc, rather than one with fixed value 0) then computing addresses
needs to access that base address to add to any base address-relative
values. But the code was trying to access the base address in the split
unit, when it's actually in the skeleton unit. So delegate to the
skeleton if it's available.

Fixes #62941
2023-05-26 00:59:40 +00:00
Craig Topper
6006d43e2d LLVM_FALLTHROUGH => [[fallthrough]]. NFC
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D150996
2023-05-24 12:40:10 -07:00
Scott Linder
7f033d0f75 [llvm-debuginfo-analyzer] Support both Reference and Type attrs in single DIE
Relax the assumption that at most one Reference-or-Type-like attribute is
present on a DWARF DIE.

Add support for at most one Type attribute (i.e. DW_AT_import xor
DW_AT_type) and separately at most one Reference attribute (i.e.
DW_AT_specification xor DW_AT_abstract_origin xor ...).

Update comment describing old assumption and tag it as a "FIXME" to
reflect the fact that this is perhaps still not general enough.

Add a test based on the case which led me to encounter the bug in the
wild.

Reviewed By: CarlosAlbertoEnciso

Differential Revision: https://reviews.llvm.org/D150713
2023-05-23 20:52:45 +00:00
Alex Langford
631ff46cbf [DebugInfo][NFCI] Refactor DWARFAbbreviationDeclaration::extract
The motivation behind this refactor is to be able to use
DWARFAbbreviationDeclaration from LLDB. LLDB has its own implementation
of DWARFAbbreviationDeclaration that is very similar to LLVM's but it
has different semantics around error handling.

This patch modifies llvm::DWARFAbbreviationDeclaration::extract to
return an `llvm::Expected<ExtractState>` to differentiate between "I am
done extracting" and "An error has occured", something which the current
return type (bool) does not accurately capture.

Differential Revision: https://reviews.llvm.org/D150607
2023-05-16 12:54:38 -07:00
Alexey Lapshin
bd0dd27bb5 [DWARFLinker][DWARFv5] Add handling of DW_OP_addrx and DW_OP_constx expression operands.
This patch adds handling of DW_OP_addrx and DW_OP_constx expression operands.
In --update case these operands are preserved as is. Otherwise they are
converted into the DW_OP_addr and DW_OP_const[*]u correspondingly.

Differential Revision: https://reviews.llvm.org/D147066
2023-05-16 19:27:16 +02:00
Kazu Hirata
6c3ea866e9 [llvm] Migrate {starts,ends}with_insensitive to {starts,ends}_with_insensitive (NFC)
This patch migrates uses of StringRef::{starts,ends}with_insensitive
to StringRef::{starts,ends}_with_insensitive so that we can use names
similar to those used in std::string_view.  I'm planning to deprecate
StringRef::{starts,ends}with_insensitive once the migration is
complete across the code base.

Differential Revision: https://reviews.llvm.org/D150426
2023-05-12 15:37:37 -07:00
Nick Desaulniers
3e3c6f24ff Revert "[Demangle] make llvm::demangle take std::string_view rather than const std::string&"
This reverts commit c117c2c8ba4afd45a006043ec6dd858652b2ffcc.

itaniumDemangle calls std::strlen with the results of
std::string_view::data() which may not be NUL-terminated. This causes
lld/test/wasm/why-extract.s  to fail when "expensive checks" are enabled
via -DLLVM_ENABLE_EXPENSIVE_CHECKS=ON. See D149675 for further
discussion. Back this out until the individual demanglers are converted
to use std::string_view.
2023-05-02 15:54:09 -07:00
Nick Desaulniers
c117c2c8ba [Demangle] make llvm::demangle take std::string_view rather than const std::string&
As suggested by @erichkeane in
https://reviews.llvm.org/D141451#inline-1429549

There's potential for a lot more cleanups around these APIs. This is
just a start.

Callers need to be more careful about sub-expressions producing strings
that don't outlast the expression using ``llvm::demangle``. Add a
release note.

Reviewed By: MaskRay, #lld-macho

Differential Revision: https://reviews.llvm.org/D149104
2023-05-02 11:20:15 -07:00
Serge Pavlov
1d5fa4f88f [symbolizer] Change error message if module not found (recommit)
This is a recommit of 75f1f158812d, reverted in 7a443b1c493d, because
it caused compilation error in
compiler-rt/lib/sanitizer_common/symbolizer/sanitizer_symbolize.cpp.
The error was fixed by Kasimir Georgiev in de4c038c7ba2, but this
commit was reverted in de088dd3a0aa, because the initial commit was
reverted.

This commit reverts both the reverting commits, 7a443b1c493d and
de088dd3a0aa.

Original commit message is below.

If llvm-symbolize did not find module, the error looked like:

    LLVMSymbolizer: error reading file: No such file or directory

This message does not follow common practice: LLVMSymbolizer is not an
utility name. Also the message did not not contain the name of missed file.

With this change the error message looks differently:

    llvm-symbolizer: error: 'abc': No such file or directory

This format is closer to messages produced by other utilities and allow
proper coloring.

Differential Revision: https://reviews.llvm.org/D148032
2023-04-23 06:35:35 +00:00
Nick Desaulniers
c2709fcb0a [Demangle] remove unused params of microsoftDemangle
No call sites use these parameters, so drop them.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D148940
2023-04-21 15:37:00 -07:00
Alexander Yermolovich
3ba4d082da [dwarfdump][dwarf] Fix parsing of tu-index
Fixed issue where {tu,cu}-index fixup code for DWARF5 that would report an error when
fixup map is empty. Which is the case when seciton(s) are not over 4GB or
--manaully-generate-unit-index is not specified

Differential Revision: https://reviews.llvm.org/D148578
2023-04-20 11:42:24 -07:00