D149104 converted llvm::demangle to use std::string_view. Enabling
"expensive checks" (via -DLLVM_ENABLE_EXPENSIVE_CHECKS=ON) causes
lld/test/wasm/why-extract.s to fail. The reason for this is obscure:
Reason #10007 why std::string_view is dangerous:
Consider the following pattern:
    std::string_view s = ...;
    const char *c = s.data();
    std::strlen(c);
Is c a NUL-terminated C-style string? It depends. If it is not, then it is
not safe to call std::strlen on std::string_view::data();
std::string_view::length() should be used instead.
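A minimal illustration of the hazard. A `std::string_view` that covers only a prefix of a larger buffer has no NUL at `data()[size()]`, so `std::strlen` keeps reading past the end of the view. The helper names `lengthViaStrlen` and `lengthViaView` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Unsafe in general: assumes data() is NUL-terminated exactly at size().
// It is shown here only for contrast.
inline std::size_t lengthViaStrlen(std::string_view s) {
  return std::strlen(s.data());
}

// Safe: the view already knows its own length.
inline std::size_t lengthViaView(std::string_view s) { return s.length(); }
```

For example, a 7-character view `"_Z3foov"` taken from the NUL-terminated buffer `"_Z3foov_and_more"` reports 7 via `length()`, while `strlen` reports 16 because it runs on to the buffer's terminator. If the buffer were not NUL-terminated at all, `strlen` would read out of bounds.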
Fixing this fixes the one test that caught the problem.
microsoftDemangle, rustDemangle, and dlangDemangle should get this same
treatment, too. I will do that next.
Reviewed By: MaskRay, efriedma
Differential Revision: https://reviews.llvm.org/D149675
In Split DWARF, if the unit had a non-trivial base address (a real
low_pc, rather than one with fixed value 0) then computing addresses
needs to access that base address to add to any base address-relative
values. But the code was trying to access the base address in the split
unit, when it's actually in the skeleton unit. So delegate to the
skeleton if it's available.
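The delegation described above can be sketched with a simplified, hypothetical `Unit` model. This is not LLVM's `DWARFUnit`. A split unit forwards base-address queries to its skeleton when one is attached.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, simplified model of a DWARF unit (not LLVM's DWARFUnit).
struct Unit {
  std::optional<uint64_t> BaseAddr; // this unit's low_pc, if known
  const Unit *Skeleton = nullptr;   // set only for split (.dwo) units

  // In Split DWARF the real base address lives in the skeleton unit,
  // so a split unit delegates to its skeleton when one is available.
  std::optional<uint64_t> getBaseAddress() const {
    if (Skeleton)
      return Skeleton->getBaseAddress();
    return BaseAddr;
  }

  // Resolve a base-address-relative value such as a range list entry.
  std::optional<uint64_t> resolve(uint64_t Offset) const {
    if (auto Base = getBaseAddress())
      return *Base + Offset;
    return std::nullopt;
  }
};
```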
Fixes #62941
Relax the assumption that at most one Reference-or-Type-like attribute is
present on a DWARF DIE.
Add support for at most one Type attribute (i.e. DW_AT_import xor
DW_AT_type) and separately at most one Reference attribute (i.e.
DW_AT_specification xor DW_AT_abstract_origin xor ...).
Update comment describing old assumption and tag it as a "FIXME" to
reflect the fact that this is perhaps still not general enough.
Add a test based on the case which led me to encounter the bug in the
wild.
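A sketch of the relaxed check, using a hypothetical `hasValidReferenceAttrs` helper. Type-like and Reference-like attributes are counted independently, so a DIE with both `DW_AT_type` and `DW_AT_specification` is accepted. Only two of the Reference-like attributes are listed here. The attribute codes are the standard DWARF values.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// DWARF attribute codes (standard values).
enum : uint16_t {
  DW_AT_import = 0x18,
  DW_AT_abstract_origin = 0x31,
  DW_AT_specification = 0x47,
  DW_AT_type = 0x49,
};

// Hypothetical check: at most one Type-like attribute and, separately,
// at most one Reference-like attribute per DIE.
inline bool hasValidReferenceAttrs(const std::vector<uint16_t> &Attrs) {
  int TypeLike = 0, RefLike = 0;
  for (uint16_t A : Attrs) {
    if (A == DW_AT_type || A == DW_AT_import)
      ++TypeLike;
    else if (A == DW_AT_specification || A == DW_AT_abstract_origin)
      ++RefLike; // the real list of Reference-like attributes is longer
  }
  return TypeLike <= 1 && RefLike <= 1;
}
```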
Reviewed By: CarlosAlbertoEnciso
Differential Revision: https://reviews.llvm.org/D150713
The motivation behind this refactor is to be able to use
DWARFAbbreviationDeclaration from LLDB. LLDB has its own implementation
of DWARFAbbreviationDeclaration that is very similar to LLVM's but it
has different semantics around error handling.
This patch modifies llvm::DWARFAbbreviationDeclaration::extract to
return an `llvm::Expected<ExtractState>` to differentiate between "I am
done extracting" and "An error has occurred", something which the current
return type (bool) does not accurately capture.
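The three outcomes that a plain `bool` cannot distinguish can be sketched without LLVM headers. The example uses `std::variant` as a stand-in for `llvm::Expected<ExtractState>`. The names `ExtractError`, `ExtractResult`, and `extractOne` are hypothetical, and the decoder is a toy that reads one byte per abbreviation code, with no ULEB128 decoding and no attribute specs. In `.debug_abbrev`, a code of 0 terminates the table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Stand-in for llvm::Expected<ExtractState>: a state or an error.
enum class ExtractState { Complete, MoreItems };
struct ExtractError { std::string Message; };
using ExtractResult = std::variant<ExtractState, ExtractError>;

// Toy decoder: one byte per abbreviation code; code 0 ends the table.
inline ExtractResult extractOne(const std::vector<uint8_t> &Data,
                                std::size_t &Offset) {
  if (Offset >= Data.size())
    return ExtractError{"unexpected end of data"}; // "an error has occurred"
  uint8_t Code = Data[Offset++];
  if (Code == 0)
    return ExtractState::Complete; // "I am done extracting"
  return ExtractState::MoreItems;
}
```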
Differential Revision: https://reviews.llvm.org/D150607
This patch adds handling of DW_OP_addrx and DW_OP_constx expression operands.
In the --update case these operands are preserved as is. Otherwise they are
converted into DW_OP_addr and DW_OP_const[*]u, respectively.
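A sketch of choosing the replacement opcode once an indexed operand's value has been read from `.debug_addr`. The opcode values are the standard DWARF 5 ones. The helper `replacementOpcode` is hypothetical, and so is its width policy: it assumes the smallest `DW_OP_const<N>u` that can hold the value, and the patch's actual choice may differ.

```cpp
#include <cassert>
#include <cstdint>

// DWARF 5 expression opcodes.
enum : uint8_t {
  DW_OP_addr = 0x03,
  DW_OP_const1u = 0x08,
  DW_OP_const2u = 0x0a,
  DW_OP_const4u = 0x0c,
  DW_OP_const8u = 0x0e,
  DW_OP_addrx = 0xa1,
  DW_OP_constx = 0xa2,
};

// Hypothetical: pick the non-indexed opcode that replaces Op for Value.
inline uint8_t replacementOpcode(uint8_t Op, uint64_t Value) {
  if (Op == DW_OP_addrx)
    return DW_OP_addr;
  assert(Op == DW_OP_constx && "expected an indexed operand");
  if (Value <= UINT8_MAX)
    return DW_OP_const1u;
  if (Value <= UINT16_MAX)
    return DW_OP_const2u;
  if (Value <= UINT32_MAX)
    return DW_OP_const4u;
  return DW_OP_const8u;
}
```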
Differential Revision: https://reviews.llvm.org/D147066
This patch migrates uses of StringRef::{starts,ends}with_insensitive
to StringRef::{starts,ends}_with_insensitive so that we can use names
similar to those used in std::string_view. I'm planning to deprecate
StringRef::{starts,ends}with_insensitive once the migration is
complete across the code base.
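For readers unfamiliar with the semantics, here is a hypothetical pair of free functions over `std::string_view` that behave like the new-style names. Case folding is ASCII-only, independent of the C locale. This is an illustration, not `StringRef`'s implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// ASCII-only lowercasing, independent of the current C locale.
inline char toLowerAscii(char C) {
  return (C >= 'A' && C <= 'Z') ? static_cast<char>(C - 'A' + 'a') : C;
}

inline bool equalsInsensitive(std::string_view A, std::string_view B) {
  if (A.size() != B.size())
    return false;
  for (std::size_t I = 0; I < A.size(); ++I)
    if (toLowerAscii(A[I]) != toLowerAscii(B[I]))
      return false;
  return true;
}

// Hypothetical free functions using the std::string_view-style names.
inline bool starts_with_insensitive(std::string_view S, std::string_view Prefix) {
  return S.size() >= Prefix.size() &&
         equalsInsensitive(S.substr(0, Prefix.size()), Prefix);
}

inline bool ends_with_insensitive(std::string_view S, std::string_view Suffix) {
  return S.size() >= Suffix.size() &&
         equalsInsensitive(S.substr(S.size() - Suffix.size()), Suffix);
}
```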
Differential Revision: https://reviews.llvm.org/D150426
This reverts commit c117c2c8ba4afd45a006043ec6dd858652b2ffcc.
itaniumDemangle calls std::strlen with the result of
std::string_view::data(), which may not be NUL-terminated. This causes
lld/test/wasm/why-extract.s to fail when "expensive checks" are enabled
via -DLLVM_ENABLE_EXPENSIVE_CHECKS=ON. See D149675 for further
discussion. Back this out until the individual demanglers are converted
to use std::string_view.
As suggested by @erichkeane in
https://reviews.llvm.org/D141451#inline-1429549
There's potential for a lot more cleanups around these APIs. This is
just a start.
Callers need to be more careful about sub-expressions producing strings
that don't outlast the expression using ``llvm::demangle``. Add a
release note.
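A sketch of the lifetime rule callers need to keep in mind. `demangleStub` and `mangledNameOf` are hypothetical stand-ins, not `llvm::demangle`. Passing a temporary `std::string` straight into a `std::string_view` parameter is fine, because the temporary lives until the end of the full-expression. Binding it to a named `std::string_view` first is not fine.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in for a demangler taking std::string_view.
inline std::string demangleStub(std::string_view Mangled) {
  return std::string(Mangled);
}

// Returns a temporary std::string.
inline std::string mangledNameOf(int Id) { return "_Z" + std::to_string(Id); }

inline std::string safeUse(int Id) {
  // NOT OK (comment only): the temporary dies at the end of the
  // declaration, so View would dangle when used on the next line:
  //   std::string_view View = mangledNameOf(Id);
  //   return demangleStub(View);

  // OK: the temporary lives until the end of this full-expression,
  // so it outlasts the call to demangleStub().
  return demangleStub(mangledNameOf(Id));
}
```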
Reviewed By: MaskRay, #lld-macho
Differential Revision: https://reviews.llvm.org/D149104
This is a recommit of 75f1f158812d, reverted in 7a443b1c493d, because
it caused compilation error in
compiler-rt/lib/sanitizer_common/symbolizer/sanitizer_symbolize.cpp.
The error was fixed by Kasimir Georgiev in de4c038c7ba2, but this
commit was reverted in de088dd3a0aa, because the initial commit was
reverted.
This commit reverts both the reverting commits, 7a443b1c493d and
de088dd3a0aa.
Original commit message is below.
If llvm-symbolizer did not find a module, the error looked like:
LLVMSymbolizer: error reading file: No such file or directory
This message does not follow common practice: LLVMSymbolizer is not a
utility name. Also, the message did not contain the name of the missing file.
With this change the error message looks like this:
llvm-symbolizer: error: 'abc': No such file or directory
This format is closer to messages produced by other utilities and allows
proper coloring.
Differential Revision: https://reviews.llvm.org/D148032
Fixed an issue where the {tu,cu}-index fixup code for DWARF5 would report an error
when the fixup map is empty, which is the case when the section(s) are not over 4GB
or --manaully-generate-unit-index is not specified.
Differential Revision: https://reviews.llvm.org/D148578
Gracefully handle non-1 stack sizes in printCompactDWARFExpr rather than
assert. Add support for DW_OP_nop and test the zero-sized stack case.
This is intended to be nearly NFC.
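The "return instead of assert" approach can be sketched with a tiny, hypothetical compact printer over a small subset of opcodes (`DW_OP_lit0`..`DW_OP_lit31`, `DW_OP_plus`, `DW_OP_nop`). It is not `printCompactDWARFExpr`. When the final stack does not hold exactly one entry, including the zero-sized case, it returns `std::nullopt` so the caller can fall back to another format.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

enum : uint8_t {
  DW_OP_plus = 0x22,
  DW_OP_lit0 = 0x30, // DW_OP_lit0..DW_OP_lit31 are 0x30..0x4f
  DW_OP_lit31 = 0x4f,
  DW_OP_nop = 0x96,
};

// Hypothetical compact printer; returns nullopt instead of asserting.
inline std::optional<std::string> printCompact(const std::vector<uint8_t> &Ops) {
  std::vector<std::string> Stack;
  for (uint8_t Op : Ops) {
    if (Op >= DW_OP_lit0 && Op <= DW_OP_lit31) {
      Stack.push_back(std::to_string(Op - DW_OP_lit0));
    } else if (Op == DW_OP_plus) {
      if (Stack.size() < 2)
        return std::nullopt;
      std::string RHS = Stack.back();
      Stack.pop_back();
      std::string LHS = Stack.back();
      Stack.pop_back();
      Stack.push_back(LHS + "+" + RHS);
    } else if (Op == DW_OP_nop) {
      // No effect on the stack.
    } else {
      return std::nullopt; // unsupported opcode
    }
  }
  if (Stack.size() != 1) // covers the empty (zero-sized) stack too
    return std::nullopt;
  return Stack.back();
}
```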
Differential Revision: https://reviews.llvm.org/D147269
This patch replaces uses of PointerUnion::is with llvm::isa,
PointerUnion::get with llvm::cast, and PointerUnion::dyn_cast with
llvm::dyn_cast_if_present, as suggested by the FIXME in the definition
of class PointerUnion.
This patch does not remove these member functions, as they are still
used in other subprojects.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D148449
If llvm-symbolizer did not find a module, the error looked like:
LLVMSymbolizer: error reading file: No such file or directory
This message does not follow common practice: LLVMSymbolizer is not a
utility name. Also, the message did not contain the name of the missing file.
With this change the error message looks like this:
llvm-symbolizer: error: 'abc': No such file or directory
This format is closer to messages produced by other utilities and allows
proper coloring.
Differential Revision: https://reviews.llvm.org/D148032
This makes parsing for build IDs in the markup filter slightly more
permissive, in line with fromHex.
It also removes the distinction between missing build ID and empty build
ID; empty build IDs aren't a useful concept, since their purpose is to
uniquely identify a binary. This removes a layer of indirection wherever
build IDs are obtained.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D147485
DWARF 5 uses a 0-based index while previous versions use a 1-based
index. Fix the verifier and add a test.
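The version-dependent rule the verifier needs can be sketched with a hypothetical `isValidFileIndex` helper. DWARF 5 file tables are 0-based. In DWARF 2 to 4 the table is 1-based, and index 0 means "no file".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical validity check for a file index against a file table
// containing NumFiles entries.
inline bool isValidFileIndex(uint16_t Version, uint64_t Index, uint64_t NumFiles) {
  if (Version >= 5)
    return Index < NumFiles;             // 0-based
  return Index >= 1 && Index <= NumFiles; // 1-based
}
```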
Differential revision: https://reviews.llvm.org/D147202
DW_FORM_line_strp reads from the .debug_line_str section, but
previously the out-of-bounds error reported the .debug_line section.
This incorrect error message showed up when debugging an issue with
some invalid DWARF5 data.
The verify_string.s test has now been extended to check this (which
required a small change to the DWARF verifier to also look at
DW_FORM_line_strp).
Differential Revision: https://reviews.llvm.org/D146539
Move the conversion of DILineInfo to JSON into a separate function, so
it can be used in other places too.
This is a prerequisite patch for implementation of symbol+offset lookup.
Differential Revision: https://reviews.llvm.org/D147112
llvm-symbolizer echoed its input if it was not recognized as a valid address.
This behavior was extended to llvm-addr2line as well, whereas GNU addr2line
outputs "??:0" in this case. This difference prevents implementing the
symbol+offset lookup available in recent versions of GNU binutils, where
a string that is not an address may be a symbol.
This change makes llvm-addr2line's handling of unrecognized input closer
to that of GNU addr2line.
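A sketch of the behavioral difference. The functions `parseAddress` and `unrecognizedInputOutput` are hypothetical and do not come from the llvm-symbolizer sources. Input that does not parse as a hex address is echoed in symbolizer mode but becomes `??:0` in addr2line mode.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical parser for a hexadecimal address with optional "0x".
inline std::optional<uint64_t> parseAddress(std::string_view S) {
  if (S.substr(0, 2) == "0x" || S.substr(0, 2) == "0X")
    S.remove_prefix(2);
  if (S.empty() || S.size() > 16)
    return std::nullopt;
  uint64_t Value = 0;
  for (char C : S) {
    int Digit;
    if (C >= '0' && C <= '9') Digit = C - '0';
    else if (C >= 'a' && C <= 'f') Digit = C - 'a' + 10;
    else if (C >= 'A' && C <= 'F') Digit = C - 'A' + 10;
    else return std::nullopt;
    Value = Value * 16 + static_cast<uint64_t>(Digit);
  }
  return Value;
}

// Output for input that is not a valid address.
inline std::string unrecognizedInputOutput(std::string_view Input,
                                           bool Addr2LineMode) {
  return Addr2LineMode ? std::string("??:0") : std::string(Input);
}
```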
The v4 rebuilding is best-effort, because it's not possible to reliably
parse the DWO ID: doing so requires the abbrev section (and if the index
isn't trustworthy, there's no way to find the associated abbrev section
contribution for a given info section contribution).
But in v5 the DWO ID/type signature is in the header and can be rebuilt
losslessly (only at the cost of performance of rescanning/parsing the
headers of all the units), so let's implement that.
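Rebuilding from v5 headers can be sketched as a walk over consecutive unit headers in `.debug_info.dwo`. The sketch handles 32-bit DWARF only. It relies on the DWARF 5 header layout: unit_length (4), version (2), unit_type (1), address_size (1), debug_abbrev_offset (4), followed by the 8-byte dwo_id or type_signature. `RebuiltEntry` and `rebuildIndexV5` are hypothetical names, and the sketch does not check unit_type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One rebuilt index entry for a unit in .debug_info.dwo.
struct RebuiltEntry {
  uint64_t Signature; // DWO ID or type signature
  uint64_t Offset;    // start of the unit
  uint64_t Length;    // includes the 4-byte unit_length field
};

inline uint64_t readLE(const std::vector<uint8_t> &D, std::size_t Off, int Bytes) {
  uint64_t V = 0;
  for (int I = 0; I < Bytes; ++I)
    V |= uint64_t(D[Off + I]) << (8 * I);
  return V;
}

// Hypothetical: collect the signature at offset 12 of each v5 unit header.
inline std::vector<RebuiltEntry> rebuildIndexV5(const std::vector<uint8_t> &Info) {
  std::vector<RebuiltEntry> Entries;
  uint64_t Off = 0;
  while (Off + 20 <= Info.size()) {
    uint64_t UnitLength = readLE(Info, Off, 4);
    if (UnitLength >= 0xfffffff0 || UnitLength < 16)
      break; // 64-bit DWARF, reserved values, or too short: not handled
    uint64_t Total = 4 + UnitLength;
    if (Off + Total > Info.size() || readLE(Info, Off + 4, 2) != 5)
      break;
    Entries.push_back({readLE(Info, Off + 12, 8), Off, Total});
    Off += Total;
  }
  return Entries;
}
```

Every unit header is read once, which matches the performance cost mentioned above.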
The testing isn't /ideal/. I think it should have been implemented as a
hardcoded dwp file with a corrupted/incorrect index, so that the test
could demonstrate that reparsing the index produces the right answer.
But this is a quick port of the existing v5 test back to v4, so that we
don't lose coverage on the v4 codepath now that it's separated from the
v5 codepath.
Differential Revision: https://reviews.llvm.org/D146662
GDB 11.2 generates version 8 of the gdb-index, in which it de-duplicates
constant pool entries based on CU indices. Changed how constant pool entries
are counted to account for this.
Differential Revision: https://reviews.llvm.org/D146852
This allows the DWARFDebugLine::SectionParser to try parsing line tables
at 4- or 8-byte boundaries if the unaligned offset appears invalid. If
aligning the offset does not reduce errors, the offset is used unchanged.
This is needed for llvm-dwarfdump to be able to extract the line tables
(with --debug-lines) from binaries produced by certain compilers that
like to align each line table in the .debug_line section. Note that this
alignment does not seem to be invalid since the units do point to the
correct line table offsets via the DW_AT_stmt_list attribute.
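The retry policy can be sketched as follows. `CountErrors` is a hypothetical callback that parses a line table at an offset and reports how many problems it found. An aligned offset replaces the original only if it strictly reduces the error count. The names and the exact policy are assumptions, not `DWARFDebugLine::SectionParser`'s code.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

inline uint64_t alignTo(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// Hypothetical: try 4- and 8-byte alignment if the unaligned offset fails.
inline uint64_t chooseLineTableOffset(
    uint64_t Offset, const std::function<unsigned(uint64_t)> &CountErrors) {
  uint64_t Best = Offset;
  unsigned BestErrors = CountErrors(Offset);
  for (uint64_t Align : {4, 8}) {
    uint64_t Candidate = alignTo(Offset, Align);
    if (Candidate == Offset || BestErrors == 0)
      continue;
    unsigned Errors = CountErrors(Candidate);
    if (Errors < BestErrors) { // use the aligned offset only if it helps
      Best = Candidate;
      BestErrors = Errors;
    }
  }
  return Best;
}
```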
Differential Revision: https://reviews.llvm.org/D143513
This patch adds llvm::codeview::SourceLanguage entries, DWARF translations, and PDB source file extensions in LLVM, and allows LLDB's PDB parsers to recognize them correctly.
The CV_CFL_LANG enum in the Visual Studio 2022 documentation https://learn.microsoft.com/en-us/visualstudio/debugger/debug-interface-access/cv-cfl-lang defines:
```
CV_CFL_OBJC = 0x11,
CV_CFL_OBJCXX = 0x12,
```
Since the initial commit in D24317, ObjC had been emitted as C and ObjC++ as MASM.
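A sketch of the translation using a hypothetical `toCodeViewLanguage` function. The CodeView values come from the `CV_CFL_LANG` excerpt above plus `CV_CFL_C`/`CV_CFL_CXX`/`CV_CFL_MASM`. The DWARF values are the standard `DW_LANG_*` codes. Falling back to C for every other language is a simplification made for this sketch only.

```cpp
#include <cassert>
#include <cstdint>

// CodeView source-language codes (subset of CV_CFL_LANG).
enum class CVLang : uint8_t { C = 0x00, Cpp = 0x01, Masm = 0x03, ObjC = 0x11, ObjCpp = 0x12 };

// DWARF language codes (subset).
enum : uint16_t {
  DW_LANG_C89 = 0x0001,
  DW_LANG_C = 0x0002,
  DW_LANG_C_plus_plus = 0x0004,
  DW_LANG_ObjC = 0x0010,
  DW_LANG_ObjC_plus_plus = 0x0011,
};

// Hypothetical mapping: ObjC/ObjC++ get their dedicated CodeView codes.
inline CVLang toCodeViewLanguage(uint16_t DwarfLang) {
  switch (DwarfLang) {
  case DW_LANG_ObjC: return CVLang::ObjC;
  case DW_LANG_ObjC_plus_plus: return CVLang::ObjCpp;
  case DW_LANG_C_plus_plus: return CVLang::Cpp;
  default: return CVLang::C; // simplification for this sketch
  }
}
```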
Reviewed By: DavidSpickett
Differential Revision: https://reviews.llvm.org/D146221
Previously we'd stash a null pointer in a sorted vector of CUs - the
next time around, we'd try to do a binary search in that vector (sorting
on a key inside the objects pointed to by the elements of the vector)
which would deref null if we'd stashed a null in there previously.
As a reasonable, but not ideal, workaround, don't stash any result in
the vector. This means every query will produce a new warning
(resulting in duplicate warnings), but that is better than a crash.
Stashing null in the list could be workable if we also stashed the
offset in a pair - but then all the clients would need to be fixed up
(maybe using a filtering iterator) which seems like overkill for this
uncommon error case.
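A sketch of the workaround using a hypothetical `UnitCache`. Units are kept sorted by offset for binary search, and a failed parse is simply not inserted. The comparator therefore never dereferences a null element, at the cost of re-parsing (and re-warning) on the next query.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

struct CU { uint64_t Offset; };

// Hypothetical cache of parsed units, sorted by CU::Offset.
class UnitCache {
  std::vector<std::unique_ptr<CU>> Units;

public:
  template <typename ParseFn> CU *getOrParse(uint64_t Offset, ParseFn Parse) {
    auto It = std::lower_bound(
        Units.begin(), Units.end(), Offset,
        [](const std::unique_ptr<CU> &U, uint64_t O) { return U->Offset < O; });
    if (It != Units.end() && (*It)->Offset == Offset)
      return It->get();
    std::unique_ptr<CU> New = Parse(Offset); // may be null on error
    if (!New)
      return nullptr; // not cached: a later query retries (and warns again)
    return Units.insert(It, std::move(New))->get();
  }
  std::size_t size() const { return Units.size(); }
};
```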
The result of DWARFFormValue::isFormClass depends on DWARF version in some cases.
The current implementation takes DWARF version from the stored DWARFUnit.
If there is no stored DWARFUnit then the current behavior is to assume
DwarfVersion <= 3. This patch adds a new function that takes a DWARF version as a
parameter, so it is possible to check the form class for various DWARF versions.
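An example of why the version matters. In DWARF 2 and 3, `DW_FORM_data4`/`DW_FORM_data8` could also encode section offsets (lineptr, loclistptr, and so on). DWARF 4 introduced `DW_FORM_sec_offset` for that role. The helper `isSectionOffsetForm` below is hypothetical, and the form codes are the standard values.

```cpp
#include <cassert>
#include <cstdint>

enum : uint16_t { DW_FORM_data4 = 0x06, DW_FORM_data8 = 0x07, DW_FORM_sec_offset = 0x17 };

// Hypothetical version-aware check for the "section offset" form class.
inline bool isSectionOffsetForm(uint16_t Form, uint16_t DwarfVersion) {
  if (Form == DW_FORM_sec_offset)
    return true;
  if (Form == DW_FORM_data4 || Form == DW_FORM_data8)
    return DwarfVersion <= 3;
  return false;
}
```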
Differential Revision: https://reviews.llvm.org/D145499
Getting compile units for data addresses is much slower, because it often
requires a slow fallback path that walks every DWARF entry: data addresses
don't fall into the compilation unit ranges.
Most lookups are code addresses, and don't need this logic. Split the
functionality out so that we restore the fast-path behaviour for the
code lookups.
More context at:
https://discourse.llvm.org/t/llvm-symbolizer-has-gotten-extremely-slow/67262
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D145009
Some workflows can generate large GSYM files, and sharding GSYM files into segments can help performance-sensitive workflows that can take advantage of smaller GSYM files. This patch adds a new --segment-size option to llvm-gsymutil, which specifies a rough size in bytes for each segment.
Segmented GSYM files contain only the strings and files that are needed for the FunctionInfo objects added to each shard. The output file path gets the first address of the first contained function info appended as a suffix to the filename. If a base address of an image is set in the GsymCreator, then all segments will use this same base address, which allows lookups for symbolication to happen correctly when the image has been slid in memory.
Code has been added to refactor and re-use methods within the GsymCreator so that segments can be created and tested easily.
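A rough sketch of size-based segmentation. It is hypothetical and ignores the per-segment string and file tables. Functions are appended to the current segment until its size reaches `SegmentSize`, so segments may slightly exceed the target, as the file sizes below also show. Each output name gets the first function's start address as a hex suffix.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

struct FuncInfo { uint64_t StartAddr; uint64_t EncodedSize; };

// Hypothetical greedy segmentation: start a new segment once the current
// one has reached SegmentSize.
inline std::vector<std::vector<FuncInfo>>
segmentBySize(const std::vector<FuncInfo> &Funcs, uint64_t SegmentSize) {
  std::vector<std::vector<FuncInfo>> Segments;
  uint64_t Current = 0;
  for (const FuncInfo &F : Funcs) {
    if (Segments.empty() || Current >= SegmentSize) {
      Segments.emplace_back();
      Current = 0;
    }
    Segments.back().push_back(F);
    Current += F.EncodedSize;
  }
  return Segments;
}

// Output name: base path plus "-0x<first function address>".
inline std::string segmentPath(const std::string &Base, uint64_t FirstAddr) {
  char Buf[32];
  std::snprintf(Buf, sizeof(Buf), "-0x%llx",
                static_cast<unsigned long long>(FirstAddr));
  return Base + Buf;
}
```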
Example of segmenting GSYM files:
    $ llvm-gsymutil --convert llvm-gsymutil.dSYM -o llvm-gsymutil.gsym --segment-size 10485760
    $ ls -l llvm-gsymutil.gsym-*
    -rw-r--r-- 1 gclayton staff 10485839 Feb 9 10:45 llvm-gsymutil.gsym-0x1000030c0
    -rw-r--r-- 1 gclayton staff 10485765 Feb 9 10:45 llvm-gsymutil.gsym-0x100668888
    -rw-r--r-- 1 gclayton staff 10485881 Feb 9 10:45 llvm-gsymutil.gsym-0x100c948b8
    -rw-r--r-- 1 gclayton staff 10485954 Feb 9 10:45 llvm-gsymutil.gsym-0x101659e70
    -rw-r--r-- 1 gclayton staff 10485792 Feb 9 10:45 llvm-gsymutil.gsym-0x1022b1dc0
    -rw-r--r-- 1 gclayton staff 10485889 Feb 9 10:45 llvm-gsymutil.gsym-0x102a18b10
    -rw-r--r-- 1 gclayton staff 10485893 Feb 9 10:45 llvm-gsymutil.gsym-0x1030b05d0
    -rw-r--r-- 1 gclayton staff 10485802 Feb 9 10:45 llvm-gsymutil.gsym-0x1037caaac
    -rw-r--r-- 1 gclayton staff 10485781 Feb 9 10:45 llvm-gsymutil.gsym-0x103e767a0
    -rw-r--r-- 1 gclayton staff 10485832 Feb 9 10:45 llvm-gsymutil.gsym-0x10452d0d4
    -rw-r--r-- 1 gclayton staff 10485782 Feb 9 10:45 llvm-gsymutil.gsym-0x104b93310
    -rw-r--r-- 1 gclayton staff 6255785 Feb 9 10:45 llvm-gsymutil.gsym-0x10526bf34
Differential Revision: https://reviews.llvm.org/D145448
Some workflows can generate large GSYM files, and sharding GSYM files into segments can help performance-sensitive workflows that can take advantage of smaller GSYM files. This patch adds a new --segment-size option to llvm-gsymutil, which specifies a rough size in bytes for each segment.
Segmented GSYM files contain only the strings and files that are needed for the FunctionInfo objects added to each shard. The output file path gets the first address of the first contained function info appended as a suffix to the filename. If a base address of an image is set in the GsymCreator, then all segments will use this same base address, which allows lookups for symbolication to happen correctly when the image has been slid in memory.
Code has been added to refactor and re-use methods within the GsymCreator so that segments can be created and tested easily.
Example of segmenting GSYM files:
    $ llvm-gsymutil --convert llvm-gsymutil.dSYM -o llvm-gsymutil.gsym --segment-size 10485760
    $ ls -l llvm-gsymutil.gsym-*
    -rw-r--r-- 1 gclayton staff 10485839 Feb 9 10:45 llvm-gsymutil.gsym-0x1000030c0
    -rw-r--r-- 1 gclayton staff 10485765 Feb 9 10:45 llvm-gsymutil.gsym-0x100668888
    -rw-r--r-- 1 gclayton staff 10485881 Feb 9 10:45 llvm-gsymutil.gsym-0x100c948b8
    -rw-r--r-- 1 gclayton staff 10485954 Feb 9 10:45 llvm-gsymutil.gsym-0x101659e70
    -rw-r--r-- 1 gclayton staff 10485792 Feb 9 10:45 llvm-gsymutil.gsym-0x1022b1dc0
    -rw-r--r-- 1 gclayton staff 10485889 Feb 9 10:45 llvm-gsymutil.gsym-0x102a18b10
    -rw-r--r-- 1 gclayton staff 10485893 Feb 9 10:45 llvm-gsymutil.gsym-0x1030b05d0
    -rw-r--r-- 1 gclayton staff 10485802 Feb 9 10:45 llvm-gsymutil.gsym-0x1037caaac
    -rw-r--r-- 1 gclayton staff 10485781 Feb 9 10:45 llvm-gsymutil.gsym-0x103e767a0
    -rw-r--r-- 1 gclayton staff 10485832 Feb 9 10:45 llvm-gsymutil.gsym-0x10452d0d4
    -rw-r--r-- 1 gclayton staff 10485782 Feb 9 10:45 llvm-gsymutil.gsym-0x104b93310
    -rw-r--r-- 1 gclayton staff 6255785 Feb 9 10:45 llvm-gsymutil.gsym-0x10526bf34
Differential Revision: https://reviews.llvm.org/D143793
llvm-debuginfo-analyzer is a command line tool that processes debug
info contained in a binary file and produces a debug information
format agnostic “Logical View”, which is a high-level semantic
representation of the debug info, independent of the low-level
format.
The code has been divided into the following patches:
1) Interval tree
2) Driver and documentation
3) Logical elements
4) Locations and ranges
5) Select elements
6) Warning and internal options
7) Compare elements
8) ELF Reader
9) CodeView Reader
Full details:
https://discourse.llvm.org/t/llvm-dev-rfc-llvm-dva-debug-information-visual-analyzer/62570
This patch:
This is a high-level summary of the changes in this patch.
CodeView Reader
- Support for CodeView/PDB.
LVCodeViewReader, LVTypeVisitor, LVSymbolVisitor, LVLogicalVisitor
Reviewed By: psamolysov, probinson, djtodoro, zequanwu
Differential Revision: https://reviews.llvm.org/D125784
In CodeView, the basic type of a complex represents the size
of an individual component rather than the sum of the real
and imaginary components.
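A worked instance of the rule. A complex type whose total byte size is 8 (for example, two 32-bit floats) should be described by a 32-bit component size, not 64. `codeViewComplexComponentBits` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: total byte size of a complex type -> bit width of one
// component (real and imaginary halves are the same size).
inline uint32_t codeViewComplexComponentBits(uint32_t TotalByteSize) {
  return TotalByteSize * 8 / 2;
}
```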
Differential Revision: https://reviews.llvm.org/D143760
We hit this in Chromium builds where the PDB file was just under 4GB,
but the stream directory was actually too large to be correctly
represented.
llvm-pdbutil would error about this in llvm::msf::validateSuperBlock,
but lld should not write such PDB files in the first place.
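A sketch of the size limit the writer has to respect. It assumes the MSF layout in which the list of stream-directory blocks is stored in a single block-map block of 32-bit block indices. Under that layout the directory can span at most `BlockSize / 4` blocks, which is 4 MiB with 4096-byte blocks. `directoryFits` is a hypothetical helper, not lld or llvm-pdbutil code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check: does a stream directory of NumDirectoryBytes fit,
// given that its block list must fit in one block of 32-bit indices?
inline bool directoryFits(uint64_t NumDirectoryBytes, uint32_t BlockSize) {
  uint64_t NumDirectoryBlocks = (NumDirectoryBytes + BlockSize - 1) / BlockSize;
  return NumDirectoryBlocks <= BlockSize / sizeof(uint32_t);
}
```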
Differential revision: https://reviews.llvm.org/D144385
llvm-debuginfo-analyzer is a command line tool that processes debug
info contained in a binary file and produces a debug information
format agnostic “Logical View”, which is a high-level semantic
representation of the debug info, independent of the low-level
format.
The code has been divided into the following patches:
1) Interval tree
2) Driver and documentation
3) Logical elements
4) Locations and ranges
5) Select elements
6) Warning and internal options
7) Compare elements
8) ELF Reader
8a) Memory Management
9) CodeView Reader
Full details:
https://discourse.llvm.org/t/llvm-dev-rfc-llvm-dva-debug-information-visual-analyzer/62570
This patch:
This is a high-level summary of the changes in this patch.
Memory Management
- Use bump allocators for memory management.
As the logical elements are only allocated in one pass (debuginfo
parsing) and they are never manipulated/created/destroyed later,
use the SpecificBumpPtrAllocator for the memory management.
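To show why a bump allocator suits objects that are created once and never freed individually, here is a hypothetical minimal typed arena in the spirit of `SpecificBumpPtrAllocator`. It is not its implementation. Objects are placed into fixed-size slabs by bumping an index, and all of them are destroyed together when the arena goes away.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

template <typename T, std::size_t SlabCount = 64> class TypedArena {
  struct alignas(T) Slot { unsigned char Bytes[sizeof(T)]; };
  std::vector<std::unique_ptr<Slot[]>> Slabs;
  std::size_t UsedInLastSlab = 0;

public:
  TypedArena() = default;
  TypedArena(const TypedArena &) = delete;
  TypedArena &operator=(const TypedArena &) = delete;

  template <typename... Args> T *create(Args &&...A) {
    if (Slabs.empty() || UsedInLastSlab == SlabCount) {
      Slabs.push_back(std::make_unique<Slot[]>(SlabCount));
      UsedInLastSlab = 0;
    }
    T *Obj = ::new (static_cast<void *>(&Slabs.back()[UsedInLastSlab]))
        T(std::forward<Args>(A)...);
    ++UsedInLastSlab; // bump only after construction succeeded
    return Obj;
  }

  std::size_t size() const {
    return Slabs.empty() ? 0 : (Slabs.size() - 1) * SlabCount + UsedInLastSlab;
  }

  // All objects are destroyed at once; none is freed individually.
  ~TypedArena() {
    for (std::size_t S = 0; S < Slabs.size(); ++S) {
      std::size_t N = (S + 1 == Slabs.size()) ? UsedInLastSlab : SlabCount;
      for (std::size_t I = 0; I < N; ++I)
        std::launder(reinterpret_cast<T *>(&Slabs[S][I]))->~T();
    }
  }
};
```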
Reviewed By: dblaikie, Orlando
Differential Revision: https://reviews.llvm.org/D137933
On i386 Windows, after C++ names have been Itanium-mangled, the C name
mangling specific to its call convention may also be applied on top.
This change teaches the symbolizer to demangle this type of
mangled name.
As part of this change, `demanglePE32ExternCFunc` has also been modified
to fix unwanted stripping for vectorcall names when the demangled name
is supposed to contain a leading `_`. Note that vectorcall mangling adds
neither an `_` nor an `@` prefix. The old code always tried to strip a
prefix first, so for Itanium-mangled names under vectorcall, the leading
underscore of the Itanium name was stripped instead, which broke the
Itanium demangler.
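A sketch of i386 C-name undecoration that checks for vectorcall first. The conventions handled are cdecl `_name`, stdcall `_name@N`, fastcall `@name@N`, and vectorcall `name@@N`, which has no prefix. With this ordering, the leading `_` of an Itanium name such as `_Z3foov@@8` is preserved. `undecorateI386` and `stripArgBytesSuffix` are hypothetical and are not the body of `demanglePE32ExternCFunc`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Strip a trailing "@<digits>" (or "@@<digits>" when DoubleAt is set).
// Name is modified only on success.
inline bool stripArgBytesSuffix(std::string_view &Name, bool DoubleAt) {
  std::size_t At = Name.rfind('@');
  if (At == std::string_view::npos || At + 1 == Name.size())
    return false;
  for (char C : Name.substr(At + 1))
    if (!std::isdigit(static_cast<unsigned char>(C)))
      return false;
  if (DoubleAt) {
    if (At == 0 || Name[At - 1] != '@')
      return false;
    Name = Name.substr(0, At - 1);
  } else {
    Name = Name.substr(0, At);
  }
  return true;
}

inline std::string_view undecorateI386(std::string_view Name) {
  std::string_view Copy = Name;
  if (stripArgBytesSuffix(Copy, /*DoubleAt=*/true))
    return Copy; // vectorcall: there is no prefix to remove
  if (!Name.empty() && (Name.front() == '@' || Name.front() == '_')) {
    Name.remove_prefix(1); // fastcall '@' or cdecl/stdcall '_'
    stripArgBytesSuffix(Name, /*DoubleAt=*/false);
  }
  return Name;
}
```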
Differential Revision: https://reviews.llvm.org/D144049
As described in
https://github.com/llvm/llvm-project/issues/60363
the following DebugInfo LogicalView Tests unit tests failed:
- ELFReader
- SelectElements
The tests fail only on the OSX-64 platform with the CMake options:
-DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_LINK_LLVM_DYLIB=ON
Using the same options on a Linux platform all the tests pass:
- https://lab.llvm.org/buildbot/#/builders/196
- llvm-x86_64-debian-dylib
The cause is dynamic library initialization affecting a static
instance of the string pool (LVStringPool).
That string pool instance is accessed by all the logical elements
to store/retrieve any associated string during the creation of the
logical view.
For a logical view comparison, both logical readers (Reference and
Target) use retrieved indexes when comparing their strings.
Moved the static instance to the LVSupport module (unnamed namespace).
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D143716
This fixes a few places where the addrx3 and strx3 forms were missed.
Previously, if one of these forms appeared anywhere, various errors
could occur. This change also adds an extra test case for the addrx3
form (which previously failed).
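The easy-to-miss case is the 3-byte width. The form codes below are the standard DWARF 5 values. `fixedFormSize` and `readU24LE` are hypothetical helpers, and the reader assumes a little-endian target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum : uint16_t {
  DW_FORM_strx1 = 0x25, DW_FORM_strx2 = 0x26, DW_FORM_strx3 = 0x27, DW_FORM_strx4 = 0x28,
  DW_FORM_addrx1 = 0x29, DW_FORM_addrx2 = 0x2a, DW_FORM_addrx3 = 0x2b, DW_FORM_addrx4 = 0x2c,
};

// Byte size of the fixed-width indexed forms; -1 for anything else.
inline int fixedFormSize(uint16_t Form) {
  switch (Form) {
  case DW_FORM_strx1: case DW_FORM_addrx1: return 1;
  case DW_FORM_strx2: case DW_FORM_addrx2: return 2;
  case DW_FORM_strx3: case DW_FORM_addrx3: return 3; // the easy-to-miss case
  case DW_FORM_strx4: case DW_FORM_addrx4: return 4;
  default: return -1;
  }
}

// Read a 3-byte unsigned index (little-endian assumed).
inline uint32_t readU24LE(const std::vector<uint8_t> &Data, std::size_t Offset) {
  return uint32_t(Data[Offset]) | uint32_t(Data[Offset + 1]) << 8 |
         uint32_t(Data[Offset + 2]) << 16;
}
```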
Differential Revision: https://reviews.llvm.org/D143488
Previously this would incorrectly return the raw offset into the .debug_addr section for the
DW_FORM_addrx1/2/3/4 forms rather than the actual address.
Note that this was handled correctly in the dump() function so this issue only occurs for users
of this API and not in tools such as llvm-dwarfdump. The dump() method has now been updated
to use this method to increase coverage.
This also now adds a few unit tests for indexed addresses to DWARFDebugInfoTest.
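A sketch of what resolving an indexed address involves. The value is read from `.debug_addr` at `AddrBase + Index * AddressSize`, where `AddrBase` comes from `DW_AT_addr_base`. The index or offset itself is not the address. `resolveAddrx` is hypothetical and assumes a little-endian target.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical resolution of a DW_FORM_addrx* index to an address.
inline std::optional<uint64_t> resolveAddrx(const std::vector<uint8_t> &DebugAddr,
                                            uint64_t AddrBase, uint64_t Index,
                                            unsigned AddressSize) {
  uint64_t Offset = AddrBase + Index * AddressSize;
  if (Offset + AddressSize > DebugAddr.size())
    return std::nullopt;
  uint64_t Value = 0;
  for (unsigned I = 0; I < AddressSize; ++I)
    Value |= uint64_t(DebugAddr[Offset + I]) << (8 * I);
  return Value;
}
```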
Differential Revision: https://reviews.llvm.org/D143073
According to the DWARF5 specification, and the GNU specification for DWARF4,
the offset entry in the CU/TU index is 32 bits. This presents a problem when
.debug_info.dwo in a DWP file grows beyond 4GB: the CU index becomes partially
corrupted.
This diff adds manual parsing of .debug_info.dwo/.debug_abbrev.dwo to
reconstruct the CU index in general, and the TU index for DWARF5. This is a
workaround until the DWARF6 spec is finalized.
The next patch will change the internal CU/TU struct to 64 bits, and change uses as
necessary. The plan is to land all the patches in one go after all are approved.
This patch originates from the discussion in: https://discourse.llvm.org/t/dwarf-dwp-4gb-limit/63902
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D137882
Changed the contribution data structure to 64 bits. I added 32-bit and 64-bit
accessors to make it explicit where we use 32 bits and where we use 64 bits, and
to make sure we catch all the cases where this data structure is used.
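A sketch of the accessor pattern described above, using a hypothetical `SectionContribution` class rather than LLVM's actual type. Storage is 64-bit, and explicit `...32()` accessors make every narrowing visible. They assert when a value would not fit.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical 64-bit contribution with explicit 32-bit accessors.
class SectionContribution {
  uint64_t Offset = 0;
  uint64_t Length = 0;

public:
  SectionContribution(uint64_t Off, uint64_t Len) : Offset(Off), Length(Len) {}

  uint64_t getOffset() const { return Offset; }
  uint64_t getLength() const { return Length; }

  // For callers that still need 32-bit values (e.g. a 32-bit on-disk field).
  uint32_t getOffset32() const {
    assert(Offset <= std::numeric_limits<uint32_t>::max() && "offset exceeds 4GB");
    return static_cast<uint32_t>(Offset);
  }
  uint32_t getLength32() const {
    assert(Length <= std::numeric_limits<uint32_t>::max() && "length exceeds 4GB");
    return static_cast<uint32_t>(Length);
  }
};
```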
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D139379