This patch moves NEON immediate argument specification and checking to
the system currently shared by both SVE and SME.
In its current form, the TableGen definition of a NEON intrinsic cannot
control how its immediate arguments are range-checked, this information
must be inferred from the name of the intrinsic by NeonEmitter, which
also assumes that any NEON instruction will only ever receive a single
immediate argument. For SVE/SME instrinsics, this information is more
conveniently supplied in the TableGen definition.
As a result, for each immediate argument, NEON instructions must define
- The index of the immediate argument to be checked
- The type of immediate range check to be performed,
(e.g., ImmCheckShiftRight)
- The index of the argument whose type defines the context
of this immediate check (base type, vector size).
- **Difference from SVE/SME** If this definition generates a polymorphic
NEON builtin, the base type defined by this argument is overwritten by
that of the type code supplied to the overloaded builtin call. This
third argument is omitted in some cases due to this.
Here is an example for
[`vfma_laneq`](https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:@navigationhierarchiessimdisa=[Neon]&q=vfma_laneq)
- The immediate is supplied in argument 3
- The immediate is used as an index into the lanes of argument 2
- So we must perform an immediate check on argument 3, based on the type
information of argument 2.
- `ImmCheck<3, ImmCheckLaneIndex, 2>`
During this work, we discovered that the existing immediate
range-checking system was largely untested, which made it difficult to
make reliable progress. Missing tests have been added to verify this
implementation against all intrinsics which take constrained immediate
arguments. All test immediate range checking tests for NEON intrinsics
are moved to a dedicated directory
`clang/test/Sema/aarch64-neon-immediate-ranges/`.
Using the flag `-split_layout` in llvm-profdata merge, the output
profile can write profiles with and without inlined function into two
different extbinary sections (and their FuncOffsetTable too). The
section without inlined functions are marked with `SecFlagFlat` and is
skipped by ThinLTO because it provides no useful info.
The split layout feature was already implemented in SampleProfWriter but
previously there is no way to use it from llvm-profdata.
LLVM Symbolizer attempt to symbolize addresses of optimized binaries
reports missing line numbers for some cases. It maybe due to compiler
which sometimes cannot map an instruction to line number due to
optimizations. Symbolizer should handle those cases gracefully.
Adding an option '--skip-line-zero' to symbolizer so as to report the
nearest non-zero line number.
---------
Co-authored-by: Amit Pandey <amit.pandey@amd.com>
--change-section address and its alias --adjust-section-vma allows
modification
of section addresses in a relocatable file. This used to be used, for
example,
in Fiasco microkernel.
On a relocatable file this option behaves the same as GNU objcopy, apart
from
the fact that it does not issue any warnings, for example, when an
argument is
not used.
GNU objcopy does not produce an error when passed an executable file but
the
usecase for this is not clear, and the behaviour is inconsistent. The
idea of
GNU objcopy --change-section-address is that the option should change
both LMA
and VMA in an executable file. Since this patch does not implement
executable
file support, only VMA is changed.
LLVM lit assumes control of the test parallelism when running a test
suite. This style of testing doesn't play nicely with build systems like
Buck or Bazel since it prefers finer grained actions on a per-test
level. In order for external build systems to control the test
parallelism, add an option to disable `.lit_test_times.txt` under the
`--skip-test-time-recording` flag, thus allowing other build systems
to determine the parallelism and avoid race conditions when writing
to that file. I went for `--skip-test-time-recording` instead of `--time-tests` in
order to preserve the original functionality of writing to `.lit_test_times.txt`
as the default behavior and only opt-in for those who do _not_ want
`.lit_test_times.txt` file.
llvm-objcopy did not support change-section-lma argument.
This patch adds support for a use case of change-section-lma, that is
shifting load address of all sections by the same offset. This seems to
be the only practical use case of change-section-lma, found in other
software such as Zephyr RTOS's build system.
This is an option that could possibly be supported in some other than
ELF formats, however this change only implements it for ELF. When used
with other formats an error message is raised.
In comparison, the behavior of GNU objcopy is inconsistent. For some ELF
files it behaves the same as described above. For others, it copies the
file without modifying the p_paddr fields when it would be expected. In
some experiments it modifies arbitrary fields in section or program
headers. It is unclear what exactly determines this.
The executable file generated by yaml2obj in this test is not parsable
by GNU objcopy. With Machine set to EM_AARCH64, the file can be parsed
and the first test in the test file completes with 0 exit code. However,
the result is rather arbitrary. AArch64 GNU objcopy subtracts 0x1000
from p_filesz and p_memsz of the first LOAD section and 0x1000 from
p_offset of the second LOAD section. It does not look meaningful.
This fixes a minor oversight in the FileCheck documentation on what is
considered a valid variable name.
Global variables are prefixed with a `$`, which is explained two
paragraphs below, but this was omitted in the presented regex in this
paragraph.
Add a -q/--quiet flag to suppress dsymutil output. For now the flag is
limited to dsymutil, though there might be other places in the DWARF
linker that could be conditionalized by this flag.
The motivation is having a way to silence the "no debug symbols in
executable" warning. This is useful when we want to generate a dSYM for
a binary not containing debug symbols, but still want a dSYM that can be
indexed by spotlight.
rdar://127843467
[llvm-mca] Abort on parse error without -skip-unsupported-instructions
Prior to this patch, llvm-mca would continue executing after parse
errors. These errors can lead to some confusion since some analysis
results are printed on the standard output, and they're printed after
the errors, which could otherwise be easy to miss.
However it is still useful to be able to continue analysis after errors;
so extend the recently added -skip-unsupported-instructions to support
this.
Two tests which have parse errors for some of the 'RUN' branches are
updated to use -skip-unsupported-instructions so they can remain as-is.
Add a description of -skip-unsupported-instructions to the llvm-mca
command guide, and add it to the llvm-mca --help output:
```
--skip-unsupported-instructions=<value> - Force analysis to continue in the presence of unsupported instructions
=none - Exit with an error when an instruction is unsupported for any reason (default)
=lack-sched - Skip instructions on input which lack scheduling information
=parse-failure - Skip lines on the input which fail to parse for any reason
=any - Skip instructions or lines on input which are unsupported for any reason
```
Tests within this patch are intended to cover each of the cases.
Reason | Flag | Comment
--------------|------|-------
none | none | Usual case, existing test suite
lack-sched | none | Advises user to use -skip-unsupported-instructions=lack-sched, tested in llvm/test/tools/llvm-mca/X86/BtVer2/unsupported-instruction.s
parse-failure | none | Advises user to use -skip-unsupported-instructions=parse-failure, tested in llvm/test/tools/llvm-mca/bad-input.s
any | none | (N/A, covered above)
lack-sched | any | Continues, prints warnings, tested in llvm/test/tools/llvm-mca/X86/BtVer2/unsupported-instruction.s
parse-failure | any | Continues, prints errors, tested in llvm/test/tools/llvm-mca/bad-input.s
lack-sched | parse-failure | Advises user to use -skip-unsupported-instructions=lack-sched, tested in llvm/test/tools/llvm-mca/X86/BtVer2/unsupported-instruction.s
parse-failure | lack-sched | Advises user to use -skip-unsupported-instructions=parse-failure, tested in llvm/test/tools/llvm-mca/bad-input.s
none | * | This would be any test case with skip-unsupported-instructions, coverage added in llvm/test/tools/llvm-mca/X86/BtVer2/simple-test.s
any | * | (Logically covered by the other cases)
https://reviews.llvm.org/D47919 dumped RELR relocations as
`R_*_RELATIVE` and added --raw-relr (not in GNU) for testing purposes
(more readable than `llvm-readelf -x .relr.dyn`). The option is obsolete
after `llvm-readelf -r` output gets improved (#89162).
Since --raw-relr never seems to get more adoption. Let's remove it to
avoid some complexity.
Pull Request: https://github.com/llvm/llvm-project/pull/89426
--compress-sections is similar to --compress-debug-sections but applies
to arbitrary sections.
* `--compress-sections <section>=none`: decompress sections
* `--compress-sections <section>=[zlib|zstd]`: compress sections with zlib/zstd
Like `--remove-section`, the pattern is by default a glob, but a regex
when --regex is specified.
For `--remove-section` like options, `!` prevents matches and is not
dependent on ordering (see `ELF/wildcard-syntax.test`). Since
`--compress-sections a=zlib --compress-sections a=none` naturally allows
overriding, having an order-independent `!` would be confusing.
Therefore, `!` is disallowed.
Sections within a segment are effectively immutable. Report an error for
an attempt to (de)compress them. `SHF_ALLOC` sections in a relocatable
file can be compressed, but linkers usually reject them.
Note: Before this patch, a compressed relocation section is recognized
as a `RelocationSectionBase` as well and `removeSections` `!ToRemove(*ToRelSec)`
may incorrectly interpret a `CompressedSections` as `RelocationSectionBase`,
leading to ubsan failure for the new test. Fix this by setting
`OriginalFlags` in CompressedSection::CompressedSection.
Link: https://discourse.llvm.org/t/rfc-compress-arbitrary-sections-with-ld-lld-compress-sections/71674
Pull Request: https://github.com/llvm/llvm-project/pull/85036
This reverts commit 9e3b64b9f95aadf57568576712902a272fe66503.
Reason: Broke the UBSan buildbot. See the comments in the pull request
(https://github.com/llvm/llvm-project/pull/85036) for more information.
--compress-sections is similar to --compress-debug-sections but applies
to arbitrary sections.
* `--compress-sections <section>=none`: decompress sections
* `--compress-sections <section>=[zlib|zstd]`: compress sections with zlib/zstd
Like `--remove-section`, the pattern is by default a glob, but a regex
when --regex is specified.
For `--remove-section` like options, `!` prevents matches and is not
dependent on ordering (see `ELF/wildcard-syntax.test`). Since
`--compress-sections a=zlib --compress-sections a=none` naturally allows
overriding, having an order-independent `!` would be confusing.
Therefore, `!` is disallowed.
Sections within a segment are effectively immutable. Report an error for
an attempt to (de)compress them. `SHF_ALLOC` sections in a relocatable
file can be compressed, but linkers usually reject them.
Link: https://discourse.llvm.org/t/rfc-compress-arbitrary-sections-with-ld-lld-compress-sections/71674
Pull Request: https://github.com/llvm/llvm-project/pull/85036
Add --skip-symbol and --skip-symbols options that allow to skip symbols
when executing other options that can change the symbol's name, binding
or visibility, similar to an existing option --keep-symbol that keeps a
symbol from being removed by other options.
Remove support for obfuscated bitcode in dsymutil and the DWARF linker.
We no longer support bitcode submissions and the obfuscation support has
been removed from the rest of the compiler.
rdar://123863918
Remove support for obfuscated bitcode in dsymutil and the DWARF linker.
We no longer support bitcode submissions and the obfuscation support has
been removed from the rest of the compiler.
rdar://123863918
As part of the WebAssembly support work
https://github.com/llvm/llvm-project/pull/82588
As the object files used in the test cases are a single object (just
produced by clang without being processed by wasm-ld), it was determined
to use .o intead of .wasm.
Update the README.txt to reflect that the tool now supports WebAssembly.
Add support for the WebAssembly binary format and be able to generate
logical views.
https://github.com/llvm/llvm-project/issues/69181
The README.txt includes information about how to build the test cases.
Detect COFF files by default and allow specifying it with --format
argument.
This is important for ARM64EC, which uses a separated symbol map for EC
symbols. Since K_COFF is mostly compatible with K_GNU, this shouldn't
really make a difference for other targets.
This originally landed as #82642, but was reverted due to test failures
in tests using no symbol table. Since COFF symbol can't express it,
fallback to GNU format in that case.
Add options --set-symbol-visibility and --set-symbols-visibility to
manually change the visibility of symbols.
There is already an option to set the visibility of newly added symbols
via --add-symbol and --new-symbol-visibility. This option will allow to
change the visibility of already existing symbols.
This patch adds a LLVM-EXEGESIS-LOOP-REGISTER snippet annotation which
allows a user to specify the register to use for the loop counter in the
loop repetition mode. This allows for executing snippets that don't work
with the default value (currently R8 on X86).
Primary change is to add a flag `--pretty-pgo-analysis-map` to
llvm-readobj and llvm-objdump that prints block frequencies and branch
probabilities in the same manner as BFI and BPI respectively. This can
be helpful if you are manually inspecting the outputs from the tools.
In order to print, I moved the `printBlockFreqImpl` function from
Analysis to Support and renamed it to `printRelativeBlockFreq`.
https://reviews.llvm.org/D63820 added
llvm/docs/CommandGuide/llvm-objcopy.rst with clearer semantics, e.g.
```
Read a list of names from the file <filename> and mark defined symbols with those names as global in the output
instead of the help message
Read a list of symbols from <filename> and marks them global" (omits "defined")
Rename sections called <old> to <new> in the output
instead of the help message
Rename a section from old to new (multiple sections may be named <old>
```
Sync the help messages to incorporate the CommandGuide improvement.
While here, switch to the conventional imperative sentences for a few
options. Additionally, mark some options as grp_coff or grp_macho.
When a section has the SHF_COMPRESSED flag, -p/-x dump the compressed
content by default. In GNU readelf, if --decompress/-z is specified,
-p/-x will dump the decompressed content. This patch implements the
option.
Close#82507
This patch replaces --num-repetitions with --min-instructions to make it
more clear that the value refers to the minimum number of instructions
in the final assembled snippet rather than the number of repetitions of
the snippet. This patch also refactors some llvm-exegesis internal
variable names to reflect the name change.
Fixes#76890.
This patch adds two new repetition modes to llvm-exegesis, particularly
loop and duplicate repetition modes of what I am terming the middle half
repetition mode. The middle half repetition mode essentially runs each
measurement twice, one with twice the number of iterations of the other.
These two measurements are then agregated by taking their difference.
This subtracts away any setup/overhead that is unrelated to the code in
the snippet, providing more accurate results.
Using this mode on a couple toy examples, I am able to get exact
(integer) throughput values on all of them in contrast to the default
duplicate/loop repetition modes which show a little bit of noise on the
snippet value.
`--function=<regex>` Include functions matching regex in the output
`--no-function=<regex>` Exclude functions matching regex from the output
If both are specified, `--no-function` has a higher precedence if a
function name matches both filters
Added -p / --no-params flag to skip demangling function parameters
similar to how it is supported by GNU c++filt tool.
There are cases when users want to demangle a large number of symbols in
bulk, for example, at startup, and do not care about function parameters
and overloads at that time. Skipping the demangling of parameter types
led to a measurable improvement in performance. Our users reported about
15% speed up with GNU c++filt and we expect similar results with
llvm-cxxfilt with this patch.
Added -p / --no-params flag to skip demangling function parameters
similar to how it is supported by GNU c++filt tool.
There are cases when users want to demangle a large number of symbols in
bulk, for example, at startup, and do not care about function parameters
and overloads at that time. Skipping the demangling of parameter types
led to a measurable improvement in performance. Our users reported about
15% speed up with GNU c++filt and we expect similar results with
llvm-cxxfilt with this patch.
This patch makes minor adjustments to the llvm-exegesis docs for
clarity. Particularly, an update is made to the list of snippet
annotations to list the correct number of annotations that was not
updated when the docs were originally updated for the snippet address
annotation. In addition, this patch changes a decimal value for the
snippet memory annotation example for an explicit hex value to emphasize
that the LLVM-EXEGESIS-MEM-DEF annotation takes a hex value for the
memory value.
When llvm-objdump switched from cl:: to OptTable
(https://reviews.llvm.org/D100433), we dropped support for LLVM cl::
options. Some LLVM_DEBUG in `llvm/lib/Target/$target/MCDisassembler/`
files might be useful. Add -mllvm to allow dumping the information.
```
# -debug is available in an LLVM_ENABLE_ASSERTIONS=on build
llvm-objdump -d -mllvm -debug a.o > /dev/null
```
Link:
https://discourse.llvm.org/t/how-to-enable-debug-logs-in-llvm-objdump/75758
`--gap-fill <value>` fills the gaps between sections with a specified
8-bit value, instead of zero.
`--pad-to <address>` pads the output binary up to the specified load
address, using the 8-bit value from `--gap-fill` or zero.
These options are only supported for ELF input and binary output.