For each instruction from the input assembly sequence, DependencyGraph
has a dedicated node (DGNode). Outgoing edges (data, resource and memory
dependencies) are tracked as SmallVector<..., 8> for each DGNode in the
graph. However, it's unlikely that a usual input instruction will have
approximately eight dependent instructions. Below is my statistics for
several RISC-V input sequences:
```
Number of | Number of nodes with
edges | this # of edges
---------------------------------
0 | 8239447
1 | 464252
2 | 6164
3 | 6783
4 | 939
5 | 500
6 | 545
7 | 116
8 | 2
9 | 1
10 | 1
```
Approximately the same distribution is produced by llvm-mca lit tests
for X86, AArch and RISC-V (even modified ones with extra dependencies
added).
On a rather big input asm sequences, the use of SmallVector<..., 8>
dramatically increases memory consumption without any need for it. In my
case, replacing it with SmallVector<...,0> reduces memory usage by ~28%
or ~1700% of input file size (2.2GB in absolute values).
There is no change in execution time, I verified it on mca lit-tests and
on my big test (execution time is ~30s in both cases).
This change was made with the same intention as #124904 and optimizes I
believe quite an unusual scenario. However, if there is no negative
impact on other known scenarios, I'd like to have the change in
llvm-project.
ResourceUsage is a very sparse table. On large input asm sequences it
consumes a lot of memory utilizing only a few percents of it (~4% on my
benchmark). Reorganization of ResourceUsage to keep only used fields
allows saving up to 18% of total memory use by mca or ~850% of input
file size (~1.1GB in absolute values in my case).
Currently we assume a constant latency of 100 cycles for call
instructions. This commit allows the user to specify a custom value for
the same as a command line argument. Default latency is set to 100.
[llvm-mca] Abort on parse error without -skip-unsupported-instructions
Prior to this patch, llvm-mca would continue executing after parse
errors. These errors can lead to some confusion since some analysis
results are printed on the standard output, and they're printed after
the errors, which could otherwise be easy to miss.
However it is still useful to be able to continue analysis after errors;
so extend the recently added -skip-unsupported-instructions to support
this.
Two tests which have parse errors for some of the 'RUN' branches are
updated to use -skip-unsupported-instructions so they can remain as-is.
Add a description of -skip-unsupported-instructions to the llvm-mca
command guide, and add it to the llvm-mca --help output:
```
--skip-unsupported-instructions=<value> - Force analysis to continue in the presence of unsupported instructions
=none - Exit with an error when an instruction is unsupported for any reason (default)
=lack-sched - Skip instructions on input which lack scheduling information
=parse-failure - Skip lines on the input which fail to parse for any reason
=any - Skip instructions or lines on input which are unsupported for any reason
```
Tests within this patch are intended to cover each of the cases.
Reason | Flag | Comment
--------------|------|-------
none | none | Usual case, existing test suite
lack-sched | none | Advises user to use -skip-unsupported-instructions=lack-sched, tested in llvm/test/tools/llvm-mca/X86/BtVer2/unsupported-instruction.s
parse-failure | none | Advises user to use -skip-unsupported-instructions=parse-failure, tested in llvm/test/tools/llvm-mca/bad-input.s
any | none | (N/A, covered above)
lack-sched | any | Continues, prints warnings, tested in llvm/test/tools/llvm-mca/X86/BtVer2/unsupported-instruction.s
parse-failure | any | Continues, prints errors, tested in llvm/test/tools/llvm-mca/bad-input.s
lack-sched | parse-failure | Advises user to use -skip-unsupported-instructions=lack-sched, tested in llvm/test/tools/llvm-mca/X86/BtVer2/unsupported-instruction.s
parse-failure | lack-sched | Advises user to use -skip-unsupported-instructions=parse-failure, tested in llvm/test/tools/llvm-mca/bad-input.s
none | * | This would be any test case with skip-unsupported-instructions, coverage added in llvm/test/tools/llvm-mca/X86/BtVer2/simple-test.s
any | * | (Logically covered by the other cases)
Prior to this patch, if llvm-mca encountered an instruction which parses
but has no scheduler info, the instruction is always reported as
unsupported, and llvm-mca halts with an error.
However, it would still be useful to allow MCA to continue even in the
case of instructions lacking scheduling information. Obviously if
scheduling information is lacking, it's not possible to give an accurate
analysis for those instructions, and therefore a warning is emitted.
A user could previously have worked around such unsupported instructions
manually by deleting such instructions from the input, but this provides
them a way of doing this for bulk inputs where they may not have a list
of such unsupported instructions to drop up front.
Note that this behaviour of instructions with no scheduling information
under -skip-unsupported-instructions is analagous to current
instructions which fail to parse: those are currently dropped from the
input with a message printed, after which the analysis continues.
~Testing the feature is a little awkward currently, it relies on an
instruction
which is currently marked as unsupported, which may not remain so;
should the
situation change it would be necessary to find an alternative
unsupported
instruction or drop the test.~
A test is added to check that analysis still reports an error if all
instructions are removed from the input, to mirror the current behaviour
of giving an error if no instructions are supplied.
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to
`AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the
fact that resource allocation is now represented as an interval,
relatively to the issue cycle of the instruction.
This patch implements that TODO. This naming clarifies how to use these
fields in the scheduler. In addition it was confusing that `StartAtCycle` was
singular but `Cycles` was plural. This renaming fixes this inconsistency.
This commit as previously reverted since it missed renaming that came
down after rebasing. This version of the commit fixes those problems.
Differential Revision: https://reviews.llvm.org/D158568
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to
`AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the
fact that resource allocation is now represented as an interval,
relatively to the issue cycle of the instruction.
This patch implements that TODO. This naming clarifies how to use these
fields in the scheduler. In addition it was confusing that `StartAtCycle` was
singular but `Cycles` was plural. This renaming fixes this inconsistency.
This commit as previously reverted since it missed renaming that came
down after rebasing. This version of the commit fixes those problems.
Differential Revision: https://reviews.llvm.org/D158568
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to
`AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the
fact that resource allocation is now represented as an interval,
relatively to the issue cycle of the instruction.
This patch implements that TODO. This naming clarifies how to use these
fields in the scheduler. In addition it was confusing that `StartAtCycle` was
singular but `Cycles` was plural. This renaming fixes this inconsistency.
Differential Revision: https://reviews.llvm.org/D158568
Since the LMUL data that is needed to create an instrument is
avaliable statically from vsetivli and vsetvli instructions,
LMUL instruments can be automatically generated so that clients
of the tool do no need to manually insert instrument comments.
Instrument comments may be placed after a vset{i}vli instruction,
which will override instrument that was automatically inserted.
As a result, clients of llvm-mca instruments do not need to update
their existing instrument comments. However, if the instrument
has the same LMUL as the vset{i}vli, then it is reccomended to
remove the instrument comment as it becomes redundant.
Differential Revision: https://reviews.llvm.org/D154526
Previous reports calculated the overall report using Instrument
information but did not print out per-instruction data using
Instrument information. This patch fixes that.
Differential Revision: https://reviews.llvm.org/D150459
There was a memory leak that presented itself once the llvm-mca
tests were committed. This leak was not checked for by the pre-commit
tests. This change changes the shared_ptr to a unique_ptr to avoid
this problem.
We will know that this fix works once committed since I don't know
whether it is possible to force a lit test to use LSan. I spent the
day trying to build llvm with LSan enabled without much luck. If
anyone knows how to build llvm with LSan for the lit-tests, I am
happy to give it another try locally.
Differential Revision: https://reviews.llvm.org/D150816
Parsing instruments and analysis regions causes us to see the same
labels two times since we parse the same file twice under the same
context.
This change creates a seperate context for instrument parsing
and another for analysis region parsing. I will post a follow up
commit once I get some free cycles to parse analysis regions and
instruments in one parsing pass under a single context.
Differential Revision: https://reviews.llvm.org/D149781
This does not work by a mere composition of `enumerate` and `zip_equal`,
because C++17 does not allow for recursive expansion of structured
bindings.
This implementation uses `zippy` to manage the iteratees and adds the
stream of indices as the first zipped range. Because we have an upfront
assertion that all input ranges are of the same length, we only need to
check if the second range has ended during iteration.
As a consequence of using `zippy`, `enumerate` will now follow the
reference and lifetime semantics of the `zip*` family of functions. The
main difference is that `enumerate` exposes each tuple of references
through a new tuple-like type `enumerate_result`, with the familiar
`.index()` and `.value()` member functions.
Because the `enumerate_result` returned on dereference is a
temporary, enumeration result can no longer be used through an
lvalue ref.
Reviewed By: dblaikie, zero9178
Differential Revision: https://reviews.llvm.org/D144503
The forwarding header is left in place because of its use in
`polly/lib/External/isl/interface/extract_interface.cc`, but I have
added a GCC warning about the fact it is deprecated, because it is used
in `isl` from where it is included by Polly.
This is a fairly large changeset, but it can be broken into a few
pieces:
- `llvm/Support/*TargetParser*` are all moved from the LLVM Support
component into a new LLVM Component called "TargetParser". This
potentially enables using tablegen to maintain this information, as
is shown in https://reviews.llvm.org/D137517. This cannot currently
be done, as llvm-tblgen relies on LLVM's Support component.
- This also moves two files from Support which use and depend on
information in the TargetParser:
- `llvm/Support/Host.{h,cpp}` which contains functions for inspecting
the current Host machine for info about it, primarily to support
getting the host triple, but also for `-mcpu=native` support in e.g.
Clang. This is fairly tightly intertwined with the information in
`X86TargetParser.h`, so keeping them in the same component makes
sense.
- `llvm/ADT/Triple.h` and `llvm/Support/Triple.cpp`, which contains
the target triple parser and representation. This is very intertwined
with the Arm target parser, because the arm architecture version
appears in canonical triples on arm platforms.
- I moved the relevant unittests to their own directory.
And so, we end up with a single component that has all the information
about the following, which to me seems like a unified component:
- Triples that LLVM Knows about
- Architecture names and CPUs that LLVM knows about
- CPU detection logic for LLVM
Given this, I have also moved `RISCVISAInfo.h` into this component, as
it seems to me to be part of that same set of functionality.
If you get link errors in your components after this patch, you likely
need to add TargetParser into LLVM_LINK_COMPONENTS in CMake.
Differential Revision: https://reviews.llvm.org/D137838
This change is rather more invasive than intended. The main intention
here is to make CommandLine.cpp not rely on llvm/Support/Host.h. Right
now, this reliance is only in 3 superficial places:
- Choosing how to expand response files (in two places)
- Printing the default triple and current CPU in `--version` output.
The built in version system has a method for adding "extra version
printers", commonly used by several tools (such as llc) to report the
registered targets in the built version of LLVM. It was reasonably easy
to move the logic for printing the default triple and current CPU into
a similar function, and register it with any relevant binaries.
The incompatible change here is that now, even if
LLVM_VERSION_PRINTER_SHOW_HOST_TARGET_INFO is defined, most binaries
will no longer print out the default target triple and cpu when provided
with `--version`, for instance llvm-as and llvm-dis. This breakage is
intended, but the changes in this patch keep printing the default target
and detected in `llc` and `opt` as these were remarked as important
binaries in the LLVM install.
The change to expanding response files may also be controversial, but I
believe that these macros should correspond exactly to the host triple
introspection used before.
Differential Revision: https://reviews.llvm.org/D137837
value() has undesired exception checking semantics and calls
__throw_bad_optional_access in libc++. Moreover, the API is unavailable without
_LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see
_LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).
This fixes check-llvm.
Next patch after D139548 and D139439. Same expectations, the change seems safe with as far as llvm goes, we cannot check downstream implementations.
Differential Revision: https://reviews.llvm.org/D139614
Before performing this change, I checked that `ByteAlignment` was never `0` inside `MCAsmStreamer:emitZeroFill` and `MCAsmStreamer::emitLocalCommonSymbol`.
I believe it is NFC as `0` values are illegal in `emitZeroFill` anyways, `Log2(ByteAlignment)` would be undefined.
And currently, all calls to `emitLocalCommonSymbol` are provably `>0`.
Differential Revision: https://reviews.llvm.org/D139439
This breaks Windows bots with
`warning C4334: '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)`
Some shift operators are lacking a proper literal unit ('1ULL' instead of
'1'). Will reland once fixed.
This reverts commit c621c1a8e81856e6bf2be79714767d80466e9ede.
Before performing this change, I checked that `ByteAlignment` was never `0` inside `MCAsmStreamer:emitZeroFill` and `MCAsmStreamer::emitLocalCommonSymbol`.
I believe it is NFC as `0` values are illegal in `emitZeroFill` anyways, `Log2(ByteAlignment)` would be undefined.
And currently, all calls to `emitLocalCommonSymbol` are provably `>0`.
Differential Revision: https://reviews.llvm.org/D139439
Fixes issue [59091](https://github.com/llvm/llvm-project/issues/59091).
`CodeRegionGenerator::parseCodeRegions` is implemented by `AsmCodeRegionGenerator`.
If it were to be implemented in `AnalysisRegionGenerator` or `InstrumentRegionGenerator`,
then `parseCodeRegions` from an `AsmAnalysisRegionGenerator` or `AsmInstrumentRegionGenerator`
object would be ambiguous. To solve this, `AsmAnalysisRegionGenerator` and
`AsmInstrumentRegionGenerator` qualify their call to `AsmCodeRegionGenerator::parseCodeRegions`.
Differential Revision: https://reviews.llvm.org/D138462
On x86 and AArch, SIMD instructions encode all of the scheduling information in the instruction
itself. For example, VADD.I16 q0, q1, q2 is a neon instruction that operates on 16-bit integer
elements stored in 128-bit Q registers, which leads to eight 16-bit lanes in parallel. This kind
of information impacts how the instruction takes to execute and what dependencies this may cause.
On RISCV however, the data that impacts scheduling is encoded in CSR registers such as vtype or
vl, in addition with the instruction itself. But MCA does not track or use the data in these
registers. This patch fixes this problem by introducing Instruments into MCA.
* Replace `CodeRegions` with `AnalysisRegions`
* Add `Instrument` and `InstrumentManager`
* Add `InstrumentRegions`
* Add RISCV Instrument and `InstrumentManager`
* Parse `Instruments` in driver
* Use instruments to override schedule class
* RISCV use lmul instrument to override schedule class
* Fix unit tests to pass empty instruments
* Add -ignore-im clopt to disable this change
A prior version of this patch was commited in 5e82ee537321. 2323a4ee610f reverted
that change because the unit test files caused build errors. The change with fixes
were committed in b88b8307bf9e but reverted once again e8e92c8313a0 due to more
build errors.
This commit adds the prior changes and fixes the build error.
Differential Revision: https://reviews.llvm.org/D137440
On x86 and AArch, SIMD instructions encode all of the scheduling information in the instruction
itself. For example, VADD.I16 q0, q1, q2 is a neon instruction that operates on 16-bit integer
elements stored in 128-bit Q registers, which leads to eight 16-bit lanes in parallel. This kind
of information impacts how the instruction takes to execute and what dependencies this may cause.
On RISCV however, the data that impacts scheduling is encoded in CSR registers such as vtype or
vl, in addition with the instruction itself. But MCA does not track or use the data in these
registers. This patch fixes this problem by introducing Instruments into MCA.
* Replace `CodeRegions` with `AnalysisRegions`
* Add `Instrument` and `InstrumentManager`
* Add `InstrumentRegions`
* Add RISCV Instrument and `InstrumentManager`
* Parse `Instruments` in driver
* Use instruments to override schedule class
* RISCV use lmul instrument to override schedule class
* Fix unit tests to pass empty instruments
* Add -ignore-im clopt to disable this change
A prior version of this patch was commited in. It was reverted in
5e82ee5373211db8522181054800ccd49461d9d8. 2323a4ee610f5e1db74d362af4c6fb8c704be8f6 reverted
that change because the unit test files caused build errors. This commit adds the original changes
and the fixed test files.
Differential Revision: https://reviews.llvm.org/D137440
On x86 and AArch, SIMD instructions encode all of the scheduling information in the instruction
itself. For example, VADD.I16 q0, q1, q2 is a neon instruction that operates on 16-bit integer
elements stored in 128-bit Q registers, which leads to eight 16-bit lanes in parallel. This kind
of information impacts how the instruction takes to execute and what dependencies this may cause.
On RISCV however, the data that impacts scheduling is encoded in CSR registers such as vtype or
vl, in addition with the instruction itself. But MCA does not track or use the data in these
registers. This patch fixes this problem by introducing Instruments into MCA.
* Replace `CodeRegions` with `AnalysisRegions`
* Add `Instrument` and `InstrumentManager`
* Add `InstrumentRegions`
* Add RISCV Instrument and `InstrumentManager`
* Parse `Instruments` in driver
* Use instruments to override schedule class
* RISCV use lmul instrument to override schedule class
* Fix unit tests to pass empty instruments
* Add -ignore-im clopt to disable this change
Differential Revision: https://reviews.llvm.org/D137440
The new resumable mca::Pipeline capability introduced in this patch
allows users to save the current state of pipeline and resume from the
very checkpoint.
It is better (but not require) to use with the new IncrementalSourceMgr,
where users can add mca::Instruction incrementally rather than having a
fixed number of instructions ahead-of-time.
Note that we're using unit tests to test these new features. Because
integrating them into the `llvm-mca` tool will make too many churns.
Differential Revision: https://reviews.llvm.org/D127083