The issue with slow compile-time was caused by an assert in
AArch64RegisterInfo.cpp. The assert invokes 'checkAllSuperRegsMarked'
after adding all the reserved registers. This call gets very expensive
after adding the _HI registers due to the way the function searches
in the 'Exception' list, which is expected to be a small list but isn't
(the patch added 190 _HI regs).
It was possible to rewrite the code in such a way that the _HI registers
are marked as reserved after the check. This makes the problem go away
entirely and restores compile-time to what it was before (tested for
`check-runtimes`, which previously showed a ~5x slowdown).
This reverts commits:
1434d2ab215e3ea9c5f34689d056edd3d4423a78
2704647fb7986673b89cef1def729e3b022e2607
In MCA, the load/store unit is modeled through a `LSUnitBase` class.
Judging from the name `LSUnitBase`, I believe there is an intent to
allow for different specialized load/store unit implementations.
(However, currently there is only one implementation used in-tree,
`LSUnit`.)
PR #101534 fixed one instance where the specialized `LSUnit` was
hard-coded, opening the door for other subclasses to be used, but what
subclasses can do is, in my opinion, still overly limited due to a
reliance on the `MemoryGroup` class, e.g.
[here](8b55162e19/llvm/lib/MCA/HardwareUnits/Scheduler.cpp (L88)).
The `MemoryGroup` class is currently used in the default `LSUnit`
implementation to model data dependencies/hazards in the pipeline.
`MemoryGroups` form a graph of memory dependencies that inform the
scheduler when load/store instructions can be executed relative to each
other.
In my eyes, this is an implementation detail. Other `LSUnit`s may want
to keep track of data dependencies in different ways. As a concrete
example, a downstream use I am working on<sup>[1]</sup> uses a custom
load/store unit that makes use of available aliasing information. I
haven't been able to shoehorn our additional aliasing information into
the existing `MemoryGroup` abstraction. I think there is no need to
force subclasses to use `MemoryGroup`s; users of `LSUnitBase` are only
concerned with when, and for how long, a load/store instruction
executes.
This PR makes changes to instead leave it up to the subclasses how to
model such dependencies, and only prescribes an abstract interface in
`LSUnitBase`. It also moves data members and methods that are not
necessary to provide an abstract interface from `LSUnitBase` to the
`LSUnit` subclass. I decided to make the `MemoryGroup` a protected
subclass of `LSUnit`; that way, specializations may inherit from
`LSUnit` and still make use of `MemoryGroup`s if they wish to do so
(e.g. if they want to only overwrite the `dispatch` method).
**Drawbacks / Considerations**
My reason for suggesting this PR is an out-of-tree use. As such, these
changes don't introduce any new functionality for in-tree LLVM uses.
However, in my opinion, these changes improve code clarity and prescribe
a clear interface, which would be the main benefit for the LLVM
community.
A drawback of the more abstract interface is that virtual dispatching is
used in more places. However, note that virtual dispatch is already
currently used in some critical parts of the `LSUnitBase`, e.g. the
`isAvailable` and `dispatch` methods. As a quick check to ensure these
changes don't significantly negatively impact performance, I also ran
`time llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2
-iterations=3000 llvm/test/tools/llvm-mca/X86/BtVer2/dot-product.s`
before and after the changes; there was no observable difference in
runtimes (`0.292 s` total before, `0.286 s` total after changes).
<sup>[1]: MCAD started by @mshockwave and @chinmaydd.</sup>
This is a step towards enabling subreg liveness tracking for AArch64,
which requires that registers are fully covered by their subregisters,
as covered here #109797.
There are several changes in this patch:
* AArch64RegisterInfo.td and tests: Define the high bits like B0_HI,
H0_HI, S0_HI, D0_HI, Q0_HI. Because the bits must be defined by some
register class, this added a register class which meant that we had to
update 'magic numbers' in several tests.
The use of ComposedSubRegIndex helped 'compress' the number of bits
required for the lanemask. The correctness of the masks is tested by an
explicit unit tests.
* LoadStoreOptimizer: previously 'HasDisjunctSubRegs' was only true for
register tuples, but with this change to describe the high bits, a
register like 'D0' will also have 'HasDisjunctSubRegs' set to true
(because it's fullly covered by S0 and S0_HI). The fix here is to
explicitly test if the register class is one of the known D/Q/Z tuples.
Before this patch, the pipeline selection logic in
ResourceManager::issueInstruction() didn't know how to correctly handle
instructions which consume multiple partially overlapping resource
groups. In some cases (like the test case from #108157), the inability
to correctly allocate resources on instruction issue was leading to
crashes.
The presence of multiple partially overlapping groups complicates the
selection process by introducing extra constraints. For those cases, the
issue logic now prioritizes groups which are more constrained than
others.
Fixes#108157
This patch looks up variant scheduling classes using a hash of the
instruction. Keying by the pointer breaks certain use cases that might
occur out of tree, like decoding an execution trace instruction by
instruction and creating MCA instructions as one goes along, like in the
MCAD case. In this case, the MCInst will always have the same address
and thus all instructions with the same variant scheduling class will
end up with the same instruction description, leading to undesired
behavior (assertions, uses after free, invalid results, etc.).
Currently we assume a constant latency of 100 cycles for call
instructions. This commit allows the user to specify a custom value for
the same as a command line argument. Default latency is set to 100.
Constant registers like the zero registers XZR and WZR are treated as
any other register by LLVM-MCA. This can create non existent dependency
chains.
Currently there is no method in MCA to query if a register is constant.
This patch fixes the issue by adding a bool Constant
variable to MCRegisterDesc that is true for constant registers. Since
constant registers do not create dependencies, it makes sense to add
this check to MCA.
Prior to this patch, if llvm-mca encountered an instruction which parses
but has no scheduler info, the instruction is always reported as
unsupported, and llvm-mca halts with an error.
However, it would still be useful to allow MCA to continue even in the
case of instructions lacking scheduling information. Obviously if
scheduling information is lacking, it's not possible to give an accurate
analysis for those instructions, and therefore a warning is emitted.
A user could previously have worked around such unsupported instructions
manually by deleting such instructions from the input, but this provides
them a way of doing this for bulk inputs where they may not have a list
of such unsupported instructions to drop up front.
Note that this behaviour of instructions with no scheduling information
under -skip-unsupported-instructions is analagous to current
instructions which fail to parse: those are currently dropped from the
input with a message printed, after which the analysis continues.
~Testing the feature is a little awkward currently, it relies on an
instruction
which is currently marked as unsupported, which may not remain so;
should the
situation change it would be necessary to find an alternative
unsupported
instruction or drop the test.~
A test is added to check that analysis still reports an error if all
instructions are removed from the input, to mirror the current behaviour
of giving an error if no instructions are supplied.
https://github.com/llvm/llvm-project/issues/83775 shows llvm-mca hits
sanitizer error in cycleEnd. There was an instruction that takes
multiple cycles to issue and is finished executing directly after issue.
Prior to this patch, the instruction is retired on the first issue
cycle, despite taking multiple cycles to issue.
To fix this, if an instruction takes multiple cycles to issue and is
done executing after issue, let updateCarriedOver retire the instruction
when it is fully issued.
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to
`AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the
fact that resource allocation is now represented as an interval,
relatively to the issue cycle of the instruction.
This patch implements that TODO. This naming clarifies how to use these
fields in the scheduler. In addition it was confusing that `StartAtCycle` was
singular but `Cycles` was plural. This renaming fixes this inconsistency.
This commit as previously reverted since it missed renaming that came
down after rebasing. This version of the commit fixes those problems.
Differential Revision: https://reviews.llvm.org/D158568
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to
`AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the
fact that resource allocation is now represented as an interval,
relatively to the issue cycle of the instruction.
This patch implements that TODO. This naming clarifies how to use these
fields in the scheduler. In addition it was confusing that `StartAtCycle` was
singular but `Cycles` was plural. This renaming fixes this inconsistency.
This commit as previously reverted since it missed renaming that came
down after rebasing. This version of the commit fixes those problems.
Differential Revision: https://reviews.llvm.org/D158568
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to
`AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the
fact that resource allocation is now represented as an interval,
relatively to the issue cycle of the instruction.
This patch implements that TODO. This naming clarifies how to use these
fields in the scheduler. In addition it was confusing that `StartAtCycle` was
singular but `Cycles` was plural. This renaming fixes this inconsistency.
Differential Revision: https://reviews.llvm.org/D158568
Since the LMUL data that is needed to create an instrument is
avaliable statically from vsetivli and vsetvli instructions,
LMUL instruments can be automatically generated so that clients
of the tool do no need to manually insert instrument comments.
Instrument comments may be placed after a vset{i}vli instruction,
which will override instrument that was automatically inserted.
As a result, clients of llvm-mca instruments do not need to update
their existing instrument comments. However, if the instrument
has the same LMUL as the vset{i}vli, then it is reccomended to
remove the instrument comment as it becomes redundant.
Differential Revision: https://reviews.llvm.org/D154526
There was a memory leak that presented itself once the llvm-mca
tests were committed. This leak was not checked for by the pre-commit
tests. This change changes the shared_ptr to a unique_ptr to avoid
this problem.
We will know that this fix works once committed since I don't know
whether it is possible to force a lit test to use LSan. I spent the
day trying to build llvm with LSan enabled without much luck. If
anyone knows how to build llvm with LSan for the lit-tests, I am
happy to give it another try locally.
Differential Revision: https://reviews.llvm.org/D150816
All users of MCCodeEmitter::encodeInstruction use a raw_svector_ostream
to encode the instruction into a SmallVector. The raw_ostream however
incurs some overhead for the actual encoding.
This change allows an MCCodeEmitter to directly emit an instruction into
a SmallVector without using a raw_ostream and therefore allow for
performance improvments in encoding. A default path that uses existing
raw_ostream implementations is provided.
Reviewed By: MaskRay, Amir
Differential Revision: https://reviews.llvm.org/D145791
The new methods return a range for easier iteration. Use them everywhere
instead of getImplicitUses, getNumImplicitUses, getImplicitDefs and
getNumImplicitDefs. A future patch will remove the old methods.
In some use cases the new methods are less efficient because they always
have to scan the whole uses/defs array to count its length, but that
will be fixed in a future patch by storing the number of implicit
uses/defs explicitly in MCInstrDesc. At that point there will be no need
to 0-terminate the arrays.
Differential Revision: https://reviews.llvm.org/D142215
Change MCInstrDesc::operands to return an ArrayRef so we can easily use
it everywhere instead of the (IMHO ugly) opInfo_begin and opInfo_end.
A future patch will remove opInfo_begin and opInfo_end.
Also use it instead of raw access to the OpInfo pointer. A future patch
will remove this pointer.
Differential Revision: https://reviews.llvm.org/D142213
On x86 and AArch, SIMD instructions encode all of the scheduling information in the instruction
itself. For example, VADD.I16 q0, q1, q2 is a neon instruction that operates on 16-bit integer
elements stored in 128-bit Q registers, which leads to eight 16-bit lanes in parallel. This kind
of information impacts how the instruction takes to execute and what dependencies this may cause.
On RISCV however, the data that impacts scheduling is encoded in CSR registers such as vtype or
vl, in addition with the instruction itself. But MCA does not track or use the data in these
registers. This patch fixes this problem by introducing Instruments into MCA.
* Replace `CodeRegions` with `AnalysisRegions`
* Add `Instrument` and `InstrumentManager`
* Add `InstrumentRegions`
* Add RISCV Instrument and `InstrumentManager`
* Parse `Instruments` in driver
* Use instruments to override schedule class
* RISCV use lmul instrument to override schedule class
* Fix unit tests to pass empty instruments
* Add -ignore-im clopt to disable this change
A prior version of this patch was commited in 5e82ee537321. 2323a4ee610f reverted
that change because the unit test files caused build errors. The change with fixes
were committed in b88b8307bf9e but reverted once again e8e92c8313a0 due to more
build errors.
This commit adds the prior changes and fixes the build error.
Differential Revision: https://reviews.llvm.org/D137440
On x86 and AArch, SIMD instructions encode all of the scheduling information in the instruction
itself. For example, VADD.I16 q0, q1, q2 is a neon instruction that operates on 16-bit integer
elements stored in 128-bit Q registers, which leads to eight 16-bit lanes in parallel. This kind
of information impacts how the instruction takes to execute and what dependencies this may cause.
On RISCV however, the data that impacts scheduling is encoded in CSR registers such as vtype or
vl, in addition with the instruction itself. But MCA does not track or use the data in these
registers. This patch fixes this problem by introducing Instruments into MCA.
* Replace `CodeRegions` with `AnalysisRegions`
* Add `Instrument` and `InstrumentManager`
* Add `InstrumentRegions`
* Add RISCV Instrument and `InstrumentManager`
* Parse `Instruments` in driver
* Use instruments to override schedule class
* RISCV use lmul instrument to override schedule class
* Fix unit tests to pass empty instruments
* Add -ignore-im clopt to disable this change
A prior version of this patch was commited in. It was reverted in
5e82ee5373211db8522181054800ccd49461d9d8. 2323a4ee610f5e1db74d362af4c6fb8c704be8f6 reverted
that change because the unit test files caused build errors. This commit adds the original changes
and the fixed test files.
Differential Revision: https://reviews.llvm.org/D137440
On x86 and AArch, SIMD instructions encode all of the scheduling information in the instruction
itself. For example, VADD.I16 q0, q1, q2 is a neon instruction that operates on 16-bit integer
elements stored in 128-bit Q registers, which leads to eight 16-bit lanes in parallel. This kind
of information impacts how the instruction takes to execute and what dependencies this may cause.
On RISCV however, the data that impacts scheduling is encoded in CSR registers such as vtype or
vl, in addition with the instruction itself. But MCA does not track or use the data in these
registers. This patch fixes this problem by introducing Instruments into MCA.
* Replace `CodeRegions` with `AnalysisRegions`
* Add `Instrument` and `InstrumentManager`
* Add `InstrumentRegions`
* Add RISCV Instrument and `InstrumentManager`
* Parse `Instruments` in driver
* Use instruments to override schedule class
* RISCV use lmul instrument to override schedule class
* Fix unit tests to pass empty instruments
* Add -ignore-im clopt to disable this change
Differential Revision: https://reviews.llvm.org/D137440
This patch mostly reverts commit 70b37f4c03c which fixed PR50725.
In case of explicit consumption of multiple partially overlapping group
resources, the ResourceManager was not correctly checking pipeline
esources availability.
The fix for PR50725 only partially addressed a few instances of that issue.
This is a more general (although, technically slower) fix for that same issue.
It also fixes Issue #57548
Thanks to Haohai Wen for the small reproducible.
This patch replaces calls to GreatestCommonDivisor64 with std::gcd
where both arguments are known to be of unsigned types no larger than
64 bits in size.
This patch introduces a new feature that allows InstrBuilder to reuse
mca::Instruction recycled from IncrementalSourceMgr. This significantly
reduces the memory footprint.
Note that we're only recycling instructions that have static InstrDesc
and no variadic operands.
Differential Revision: https://reviews.llvm.org/D127084
The new resumable mca::Pipeline capability introduced in this patch
allows users to save the current state of pipeline and resume from the
very checkpoint.
It is better (but not require) to use with the new IncrementalSourceMgr,
where users can add mca::Instruction incrementally rather than having a
fixed number of instructions ahead-of-time.
Note that we're using unit tests to test these new features. Because
integrating them into the `llvm-mca` tool will make too many churns.
Differential Revision: https://reviews.llvm.org/D127083
memory-barrier instructions to providing targets and developers a convenient
way to explicitly declare which instructions are memory-barriers.
Differential Revision: https://reviews.llvm.org/D116779
This reverts commit fd4808887ee47f3ec8a030e9211169ef4fb094c3.
This patch causes gcc to issue a lot of warnings like:
warning: base class ‘class llvm::MCParsedAsmOperand’ should be
explicitly initialized in the copy constructor [-Wextra]
This patch fixes the warning
InstructionTables.cpp:27:56: error: loop variable 'Resource' of type
'const std::pair<const uint64_t, ResourceUsage> &' (aka 'const
pair<const unsigned long, llvm::mca::ResourceUsage> &') binds to a
temporary constructed from type 'const std::pair<unsigned long,
llvm::mca::ResourceUsage> &' [-Werror,-Wrange-loop-construct]
Note that Resource is declared as:
SmallVector<std::pair<uint64_t, ResourceUsage>, 4> Resources;
without "const" for uint64_t.