This patch adds support for describing per-write resource cycle counts
for ReadAdvance records via a new optional field called `tunables`.
This makes it possible to declare ReadAdvance records such as:
def : ReadAdvance<Read_C, 1, [Write_A, Write_B], [2]>;
The above will effectively declare two entries in the ReadAdvance
table for Read_C, one for Write_A with a cycle count of 1+2, and one for
Write_B with a cycle count of 1+0 (omitted values are assumed 0).
The field `tunables` provides a list of deltas relative to the base
`cycle` count of the ReadAdvance. Since the field is optional and
defaults to a list of 0's, this change doesn't affect current targets.
This patch adds a new option `-aarch64-enable-zpr-predicate-spills`
(which is disabled by default), this option replaces predicate spills
with vector spills in streaming[-compatible] functions.
For example:
```
str p8, [sp, #7, mul vl] // 2-byte Folded Spill
// ...
ldr p8, [sp, #7, mul vl] // 2-byte Folded Reload
```
Becomes:
```
mov z0.b, p8/z, #1
str z0, [sp] // 16-byte Folded Spill
// ...
ldr z0, [sp] // 16-byte Folded Reload
ptrue p4.b
cmpne p8.b, p4/z, z0.b, #0
```
This is done to avoid streaming memory hazards between FPR/vector and
predicate spills, which currently occupy the same stack area even when
the `-aarch64-stack-hazard-size` flag is set.
This is implemented with two new pseudos SPILL_PPR_TO_ZPR_SLOT_PSEUDO
and FILL_PPR_FROM_ZPR_SLOT_PSEUDO. The expansion of these pseudos
handles scavenging the required registers (z0 in the above example) and,
in the worst case spilling a register to an emergency stack slot in the
expansion. The condition flags are also preserved around the `cmpne` in
case they are live at the expansion point.
Use this to improve performance of SubtargetEmitter::findWriteResources
and SubtargetEmitter::findReadAdvance. Now we can do a map lookup
instead of a linear search through all WriteRes/ReadAdvance records.
This reduces the build time of RISCVGenSubtargetInfo.inc on my
machine from 43 seconds to 10 seconds.
Code cleanups for TableGen files, changes includes function names,
variable names and unused imports.
---------
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
Adopt scaled indent in PredicateExpander.
Added pre/post inc/dec operators to `indent` and related unit tests.
Verified by comparing *.inc files generated by LLVM build with/without
the change.
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
Split RecordKeeper `getAllDerivedDefinitions` family of functions into
two variants:
(a) non-const ones that return vectors of `Record *` and
(b) const ones, that return vector/ArrayRef of `const Record *`.
This will help gradual migration of TableGen backends to use
`const RecordKeeper` and by implication change code to work
with const pointers and better const correctness.
Existing backends are not yet compatible with the const family of
functions, so change them to use a non-constant `RecordKeeper`
reference, till they are migrated.
- Extract code for sorting and checking duplicate Records into a helper
function and update `collectProcModels` to use the helper.
- Update `FeatureKeyValues` to:
(a) Remove code for duplicate checks and use the helper.
(b) Trim features with empty name explicitly to be able to use
the helper.
- Make the sorting deterministic by using record name as a secondary
key for sorting, and re-enable SubtargetFeatureUniqueNames.td test
that was disabled due to the non-determinism of the error messages.
- Change wording of error message when duplicate records are found to
be source code position agnostic, since `First` may not be before
`Second` lexically.
- Keep track of last definition of a feature in a `DenseMap` and use
it to report a better error message when a duplicate feature is found.
- Use StringMap instead of a std::map in `EmitStageAndOperandCycleData`
- Add a unit test to check if duplicate names are flagged.
SubtargetEmitter::GenSchedClassTables takes a CodeGenProcModel, but
calls hasReadOfWrite which loops over all ProcModels. We move
hasReadOfWrite to CodeGenProcModel and remove the loop over all
ProcModels. This leads to a 144% speedup on the RISC-V backend of our
downstream.
1. Bitwise operations are used to access HwMode, allowing for the
coexistence of HwMode IDs for different features (such as RegInfo and
EncodingInfo). This will provide better scalability for HwMode.
Currently, most users utilize HwMode primarily for configuring
Register-related information, and few use it for configuring Encoding.
The limited scalability of HwMode has been a significant factor in this
usage pattern.
2. Sink the HwMode Encodings selection logic down to per instruction
level, this makes the logic for choosing encodings clearer and provides
better error messages.
3. Add some HwMode ID conflict detection to the getHwMode() interface.
Refactor of the llvm-tblgen source into:
- a "Basic" library, which contains the bare minimum utilities to build
`llvm-min-tablegen`
- a "Common" library which contains all of the helpers for TableGen
backends. Such helpers can be shared by more than one backend, and even
unit tested (e.g. CodeExpander is, maybe we can add more over time)
Fixes#80647
Forming a `ReadAdvance` with an entry in the `ValidWrites` list that is
not used by any instruction results in the entire `ReadAdvance` to be
ignored by the scheduler due to an invalid entry.
The `SchedRW` collection code only picks up `SchedWrites` that are
reachable from `Instructions`, `InstRW`, `ItinRW` and `SchedAlias`,
leaving the unreachable ones with an invalid entry (0) in
`SubtargetEmitter::GenSchedClassTables` when going through the list of
`ReadAdvances`
Historically TableGen has used `A.swap(B)` to move containers without
the expense of copying them. Perhaps this predated rvalue references. In
any case `A = std::move(B)` seems like a more direct way to implement
this when only A is required after the operation.
`Fusion` is inherited from `SubtargetFeature` now. Each definition
of `Fusion` will define a `SubtargetFeature` accordingly.
Method `getMacroFusions` is added to `TargetSubtargetInfo`, which
returns a list of `MacroFusionPredTy` that will be evaluated by
MacroFusionMution.
`getMacroFusions` will be auto-generated if the target has `Fusion`
definitions.
Use of llvm::Optional was migrated to std::optional. This included a
change in the constructor of ArrayRef.
However, there are still 2 places in the SubtargetEmitter which uses
llvm::None, causing a compile error when emitted.
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to
`AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the
fact that resource allocation is now represented as an interval,
relatively to the issue cycle of the instruction.
This patch implements that TODO. This naming clarifies how to use these
fields in the scheduler. In addition it was confusing that `StartAtCycle` was
singular but `Cycles` was plural. This renaming fixes this inconsistency.
This commit as previously reverted since it missed renaming that came
down after rebasing. This version of the commit fixes those problems.
Differential Revision: https://reviews.llvm.org/D158568
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to
`AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the
fact that resource allocation is now represented as an interval,
relatively to the issue cycle of the instruction.
This patch implements that TODO. This naming clarifies how to use these
fields in the scheduler. In addition it was confusing that `StartAtCycle` was
singular but `Cycles` was plural. This renaming fixes this inconsistency.
This commit as previously reverted since it missed renaming that came
down after rebasing. This version of the commit fixes those problems.
Differential Revision: https://reviews.llvm.org/D158568
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to
`AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the
fact that resource allocation is now represented as an interval,
relatively to the issue cycle of the instruction.
This patch implements that TODO. This naming clarifies how to use these
fields in the scheduler. In addition it was confusing that `StartAtCycle` was
singular but `Cycles` was plural. This renaming fixes this inconsistency.
Differential Revision: https://reviews.llvm.org/D158568
The sort of the elements in the GET_SUBTARGETINFO_MACRO block is done on
the "Name" field of each record. This field is not guaranteed to be unique,
is not guaranteed to even have a value at all, and is not used in the
output anyway. Change to sort on the "FieldName" field which should be
unique.
Problem spotted when lib/Target/PowerPC/PPCGenSubtargetInfo.inc changed
unexpectedly.
Differential Revision: https://reviews.llvm.org/D153371
SubtargetFeature.h is currently part of MC while it doesn't depend on
anything in MC. Since some LLVM components might have the need to work
with target features without necessarily needing MC, it might be
worthwhile to move SubtargetFeature.h to a different location. This will
reduce the dependencies of said components.
Note that I choose TargetParser as the destination because that's where
Triple lives and SubtargetFeatures feels related to that.
This issues came up during a JITLink review (D149522). JITLink would
like to avoid a dependency on MC while still needing to store target
features.
Reviewed By: MaskRay, arsenm
Differential Revision: https://reviews.llvm.org/D150549
The word "attribute" has a specific meaning in LLVM. Avoid using it
here to mean something different.
This addresses feedback from D153180.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D153444
Re-landing the code that was reverted because of the buildbot failure
in https://lab.llvm.org/buildbot#builders/9/builds/27319.
Original commit message
======================
The class `ResourceSegments` is used to keep track of the intervals
that represent resource usage of a list of instructions that are
being scheduled by the machine scheduler.
The collection is made of intervals that are closed on the left and
open on the right (represented by the standard notation `[a, b)`).
These collections of intervals can be extended by `add`ing new
intervals accordingly while scheduling a basic block.
Unit tests are added to verify the possible configurations of
intervals, and the relative possibility of scheduling a new
instruction in these configurations. Specifically, the methods
`getFirstAvailableAtFromBottom` and `getFirstAvailableAtFromTop` are
tested to make sure that both bottom-up and top-down scheduling work
when tracking resource usage across the basic block with
`ResourceSegments`.
Note that the scheduler tracks resource usage with two methods:
1. counters (via `std::vector<unsigned> ReservedCycles;`);
2. intervals (via `std::map<unsigned, ResourceSegments> ReservedResourceSegments;`).
This patch can be considered a NFC test for existing scheduling models
because the tracking system that uses intervals is turned off by
default (field `bit EnableIntervals = false;` in the tablegen class
`SchedMachineModel`).
Reviewed By: andreadb
Differential Revision: https://reviews.llvm.org/D150312
Reverted because it produces the following builbot failure at https://lab.llvm.org/buildbot#builders/9/builds/27319:
/b/ml-opt-rel-x86-64-b1/llvm-project/llvm/unittests/CodeGen/SchedBoundary.cpp: In member function ‘virtual void ResourceSegments_getFirstAvailableAtFromBottom_empty_Test::TestBody()’:
/b/ml-opt-rel-x86-64-b1/llvm-project/llvm/unittests/CodeGen/SchedBoundary.cpp:395:31: error: call of overloaded ‘ResourceSegments(<brace-enclosed initializer list>)’ is ambiguous
395 | auto X = ResourceSegments({});
| ^
This reverts commit dc312f0331309692e8d6e06e93b3492b6a40989f.
The class `ResourceSegments` is used to keep track of the intervals
that represent resource usage of a list of instructions that are
being scheduled by the machine scheduler.
The collection is made of intervals that are closed on the left and
open on the right (represented by the standard notation `[a, b)`).
These collections of intervals can be extended by `add`ing new
intervals accordingly while scheduling a basic block.
Unit tests are added to verify the possible configurations of
intervals, and the relative possibility of scheduling a new
instruction in these configurations. Specifically, the methods
`getFirstAvailableAtFromBottom` and `getFirstAvailableAtFromTop` are
tested to make sure that both bottom-up and top-down scheduling work
when tracking resource usage across the basic block with
`ResourceSegments`.
Note that the scheduler tracks resource usage with two methods:
1. counters (via `std::vector<unsigned> ReservedCycles;`);
2. intervals (via `std::map<unsigned, ResourceSegments> ReservedResourceSegments;`).
This patch can be considered a NFC test for existing scheduling models
because the tracking system that uses intervals is turned off by
default (field `bit EnableIntervals = false;` in the tablegen class
`SchedMachineModel`).
Reviewed By: andreadb
Differential Revision: https://reviews.llvm.org/D150312
Conditions that need to be met:
1. count(StartAtCycle) == count(ReservedCycles);
2. For each i: StartAtCycles[i] < ReservedCycles[i];
3. For each i: StartAtCycles[i] >= 0;
4. If left unspecified, the elements are set to 0.
Differential Revision: https://reviews.llvm.org/D150310
Use deduction guides instead of helper functions.
The only non-automatic changes have been:
1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There was a few similar situation across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).
Per reviewers' comment, some useless makeArrayRef have been removed in the process.
This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.
Differential Revision: https://reviews.llvm.org/D140955