Machine scheduler will suppress register pressure when the scheduling
window is too small, but now it doesn't consider i64 register type,
and this MR extends it into i64 register type, so architecture like
RISCV64 that only supports i64 interger register will have the same
behavior like RISCV32.
This PR is stacked on #76186.
This PR keeps the default strategy as top-down since that is what
existing targets expect. It can be enabled using
`-misched-postra-direction=bidirectional`.
It is up to targets to decide whether they would like to enable this
option for themselves.
This is another part of #70452 which makes getMemOperandsWithOffsetWidth
use a LocationSize for Width, as opposed to the unsigned it currently
uses. The advantages on it's own are not super high if
getMemOperandsWithOffsetWidth usually uses known sizes, but if the
values can come from an MMO it can help be more accurate in case they
are Unknown (and in the future, scalable).
There is the possibility that the bottom-up direction will lead to
performance improvements on certain targets, as this is certainly the case for
the pre-regalloc GenericScheduler. This patch will give people the
opportunity to experiment for their sub-targets. However, this patch
keeps the top-down approach as the default for the PostGenericScheduler
since that is what subtargets expect today.
This is a precommit to supporting post reg-alloc bottom up scheduling.
We'd like to have post-ra scheduling direction that can be different from
pre-ra direction. The current dumpSchedule function is changed in this
patch to support the fact that the post-ra and pre-ra directions will
depend on different command line options.
b1ae461a5358932851de42b66ffde8748da51a83 renamed Cycle ->
ReleaseAtCycle.
7e09239e24b339f45f63a670e2e831150826bf70 was committed without rebasing
but used the old Cycle syntax.
This caused a build failure when
7e09239e24b339f45f63a670e2e831150826bf70 was squash-and-merged. This
patch fixes this problem.
TargetSchedule.td explicitly allows the usage of a ProcResource for zero
cycles, in order to represent that the ProcResource must be available
but is not consumed by the instruction. On the other hand,
ResourceSegments explicitly does not allow for a zero sized interval. In
order to remedy this, this patch handles the special case of when there
is an empty interval usage of a resource by not adding an empty
interval.
We ran into this issue downstream, but it makes sense to have
this upstream since it is explicitly allowed by TargetSchedule.td.
Reordering based on the sort order of the MemOpInfo array was disabled
in <https://reviews.llvm.org/D72706>. However, it's not clear this is
desirable for al targets. It also makes it more difficult to compare the
incremental benefit of enabling load clustering in the selectiondag
scheduler as well was the machinescheduler, as the sdag scheduler does
seem to allow this reordering.
This patch adds a parameter that can control the behaviour on a
per-target basis.
Split out from #73789.
The member functions of ScheduleDAGMI are called back from
PostMachineScheduler::runOnMachineFunction, instead of
MachineScheduler::runOnMachineFunction.
These are picked up from getMemOperandsWithOffsetWidth but weren't then
being passed through to shouldClusterMemOps, which forces backends to
collect the information again if they want to use the kind of heuristics
typically used for the similar shouldScheduleLoadsNear function (e.g.
checking the offset is within 1 cache line).
This patch just adds the parameters, but doesn't attempt to use them.
There is potential to use them in the current PPC and AArch64
shouldClusterMemOps implementation, and I intend to use the offset in
the heuristic for RISC-V. I've left these for future patches in the
interest of being as incremental as possible.
As noted in the review and in an inline FIXME, an ElementCount-style abstraction may later be used to condense these two parameters to one argument. ElementCount isn't quite suitable as it doesn't support negative offsets.
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to
`AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the
fact that resource allocation is now represented as an interval,
relatively to the issue cycle of the instruction.
This patch implements that TODO. This naming clarifies how to use these
fields in the scheduler. In addition it was confusing that `StartAtCycle` was
singular but `Cycles` was plural. This renaming fixes this inconsistency.
This commit as previously reverted since it missed renaming that came
down after rebasing. This version of the commit fixes those problems.
Differential Revision: https://reviews.llvm.org/D158568
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to
`AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the
fact that resource allocation is now represented as an interval,
relatively to the issue cycle of the instruction.
This patch implements that TODO. This naming clarifies how to use these
fields in the scheduler. In addition it was confusing that `StartAtCycle` was
singular but `Cycles` was plural. This renaming fixes this inconsistency.
This commit as previously reverted since it missed renaming that came
down after rebasing. This version of the commit fixes those problems.
Differential Revision: https://reviews.llvm.org/D158568
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to
`AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the
fact that resource allocation is now represented as an interval,
relatively to the issue cycle of the instruction.
This patch implements that TODO. This naming clarifies how to use these
fields in the scheduler. In addition it was confusing that `StartAtCycle` was
singular but `Cycles` was plural. This renaming fixes this inconsistency.
Differential Revision: https://reviews.llvm.org/D158568
When dealing with the subunits of a resource group, we should reset
the subunits availability at the first avaiable cycle of the resource
that contains the subunits. Previously, the reset operation was
returning cycle 0, effectively erasing the booking history of the
subunits.
Without this change, when using intervals for models have make use of
subunits, the erasing of resource booking for subunits can raise the
assertion "A resource is being overwritten" in
`ResourceSegments::add`. The test added in the patch is one of such
cases.
Reviewed By: andreadb
Differential Revision: https://reviews.llvm.org/D156530
BUG 1 - choosing the right cycle when booking a resource.
---------------------------------------------------------
Bottom up scheduling should take in account the current cycle at
the scheduling boundary when determing at what cycle a resource can be
issued. Supposed the schedule boundary is at cycle `C`, and that we
want to check at what cycle a 3 cycles resource can be instantiated.
We have two cases: A, in which the last seen resource cycle LSRC in
which the resource is known to be used is more than oe euqual to 3
cycles away from current cycle `C`, (`C - LSRC >=3`) and B in which
the LSRC is less than 3 cycles away from C (`C - LSRC < 3`). Note
that, in bottom-up scheduling LRS is always smaller or eaual to the
current cycle `C`.
The two cases can be schematized as follow:
```
... | C + 1 | C | C - 1 | C - 2 | C - 3 | C - 4 | ...
| | | | | | LSRC | -> Case A
| | | | LSRC | | | -> Case B
// Before allocating the resource
LSRC(A) = C - 4
LSRC(B) = C - 2
```
In case A, the scheduler sees cycles `C`, `C-1` and `C-2` being
available for booking the 3-cycles resource. Therefore the LSRC can be
updated to be `C`, and the resource can be scheduled from cycle `C`
(the `X` in the table):
```
... | C + 1 | C | C - 1 | C - 2 | C - 3 | C - 4 | ...
| | X | X | X | | | -> Case A
// After allocating the resource
LSRC(A) = C
```
In case B, the 3-cycle resource usage would clash with the LSRC if
allocated starting from cycle C:
```
... | C + 1 | C | C - 1 | C - 2 | C - 3 | C - 4 | ...
| | X | X | X | | | -> clash at cycle C - 2
| | | | LSRC | | | -> Case B
```
Therefore, the cycle in which the resource can be scheduled needs to
be greater than `C`. For the example, the resource is booked
in cycle `C + 1`.
```
... | C + 1 | C | C - 1 | C - 2 | C - 3 | C - 4 | ...
| X | X | X | | | |
// After allocating the resource
LSRC(B) = C + 1
```
The behavior we need to correctly support cases A and B is obtained by
computing the next value of the LSRC as the maximum between:
1. the current cycle `C`;
2. and the previous LSRC plus the number of cycle CYCLES the resource will need.
In formula:
```
LSRC(next) = max(C, LSRC(previous) + CYCLES)
```
BUG 2 - booking the resource for the correct number of cycles.
--------------------------------------------------------------
When storing the next LSRC, the funcion `getNextResourceCycle` was
being invoked setting to 0 the number of cycles a resource was using.
The invocation of `getNextResourceCycle` is now using the values of
`Cycles` instead of 0.
Effects on code generation
--------------------------
This fix have effects only on AArch64, for the Cortex-A55
scheduling model (`-mcpu=cortex-a55`).
The changes in the MIR tests caused by this patch show that the value
now reported by `getNextResourceCycle` is correct.
Other cortex-a55 tests have been touched by this change, where some
instructions have been swapped. The final generated code is equivalent
in term of the total number of cycles. The test
`llvm/test/CodeGen/AArch64/misched-detail-resource-booking-02.mir`
shows in details the correctness of the bottom up scheduling, and the
effect on the codegen change that are visible in the test
`llvm/test/CodeGen/AArch64/aarch64-smull.ll`.
Reviewed By: andreadb, dmgreen
Differential Revision: https://reviews.llvm.org/D153117
The option `-misched-detail-resource-booking` prints the following
information every time the method
`SchedBoundary::getNextResourceCycle` is invoked:
1. counters of the resources that have already been booked;
2. the values returned by `getNextResourceCycle`, which is the next
available cycle in which a resource can be booked.
The method is useful to debug low-level checks inside the machine
scheduler that make decisions based on the values returned by
`getNextResourceCycle`.
Reviewed By: andreadb
Differential Revision: https://reviews.llvm.org/D153116
Reverting because of https://lab.llvm.org/buildbot#builders/75/builds/32485:
llvm-project/llvm/lib/CodeGen/MachineScheduler.cpp:2374:7: error: use of undeclared identifier 'MischedDetailResourceBooking'
if (MischedDetailResourceBooking)
This reverts commit fc06262c1c365777e71207b6a5de281cba927c96.
The option `-misched-detail-resource-booking` prints the following
information every time the method
`SchedBoundary::getNextResourceCycle` is invoked:
1. counters of the resources that have already been booked;
2. the values returned by `getNextResourceCycle`, which is the next
available cycle in which a resource can be booked.
The method is useful to debug low-level checks inside the machine
scheduler that make decisions based on the values returned by
`getNextResourceCycle`.
Reviewed By: andreadb
Differential Revision: https://reviews.llvm.org/D153116
When building the compiler with -DLLVM_ENABLE_EXPENSIVE_CHECKS=ON,
sometimes resources that are dumped in scheduled traces gets reordered
even if they are booked in the same cycle. Using `stable_sort`
guarantees that such occasional reordering does not happen.
This change should fix failures like the one seen in
https://lab.llvm.org/buildbot/#/builders/16/builds/49592.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D152800
This commit re-work the methods that dump traces with resource usage to take into account the StartAtCycle value added by https://reviews.llvm.org/D150310.
For each i, the values of the lists StartAtCycle and ReservedCycles is are printed with the interval [StartAtCycle[i], ReservedCycles[i])
```
... | StartAtCycle[i] | ... | ReservedCycles[i] - 1 | ReservedCycles[i] | ...
| xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | |
```
Reviewed By: andreadb
Differential Revision: https://reviews.llvm.org/D150311
Re-landing the code that was reverted because of the buildbot failure
in https://lab.llvm.org/buildbot#builders/9/builds/27319.
Original commit message
======================
The class `ResourceSegments` is used to keep track of the intervals
that represent resource usage of a list of instructions that are
being scheduled by the machine scheduler.
The collection is made of intervals that are closed on the left and
open on the right (represented by the standard notation `[a, b)`).
These collections of intervals can be extended by `add`ing new
intervals accordingly while scheduling a basic block.
Unit tests are added to verify the possible configurations of
intervals, and the relative possibility of scheduling a new
instruction in these configurations. Specifically, the methods
`getFirstAvailableAtFromBottom` and `getFirstAvailableAtFromTop` are
tested to make sure that both bottom-up and top-down scheduling work
when tracking resource usage across the basic block with
`ResourceSegments`.
Note that the scheduler tracks resource usage with two methods:
1. counters (via `std::vector<unsigned> ReservedCycles;`);
2. intervals (via `std::map<unsigned, ResourceSegments> ReservedResourceSegments;`).
This patch can be considered a NFC test for existing scheduling models
because the tracking system that uses intervals is turned off by
default (field `bit EnableIntervals = false;` in the tablegen class
`SchedMachineModel`).
Reviewed By: andreadb
Differential Revision: https://reviews.llvm.org/D150312
Reverted because it produces the following builbot failure at https://lab.llvm.org/buildbot#builders/9/builds/27319:
/b/ml-opt-rel-x86-64-b1/llvm-project/llvm/unittests/CodeGen/SchedBoundary.cpp: In member function ‘virtual void ResourceSegments_getFirstAvailableAtFromBottom_empty_Test::TestBody()’:
/b/ml-opt-rel-x86-64-b1/llvm-project/llvm/unittests/CodeGen/SchedBoundary.cpp:395:31: error: call of overloaded ‘ResourceSegments(<brace-enclosed initializer list>)’ is ambiguous
395 | auto X = ResourceSegments({});
| ^
This reverts commit dc312f0331309692e8d6e06e93b3492b6a40989f.
The class `ResourceSegments` is used to keep track of the intervals
that represent resource usage of a list of instructions that are
being scheduled by the machine scheduler.
The collection is made of intervals that are closed on the left and
open on the right (represented by the standard notation `[a, b)`).
These collections of intervals can be extended by `add`ing new
intervals accordingly while scheduling a basic block.
Unit tests are added to verify the possible configurations of
intervals, and the relative possibility of scheduling a new
instruction in these configurations. Specifically, the methods
`getFirstAvailableAtFromBottom` and `getFirstAvailableAtFromTop` are
tested to make sure that both bottom-up and top-down scheduling work
when tracking resource usage across the basic block with
`ResourceSegments`.
Note that the scheduler tracks resource usage with two methods:
1. counters (via `std::vector<unsigned> ReservedCycles;`);
2. intervals (via `std::map<unsigned, ResourceSegments> ReservedResourceSegments;`).
This patch can be considered a NFC test for existing scheduling models
because the tracking system that uses intervals is turned off by
default (field `bit EnableIntervals = false;` in the tablegen class
`SchedMachineModel`).
Reviewed By: andreadb
Differential Revision: https://reviews.llvm.org/D150312
This is rework of;
- rG13e77db2df94 (r328395; MVT)
Since `LowLevelType.h` has been restored to `CodeGen`, `MachinveValueType.h`
can be restored as well.
Depends on D148767
Differential Revision: https://reviews.llvm.org/D149024
The traces are printed only for bottom-up and top-down scheduling
because the values of TopReadyCycle and BottomReadyCycle are
inconsistent when obtained via bidirectional scheduling (see
`BIDIRECTIONAL` checks in the test).
Differential Revision: https://reviews.llvm.org/D142529
Summary:
As supporting information, I have added an example that describes how
the indexes of the vector of resources SchedBoundary::ReservedCycles
are tracked by the field SchedBoundary::ReservedCyclesIndex.
This has a minor rework of
b39a9a94f4
which was reverted in
df6ae1779f
becasue the llc invocation of the test was missing the argument
`-mtriple`.
See for example the failure at
https://lab.llvm.org/buildbot#builders/231/builds/7245 that reported
the following when targeting a non-aarch64 native build:
'cortex-a55' is not a recognized processor for this target (ignoring processor)
Reviewers: jroelofs
Subscribers:
Differential Revision: https://reviews.llvm.org/D141367
Reverting because of https://lab.llvm.org/buildbot#builders/16/builds/41860
When building on x86, I need to specify also -mtriple in the
invocation of llc otherwise the folllowing error shows up:
'cortex-a55' is not a recognized processor for this target (ignoring processor)
This reverts commit b39a9a94f420a25a239ae03097c255900cbd660e.
As supporting information, I have added an example that describes how
the indexes of the vector of resources SchedBoundary::ReservedCycles
are tracked by the field SchedBoundary::ReservedCyclesIndex.
Reviewed By: jroelofs
Differential Revision: https://reviews.llvm.org/D141367
This patch fixes the error llvm/lib/CodeGen/MachineScheduler.cpp(755): error C2065: 'MISchedCutoff': undeclared identifier in case of NDEBUG and LLVM_ENABLE_ABI_BREAKING_CHECKS.
Note MISchedCutoff is declared under #ifndef NDEBUG.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D130425
The SI machine scheduler inherits from ScheduleDAGMI.
This patch adds support for a few features that are implemented
in ScheduleDAGMI (or its base classes) that were missing so far
because their support is implemented in overridden functions.
* Support cl::opt -view-misched-dags
This option allows to open a graphical window of the scheduling DAG.
* Support cl::opt -misched-print-dags
This option allows to print the scheduling DAG in text form.
* After constructing the scheduling DAG, call postprocessDAG()
to apply any registered DAG mutations.
Note that currently there are no mutations defined in AMDGPUTargetMachine.cpp
in case SIScheduler is used.
Still add this to avoid surprises in the future in case mutations are added.
Differential Revision: https://reviews.llvm.org/D128808
Non-static class members declared under #ifndef NDEBUG should be declared
under #if LLVM_ENABLE_ABI_BREAKING_CHECKS to make headers library-friendly and
allow cross-linking, as discussed in D120714.
Differential Revision: https://reviews.llvm.org/D121549
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169
Debug position data is cleared after ScheduleDAGMILive::schedule() due to it also calling placeDebugValues(). Make it so the data is not cleared after initial call to placeDebugValues since we will call it again after reverting a schedule.
Secondly, since we skip debug instructions when reverting the schedule on AMDGPU, all debug instructions are now moved to the end of the scheduling region. RegionEnd points to the beginning of this chunk of debug instructions since it was not incremented when a debug instruction was skipped. RegionBegin may also point to the same debug instruction if Unsched.front() is a debug instruction thus shrinking the region to 1. Fix RegionBegin and RegionEnd so that they point to the current beginning and ending before calling placeDebugValues() since both vars will be used as reference points to move debug instructions back.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D119022
The Pre-RA VLIWMachineScheduler used by Hexagon is a relatively generic
implementation that would make sense to use on other VLIW targets.
This commit lifts those classes into their own header/source file with the
root VLIWMachineScheduler. I chose this path rather than adding the
strategy et al. into MachineScheduler to avoid bloating the file with other
implementations.
Target-specific behaviors have been captured and replicated through
function overloads.
- Added an overloadable DFAPacketizer creation member function. This is
mainly done for our downstream, which has the capability to override
the DFAPacketizer with custom implementations. This is an upstreamable
TODO on our end. Currently, it always returns the result of
TargetInstrInfo::CreateTargetScheduleState
- Added an extra helper which returns the number of instructions in the
current packet. This is used in our downstream, and may be useful
elsewhere.
- Placed the priority heuristic values into the ConvergingVLIWscheduler
class instead of defining them as local statics in the implementation
- Added a overridable helper in ConvergingVLIWScheduler so that targets
can create their own VLIWResourceModel
Differential Revision: https://reviews.llvm.org/D113150