PostRA scheduling supports different directions now, but we can
only specify it via command line options.
This patch adds a new hook `overridePostRASchedPolicy` for targets
to override PostRA scheduling policy.
Note that some options like tracking register pressure won't take
effect in PostRA scheduling.
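The hook pattern can be sketched roughly as follows. This is a minimal illustration only: `ToySchedPolicy`, `ToySubtarget`, `ToyBottomUpSubtarget` and `initPostRAPolicy` are hypothetical stand-ins, and the hook's signature here is simplified and is not claimed to match the one added by the patch.

```cpp
// Toy stand-in for the scheduling policy; only the direction is modelled.
struct ToySchedPolicy {
  bool OnlyTopDown = true;   // assumed default direction in this sketch
  bool OnlyBottomUp = false;
};

// Toy stand-in for a subtarget that exposes the override hook.
struct ToySubtarget {
  virtual ~ToySubtarget() = default;
  // Default implementation: keep whatever the generic code chose.
  virtual void overridePostRASchedPolicy(ToySchedPolicy &) const {}
};

// A hypothetical target that prefers bottom-up PostRA scheduling.
struct ToyBottomUpSubtarget : ToySubtarget {
  void overridePostRASchedPolicy(ToySchedPolicy &Policy) const override {
    Policy.OnlyTopDown = false;
    Policy.OnlyBottomUp = true;
  }
};

// Set the generic defaults first, then let the target adjust them.
ToySchedPolicy initPostRAPolicy(const ToySubtarget &ST) {
  ToySchedPolicy Policy;
  ST.overridePostRASchedPolicy(Policy);
  return Policy;
}
```

A target that does not override the hook keeps the default direction, so existing targets would be unaffected in this sketch.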
Previously we set the dump direction according to command line
options, but we may override the scheduling direction in `initPolicy`
and this results in a mismatch between the dump and the actual policy.
Here we simply set the dump direction after initializing the policy.
This produces far too much terminal output, particularly for the
instruction reduction. Since it doesn't consider the liveness of
the instructions it's deleting, it produces quite a lot of verifier
errors.
This patch is part of a set of patches that add an `-fextend-lifetimes`
flag to clang, which extends the lifetimes of local variables and
parameters for improved debuggability. In addition to that flag, the
patch series adds a pragma to selectively disable `-fextend-lifetimes`,
and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes`
for `this` pointers only. All changes and tests in these patches were
written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer)
has handled review and merging. The extend lifetimes flag is intended to
eventually be set on by `-Og`, as discussed in the RFC
here:
https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850
This patch implements a new intrinsic instruction in LLVM,
`llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand
and has no effect other than "using" its operand, to ensure that its
operand remains live until after the fake use. This patch does not emit
fake uses anywhere; the next patch in this sequence causes them to be
emitted from the clang frontend, such that for each variable (or this) a
fake.use operand is inserted at the end of that variable's scope, using
that variable's value. This patch covers everything post-frontend, which
is largely just the basic plumbing for a new intrinsic/instruction,
along with a few steps to preserve the fake uses through optimizations
(such as moving them ahead of a tail call or translating them through
SROA).
Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
- Add `LiveIntervalsAnalysis`.
- Add `LiveIntervalsPrinterPass`.
- Use `LiveIntervalsWrapperPass` in legacy pass manager.
- Use `std::unique_ptr` instead of raw pointer for `LICalc`, so
destructor and default move constructor can handle it correctly.
This would be the last analysis required by `PHIElimination`.
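The `std::unique_ptr` point above can be illustrated with a small sketch. `ToyCalc` and `ToyAnalysisResult` are hypothetical stand-ins, not the real `LICalc` or `LiveIntervals` types; the sketch only shows why owning the helper through `std::unique_ptr` lets the defaulted move operations and the implicit destructor do the right thing.

```cpp
#include <memory>
#include <utility>

// Stand-in for the helper object owned by the analysis result.
struct ToyCalc {
  int State = 0;
};

class ToyAnalysisResult {
  std::unique_ptr<ToyCalc> Calc; // owning pointer instead of a raw pointer

public:
  ToyAnalysisResult() : Calc(std::make_unique<ToyCalc>()) {}

  // Defaulted moves transfer ownership and leave the source empty, so the
  // helper is freed exactly once by the implicit destructor.
  ToyAnalysisResult(ToyAnalysisResult &&) = default;
  ToyAnalysisResult &operator=(ToyAnalysisResult &&) = default;

  bool hasCalc() const { return Calc != nullptr; }
};
```

With a raw pointer, the same class would need a hand-written destructor and move constructor to avoid a double free.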
Since `raw_string_ostream` doesn't own the string buffer, it is
desirable (in terms of memory safety) for users to directly reference
the string buffer rather than use `raw_string_ostream::str()`.
Work towards TODO comment to remove `raw_string_ostream::str()`.
Prepare for new pass manager version of `MachineDominatorTreeAnalysis`.
We may need a machine dominator tree version of `DomTreeUpdater` to
handle `SplitCriticalEdge` in some CodeGen passes.
I had some trouble understanding why `removeReady` removed nodes from
the Pending queue, since my intuition told me that the Pending queue did
not represent a node that was ready. I took a deeper look and found that
pickOnlyNode and pickNodeFromQueue only picked nodes from the Available
queue too.
I found that we need to remove nodes from the Available and Pending
queues that correspond to the opposite direction from the one we ended
up choosing (IsTopNode vs !IsTopNode).
It took me a little longer than I would have liked to understand this
fact, so I figured that I would add a comment in the code that makes it
clear for future readers.
The machine scheduler suppresses register pressure tracking when the
scheduling window is too small, but it currently does not consider the
i64 register type. This MR extends the check to i64, so architectures
like RISCV64 that only support i64 integer registers behave the same as
RISCV32.
This PR is stacked on #76186.
This PR keeps the default strategy as top-down since that is what
existing targets expect. It can be enabled using
`-misched-postra-direction=bidirectional`.
It is up to targets to decide whether they would like to enable this
option for themselves.
This is another part of #70452 which makes getMemOperandsWithOffsetWidth
use a LocationSize for Width, as opposed to the unsigned it currently
uses. On its own, the benefit is modest if
getMemOperandsWithOffsetWidth usually uses known sizes, but if the
values can come from an MMO it can help be more accurate in case they
are Unknown (and in the future, scalable).
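As a rough illustration of why an explicit "unknown" width can be more accurate than a plain `unsigned`, here is a sketch that uses `std::optional` as a stand-in for `LocationSize`. `ToyWidth` and `provablyDisjoint` are hypothetical names, and the disjointness check is only an assumed example of a consumer of the width.

```cpp
#include <cstdint>
#include <optional>

// Stand-in for a width that may be unknown (std::nullopt).
using ToyWidth = std::optional<uint64_t>;

// Returns true only when the two accesses, taken from the same base, can be
// proven not to overlap. An unknown width on the lower access forces a
// conservative "false".
bool provablyDisjoint(int64_t OffsetA, ToyWidth WidthA, int64_t OffsetB,
                      ToyWidth WidthB) {
  // Identify the access that starts first.
  int64_t LoOffset = OffsetA, HiOffset = OffsetB;
  ToyWidth LoWidth = WidthA;
  if (OffsetB < OffsetA) {
    LoOffset = OffsetB;
    HiOffset = OffsetA;
    LoWidth = WidthB;
  }
  if (!LoWidth)
    return false; // unknown size: cannot prove anything
  return LoOffset + static_cast<int64_t>(*LoWidth) <= HiOffset;
}
```

With a plain `unsigned`, an unknown size would need a magic value, which is easy to misread as a real width.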
There is the possibility that the bottom-up direction will lead to
performance improvements on certain targets, as this is certainly the case for
the pre-regalloc GenericScheduler. This patch will give people the
opportunity to experiment for their sub-targets. However, this patch
keeps the top-down approach as the default for the PostGenericScheduler
since that is what subtargets expect today.
This is a precommit to supporting post reg-alloc bottom up scheduling.
We'd like to have post-ra scheduling direction that can be different from
pre-ra direction. The current dumpSchedule function is changed in this
patch to support the fact that the post-ra and pre-ra directions will
depend on different command line options.
b1ae461a5358932851de42b66ffde8748da51a83 renamed Cycle ->
ReleaseAtCycle.
7e09239e24b339f45f63a670e2e831150826bf70 was committed without rebasing
but used the old Cycle syntax.
This caused a build failure when
7e09239e24b339f45f63a670e2e831150826bf70 was squash-and-merged. This
patch fixes this problem.
TargetSchedule.td explicitly allows the usage of a ProcResource for zero
cycles, in order to represent that the ProcResource must be available
but is not consumed by the instruction. On the other hand,
ResourceSegments explicitly does not allow for a zero sized interval. In
order to remedy this, this patch handles the special case of when there
is an empty interval usage of a resource by not adding an empty
interval.
We ran into this issue downstream, but it makes sense to have
this upstream since it is explicitly allowed by TargetSchedule.td.
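The special case can be sketched as below. `ToySegments` and `addUsage` are hypothetical names used only for illustration; this is not the `ResourceSegments` implementation, just a minimal model of skipping a zero-cycle usage.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Toy stand-in for a collection of half-open resource intervals
// [first, second).
struct ToySegments {
  std::vector<std::pair<unsigned, unsigned>> Intervals;

  // Record a resource usage. A zero-cycle usage means the resource must be
  // available but is not consumed, so nothing is stored for it.
  void addUsage(unsigned Start, unsigned End) {
    assert(Start <= End && "malformed interval");
    if (Start == End)
      return; // empty interval: skip rather than store [Start, Start)
    Intervals.emplace_back(Start, End);
  }
};
```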
Reordering based on the sort order of the MemOpInfo array was disabled
in <https://reviews.llvm.org/D72706>. However, it's not clear this is
desirable for all targets. It also makes it more difficult to compare the
incremental benefit of enabling load clustering in the selectiondag
scheduler as well as the machinescheduler, as the sdag scheduler does
seem to allow this reordering.
This patch adds a parameter that can control the behaviour on a
per-target basis.
Split out from #73789.
The member functions of ScheduleDAGMI are called back from
PostMachineScheduler::runOnMachineFunction, instead of
MachineScheduler::runOnMachineFunction.
These are picked up from getMemOperandsWithOffsetWidth but weren't then
being passed through to shouldClusterMemOps, which forces backends to
collect the information again if they want to use the kind of heuristics
typically used for the similar shouldScheduleLoadsNear function (e.g.
checking the offset is within 1 cache line).
This patch just adds the parameters, but doesn't attempt to use them.
There is potential to use them in the current PPC and AArch64
shouldClusterMemOps implementation, and I intend to use the offset in
the heuristic for RISC-V. I've left these for future patches in the
interest of being as incremental as possible.
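For illustration, a backend heuristic of the kind mentioned above might look roughly like this. `offsetsWithinCacheLine` is a hypothetical helper, the 64-byte default is an assumption, and none of this is part of the patch, which only plumbs the parameters through.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of LLVM: returns true when the two offsets
// are less than one cache line apart. CacheLineSize defaults to an assumed
// 64 bytes.
bool offsetsWithinCacheLine(int64_t Offset1, int64_t Offset2,
                            int64_t CacheLineSize = 64) {
  assert(CacheLineSize > 0 && "cache line size must be positive");
  int64_t Lo = std::min(Offset1, Offset2);
  int64_t Hi = std::max(Offset1, Offset2);
  return Hi - Lo < CacheLineSize;
}
```

Note that the signed type matters here: offsets can be negative, which is also why a plain ElementCount would not fit.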
As noted in the review and in an inline FIXME, an ElementCount-style abstraction may later be used to condense these two parameters to one argument. ElementCount isn't quite suitable as it doesn't support negative offsets.
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to
`AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the
fact that resource allocation is now represented as an interval,
relatively to the issue cycle of the instruction.
This patch implements that TODO. This naming clarifies how to use these
fields in the scheduler. In addition it was confusing that `StartAtCycle` was
singular but `Cycles` was plural. This renaming fixes this inconsistency.
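A small sketch of how the renamed fields read as an interval. `ToyResourceUse` and `heldCycles` are hypothetical, and treating the interval as half-open, `[AcquireAtCycle, ReleaseAtCycle)`, is an assumption made for this example.

```cpp
#include <cassert>

// Toy model of one resource usage, relative to the issue cycle of the
// instruction. The interval is assumed to be half-open:
// [AcquireAtCycle, ReleaseAtCycle).
struct ToyResourceUse {
  unsigned AcquireAtCycle;
  unsigned ReleaseAtCycle;

  // Number of cycles the resource is held.
  unsigned heldCycles() const {
    assert(AcquireAtCycle <= ReleaseAtCycle && "release before acquire");
    return ReleaseAtCycle - AcquireAtCycle;
  }
};
```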
This commit was previously reverted since it missed renames that came
in after rebasing. This version of the commit fixes those problems.
Differential Revision: https://reviews.llvm.org/D158568
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to
`AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the
fact that resource allocation is now represented as an interval,
relatively to the issue cycle of the instruction.
This patch implements that TODO. This naming clarifies how to use these
fields in the scheduler. In addition it was confusing that `StartAtCycle` was
singular but `Cycles` was plural. This renaming fixes this inconsistency.
Differential Revision: https://reviews.llvm.org/D158568
When dealing with the subunits of a resource group, we should reset
the subunits' availability at the first available cycle of the resource
that contains the subunits. Previously, the reset operation was
returning cycle 0, effectively erasing the booking history of the
subunits.
Without this change, when using intervals for models that make use of
subunits, the erasing of resource booking for subunits can raise the
assertion "A resource is being overwritten" in
`ResourceSegments::add`. The test added in the patch is one such
case.
Reviewed By: andreadb
Differential Revision: https://reviews.llvm.org/D156530
BUG 1 - choosing the right cycle when booking a resource.
---------------------------------------------------------
Bottom-up scheduling should take into account the current cycle at
the scheduling boundary when determining at what cycle a resource can be
issued. Suppose the schedule boundary is at cycle `C`, and that we
want to check at what cycle a 3-cycle resource can be instantiated.
We have two cases: A, in which the last seen resource cycle (LSRC) in
which the resource is known to be used is 3 or more cycles away from
the current cycle `C` (`C - LSRC >= 3`), and B, in which
the LSRC is less than 3 cycles away from `C` (`C - LSRC < 3`). Note
that in bottom-up scheduling the LSRC is always smaller than or equal
to the current cycle `C`.
The two cases can be schematized as follows:
```
  ... | C + 1 |   C   | C - 1 | C - 2 | C - 3 | C - 4 | ...
      |       |       |       |       |       | LSRC  | -> Case A
      |       |       |       | LSRC  |       |       | -> Case B

// Before allocating the resource
LSRC(A) = C - 4
LSRC(B) = C - 2
```
In case A, the scheduler sees cycles `C`, `C-1` and `C-2` being
available for booking the 3-cycle resource. Therefore the LSRC can be
updated to be `C`, and the resource can be scheduled from cycle `C`
(the `X` in the table):
```
  ... | C + 1 |   C   | C - 1 | C - 2 | C - 3 | C - 4 | ...
      |       |   X   |   X   |   X   |       |       | -> Case A

// After allocating the resource
LSRC(A) = C
```
In case B, the 3-cycle resource usage would clash with the LSRC if
allocated starting from cycle C:
```
  ... | C + 1 |   C   | C - 1 | C - 2 | C - 3 | C - 4 | ...
      |       |   X   |   X   |   X   |       |       | -> clash at cycle C - 2
      |       |       |       | LSRC  |       |       | -> Case B
```
Therefore, the cycle in which the resource can be scheduled needs to
be greater than `C`. For the example, the resource is booked
in cycle `C + 1`.
```
  ... | C + 1 |   C   | C - 1 | C - 2 | C - 3 | C - 4 | ...
      |   X   |   X   |   X   |       |       |       |

// After allocating the resource
LSRC(B) = C + 1
```
The behavior we need to correctly support cases A and B is obtained by
computing the next value of the LSRC as the maximum between:
1. the current cycle `C`;
2. and the previous LSRC plus the number of cycles CYCLES the resource will need.
In formula:
```
LSRC(next) = max(C, LSRC(previous) + CYCLES)
```
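The formula can be checked against cases A and B with a small sketch. `nextLSRC` is a hypothetical helper written for this illustration; it is not the code in `getNextResourceCycle`.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: next "last seen resource cycle" for bottom-up
// scheduling, following LSRC(next) = max(C, LSRC(previous) + CYCLES).
unsigned nextLSRC(unsigned CurrCycle, unsigned PrevLSRC, unsigned Cycles) {
  // In bottom-up scheduling the previous LSRC never exceeds the current
  // cycle at the boundary.
  assert(PrevLSRC <= CurrCycle && "LSRC must not be ahead of the boundary");
  return std::max(CurrCycle, PrevLSRC + Cycles);
}
```

Taking `C = 10` and a 3-cycle resource, case A (`LSRC = C - 4 = 6`) yields `C`, while case B (`LSRC = C - 2 = 8`) yields `C + 1`.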
BUG 2 - booking the resource for the correct number of cycles.
--------------------------------------------------------------
When storing the next LSRC, the function `getNextResourceCycle` was
being invoked with the number of cycles used by the resource set to 0.
The invocation of `getNextResourceCycle` now uses the value of
`Cycles` instead of 0.
Effects on code generation
--------------------------
This fix affects only AArch64, for the Cortex-A55
scheduling model (`-mcpu=cortex-a55`).
The changes in the MIR tests caused by this patch show that the value
now reported by `getNextResourceCycle` is correct.
Other cortex-a55 tests have been touched by this change, where some
instructions have been swapped. The final generated code is equivalent
in terms of the total number of cycles. The test
`llvm/test/CodeGen/AArch64/misched-detail-resource-booking-02.mir`
shows in detail the correctness of the bottom-up scheduling, and the
effect on the codegen changes that are visible in the test
`llvm/test/CodeGen/AArch64/aarch64-smull.ll`.
Reviewed By: andreadb, dmgreen
Differential Revision: https://reviews.llvm.org/D153117
The option `-misched-detail-resource-booking` prints the following
information every time the method
`SchedBoundary::getNextResourceCycle` is invoked:
1. counters of the resources that have already been booked;
2. the values returned by `getNextResourceCycle`, which is the next
available cycle in which a resource can be booked.
The method is useful to debug low-level checks inside the machine
scheduler that make decisions based on the values returned by
`getNextResourceCycle`.
Reviewed By: andreadb
Differential Revision: https://reviews.llvm.org/D153116
Reverting because of https://lab.llvm.org/buildbot#builders/75/builds/32485:
llvm-project/llvm/lib/CodeGen/MachineScheduler.cpp:2374:7: error: use of undeclared identifier 'MischedDetailResourceBooking'
if (MischedDetailResourceBooking)
This reverts commit fc06262c1c365777e71207b6a5de281cba927c96.
When building the compiler with `-DLLVM_ENABLE_EXPENSIVE_CHECKS=ON`,
resources that are dumped in schedule traces sometimes get reordered
even if they are booked in the same cycle. Using `stable_sort`
guarantees that such occasional reordering does not happen.
This change should fix failures like the one seen in
https://lab.llvm.org/buildbot/#/builders/16/builds/49592.
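The property relied on here can be shown with a small standalone sketch. `ToyBooking` and `sortByCycle` are hypothetical names, and the sketch uses `std::stable_sort` from the standard library rather than the exact call made in the patch.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Toy record of a resource booked at a given cycle.
struct ToyBooking {
  std::string Name;
  unsigned Cycle;
};

// Sort by cycle only. std::stable_sort keeps entries with equal cycles in
// their original relative order, so the dump order is deterministic.
void sortByCycle(std::vector<ToyBooking> &Bookings) {
  std::stable_sort(Bookings.begin(), Bookings.end(),
                   [](const ToyBooking &A, const ToyBooking &B) {
                     return A.Cycle < B.Cycle;
                   });
}
```

An unstable sort is free to emit the two resources booked in the same cycle in either order, which is what made the trace checks flaky.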
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D152800
This commit reworks the methods that dump traces with resource usage to take into account the StartAtCycle value added by https://reviews.llvm.org/D150310.
For each i, the values of the lists StartAtCycle and ReservedCycles are printed as the interval [StartAtCycle[i], ReservedCycles[i]):
```
... | StartAtCycle[i] | ... | ReservedCycles[i] - 1 | ReservedCycles[i] | ...
    | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx |                   |
```
Reviewed By: andreadb
Differential Revision: https://reviews.llvm.org/D150311
Re-landing the code that was reverted because of the buildbot failure
in https://lab.llvm.org/buildbot#builders/9/builds/27319.
Original commit message
======================
The class `ResourceSegments` is used to keep track of the intervals
that represent resource usage of a list of instructions that are
being scheduled by the machine scheduler.
The collection is made of intervals that are closed on the left and
open on the right (represented by the standard notation `[a, b)`).
These collections of intervals can be extended by `add`ing new
intervals accordingly while scheduling a basic block.
Unit tests are added to verify the possible configurations of
intervals, and the relative possibility of scheduling a new
instruction in these configurations. Specifically, the methods
`getFirstAvailableAtFromBottom` and `getFirstAvailableAtFromTop` are
tested to make sure that both bottom-up and top-down scheduling work
when tracking resource usage across the basic block with
`ResourceSegments`.
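The half-open convention is what allows back-to-back bookings. A minimal sketch of the overlap test, using the hypothetical names `ToyInterval` and `overlaps` rather than the actual `ResourceSegments` interface:

```cpp
#include <cassert>
#include <utility>

// Half-open interval [first, second).
using ToyInterval = std::pair<unsigned, unsigned>;

// Two non-empty half-open intervals overlap when each one starts before the
// other one ends. Intervals that merely touch, such as [0, 3) and [3, 5),
// do not overlap.
bool overlaps(ToyInterval A, ToyInterval B) {
  assert(A.first < A.second && B.first < B.second &&
         "intervals must be non-empty");
  return A.first < B.second && B.first < A.second;
}
```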
Note that the scheduler tracks resource usage with two methods:
1. counters (via `std::vector<unsigned> ReservedCycles;`);
2. intervals (via `std::map<unsigned, ResourceSegments> ReservedResourceSegments;`).
This patch can be considered a NFC test for existing scheduling models
because the tracking system that uses intervals is turned off by
default (field `bit EnableIntervals = false;` in the tablegen class
`SchedMachineModel`).
Reviewed By: andreadb
Differential Revision: https://reviews.llvm.org/D150312
Reverted because it produces the following buildbot failure at https://lab.llvm.org/buildbot#builders/9/builds/27319:
/b/ml-opt-rel-x86-64-b1/llvm-project/llvm/unittests/CodeGen/SchedBoundary.cpp: In member function ‘virtual void ResourceSegments_getFirstAvailableAtFromBottom_empty_Test::TestBody()’:
/b/ml-opt-rel-x86-64-b1/llvm-project/llvm/unittests/CodeGen/SchedBoundary.cpp:395:31: error: call of overloaded ‘ResourceSegments(<brace-enclosed initializer list>)’ is ambiguous
395 | auto X = ResourceSegments({});
| ^
This reverts commit dc312f0331309692e8d6e06e93b3492b6a40989f.
This is a rework of:
- rG13e77db2df94 (r328395; MVT)
Since `LowLevelType.h` has been restored to `CodeGen`, `MachineValueType.h`
can be restored as well.
Depends on D148767
Differential Revision: https://reviews.llvm.org/D149024
The traces are printed only for bottom-up and top-down scheduling
because the values of TopReadyCycle and BottomReadyCycle are
inconsistent when obtained via bidirectional scheduling (see
`BIDIRECTIONAL` checks in the test).
Differential Revision: https://reviews.llvm.org/D142529