Fixes#82659
There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411.
Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact.
After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.
When a whole register is added a basic block's liveins, use
LaneBitmask::getAll for the live lanes instead of trying to calculate an
accurate mask of the lanes that comprise the register.
This simplifies the code and matches other places where a whole register
is marked as livein.
This also avoids problems when regunits that are synthesized by TableGen
to represent ad hoc aliasing have a lane mask of 0.
Fixes#78942
When doing sink-and-fold, the MachineSink clears the "killed" flags of
the operands of the sunk (and deleted) instruction. However, this is not
always sufficient. In some cases we can create the new load/store
instruction with operands other than the ones present in the deleted
instruction. One such example is folding a zero word extend into a
memory load on AArch64. The zero-extend is represented by a pair of
instructions - `MOV` (i.e. `ORRwrs`) followed by a `SUBREG_TO_REG`. The
`SUBREG_TO_REG` is deleted (it is the sunk instruction), but the new
load instruction mentions operands "killed" in the `MOV`, which is no
longer correct.
To fix this, clear the "killed" flags of the registers participating in
the addressing mode.
Somewhat similar to ef9bcace834e63f25bbbc5e8e2b615f89d85fb2f
([MachineSink][AArch64] Preserve debug location when rematerialising
an instruction to replace a COPY (#72685))
reuse the debug location of the COPY, iff the rematerialised instruction
did not have a location.
Fixes a regression in `DebugInfo/AArch64/constant-dbgloc.ll` after
enabling sink-and-fold.
After performing sink-and-fold over a COPY, the original instruction is
replaced with one that produces its output in the destination of the
copy. Its value is still available (in a hard register), so if there are
debug instructions which refer to the (now deleted) virtual register
they could be updated to refer to the hard register, in principle.
However, it's not clear how to do that, moreover in some cases the debug
instructions may need to be replicated proportionally to the number of
the COPY instructions replaced and in some extreme cases we can end up
with quadratic increase in the number of debug instructions, e.g:
int f(int);
void g(int x) {
int y = x + 1;
int t0 = y;
f(t0);
int t1 = y;
f(t1);
}
After the SinkAndFold optimization was enabled, we saw some crashes with
GISel due to SinkAndFold erasing an MI while a reference was being held in a
cache.
Temporal divergence that was present in input or introduced in IR
transforms, like code-sinking or LICM, is handled in SIFixSGPRCopies
by changing sgpr source instr to vgpr instr.
After 5b657f5, that moved LICM after AMDGPUCodeGenPrepare,
machine-sinking can introduce temporal divergence by sinking
instructions outside of the cycle.
Add isSafeToSink callback in TargetInstrInfo.
There were a couple of issues with maintaining register def/uses held
in `MachineRegisterInfo`:
* when an operand is changed from one register to another, the
corresponding instruction must already be inserted into the function,
or MRI won't be updated
* when traversing the set of all uses of a register, that set must not
change
This patch adds a new code transformation to the `MachineSink` pass,
that tries to sink copies of an instruction, when the copies can be folded
into the addressing modes of load/store instructions, or
replace another instruction (currently, copies into a hard register).
The criteria for performing the transformation is that:
* the register pressure at the sink destination block must not
exceed the register pressure limits
* the latency and throughput of the load/store or the copy must not deteriorate
* the original instruction must be deleted
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D152828
This fixes sinking a VGPR def out of a loop past the reconvergence
point at the SI_END_CF. There was a prior fix which introduced
blockPrologueInterferes (D121277) to fix the same basic problem for
the post RA sink. This also had the special case isIgnorableUse case
which was incorrect, because in some contexts the exec use is not
ignorable.
I'm thinking about a new way to represent this which will avoid
needing hasIgnorableUse and isBasicBlockPrologue, which would function
more like the exception handling.
Fixes: SWDEV-407790
https://reviews.llvm.org/D155343
An instruction should be sunk (if otherwise legal and profitable) regardless
of if it has a dead def of a physreg or not. Physreg defs are checked in other
places and sinking is only done with dead defs of regs that are not live into
the target MBB.
Differential Revision: https://reviews.llvm.org/D150447
Reviewed By: sebastian-ne, arsenm
This change initializes the members TSI, LI, DT, PSI, and ORE pointer feilds of the SelectOptimize class to nullptr.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D148303
This is a helper function to very slightly simplify many calls to
MachineInstruction::getOperandNo.
Differential Revision: https://reviews.llvm.org/D143250
This patch makes two notable changes to the MIR debug info representation,
which result in different MIR output but identical final DWARF output (NFC
w.r.t. the full compilation). The two changes are:
* The introduction of a new MachineOperand type, MO_DbgInstrRef, which
consists of two unsigned numbers that are used to index an instruction
and an output operand within that instruction, having a meaning
identical to first two operands of the current DBG_INSTR_REF
instruction. This operand is only used in DBG_INSTR_REF (see below).
* A change in syntax for the DBG_INSTR_REF instruction, shuffling the
operands to make it resemble DBG_VALUE_LIST instead of DBG_VALUE,
and replacing the first two operands with a single MO_DbgInstrRef-type
operand.
This patch is the first of a set that will allow DBG_INSTR_REF
instructions to refer to multiple machine locations in the same manner
as DBG_VALUE_LIST.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D129372
If an instruction is sunk into a loop then any kill flags on
operands declared outside the loop must be cleared as these
will be live for all loop iterations.
Fixes#46827
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D126754
The use operand may be undefined. In that case we can just continue to
check the next operand since it won't increase register pressure.
Differential Revision: https://reviews.llvm.org/D127848
reapply 62a9b36fcf728b104ea87e6eb84c0be69b779df7 and fix module build
failue:
1: remove MachineCycleInfoWrapperPass in MachinePassRegistry.def
MachineCycleInfoWrapperPass is a anylysis pass, should not be there.
2: move the definition for MachineCycleInfoPrinterPass to cpp file.
Otherwise, there are module conflicit for MachineCycleInfoWrapperPass
in MachinePassRegistry.def and MachineCycleAnalysis.h after
62a9b36fcf728b104ea87e6eb84c0be69b779df7.
MachineCycle can handle irreducible loop. Natural loop
analysis (MachineLoop) can not return correct loop depth if
the loop is irreducible loop. And MachineSink is sensitive
to the loop depth, see MachineSinking::isProfitableToSinkTo().
This patch tries to use MachineCycle so that we can handle
irreducible loop better.
Reviewed By: sameerds, MatzeB
Differential Revision: https://reviews.llvm.org/D123995
MachineCycle can handle irreducible loop. Natural loop
analysis (MachineLoop) can not return correct loop depth if
the loop is irreducible loop. And MachineSink is sensitive
to the loop depth, see MachineSinking::isProfitableToSinkTo().
This patch tries to use MachineCycle so that we can handle
irreducible loop better.
Reviewed By: sameerds, MatzeB
Differential Revision: https://reviews.llvm.org/D123995
Sinking must check for interference between the block prologue
and the instruction being sunk.
Specifically check for clobbering of uses by the prologue, and
overwrites to prologue defined registers by the sunk instruction.
Reviewed By: rampitec, ruiling
Differential Revision: https://reviews.llvm.org/D121277
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169
This is an alternative to D120330, which disables MachineSink for
functions with irreducible cycles entirely. This avoids both the
correctness problem, and ensures we don't perform non-profitable
sinks into cycles. At the same time, it may also disable
profitable sinks in the same function. This can be made more
precise by using MachineCycleInfo in the future.
Fixes https://github.com/llvm/llvm-project/issues/53990.
Differential Revision: https://reviews.llvm.org/D120800
For AMDGPU the insertion point for a block may not be the first
non-PHI instruction. This happens when a block contains EXEC
mask manipulation related to control flow (converging lanes).
Use SkipPHIsAndLabels to determine the block insertion point
so that the target can skip any block prologue instructions.
Reviewed By: rampitec, ruiling
Differential Revision: https://reviews.llvm.org/D119399
For AMDGPU, any use of the physical register EXEC prevents sinking even if it is not a real physical register read. Add check to see if a physical
register use can be ignored for sinking.
Also perform same constant and ignorable physical register check when considering sinking in loops.
https://reviews.llvm.org/D116053
Register uses that are MRI->isConstantPhysReg() should not inhibit
sinking transformation.
Reviewed By: StephenTozer
Differential Revision: https://reviews.llvm.org/D111531
We did a experiment and observed dramatic decrease on compilation time which spent on clearing kill flags.
Before:
Number of BasicBlocks:33357
Number of Instructions:162067
Number of Cleared Kill Flags:32869
Time of handling kill flags(ms):1.607509e+05
After:
Number of BasicBlocks:33357
Number of Instructions:162067
Number of Cleared Kill Flags:32869
Time of handling kill flags:3.987371e+03
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D111688
This is a cleanup patch -- we're now able to support all flavours of
variable location in instruction referencing mode. This patch updates
various tests for debug instructions to be broader: numerous code paths
try to ignore debug isntructions, and they now have to ignore the
additional DBG_PHI and DBG_INSTR_REFs that we can generate.
A small amount of rework happens for LiveDebugVariables: as we don't need
to track live intervals through regalloc any more, we can get away with
unlinking debug instructions before regalloc, then re-inserting them after.
Note that this isn't (yet) true of DBG_VALUE_LISTs, they still have to go
through live interval tracking.
In SelectionDAG, add a helper lambda that emits half-formed DBG_INSTR_REFs
for arguments in instr-ref mode, DBG_VALUE otherwise. This is one of the
final locations where DBG_VALUEs are emitted for vreg arguments.
X86InstrInfo now un-sets the debug instr number on SUB instructions that
get mutated into CMP instructions. As the instruction no longer computes a
subtraction, we can't use it for variable locations.
Differential Revision: https://reviews.llvm.org/D88898