//===- RegAllocBase.h - basic regalloc interface and driver -----*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file defines the RegAllocBase class, which is the skeleton of a basic
// register allocation algorithm and interface for extending it. It provides the
// building blocks on which to construct other experimental allocators and test
// the validity of two principles:
//
// - If virtual and physical register liveness is modeled using intervals, then
//   on-the-fly interference checking is cheap. Furthermore, interferences can be
//   lazily cached and reused.
//
// - Register allocation complexity and generated code performance are
//   determined by the effectiveness of live range splitting rather than optimal
//   coloring.
//
// Following the first principle, interference checking revolves around the
// LiveIntervalUnion data structure.
//
// To fulfill the second principle, the basic allocator provides a driver for
// incremental splitting. It essentially punts on the problem of register
// coloring, instead driving the assignment of virtual to physical registers by
// the cost of splitting. The basic allocator allows for heuristic reassignment
// of registers, if a more sophisticated allocator chooses to do that.
//
// This framework provides a way to engineer the compile time vs. code
// quality trade-off without relying on a particular theoretical solver.
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_LIB_CODEGEN_REGALLOCBASE_H
#define LLVM_LIB_CODEGEN_REGALLOCBASE_H

#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
RegAlloc: Allow targets to split register allocation
AMDGPU normally spills SGPRs to VGPRs. Previously, since all register
classes are handled at the same time, this was problematic. We don't
know ahead of time how many registers will be needed to be reserved to
handle the spilling. If no VGPRs were left for spilling, we would have
to try to spill to memory. If the spilled SGPRs were required for exec
mask manipulation, it is highly problematic because the lanes active
at the point of spill are not necessarily the same as at the restore
point.
Avoid this problem by fully allocating SGPRs in a separate regalloc
run from VGPRs. This way we know the exact number of VGPRs needed, and
can reserve them for a second run. This fixes the most serious
issues, but it is still possible using inline asm to make all VGPRs
unavailable. Start erroring in the case where we ever would require
memory for an SGPR spill.
This is implemented by giving each regalloc pass a callback which
reports if a register class should be handled or not. A few passes
need some small changes to deal with leftover virtual registers.
In the AMDGPU implementation, a new pass is introduced to take the
place of PrologEpilogInserter for SGPR spills emitted during the first
run.
One disadvantage of this is that StackSlotColoring is currently no longer
used for SGPR spills. It would need to be run again, which will
require more work.
Error if the standard -regalloc option is used. Introduce new separate
-sgpr-regalloc and -vgpr-regalloc flags, so the two runs can be
controlled individually. PBQP is not currently supported, so this also
prevents using the unhandled allocator.
2018-09-27 09:36:28 +10:00
#include "llvm/CodeGen/RegAllocCommon.h"
#include "llvm/CodeGen/RegisterClassInfo.h"

namespace llvm {

class LiveInterval;
class LiveIntervals;
class LiveRegMatrix;
class MachineInstr;
class MachineRegisterInfo;
template<typename T> class SmallVectorImpl;
class Spiller;
class TargetRegisterInfo;
class VirtRegMap;

/// RegAllocBase provides the register allocation driver and interface that can
/// be extended to add interesting heuristics.
///
/// Register allocators must override the selectOrSplit() method to implement
/// live range splitting. They must also override enqueue/dequeue to provide an
/// assignment order.
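///
/// A minimal sketch of a subclass, assuming a hypothetical allocator named
/// MyRegAlloc. Its members (Queue, SpillerInstance) and its FIFO ordering are
/// illustrative assumptions and are not part of this interface:
/// \code
///   class MyRegAlloc : public RegAllocBase {
///     std::queue<const LiveInterval *> Queue; // Assumed FIFO assignment order.
///     Spiller *SpillerInstance = nullptr;     // Assumed to be set before use.
///
///     Spiller &spiller() override { return *SpillerInstance; }
///     void enqueueImpl(const LiveInterval *LI) override { Queue.push(LI); }
///     const LiveInterval *dequeue() override {
///       if (Queue.empty())
///         return nullptr;
///       const LiveInterval *LI = Queue.front();
///       Queue.pop();
///       return LI;
///     }
///     // Must make forward progress: return an available PhysReg or fill
///     // SplitVRegs with the new split virtual registers.
///     MCRegister selectOrSplit(const LiveInterval &VirtReg,
///                              SmallVectorImpl<Register> &SplitVRegs) override;
///   };
/// \endcode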
class RegAllocBase {
  virtual void anchor();

protected:
  const TargetRegisterInfo *TRI = nullptr;
  MachineRegisterInfo *MRI = nullptr;
  VirtRegMap *VRM = nullptr;
  LiveIntervals *LIS = nullptr;
  LiveRegMatrix *Matrix = nullptr;
  RegisterClassInfo RegClassInfo;

private:
  /// Private, callers should go through shouldAllocateRegister.
  const RegAllocFilterFunc shouldAllocateRegisterImpl;

protected:
  /// Inst which is a def of an original reg and whose defs are already all
  /// dead after remat is saved in DeadRemats. The deletion of such inst is
  /// postponed till all the allocations are done, so its remat expr is
  /// always available for the remat of all the siblings of the original reg.
  SmallPtrSet<MachineInstr *, 32> DeadRemats;

  RegAllocBase(const RegAllocFilterFunc F = nullptr)
      : shouldAllocateRegisterImpl(F) {}

  virtual ~RegAllocBase() = default;

  // A RegAlloc pass should call this before allocatePhysRegs.
  void init(VirtRegMap &vrm, LiveIntervals &lis, LiveRegMatrix &mat);

/// Get whether a given register should be allocated
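  ///
  /// When no filter was passed to the constructor, every register is allocated.
  /// A minimal sketch of a filter, assuming RegAllocFilterFunc accepts a
  /// callable taking (TRI, MRI, Reg) as the call below suggests, and assuming a
  /// hypothetical register class object MyTarget::MyRegClass that is not part
  /// of this interface:
  /// \code
  ///   RegAllocFilterFunc OnlyMyClass = [](const TargetRegisterInfo &TRI,
  ///                                       const MachineRegisterInfo &MRI,
  ///                                       Register Reg) {
  ///     // Handle only virtual registers of the one (hypothetical) class.
  ///     return MRI.getRegClass(Reg) == &MyTarget::MyRegClass;
  ///   };
  /// \endcode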
  bool shouldAllocateRegister(Register Reg) {
    if (!shouldAllocateRegisterImpl)
      return true;
    return shouldAllocateRegisterImpl(*TRI, *MRI, Reg);
  }

  // The top-level driver. The output is a VirtRegMap that is updated with
  // physical register assignments.
  void allocatePhysRegs();

  // Includes spiller post-optimization and removal of dead defs left because of
  // rematerialization.
  virtual void postOptimization();

  // Get a temporary reference to a Spiller instance.
  virtual Spiller &spiller() = 0;

  /// enqueueImpl - Add VirtReg to the priority queue of unassigned registers.
  virtual void enqueueImpl(const LiveInterval *LI) = 0;

  /// enqueue - Add VirtReg to the priority queue of unassigned registers.
  void enqueue(const LiveInterval *LI);

  /// dequeue - Return the next unassigned register, or NULL.
  virtual const LiveInterval *dequeue() = 0;

  // A RegAlloc pass should override this to provide the allocation heuristics.
  // Each call must guarantee forward progress by returning an available PhysReg
  // or new set of split live virtual registers. It is up to the splitter to
  // converge quickly toward fully spilled live ranges.
  virtual MCRegister selectOrSplit(const LiveInterval &VirtReg,
                                   SmallVectorImpl<Register> &splitLVRs) = 0;

  // Use this group name for NamedRegionTimer.
  static const char TimerGroupName[];
  static const char TimerGroupDescription[];

[RegAllocGreedy] Introduce a late pass to repair broken hints.
A broken hint is a copy where both ends are assigned different colors. When a
variable gets evicted in the neighborhood of such copies, it is likely we can
reconcile some of them.
** Context **
Copies are inserted during the register allocation via splitting. These split
points are required to relax the constraints on the allocation problem. When
such a point is inserted, both ends of the copy would not share the same color
with respect to the current allocation problem. When variables get evicted,
the allocation problem becomes different and some split point may not be
required anymore. However, the related variables may already have been colored.
This usually shows up in the assembly with a pattern like this:
def A
...
save A to B
def A
use A
restore A from B
...
use B
Whereas we could simply have done:
def B
...
def A
use A
...
use B
** Proposed Solution **
A variable having a broken hint is marked for late recoloring if and only if
selecting a register for it evicts another variable. Indeed, if no eviction
happens, it is pointless to look for recoloring opportunities, as it means the
situation was the same as the initial allocation problem where we had to break
the hint.
Finally, when everything has been allocated, we look for recoloring
opportunities for all the identified candidates.
The recoloring is performed very late to rely on accurate copy cost (all
involved variables are allocated).
The recoloring is simple, unlike the last chance recoloring. It propagates the
color of the broken hint to all its copy-related variables. If the color is
available for them, the recoloring uses it, otherwise it gives up on that hint
even if a more complex coloring would have worked.
The recoloring happens only if it is profitable. The profitability is evaluated
using the expected frequency of the copies of the currently recolored variable
with a) its current color and b) the target color. If a) is greater than or
equal to b), then it is profitable and the recoloring happens.
** Example **
Consider the following example:
BB1:
a =
b =
BB2:
...
= b
= a
Let us assume b gets split:
BB1:
a =
b =
BB2:
c = b
...
d = c
= d
= a
Because of how the allocation works, b, c, and d may be assigned different
colors. Now, assume a gets evicted to make room for c, and that b and d were
assigned to something different than a.
We end up with:
BB1:
a =
st a, SpillSlot
b =
BB2:
c = b
...
d = c
= d
e = ld SpillSlot
= e
It is likely that we can assign the same register to b, c, and d,
getting rid of 2 copies.
** Performances **
Both ARM64 and x86_64 show performance improvements of up to 3% for the
llvm-testsuite + externals with Os and O3. There are a few regressions too, which
come from the (in)accuracy of the block frequency estimate.
<rdar://problem/18312047>
llvm-svn: 225422
2015-01-08 01:16:39 +00:00
  /// Method called when the allocator is about to remove a LiveInterval.
  virtual void aboutToRemoveInterval(const LiveInterval &LI) {}

public:
  /// VerifyEnabled - True when -verify-regalloc is given.
  static bool VerifyEnabled;

private:
  void seedLiveRegs();
};

} // end namespace llvm

#endif // LLVM_LIB_CODEGEN_REGALLOCBASE_H