478 Commits

Author SHA1 Message Date
Kazu Hirata
735ab61ac8
[CodeGen] Remove unused includes (NFC) (#115996)
Identified with misc-include-cleaner.
2024-11-12 23:15:06 -08:00
Akshat Oke
4e32d7236b
[NewPM][CodeGen] Port LiveRegMatrix to NPM (#109938) 2024-10-22 15:28:04 +05:30
Akshat Oke
93802815ab
[NewPM][CodeGen] Port VirtRegMap to NPM (#109936) 2024-10-22 15:15:56 +05:30
Bevin Hansson
1a65d95d00
[CodeGen][RAGreedy] Inform LiveDebugVariables about snippets spilled by InlineSpiller. (#109962)
RAGreedy invokes InlineSpiller to spill a particular virtreg inline.
When the spiller does this, it also identifies small, adjacent liveranges called
snippets. These are also spilled or rematerialized in the process.

However, the spiller does not inform RA that it has spilled these regs.
This means that debug variable locations referencing these regs/ranges
are lost.

Mark any spilled regs which do not have a stack slot assigned to them as
allocated to the slot being spilled to to tell LDV that those regs are
located in that slot, even though the regs might no longer exist in the
program after regalloc is finished. Also, inform RA about all of the
regs which were replaced (spilled or rematted), not just the one that was
requested so that it can properly manage the ranges of the debug vars.
2024-10-02 10:29:56 +02:00
Matt Arsenault
71ca9fcb8d
llvm-reduce: Don't print verifier failed machine functions (#109673)
This produces far too much terminal output, particularly for the
instruction reduction. Since it doesn't consider the liveness of of
the instructions it's deleting, it produces quite a lot of verifier
errors.
2024-09-24 22:32:53 +04:00
Christudasan Devadasan
15b41d207e
[CodeGen] change prototype of regalloc filter function (#93525)
[CodeGen] Change the prototype of regalloc filter function

Change the prototype of the filter function so that we can
filter not just by RegClass. We need to implement more
complicated filter based upon some other info associated
with each register.

Patch provided by: Gang Chen (gangc@amd.com)
2024-07-22 16:49:39 +05:30
Kazu Hirata
66cd2e0f9a
[CodeGen] Use range-based for loops (NFC) (#98706) 2024-07-13 13:29:47 -07:00
paperchalice
099899961c
[CodeGen][NewPM] Port machine-block-freq to new pass manager (#98317)
- Add `MachineBlockFrequencyAnalysis`.
- Add `MachineBlockFrequencyPrinterPass`.
- Use `MachineBlockFrequencyInfoWrapperPass` in legacy pass manager.
- `LazyMachineBlockFrequencyInfo::print` is empty, drop it due to new
pass manager migration.
2024-07-12 15:45:01 +08:00
paperchalice
abde52aa66
[CodeGen][NewPM] Port LiveIntervals to new pass manager (#98118)
- Add `LiveIntervalsAnalysis`.
- Add `LiveIntervalsPrinterPass`.
- Use `LiveIntervalsWrapperPass` in legacy pass manager.
- Use `std::unique_ptr` instead of raw pointer for `LICalc`, so
destructor and default move constructor can handle it correctly.

This would be the last analysis required by `PHIElimination`.
2024-07-10 19:34:48 +08:00
paperchalice
4010f894a1
[CodeGen][NewPM] Port SlotIndexes to new pass manager (#97941)
- Add `SlotIndexesAnalysis`.
- Add `SlotIndexesPrinterPass`.
- Use `SlotIndexesWrapperPass` in legacy pass.
2024-07-09 12:09:11 +08:00
paperchalice
79d0de2ac3
[CodeGen][NewPM] Port machine-loops to new pass manager (#97793)
- Add `MachineLoopAnalysis`.
- Add `MachineLoopPrinterPass`.
- Convert to `MachineLoopInfoWrapperPass` in legacy pass manager.
2024-07-09 09:11:18 +08:00
Alexis Engelke
739a960567
[RegAlloc] Don't call always-true ShouldAllocClass (#96296)
Previously, there was at least one virtual function call for every
allocated register. The only users of this feature are AMDGPU and RISC-V
(RVV), other targets don't use this. To easily identify these cases,
change the default functor to nullptr and don't call it for every
allocated register.
2024-06-21 13:18:35 +02:00
paperchalice
837dc542b1
[CodeGen][NewPM] Split MachineDominatorTree into a concrete analysis result (#94571)
Prepare for new pass manager version of `MachineDominatorTreeAnalysis`.
We may need a machine dominator tree version of `DomTreeUpdater` to
handle `SplitCriticalEdge` in some CodeGen passes.
2024-06-11 21:27:14 +08:00
Piyou Chen
32cb3c5508
[NFC][LLVM][CodeGen] Move LiveDebugVariables.h into llvm/include/llvm/CodeGen (#88374)
This patch make `LiveDebugVariables` can be used by passes outside of
`lib/CodeGen`.

If we run a pass that occurs between the split register allocation pass
without preserving this pass, it will be freed and recomputed until it
encounters the next pass that needs LiveDebugVariables.

However, `LiveDebugVariables` will raise an assertion due to the pass
being freed without emitting a debug value.

This is reason we need `LiveDebugVariables` to be available for passes
outside of lib/Codegen.
2024-04-15 21:58:57 +08:00
David Green
303a7835ff
[GreedyRA] Improve RA for nested loop induction variables (#72093)
Imagine a loop of the form:
```
  preheader:
    %r = def
  header:
    bcc latch, inner
  inner1:
    ..
  inner2:
    b latch
  latch:
    %r = subs %r
    bcc header
```

It can be possible for code to spend a decent amount of time in the
header<->latch loop, not going into the inner part of the loop as much.
The greedy register allocator can prefer to spill _around_ %r though,
adding spills around the subs in the loop, which can be very detrimental
for performance. (The case I am looking at is actually a very deeply
nested set of loops that repeat the header<->latch pattern at multiple
different levels).

The greedy RA will apply a preference to spill to the IV, as it is live
through the header block. This patch attempts to add a heuristic to
prevent that in this case for variables that look like IVs, in a similar
regard to the extra spill weight that gets added to variables that look
like IVs, that are expensive to spill. That will mean spills are more
likely to be pushed into the inner blocks, where they are less likely to
be executed and not as expensive as spills around the IV.

This gives a 8% speedup in the exchange benchmark from spec2017 when
compiled with flang-new, whilst importantly stabilising the scores to be
less chaotic to other changes. Running ctmark showed no difference in
the compile time. I've tried to run a range of benchmarking for
performance, most of which were relatively flat not showing many large
differences. One matrix multiply case improved 21.3% due to removing a
cascading chains of spills, and some other knock-on effects happen which
usually cause small differences in the scores.
2023-11-18 09:55:19 +00:00
Kazu Hirata
bafd35ca04 [llvm] Stop including llvm/ADT/SmallPtrSet.h (NFC)
Identified with clangd.
2023-11-11 00:35:14 -08:00
weiguozhi
b6043f9867
[RA] Disable split around hint register if optimize for size (#68619)
Split a virtual register with hint may generate COPY instructions in
multiple cold basic blocks, and increase code size. So disable this
split when the function is optimized for size.
2023-10-11 14:57:15 -07:00
Matthias Braun
2e26d09106
BlockFrequencyInfo: Add PrintBlockFreq helper (#67512)
- Refactor the (Machine)BlockFrequencyInfo::printBlockFreq functions
into a `PrintBlockFreq()` function returning a `Printable` object. This
simplifies usage as it can be directly piped to a `raw_ostream` like
`dbgs() << PrintBlockFreq(MBFI, Freq) << '\n';`.
- Previously there was an interesting behavior where
`BlockFrequencyInfoImpl` stores frequencies both as a `Scaled64` number
and as an `uint64_t`. Most algorithms use the `BlockFrequency`
abstraction with the integers, the print function for basic blocks
printed the `Scaled64` number potentially showing higher accuracy than
was used by the algorithm. This changes things to only print
`BlockFrequency` values.
- Replace some instances of `dbgs() << Freq.getFrequency()` with the new
function.
2023-10-05 18:26:50 -07:00
Matthias Braun
5181156b37
Use BlockFrequency type in more places (NFC) (#68266)
The `BlockFrequency` class abstracts `uint64_t` frequency values. Use it
more consistently in various APIs and disable implicit conversion to
make usage more consistent and explicit.

- Use `BlockFrequency Freq` parameter for `setBlockFreq`,
`getProfileCountFromFreq` and `setBlockFreqAndScale` functions.
- Return `BlockFrequency` in `getEntryFreq()` functions.
- While on it change some `const BlockFrequency& Freq` parameters to
plain `BlockFreqency Freq`.
- Mark `BlockFrequency(uint64_t)` constructor as explicit.
- Add missing `BlockFrequency::operator!=`.
- Remove `uint64_t BlockFreqency::getMaxFrequency()`.
- Add `BlockFrequency BlockFrequency::max()` function.
2023-10-05 11:40:17 -07:00
Matt Arsenault
7252787dd9 RegAllocGreedy: Fix detection of lanes read by a bundle
SplitKit creates questionably formed bundles of copies
when it needs to copy a subset of live lanes and can't do
it with a single subregister index. These are merely marked
as part of a bundle, and don't start with a BUNDLE instruction.
Queries for the slot index would give the first copy in the
bundle, and we need to inspect the operands of all the other
bundled copies.

Also fix and simplify detection of read lane subsets. This causes
some RISCV test regressions, but these look like accidentally beneficial
splits. I don't see a subrange based reason to perform these splits.

Avoids some really ugly regressions in a future patch.

https://reviews.llvm.org/D146859
2023-10-01 11:37:48 +03:00
weiguozhi
31f81e96a4
[RA] Don't split a register generated from another split (#67351)
Split a register generated from another split usually doesn't bring us
too much benefit. It may also cause dead loop as pr67188 shows if the
heuristic cost always satisfy the split condition. So prevent such
splitting.

It fixed pr67188.
2023-09-26 08:38:18 -07:00
Jay Foad
e0919b189b [CodeGen] Renumber slot indexes before register allocation (#66334)
RegAllocGreedy uses SlotIndexes::getApproxInstrDistance to approximate
the length of a live range for its heuristics. Renumbering all slot
indexes with the default instruction distance ensures that this estimate
will be as accurate as possible, and will not depend on the history of
how instructions have been added to and removed from SlotIndexes's maps.

This also means that enabling -early-live-intervals, which runs the
SlotIndexes analysis earlier, will not cause large amounts of churn due
to different register allocator decisions.
2023-09-19 11:18:12 +01:00
Guozhi Wei
cbdccb30c2 [RA] Split a virtual register in cold blocks if it is not assigned preferred physical register
If a virtual register is not assigned preferred physical register, it means some
COPY instructions will be changed to real register move instructions. In this
case we can try to split the virtual register in colder blocks, if success, the
original COPY instructions can be deleted, and the new COPY instructions in
colder blocks will be generated as register move instructions. It results in
fewer dynamic register move instructions executed.

The new test case split-reg-with-hint.ll gives an example, the hot path contains
24 instructions without this patch, now it is only 4 instructions with this
patch.

Differential Revision: https://reviews.llvm.org/D156491
2023-09-15 19:52:50 +00:00
Matt Arsenault
4d42e8b5d1 Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd.

The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work
around the underlying problem with SUBREG_TO_REG.
2023-07-31 20:15:45 -04:00
Vitaly Buka
a496c8be6e Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
And dependent commits.

Details in D150388.

This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c.
This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e.
This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826.
This reverts commit b7836d856206ec39509d42529f958c920368166b.

No conflicts in the code, few tests had conflicts in autogenerated CHECKs:
llvm/test/CodeGen/Thumb2/mve-float32regloops.ll
llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll

Reviewed By: alexfh

Differential Revision: https://reviews.llvm.org/D156381
2023-07-26 22:13:32 -07:00
Yashwant Singh
b7836d8562 [CodeGen]Allow targets to use target specific COPY instructions for live range splitting
Replacing D143754. Right now the LiveRangeSplitting during register allocation uses
TargetOpcode::COPY instruction for splitting. For AMDGPU target that creates a
problem as we have both vector and scalar copies. Vector copies perform a copy over
a vector register but only on the lanes(threads) that are active. This is mostly sufficient
however we do run into cases when we have to copy the entire vector register and
not just active lane data. One major place where we need that is live range splitting.

Allowing targets to use their own copy instructions(if defined) will provide a lot of
flexibility and ease to lower these pseudo instructions to correct MIR.

- Introduce getTargetCopyOpcode() virtual function and use if to generate copy in Live range
 splitting.
- Replace necessary MI.isCopy() checks with TII.isCopyInstr() in register allocator pipeline.

Reviewed By: arsenm, cdevadas, kparzysz

Differential Revision: https://reviews.llvm.org/D150388
2023-07-07 22:29:50 +05:30
Matt Arsenault
c3b27c236d RegAllocGreedy: Fix assert with remarks on unassigned subregisters
This tried to query the physical subregister on virtual registers
if they were left unassigned.
2023-06-25 19:26:25 -04:00
Sergei Barannikov
b70b96c0f5 [RegAlloc] Simplify RegAllocEvictionAdvisor::canReassign (NFC)
Use range-based for loops.
The return type has been changed to bool because the method is only
used in boolean contexts.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D152665
2023-06-16 05:39:56 +03:00
Sergei Barannikov
aa2d0fbc30 [MC] Add MCRegisterInfo::regunits for iteration over register units
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D152098
2023-06-16 05:39:50 +03:00
Matt Arsenault
ce6c36bab5 RegAllocGreedy: Don't use Register reference 2023-03-17 15:22:13 -04:00
Craig Topper
e72ca520bb [CodeGen] Remove uses of Register::isPhysicalRegister/isVirtualRegister. NFC
Use isPhysical/isVirtual methods.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D141715
2023-01-13 14:38:08 -08:00
serge-sans-paille
38818b60c5
Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part
Use deduction guides instead of helper functions.

The only non-automatic changes have been:

1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There was a few similar situation across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).

Per reviewers' comment, some useless makeArrayRef have been removed in the process.

This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.

Differential Revision: https://reviews.llvm.org/D140955
2023-01-05 14:11:08 +01:00
Fangrui Song
67819a72c6 [CodeGen] llvm::Optional => std::optional 2022-12-13 09:06:36 +00:00
Kazu Hirata
998960ee1f [CodeGen] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 20:36:08 -08:00
Kazu Hirata
20d764aff0 [llvm] Don't including SetVector.h (NFC)
llvm/lib/ProfileData/RawMemProfReader.cpp uses SetVector without
including SetVector.h, so this patch adds an appropriate #include
there.
2022-09-17 12:36:43 -07:00
Matt Arsenault
63d1d37d35 RegAllocGreedy: Avoid overflowing priority bitfields
The class priority is expected to be at most 5 bits before it starts
clobbering bits used for other fields. Also clamp the instruction
distance in case we have millions of instructions.

AMDGPU was accidentally overflowing into the global priority bit in
some cases. I think in principal we would have wanted this, but in the
cases I've looked at, it had the counter intuitive effect and
de-prioritized the large register tuple.

Avoid using weird bit hack PPC uses for global priority. The
AllocationPriority field is really 5 bits, and PPC was relying on
overflowing this to 6-bits to forcibly set the global priority
bit. Split this out as a separate flag to avoid having magic behavior
for values above 31.
2022-09-15 10:38:40 -04:00
Aiden Grossman
eec183c171 [nfc] Refactor SlotIndex::getInstrDistance to better reflect actual functionality
This patch refactors SlotIndex::getInstrDistance to
SlotIndex::getApproxInstrDistance to better describe the actual
functionality of this function. This patch also adds in some additional
comments better documenting the assumptions that this function makes to
increase clarity.

Based on discussion on the LLVM Discourse:
https://discourse.llvm.org/t/odd-behavior-in-slotindex-getinstrdistance/64934/5

Reviewed By: mtrofin, foad

Differential Revision: https://reviews.llvm.org/D133386
2022-09-12 23:33:35 +00:00
Matt Arsenault
e30271169f RegAllocGreedy: Try local instruction splitting with subranges
This was only trying this to relax register class constraints, but
this can also help if there are subranges involved.

This solves a compilation failure for AMDGPU when there is high
pressure created by large register tuples. If one virtual register is
using most of the available budget, we need to be able to evict
subranges.

This solves the immediate failure, but this solution leaves a lot to
be desired. In the relevant testcases, we have 32-element tuples but
most of the uses are operations on 1 element subranges of it. What
we're now getting is a spill and restore of the full 1024 bits and an
extract of the used 32-bits. It would be far better if we introduced a
copy to a new virtual register with a smaller register class and used
narrower spills.

Furthermore, we could probably do a better job if the allocator were
to introduce new subranges where none previously existed in the
highest pressure scenarios. The block and region splits should also
try to split specific subranges out.

The mve-vst3.ll test changes looks like noise to me, but instruction
count increased by one. mve-vst4.ll looks like a solid improvement
with several 16-byte spills eliminated. splitkit-copy-live-lanes.mir
also shows a solid reduction in total spill count.

This could use more tests but it's pretty tiring to come up with cases
that fail on this.
2022-09-12 09:03:55 -04:00
Eric Wang
d8a2d3f7d4 [NFC][Regalloc] Introduce the RegAllocPriorityAdvisorAnalysis
This patch introduces the priority analysis and the priority advisor,
the default implementation, and the scaffolding for introducing the
other implementations of the advisor.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D132835
2022-09-08 07:50:03 -07:00
Eric Wang
ad8eb85545 [NFC][MLGO] ML Regalloc Priority Advisor
This patch introduces the priority analysis and the priority advisor,
the default implementation, and the scaffolding for introducing the
other implementations of the advisor.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D131220
2022-08-18 15:33:48 -07:00
Matthias Braun
19ce5e515f RAGreedyStats: Ignore identity COPYs; count COPYs from/to physregs
Improve copy statistics:

- Count copies from or to physical registers: They are used to model function parameters and calling conventions and the register allocator optimizes for them.
- Check physical registers assigned to virtual registers and stop counting "identity" `COPY`s where source and destination is the same physical registers; they will be removed in the `virtregmap` pass anyway.

Differential Revision: https://reviews.llvm.org/D131932
2022-08-17 12:53:29 -07:00
Matt Arsenault
62531518f9 RegAllocGreedy: Add a command line flag for reverseLocalAssignment
Introduce a flag like for some of the other target heuristic controls
to help with experimentation.
2022-07-25 15:47:15 -04:00
Matt Arsenault
8d0383eb69 CodeGen: Remove AliasAnalysis from regalloc
This was stored in LiveIntervals, but not actually used for anything
related to LiveIntervals. It was only used in one check for if a load
instruction is rematerializable. I also don't think this was entirely
correct, since it was implicitly assuming constant loads are also
dereferenceable.

Remove this and rely only on the invariant+dereferenceable flags in
the memory operand. Set the flag based on the AA query upfront. This
should have the same net benefit, but has the possible disadvantage of
making this AA query nonlazy.

Preserve the behavior of assuming pointsToConstantMemory implying
dereferenceable for now, but maybe this should be changed.
2022-07-18 17:23:41 -04:00
Kazu Hirata
c0fe37de04 [CodeGen] Remove redundant declaration createGreedyRegisterAllocator (NFC)
The function is declared in llvm/include/llvm/CodeGen/Passes.h.

Identified with readability-redundant-declaration.
2022-07-16 15:43:34 -07:00
Kazu Hirata
4d9d07c5fb [CodeGen] Use RegClassFilterFunc where appropriate (NFC) 2022-07-16 15:43:33 -07:00
Luo, Yuanke
fa8656d28d [greedyalloc] Return early when there is no register to allocate.
In X86 we split greddy register allocation into 2 passes. The 1st pass
is to allocate tile register, and the 2nd pass is to allocate the rest
of virtual register. In most cases there is no tile register, so the 1st
pass is unnecessary. To improve the compiling time, we check if there is
any register need to be allocated by invoking callback
`ShouldAllocateClass`. If there is no register to be allocated, just
return false in the pass. This would improve the 1st greed RA pass for
normal cases.

Differential Revision: https://reviews.llvm.org/D128804
2022-06-30 11:12:05 +08:00
Serguei Katkov
095bf6be28 [Greedy RegAlloc] Fix the handling of split register in last chance re-coloring.
This is a fix for https://github.com/llvm/llvm-project/issues/55827.

When register we are trying to re-color is split the original register (we tried to recover)
has no uses after the split. However in rollback actions we assign back physical register to it.
Later it causes different assertions. One of them is in attached test.

This CL fixes this by avoiding assigning physical register back to register which has no usage
or its live interval now is empty.

Reviewed By: arsenm, qcolombet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D127281
2022-06-14 12:04:17 +07:00
Kazu Hirata
23d9ca10ae [CodeGen] Remove EvictionTrack (NFC)
The last of getEvictor use was removed on Jun 5, 2022 in commit
5c06f7168fd1bd589b831cacd5f1cb8a928446fb, which was itself a patch to
remove unused code.

Once we remove getEvictor, EvictionTrack becomes a write-only data
structure.  The data in it won't affect compilation, so the entire
class is essentially dead.
2022-06-13 07:21:29 -07:00
Kazu Hirata
5c06f7168f [CodeGen] Remove splitCanCauseEvictionChain and its helpers (NFC)
The last use was removed on Mar 7, 2022 in commit
294eca35a00f89dff474044ebd478a7f83ccc310.
2022-06-05 20:22:47 -07:00
Qunyan Mangus
12bae5f3e2 Remove duplicate fields in RAGreedy
RAGreedy has two fields of RegisterClassInfo, one called RCI and another RegClassInfo from its base class.
RCI is initialized without freezeReservedRegs first, while RegClassInfo does. Therefore, if reserved registers
information is changed between last time freezeReservedRegs is called and RAGreedy, it's not picked up by RCI.
Instead of having both fields in RAGreedy, remove RCI and use RegClassInfo instead. Also removed is the TRI field
which is present in its base class.

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D125926
2022-05-23 13:08:25 -07:00