It is good to have a combined `divrem` instruction when the
`div` and `rem` are computed from identical input operands.
Some targets can lower them through a single expansion that
computes both division and remainder. It effectively reduces
the number of instructions than individually expanding them.
Reviewed By: arsenm, paquette
Differential Revision: https://reviews.llvm.org/D96013
If every element is extracted from a G_BUILD_VECTOR, pass through the source
registers. This is different to the extract(build_vector) combine because this
one tolerates multiple users as long as they're exhaustive.
Differential Revision: https://reviews.llvm.org/D97890
Recommit 4112299ee761a9b6a309c8ff4a7e75f8c8d8851b. Depends on
4c8fb7ddd6fa49258e0e9427e7345fb56ba522d4 which was reverted.
Combine zext(trunc x) to x when truncated bits are known to be zero.
Differential Revision: https://reviews.llvm.org/D96031
This combine tries to do inter-block hoisting of extends of G_PHIs, into the
originating blocks of the phi's incoming value. The idea is to expose further
optimization opportunities that are normally obscured by the PHI.
Some basic heuristics, and a target hook for AArch64 is added, to allow tuning.
E.g. if the extend is used by a G_PTR_ADD, it doesn't perform this combine
since it may be folded into the addressing mode during selection.
There are very minor code size improvements on AArch64 -Os, but the real benefit
is that it unlocks optimizations like AArch64 conditional compares on some
benchmarks.
Differential Revision: https://reviews.llvm.org/D95703
The builder was using the extend user as the insertion point, which meant that
we were incorrectly "moving" the load from its original position, and therefore
could violate memory operation ordering.
If the G_BR + G_BRCOND in this combine use the same MBB, then it will infinite
loop. Don't allow that to happen.
Differential Revision: https://reviews.llvm.org/D95895
This is a restricted version of the combine in `DAGCombiner::MatchLoadCombine`.
(See D27861)
This tries to recognize patterns like below (assuming a little-endian target):
```
s8* x = ...
s32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
->
s32 val = *((i32)a)
s8* x = ...
s32 val = a[3] | (a[2] << 8) | (a[1] << 16) | (a[0] << 24)
->
s32 val = BSWAP(*((s32)a))
```
(This patch also handles the big-endian target case as well, in which the first
example above has a BSWAP, and the second example above does not.)
To recognize the pattern, this searches from the last G_OR in the expression
tree.
E.g.
```
Reg Reg
\ /
OR_1 Reg
\ /
OR_2
\ Reg
.. /
Root
```
Each non-OR register in the tree is put in a list. Each register in the list is
then checked to see if it's an appropriate load + shift logic.
If every register is a load + potentially a shift, the combine checks if those
loads + shifts, when OR'd together, are equivalent to a wide load (possibly with
a BSWAP.)
To simplify things, this patch
(1) Only handles G_ZEXTLOADs (which appear to be the common case)
(2) Only works in a single MachineBasicBlock
(3) Only handles G_SHL as the bit twiddling to stick the small load into a
specific location
An IR example of this is here: https://godbolt.org/z/4sP9Pj (lifted from
test/CodeGen/AArch64/load-combine.ll)
At -Os on AArch64, this is a 0.5% code size improvement for CTMark/sqlite3,
and a 0.4% improvement for CTMark/7zip-benchmark.
Also fix a bug in `isPredecessor` which caused it to fail whenever `DefMI` was
the first instruction in the block.
Differential Revision: https://reviews.llvm.org/D94350
Returning int64_t was arbitrarily limiting for wide integer types, and
the functions should handle the full generality of the IR.
Also changes the full form which returns the originally defined
vreg. Add another wrapper for the common case of just immediately
converting to int64_t (arguably this would be useful for the full
return value case as well).
One possible issue with this change is some of the existing uses did
break without conversion to getConstantVRegSExtVal, and it's possible
some without adequate test coverage are now broken.
It's fairly common to need matchers for a specific constant value, or for
common idioms like finding a negated register.
Add
- `m_SpecificICst`, which returns true when matching a specific value..
- `m_ZeroInt`, which returns true when an integer 0 is matched.
- `m_Neg`, which returns when a register is negated.
Also update a few places which use idioms related to the new matchers.
Differential Revision: https://reviews.llvm.org/D91397
We can use KnownBitsAnalysis to cover cases when mask is not trivial. It can
also help with cases when mask is not constant but can still be folded into
one. Since 'and' is comutative we should treat both operands as possible
replacements.
Differential Revision: https://reviews.llvm.org/D90674
This sequence of instructions can be simplified if they are single use and
some operands are constants. Additional combines may be applied afterwards.
Differential Revision: https://reviews.llvm.org/D90223
Sequence of same shift instructions with constant operands can be combined into
a single shift instruction.
Differential Revision: https://reviews.llvm.org/D90217
https://reviews.llvm.org/D88060
This adds the following combines
1) build_vector formation from insert_vec_elts
2) insert_vec_elts (build_vector) -> build_vector
https://reviews.llvm.org/D88865
This adds a single combine for GlobalISel to fold:
ptradd (inttoptr C1) C2
Into:
C1 + C2
Additionally, a small test for AArch64 is added.
Patch by pnappa.
When the first operand is a null pointer we can avoid making a G_PTR_ADD and
make a G_INTTOPTR with the offset operand.
This helps us avoid making add with 0 later on for targets such as AMDGPU.
Differential Revision: https://reviews.llvm.org/D87140
When we see this:
```
%and = G_AND %x, %y
%xor = G_XOR %and, %y
```
Produce this:
```
%not = G_XOR %x, -1
%new_and = G_AND %not, %y
```
as long as we are guaranteed to eliminate the original G_AND.
Also matches all commuted forms. E.g.
```
%and = G_AND %y, %x
%xor = G_XOR %y, %and
```
will be matched as well.
Differential Revision: https://reviews.llvm.org/D88104
The shift amount type does not necessarily match the result type. This
was inserting a trunc from s32 to s32, which asserted. Just preserve
the original shift amount type which can be legalized later.
https://reviews.llvm.org/D86393
Patch adds five new `GICombinerRules`, one for each of the following unary
FP instrs: `G_FNEG`, `G_FABS`, `G_FPTRUNC`, `G_FSQRT`, and `G_FLOG2`. The
combine rules perform the FP operation on the constant operand and replace
the original instr with the result. Patch additionally adds new combiner
tests for the AArch64 target to test these new combiner rules.
test/CodeGen/AArch64/GlobalISel/combine-trunc.mir was failing
due to the different order for evaluating function arguments.
This patch updates the related code to fix the issue.
https://reviews.llvm.org/D87668
Patch adds two new GICombinerRules, one for G_MUL(X, 1) and another for G_MUL(X, -1).
G_MUL(X, 1) is an identity combine, and G_MUL(X, -1) gets replaced with G_SUB(0, X).
Patch additionally adds new combiner tests for the AArch64 target to test these
new combiner rules, as well as updates AMDGPU GISel tests.
Patch by mkitzan
Add a combiner helper that replaces G_UNMERGE where all the destination lanes
are dead except the first one with a G_TRUNC.
Differential Revision: https://reviews.llvm.org/D87174
Add a combiner helper that replaces G_UNMERGE of big constants into direct
use of smaller constants.
Differential Revision: https://reviews.llvm.org/D87166
https://reviews.llvm.org/D87554
Patch adds one new GICombinerRule for G_FABS. The combine rule folds G_FABS(G_FABS(X)) to G_FABS(X).
Patch additionally adds new combiner tests for the AArch64 target to test this new combiner rule.
Patch by mkitzan.
Add the matching and applying function to the combiner helper for
G_UNMERGE_VALUES(G_MERGE_VALUES).
This combine also supports any merge-like input nodes, like G_BUILD_VECTORS
and is robust against bitcasts in between int unmerge and merge nodes.
When the input type of the merge node and the output type of the unmerge
node are not the same, but the sizes are, the combine still applies but
creates bitcasts between the sources and the destinations instead of
reusing the destinations directly.
Long term, the artifact combiner should probably reuse that helper, but
as of today, it doesn't use any outside helper, so I kept it this way.
Differential Revision: https://reviews.llvm.org/D87117
This combine previously tried to take sequences like:
%cond = G_ICMP pred, a, b
G_BRCOND %cond, %truebb
G_BR %falsebb
%truebb:
...
%falsebb:
...
and by inverting the compare predicate and swapping branch targets, delete the
G_BR and instead have a single conditional branch to the falsebb. Since in an
earlier patch we have a combine to fold not(icmp) into just an inverted icmp,
we don't need this combine to do as much. This patch instead generalizes the
combine by just looking for:
G_BRCOND %cond, %truebb
G_BR %falsebb
%truebb:
...
%falsebb:
...
and then inverting the condition using a not (xor). The xor can be folded away
in a separate combine. This change also lets us avoid some optimization code
in the IRTranslator.
I also think that deleting G_BRs in the combiner is unnecessary. That's
something that targets can decide to do at selection time and could simplify
generic code in future.
Differential Revision: https://reviews.llvm.org/D86664
The post-index matcher, before it queries the target legality, walks uses
of some instructions which in pathological cases can be massive. Since
no targets actually support indexed loads yet, disable this to stop wasting
compile time on something which is going to fail anyway.
This is needed for an upcoming change to how we translate conditional branches
which might generate these.
Differential Revision: https://reviews.llvm.org/D86383
https://reviews.llvm.org/D83833
Patch adds two new GICombinerRules for G_SELECT. The rules include:
combining selects with undef comparisons into their first selectee value,
and to combine away selects with constant comparisons. Patch additionally
adds a new combiner test for the AArch64 target to test these new G_SELECT
combiner rules and the existing select_same_val combiner rule.
Patch by mkitzan
AArch64, X86 and Mips currently directly consumes these and custom
lowering to produce a libcall, but really these should follow the
normal legalization process through the libcall/lower action.
This produces less work for addressing mode matching. I think this is
safe since I don't think machine IR is supposed to give the same
aliasing properties as getelementptr in the IR.
shl ([sza]ext x, y) => zext (shl x, y).
Turns expensive 64 bit shifts into 32 bit if it does not overflow the
source type:
This is a port of an AMDGPU DAG combine added in
5fa289f0d8ff85b9e14d2f814a90761378ab54ae. InstCombine does this
already, but we need to do it again here to apply it to shifts
introduced for lowered getelementptrs. This will help matching
addressing modes that use 32-bit offsets in a future patch.
TableGen annoyingly assumes only a single match data operand, so
introduce a reusable struct. However, this still requires defining a
separate GIMatchData for every combine which is still annoying.
Adds a morally equivalent function to the existing
getShiftAmountTy. Without this, we would have to do try to repeatedly
query the legalizer info and guess at what type to use for the shift.
If we have a mask, and a value x, where (x & mask) == x, we can drop the AND
and just use x.
This is about a 0.4% geomean code size improvement on CTMark at -O3 for AArch64.
In AArch64, this is most useful post-legalization. Patterns like this often
show up when legalizing s1s, which must be extended to larger types.
e.g.
```
%cmp:_(s32) = G_ICMP ...
%and:_(s32) = G_AND %cmp, 1
```
Since G_ICMP only produces a single bit, there's no reason to mask it with the
G_AND.
Differential Revision: https://reviews.llvm.org/D85463