- Add tests for computeOverflowFor*Sub functions
- extend the computeOverflowForSignedSub/computeOverflowForUnsignedSub
implementations with ConstantRange (#37109)
This is part of the work to address the D155472 regressions, there's a number of issues with generalizing this fold which is why I'm just adding SRL support atm.
Differential Revision: https://reviews.llvm.org/D159533
Noticed on D159533 and I've finally dealt with the x86 regressions - MatchingStackOffset wasn't peeking through AssertZext nodes while trying to find CopyFromReg/Load sources, it was only removing them if they were part of a (trunc (assertzext x)) pattern.
Reapplied after being reverted at 4389252c58b783ce5b - which should be addressed by D159537 / 6d2679992e58b
Noticed on D159533 and I've finally deal with the x86 regressions - MatchingStackOffset wasn't peeking through AssertZext nodes while trying to find CopyFromReg/Load sources, it was only removing them if they were part of a (trunc (assertzext x)) pattern.
Do so by extending `matchUnaryPredicate` to also work for
`ConstantFPSDNode` types then encapsulate the constant checks in a
lambda and pass it to `matchUnaryPredicate`.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D154868
We'll need to generalize this fold to check for any zero upperbits to address some of the D155472 regressions, but this exposes a number of issues. For now, just use the general MaskedValueIsZero test instead of the assertzext.
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
Followup to D59363 which failed to handle the icmp(X,undef) -> isTrueWhenEqual case - similar to llvm::ConstantFoldCompareInstruction
As discussed on the review, this is affecting some previously reduced test cases, but will also prevent reductions from relying on this inconsistent behaviour in the future.
Reapplied after reversion at e1e3c75c7dad72 with a tweak to the pseudo-probe-peep.ll test
Differential Revision: https://reviews.llvm.org/D158068
Followup to D59363 which failed to handle the icmp(X,undef) -> isTrueWhenEqual case - similar to llvm::ConstantFoldCompareInstruction
As discussed on the review, this is affecting some previously reduced test cases, but will also prevent reductions from relying on this inconsistent behaviour in the future.
Differential Revision: https://reviews.llvm.org/D158068
On PPC there are instructions to store element from vector(e.g.
stxsdx/stxsiwx), and these instructions can be leveraged to avoid tail
constant in memset and constant splat array initialization.
This patch tries to explore these opportunities.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D138883
We currently have log, log2, log10, exp and exp2 intrinsics. Add exp10
to fix this asymmetry. AMDGPU already has most of the code for f32
exp10 expansion implemented alongside exp, so the current
implementation is duplicating nearly identical effort between the
compiler and library which is inconvenient.
https://reviews.llvm.org/D157871
The CodeView `S_ARMSWITCHTABLE` debug symbol is used to describe the layout of a jump table, it contains the following information:
* The address of the branch instruction that uses the jump table.
* The address of the jump table.
* The "base" address that the values in the jump table are relative to.
* The type of each entry (absolute pointer, a relative integer, a relative integer that is shifted).
Together this information can be used by debuggers and binary analysis tools to understand what an jump table indirect branch is doing and where it might jump to.
Documentation for the symbol can be found in the Microsoft PDB library dumper: 0fe89a942f/cvdump/dumpsym7.cpp (L5518)
This change adds support to LLVM to emit the `S_ARMSWITCHTABLE` debug symbol as well as to dump it out (for testing purposes).
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D149367
That matches how such a SPLAT_VECTOR would have been type legalized
so assume it is ok to use for creating constants after type legalization.
Still need some improvements to SPLAT_VECTOR lowering.
This overlaps with some of what D158742 was trying to fix.
Reviewed By: luke
Differential Revision: https://reviews.llvm.org/D158870
We can work out the known bits for a given lane by concatenating the known bits of each scalar operand.
In the description of ISD::SPLAT_VECTOR_PARTS in ISDOpcodes.h it says that the
total size of the scalar operands must cover the output element size, but I've
added a stricter assertion here that the total width of the scalar operands
must be exactly equal to the element size. It doesn't seem to trigger, and I'm
not sure if there any targets that use SPLAT_VECTOR_PARTS for anything other
than v4i32 -> v2i64 splats.
We also need to include it in isTargetCanonicalConstantNode, otherwise
returning the known bits introduces an infinite combine loop.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D158852
This reverts commit 8d0c3db388143f4e058b5f513a70fd5d089d51c3.
Causes crashes, see comments in https://reviews.llvm.org/D149367.
Some follow-up fixes are also reverted:
This reverts commit 636269f4fca44693bfd787b0a37bb0328ffcc085.
This reverts commit 5966079cf4d4de0285004eef051784d0d9f7a3a6.
This reverts commit e7294dbc85d24a08c716d9babbe7f68390cf219b.
The CodeView `S_ARMSWITCHTABLE` debug symbol is used to describe the layout of a jump table, it contains the following information:
* The address of the branch instruction that uses the jump table.
* The address of the jump table.
* The "base" address that the values in the jump table are relative to.
* The type of each entry (absolute pointer, a relative integer, a relative integer that is shifted).
Together this information can be used by debuggers and binary analysis tools to understand what an jump table indirect branch is doing and where it might jump to.
Documentation for the symbol can be found in the Microsoft PDB library dumper: 0fe89a942f/cvdump/dumpsym7.cpp (L5518)
This change adds support to LLVM to emit the `S_ARMSWITCHTABLE` debug symbol as well as to dump it out (for testing purposes).
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D149367
Simple function which scalarizes Ops then ExtOrTruncs them according to function parameters
Differential Revision: https://reviews.llvm.org/D157733
Change-Id: Ie5215069228f7bf530cd2dbb4bd17cbf409e046a
1) Handle casts a bit more cleanly just with a loop rather than with
recursion.
2) Add additional cases for smin/smax
3 ) For shifts we can also deduce non-zero if the maximum shift amount
on the known 1s is non-zero.
Differential Revision: https://reviews.llvm.org/D156777
This helps simplify constant splats a little. Without this the code in
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L14072 always returns the
existing node.
Differential Revision: https://reviews.llvm.org/D157259
If the pre-truncated value was the same width as the extension, and the assertzext guarantees that the extended bits are already zero, then skip the zext/trunc 'zero_extend_inreg' pattern.
Addresses several regressions noticed in D155472
Use the maximum 64 for BitWidth of getVScaleRange to avoid returning an empty range.
the previous changes bring in a Buildbot failure because MinSVEVectorSize = MinSVEVectorSize.
error: explicitly assigning value of variable of type 'unsigned int' to itself [-Werror,-Wself-assign]
Reviewed By: sdesmalen, nikic, dmgreen
Differential Revision: https://reviews.llvm.org/D155708
Use the maximum 64 for BitWidth of getVScaleRange to
avoid returning an empty range.
Reviewed By: sdesmalen, nikic, dmgreen
Differential Revision: https://reviews.llvm.org/D155708
This mostly copies cases that already exist in ValueTracking, although
it skips the more complex ones. Those can be filled in as needed.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D149199
Add an intrinsic which returns the two pieces as multiple return
values. Alternatively could introduce a pair of intrinsics to
separately return the fractional and exponent parts.
AMDGPU has native instructions to return the two halves, but could use
some generic legalization and optimization handling. For example, we
should be able to handle legalization of f16 on older targets, and for
bf16. Additionally antique targets need a hardware workaround which
would be better handled in the backend rather than in library code
where it is now.
In NVPTX `ReplaceVectorLoad()`, i1 and i8 types are promoted to i16,
followed by a truncate operation. Thus, v2i8 (or v2i1) and v2i16 will
have the same VTList, which causes a collision in CSEMap.
To differentiate the original VTList, let's add the size in generating
an ID. Otherwise the compiler crashes in refineAlignment:
`MMO->getSize() == getSize() && "Size mismatch!"`
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D153712
I have fixed an existing DAGCombiner bug that caused the previous assertion failure.
See 7163539466d7e8930416e55dd9fd29891f8239f2.
Original message
We don't have VP_ANY_EXTEND or VP_SIGN_EXTEND_INREG yet so I've
deviated a little from the non-VP lowering.
My goal was to fix the crashes that occurs on these test cases without this patch.
Reviewed By: fakepaper56
Differential Revision: https://reviews.llvm.org/D152854
We don't have VP_ANY_EXTEND or VP_SIGN_EXTEND_INREG yet so I've
deviated a little from the non-VP lowering.
My goal was to fix the crashes that occurs on these test cases without this patch.
Reviewed By: fakepaper56
Differential Revision: https://reviews.llvm.org/D152854
Type legalization may need to promote the result to the same type
as the input. Instead of forming a vp_truncate with the same
source and dest type, don't create any vp_truncate.
Handling in getNode like is done for ISD::TRUNCATE.
This patch introduces the reduction intrinsic for floating point minimum
and maximum which has the same semantics (for NaN and signed zero) as
llvm.minimum and llvm.maximum.
Reviewed-By: nikic
Differential Revision: https://reviews.llvm.org/D152370