2591 Commits

Author SHA1 Message Date
Paul Walker
4197386dbd
[LLVM][SelectionDAG] Remove scalable vector restriction from poison analysis. (#102504)
The following functions have an early exit for scalable vectors:
  SelectionDAG::canCreateUndefOrPoison
  SelectionDAG:isGuaranteedNotToBeUndefOrPoison
    
The implementations of these don't look to be sensitive to the
vector type other than some uses of demanded elts analysis that
doesn't fully support scalable types.  That said the initial
calculation demands all elements and so I've followed the same
scheme as used by TargetLowering::SimplifyDemandedBits.
2024-08-13 12:53:20 +01:00
Simon Pilgrim
13d04fa560 [DAG] Add legalization handling for ABDS/ABDU (#92576) (REAPPLIED)
Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization.

abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs))
Alive2: https://alive2.llvm.org/ce/z/dVdMyv

REAPPLIED: Fix regression issue with "abs(ext(x) - ext(y)) -> zext(abd(x, y))" fold failing after type legalization
2024-08-08 11:39:05 +01:00
Simon Pilgrim
e4e96b3e26 Revert b1234ddbe2652aa7948242a57107ca7ab12fd2f8. "[DAG] Add legalization handling for ABDS/ABDU (#92576)"
Reverting #92576 while we identify a reported regression
2024-08-07 17:11:25 +01:00
Simon Pilgrim
b1234ddbe2
[DAG] Add legalization handling for ABDS/ABDU (#92576)
Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization.

abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs))
Alive2: https://alive2.llvm.org/ce/z/dVdMyv
2024-08-06 10:18:06 +01:00
Simon Pilgrim
1b85935453 Fix MSVC "not all control paths return a value" warning. NFC. 2024-08-06 09:38:09 +01:00
Luke Lau
33fc322696
[SelectionDAG] Simplify vselect true, T, F -> T (#100992)
This addresses a TODO where we can fold a vselect to it's true operand
if the boolean is known to be all trues, by factoring out the logic from
extractBooleanFlip which checks TLI.getBooleanContents.
2024-08-06 10:49:20 +08:00
Kazu Hirata
8d1b17b662
[CodeGen] Construct SmallVector with ArrayRef (NFC) (#101841) 2024-08-04 00:41:29 -07:00
Matt Arsenault
9843843c88
SelectionDAG: Do not propagate divergence through copy glue (#101210)
This fixes DAG divergence mishandling inline asm.

This was considering the glue nodes for divergence, when the
divergence should only come from the individual CopyFromRegs
that are glued. As a result, having any VGPR CopyFromRegs would
taint any uniform SGPR copies as divergent, resulting in SGPR
copies to VGPR virtual registers later.
2024-07-31 00:04:58 +04:00
Manish Kausik H
f6d1d6fe7b
[SelectionDAG] Use unaligned store to legalize EXTRACT_VECTOR_ELT type when Stack is non-realignable (#98176)
This patch ports the commit a6614ec5b7c1dbfc4b847884c5de780cf75e8e9c to
SelectionDAG TypeLegalization.

Fixes #98044

Co-authored-by: Manish Kausik H <hmamishkausik@gmail.com>
2024-07-29 15:23:36 +08:00
Vitaly Buka
455990d18f
Reland "SelectionDAG: Avoid using MachineFunction::getMMI" (#99779)
Reverts llvm/llvm-project#99777

Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
2024-07-24 10:38:53 +04:00
Simon Pilgrim
5bd38a98d7
[DAG] ComputeNumSignBits - subo_carry(x,x,c) -> bitwidth 'allsignbits' (#99935)
Handle cases where the subo_carry is subtracting the same operand (=zero) - so only the subtraction of the 0/1 carry bit is affecting the result, giving a 0/-1 allsignbits value.

Noticed while improving ABDS/ABDU expansion.
2024-07-23 11:49:12 +01:00
Vitaly Buka
98c0e55d9d
Revert "SelectionDAG: Avoid using MachineFunction::getMMI" (#99777)
Reverts llvm/llvm-project#99696

https://lab.llvm.org/buildbot/#/builders/164/builds/1262
2024-07-20 12:20:50 -07:00
Joseph Huber
615b7eeaa9 Reapply "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512)"
This reverts commit 740161a9b98c9920dedf1852b5f1c94d0a683af5.

I moved the `ISD` dependencies into the CodeGen portion of the handling,
it's a little awkward but it's the easiest solution I can think of for
now.
2024-07-20 09:29:31 -05:00
Matt Arsenault
c2019a37bd
SelectionDAG: Avoid using MachineFunction::getMMI (#99696) 2024-07-20 10:53:41 +04:00
NAKAMURA Takumi
740161a9b9 Revert "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512)"
This reverts commit c05126bdfc3b02daa37d11056fa43db1a6cdef69.
(llvmorg-19-init-17714-gc05126bdfc3b)
See #99610
2024-07-20 12:36:57 +09:00
Jie Fu
544c390aac [CodeGen] Fix -Wunused-variable in SelectionDAG.cpp (NFC)
/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:7560:9:
error: unused variable 'VecVT' [-Werror,-Wunused-variable]
    EVT VecVT = N1.getValueType();
        ^
1 error generated.
2024-07-17 21:20:55 +08:00
Lawrence Benson
177ce1900f
[LLVM] Add llvm.experimental.vector.compress intrinsic (#92289)
This PR adds a new vector intrinsic `@llvm.experimental.vector.compress`
to "compress" data within a vector based on a selection mask, i.e., it
moves all selected values (i.e., where `mask[i] == 1`) to consecutive
lanes in the result vector. A `passthru` vector can be provided, from
which remaining lanes are filled.

The main reason for this is that the existing
`@llvm.masked.compressstore` has very strong constraints in that it can
only write values that were selected, resulting in guard branches for
all targets except AVX-512 (and even there the AMD implementation is
_very_ slow). More instruction sets support "compress" logic, but only
within registers. So to store the values, an additional store is needed.
But this combination is likely significantly faster on many target as it
avoids branches.

In follow up PRs, my plan is to add target-specific lowerings for x86,
SVE, and possibly RISCV. I also want to combine this with a store
instruction, as this is probably a common case and we can avoid some
memory writes in that case.

See [discussion in
forum](https://discourse.llvm.org/t/new-intrinsic-for-masked-vector-compress-without-store/78663)
for initial discussion on the design.
2024-07-17 14:24:24 +02:00
Amara Emerson
f270a4dd66
[AArch64] Don't tail call memset if it would convert to a bzero. (#98969)
Well, not quite that simple. We can tc memset since it returns the first
argument but bzero doesn't do that and therefore we can end up
miscompiling.

This patch also refactors the logic out of isInTailCallPosition() into the callers.
As a result memcpy and memmove are also modified to do the same thing
for consistency.

rdar://131419786
2024-07-17 01:31:52 -07:00
Joseph Huber
c05126bdfc
[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512)
Summary:
The LTO pass and LLD linker have logic in them that forces extraction
and prevent internalization of needed runtime calls. However, these
currently take all RTLibcalls into account, even if the target does not
support them. The target opts-out of a libcall if it sets its name to
nullptr. This patch pulls this logic out into a class in the header so
that LTO / lld can use it to determine if a symbol actually needs to be
kept.

This is important for targets like AMDGPU that want to be able to use
`lld` to perform the final link step, but does not want the overhead of
uncalled functions. (This adds like a second to the link time trivially)
2024-07-16 06:22:09 -05:00
Volodymyr Vasylkun
afb584a562
[SelectionDAG] Ensure that we don't create UCMP/SCMP nodes with operands being scalars and result being a 1-element vector during scalarization (#98687)
This patch fixes a problem that existed before where in some situations
a `UCMP`/`SCMP` node which operated on 1-element vectors had a legal
result type (i.e. `v1i64` on AArch64), but illegal operands (i.e.
`v1i65`). This meant that operand scalarization was performed on the
node and the operands were changed to a legal scalar type, but the
result wasn't. This then led to `UCMP`/`SCMP` nodes with different
vector-ness of operands and result appearing in the SDAG. This patch
addresses this issue by fully scalarizing the `UCMP`/`SCMP` node and
then turning its result back into a 1-element vector using a
`SCALAR_TO_VECTOR` node.

It also adds several assertions to `SelectionDAG::getNode()` to avoid
this or a similar issue arising in the future. I wasn't sure if these
two changes are unrelated enough to warrant two small separate PRs, but
I'm happy to split this PR into two if that's deemed more appropriate.
2024-07-12 21:41:49 +01:00
Farzon Lotfi
0b58f34c98
[X86][CodeGen] Add base trig intrinsic lowerings (#96222)
This change is an implementation of
https://github.com/llvm/llvm-project/issues/87367's investigation on
supporting IEEE math operations as intrinsics.
Which was discussed in this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

This change adds constraint intrinsics and some lowering cases for
`acos`, `asin`, `atan`, `cosh`, `sinh`, and `tanh`.
The only x86 specific change was for f80.

https://github.com/llvm/llvm-project/issues/70079
https://github.com/llvm/llvm-project/issues/70080
https://github.com/llvm/llvm-project/issues/70081
https://github.com/llvm/llvm-project/issues/70083
https://github.com/llvm/llvm-project/issues/70084
https://github.com/llvm/llvm-project/issues/95966
    
The x86 lowering is going to be done in three pr changes with this being
the first.
A second PR will be put up for Loop Vectorizing and then SLPVectorizer.

The constraint intrinsics is also going to be in multiple parts, but
just 2.
This part covers just the llvm specific changes, part2 will cover clang
specifc changes and legalization for backends than have special
legalization
 requirements like aarch64 and wasm.
2024-07-11 15:58:43 -04:00
Luke Lau
baf22a527c [SelectionDAG] Handle vscale range wrapping in isKnownNeverZero
As pointed out by @preames, ConstantRange can wrap so it's possible
for zero to be in a range without zero being the minimum. This fixes
this by checking contains instead.
2024-07-09 23:05:22 +08:00
Bjorn Pettersson
c2fbc701aa [SelectionDAG] Let ComputeKnownSignBits handle (shl (ext X), C) (#97695)
Add simple support for looking through ZEXT/ANYEXT/SEXT when doing
ComputeKnownSignBits for SHL. This is valid for the case when all
extended bits are shifted out, because then the number of sign bits
can be found by analysing the EXT operand.

A future improvement could be to pass along the "shifted left by"
information in the recursive calls to ComputeKnownSignBits. Allowing
us to handle this more generically.
2024-07-05 22:37:26 +02:00
Luke Lau
e4b28420f6
[SelectionDAG] Handle VSCALE in isKnownNeverZero (#97789)
VSCALE is by definition greater than zero, but this checks it via
getVScaleRange anyway.

The motivation for this is to be able to check if the EVL for a VP
strided load is non-zero in #97394.

I added the tests to the RISC-V backend since the existing X86
known-never-zero.ll test crashed when trying to lower vscale for the
+sse2 RUN line.
2024-07-05 16:11:06 +08:00
Craig Topper
8419da8bd4
[SelectionDAG] Remove LegalTypes argument from getShiftAmountConstant. (#97653)
#97645 proposed to remove LegalTypes from getShiftAmountTy. This patches
removes it from getShiftAmountConstant which is one of the callers of
getShiftAmountTy.
2024-07-04 18:33:25 -07:00
Craig Topper
3141c11fe8
[SelectionDAG] Remove LegalTypes argument from getShiftAmountTy. NFC (#97757)
This argument is no longer used inside the function. Remove it from the
interface.
2024-07-04 15:24:54 -07:00
Yingwei Zheng
d5c9ffd545
[SDAG] Intersect poison-generating flags after CSE (#97434)
This patch fixes a miscompilation when `N` gets CSEed to `Existing`:
```
Existing: t5: i32 = sub nuw Constant:i32<0>, t3
N: t30: i32 = sub Constant:i32<0>, t3
```

Fixes https://github.com/llvm/llvm-project/issues/96366.
2024-07-03 20:32:46 +08:00
Youngsuk Kim
a95c85fba5
[llvm][CodeGen] Avoid 'raw_string_ostream::str' (NFC) (#97318)
Since `raw_string_ostream` doesn't own the string buffer, it is
desirable (in terms of memory safety) for users to directly reference
the string buffer rather than use `raw_string_ostream::str()`.

Work towards TODO comment to remove `raw_string_ostream::str()`.
2024-07-01 21:52:37 -04:00
Shilei Tian
9a4f57ec1e
[SelectionDAG] Use EVT::getIntegerVT in getBitcastedAnyExtOrTrunc (#96658)
`SelectionDAG::getBitcastedAnyExtOrTrunc` assumes that there is always a
valid
integer type corresponding to another type, which is not always true
when it
comes to vector type. For example, `<3 x i8>` doesn't have a
corresponding
integer type.

Fix SWDEV-464698.
2024-07-01 15:10:57 -04:00
Nikita Popov
f2f18459d4 Revert "Intrinsic: introduce minimumnum and maximumnum (#93841)"
As far as I can tell, this pull request was not approved, and
did not go through an RFC on discourse.

This reverts commit 89881480030f48f83af668175b70a9798edca2fb.
This reverts commit 225d8fc8eb24fb797154c1ef6dcbe5ba033142da.
2024-06-21 08:34:04 +02:00
YunQiang Su
8988148003
Intrinsic: introduce minimumnum and maximumnum (#93841)
Currently, on different platform, the behaivor of llvm.minnum is
different if one operand is sNaN:

When we compare sNaN vs NUM:

ARM/AArch64/PowerPC: follow the IEEE754-2008's minNUM: return qNaN.
RISC-V/Hexagon follow the IEEE754-2019's minimumNumber: return NUM. X86:
Returns NUM but not same with IEEE754-2019's minimumNumber as
     +0.0 is not always greater than -0.0.
MIPS/LoongArch/Generic: return NUM.
LIBCALL: returns qNaN.

So, let's introduce llvm.minmumnum/llvm.maximumnum, which always follow
IEEE754-2019's minimumNumber/maximumNumber.

Half-fix: #93033
2024-06-21 11:53:08 +08:00
Simon Pilgrim
1216e7045f [DAG] getNode - add value type assertions for AVG nodes. 2024-06-12 17:00:33 +01:00
Simon Pilgrim
346f16d504 [DAG] Move isNullConstantOrUndef helper to SelectionDAGNodes.h to allow other future uses. NFC. 2024-06-12 16:32:36 +01:00
Simon Pilgrim
53fecef1ec
[DAG] FoldConstantArithmetic - allow binop folding to work with differing bitcasted constants (#94863)
We currently only constant fold binop(bitcast(c1),bitcast(c2)) if c1 and c2 are both bitcasted and from the same type.

This patch relaxes this assumption to allow the constant build vector to originate from different types (and allow cases where only one operand was bitcasted).

We still ensure we bitcast back to one of the original types if both operand were bitcasted (we assume that if we have a non-bitcasted constant then its legal to keep using that type).
2024-06-09 11:30:05 +01:00
c8ef
b25b1db819
[KnownBits] Remove hasConflict() assertions (#94568)
Allow KnownBits to represent "always poison" values via conflict.

close: #94436
2024-06-07 17:01:22 +02:00
Matt Arsenault
84b026690d
DAG: Pass flags to FoldConstantArithmetic (#93663)
There is simply way too much going on inside getNode. The complicated
constant folding of vector handling works by looking for build_vector
operands, and then tries to getNode the scalar element and then checks
if
constants were the result. As a side effect, this produces unused scalar
operation nodes (previously, without flags). If the vector operation
were later scalarized, it would find the flagless constant folding
temporary and lose the flag. I don't think this is a reasonable way for
constant folding to operate, but for now fix this by ensuring flags
on the original operation are preserved in the temporary.
    
This yields a clear code improvement for AMDGPU when f16 isn't legal.
The Wasm cases switch from using a libcall to compare and select. We are
evidently
missing the fcmp+select to fminimum/fmaximum handling, but this would be
further
improved when that's handled. AArch64 also avoids the libcall, but looks
worse and
has a different call for some reason.
2024-06-06 16:44:07 +02:00
Farzon Lotfi
1d87433593
[x86] Add tan intrinsic part 4 (#90503)
This change is an implementation of #87367's investigation on supporting
IEEE math operations as intrinsics.
Which was discussed in this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294


Much of this change was following how G_FSIN and G_FCOS were used.

Changes:
- `llvm/docs/GlobalISel/GenericOpcode.rst` - Document the `G_FTAN`
opcode
-  `llvm/docs/LangRef.rst` - Document the tan intrinsic
- `llvm/include/llvm/Analysis/VecFuncs.def` - Associate the tan
intrinsic as a vector function similar to the tanf libcall.
- `llvm/include/llvm/CodeGen/BasicTTIImpl.h` - Map the tan intrinsic to
`ISD::FTAN`
- `llvm/include/llvm/CodeGen/ISDOpcodes.h` - Define ISD opcodes for
`FTAN` and `STRICT_FTAN`
-  `llvm/include/llvm/IR/Intrinsics.td` - Create the tan intrinsic
- `llvm/include/llvm/IR/RuntimeLibcalls.def` - Define tan libcall
mappings
- `llvm/include/llvm/Target/GenericOpcodes.td` - Define the `G_FTAN`
Opcode
- `llvm/include/llvm/Support/TargetOpcodes.def` - Create a `G_FTAN`
Opcode handler
- `llvm/include/llvm/Target/GlobalISel/SelectionDAGCompat.td` - Map
`G_FTAN` to `ftan`
- `llvm/include/llvm/Target/TargetSelectionDAG.td` - Define `ftan`,
`strict_ftan`, and `any_ftan` and map them to the ISD opcodes for `FTAN`
and `STRICT_FTAN`
- `llvm/lib/Analysis/VectorUtils.cpp` - Associate the tan intrinsic as a
vector intrinsic
- `llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp` Map the tan intrinsic
to `G_FTAN` Opcode
- `llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp` - Add `G_FTAN` to
the list of floating point math operations also associate `G_FTAN` with
the `TAN_F` runtime lib.
- `llvm/lib/CodeGen/GlobalISel/Utils.cpp` - More floating point math
operation common behaviors.
- llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp - List the function
expansion operations for `FTAN` and `STRICT_FTAN`. Also define both
opcodes in `PromoteNode`.
- `llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp` - More `FTAN`
and `STRICT_FTAN` handling in the legalizer
- `llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h` - Define
`SoftenFloatRes_FTAN` and `ExpandFloatRes_FTAN`.
- `llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp` - Define `FTAN`
as a legal vector operation.
- `llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp` - Define
`FTAN` as a legal vector operation.
- `llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp` - define tan as an
intrinsic that doesn't return NaN.
- `llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp` Map
`LibFunc_tan`, `LibFunc_tanf`, and `LibFunc_tanl` to `ISD::FTAN`. Map
`Intrinsic::tan` to `ISD::FTAN` and add selection dag handling for
`Intrinsic::tan`.
- `llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp` - Define `ftan`
and `strict_ftan` names for the equivalent ISD opcodes.
- `llvm/lib/CodeGen/TargetLoweringBase.cpp` -Define a Tan128 libcall and
ISD::FTAN as a target lowering action.
- `llvm/lib/Target/X86/X86ISelLowering.cpp` - Add x86_64 lowering for
tan intrinsic

resolves https://github.com/llvm/llvm-project/issues/70082
2024-06-05 15:01:33 -04:00
Simon Pilgrim
54b20cbb95
[DAG] computeKnownBits - abds(x, y) will be zero in the upper bits if x and y are sign-extended (#94448)
As reported on #94442 - if x and y have more than one signbit, then the upper bits of its absolute value are guaranteed to be zero

Sibling PR to #94382

Alive2: https://alive2.llvm.org/ce/z/7_z2Vc

Fixes #94442
2024-06-05 11:57:55 +01:00
Simon Pilgrim
e635520be8
[DAG] computeKnownBits - abs(x) will be zero in the upper bits if x is sign-extended (#94382)
As reported on https://github.com/llvm/llvm-project/issues/94344 - if x has more than one signbit, then the upper bits of its absolute value are guaranteed to be zero

Alive2: https://alive2.llvm.org/ce/z/a87fHU

Fixes #94344
2024-06-05 10:58:54 +01:00
Simon Pilgrim
c9a86fa9a6 [DAG] canCreateUndefOrPoison - fix missing argument typo
We were missing the PoisonOnly argument (so Depth + 1 was being used instead and the default Depth = 0 argument then being silently used)

Fixes #94145 and serves as the test case for 9e22c7a0ea87228dffcdfd7ab62724f72e0b3e30
2024-06-02 10:34:48 +01:00
Simon Pilgrim
9e22c7a0ea [DAG] canCreateUndefOrPoison - only compute shift amount knownbits when not poison
Since #93182 we can now call computeKnownBits inside getValidMaximumShiftAmount to determine the bounds of the shift amount ensuring that it wasn't poison, meaning if we did freeze the ahift amount, isGuaranteedNotToBeUndefOrPoison would then fail as we can't call computeKnownBits through FREEZE for potentially poison values.

I'm still reducing a decent test case but wanted to get the buildbot fix ASAP.
2024-06-01 19:05:27 +01:00
Simon Pilgrim
2b1dfd2b35
[DAG] Replace getValid*ShiftAmountConstant helpers with getValid*ShiftAmount helpers to support KnownBits analysis (#93182)
The getValidShiftAmountConstant/getValidMinimumShiftAmountConstant/getValidMaximumShiftAmountConstant helpers only worked with constant shift amounts, which could be problematic after type legalization (e.g. v2i64 might be partially scalarized or split into v4i32 on some targets such as 32-bit x86, Thumb2 MVE).

This patch proposes we generalize these helpers to work with ConstantRange+KnownBits if a scalar/buildvector constant isn't available.

Most restrictions are the same - the helper fails if any shift amount is out of bounds, getValidShiftConstant must be a specific constant uniform etc.

However, getValidMinimumShiftAmount/getValidMaximumShiftAmount now can return bounds values that aren't values in the actual data, as they are based off the common KnownBits of every vector element.

This addresses feedback on #92096
2024-06-01 16:48:26 +01:00
Simon Pilgrim
f78febf7a8
[DAG] ComputeNumSignBits - add AVGCEILS/AVGFLOORS handling (#93021)
Pulled from #92096
2024-05-22 14:29:49 +01:00
Craig Topper
110f6a740b
[SelectionDAG] Add getVPZeroExtendInReg. NFC (#92792)
Use it for 2 places in LegalizeIntegerTypes that created a VP_AND.
2024-05-20 15:29:35 -07:00
Nhat Nguyen
eab92cb7f3
[llvm] Add KnownBits implementations for avgFloor and avgCeil (#86445)
This PR is to address the issue #84640
2024-05-19 10:57:11 -04:00
Simon Pilgrim
e0217ee782 [DAG] canCreateUndefOrPoison - only compute extract/index vector elt index knownbits when not poison
We were calling computeKnownBits to determine the bounds of the element index without ensuring that it wasn't poison, meaning if we did freeze the index, isGuaranteedNotToBeUndefOrPoison would then fail as we can't call computeKnownBits through FREEZE for potentially poison values.

Fixes #92569
2024-05-19 11:06:09 +01:00
Simon Pilgrim
689bba1eec [DAG] canCreateUndefOrPoison - merge INSERT_VECTOR_ELT/EXTRACT_VECTOR_ELT cases. NFC.
The only difference is the operand index for the element index variable.
2024-05-19 10:25:01 +01:00
Graham Hunter
fbb37e9606
[AArch64] Add an all-in-one histogram intrinsic
Based on discussion from
https://discourse.llvm.org/t/rfc-vectorization-support-for-histogram-count-operations/74788

Current interface is:

llvm.experimental.histogram(<vecty> ptrs, <intty> inc_amount, <vecty> mask)

The integer type used by 'inc_amount' needs to match the type of the buckets in memory.

The intrinsic covers the following operations:
  * Gather load
  * histogram on the elements of 'ptrs'
  * multiply the histogram results by 'inc_amount'
  * add the result of the multiply to the values loaded by the gather
  * scatter store the results of the add

Supports lowering to histcnt instructions for AArch64 targets, and scalarization for all others at present.
2024-05-13 11:35:28 +01:00
Min-Yih Hsu
f8063ffe73
[VP][RISCV] Add vp.reduce.fmaximum/fminimum and its RISC-V codegen (#91782)
`vp.reduce.fmaximum/fminimum` are the VP version of
`vector.reduce.fmaximum/fminimum`.
2024-05-10 16:01:47 -07:00
David Green
8fc9e3d577
[DAG] Lower frem of power-2 using div/trunc/mul+sub (#91148)
If we are lowering a frem and the divisor is known to be an integer power-2, we
can use the formula 'frem = x - trunc(x / d) * d'. This avoids the more
expensive call to fmod. The results are identical as fmod so long as d is a
power-2 (so the mul does not round incorrectly), and the sign of the return is
either always positive or not important for zeroes (nsz).

Unfortunately Alive2 does not handle this well at the moment. I was using
exhaustive checking to test this:
(https://gist.github.com/davemgreen/6078015f30d3bacd1e9572f8db5d4b64).

I found this in cpythons implementation of float_pow. I currently added it as a
DAG combine for frem with power-2 fp constants.
2024-05-10 14:58:48 +01:00