422 Commits

Author SHA1 Message Date
Alex Richardson
4c72266830 Fix call to deprecated API in bd87a2449da0c82e63cebdf9c131c54a5472e3a7 2023-02-09 10:26:33 +00:00
Alex Richardson
bd87a2449d [CGP] Add generic TargetLowering::shouldAlignPointerArgs() implementation
This function was added for ARM targets, but aligning global/stack pointer
arguments passed to memcpy/memmove/memset can improve code size and
performance for all targets that don't have fast unaligned accesses.
This adds a generic implementation that adjusts the alignment to pointer
size if unaligned accesses are slow.
Review D134168 suggests that this significantly improves performance on
synthetic benchmarks such as Dhrystone on RV32 as it avoids memcpy() calls.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D134282
2023-02-09 10:11:40 +00:00
Archibald Elliott
62c7f035b4 [NFC][TargetParser] Remove llvm/ADT/Triple.h
I also ran `git clang-format` to get the headers in the right order for
the new location, which has changed the order of other headers in two
files.
2023-02-07 12:39:46 +00:00
Kazu Hirata
526966d07d Use llvm::bit_ceil (NFC)
Note that:

  std::has_single_bit(X) ? X : llvm::NextPowerOf2(X);

is equivalent to:

  std::bit_ceil(X)

even for input 0.
2023-01-28 16:13:09 -08:00
Matt Arsenault
e70ae0f46b DAG/GlobalISel: Fix broken/redundant setting of MODereferenceable
This was incorrectly setting dereferenceable on unaligned
operands. getLoadMemOperandFlags does the alignment dereferenceabilty
check without alignment, and then both paths went on to check
isDereferenceableAndAlignedPointer. Make getLoadMemOperandFlags check
isDereferenceableAndAlignedPointer, and remove the second call.
2023-01-13 20:30:30 -05:00
Guillaume Chatelet
48f5d77eee [NFC] Use TypeSize::getKnownMinValue() instead of TypeSize::getKnownMinSize()
This change is one of a series to implement the discussion from
https://reviews.llvm.org/D141134.
2023-01-11 16:36:39 +00:00
Roman Lebedev
16facf1ca6
[DAGCombiner][TLI] Do not fuse bitcast to <1 x ?> into a load/store of a vector
Single-element vectors are legalized by splitting,
so the the memory operations would also get scalarized.
While we do have some support to reconstruct scalarized loads,
we clearly don't catch everything.

The comment for the affected AArch64 store suggests that
having two stores was the desired outcome in the first place.

This was showing as a source of *many* regressions
with more aggressive ZERO_EXTEND_VECTOR_INREG recognition.
2022-12-31 03:49:43 +03:00
Roman Lebedev
603e849072
[NFC][TLI] Move isLoadBitCastBeneficial() implementation into source file
... so any change to it does not cause 700 source files to be recompiled.
2022-12-31 02:07:50 +03:00
Freddy Ye
89f36dd8f3 [X86] Add ExpandLargeFpConvert Pass and enable for X86
As stated in
https://discourse.llvm.org/t/rfc-llc-add-expandlargeintfpconvert-pass-for-fp-int-conversion-of-large-bitint/65528,
this implementation is very similar to ExpandLargeDivRem, which expands
‘fptoui .. to’, ‘fptosi .. to’, ‘uitofp .. to’, ‘sitofp .. to’ instructions
with a bitwidth above a threshold into auto-generated functions. This is
useful for targets like x86_64 that cannot lower fp convertions with more
than 128 bits. The expanded nodes are referring from the IR generated by
`compiler-rt/lib/builtins/floattidf.c`, `compiler-rt/lib/builtins/fixdfti.c`,
and etc.

Corner cases:
1. For fp16: as there is no related builtins added in compliler-rt. So I
mainly utilized the fp32 <-> fp16 lib calls to implement.
2. For fp80: as this pass is soft fp emulation and no fp80 instructions can
help in this problem. I recommend users to deprecate this usage. For now, the
implementation uses fp128 as the temporary conversion type and inserts
fptrunc/ext at top/end of the function.
3. For bf16: as clang FE currently doesn't support bf16 algorithm operations
(convert to int, float, +, -, *, ...), this patch doesn't consider bf16 for
now.
4. For unsigned FPToI: since both default hardware behaviors and libgcc are
ignoring "returns 0 for negative input" spec. This pass follows this old way
to ignore unsigned FPToI. See this example:
https://gcc.godbolt.org/z/bnv3jqW1M

The end-to-end tests are uploaded at https://reviews.llvm.org/D138261

Reviewed By: LuoYuanke, mgehre-amd

Differential Revision: https://reviews.llvm.org/D137241
2022-12-01 13:47:43 +08:00
Phoebe Wang
b39b76f2ef [X86] Allow no X87 on 32-bit
This patch is an alternative of D100091. It solved the problems in `f80` type lowering.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D137946
2022-11-22 10:47:47 +08:00
Stanislav Mekhanoshin
bcaf31ec3f [AMDGPU] Allow finer grain control of an unaligned access speed
A target can return if a misaligned access is 'fast' as defined
by the target or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.

A target can still define it as it wants and the direct translation
of the current code uses 0 and 1 for current false and true. This
makes the change an NFC.

Subsequent patch will start using an actual value of speed in
the load/store vectorizer to compare if a vectorized access going
to be not just fast, but not slower than before.

Differential Revision: https://reviews.llvm.org/D124217
2022-11-17 09:23:53 -08:00
Craig Topper
1121eca685 [VP][VE] Default VP_SREM/UREM to Expand and add generic expansion using VP_SDIV/UDIV+VP_MUL+VP_SUB.
I want to default all VP operations to Expand. These 2 were blocking
because VE doesn't support them and the tests were expecting them
to fail a specific way. Using Expand caused them to fail differently.

Seemed better to emulate them using operations that are supported.

@simoll mentioned on Discord that VE has some expansion downstream. Not
sure if its done like this or in the VE target.

Reviewed By: frasercrmck, efocht

Differential Revision: https://reviews.llvm.org/D133514
2022-09-16 13:19:02 -07:00
Matthias Gehre
c1502425ba Move TargetTransformInfo::maxLegalDivRemBitWidth -> TargetLowering::maxSupportedDivRemBitWidth
Also remove new-pass-manager version of ExpandLargeDivRem because there is no way
yet to access TargetLowering in the new pass manager.

Differential Revision: https://reviews.llvm.org/D133691
2022-09-12 17:06:16 +01:00
Joe Loser
5e96cea1db [llvm] Use std::size instead of llvm::array_lengthof
LLVM contains a helpful function for getting the size of a C-style
array: `llvm::array_lengthof`. This is useful prior to C++17, but not as
helpful for C++17 or later: `std::size` already has support for C-style
arrays.

Change call sites to use `std::size` instead.

Differential Revision: https://reviews.llvm.org/D133429
2022-09-08 09:01:53 -06:00
Benjamin Kramer
c349d7f4ff [SelectionDAG] Rewrite bfloat16 softening to use the "half promotion" path
The main difference is that this preserves intermediate rounding steps,
which the other route doesn't. This aligns bfloat16 more with half
floats, which use this path on most targets.

I didn't understand what the difference was between these softening
approaches when I first added bfloat lowerings, would be nice if we only
had one of them.

Based on @pengfei 's D131502

Differential Revision: https://reviews.llvm.org/D133207
2022-09-06 11:54:34 +02:00
Daniil Fukalov
7ed3d81333 [NFCI] Move cost estimation from TargetLowering to TargetTransformInfo.
TragetLowering had two last InstructionCost related `getTypeLegalizationCost()`
and `getScalingFactorCost()` members, but all other costs are processed in TTI.

E.g. it is not comfortable to use other TTI members in these two functions
overrided in a target.

Minor refactoring: `getTypeLegalizationCost()` now doesn't need DataLayout
parameter - it was always passed from TTI.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D117723
2022-08-18 00:38:55 +03:00
Fangrui Song
de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-08-08 11:24:15 -07:00
Amara Emerson
65246d3eb4 Use hasNItemsOrLess() in MRI::hasAtMostUserInstrs(). 2022-07-27 11:42:14 -07:00
Amara Emerson
19cdd1908b [AArch64][GlobalISel] Add heuristics for localizing G_CONSTANT.
This adds similar heuristics to G_GLOBAL_VALUE, querying the cost of
materializing a specific constant in code size. Doing so prevents us from
sinking constants which require multiple instructions to generate into
use blocks.

Code size savings on CTMark -Os:
Program                                       size.__text
                                              before         after           diff
ClamAV/clamscan                               381940.00      382052.00       0.0%
lencod/lencod                                 428408.00      428428.00       0.0%
SPASS/SPASS                                   411868.00      411876.00       0.0%
kimwitu++/kc                                  449944.00      449944.00       0.0%
Bullet/bullet                                 463588.00      463556.00      -0.0%
sqlite3/sqlite3                               284696.00      284668.00      -0.0%
consumer-typeset/consumer-typeset             414492.00      414424.00      -0.0%
7zip/7zip-benchmark                           595244.00      594972.00      -0.0%
mafft/pairlocalalign                          247512.00      247368.00      -0.1%
tramp3d-v4/tramp3d-v4                         372884.00      372044.00      -0.2%
                           Geomean difference                               -0.0%

Differential Revision: https://reviews.llvm.org/D130554
2022-07-27 10:51:16 -07:00
Kazu Hirata
9e6d1f4b5d [CodeGen] Qualify auto variables in for loops (NFC) 2022-07-17 01:33:28 -07:00
Paul Robinson
ac2ad3b7bb [PS5] Support sin+cos->sincos optimization 2022-06-15 09:36:05 -07:00
Benjamin Kramer
8bc0bb9564 Add a conversion from double to bf16
This introduces a new compiler-rt function `__truncdfbf2`.
2022-06-15 12:56:31 +02:00
Benjamin Kramer
fb34d531af Promote bf16 to f32 when the target doesn't support it
This is modeled after the half-precision fp support. Two new nodes are
introduced for casting from and to bf16. Since casting from bf16 is a
simple operation I opted to always directly lower it to integer
arithmetic. The other way round is more complicated if you want to
preserve IEEE semantics, so it's handled by a new __truncsfbf2
compiler-rt builtin.

This is of course very bare bones, but sufficient to get a semi-softened
fadd on x86.

Possible future improvements:
 - Targets with bf16 conversion instructions can now make fp_to_bf16 legal
 - The software conversion to bf16 can be replaced by a trivial
   implementation under fast math.

Differential Revision: https://reviews.llvm.org/D126953
2022-06-15 12:56:31 +02:00
Hendrik Greving
a92ed167f2 [ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4.
Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4.

Keeps MVT::i2, MVT::i4 lowering actions as expand, which should be
removed once targets set this explicitly.

Adjusts 11 lit tests to reflect slightly different behavior during
DAG combine.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-02 00:49:11 +00:00
Hendrik Greving
e9d05cc7d8 Revert "[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4."
This reverts commit 430ac5c3029c52e391e584c6d4447e6e361fae99.

Due to failures in Clang tests.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-01 13:27:49 -07:00
Hendrik Greving
430ac5c302 [ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4.
Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4.

Keeps MVT::i2, MVT::i4 lowering actions as `expand`, which should be
removed once targets set this explicitly.

Adjusts 11 lit tests to reflect slightly different behavior during
DAG combine.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-01 12:48:01 -07:00
Matthias Braun
cd19af74c0 Avoid 8 and 16bit switch conditions on x86
This adds a `TargetLoweringBase::getSwitchConditionType` callback to
give targets a chance to control the type used in
`CodeGenPrepare::optimizeSwitchInst`.

Implement callback for X86 to avoid i8 and i16 types where possible as
they often incur extra zero-extensions.

This is NFC for non-X86 targets.

Differential Revision: https://reviews.llvm.org/D124894
2022-05-10 10:00:10 -07:00
Serge Pavlov
170a903144 Intrinsic for checking floating point class
This change introduces a new intrinsic, `llvm.is.fpclass`, which checks
if the provided floating-point number belongs to any of the the specified
value classes. The intrinsic implements the checks made by C standard
library functions `isnan`, `isinf`, `isfinite`, `isnormal`, `issubnormal`,
`issignaling` and corresponding IEEE-754 operations.

The primary motivation for this intrinsic is the support of strict FP
mode. In this mode using compare instructions or other FP operations is
not possible, because if the value is a signaling NaN, floating-point
exception `Invalid` is raised, but the aforementioned functions must
never raise exceptions.

Currently there are two solutions for this problem, both are
implemented partially. One of them is using integer operations to
implement the check. It was implemented in https://reviews.llvm.org/D95948
for `isnan`. It solves the problem of exceptions, but offers one
solution for all targets, although some can do the check in more
efficient way.

The other, implemented in https://reviews.llvm.org/D96568, introduced a
hook 'clang::TargetCodeGenInfo::testFPKind', which injects a target
specific code into IR to implement `isnan` and some other functions. It is
convenient for targets that have dedicated instruction to determine FP data
class. However using target-specific intrinsic complicates analysis and can
prevent some optimizations.

A special intrinsic for value class checks allows representing data class
tests with enough flexibility. During IR transformations it represents the
check in target-independent way and saves it from undesired transformations.
In the instruction selector it allows efficient lowering depending on the
used target and mode.

This implementation is an extended variant of `llvm.isnan` introduced
in https://reviews.llvm.org/D104854. It is limited to minimal intrinsic
support. Target-specific treatment will be implemented in separate
patches.

Differential Revision: https://reviews.llvm.org/D112025
2022-04-26 13:09:16 +07:00
Shao-Ce SUN
e90110e696 [NFC][CodeGen] Use ArrayRef in TargetLowering functions
This patch is similar to D122557, adding an `ArrayRef` version for `setOperationAction`, `setLoadExtAction`, `setCondCodeAction`, `setLibcallName`.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D123467
2022-04-13 00:46:05 +08:00
serge-sans-paille
989f1c72e0 Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169

after:  1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681
2022-03-16 08:43:00 +01:00
Nico Weber
a278250b0f Revert "Cleanup codegen includes"
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169
2022-03-10 07:59:22 -05:00
serge-sans-paille
7f230feeea Cleanup codegen includes
after:  1061034926
before: 1063332844

Differential Revision: https://reviews.llvm.org/D121169
2022-03-10 10:00:30 +01:00
Paul Robinson
7b85f0f32f [PS4] isPS4 and isPS4CPU are not meaningfully different 2022-03-03 11:36:59 -05:00
Craig Topper
1df8efae56 [SelectionDAG][X86] Support f16 in getReciprocalOpName.
If the "reciprocal-estimates" attribute is present and it doesn't
contain "all", "none", or "default", we previously crashed on f16
operations.

This patch addes an 'h' suffix' to prevent the crash.

I've added simple tests that just enable the estimate for all
vec-sqrt and one test case that explicitly tests the new 'h' suffix
to override the default steps.

There may be some frontend change needed to, but I haven't checked
that yet.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D120158
2022-02-18 21:55:49 -08:00
Craig Topper
0d59a54cea Revert "[SelectionDAG][X86] Support f16 in getReciprocalOpName."
This reverts commit 86b5e256628ae49193ad9962626a73bafeda2883.

This wasn't supposed to be commited yet
2022-02-18 15:39:50 -08:00
Craig Topper
86b5e25662 [SelectionDAG][X86] Support f16 in getReciprocalOpName.
If the "reciprocal-estimates" attribute is present and it doesn't
contain "all", "none", or "default", we previously crashed on f16
operations.

This patch addes an 'h' suffix' to prevent the crash.

I've added simple tests that just enable the estimate for all
vec-sqrt and one test case that explicitly tests the new 'h' suffix
to override the default steps.

There may be some frontend change needed to, but I haven't checked
that yet.

Differential Revision: https://reviews.llvm.org/D120158
2022-02-18 15:36:35 -08:00
David Green
4072e362c0 [ISel] Port AArch64 HADD and RHADD to ISel
This ports the aarch64 combines for HADD and RHADD over to DAG combine,
so that they can be used in more architectures (notably MVE in a
followup patch). They are renamed to AVGFLOOR and AVGCEIL in the
process, to avoid confusion with instructions such as X86 hadd. The code
was also rewritten slightly to remove the AArch64 idiosyncrasies.

The general pattern for a AVGFLOORS is
  %xe = sext i8 %x to i32
  %ye = sext i8 %y to i32
  %a = add i32 %xe, %ye
  %r = lshr i32 %a, 1
  %t = trunc i32 %r to i8

An AVGFLOORU is equivalent with zext. Because of the truncate
lshr==ashr, as the top bits are not demanded. An AVGCEIL also includes
an extra rounding, so includes an extra add of 1.

Differential Revision: https://reviews.llvm.org/D106237
2022-02-11 18:28:56 +00:00
Benjamin Kramer
f15014ff54 Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17"
This reverts commit ef8206320769ad31422a803a0d6de6077fd231d2.

- It conflicts with the existing llvm::size in STLExtras, which will now
  never be called.
- Calling it without llvm:: breaks C++17 compat
2022-01-26 16:55:53 +01:00
serge-sans-paille
ef82063207 Rename llvm::array_lengthof into llvm::size to match std::size from C++17
As a conquence move llvm::array_lengthof from STLExtras.h to
STLForwardCompat.h (which is included by STLExtras.h so no build
breakage expected).
2022-01-26 16:17:45 +01:00
David Green
fffd663c87 [CodeGen] Initialize MaxBytesForAlignment in TargetLoweringBase::TargetLoweringBase.
This appears to be missing from D114590, causing sanitizer errors.
2022-01-05 19:34:27 +00:00
Nicholas Guy
73d92faa2f [CodeGen] Emit alignment "Max Skip" operand
The current AsmPrinter has support to emit the "Max Skip" operand
(the 3rd of .p2align), however has no support for it to actually be specified.
Adding MaxBytesForAlignment to MachineBasicBlock provides this capability on a
per-block basis. Leaving the value as default (0) causes no observable differences
in behaviour.

Differential Revision: https://reviews.llvm.org/D114590
2022-01-05 12:54:30 +00:00
Kazu Hirata
26bd534a79 [llvm] Use none_of instead of \!any_of (NFC) 2021-12-17 13:48:57 -08:00
David Sherwood
8d77555b12 [Analysis] Ensure getTypeLegalizationCost returns a simple VT for TypeScalarizeScalableVector
When getTypeConversion returns TypeScalarizeScalableVector we were
sometimes returning a non-simple type from getTypeLegalizationCost.
However, many callers depend upon this being a simple type and will
crash if not. This patch changes getTypeLegalizationCost to ensure
that we always a return sensible simple VT. If the vector type
contains unusual integer types, e.g. <vscale x 2 x i3>, then we just
set the type to MVT::i64 as a reasonable default.

A test has been added here that demonstrates the vectoriser can
correctly calculate the cost of vectorising a "zext i3 to i64"
instruction with a VF=vscale x 1:

  Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll

Differential Revision: https://reviews.llvm.org/D113777
2021-11-17 13:11:58 +00:00
Alfredo Dal'Ava Junior
1cb9f37a17 [FreeBSD] Do not mark __stack_chk_guard as dso_local
This symbol is defined in libc.so so it is definitely not DSO-Local.
Marking it as such causes problems on some platforms (such as PowerPC).

Differential revision: https://reviews.llvm.org/D109090
2021-11-05 07:29:50 -05:00
Craig Topper
d51e3a2139 [LegalizeTypes][TargetLowering] Merge getShiftAmountTyForConstant into TargetLowering::getShiftAmountTy.
getShiftAmountTyForConstant is a special helper that changes the
shift amount to i32 if the type chosen by
TargetLowering::getShiftAmountTy can't represent all possible values.
This is needed to satisfy an assert in SelectionDAG::getNode.

It requires additional consideration to know when this helper should be used.
I'm not sure that we are always using it when we should.

This patch merges the getShiftAmountTyForConstant handling into
TargetLowering::getShiftAmountTy so we don't need to think about it
anymore.

Technically this may slightly increase compile times since the majority
of callers of getShiftAmountTy won't need this. Hopefully, this isn't
an issue in practice.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112469
2021-10-25 14:06:53 -07:00
Dávid Bolvanský
3649fb14d1 Fixed some errors detected by PVS Studio 2021-10-09 17:20:04 +02:00
Arthur Eubanks
a0a4935182 Make more places that use alignment use uint64_t
Followup to D110451.
2021-10-08 16:35:19 -07:00
Freddy Ye
13207a21a6 [NFC] Remove redundant setOperationAction.
[FROUND,FROUNDEVEN][f32, f64, f128] are set Expand twice.

Differential Revision: https://reviews.llvm.org/D110302
2021-09-23 10:28:21 +08:00
Roman Lebedev
3f1f08f0ed
Revert @llvm.isnan intrinsic patchset.
Please refer to
https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html
(and that whole thread.)

TLDR: the original patch had no prior RFC, yet it had some changes that
really need a proper RFC discussion. It won't be productive to discuss
such an RFC, once it's actually posted, while said patch is already
committed, because that introduces bias towards already-committed stuff,
and the tree is potentially in broken state meanwhile.

While the end result of discussion may lead back to the current design,
it may also not lead to the current design.

Therefore i take it upon myself
to revert the tree back to last known good state.

This reverts commit 4c4093e6e39fe6601f9c95a95a6bc242ef648cd5.
This reverts commit 0a2b1ba33ae6dcaedb81417f7c4cc714f72a5968.
This reverts commit d9873711cb03ac7aedcaadcba42f82c66e962e6e.
This reverts commit 791006fb8c6fff4f33c33cb513a96b1d3f94c767.
This reverts commit c22b64ef66f7518abb6f022fcdfd86d16c764caf.
This reverts commit 72ebcd3198327da12804305bda13d9b7088772a8.
This reverts commit 5fa6039a5fc1b6392a3c9a3326a76604e0cb1001.
This reverts commit 9efda541bfbd145de90f7db38d935db6246dc45a.
This reverts commit 94d3ff09cfa8d7aecf480e54da9a5334e262e76b.
2021-09-02 13:53:56 +03:00
Wang, Pengfei
2379949aad [X86] AVX512FP16 instructions enabling 3/6
Enable FP16 conversion instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105265
2021-08-18 09:03:41 +08:00