736 Commits

Author SHA1 Message Date
Momchil Velikov
99e57f06c4 [CodeGenPrepare] Increase the limit on the number of instructions to scan
... when finding all memory uses for an address and make it a
parameter.

Now that we have avoided potentially exponential run time of
`FindAllMemoryUses` in D143893. it'd be beneficial to increase the
limit up from 20.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D143894

Change-Id: I3abdf40332ef65e9b2f819ac32ac60e4200ec51d
2023-03-30 14:38:22 +01:00
Momchil Velikov
2453da0a4e [CodeGenPrepare] Fix counting uses when folding addresses into memory instructions
The counter of the number of instructions seen in `FindAllMemoryUses`
is reset after returning from a recursive invocation of
`FindAllMemoryUses` to the value it had before the call. In effect,
depending on the shape of the uses graph, the function may scan up to
`2^N-1` instructions where `N` is the scan limit
(`MaxMemoryUsesToScan`).  This does not look intuitive or intended.

This patch changes the counting to just count the scanned
instructions, independent of the shape of the references.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D143893

Change-Id: I99f5de55e84843cf2fbea287d6ae4312fa196240
2023-03-30 14:18:14 +01:00
Peter Rong
670c92a415 [CodeGen] Remove redundent instructions generated by combineAddrModes.
CodeGenPare may optimize memory access modes.
During such optimization, it might create a new instruction representing combined value.
Later, If the optimization failed, the generated value is not removed and remains a dead instruction.

Normally this won't be a problem as dead code will be eliminated later.
However, in this case (Issue 58538), the generated instruction may trigger an infinite loop.
The infinite loop involves `sinkCmpExpression`, where it tries to optimize the placeholder generated by us.
(See the test case detailed in the issue)

To fix this, we remove the unnecessary placeholder immediately when we abort the optimization.
`AddressingModeCombiner` will keep track of the placeholder, and remove it if it is an inserted placeholder and has no uses.
This patch fixes https://github.com/llvm/llvm-project/issues/58538, a test is also included.

Reviewed By: skatkov

Differential Revision: https://reviews.llvm.org/D147041
2023-03-30 00:35:56 -07:00
Craig Topper
697a28b380 [CodeGenPrepare][RISCV] Correct the MathUsed flag for shouldFormOverflowOp
For add, if we match the constant edge case the add isn't used by
the compare so we shouldn't check for 2 users.

For sub, the compare is not a user of the sub so the math is
used if the sub has any users.

This regresses RISC-V which I will work on other patches for.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D146786
2023-03-27 09:58:50 -07:00
Momchil Velikov
6a2a5f08de [CodeGenPrepare] Don't give up if unable to sink first arg to a cold call
Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D143892
2023-03-23 17:31:09 +00:00
Kazu Hirata
398af9b43b [llvm] Use *{Map,Set}::contains (NFC) 2023-03-15 18:06:32 -07:00
Kazu Hirata
a585fa2637 [CodeGen] Use *{Set,Map}::contains (NFC) 2023-03-14 08:07:42 -07:00
Paul Walker
adbdf273ef [CodeGenPrepare] Stop llvm.vscale() -> getelementptr(null, 1) transformation.
I've pulled this change from D145404 to land in isolation because
I'm concerned the code might be more important than the test
coverage might suggest (NOTE: the code has no test coverage).
2023-03-08 15:47:03 +00:00
Xiang1 Zhang
eed31bbb37 [NFC] Remove dead code in ExtAddrMode::print checked by coverty tool 2023-03-08 15:01:28 +08:00
Sander de Smalen
170e7a0ec2 [AArch64][SME2] Add CodeGen support for target("aarch64.svcount").
This patch adds AArch64 CodeGen support such that the type can be passed
and returned to/from functions, and also adds support to use this type in
load/store operations and PHI nodes.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D136862
2023-03-02 12:07:41 +00:00
Kazu Hirata
a28b252d85 Use APInt::getSignificantBits instead of APInt::getMinSignedBits (NFC)
Note that getMinSignedBits has been soft-deprecated in favor of
getSignificantBits.
2023-02-19 23:56:52 -08:00
Jake Egan
08533f8b86 Revert "[CGP] Add generic TargetLowering::shouldAlignPointerArgs() implementation"
These commits are causing a test-suite build failure on AIX. Revert for now for time to investigate.
https://lab.llvm.org/buildbot/#/builders/214/builds/5779/steps/9/logs/stdio

This reverts commit bd87a2449da0c82e63cebdf9c131c54a5472e3a7 and 4c72266830ffa332ebb7cf1d3bbd6c56d001fa0f.
2023-02-14 15:20:06 -05:00
Alex Richardson
bd87a2449d [CGP] Add generic TargetLowering::shouldAlignPointerArgs() implementation
This function was added for ARM targets, but aligning global/stack pointer
arguments passed to memcpy/memmove/memset can improve code size and
performance for all targets that don't have fast unaligned accesses.
This adds a generic implementation that adjusts the alignment to pointer
size if unaligned accesses are slow.
Review D134168 suggests that this significantly improves performance on
synthetic benchmarks such as Dhrystone on RV32 as it avoids memcpy() calls.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D134282
2023-02-09 10:11:40 +00:00
Kazu Hirata
caa99a01f5 Use llvm::popcount instead of llvm::countPopulation(NFC) 2023-01-22 12:48:51 -08:00
Piotr Fusik
898b5c9f5e [NFC] Fix "form/from" typos
Reviewed By: #libc, ldionne

Differential Revision: https://reviews.llvm.org/D142007
2023-01-22 20:05:51 +01:00
ShihPo Hung
5fb3a57ea7 [Cost] Add CostKind to getVectorInstrCost and its related users
LoopUnroll estimates the loop size via getInstructionCost(),
but getInstructionCost() cannot pass CostKind to getVectorInstrCost().
And so does getShuffleCost() to getBroadcastShuffleOverhead(),
getPermuteShuffleOverhead(), getExtractSubvectorOverhead(),
and getInsertSubvectorOverhead().

To address this, this patch adds an argument CostKind to these
functions.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D142116
2023-01-21 05:29:24 -08:00
Guillaume Chatelet
8fd5558b29 [NFC] Use TypeSize::geFixedValue() instead of TypeSize::getFixedSize()
This change is one of a series to implement the discussion from
https://reviews.llvm.org/D141134.
2023-01-11 16:49:38 +00:00
Guillaume Chatelet
48f5d77eee [NFC] Use TypeSize::getKnownMinValue() instead of TypeSize::getKnownMinSize()
This change is one of a series to implement the discussion from
https://reviews.llvm.org/D141134.
2023-01-11 16:36:39 +00:00
OCHyams
042107494d [DebugInfo][NFC] Rename is/setUndef to is/setKilllocation
These names better reflect the semantics and also the implementation, since
it's not just "undef" operands that are sentinels used to signal that the debug
intrinsic terminates dominating locations definitions.

Related to https://discourse.llvm.org/t/auto-undef-debug-uses-of-a-deleted-value

Reviewed By: StephenTozer

Differential Revision: https://reviews.llvm.org/D140903
2023-01-06 09:15:02 +00:00
serge-sans-paille
38818b60c5
Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part
Use deduction guides instead of helper functions.

The only non-automatic changes have been:

1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There was a few similar situation across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).

Per reviewers' comment, some useless makeArrayRef have been removed in the process.

This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.

Differential Revision: https://reviews.llvm.org/D140955
2023-01-05 14:11:08 +01:00
Sprite
a9f9f3dff4 Correct typos (NFC)
Just found some typos while reading the llvm/circt project.

compliment -> complement
emitsd -> emits
2022-12-16 10:51:26 -08:00
Kazu Hirata
3c09ed006a [llvm] Use std::nullopt instead of None in comments (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-04 17:12:44 -08:00
Kazu Hirata
998960ee1f [CodeGen] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 20:36:08 -08:00
Kazu Hirata
000749d753 [CodeGen] Use std::optional in CodeGenPrepare.cpp (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-11-26 14:27:19 -08:00
OCHyams
3115e6828c [Assignment Tracking][25/*] Replace sunk address uses in dbg.assign intrinsics
The Assignment Tracking debug-info feature is outlined in this RFC:

https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir

Reviewed By: StephenTozer

Differential Revision: https://reviews.llvm.org/D136255
2022-11-21 15:50:47 +00:00
Alex Richardson
754d25844a [CGP] Update MemIntrinsic alignment if possible
Previously it was only being done if shouldAlignPointerArgs() returned
true, which right now is only true for ARM targets.
Updating the argument alignment attributes of memcpy/memset intrinsics
if the underlying object has larger alignment can be beneficial even
when CGP didn't increase alignment (as can be seen from the test changes),
so invert the loop and if condition.

Differential Revision: https://reviews.llvm.org/D134281
2022-11-17 11:59:35 +00:00
Haohai Wen
e419620fc2 [CodeGenPrep] Change ValueToSExts from DeseMap to MapVector
mergeSExts iterates throught ValueToSExts. Using DenseMap result in
unstable optimization path so that output IR may vary even if the input
IR is same.

Reviewed By: wxiao3

Differential Revision: https://reviews.llvm.org/D137234
2022-11-04 11:15:18 +08:00
David Green
16e4e4ab87 [CodeGenPrep] Handle constants in ConvertPhiType
This is a simple addition to the convertPhiTypes in CodeGenPrepare to
consider and convert constants as it converts the phi type. Someone
fixed the bug in the motivating example, so the undef is now a constant
0. This does mean converting between integer and floating point
constants, which may have different materialization.

Differential Revision: https://reviews.llvm.org/D135561
2022-10-13 16:41:44 +01:00
Florian Hahn
6b86b481e3
[AArch64] Use tbl for truncating vector FPtoUI conversions.
On AArch64, doing the vector truncate separately after the fptoui
conversion can be lowered more efficiently using tbl.4, building on
D133495.

https://alive2.llvm.org/ce/z/T538CC

Depends on D133495

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D133496
2022-09-16 14:57:43 +01:00
Florian Hahn
8491d01cc3
[AArch64] Lower vector trunc using tbl.
Similar to using tbl to lower vector ZExts, tbl4 can be used to lower
vector truncates.

The initial version support i32->i8 conversions.

Depends on D120571

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D133495
2022-09-16 12:42:49 +01:00
Florian Hahn
5871f18827
[AArch64] Lower extending uitofp using tbl.
On AArch64, doing the zero-extend separately first can be lowered more
efficiently using tbl, building on D120571.

https://alive2.llvm.org/ce/z/8Je595

Depends on D120571

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D133494
2022-09-16 10:20:25 +01:00
Florian Hahn
81a11da762
[CGP,AArch64] Replace zexts with shuffle that can be lowered using tbl.
This patch extends CodeGenPrepare to lower zext v16i8 -> v16i32 in loops
using a wide shuffle  creating a v64i8 vector, selecting groups of 3
zero elements and an element from the input.

This is profitable on AArch64 where such shuffles can be lowered to tbl
instructions, but only in loops, because it requires materializing 4
masks, which can be done in the loop preheader.

This is the only reason the transform is part of CGP. If there's a
better alternative I missed, please let me know. The same goes for the
shouldReplaceZExtWithShuffle hook which guards this. I am not sure if
this transform will be beneficial on other targets, but it seems like
there is no way other convenient way.

This improves the generated code for loops like the one below in
combination with D96522.

    int foo(uint8_t *p, int N) {
      unsigned long long sum = 0;
      for (int i = 0; i < N ; i++, p++) {
	unsigned int v = *p;
	sum += (v < 127) ? v : 256 - v;
      }
      return sum;
    }

https://clang.godbolt.org/z/Wco866MjY

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D120571
2022-09-15 19:18:13 +01:00
Xiang1 Zhang
16743c9534 [CodeGen] Limit building time in CodeGenPrepare for huge function
Details:

Currently CodeGenPrepare is very time consuming in handling big functions.

Old Algorithm :
It iterate each BB in function, and go on handle very instructions in BB.
Due to some instruction optimizations may affect the BBs' dominate tree.
The old logic will re-iterate and try optimize for each BB.

Suppose we have a big function with 20000 BBs, If we handled the last BB
with fine tuning the dominate tree. We need totally re-iterate and try optimize
the 20000 BBs from the beginning.

The Complex is near N!

And we really encounter somes big tests (> 20000 BBs) that cost more than 30
mins in this pass. (Debug version compiler will cost 2 hours here)

What this patch do for huge function ?
It mainly changes the iteration way for optimization.

1 We do optimizeBlock for each BB (that is same with old way).
And, in the meaning time, If BB is changed/updated in the optimization, it will
be put into FreshBBs (try do optimizeBlock again).
The new created BB at previous iteration will also put into FreshBBs.

2 For the BBs which not updated at previous iteration, we directly skip it.
Strictly speaking, here may miss some opportunity, but the probability is very
small.

3 For Instructions in single BB, we do optimizeInst for each instruction.
If optimizeInst change the instruction dominator in this BB, rather than break
and go back to optimize the first BB (the old way), we directly iterate
instructions (to do optimizeInst) in this updated BB again (the new way).

What this patch do for small/normal (not huge) function ?
It is same with the Old Algorithm. (NFC)

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D129352
2022-09-07 10:05:40 +08:00
Simon Pilgrim
e2d140e9c3 [TTI] Add isExpensiveToSpeculativelyExecute wrapper
CGP uses a raw `getInstructionCost(I, TargetTransformInfo::TCK_SizeAndLatency) >= TCC_Expensive` check to see if its better to move an expensive instruction used in a select behind a branch instead.

This is causing issues with upcoming improvements to TCK_SizeAndLatency costs on X86 as we need to use TCK_SizeAndLatency as an uop count (so its compatible with various target-specific buffer sizes - see D132288), but we can have instructions that have a low TCK_SizeAndLatency value but should still be treated as 'expensive' (FDIV for example) - by adding a isExpensiveToSpeculativelyExecute wrapper we can keep the current behaviour but still add an x86 override in a future patch when the cost tables are updated to compensate.
2022-09-03 13:12:22 +01:00
Xiang1 Zhang
a808ac2e42 [NFC] Clang-format for CodeGenPrepare.cpp 2022-08-30 13:42:36 +08:00
Simon Pilgrim
f9de13232f [X86] Promote i8/i16 CTTZ (BSF) instructions and remove speculation branch
This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis.

For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling.

Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, although there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU.

Differential Revision: https://reviews.llvm.org/D132520
2022-08-24 17:28:18 +01:00
Philip Reames
274f86e7a6 [TTI] Remove OperandValueKind/Properties from getArithmeticInstrCost interface [nfc]
This completes the client side transition to the OperandValueInfo version of this routine.  Backend TTI implementations still use the prior versions for now.
2022-08-22 11:06:32 -07:00
Simon Pilgrim
fdec50182d [CostModel] Replace getUserCost with getInstructionCost
* Replace getUserCost with getInstructionCost, covering all cost kinds.
* Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks.

Original Patch by @samparker (Sam Parker)

Differential Revision: https://reviews.llvm.org/D79483
2022-08-18 11:55:23 +01:00
Kazu Hirata
f5a68feab3 Use llvm::none_of (NFC) 2022-08-14 16:25:39 -07:00
Fangrui Song
7d6017fd31 [TTI] Change new getVectorInstrCost overload to use const reference after D131114
A const reference is preferred over a non-null const pointer.
`Type *` is kept as is to match the other overload.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D131197
2022-08-04 15:16:51 -07:00
Mingming Liu
bc8f2f3649 [AArch64][TTI][NFC] Overload method 'getVectorInstrCost' to provide vector instruction itself, as a context information for cost estimation.
1) Overloaded (instruction-based) method is a wrapper around the current (opcode-based) method.
2) This patch also changes a few callsites (VectorCombine.cpp,
   SLPVectorizer.cpp, CodeGenPrepare.cpp) to call the overloaded method.
3) This is a split of D128302.

Differential Revision: https://reviews.llvm.org/D131114
2022-08-04 12:58:25 -07:00
Paul Kirth
d434e40f39 [llvm][NFC] Refactor code to use ProfDataUtils
In this patch we replace common code patterns with the use of utility
functions for dealing with profiling metadata. There should be no change
in functionality, as the existing checks should be preserved in all
cases.

Reviewed By: bogner, davidxl

Differential Revision: https://reviews.llvm.org/D128860
2022-08-03 00:09:45 +00:00
Sotiris Apostolakis
995b61cdac [SelectOpti] Auto-disable other cmov optis when the new select-opti pass is enabled
Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D129817
2022-08-02 00:19:59 +00:00
Paul Kirth
6e9bab71b6 Revert "[llvm][NFC] Refactor code to use ProfDataUtils"
This reverts commit 300c9a78819b4608b96bb26f9320bea6b8a0c4d0.

We will reland once these issues are ironed out.
2022-07-27 21:38:11 +00:00
Paul Kirth
300c9a7881 [llvm][NFC] Refactor code to use ProfDataUtils
In this patch we replace common code patterns with the use of utility
functions for dealing with profiling metadata. There should be no change
in functionality, as the existing checks should be preserved in all
cases.

Reviewed By: bogner, davidxl

Differential Revision: https://reviews.llvm.org/D128860
2022-07-27 21:13:54 +00:00
Dmitry Vassiliev
e3e63f30a5 [CodeGen] Fixed ambiguous symbol ExtAddrMode in case of NDEBUG and LLVM_ENABLE_DUMP
This patch fixes the following error with MSVC 16.9.2 in case of NDEBUG and LLVM_ENABLE_DUMP:
llvm/lib/CodeGen/CodeGenPrepare.cpp(2581): error C2872: 'ExtAddrMode': ambiguous symbol
llvm/include/llvm/CodeGen/TargetInstrInfo.h(86): note: could be 'llvm::ExtAddrMode'
llvm/lib/CodeGen/CodeGenPrepare.cpp(2447): note: or '`anonymous-namespace'::ExtAddrMode'
llvm/lib/CodeGen/CodeGenPrepare.cpp(2581): error C2039: 'print': is not a member of 'llvm::ExtAddrMode'

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D130426
2022-07-27 00:21:57 +02:00
Kazu Hirata
76e18cc4f6 [llvm] Use llvm::any_of and llvm::none_of (NFC) 2022-07-20 00:36:19 -07:00
Kazu Hirata
9e6d1f4b5d [CodeGen] Qualify auto variables in for loops (NFC) 2022-07-17 01:33:28 -07:00
Tim Besard
a323dfc015 Don't sink ptrtoint/inttoptr sequences into non-noop addrspacecasts.
In https://reviews.llvm.org/D30114, support for mismatching address
spaces was introduced to CodeGenPrepare's optimizeMemoryInst, using
addrspacecast as it was argued that only no-op addrspacecasts would be
considered when constructing the address mode. However, by doing
inttoptr/ptrtoint, it's possible to get CGP to emit an addrspace
that's not actually no-op, introducing a miscompilation:

define void @kernel(i8* %julia_ptr) {
  %intptr = ptrtoint i8* %julia_ptr to i64
  %ptr = inttoptr i64 %intptr to i32 addrspace(3)*

  br label %end
end:

  store atomic i32 1, i32 addrspace(3)* %ptr unordered, align 4
  ret void
}

Gets compiled to:

define void @kernel(i8* %julia_ptr) {
end:
  %0 = addrspacecast i8* %julia_ptr to i32 addrspace(3)*
  store atomic i32 1, i32 addrspace(3)* %0 unordered, align 4
  ret void
}

In the case of NVPTX, this introduces a cvta.to.shared, whereas
leaving out the %end block and branch doesn't trigger this
optimization. This results in illegal memory accesses as seen in
https://github.com/JuliaGPU/CUDA.jl/issues/558

In this change, I introduced a check before doing the pointer cast
that verifies address spaces are the same. If not, it emits a
ptrtoint/inttoptr combination to get a no-op cast between address
spaces. I decided against disallowing ptrtoint/inttoptr with
non-default AS in matchOperationAddr, because now its still possible
to look through multiple sequences of them that ultimately do not
result in a address space mismatch (i.e. the second lit test).
2022-07-16 10:56:42 -04:00
Nuno Lopes
373571dbb4 [NFC] Switch a few uses of undef to poison as placeholders for unreachble code 2022-06-30 23:01:43 +01:00