llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-05-09 06:56:07 +00:00

Author	SHA1	Message	Date
Kazu Hirata	3c09ed006a	[llvm] Use std::nullopt instead of None in comments (NFC) This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-04 17:12:44 -08:00
Kazu Hirata	998960ee1f	[CodeGen] Use std::nullopt instead of None (NFC) This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-02 20:36:08 -08:00
Kazu Hirata	000749d753	[CodeGen] Use std::optional in CodeGenPrepare.cpp (NFC) This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-11-26 14:27:19 -08:00
OCHyams	3115e6828c	[Assignment Tracking][25/*] Replace sunk address uses in dbg.assign intrinsics The Assignment Tracking debug-info feature is outlined in this RFC: https://discourse.llvm.org/t/rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir Reviewed By: StephenTozer Differential Revision: https://reviews.llvm.org/D136255	2022-11-21 15:50:47 +00:00
Alex Richardson	754d25844a	[CGP] Update MemIntrinsic alignment if possible Previously it was only being done if shouldAlignPointerArgs() returned true, which right now is only true for ARM targets. Updating the argument alignment attributes of memcpy/memset intrinsics if the underlying object has larger alignment can be beneficial even when CGP didn't increase alignment (as can be seen from the test changes), so invert the loop and if condition. Differential Revision: https://reviews.llvm.org/D134281	2022-11-17 11:59:35 +00:00
Haohai Wen	e419620fc2	[CodeGenPrep] Change ValueToSExts from DenseMap to MapVector mergeSExts iterates through ValueToSExts. Using DenseMap results in an unstable optimization path, so the output IR may vary even if the input IR is the same. Reviewed By: wxiao3 Differential Revision: https://reviews.llvm.org/D137234	2022-11-04 11:15:18 +08:00
David Green	16e4e4ab87	[CodeGenPrep] Handle constants in ConvertPhiType This is a simple addition to the convertPhiTypes in CodeGenPrepare to consider and convert constants as it converts the phi type. Someone fixed the bug in the motivating example, so the undef is now a constant 0. This does mean converting between integer and floating point constants, which may have different materialization. Differential Revision: https://reviews.llvm.org/D135561	2022-10-13 16:41:44 +01:00
Florian Hahn	6b86b481e3	[AArch64] Use tbl for truncating vector FPtoUI conversions. On AArch64, doing the vector truncate separately after the fptoui conversion can be lowered more efficiently using tbl.4, building on D133495. https://alive2.llvm.org/ce/z/T538CC Depends on D133495 Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D133496	2022-09-16 14:57:43 +01:00
Florian Hahn	8491d01cc3	[AArch64] Lower vector trunc using tbl. Similar to using tbl to lower vector ZExts, tbl4 can be used to lower vector truncates. The initial version support i32->i8 conversions. Depends on D120571 Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D133495	2022-09-16 12:42:49 +01:00
Florian Hahn	5871f18827	[AArch64] Lower extending uitofp using tbl. On AArch64, doing the zero-extend separately first can be lowered more efficiently using tbl, building on D120571. https://alive2.llvm.org/ce/z/8Je595 Depends on D120571 Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D133494	2022-09-16 10:20:25 +01:00
Florian Hahn	81a11da762	[CGP,AArch64] Replace zexts with shuffle that can be lowered using tbl. This patch extends CodeGenPrepare to lower zext v16i8 -> v16i32 in loops using a wide shuffle creating a v64i8 vector, selecting groups of 3 zero elements and an element from the input. This is profitable on AArch64 where such shuffles can be lowered to tbl instructions, but only in loops, because it requires materializing 4 masks, which can be done in the loop preheader. This is the only reason the transform is part of CGP. If there's a better alternative I missed, please let me know. The same goes for the shouldReplaceZExtWithShuffle hook which guards this. I am not sure if this transform will be beneficial on other targets, but it seems like there is no other convenient way. This improves the generated code for loops like the one below in combination with D96522. int foo(uint8_t *p, int N) { unsigned long long sum = 0; for (int i = 0; i < N ; i++, p++) { unsigned int v = *p; sum += (v < 127) ? v : 256 - v; } return sum; } https://clang.godbolt.org/z/Wco866MjY Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D120571	2022-09-15 19:18:13 +01:00
Xiang1 Zhang	16743c9534	[CodeGen] Limit building time in CodeGenPrepare for huge function Details: Currently CodeGenPrepare is very time consuming in handling big functions. Old Algorithm : It iterate each BB in function, and go on handle very instructions in BB. Due to some instruction optimizations may affect the BBs' dominate tree. The old logic will re-iterate and try optimize for each BB. Suppose we have a big function with 20000 BBs, If we handled the last BB with fine tuning the dominate tree. We need totally re-iterate and try optimize the 20000 BBs from the beginning. The Complex is near N! And we really encounter somes big tests (> 20000 BBs) that cost more than 30 mins in this pass. (Debug version compiler will cost 2 hours here) What this patch do for huge function ? It mainly changes the iteration way for optimization. 1 We do optimizeBlock for each BB (that is same with old way). And, in the meaning time, If BB is changed/updated in the optimization, it will be put into FreshBBs (try do optimizeBlock again). The new created BB at previous iteration will also put into FreshBBs. 2 For the BBs which not updated at previous iteration, we directly skip it. Strictly speaking, here may miss some opportunity, but the probability is very small. 3 For Instructions in single BB, we do optimizeInst for each instruction. If optimizeInst change the instruction dominator in this BB, rather than break and go back to optimize the first BB (the old way), we directly iterate instructions (to do optimizeInst) in this updated BB again (the new way). What this patch do for small/normal (not huge) function ? It is same with the Old Algorithm. (NFC) Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D129352	2022-09-07 10:05:40 +08:00
Simon Pilgrim	e2d140e9c3	[TTI] Add isExpensiveToSpeculativelyExecute wrapper CGP uses a raw `getInstructionCost(I, TargetTransformInfo::TCK_SizeAndLatency) >= TCC_Expensive` check to see if its better to move an expensive instruction used in a select behind a branch instead. This is causing issues with upcoming improvements to TCK_SizeAndLatency costs on X86 as we need to use TCK_SizeAndLatency as an uop count (so its compatible with various target-specific buffer sizes - see D132288), but we can have instructions that have a low TCK_SizeAndLatency value but should still be treated as 'expensive' (FDIV for example) - by adding a isExpensiveToSpeculativelyExecute wrapper we can keep the current behaviour but still add an x86 override in a future patch when the cost tables are updated to compensate.	2022-09-03 13:12:22 +01:00
Xiang1 Zhang	a808ac2e42	[NFC] Clang-format for CodeGenPrepare.cpp	2022-08-30 13:42:36 +08:00
Simon Pilgrim	f9de13232f	[X86] Promote i8/i16 CTTZ (BSF) instructions and remove speculation branch This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis. For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling. Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, although there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU. Differential Revision: https://reviews.llvm.org/D132520	2022-08-24 17:28:18 +01:00
Philip Reames	274f86e7a6	[TTI] Remove OperandValueKind/Properties from getArithmeticInstrCost interface [nfc] This completes the client side transition to the OperandValueInfo version of this routine. Backend TTI implementations still use the prior versions for now.	2022-08-22 11:06:32 -07:00
Simon Pilgrim	fdec50182d	[CostModel] Replace getUserCost with getInstructionCost * Replace getUserCost with getInstructionCost, covering all cost kinds. * Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks. Original Patch by @samparker (Sam Parker) Differential Revision: https://reviews.llvm.org/D79483	2022-08-18 11:55:23 +01:00
Kazu Hirata	f5a68feab3	Use llvm::none_of (NFC)	2022-08-14 16:25:39 -07:00
Fangrui Song	7d6017fd31	[TTI] Change new getVectorInstrCost overload to use const reference after D131114 A const reference is preferred over a non-null const pointer. `Type *` is kept as is to match the other overload. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D131197	2022-08-04 15:16:51 -07:00
Mingming Liu	bc8f2f3649	[AArch64][TTI][NFC] Overload method 'getVectorInstrCost' to provide vector instruction itself, as a context information for cost estimation. 1) Overloaded (instruction-based) method is a wrapper around the current (opcode-based) method. 2) This patch also changes a few callsites (VectorCombine.cpp, SLPVectorizer.cpp, CodeGenPrepare.cpp) to call the overloaded method. 3) This is a split of D128302. Differential Revision: https://reviews.llvm.org/D131114	2022-08-04 12:58:25 -07:00
Paul Kirth	d434e40f39	[llvm][NFC] Refactor code to use ProfDataUtils In this patch we replace common code patterns with the use of utility functions for dealing with profiling metadata. There should be no change in functionality, as the existing checks should be preserved in all cases. Reviewed By: bogner, davidxl Differential Revision: https://reviews.llvm.org/D128860	2022-08-03 00:09:45 +00:00
Sotiris Apostolakis	995b61cdac	[SelectOpti] Auto-disable other cmov optis when the new select-opti pass is enabled Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D129817	2022-08-02 00:19:59 +00:00
Paul Kirth	6e9bab71b6	Revert "[llvm][NFC] Refactor code to use ProfDataUtils" This reverts commit 300c9a78819b4608b96bb26f9320bea6b8a0c4d0. We will reland once these issues are ironed out.	2022-07-27 21:38:11 +00:00
Paul Kirth	300c9a7881	[llvm][NFC] Refactor code to use ProfDataUtils In this patch we replace common code patterns with the use of utility functions for dealing with profiling metadata. There should be no change in functionality, as the existing checks should be preserved in all cases. Reviewed By: bogner, davidxl Differential Revision: https://reviews.llvm.org/D128860	2022-07-27 21:13:54 +00:00
Dmitry Vassiliev	e3e63f30a5	[CodeGen] Fixed ambiguous symbol ExtAddrMode in case of NDEBUG and LLVM_ENABLE_DUMP This patch fixes the following error with MSVC 16.9.2 in case of NDEBUG and LLVM_ENABLE_DUMP: llvm/lib/CodeGen/CodeGenPrepare.cpp(2581): error C2872: 'ExtAddrMode': ambiguous symbol llvm/include/llvm/CodeGen/TargetInstrInfo.h(86): note: could be 'llvm::ExtAddrMode' llvm/lib/CodeGen/CodeGenPrepare.cpp(2447): note: or '`anonymous-namespace'::ExtAddrMode' llvm/lib/CodeGen/CodeGenPrepare.cpp(2581): error C2039: 'print': is not a member of 'llvm::ExtAddrMode' Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D130426	2022-07-27 00:21:57 +02:00
Kazu Hirata	76e18cc4f6	[llvm] Use llvm::any_of and llvm::none_of (NFC)	2022-07-20 00:36:19 -07:00
Kazu Hirata	9e6d1f4b5d	[CodeGen] Qualify auto variables in for loops (NFC)	2022-07-17 01:33:28 -07:00
Tim Besard	a323dfc015	Don't sink ptrtoint/inttoptr sequences into non-noop addrspacecasts. In https://reviews.llvm.org/D30114, support for mismatching address spaces was introduced to CodeGenPrepare's optimizeMemoryInst, using addrspacecast as it was argued that only no-op addrspacecasts would be considered when constructing the address mode. However, by doing inttoptr/ptrtoint, it's possible to get CGP to emit an addrspace that's not actually no-op, introducing a miscompilation: define void @kernel(i8* %julia_ptr) { %intptr = ptrtoint i8* %julia_ptr to i64 %ptr = inttoptr i64 %intptr to i32 addrspace(3)* br label %end end: store atomic i32 1, i32 addrspace(3)* %ptr unordered, align 4 ret void } Gets compiled to: define void @kernel(i8* %julia_ptr) { end: %0 = addrspacecast i8* %julia_ptr to i32 addrspace(3)* store atomic i32 1, i32 addrspace(3)* %0 unordered, align 4 ret void } In the case of NVPTX, this introduces a cvta.to.shared, whereas leaving out the %end block and branch doesn't trigger this optimization. This results in illegal memory accesses as seen in https://github.com/JuliaGPU/CUDA.jl/issues/558 In this change, I introduced a check before doing the pointer cast that verifies address spaces are the same. If not, it emits a ptrtoint/inttoptr combination to get a no-op cast between address spaces. I decided against disallowing ptrtoint/inttoptr with non-default AS in matchOperationAddr, because now its still possible to look through multiple sequences of them that ultimately do not result in a address space mismatch (i.e. the second lit test).	2022-07-16 10:56:42 -04:00
Nuno Lopes	373571dbb4	[NFC] Switch a few uses of undef to poison as placeholders for unreachable code	2022-06-30 23:01:43 +01:00
Craig Topper	44b456e5f0	[CodeGenPrepare] Avoid double map lookup. NFCI	2022-06-26 10:47:14 -07:00
Kazu Hirata	4271a1ff33	[llvm] Call *set::insert without checking membership first (NFC)	2022-06-18 10:17:22 -07:00
Guillaume Chatelet	6725d80640	[NFC][Alignment] Use Align in shouldAlignPointerArgs	2022-06-14 10:56:36 +00:00
Nikita Popov	c10921fa1a	[CGP] Also freeze ctlz/cttz operand when despeculating D125887 changed the ctlz/cttz despeculation transform to insert a freeze for the introduced branch on zero. While this does fix the "branch on poison" issue, we may still get in trouble if we pick a different value for the branch and for the ctz argument (i.e. non-zero for the branch, but zero for the ctz). To avoid this, we should use the same frozen value in both positions. This does cause a regression in RISCV codegen by introducing an additional sext. The DAG looks like this: t0: ch = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %3 t4: i64 = AssertSext t2, ValueType:ch:i32 t23: i64 = freeze t4 t9: ch = CopyToReg t0, Register:i64 %0, t23 t16: ch = CopyToReg t0, Register:i64 %4, Constant:i64<32> t18: ch = TokenFactor t9, t16 t25: i64 = sign_extend_inreg t23, ValueType:ch:i32 t24: i64 = setcc t25, Constant:i64<0>, seteq:ch t28: i64 = and t24, Constant:i64<1> t19: ch = brcond t18, t28, BasicBlock:ch<cond.end 0x8311f68> t21: ch = br t19, BasicBlock:ch<cond.false 0x8311e80> I don't see a really obvious way to improve this, as we can't push the freeze past the AssertSext (which may produce poison). Differential Revision: https://reviews.llvm.org/D126638	2022-06-10 09:46:10 +02:00
Simon Moll	b8c2781ff6	[NFC] format InstructionSimplify & lowerCaseFunctionNames Clang-format InstructionSimplify and convert all "FunctionName"s to "functionName". This patch does touch a lot of files but gets done with the cleanup of InstructionSimplify in one commit. This is the alternative to the less invasive clang-format only patch: D126783 Reviewed By: spatel, rengolin Differential Revision: https://reviews.llvm.org/D126889	2022-06-09 16:10:08 +02:00
Chuanqi Xu	0e10f12844	[NFC] Remove commented cerr debugging loggings There are some unused cerr debugging loggings in the codes. It is weird to remain such commented debug helpers in the product.	2022-06-08 15:58:06 +08:00
Fangrui Song	d86a206f06	Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options	2022-06-05 00:31:44 -07:00
Fangrui Song	557efc9a8b	[llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!` error. More were added due to cargo cult. Since the error has been removed, cl::ZeroOrMore is unneeded. Also remove cl::init(false) while touching the lines.	2022-06-03 21:59:05 -07:00
Rahman Lavaee	08cc058518	Reland "[Propeller] Promote functions with propeller profiles to .text.hot." This relands commit 4d8d2580c53e130c3c3dd3877384301e3c495554. The major change here is using `addUsedIfAvailable<BasicBlockSectionsProfileReader>()` to make sure we don't change the pipeline tests. Differential Revision: https://reviews.llvm.org/D126518	2022-05-26 19:53:14 -07:00
Rahman Lavaee	3aa249329f	Revert "[Propeller] Promote functions with propeller profiles to .text.hot." This reverts commit 4d8d2580c53e130c3c3dd3877384301e3c495554.	2022-05-26 18:45:40 -07:00
Rahman Lavaee	4d8d2580c5	[Propeller] Promote functions with propeller profiles to .text.hot. Today, text section prefixes (none, .unlikely, .hot, and .unknown) are determined based on PGO profile. However, Propeller may deem a function hot when PGO doesn't. Besides, when `-Wl,-keep-text-section-prefix=true` Propeller cannot enforce a global section ordering as the linker can only reorder sections within each output section (.text, .text.hot, .text.unlikely). This patch promotes all functions with Propeller profiles (functions listed in the basic-block-sections profile) to .text.hot. The feature is hidden behind the flag `--bbsections-guided-section-prefix` which defaults to `true`. The new implementation refactors the parsing of basic block sections profile into a new `BasicBlockSectionsProfileReader` analysis pass. This allows us to use the information earlier in `CodeGenPrepare` in order to set the functions text prefix. `BasicBlockSectionsProfileReader` will be used both by `BasicBlockSections` pass and `CodeGenPrepare`. Differential Revision: https://reviews.llvm.org/D122930	2022-05-26 16:23:21 -07:00
Nikita Popov	5126c38012	[CGP] Freeze condition when despeculating ctlz/cttz Freeze the condition of the newly introduced conditional branch, to avoid immediate undefined behavior if the input to ctlz/cttz was originally poison. Differential Revision: https://reviews.llvm.org/D125887	2022-05-23 11:01:18 +02:00
Matthias Braun	8d03c49f49	Extend switch condition in optimizeSwitchPhiConst when free In a case like: switch((i32)x) { case 42: phi((i64)42, ...); } replace `(i64)42` with `zext(x)` when we can do so for free. This fixes a part of https://github.com/llvm/llvm-project/issues/55153 Differential Revision: https://reviews.llvm.org/D124897	2022-05-18 16:23:53 -07:00
Nikita Popov	ed1cb01baf	[IRBuilder] Add IsInBounds parameter to CreateGEP() We commonly want to create either an inbounds or non-inbounds GEP based on a boolean value, e.g. when preserving inbounds from existing GEPs. Directly accept such a boolean in the API, rather than requiring a ternary between CreateGEP and CreateInBoundsGEP. This change is not entirely NFC, because we now preserve an inbounds flag in a constant expression edge-case in InstCombine.	2022-05-13 14:30:55 +02:00
Craig Topper	edbf390d10	[CodeGenPrepare] Use const reference to avoid unnecessary APInt copy. NFC Spotted while looking at Matthias' patches. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D124985	2022-05-11 12:06:45 -07:00
Matthias Braun	de9ad98d2d	Fix endless loop in optimizePhiConst with integer constant switch condition Avoid endless loop in degenerate case with an integer constant as switch condition as reported in https://reviews.llvm.org/D124552	2022-05-11 08:49:01 -07:00
Matthias Braun	f0ea9c9cec	CodeGenPrepare: Replace constant PHI arguments with switch condition value We often see code like the following after running SCCP: switch (x) { case 42: phi(42, ...); } This tends to produce bad code as we currently materialize the constant phi-argument in the switch-block. This increases register pressure and if the pattern repeats for `n` case statements, we end up generating `n` constant values. This changes CodeGenPrepare to catch this pattern and revert it back to: switch (x) { case 42: phi(x, ...); } Differential Revision: https://reviews.llvm.org/D124552	2022-05-10 10:00:10 -07:00
Matthias Braun	cd19af74c0	Avoid 8 and 16bit switch conditions on x86 This adds a `TargetLoweringBase::getSwitchConditionType` callback to give targets a chance to control the type used in `CodeGenPrepare::optimizeSwitchInst`. Implement callback for X86 to avoid i8 and i16 types where possible as they often incur extra zero-extensions. This is NFC for non-X86 targets. Differential Revision: https://reviews.llvm.org/D124894	2022-05-10 10:00:10 -07:00
Jonas Paulsson	46f83caebc	[InlineAsm] Add support for address operands ("p"). This patch adds support for inline assembly address operands using the "p" constraint on X86 and SystemZ. This was in fact broken on X86 (see example at https://reviews.llvm.org/D110267, Nov 23). These operands should probably be treated the same as memory operands by CodeGenPrepare, which have been commented with "TODO" there. Review: Xiang Zhang and Ulrich Weigand Differential Revision: https://reviews.llvm.org/D122220	2022-04-13 12:50:21 +02:00
serge-sans-paille	989f1c72e0	Cleanup codegen includes This is a (fixed) recommit of https://reviews.llvm.org/D121169 after: 1061034926 before: 1063332844 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121681	2022-03-16 08:43:00 +01:00
Nico Weber	a278250b0f	Revert "Cleanup codegen includes" This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https://reviews.llvm.org/D121169	2022-03-10 07:59:22 -05:00


665 Commits