llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-27 21:16:05 +00:00

Author	SHA1	Message	Date
Matthias Braun	7ac63f004c	Bump coalescing limit This bumps the "large-interval-freq-threshold" limit in the register coalescer to 256. The limit was introduced in https://reviews.llvm.org/D59143 without much justification for the particular value "100", so I hope bumping it is ok. This change is motivated by bad codegen for the popular crc32c algorithm; the code is often based/copied from this implementation: https://github.com/htot/crc32c/blob/master/crc32c/crc32intelc.cc which uses a Duff's device pattern with 128 switch-cases. There are examples in RocksDB (https://github.com/facebook/rocksdb/blob/main/util/crc32c.cc) and Folly (https://github.com/facebook/folly/blob/main/folly/hash/detail/Crc32cDetail.cpp) which are important use cases for us. Differential Revision: https://reviews.llvm.org/D150994	2023-05-24 09:15:05 -07:00
Jay Foad	0ea5eb143c	[RegisterCoalescer] Fix updating LiveIntervals in joinReservedPhysReg Live intervals for physical registers are calculated lazily on demand. In a case like this: 16B %0:gpr32 = IMPLICIT_DEF 32B $wzr = COPY %0 if the live interval for $wzr did not already exist then the update code in joinReservedPhysReg would create it with a definition at 32B, which would remain even after the COPY was deleted. Differential Revision: https://reviews.llvm.org/D151314	2023-05-24 15:19:05 +01:00
Jay Foad	2dad1249d2	[MachineVerifier] Verify liveins for live-through segments Differential Revision: https://reviews.llvm.org/D149947	2023-05-24 15:17:02 +01:00
Sergei Barannikov	d41f6cff03	[CodeGen] Skip null physical register in AntiDepBreaker (NFCI) D151036 adds an assertion that prohibits iterating over sub- and super-registers of a null register. This is already the case when iterating over register units of a null register, and worked by accident for sub- and super-registers. The only place where the assertion is currently triggering is in CriticalAntiDepBreaker::ScanInstruction. Other places are changed in case new assertions are added and should be harmless otherwise. Differential Revision: https://reviews.llvm.org/D151288	2023-05-24 12:58:29 +03:00
Bing1 Yu	c8466ab7cb	[LegalizeType][X86] Support WidenVecRes_AssertZext and SplitVecRes_AssertZext for ISD::AssertZext during LegalizeType procedure Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D150941	2023-05-24 10:17:08 +08:00
Rahman Lavaee	9c3c6f6aca	[Propeller] Add HasIndirectBranch to BBEntry::Metadata. This information helps to avoid considering cloning for blocks with indirect branches. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D150611	2023-05-24 02:15:38 +00:00
Shubham Sandeep Rastogi	775258d758	Add support for salvaging debug info from icmp instructions. salvageDebugInfo is a function that allows us to retain debug info for instructions that have been optimized out. Currently, it doesn't support salvaging the debug information from icmp instructions, but DWARF expressions can emulate an icmp by using the DWARF conditional expressions. This patch adds support for salvaging debug information from icmp instructions. Differential Revision: https://reviews.llvm.org/D150216	2023-05-23 15:31:31 -07:00
Kyle Huey	3be667ae5a	[X86] Use the CFA when appropriate for better variable locations around calls. Without frame pointers, the locations of variables on the stack are emitted relative to the stack pointer (via the stack pointer being the value of DW_AT_frame_base on the subprogram). If a call modifies the stack pointer this results in the locations being wrong and the debugger displaying the wrong values for variables. By using DW_OP_call_frame_cfa in these situations the emitted location for the variable will automatically handle changes in the stack pointer (provided LLVM is emitting the correct CFI directives elsewhere, of course). The CFA needs to be adjusted for the size of the stack frame (including the return address) to allow the variable locations themselves to remain unchanged by this patch. Certain LLDB features cannot cope with DW_OP_call_frame_cfa, so this change is heuristically limited to the cases where it's necessary for correctness to minimize the fallout there. Reviewed By: #debug-info, scott.linder, jryans, jmorse Differential Revision: https://reviews.llvm.org/D143463	2023-05-23 20:24:55 +00:00
Joshua Cranmer	3ac1cef866	[CodeGen] Fix crash in CodeGenPrepare::optimizeGatherScatterInst. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D151141	2023-05-23 15:02:03 -04:00
Fangrui Song	e018cbf720	[IR] Make stack protector symbol dso_local according to -f[no-]direct-access-external-data There are two motivations. The `__stack_chk_guard` symbol created by `-fno-pic -fstack-protector -mstack-protector-guard=global` is referenced directly on all ELF OSes except FreeBSD. This patch allows referencing the symbol indirectly with -fno-direct-access-external-data. Some Linux kernel folks want the `__stack_chk_guard` symbol created by `-fno-pic -fstack-protector -mstack-protector-guard-reg=gs -mstack-protector-guard-symbol=__stack_chk_guard` to be referenced directly, avoiding R_X86_64_REX_GOTPCRELX (even if the relocation may be optimized out by the linker). https://github.com/llvm/llvm-project/issues/60116 Why they need this isn't so clear to me. --- Add module flag "direct-access-external-data" and set the dso_local property of the stack protector symbol. The module flag can benefit other LLVMCodeGen synthesized symbols that are not represented in LLVM IR. Nowadays, with `-fno-pic` being uncommon, ideally we should set "direct-access-external-data" when it is true. However, doing so would require ~90 clang/test tests to be updated, which is too many. As a compromise, we set "direct-access-external-data" only when it's different from the implied default value. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D150841	2023-05-23 09:49:57 -07:00
Craig Topper	139392c0a5	[LegalizeTypes][ARM][AArch64][RISCV][VE][WebAssembly] Add special case for smin(X, -1) and smax(X, 0) to ExpandIntRes_MINMAX. We can compute a simpler expression for Lo for these cases. This is an alternative for the test cases in D151180 that works for more targets. This is similar to some of the special cases we have for expanding setcc operands. Differential Revision: https://reviews.llvm.org/D151182	2023-05-23 09:19:55 -07:00
Elliot Goodrich	ac73c48e09	[llvm] Reduce ComplexDeinterleavingPass.h includes Remove the unnecessary `"llvm/IR/PatternMatch.h"` include directive from `ComplexDeinterleavingPass.h` and move it to the corresponding source file. Add missing includes that were transitively included by this header to 3 other source files. This reduces the total number of preprocessing tokens across the LLVM source files in `lib` from (roughly) 1,964,876,961 to 1,935,091,611 - a reduction of ~1.52%. This should result in a small improvement in compilation time.	2023-05-20 17:49:18 +01:00
Fangrui Song	46f366494f	-fsanitize=function: use type hashes instead of RTTI objects Currently we use RTTI objects to check type compatibility. To support non-unique RTTI objects, commit 5745eccef54ddd3caca278d1d292a88b2281528b added a `checkTypeInfoEquality` string matching to the runtime. The scheme is inefficient. ``` _Z1fv: .long 846595819 # jmp .long .L__llvm_rtti_proxy-_Z3funv ... main: ... # Load the second word (pointer to the RTTI object) and dereference it. movslq 4(%rsi), %rax movq (%rax,%rsi), %rdx # Is it the desired typeinfo object? leaq _ZTIFvvE(%rip), %rax # If not, call __ubsan_handle_function_type_mismatch_v1, which may recover if checkTypeInfoEquality allows cmpq %rax, %rdx jne .LBB1_2 ... .section .data.rel.ro,"aw",@progbits .p2align 3, 0x0 .L__llvm_rtti_proxy: .quad _ZTIFvvE ``` Let's replace the indirect `_ZTI` pointer with a type hash similar to `-fsanitize=kcfi`. ``` _Z1fv: .long 3238382334 .long 2772461324 # type hash main: ... # Load the second word (callee type hash) and check whether it is expected cmpl $-1522505972, -4(%rax) # If not, fail: call __ubsan_handle_function_type_mismatch jne .LBB2_2 ``` The RTTI object derives its name from `clang::MangleContext::mangleCXXRTTI`, which uses `mangleType`. `mangleTypeName` uses `mangleType` as well. So the type compatibility change is high-fidelity. Since we no longer need RTTI pointers in `__ubsan::__ubsan_handle_function_type_mismatch_v1`, let's switch it back to version 0, the original signature before e215996a2932ed7c472f4e94dc4345b30fd0c373 (2019). `__ubsan::__ubsan_handle_function_type_mismatch_abort` is not recoverable, so we can revert some changes from e215996a2932ed7c472f4e94dc4345b30fd0c373. Reviewed By: samitolvanen Differential Revision: https://reviews.llvm.org/D148785	2023-05-20 08:24:20 -07:00
Elliot Goodrich	b7fb2a3fec	Revert "[llvm] Reduce ComplexDeinterleavingPass.h includes" This reverts commit 058ca5c07106d38ad66e3ec4972a613a64e88151.	2023-05-20 14:21:07 +01:00
Elliot Goodrich	058ca5c071	[llvm] Reduce ComplexDeinterleavingPass.h includes Remove the unnecessary `"llvm/IR/PatternMatch.h"` include directive from `ComplexDeinterleavingPass.h` and move it to the corresponding source file. Add missing includes that were transitively included by this header to 2 other source files. This reduces the total number of preprocessing tokens across the LLVM source files in `lib` from (roughly) 1,964,876,961 to 1,935,091,611 - a reduction of ~1.52%. This should result in a small improvement in compilation time. Differential Revision: https://reviews.llvm.org/D150514	2023-05-20 13:36:50 +01:00
Matt Arsenault	a5e03972f7	GlobalISel: Move fconstant matching into tablegen I don't really understand what the point of wip_match_opcode is. It doesn't seem to have any purpose other than to list opcodes to have all the logic in pure C++. You can't seem to use it to select multiple opcodes in the same way you use match. Something is wrong with it, since the match emitter prints "errors" if an opcode is covered by wip_match_opcode and then appears in another pattern. For example with this patch, you see this several times in the build: error: Leaf constant_fold_fabs is unreachable note: Leaf idempotent_prop will have already matched The combines are actually produced and the tests for them do pass, so this seems to just be a broken warning.	2023-05-19 22:44:12 +01:00
Jay Foad	8fcb4fa847	[RegScavenger] Change scavengeRegister to pick registers in allocation order This matches what scavengeRegisterBackwards does. This is in preparation for converting most uses of scavengeRegister to scavengeRegisterBackwards, to reduce test case churn when that lands and to help with bisection if anything goes wrong. Differential Revision: https://reviews.llvm.org/D150792	2023-05-19 21:39:19 +01:00
Craig Topper	3fb1041165	[SelectionDAGBuilder] Use getPtrExtOrTrunc in place of getZExtOrTrunc. NFC This getZExtOrTrunc seems to have been added when getPtrExtOrTrunc was introduced. getPtrExtOrTrunc is currently equivalent to getZExtOrTrunc, but could be changed for some target in the future. Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D149680	2023-05-19 13:08:39 -07:00
eopXD	c8eb535aed	[1/11][IR] Permit load/store/alloca for struct of the same scalable vector type This patch-set aims to simplify the existing RVV segment load/store intrinsics to use a type that represents a tuple of vectors instead. To achieve this, first we need to relax the current limitation for an aggregate type to be a target of load/store/alloca when the aggregate type contains homogeneous scalable vector types. Then we adjust the prolog of an LLVM function during lowering to clang. Finally we re-define the RVV segment load/store intrinsics to use the tuple types. The pull request under the RVV intrinsic specification is riscv-non-isa/rvv-intrinsic-doc#198 --- This is the 1st patch of the patch-set. This patch originated from D98169. This patch allows aggregate type (StructType) that contains homogeneous scalable vector types to be a target of load/store/alloca. The RFC of this patch was posted in LLVM Discourse. https://discourse.llvm.org/t/rfc-ir-permit-load-store-alloca-for-struct-of-the-same-scalable-vector-type/69527 The main changes in this patch are: Extend `StructLayout::StructSize` from `uint64_t` to `TypeSize` to accommodate an expression of scalable size. Allow `StructType::isSized` to also return true for homogeneous scalable vector types. Let `Type::isScalableTy` return true when `Type` is `StructType` and contains scalable vectors. Extra description is added in the LLVM Language Reference Manual on the relaxation of this patch. Authored-by: Hsiangkai Wang <kai.wang@sifive.com> Co-Authored-by: eop Chen <eop.chen@sifive.com> Reviewed By: craig.topper, nikic Differential Revision: https://reviews.llvm.org/D146872	2023-05-19 09:39:36 -07:00
Fangrui Song	ad31a2dcad	Change -fsanitize=function to place two words before the function entry The current implementation of -fsanitize=function places two words (the prolog signature and the RTTI proxy) at the function entry, which makes the feature incompatible with Intel Indirect Branch Tracking (IBT) that needs an ENDBR instruction at the function entry. To allow the combination, move the two words before the function entry, similar to -fsanitize=kcfi. Armv8.5 Branch Target Identification (BTI) has a similar requirement. Note: for IBT and BTI, whether a function gets a marker instruction at the entry generally cannot be assumed (it can be disabled by a function attribute or stronger LTO optimizations). It is extremely unlikely for two words preceding a function entry to be inaccessible. One way to achieve this is by ensuring that a function is aligned at a page boundary and making the preceding page unmapped or unreadable. This is not reasonable for application or library code. (Think: the first text section has crt* code not instrumented by -fsanitize=function.) We use 0xc105cafe for all targets. .long 0xc105cafe disassembles to invalid instructions on all architectures I have tested, except Power where it is `lfs 8, -13570(5)` (Load Floating-Point with a weird offset, unlikely to be used in real code). --- For the removed function in AsmPrinter.cpp, remove an assert: `mdconst::extract` already asserts non-nullness. For compiler-rt/test/ubsan/TestCases/TypeCheck/Function/function.cpp, when the function doesn't have prolog/epilog (-O1 and above), after moving the two words, the address of the function equals the address of ret instruction, so symbolizing the function will additionally get a non-zero column number. Adjust the test to allow an optional column number. ``` .long 3238382334 .long .L__llvm_rtti_proxy-_Z1fv _Z1fv: // symbolizing here retrieves the line table entry from the second .loc .file 0 ... 
.loc 0 1 0 .cfi_startproc .loc 0 2 1 prologue_end retq ``` Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D148665	2023-05-19 07:50:29 -07:00
Stephen Tozer	0670470a8d	[DebugInfo][InstrRef] Handle non-directly reachable DBG_PHIs in LiveDebugValues Fixes: https://github.com/llvm/llvm-project/issues/62725 This patch fixes an error in which a DBG_INSTR_REF referring to a DBG_PHI in a block that is not directly reachable from the entry block results in a crash during LiveDebugValues. Note that this fix prevents a crash from occurring, but will give undef locations to users of these PHIs even if a valid location exists. Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D150707	2023-05-19 11:29:30 +01:00
Thomas Symalla	b819fd7e2c	[NFC] Fix typo in CodeGenPrepare.cpp	2023-05-19 11:27:28 +02:00
Heejin Ahn	3eccb40fa9	[RegisterCoalescer] Remove DbgMergedVRegNums (NFC) Not sure what this was originally intended for, but this seems to be unused. It didn't seem to be used when it was first added in D64630 either. Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D150606	2023-05-18 16:02:51 -07:00
Heejin Ahn	2dd349428b	[DebugInfo][InstrRef] Prettyprint metadata Some metadata prettyprinting, including variable prettyprinting and debug line info comments, is currently only supported for `DBG_VALUE`. This allows `DBG_INSTR_REF` to be printed in the same way. Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D150620	2023-05-18 16:01:42 -07:00
Amaury Séchet	87bf2bff05	[NFC][DAG] Simplify a giant expression in visitMul.	2023-05-18 18:58:07 +00:00
Ramkumar Ramachandra	b4038fb72f	MachineTraceMetrics: modernize loops (NFC) Differential Revision: https://reviews.llvm.org/D150854	2023-05-18 12:11:43 +01:00
OCHyams	6c088972d2	[DebugInfo][SelectionDAG] Do not drop dbg intrinsics with empty metadata locs Without this patch SelectionDAG silently drops dbg.values using `!{}` operands. Related to https://discourse.llvm.org/t/auto-undef-debug-uses-of-a-deleted-value This causes assignment-tracking behaviour to match non-assignment-tracking behaviour after a recent change (see D140990). Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D150767	2023-05-18 10:08:37 +01:00
Matt Arsenault	7f54b38e28	GlobalISel: Refactor unary FP op constant folding	2023-05-18 08:33:43 +01:00
Vitaly Buka	e06dac07a1	[LiveDebugValues] Initialized variable to avoid msan reports Reproducible with =-1 and assert: https://reviews.llvm.org/P8309 Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D150420	2023-05-18 00:26:40 -07:00
Sergei Barannikov	da42b2846c	[CodeGen] Support allocating of arguments by decreasing offsets Previously, `CCState::AllocateStack` always allocated stack space by increasing offsets. For targets with stack growing up (away from zero) it is more convenient to allocate arguments by decreasing offsets, so that the first argument is at the top of the stack. This is important when calling a function with variable number of arguments: the callee does not know the size of the stack, but must be able to access "fixed" arguments. For that to work, the "fixed" arguments should have fixed offsets relative to the stack top, i.e. the variadic arguments area should be at the stack bottom (at lowest addresses). The in-tree target with stack growing up is AMDGPU, but it allocates arguments by increasing addresses. It does not support variadic arguments. A drive-by change is to promote stack size/offset to 64-bit integer. This is what MachineFrameInfo expects. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D149575	2023-05-17 21:51:52 +03:00
Sergei Barannikov	01a7967447	[CodeGen] Replace CCState's getNextStackOffset with getStackSize (NFC) The term "next stack offset" is misleading because the next argument is not necessarily allocated at this offset due to alignment constraints. It also does not make much sense when allocating arguments at negative offsets (introduced in a follow-up patch), because the returned offset would be past the end of the next argument. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D149566	2023-05-17 21:51:45 +03:00
Philip Reames	0dc0c27989	[TLI] Add IsZero parameter to storeOfVectorConstantIsCheap [nfc] Make the decision to consider zero constant stores cheap target specific. Will be used in an upcoming change for RISCV.	2023-05-17 09:19:01 -07:00
Dhruv Chawla	b66551370f	[SelectionDAG] Handle NSW for ADD/SUB in computeKnownBits() This patch is a continuation of D150110. It separates the cases for ADD and SUB into their own cases so that computeForAddSub can be directly called and the NSW flag passed. This allows better optimization when the NSW flag is enabled, and allows fixing up the TODO that was there previously in SimplifyDemandedBits. Differential Revision: https://reviews.llvm.org/D150769	2023-05-17 15:15:05 +02:00
Hongtao Yu	d4d6b9a142	[FS-AFDO] Clean up non-zero discriminator for pseudo probes at the first FS discriminator pass. The dwarf discriminator field for pseudo probes is not supposed to be used until the first FS discriminator pass. Unfortunately there are always corner cases that accidentally set this field. For example, the inliner could set this field for an inlined instruction if the instruction does not come with any debug information. While fixing all such spots is possible, for future-proofing I'd like to enforce a general cleanup before assigning probes any FS discriminator. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D150741	2023-05-16 21:35:17 -07:00
Hongtao Yu	456eb4b5bf	[PseudoProbe] Only emit discriminator in FS-AFDO mode. Despite previous effort {D148569} to avoid screwing up the existing discriminator field, I'm still seeing some call probes getting a non-zero discriminator eventually in non-FS mode. It could be related to callsite merge. While they are investigated I'm disabling discriminator emission for non-FS mode. This avoids breaking the compatibility with older tools like llvm-profgen and bolt. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D150625	2023-05-16 21:35:17 -07:00
Sameer Sahasrabuddhe	0a170eb786	[Uniformity] Propagate divergence only along divergent outputs. When an instruction is determined to be divergent, not all its outputs are divergent. The users of only divergent outputs should now be examined for divergence. Also, replaced a repeating pattern of "if new divergent instruction, then add to worklist" by combining it into a single function. This does not cause any change in functionality. Reviewed By: foad, arsenm Differential Revision: https://reviews.llvm.org/D150636	2023-05-17 07:47:43 +05:30
Noah Goldstein	d294e3cb76	[SelectionDAG] Improve `computeKnownBits` implementations of `sdiv` and `udiv` Add `exact` flag handling for `udiv` and add entire `sdiv` case. Differential Revision: https://reviews.llvm.org/D150098	2023-05-16 18:58:13 -05:00
Austin Chang	d069ac035a	[DAGCombiner] Add bswap(logic_op(bswap(x), y)) optimization This is the implementation of D149782. The patch implements a helper function that matches and folds the following cases in the DAGCombiner: 1. `bswap(logic_op(x, bswap(y))) -> logic_op(bswap(x), y)` 2. `bswap(logic_op(bswap(x), y)) -> logic_op(x, bswap(y))` 3. `bswap(logic_op(bswap(x), bswap(y))) -> logic_op(x, y)` in multiuse case, which still reduces the number of instructions. The helper function accepts SDValue with BSWAP and BITREVERSE opcode. This patch folds the BSWAP cases and leaves the BITREVERSE optimization for the future. Reviewed By: RKSimon, goldstein.w.n Differential Revision: https://reviews.llvm.org/D149783	2023-05-16 18:58:07 -05:00
Daniel Paoliello	f8499d5709	Emit the correct flags for the PROC CodeView Debug Symbol The S_LPROC32_ID and S_GPROC32_ID CodeView Debug Symbols have a flags field which LLVM has had the values for (in the ProcSymFlags enum) but has never actually set. These flags are used by Microsoft-internal tooling that leverages debug information to do binary analysis. Modified LLVM to set the correct flags: - ProcSymFlags::HasOptimizedDebugInfo - always set, as this indicates that debug info is present for optimized builds (if debug info is not emitted for optimized builds, then LLVM won't emit a debug symbol at all). - ProcSymFlags::IsNoReturn and ProcSymFlags::IsNoInline - set if the function has the NoReturn or NoInline attributes respectively. - ProcSymFlags::HasFP - set if the function requires a frame pointer (per TargetFrameLowering::hasFP). Per discussion in review, XFAIL'ing lldb test until someone working on lldb has a chance to look at it. Differential Revision: https://reviews.llvm.org/D148761	2023-05-16 10:58:10 -07:00
Jay Foad	d8229e2f14	[KnownBits] Define and use intersectWith and unionWith Define intersectWith and unionWith as two complementary ways of combining KnownBits. The names are chosen for consistency with ConstantRange. Deprecate commonBits as a synonym for intersectWith. Differential Revision: https://reviews.llvm.org/D150443	2023-05-16 09:23:51 +01:00
Jay Foad	71ac47f391	[KnownBits] Make use of KnownBits.isUnknown. NFC.	2023-05-16 09:19:55 +01:00
Jonas Paulsson	64599ac97e	[MachineSink] Don't reject sinking because of dead def in isProfitableToSinkTo(). An instruction should be sunk (if otherwise legal and profitable) regardless of if it has a dead def of a physreg or not. Physreg defs are checked in other places and sinking is only done with dead defs of regs that are not live into the target MBB. Differential Revision: https://reviews.llvm.org/D150447 Reviewed By: sebastian-ne, arsenm	2023-05-16 10:00:44 +02:00
Gaëtan Bossu	c4a872badb	FastRegAlloc: Fix implicit operands not rewritten This patch fixes a potential crash due to RegAllocFast not rewriting virtual registers. This essentially happens because of a call to MachineInstr::addRegisterKilled() in the process of allocating a "killed" vreg. The former can eventually delete implicit operands without RegAllocFast noticing, leading to some operands being "skipped" and not rewritten to use physical registers. Note that I noticed this crash when working on a solution for tying a register with one/multiple of its sub-registers within an instruction. (See problem description here: https://discourse.llvm.org/t/pass-to-tie-an-output-operand-to-a-subregister-of-an-input-operand/67184). Aside from this fix, I believe there could be further improvements to the RegAllocFast when it comes to instructions with multiple uses of the same virtual register. You can see it in the added test where the implicit uses have been re-written in a somewhat surprising way because of phase ordering. Ultimately, when allocating vregs for an instruction, I believe we should iterate on the vregs it uses (and then process all the operands that use these vregs), instead of directly iterating on operands and somewhat assuming each operand uses a different vreg. This would in the end be quite close to what greedy+virtregrewriter does. If that makes sense, I would probably spin off another patch (after I get more familiar with RegAllocFast). Differential Revision: https://reviews.llvm.org/D145169	2023-05-16 09:49:20 +02:00
esmeyi	4054c68644	[XCOFF][DWARF] XCOFF64 should be able to select the dwarf format in integrated-as mode. Summary: DWARF32 is not supported for XCOFF64 under non-integrated-as mode on AIX, because system assembler will fill the debug section lengths according to DWARF64 format. While in integrated-as mode, XCOFF64 should be able to select the DWARF format. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D150181	2023-05-16 03:02:00 -04:00
Sameer Sahasrabuddhe	fbe1c0616f	[LLVM][Uniformity] Improve detection of uniform registers The MachineUA now queries the target to determine if a given register holds a uniform value. This is determined using the corresponding register bank if available, or by a combination of the register class and value type. This assumes that the target is optimizing for performance by choosing registers, and the target is responsible for any mismatch with the inferred uniformity. For example, on AMDGPU, an SGPR is now treated as uniform, except if the register bank is VCC (i.e., the register holds a wave-wide vector of 1-bit values) or equivalently if it has a value type of s1. - This does not always work with inline asm, where the register bank or the value type might not be present. We assume that the SGPR is uniform, because it is not expected to be s1 in the vast majority of cases. - The pseudo branch instruction SI_LOOP is now hard-coded to be always divergent, although its condition is an SGPR. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D150438	2023-05-16 09:37:04 +05:30
Jessica Paquette	407b4648b8	[MachineOutliner] NFC: Add debug output to MachineOutliner::outline Add some debug output to `outline` to assist in debugging + understanding the code. This will say - How many things we found worth turning into outlined functions - Whether or not candidates were pruned via the outlining algorithm - The function created (if it was created) - Where the calls were inserted - What instruction was used to create the call Sample output below: ``` NUMBER OF POTENTIAL FUNCTIONS: 5 WALKING FUNCTION LIST PRUNED: 0/2 candidates OUTLINE: Expected benefit (12 B) > threshold (1 B) NEW FUNCTION: OUTLINED_FUNCTION_0 CREATE OUTLINED CALLS CALL: OUTLINED_FUNCTION_0 in bar:<unknown> .. BL @OUTLINED_FUNCTION_0, implicit-def $lr, implicit $sp CALL: OUTLINED_FUNCTION_0 in bar:<unknown> .. BL @OUTLINED_FUNCTION_0, implicit-def $lr, implicit $sp PRUNED: 2/2 candidates SKIP: Expected benefit (0 B) < threshold (1 B) PRUNED: 0/2 candidates OUTLINE: Expected benefit (8 B) > threshold (1 B) NEW FUNCTION: OUTLINED_FUNCTION_1 CREATE OUTLINED CALLS CALL: OUTLINED_FUNCTION_1 in bar:<unknown> .. BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp CALL: OUTLINED_FUNCTION_1 in bar:<unknown> .. BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp PRUNED: 2/2 candidates SKIP: Expected benefit (0 B) < threshold (1 B) PRUNED: 2/2 candidates SKIP: Expected benefit (0 B) < threshold (1 B) ```	2023-05-15 15:29:26 -07:00
Muhammad Omair Javaid	6b22608a1d	Revert "Emit the correct flags for the PROC CodeView Debug Symbol" This reverts commit e48826e016e2f427f3b7b1274166aa9aa0ea7f4f. https://lab.llvm.org/buildbot/#/builders/219/builds/2520 lldb-shell :: SymbolFile/PDB/function-nested-block.test Differential Revision: https://reviews.llvm.org/D148761	2023-05-15 23:38:07 +04:00
J. Ryan Stinnett	d6e4c4f8c1	Revert "[X86] Use the CFA as the DWARF frame base for better variable locations around calls." This reverts commit d421f5226048e4a5d88aab157d0f4d434c43f208. LLDB tests are failing as shown in https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/55133/testReport/	2023-05-15 16:53:52 +01:00
Sameer Sahasrabuddhe	b0f0dd2554	[LLVM][Uniformity] Propagate temporal divergence explicitly At a cycle C with divergent exits, UA was using a naive traversal of the exiting edges to locate blocks that may use values defined inside C. But this traversal fails when it encounters a cycle. This is now replaced with a much simpler propagation that iterates over every instruction in C and checks any uses that are outside C. But such an iteration can be expensive when C is very large; the original strategy may need to be reconsidered if there is a regression in compilation times. Also fixed lit tests that should have originally caught the missed propagation of temporal divergence. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D149646	2023-05-15 20:17:43 +05:30
Kyle Huey	d421f52260	[X86] Use the CFA as the DWARF frame base for better variable locations around calls. Prior to this patch, for the DWARF frame base LLVM uses the frame pointer register if available, otherwise the stack pointer register. If the stack pointer register is being used and a call or other code modifies the stack pointer during the body of the function this results in the locations being wrong and the debugger displaying the wrong values for variables. By using DW_OP_call_frame_cfa in these situations the emitted location for the variable will automatically handle changes in the stack pointer. The CFA needs to be adjusted for the offset between the frame pointer/stack pointer to allow the variable locations themselves to remain unchanged by this patch. Reviewed By: #debug-info, scott.linder, jryans Differential Revision: https://reviews.llvm.org/D143463	2023-05-15 15:10:02 +01:00

34118 Commits