According to its doc-comment, `isImplicit` is meant to return true if the
expression is an implicit location description (one that describes an object,
or part of an object, which has no actual location but whose value can be
computed from available program state).
There's a brief entry for `DW_OP_LLVM_tag_offset` in the LangRef and there's
some info in the original commit fb9ce100d19be130d004d03088ccd4af295f3435.
From what I can tell it doesn't look like `DW_OP_LLVM_tag_offset` affects
whether or not the location is implicit; the opcode doesn't get included in the
final location description but instead is added as an attribute to the variable.
This was tripping an assertion in the latest application of the fix for #76545,
#78606, where an expression containing a `DW_OP_LLVM_tag_offset` is split into
a fragment (i.e., one that describes a part of the whole variable).
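For reference, a rough sketch (not taken from the failing case, values are hypothetical) of what such an expression looks like in textual IR: a tag offset combined with a fragment operator.
```
; hypothetical values: tag offset 1, fragment covering bits [0, 32)
!0 = !DIExpression(DW_OP_LLVM_tag_offset, 1, DW_OP_LLVM_fragment, 0, 32)
```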
Support for promoted and NF LZCNT/POPCNT/TZCNT was added in #79954.
Because null_frag is used in the patterns for these variants, TableGen cannot
infer mayLoad = 1 for them.
This can be tested by MCA tests, which will be added once
-mcpu=<cpu_with_apx> is supported.
A FileCheck test has been added:
```
./bin/llvm-lit -sv llvm/test/tools/llvm-gsymutil/X86/elf-dwo.yaml
```
Manual test steps:
- Create a binary with split DWARF:
```
clang++ -g -gdwarf-4 -gsplit-dwarf main.cpp -o main_split
```
- Remove or rename the .dwo file so llvm-gsymutil can't find it:
```
mv main_split-main.dwo main_split-main__.dwo
```
- Now run the llvm-gsymutil conversion; it should print a warning both with
and without the `--quiet` flag:
```
$ ./bin/llvm-gsymutil --convert=./main_split
Input file: ./main_split
Output file (x86_64): ./main_split.gsym
warning: Unable to retrieve DWO .debug_info section for main_split-main.dwo
Loaded 0 functions from DWARF.
Loaded 12 functions from symbol table.
Pruned 0 functions, ended with 12 total
```
```
$ ./bin/llvm-gsymutil --convert=./main_split --quiet
Input file: ./main_split
Output file (x86_64): ./main_split.gsym
warning: Unable to retrieve DWO .debug_info section for some object files. (Remove the --quiet flag for full output)
Pruned 0 functions, ended with 12 total
```
We don't have an AMO instruction for Nand, so with the A extension we
use an LR/SC loop. If we have Zacas we can use a CAS loop instead.
According to the Zacas spec, a CAS loop scales to highly parallel
systems better than LR/SC.
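As a rough illustration (not a test from this patch), IR like the following is what gets expanded to an LR/SC loop with the A extension, or to a CAS loop when Zacas is available:
```
define i32 @atomic_nand(ptr %p, i32 %v) {
  ; no native AMO for nand, so this is expanded into an LR/SC or CAS loop
  %old = atomicrmw nand ptr %p, i32 %v seq_cst
  ret i32 %old
}
```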
This is a follow up to an item I noted in my submission comment for
e947f95. I don't have a real world example where this is triggering
unprofitably, but avoiding the transform when profile data indicates the loop
is short-running seems quite reasonable. It's also now come
up as a possibility in a regression twice in two days, so I'd like to
get this in to close out the possibility if nothing else.
The original review dropped the threshold for short trip count loops. I
will return to that in a separate review if this lands.
If we can't produce a large enough index vector in i8, we may need to legalize
the shuffle (via scalarization, which in turn gets lowered into stack usage).
This patch makes two related changes:
* Defer legalization until we actually need to generate the vrgather
instruction. With the new recursive structure, this only happens when
doing the fallback for one of the arms.
* Check the actual mask values for something outside of the representable
range.
Both are covered by recently added tests.
Add TableGen patterns to convert more instructions to boolean
expressions:
- **mul -> and/or**: i1 multiply instructions currently cannot be
selected causing the compiler to crash. See
https://github.com/llvm/llvm-project/issues/57404
- **select -> and/or**: Converting selects to and/or can enable more
optimizations. `InstCombine` cannot do this as aggressively due to
poison semantics.
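An IR-level sketch of the two patterns above (not the TableGen patterns themselves):
```
define i1 @mul_i1(i1 %a, i1 %b) {
  ; an i1 multiply has the same truth table as an 'and'
  %m = mul i1 %a, %b
  ret i1 %m
}

define i1 @select_i1(i1 %c, i1 %b) {
  ; poison aside, a select with a constant true arm acts like an 'or'
  %s = select i1 %c, i1 true, i1 %b
  ret i1 %s
}
```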
Add a new node `AArch64ISD::URSHR_I_PRED`.
`srl(add(X, 1 << (ShiftValue - 1)), ShiftValue)` is transformed to
`urshr`, or to `rshrnb` (as before) if the result is truncated.
`uzp1(rshrnb(uunpklo(X),C), rshrnb(uunpkhi(X), C))` is converted to
`urshr(X, C)` (tested by the wide_trunc tests).
Pattern matching code in `canLowerSRLToRoundingShiftForVT` is taken
from the prior rshrnb code. It returns true if the add has NUW or if the
number of bits used in the return value allows us to ignore the
overflow (tested by the rshrnb test cases).
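A rough IR-level sketch (with hypothetical constants) of the `srl(add(X, 1 << (ShiftValue - 1)), ShiftValue)` pattern, here with ShiftValue = 4 so the rounding constant is 8:
```
define <vscale x 8 x i16> @rounding_srl(<vscale x 8 x i16> %x) {
  ; add the rounding constant 1 << (4 - 1) = 8; nuw lets us ignore overflow
  %add = add nuw <vscale x 8 x i16> %x, shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 8, i64 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer)
  ; shift right by ShiftValue = 4
  %srl = lshr <vscale x 8 x i16> %add, shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 4, i64 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer)
  ret <vscale x 8 x i16> %srl
}
```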
This patch adds support for common and local symbols in the TOC for AIX.
Note that we need to update isVirtualSection: a common symbol in the
TOC will have the symbol type XTY_CM and will be initialized when placed
in the TOC, so sections with this type are no longer virtual.
---------
Co-authored-by: Zaara Syeda <syzaara@ca.ibm.com>
Removes the MaterializationResponsibility::addDependencies and
addDependenciesForAll methods, and transfers dependency registration to
the notifyEmitted operation. The new dependency registration allows
dependencies to be specified for arbitrary subsets of the
MaterializationResponsibility's symbols (rather than just single symbols
or all symbols) via an array of SymbolDependenceGroups (pairs of symbol
sets and corresponding dependencies for that set).
This patch aims to both improve emission performance and simplify
dependence tracking. By eliminating some states (e.g. symbols having
registered dependencies but not yet being resolved or emitted) we make
some errors impossible by construction, and reduce the number of error
cases that we need to check. NonOwningSymbolStringPtrs are used for
dependence tracking under the session lock, which should reduce
ref-counting operations, and intra-emit dependencies are resolved
outside the session lock, which should provide better performance when
JITing concurrently (since some dependence tracking can happen in
parallel).
The Orc C API is updated to account for this change, with the
LLVMOrcMaterializationResponsibilityNotifyEmitted API being modified and
the LLVMOrcMaterializationResponsibilityAddDependencies and
LLVMOrcMaterializationResponsibilityAddDependenciesForAll operations
being removed.
https://github.com/llvm/llvm-project/pull/78171 added support for
non-consecutive local value numbers. This extends that support to global
value numbers (for globals and functions).
This means that it is now possible to delete an unnamed global
definition/declaration without breaking the IR.
This is a lot less common than unnamed local values, but it seems like
something we should support for consistency. (Unnamed globals are used a
lot in Rust though.)
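A small sketch of what is now accepted: a module whose unnamed globals are numbered non-consecutively, e.g. after `@1` was deleted.
```
@0 = global i32 1
@2 = global i32 3   ; the definition of @1 was deleted; the numbering gap now parses
```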
Currently, `PPCMergeStringPool` merges the global variables after the
`AsmPrinter` initializer has added the global variables to its symbol list.
This patch moves the merging work of `PPCMergeStringPool` to its
initializer, just like GlobalMerge does, to avoid adding merged
global variables to the `AsmPrinter` symbol list.
An extra space causes the checks generated by update_mir_test_checks.py to be
unusable.
```
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4
# RUN: llc -mtriple=x86_64-- -o - %s -run-pass=none -verify-machineinstrs -simplify-mir | FileCheck %s
---
name: foo
body: |
; CHECK-LABEL: name: foo
; CHECK: bb.0:
; CHECK-NEXT: successors:
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: RET 0, $eax
bb.0:
successors:
bb.1:
RET 0, $eax
...
```
The failure log is as follows:
```
llvm/test/CodeGen/MIR/X86/unreachable-block-print.mir:9:16: error: CHECK-NEXT: is on the same line as previous match
; CHECK-NEXT: {{ $}}
^
<stdin>:21:13: note: 'next' match was here
successors:
^
<stdin>:21:13: note: previous match ended here
successors:
```
JumpThreading may perform AA queries while the dominator tree is not up
to date, which may result in miscompilations.
Fix this by adding a new AAQI option to disable the use of the dominator
tree in BasicAA.
Fixes https://github.com/llvm/llvm-project/issues/79175.
Update createScalarIVSteps to take an insert point as a parameter. This
ensures that the inserted scalar steps are in the same order as the
recipes they replace (vs in reverse order as currently). This helps to
reduce the diff for follow-up changes.
Broadcasting a single float should not be any slower than
loading 32 bytes using vmovaps, so rematerializing the broadcast can help
reduce register spills when register pressure is high.
Miscompilation arises due to instruction combining of cast pairs of the
form `bitcast bfloat to half` + `<FPOp> half to ...` or `bitcast half to
bfloat` + `<FPOp> bfloat to ...`. For example, `bitcast bfloat to half` +
`fpext half to double` reduces to `fpext bfloat to double`, and `bitcast
half to bfloat` + `fpext bfloat to double` reduces to `fpext half to
double`. These conversions are incorrect: they assume the representations
of `bfloat` and `half` are equivalent because the types have the same
width, and as a consequence miscompilation arises.
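A minimal reproducer sketch of the first case (hypothetical function name):
```
define double @f(bfloat %x) {
  ; must not be folded to "fpext bfloat %x to double": the bitcast
  ; reinterprets the bits of %x, it does not convert the value
  %h = bitcast bfloat %x to half
  %d = fpext half %h to double
  ret double %d
}
```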
Fixes #61984
Calling a `__arm_locally_streaming` function from a function that
is not a streaming-SVE function would lead to incorrect inlining.
The issue didn't surface because the tests were not testing what
they were supposed to test.
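For context, a minimal sketch of the situation at the IR level, assuming the locally-streaming body attribute is spelled `aarch64_pstate_sm_body`:
```
; callee is locally streaming; inlining its body into the non-streaming
; caller would drop the streaming-mode switch around it
define void @callee() "aarch64_pstate_sm_body" {
  ret void
}

define void @caller() {
  call void @callee()
  ret void
}
```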
If `-allow-incomplete-ir` is enabled, automatically insert declarations
for missing globals.
If a global is only used in calls with the same function type, insert a
function declaration with that type.
Otherwise, insert a dummy i8 global. The fallback case could be extended
with various heuristics (e.g. we could look at load/store types), but
I've chosen to keep it simple for now, because I'm unsure to what degree
this would really be useful without more experience. I expect that in most
cases the declaration type doesn't really matter (note that the type of
an external global specifies a *minimum* size only, not a precise size).
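For illustration, a sketch of incomplete IR (hypothetical names) and the declarations that would be inserted for it:
```
; with -allow-incomplete-ir, parsing this module inserts (roughly):
;   declare void @callee(i32)      ; inferred from the call's function type
;   @data = external global i8     ; fallback dummy i8 global
define void @caller(i32 %x) {
  call void @callee(i32 %x)
  %v = load i32, ptr @data
  ret void
}
```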
This is a followup to https://github.com/llvm/llvm-project/pull/78421.
This patch ensures that host runtime functions are not called to handle the
OpenMP teams clause on the device.
GPU code for the pragma `omp target teams distribute parallel do` will
require only one call to the OpenMP loop-worksharing GPU runtime. Support
for it will be added later.
This patch does not include changes required for handling `omp target
teams` for the host side.
The goal of this PR is to tolerate differences between the description of
formal arguments in function metadata (represented by "kernel_arg_type")
and the actual LLVM parameter types. A compiler may use the "kernel_arg_type"
metadata fields to encode detailed type information, whereas LLVM IR may use
a more general type for the actual parameter, in particular an opaque pointer
type. This PR proposes to resolve this by falling back to the actual LLVM
parameter types when lowering formal function arguments in cases where the
type can't be created from the string content of "kernel_arg_type", i.e.,
when "kernel_arg_type" contains a type unknown to the SPIR-V Backend.
An example of the issue is
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/blob/main/test/transcoding/KernelArgTypeInOpString.ll,
where a compiler generates detailed `kernel_arg_type` info for the following
kernel function in the form `!{!"image_kernel_data*", !"myInt",
!"struct struct_name*"}`, while in LLVM IR the same arguments are referred to
as `@foo(ptr addrspace(1) %in, i32 %out, ptr addrspace(1) %outData)`.
Both definitions are correct, and the resulting LLVM IR is correct, but the
lowering stage of the SPIR-V Backend fails to generate a SPIR-V type.
```
typedef int myInt;

typedef struct {
  int width;
  int height;
} image_kernel_data;

struct struct_name {
  int i;
  int y;
};

void kernel foo(__global image_kernel_data* in,
                __global struct struct_name *outData,
                myInt out) {}
```
```
define spir_kernel void @foo(ptr addrspace(1) %in, i32 %out, ptr addrspace(1) %outData) ... !kernel_arg_type !7 ... {
entry:
ret void
}
...
!7 = !{!"image_kernel_data*", !"myInt", !"struct struct_name*"}
```
The PR changes the contract of `SPIRVType *getArgSPIRVType(...)` so that it
may return `nullptr` to signal that the metadata string content is not
recognized; corresponding comments are added and a couple of checks for
`nullptr` are inserted where appropriate.
PAL uses ELF REL (not RELA) relocations which can only store a 32-bit
addend in the instruction, even for reloc types like R_AMDGPU_ABS32_HI
which require the upper 32 bits of a 64-bit address calculation to be
correct. This means that it is not safe to fold an arbitrary offset into
a GlobalAddressSDNode, so stop doing that.
In practice this is mostly a problem for small negative offsets which do
not work as expected because PAL treats the 32-bit addend as unsigned.
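As a rough illustration (hypothetical global), an offset like the one below could previously be folded into the global's relocation addend; since PAL treats the 32-bit REL addend as unsigned, such small negative offsets break.
```
@g = external addrspace(1) global [16 x i32]

define ptr addrspace(1) @before_g() {
  ; a small negative offset from a global's address
  ret ptr addrspace(1) getelementptr (i8, ptr addrspace(1) @g, i64 -8)
}
```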
This patch merges the logic of `cannotBeOrderedLessThanZeroImpl` into
`computeKnownFPClass` to improve the signbit inference.
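For illustration, one kind of sign-bit fact this covers (a sketch, not a test from the patch): the result of `llvm.fabs` is known to have a clear sign bit, so a comparison against a negative bound can fold away.
```
declare float @llvm.fabs.f32(float)

define i1 @fabs_never_negative(float %x) {
  %a = call float @llvm.fabs.f32(float %x)
  ; %a has a known-clear sign bit, so this compare can fold to false
  %c = fcmp olt float %a, 0.0
  ret i1 %c
}
```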
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
This reverts commit f8525030004f907cd108e7c18df255a6d3b23124.
It was supposed to speed things up but llvm-compile-time-tracker.com
showed a slight slow down.
If the demanded bits of an instruction are full, we don't have to
recurse to its users, but we may still have to clear flags on the
instruction itself.
Fixes https://github.com/llvm/llvm-project/issues/80113.
This patch introduces a 'COALESCER_BARRIER', a pseudo node that expands to a
'nop' but stops the register allocator from coalescing a COPY node when its
use/def crosses an SMSTART or SMSTOP instruction.
For example:
%0:fpr64 = COPY killed $d0
undef %2.dsub:zpr = COPY %0 // <- Do not coalesce this COPY
ADJCALLSTACKDOWN 0, 0
MSRpstatesvcrImm1 1, 0, csr_aarch64_smstartstop, implicit-def dead $d0
$d0 = COPY killed %0
BL @use_f64, csr_aarch64_aapcs
If the COPY would be coalesced, that would lead to:
$d0 = COPY killed %0
being replaced by:
$d0 = COPY killed %2.dsub
which means the whole ZPR reg would be live up to the call, causing the
MSRpstatesvcrImm1 (smstop) to spill/reload the ZPR register:
str q0, [sp] // 16-byte Folded Spill
smstop sm
ldr z0, [sp] // 16-byte Folded Reload
bl use_f64
which would be incorrect for two reasons:
1. The program may load more data than it has allocated.
2. If there are other SVE objects on the stack, the compiler might use the
   'mul vl' addressing modes to access the spill location.
By disabling the coalescing, we get the desired results:
str d0, [sp, #8] // 8-byte Folded Spill
smstop sm
ldr d0, [sp, #8] // 8-byte Folded Reload
bl use_f64
Previously we called ignoreCSRForAllocationOrder on every alias of every
CSR which was expensive on targets like AMDGPU which define a very large
number of overlapping register tuples.
On such targets it is simpler and faster to call
ignoreCSRForAllocationOrder once for every physical register.
Differential Revision: https://reviews.llvm.org/D146735
Similar to #78403, but for scalable `vwadd(u).wv`, given that #76785 has been recommitted.
### Code
```
define <vscale x 8 x i64> @vwadd_wv_mask_v8i32(<vscale x 8 x i32> %x, <vscale x 8 x i64> %y) {
%mask = icmp slt <vscale x 8 x i32> %x, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 42, i64 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
%a = select <vscale x 8 x i1> %mask, <vscale x 8 x i32> %x, <vscale x 8 x i32> zeroinitializer
%sa = sext <vscale x 8 x i32> %a to <vscale x 8 x i64>
%ret = add <vscale x 8 x i64> %sa, %y
ret <vscale x 8 x i64> %ret
}
```
### Before this patch
[Compiler Explorer](https://godbolt.org/z/xsoa5xPrd)
```
vwadd_wv_mask_v8i32:
li a0, 42
vsetvli a1, zero, e32, m4, ta, ma
vmslt.vx v0, v8, a0
vmv.v.i v12, 0
vmerge.vvm v24, v12, v8, v0
vwadd.wv v8, v16, v24
ret
```
### After this patch
```
vwadd_wv_mask_v8i32:
li a0, 42
vsetvli a1, zero, e32, m4, ta, ma
vmslt.vx v0, v8, a0
vsetvli zero, zero, e32, m4, tu, mu
vwadd.wv v16, v16, v8, v0.t
vmv8r.v v8, v16
ret
```