llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-28 05:36:06 +00:00

Author	SHA1	Message	Date
Kazu Hirata	fd358997b3	[IR] Use range-based for loops (NFC)	2024-01-14 00:53:28 -08:00
Kazu Hirata	c0cb80338f	[IR] Use StringRef::consume_front (NFC)	2024-01-14 00:53:26 -08:00
Nicholas Mosier	49138d97c0	[X86] Fix SLH crash on llvm.eh.sjlh.longjmp (#77959 ) Fix #60081.	2024-01-14 12:03:18 +08:00
Kazu Hirata	e4a6be0fc0	[CodeGen] Use getConstantOperandVal (NFC)	2024-01-13 18:18:51 -08:00
Kazu Hirata	b5d6ea4d8b	[Support] Use StringRef::consume_front (NFC)	2024-01-13 18:18:49 -08:00
Kazu Hirata	96f14ea618	[llvm] Use range-based for loops with llvm::drop_begin (NFC)	2024-01-13 18:18:47 -08:00
Heejin Ahn	d871f40deb	[WebAssembly] Use DebugValueManager only when subprogram exists (#77978 ) We previously scanned the whole BB for `DBG_VALUE` instruction even when the program doesn't have debug info, i.e., the function doesn't have a subprogram associated with it, which can make compilation unnecessarily slow. This disables `DebugValueManager` when a `DISubprogram` doesn't exist for a function. This only reduces unnecessary work in non-debug mode and does not change output, so it's hard to add a test to test this behavior. Test changes were necessary because their `DISubprogram`s were not correctly linked with the functions, so with this PR the compiler incorrectly assumed the functions didn't have a subprogram and the tests started to fail. Fixes https://github.com/emscripten-core/emscripten/issues/21048.	2024-01-13 14:55:54 -08:00
Vitaly Buka	253d2f931e	Revert "[InstCombine] Fold `icmp pred (inttoptr X), (inttoptr Y) -> icmp pred X, Y`" (#78023 ) Reverts llvm/llvm-project#77832 To fix https://lab.llvm.org/buildbot/#/builders/236/builds/8673 Also truncation to shorter type looks incorrect. Issue for tracking #78024 .	2024-01-13 11:15:30 -08:00
Durgadoss R	8d817f6479	[LLVM][NVPTX]: Add aligned versions of cluster barriers (#77940 )	2024-01-13 10:41:19 +01:00
paperchalice	8566cd6124	[CodeGen] Let `PassBuilder` support machine passes (#76320 ) `PassBuilder` would be a better place to parse MIR pipeline. We can reuse the code to support parsing pass with parameters and targets can reuse `registerPassBuilderCallbacks` to register the target specific passes. `PassBuilder` also has ability to check whether a Pass is a machine pass.	2024-01-13 11:28:07 +08:00
Kazu Hirata	1df4fb9881	[Support] Use StringRef::ltrim (NFC)	2024-01-12 18:39:53 -08:00
Kazu Hirata	15179aa433	[llvm] Use llvm::is_contained (NFC)	2024-01-12 18:39:48 -08:00
Koakuma	5fa4b1d83c	[SPARC] Consume `tune-cpu` directive in the backend (#77195 ) This lets the backend read the `tune-cpu` directive that is emitted by the frontend. No changes are needed for clang as it is already emits it.	2024-01-12 21:21:32 -05:00
Min-Yih Hsu	3edf82d556	[XRay] Reserve memory space ahead-of-time when reading native format log (#76853 ) XRay used to struggle reading large log files. It turned out the bottleneck was primarily caused by the reallocation happens when appending log entries into a std::vector. This patch reserves the memory space ahead-of-time since the number of entries is known for most cases. Making llvm-xray runs 1.8 times faster and uses 1.4 times less physical memory when reading large (~2.6GB) log files.	2024-01-12 16:34:39 -08:00
Reid Kleckner	21a77e8a92	[IR] Reorder Value fields to put the SubclassID first (#53520 ) Placing the class id at offset 0 should make `isa` and `dyn_cast` faster by eliminating the field offset (previously 0x10) from the memory operand, saving encoding space on x86, and, in theory, an add micro-op. You can see the load encodes one byte smaller here: https://godbolt.org/z/Whvz4can9 The compile time tracker shows some modestly positive results in the on the `cycle` metric and in the final clang binary size metric: https://llvm-compile-time-tracker.com/compare.php?from=33b54f01fe32030ff60d661a7a951e33360f82ee&to=2530347a57401744293c54f92f9781fbdae3d8c2&stat=cycles Clicking through to the per-library size breakdown shows that instcombine size reduces by 0.68%, which is meaningful, and I believe instcombine is known to be a hotspot. It is, however, potentially noise. I still think we should do this, because notionally, the class id really acts as the vptr of the Value, and conventionally the vptr is always at offset 0.	2024-01-12 23:13:01 +00:00
Greg Clayton	4618ef8cf5	Allow the dumping of .dwo files contents to show up when dumping an executable with split DWARF. (#66726 ) Allow the dumping of .dwo files contents to show up when dumping an executable with split DWARF. Currently if you run llvm-dwarfdump on a binary that has skeleton compile units, you only see the skeleton compile units. Since the main binary has the linked addresses it would be nice to be able to dump DWARF from the .dwo files and how the resolved addresses instead of showing the address index and "<unresolved>" in the output. This patch adds an option that can be specified to dump the non skeleton DIEs named --dwo. Added the ability to use the following options with split dwarf as well: --name <name> --lookup <addr> --debug-info <die-offset>	2024-01-12 13:31:55 -08:00
Craig Topper	8cd956197f	[RISCV] Update descriptions for Zvk* shorthands. (#77961 ) This makes them more consistent with other extensions so they appear move similar in the -print-supported-extensions output.	2024-01-12 11:56:36 -08:00
Usman Nadeem	792fa23c1b	[AArch64][SVE2] Lower OR to SLI/SRI (#77555 ) Code builds on NEON code and the tests are adapted from NEON tests minus the tests for illegal types.	2024-01-12 11:23:56 -08:00
Igor Kudrin	9f8c818141	[CommandLine][NFCI] Do not add 'All' to 'RegisteredSubCommands' (#77722 ) After #75679, it is no longer necessary to add the `All` pseudo subcommand to the list of registered subcommands. The change causes the list to contain only real subcommands, i.e. an unnamed top-level subcommand and named ones. This simplifies the code a bit by removing some checks for this special case. This is a fixed version of #77041, where options of the 'All' subcommand were not added to subcommands defined after them.	2024-01-13 02:19:42 +07:00
Krzysztof Drewniak	88871784fd	[AMDGPU] Allow buffer intrinsics to be marked volatile at the IR level (#77847 ) In order to ensure the correctness of ptr addrspace(7) lowering, we need a backwards-compatible way to flag buffer intrinsics as volatile that can't be dropped (unlike metadata). To acheive this in a backwards-compatible way, we use bit 31 of the auxilliary immediates of buffer intrinsics as the volatile flag. When this bit is set, the MachineMemOperand for said intrinsic is marked volatile. Existing code will ensure that this results in the appropriate use of flags like glc and dlc. This commit also harmorizes the handling of the auxilliary immediate for atomic intrinsics, which new go through extract_cpol like loads and stores, which masks off the volatile bit.	2024-01-12 11:20:01 -06:00
Jay Foad	9d8e53818d	[AMDGPU] Refactor getNonSoftWaitcntOpcode and its callers (#77933 ) This avoids listing all soft waitcnt opcodes in two places (getNonSoftWaitcntOpcode and isSoftWaitcnt) and avoids the need for helpers isWaitcnt and isWaitcntVsCnt.	2024-01-12 17:12:09 +00:00
Jay Foad	dec74a8347	[AMDGPU] Fix VS_CNT overflow assertion (#77935 ) Always set the upper bound for VS_CNT higher than the lower bound. Before #77439 this code was only executed on function entry where the lower bound was 0 so it was not a problem. Fixes #77931	2024-01-12 17:11:19 +00:00
Paschalis Mpeis	a300b24037	Revert "[TLI] Fix replace-with-veclib crash with invalid arguments (#77112 )" This reverts commit 9fdc568824b0992d48704dfa530a12073cc02f5e, as it linker crashes on some platforms.	2024-01-12 15:49:15 +00:00
Philip Reames	e4d01bb227	[SCEV] Special case sext in isKnownNonZero (#77834 ) The existing logic in isKnownNonZero relies on unsigned ranges, which can be problematic when our range calculation is imprecise. Consider the following: %offset.nonzero = or i32 %offset, 1 --> %offset.nonzero U: [1,0) S: [1,0) %offset.i64 = sext i32 %offset.nonzero to i64 --> (sext i32 %offset.nonzero to i64) U: [-2147483648,2147483648) S: [-2147483648,2147483648) Note that the unsigned range for the sext does contain zero in this case despite the fact that it can never actually be zero. Instead, we can push the query down one level - relying on the fact that the sext is an invertible operation and that the result can only be zero if the input is. We could likely generalize this reasoning for other invertible operations, but special casing sext seems worthwhile.	2024-01-12 07:45:28 -08:00
Paschalis Mpeis	9fdc568824	[TLI] Fix replace-with-veclib crash with invalid arguments (#77112 ) Fix a crash of `replace-with-veclib` pass, when the arguments of the TLI mapping do not match the original call. Now, it simply ignores such cases. Test require assertions as it accesses programmatically the debug log.	2024-01-12 15:19:52 +00:00
Alexey Bataev	6fdc2ce8c5	[SLP]Fix PR77916: transform the whole mask, not only the elements for the second vector. Need to transform all elements in the long mask, if we decided to produce shorter version, some elements may still have incorrect inifices after transformation for the first vector in the permutation.	2024-01-12 07:07:43 -08:00
Natalie Chouinard	4f47372f8c	[SPIR-V] Add Float16 support when targeting Vulkan (#77115 ) Add Float16 to Vulkan's available capabilities, and guard Float16Buffer (Kernel-only capability) against being added outside OpenCL environments. Add tests to verify half and half vector types, and validate with spirv-val. Fixes #66398	2024-01-12 10:03:48 -05:00
Yingwei Zheng	2aae304cbc	[InstCombine] Fold `icmp pred (inttoptr X), (inttoptr Y) -> icmp pred X, Y` (#77832 ) NOTE: Alive2 proofs are unavailable because `inttoptr` is unsupported.	2024-01-12 23:03:07 +08:00
Alexander Yermolovich	d199ab4699	[LLVM][DWARF] Fix accelerator table switching between CU and TU (#77511 ) Bug 1 is triggered when a TU is already created, and we process the same DICompositeType at a top level. We would switch to TU accelerator table, but would not switch back on early exit. As the result we would add CU entries to the TU accelerator table. When we try to write out TUs and normalize entries, the offsets for DIEs that are part of a CU would not have been computed, and it would assert on getOffset(). Bug 2 is triggered when processing nested TUs. When we exit from addDwarfTypeUnitType we switched back to CU accelerator table. If we were processing nested TUs, the rest of the entries from TUs would be added to CU accelerator table. When we write out TUs, all the DIE pointers will become invalid. Eventually it will assert during normalization step after CU is processed.	2024-01-12 07:01:17 -08:00
Qiongsi Wu	39bb790b90	[SimplifyCFG] `switch`: Do Not Transform the Default Case if the Condition is Too Wide (#77831 ) https://github.com/llvm/llvm-project/pull/76669 taught SimplifyCFG to handle switches when `default` has only one case. When the `switch`'s condition is wider than 64 bit, the current implementation can calculate the wrong default value. This PR skips cases where the condition is too wide.	2024-01-12 08:54:35 -05:00
Nikita Popov	6c2fbc3a68	[IRBuilder] Add CreatePtrAdd() method (NFC) (#77582 ) This abstracts over the common pattern of creating a gep with i8 element type.	2024-01-12 14:21:21 +01:00
Florian Hahn	59d6f033a2	[VPlan] Support narrowing widened loads in truncateToMinimimalBitwidths. MinBWs may also contain widened load instructions, handle them by only narrowing their result. Fixes https://github.com/llvm/llvm-project/issues/77468	2024-01-12 13:14:13 +00:00
Alexey Bataev	39b2104b4a	[SLP]Fix a crash for reduced values with minbitwidth, which are reused. If the reduced values are additionally affected by minbitwidth analysis, need to cast them to a proper type before doing any math, if they are reused.	2024-01-12 04:49:48 -08:00
Alexey Lapshin	35708b0754	[DWARFLinker][NFC] Rename libraries to match with directories name. (#77592 ) It was noted that new DWARFLinker libraries do not follow naming agreement - https://github.com/llvm/llvm-project/pull/75925#issuecomment-1883301659 This patch rename libraries to match with the agreement. Rename LLVMDWARFLinkerBase library into the LLVMDWARFLinker. Rename LLVMDWARFLinker library into the LLVMDWARFLinkerClassic. Correct include path according to the new directory structure.	2024-01-12 15:36:44 +03:00
Mirko Brkušanin	2adbf254a1	[AMDGPU][NFC] Rename DotIUVOP3PMods to VOP3PModsNeg (#77785 ) This is used to select the source modifier (neg) from the immediate operand. After a follow up commit this will no longer be DOTIU specific. Co-authored-by: Changpeng Fang <changpeng.fang@amd.com>	2024-01-12 10:57:24 +01:00
ostannard	9c9bffe213	[AArch64] Disable FP loads/stores when fp-armv8 not enabled (#77817 ) Most of the floating-point instructions are already gated on the fp-armv8 subtarget feature (or some other feature), but most of the load and store instructions, and one move instruction, were not. I found this list of instructions with a script which consumes the output of llvm-tblgen --dump-json, looking for instructions which have an FPR operand but no predicate. That script now finds zero instructions. This only affects assembly, not codegen, because the floating-point types and registers are already not marked as legal when the FPU is disabled, so it is impossible for any of these to be selected.	2024-01-12 09:15:11 +00:00
Craig Topper	2e78c220fc	[RISCV] Simplify the description for ssaia and smaia. (#77870 ) It feels more important to expand out Advanced Interrupt Architecture for users than to have a description that explains how one extension is different from the other.	2024-01-11 23:53:00 -08:00
Kazu Hirata	a5dc3f68a8	[llvm] Use SmallString::operator std::string() (NFC)	2024-01-11 23:32:44 -08:00
Mariusz Sikora	2b83ceee3d	[AMDGPU][GFX12] Default component broadcast store (#76212 ) For image and buffer stores the default behaviour on GFX12 is to set all unset components to the value of the first component. So if we pass only X component, it will be the same as XXXX, or XY same as XYXX. This patch simplifies the passed vector of components in InstCombine by removing components from the end that are equal to the first component. For image stores it also trims DMask if necessary. --------- Co-authored-by: Mateja Marjanovic <mmarjano@amd.com>	2024-01-12 08:26:08 +01:00
Kazu Hirata	7b9bc4729b	[IPO] Use a range-based for loop (NFC)	2024-01-11 22:48:22 -08:00
Kazu Hirata	5e9da33b87	[llvm] Use StringRef::consume_front_insensitive (NFC)	2024-01-11 22:48:20 -08:00
Amara Emerson	a946934a12	[GlobalISel][NFC] Use GPhi wrapper in more places instead of iterating over operands.	2024-01-11 22:25:53 -08:00
Stanislav Mekhanoshin	369981181f	[AMDGPU] Handle bf16 operands the same way as f16. NFC. (#77826 ) This is infrastructure change which shall allow use of bf16 operands with instruction definitions.	2024-01-11 21:08:19 -08:00
Shengchen Kan	4f71068b72	[X86] Correct the asm comment for compression NF_ND -> NF	2024-01-12 12:55:11 +08:00
Carl Ritson	6752f1517d	[TwoAddressInstruction] Recompute live intervals for partial defs (#74431 ) Force live interval recomputation for a register if its definition is narrowed to become partial. The live interval repair process cannot otherwise detect these changes.	2024-01-12 13:26:01 +09:00
Emil J	3baedb4111	[GISel] Fix #77762 : extend correct source registers in combiner helper rule extend_through_phis (#77765 ) Since we already know which register we want to extend, we don't have to ask its defining MI about it --------- Co-authored-by: Emil Tywoniak <Emil.Tywoniak@hightec-rt.com>	2024-01-12 12:09:58 +08:00
darkbuck	54c19546ba	[GlobalISel] Revise 'assignCustomValue' interface (#77824 ) - Previously, 'assignCustomValue' requests the number of assigned VAs minus 1 is returned and treats 0 as the assignment failure. However, under that arrangment, we cannot tell a successful single VA custom assignment from the failure case. - This change requests that 'assignCustomValue' just return the number of all VAs assigned, including the first WA so that it won't be ambigous to tell the failure case from the single VA custom assignment.	2024-01-12 10:41:55 +07:00
Wang Pengcheng	a2af374284	[SelectionDAG] Add space-optimized forms of OPC_CheckPredicate (#77763 ) We record the usage of each `Predicate` and sort them by usage. For the top 8 `Predicate`s, we will emit a `PC_CheckPredicateN` to save one byte. Overall this reduces the llc binary size with all in-tree targets by about 61K. This is a recommit of 1a57927, which was reverted in bc98c31. The CI failures occurred when doing expensive checks (with option `LLVM_ENABLE_EXPENSIVE_CHECKS` being ON). The key point here is that we need stable sorting result in the test, but doing expensive checks uncovered the non-determinism of `llvm::sort`. So `llvm::sort` is changed to `llvm::stable_sort` in this revised patch. And we use `llvm::MapVector` to keep insertion order.	2024-01-12 11:38:05 +08:00
Craig Topper	9e40ba0c2d	[RISCV] Remove period from Zvbb extension description. No other instruction extension has a period. There are also periods in 'ssaia' and 'smaia', but those descriptions need a different update.	2024-01-11 19:28:05 -08:00
Matt Arsenault	5e5e98e36e	AMDGPU: Cleanup MAIFrag predicate code (#77734 ) Move the complex predicates into separate variables.	2024-01-12 10:09:56 +07:00

1 2 3 4 5 ...

177445 Commits