llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-24 15:16:06 +00:00

Author	SHA1	Message	Date
jeanPerier	94c024aded	[flang][lowering] delay stack save/restore emission in elemental calls (#109142 ) Stack save/restore emitted for character elemental function result allocation inside hlfir.elemental in lowering created memory bugs because result memory is actually still used after the stack restore when lowering the elemental into a loop where the result element is copied into the array result storage. Instead of adding special handling for stack save/restore in lowering, just avoid emitting those since the stack reclaim pass is able to emit them in the generated loop. Not having those stack save/restore will also help optimizations that want to elide the temporary allocation for the element result when that is possible.	2024-09-19 13:52:58 +02:00
Ivan Butygin	96ac627238	[mlir][vector][nfc] Update vector load/store doc wrt unit strides. (#109267 ) Follow up to https://github.com/llvm/llvm-project/pull/108998. Non-contiguous strides are allowed now for 1-element vector load/stores.	2024-09-19 14:52:35 +03:00
Rahul Joshi	3e24dd42dd	[NFC] Rename variables to conform to LLVM coding standards (#109166 ) Rename `indent` to `Indent` and `o` to `OS`. Rename `Indentation` to `Indent`. Remove unused argument from `emitPredicateMatch`. Pass the `Indent` argument of `emitBinaryParser` by value.	2024-09-19 04:49:12 -07:00
Jacek Caban	486f790d29	[LLD][COFF] Process all ARM64EC import symbols in MapFile's getSymbols (#109118 )	2024-09-19 13:47:22 +02:00
Simon Pilgrim	0013f94b24	[clang][powerpc][wasm][systemz][x86] Replace target vector popcount intrinsics with __builtin_elementwise_popcount (#109160 ) Now that we have the C/C++ `__builtin_elementwise_popcount` intrinsic (#108121) - remove custom target intrinsics that just immediately map to Intrinsic::ctpop and use the generic intrinsic directly.	2024-09-19 12:40:36 +01:00
Nico Weber	61ed5387c8	[gn] port c18be32185ca	2024-09-19 07:34:17 -04:00
Jacek Caban	912e821ab3	[LLD][COFF] Process all live import symbols in MapFile's getSymbols() (#109117 ) The current logic assumes that the import file is pulled by object files, and the loop for import files only needs to handle cases where the `__imp_` symbol is implicitly pulled by an import thunk. This is fragile, as the symbol may also be pulled through other means, such as the -export argument in tests. Additionally, this logic is insufficient for ARM64EC, which exposes multiple symbols through an import file, and referencing any one of them causes all of them to be defined. With this change, import symbols are added to `syms` more often, but we ensure that output symbols remain unique later in the process	2024-09-19 13:20:01 +02:00
Ulrich Weigand	baf9b7da81	[SystemZ] Fix codegen for _[u]128 intrinsics PR #74625 introduced a regression in the code generated for the following set of intrinsics: vec_add_u128, vec_addc_u128, vec_adde_u128, vec_addec_u128 vec_sub_u128, vec_subc_u128, vec_sube_u128, vec_subec_u128 vec_sum_u128, vec_msum_u128 vec_gfmsum_128, vec_gfmsum_accum_128 This is because the new code incorrectly assumed that a cast from "unsigned __int128" to "vector unsigned char" would simply be a bitcast re-interpretation; instead, this cast actually truncates the __int128 to char and splats the result. Fixed by adding an intermediate cast via a single-element 128-bit integer vector. Fixes: https://github.com/llvm/llvm-project/issues/109113	2024-09-19 13:19:03 +02:00
Hans Wennborg	04ccbe6e70	Fix typos in interception_win.cpp	2024-09-19 13:11:10 +02:00
Yonghong Song	becc02ce93	Revert "[Transforms][IPO] Add func suffix in ArgumentPromotion and DeadArgume… (#105742 )" This reverts commit 959448fbd6bc6f74fb3f9655b1387d0e8a272ab8. Reverting because multiple test failures e.g. https://lab.llvm.org/buildbot/#/builders/187/builds/1290 https://lab.llvm.org/buildbot/#/builders/153/builds/9389 and maybe a few others.	2024-09-19 03:54:13 -07:00
Nikita Popov	f1ff3a279f	[InstCombine] Rename TTI member for clarity (NFC) There is already a comment on the member and documentation in the InstCombine contributor guide, but also rename it to add an additional speed bump.	2024-09-19 12:31:11 +02:00
Florian Hahn	256100489d	[VPlan] Rename isDefinedOutside[Vector]Regions -> [Loop] (NFC) Clarify name of helper, split off from https://github.com/llvm/llvm-project/pull/95842/files#r1765556732.	2024-09-19 11:20:31 +01:00
Ivan Butygin	f325085878	[mlir][vector] Relax strides check for 1-element vector load/stores (#108998 ) Single-element vector load/stores are equivalent to scalar load/stores, so they don't need the memref to be contiguous.	2024-09-19 13:12:32 +03:00
Timm Baeder	d267daa9eb	[clang][bytecode] Diagnose loads from weak variables (#109256 )	2024-09-19 11:59:38 +02:00
Daniil Kovalev	3d5e8e4693	[PAC][CodeGen] Do not emit trivial 'mov xN, xN' on tail call (#109100 ) Under some conditions, a trivial `mov xN, xN` instruction was emitted on tail calls. Consider the following code: ``` class Test { public: virtual void f() {} }; void call_f(Test *t) { t->f(); } ``` Corresponding assembly: ``` _Z6call_fP4Test: ldr x16, [x0] mov x17, x0 movk x17, #6503, lsl #48 autda x16, x17 ldr x1, [x16] =====> mov x16, x16 movk x16, #54167, lsl #48 braa x1, x16 ``` This patch omits such movs. Co-authored-by: Anatoly Trosinenko <atrosinenko@accesssoftek.com>	2024-09-19 12:17:58 +03:00
kadir çetinkaya	bb5e66e31b	[include-cleaner] Suppress all clang warnings (#109099 ) This patch disables all clang warnings when running include-cleaner, as users aren't interested in other findings and in-development code might have them temporarily. This ensures the tool can keep working even in the presence of such issues.	2024-09-19 11:16:49 +02:00
David Sherwood	d4536bf5c9	Fix test issue introduced by e762d4dac762a3fc27c6e251086b6645d7543bb2 (#109254 )	2024-09-19 10:06:48 +01:00
Michael Buch	bca507387a	[lldb][FrameRecognizer] Display the first non-std frame on verbose_trap (#108825 ) This attempts to improve user-experience when LLDB stops on a verbose_trap. Currently if a `__builtin_verbose_trap` triggers, we display the first frame above the call to the verbose_trap. So in the newly added test case, we would've previously stopped here: ``` (lldb) run Process 28095 launched: '/Users/michaelbuch/a.out' (arm64) Process 28095 stopped * thread #1, queue = 'com.apple.main-thread', stop reason = Bounds error: out-of-bounds access frame #1: 0x0000000100003f5c a.out`std::__1::vector<int>::operator[](this=0x000000016fdfebef size=0, (null)=10) at verbose_trap.cpp:6:9 3 template <typename T> 4 struct vector { 5 void operator[](unsigned) { -> 6 __builtin_verbose_trap("Bounds error", "out-of-bounds access"); 7 } 8 }; ``` After this patch, we would stop in the first non-`std` frame: ``` (lldb) run Process 27843 launched: '/Users/michaelbuch/a.out' (arm64) Process 27843 stopped * thread #1, queue = 'com.apple.main-thread', stop reason = Bounds error: out-of-bounds access frame #2: 0x0000000100003f44 a.out`g() at verbose_trap.cpp:14:5 11 12 void g() { 13 std::vector<int> v; -> 14 v[10]; 15 } 16 ``` rdar://134490328	2024-09-19 10:06:28 +01:00
Benjamin Kramer	57777a5066	[LoopVectorize] Silence unused variable warning	2024-09-19 11:01:58 +02:00
David Sherwood	e762d4dac7	[LoopVectorize] Teach LoopVectorizationLegality about more early exits (#107004 ) This patch is split off from PR #88385 and concerns only the code related to the legality of vectorising early exit loops. It is the first step in adding support for vectorisation of a simple class of loops that typically involves searching for something, i.e. for (int i = 0; i < n; i++) { if (p[i] == val) return i; } return n; or for (int i = 0; i < n; i++) { if (p1[i] != p2[i]) return i; } return n; In this initial commit LoopVectorizationLegality will only consider early exit loops legal for vectorising if they follow these criteria: 1. There are no stores in the loop. 2. The loop must have only one early exit like those shown in the above example. I have referred to such exits as speculative early exits, to distinguish from existing support for early exits where the exit-not-taken count is known exactly at compile time. 3. The early exit block dominates the latch block. 4. The latch block must have an exact exit count. 5. There are no loads after the early exit block. 6. The loop must not contain reductions or recurrences. I don't see anything fundamental blocking vectorisation of such loops, but I just haven't done the work to support them yet. 7. We must be able to prove at compile-time that loops will not contain faulting loads. Tests have been added here: Transforms/LoopVectorize/AArch64/simple_early_exit.ll	2024-09-19 09:41:25 +01:00
Aditi Medhane	60a8b2b1d0	[AMDGPU] Add MachineVerifier check to detect illegal copies from vector register to SGPR (#105494 ) Addition of a check in the MachineVerifier to detect and report illegal vector-register-to-SGPR copies in the AMDGPU backend, ensuring correct code generation. We can enforce this check only after the SIFixSGPRCopies pass. This is a half-fix in the pipeline: with the help of the isSSA MachineFunction property, the check happens for passes after phi-node-elimination.	2024-09-19 13:57:44 +05:30
yonghong-song	959448fbd6	[Transforms][IPO] Add func suffix in ArgumentPromotion and DeadArgume… (#105742 ) …ntElimination ArgumentPromotion and DeadArgumentElimination passes could change function signatures but the function name remains the same as before the transformation. This makes tracing hard with bpf programs, where users tend to use the function signature from the source. See discussion [1] for details. This patch adds a suffix to functions whose signatures are changed. The suffix lets users know that the function signature has changed and that they need to inspect the IR or binary to find the modified signature before tracing those functions. The suffix for ArgumentPromotion is ".argprom" and the suffixes for DeadArgumentElimination are ".argelim" and ".retelim". The suffix also gives users hints about what kind of transformation has been done. With this patch, I built a recent linux kernel with full LTO enabled. I got 4 functions with only argpromotion like ``` set_track_update.argelim.argprom pmd_trans_huge_lock.argprom ... ``` I got 1058 functions with only deadargelim like ``` process_bit0.argelim pci_io_ecs_init.argelim ... ``` I got 3 functions with both argpromotion and deadargelim ``` set_track_update.argelim.argprom zero_pud_populate.argelim.argprom zero_pmd_populate.argelim.argprom ``` [1] https://github.com/llvm/llvm-project/issues/104678	2024-09-19 10:21:58 +02:00
Nikita Popov	30cdf1e959	[SimplifyCFG] Pass context instruction to isSafeToSpeculativelyExecute() (#109132 ) Pass speculation target and assumption cache to isSafeToSpeculativelyExecute() calls. This allows speculating based on dereferenceable/align assumptions, but the primary motivation here is to avoid regressions from planned changes to fix https://github.com/llvm/llvm-project/issues/108854.	2024-09-19 10:19:15 +02:00
Kristóf Umann	752e10379c	[analyzer] Explicitly register NoStoreFuncVisitor from alpha.unix.cst… (#108373 ) …ring.UninitRead This is a drastic simplification of #106982. If you read that patch, this is the same thing with all BugReporterVisitors.cpp and SValBuilder.cpp changes removed! (since all replies came regarding changes to those files, I felt the new PR was justified) The patch was inspired by a pretty poor bug report on FFMpeg: ![image](https://github.com/user-attachments/assets/8f4e03d8-45a4-4ea2-a63d-3ab78d097be9) In this bug report, block is uninitialized, hence the bug report that it should not have been passed to memcpy. The confusing part is in line 93, where block was passed as a non-const pointer to seq_unpack_rle_block, which was obviously meant to initialize block. As developers, we know that clang likely didn't skip this function and found a path of execution on which this initialization failed, but NoStoreFuncVisitor failed to attach the usual "returning without writing to block" message. I fixed this by tracking the actual element which was found to be uninitialized instead of the entire array (Remember, we heuristically only check if the first and last-to-access element is initialized, not the entire array). This is how the bug report looks now, with 'seq_unpack_rle_block' having notes describing the path of execution and lack of a value change: ![image](https://github.com/user-attachments/assets/8de5d101-052e-4ecb-9cd9-7c29724333d2) ![image](https://github.com/user-attachments/assets/8bf52a95-62de-44e7-aef8-03a46a3fa08e) Since NoStoreFuncVisitor was a TU-local class, I moved it back to BugReporterVisitors.h, and registered it manually in CStringChecker.cpp. This was done because we don't have a good trackRegionValue() function, only a trackExpressionValue() function. We have an expression for the array, but not for its first (or last-to-access) element, so I only had a MemRegion on hand.	2024-09-19 10:04:47 +02:00
Rainer Orth	0a3b6af768	[ASan][test] Skip Linux/odr_c_test.c on SPARC (#109111 ) When ASan testing is enabled on SPARC as per PR #107405, the ``` AddressSanitizer-sparc-linux :: TestCases/Linux/odr_c_test.c ``` test `FAIL`s on Linux/sparc64: ``` + projects/compiler-rt/test/asan/SPARCLinuxConfig/TestCases/Linux/Output/odr_c_test.c.tmp + count 0 Expected 0 lines, got 13. AddressSanitizer:DEADLYSIGNAL ================================================================= ==4165420==ERROR: AddressSanitizer: BUS on unknown address (pc 0x7012d5b4 bp 0xffa3b938 sp 0xffa3b8d0 T0) ==4165420==The signal is caused by a READ memory access. ==4165420==Hint: this fault was caused by a dereference of a high value address (see register values below). Disassemble the provided pc to learn which register was used. ``` The test relies on an unaligned access, which cannot work on a strict-alignment target like SPARC. Thus this patch skips the test. Tested on `sparc64-unknown-linux-gnu`.	2024-09-19 10:04:18 +02:00
Elvis Wang	edc71e22c0	[RISCV][TTI] Add instruction cost for vp.load/store. (#109245 ) This patch makes the instruction cost of vp.load/store same as their non-vp counterpart.	2024-09-19 16:00:21 +08:00
Nikita Popov	7183771834	[InitUndef] Also handle inline asm (#108951 ) InitUndef should also handle early-clobber / undef conflicts in inline asm operands. Do this by iterating over all_defs() instead of defs(). The newly added ARM test was generating an "unpredictable STXP instruction, status is also a source" error prior to this change. Fixes https://github.com/llvm/llvm-project/issues/106380.	2024-09-19 09:59:36 +02:00
David Green	4c50112ba1	[AArch64] Add patterns for 64bit vector addp This extends the existing patterns for addp to 64bit outputs with a single input. Whilst the general pattern is similar to the 128bit patterns (add(uzp1(extract_lo, extract_hi), uzp2(extract_lo, extract_hi))), at the late stage other optimizations have happened to turn the first uzp1 into trunc and the second into extract(uzp2) with undef. Fixes #109108	2024-09-19 08:50:43 +01:00
Nikita Popov	4ec4ac15ed	[SCEVExpander] Fix addrec cost model (#106704 ) The current isHighCostExpansion cost model for addrecs computes the cost for some kind of polynomial expansion that does not appear to have any relation to addrec expansion whatsoever. A literal expansion of an affine addrec is a phi and add (plus the expansion of start and step). For a non-affine addrec, we get another phi+add for each additional addrec nested in the step recurrence. This partially `fixes` https://github.com/llvm/llvm-project/issues/53205 (the runtime unroll test case in this PR).	2024-09-19 09:39:35 +02:00
Phoebe Wang	c18be32185	Reland "[X86][BF16] Add libcall for F80 -> BF16 (#109116 )" (#109143 ) This reverts commit ababfee78714313a0cad87591b819f0944b90d09. Add X86 FP80 check.	2024-09-19 15:39:07 +08:00
Nikita Popov	dc6876fc98	[ValueTracking] Use isSafeToSpeculativelyExecuteWithVariableReplaced() in more places (#109149 ) This replaces some uses of isSafeToSpeculativelyExecute() with isSafeToSpeculativelyExecuteWithVariableReplaced(), in cases where we are guarding against operand changes rather than plain speculation. I believe that this is NFC with the current implementation of the function (as it only does something different for loads), but this makes us more defensive against future generalizations.	2024-09-19 09:38:20 +02:00
David Green	4e3781607c	[ARM][MVE] Add vector tests for ucmp/scmp. NFC	2024-09-19 08:32:23 +01:00
pvanhout	da1a222337	[AMDGPU] Regenerate load-constant-i1 test Fix failure caused by #106383	2024-09-19 09:23:59 +02:00
Timm Baeder	904f58e6b9	[clang][bytecode] Use field descriptor in IntPointer::atOffset (#109238 ) We're otherwise still pointing to the old type, but with the new offset.	2024-09-19 09:12:17 +02:00
Pierre van Houtryve	758444ca3e	[AMDGPU] Promote uniform ops to I32 in DAGISel (#106383 ) Promote uniform binops, selects and setcc between 2 and 16 bits to 32 bits in DAGISel Solves #64591	2024-09-19 09:00:21 +02:00
Him188	77af9d1023	[AArch64][GlobalISel] Implement selectVaStartAAPCS (#106979 ) This commit adds the missing support for varargs in the instruction selection pass for AAPCS. Previously we only implemented this for Darwin. The implementation was according to AAPCS and SelectionDAG's LowerAAPCS_VASTART. It resolves all VA_START fallbacks in RAJAperf, llvm-test-suite, and SPEC CPU2017. These benchmarks now compile and pass without fallbacks due to varargs. --------- Co-authored-by: Madhur Amilkanthwar <madhura@nvidia.com>	2024-09-19 11:48:14 +05:30
Fraser Cormack	90330e993d	[NVPTX] Set v2i16 SETCC to Expand (#108969 ) Note that this refers to the return type of SETCC. This operation is not legal in PTX but was assumed as such because v2i16 is declared a legal type. We were already expanding v4i8 SETCC. The DAGCombiner would in certain circumstances try to fold an extension of an illegal v2i1 SETCC (because v2i1 is illegal) into a "legal" v2i16 SETCC, which we wouldn't have patterns for.	2024-09-19 07:12:32 +01:00
Fangrui Song	e82f0838ae	[ELF] --icf: don't fold a section without relocation and a section with relocations for SHT_CREL Similar to commit 686cff17cc310884e48ae963bf7507f96950cc90 for SHT_REL (#57693). CREL hasn't been tested with ICF before. And avoid a pitfall that eqClass[0] might interfere with ICF.	2024-09-18 23:06:12 -07:00
Brendan Shanks	7281e0cb3b	[lldb] [debugserver] Use "full" x86_64 GPR state when available. (#108663 ) macOS 10.15 added a "full" x86_64 GPR thread state flavor, equivalent to the normal one but with DS, ES, SS, and GSbase added. This flavor can only be used with processes that install a custom LDT (functionality that was also added in 10.15 and is used by apps like Wine to execute 32-bit code). Along with allowing DS, ES, SS, and GSbase to be viewed/modified, using the full flavor is necessary when debugging a thread executing 32-bit code. If thread_set_state() is used with the regular thread state flavor, the kernel resets CS to the 64-bit code segment (see [set_thread_state64()](`94d3b45284/osfmk/i386/pcb.c (L723)`)), which makes debugging impossible. There's no way to detect whether the full flavor is available, so try to use it and fall back to the regular one if it's not available. A downside is that this patch exposes the DS, ES, SS, and GSbase registers for all x86_64 processes, even though they are not populated unless the full thread state is available. I'm not sure if there's a way to tell LLDB that a register is unavailable. The classic GDB `g` command [allows returning `x`](https://sourceware.org/gdb/current/onlinedocs/gdb.html/Packets.html#Packets) to denote unavailable registers, but it seems like the debug server uses newer commands like `jThreadsInfo` and I'm not sure if those have the same support. Fixes #57591 (also filed as Apple FB11464104) @jasonmolenda	2024-09-18 22:57:01 -07:00
Rahul Joshi	23123aa4ec	[LLVM][TableGen] Change InstrInfoEmitter to use const RecordKeeper (#109189 ) Change InstrInfoEmitter to use const RecordKeeper. This is a part of effort to have better const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089	2024-09-18 22:27:26 -07:00
Rahul Joshi	7603e85429	[LLVM][TableGen] Change PseudoLoweringEmitter to use const RecordKeeper (#109194 ) Change PseudoLoweringEmitter to use const RecordKeeper. This is a part of effort to have better const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089	2024-09-18 22:26:48 -07:00
Valentin Clement (バレンタインクレメン)	4194e8dea5	[flang][cuda][NFC] Fix grammar in CanCUDASymbolHasSave function name (#109234 )	2024-09-18 22:14:30 -07:00
Valentin Clement (バレンタインクレメン)	5e1a54b298	[flang][cuda][NFC] Add more descriptor inquiry tests for data transfer (#108094 ) Make sure there is no data transfer generated when a device variable is used in these intrinsic functions.	2024-09-18 21:45:32 -07:00
Rahul Joshi	0f06f707ec	[NFC] Cleanup RegisterInfoEmitter code (#109199 ) Change variable name `o` to `OS` to match definition, and `ClName` to `ClassName` for better clarity. Cache RegBank reference in the class and do not pass around class members to functions.	2024-09-18 21:42:52 -07:00
Craig Topper	80f6b42a26	[MachinePipeliner] Fix incorrect use of getPressureSets. (#109179 ) The code was passing a physical register directly to getPressureSets which expects a register unit. Fix this by looping over the register units and calling getPressureSets for each of them. Found while trying to add a RegisterUnit class to stop storing register units in `Register`. 0 is a valid register unit but not a valid Register.	2024-09-18 21:34:05 -07:00
Mircea Trofin	12d94850cd	[ctx_prof] Avoid `llvm::append_range` to fix some build bots Example: https://lab.llvm.org/buildbot/#/builders/169/builds/3381 The CI allowed the `llvm::append_range` instantiation, but on the other hand it's quite unnecessary here.	2024-09-18 21:19:28 -07:00
Mircea Trofin	ee5709b3b4	[nfc][ctx_prof] Don't try finding callsite annotation for un-instrumentable callsites (#109184 ) Reinforcing properties ensured at instrumentation time.	2024-09-18 21:13:48 -07:00
Mircea Trofin	ce9209f50e	[ctx_prof] Fix `ProfileAnnotator::allTakenPathsExit` (#109183 ) Added tests to the validator and fixed issues stemming from the previous skipping over BBs with single successors, which is incorrect. That would now be caught by the added tests, where the assertions are expected to be triggered.	2024-09-18 21:08:34 -07:00
Valentin Clement	156035ed4d	[flang][cuda] Convert module allocation/deallocation to runtime calls Convert `cuf.allocate` and `cuf.deallocate` to the runtime entry points added in #109213 Was reviewed in https://github.com/llvm/llvm-project/pull/109214 but the parent branch was closed for some reason.	2024-09-18 20:49:08 -07:00
Rahul Joshi	56015da593	[LLVM][TableGen] Change RegisterBankEmitter to use const RecordKeeper (#109195 ) Change RegisterBankEmitter to use const RecordKeeper. This is a part of effort to have better const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089	2024-09-18 20:45:26 -07:00
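
The two early-exit loop shapes quoted in the LoopVectorize entry (e762d4dac7) can be written out as compilable C++ for reference. This is only a sketch of the source-level pattern: the function names `find_first` and `first_mismatch` are hypothetical, the index type is changed from `int` to `std::size_t`, and nothing here implies that a given compiler will actually vectorise these loops, since the commit concerns the legality analysis only.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper mirroring the first loop in the commit message:
// returns the index of the first element equal to val, or n if none matches.
// The loop has no stores and a single early exit.
std::size_t find_first(const int *p, std::size_t n, int val) {
  for (std::size_t i = 0; i < n; ++i) {
    if (p[i] == val)
      return i; // early exit taken on the first match
  }
  return n;
}

// Hypothetical helper mirroring the second loop in the commit message:
// returns the index of the first position where p1 and p2 differ, or n.
std::size_t first_mismatch(const int *p1, const int *p2, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    if (p1[i] != p2[i])
      return i; // early exit taken on the first difference
  }
  return n;
}

// Small self-check over fixed arrays (illustrative data only).
void self_check() {
  const int a[] = {4, 7, 9, 7};
  const int b[] = {4, 7, 1, 7};
  assert(find_first(a, 4, 9) == 2);
  assert(first_mismatch(a, b, 4) == 2);
}
```

Both functions return `n` when the early exit is never taken, which matches the `return n;` fall-through in the commit message.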


512336 Commits