When determining the install prefix in LLVMConfig.cmake etc., resolve
symlinks in CMAKE_CURRENT_LIST_FILE first. The motivation for this is to
support symlinks like `/usr/lib64/cmake/llvm` pointing to
`/usr/lib64/llvm19/lib/cmake/llvm`. This only works correctly if the
paths are computed relative to the resolved symlink.
It's worth noting that this *mostly* already works out of the box,
because CMake automatically resolves the symlink when the library
is found via CMAKE_PREFIX_PATH. It just doesn't happen when the library
is found via the default prefix path.
This commit adds the following Dense Math Facility integer calculation
instructions: dmxvi8gerx4, dmxvi8gerx4pp, dmxvi8gerx4spp, pmdmxvi8gerx4,
pmdmxvi8gerx4pp, and pmdmxvi8gerx4spp, along with their corresponding
intrinsics and tests.
ELFExtendedAttrParser lacked a destructor that properly handled errors,
causing `llvm-readobj --arch-specific` to crash when the AArch64 Build
Attributes section was empty.
This commit adds error handling in the destructor and introduces test
files for `--arch-specific` to cover both an empty AArch64 Build
Attributes section and a populated one.
Fixes: b1ebfac185
A few test files appear to have been edited after running the
update_test_checks.py script, which can make life hard for
developers when trying to update these tests in future
patches. Also, the tests still had this comment at the top:
; NOTE: Assertions have been autogenerated by ...
which could be confusing, since they have not strictly been
auto-generated.
I've attempted to keep the spirit of the original tests by
excluding all CHECK lines after the scalar.ph IR block;
however, I've done this by passing a new option,
--filter-out-after, to the update_test_checks.py script.
We already handle the X86ISD::VPERMV3 node type, but if we can handle equivalent cases before intrinsic lowering we can simplify the code further - e.g. #109272, before constant BUILD_VECTOR nodes get lowered to constant pool loads.
The language reference says about inbounds geps that "if the
getelementptr has any non-zero indices[...] [t]he base pointer has an in
bounds address of the allocated object that it is based on [and]
[d]uring the successive addition of offsets to the address, the
resulting pointer must remain in bounds of the allocated object at each
step."
If (gep inbounds p, (a + 5)) is translated to (gep [inbounds] (gep p,
a), 5) with p pointing to the beginning of an object and a=-4, as the
example in the comments suggests, then neither of the resulting geps
satisfies these requirements. Therefore, we need to clear the inbounds
flag for both geps.
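A small LLVM IR sketch of the problematic case (hypothetical, assuming an
i8 element type and %p pointing to the start of the allocated object):
```llvm
; Original form: with %a = -4, the overall offset is +1, so the inbounds
; requirements hold.
%off = add i64 %a, 5
%gep = getelementptr inbounds i8, ptr %p, i64 %off

; Reassociated form: the intermediate GEP steps to %p - 4, which is outside
; the object, so neither resulting GEP may keep the inbounds flag.
%tmp = getelementptr i8, ptr %p, i64 %a
%res = getelementptr i8, ptr %tmp, i64 5
```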
We might want to use ValueTracking to check if a is known to be
non-negative to preserve the inbounds flags.
For the AMDGPU tests with scratch instructions, removing the unsound
inbounds flag means that AMDGPUDAGToDAGISel::isFlatScratchBaseLegal sees
no NUW flag at the pointer add, which prevents generation of scratch
instructions with immediate offsets.
For SWDEV-516125.
WG14 N3353 added support for 0o and 0O as octal literal prefixes. It
also deprecates use of octal literals without a prefix, except for the
literal 0.
This feature is being exposed as an extension in older C language modes
as well as in all C++ language modes.
This represents a hardware mode supported only for wave32 compute
shaders. When enabled, we set the `.dynamic_vgpr_en` field of
`.compute_registers` to true in the PAL metadata.
This will be changed to use an attribute after downstream consumers
have been migrated.
Reapply "[analyzer] Delay the checker constructions after parsing"
(#128350)
This reverts commit db836edf47f36ed04cab919a7a2c4414f4d0d7e6, as-is.
Depends on #128368
Previously, the value created to represent the uninitialized memory
of the alloca was undef. Use freeze poison instead. This enables some
optimization improvements (which need defeating in the limit tests),
but also introduces a few regressions. It also seems to leave behind
dead code in some cases.
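A minimal sketch of the placeholder change (hypothetical function and
names, not the pass's actual output):
```llvm
define i32 @read_uninit() {
  %p = alloca i32
  ; The value standing in for the alloca's uninitialized contents was
  ; previously plain undef; now a frozen poison value is used instead.
  %init = freeze i32 poison
  ret i32 %init
}
```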
These tests demonstrate the issue in SSAUpdaterBulk when it calculates
incoming values from loop back edges.
The failures are marked with `EXPECT_NONFATAL_FAILURE`, which is the way
to designate an "expected fail" in the Google Test suite.
Adds new MLIR ops to model `do concurrent`. In order to make the `do
concurrent` representation self-contained, a loop is modeled using two
ops, one wrapper and one that contains the actual body of the loop. For
example, a 2D `do concurrent` loop is modeled as follows:
```mlir
fir.do_concurrent {
  %i = fir.alloca i32
  %j = fir.alloca i32
  fir.do_concurrent.loop
      (%i_iv, %j_iv) = (%i_lb, %j_lb) to (%i_ub, %j_ub) step (%i_st, %j_st) {
    %0 = fir.convert %i_iv : (index) -> i32
    fir.store %0 to %i : !fir.ref<i32>
    %1 = fir.convert %j_iv : (index) -> i32
    fir.store %1 to %j : !fir.ref<i32>
  }
}
```
The `fir.do_concurrent` wrapper op encapsulates both the actual loop and
the allocations required for the iteration variables. The
`fir.do_concurrent.loop` op is a multi-dimensional op that contains the
loop control and body. See the ops' docs for more info.
This extension adds twelve conditional branch instructions that use an
immediate operand for the source.
The current spec can be found at:
https://github.com/quic/riscv-unified-db/releases/tag/Xqci-0.7.0
This patch adds assembler-only support.
Co-authored-by: Sudharsan Veeravalli <quic_svs@quicinc.com>
Whilst trying to clean up some loop vectoriser IR tests (see
test/Transforms/LoopVectorize/AArch64/partial-reduce-chained.ll
for example), a reviewer on PR #129047 suggested it would be
nice to have an option to stop generating CHECK lines after a
certain point. Typically, when performing a transformation with
the loop vectoriser we don't care about any CHECK lines
generated for the scalar tail of the loop, since the scalar
loop is kept intact. Previously, if you wanted to eliminate such
unwanted CHECK lines you had to run the update script and then
manually delete all the lines corresponding to the scalar loop.
This can be very time consuming if the tests ever need changing.
What I've tried to do here is add a new --filter-out-after
option alongside the existing --filter* options that stops the
generation of any CHECK lines beyond the line that matches the
filter. With the existing filter options we never generate
CHECK-NEXT lines, but with --filter-out-after we still care
about ordering, so I've amended the code to ensure we treat
this filter differently.
The tests updated by this commit were designed to check features in
clang's driver and index that require clang to be targeting a darwin
platform while running on a darwin host. For that, their execution is
currently gated by the `REQUIRES: system-darwin` annotation.
This approach becomes a problem when trying to run such tests on a
cross-compiling build of clang on a darwin platform. When the default
target is not darwin (e.g. via `LLVM_DEFAULT_TARGET_TRIPLE`), the
tests will still run on a darwin host and fail spuriously because of the
mismatch with the target detection.
To fix this issue, this patch introduces an extra condition to the
tests' REQUIRES annotation, `target={{.*}}-{{darwin|macos}}{{.*}}`,
ensuring they only run when the relevant target is present.
From #106446, this adds a variant of getVectorIdxTy that returns an LLT.
Many uses only look at the width, so a getVectorIdxWidth was added as
the common base.
Add runtime verification for `memref.dim`: check that the index is in
bounds.
Also simplify the pass pipeline for all memref runtime verification
checks.
Why? This option can lead to incorrect IR if used in isolation. For
example, consider the IR below:
```mlir
func.func @loop_with_aliasing(%arg0: tensor<5xf32>, %arg1: index, %arg2: index) -> tensor<5xf32> {
  %c1 = arith.constant 1 : index
  %cst = arith.constant 1.000000e+00 : f32
  %0 = tensor.empty() : tensor<5xf32>
  %1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<5xf32>) -> tensor<5xf32>
  // The BufferizableOpInterface says that %2 may alias with %arg0 or be a
  // newly allocated buffer
  %2 = scf.for %arg3 = %arg1 to %arg2 step %c1 iter_args(%arg4 = %arg0) -> (tensor<5xf32>) {
    scf.yield %1 : tensor<5xf32>
  }
  %cst_0 = arith.constant 1.000000e+00 : f32
  %inserted = tensor.insert %cst_0 into %1[%c1] : tensor<5xf32>
  return %2 : tensor<5xf32>
}
```
If we bufferize with enforce-aliasing-invariants=false, we get:
```mlir
func.func @loop_with_aliasing(%arg0: memref<5xf32, strided<[?], offset: ?>>, %arg1: index, %arg2: index) -> memref<5xf32, strided<[?], offset: ?>> {
  %c1 = arith.constant 1 : index
  %cst = arith.constant 1.000000e+00 : f32
  %alloc = memref.alloc() {alignment = 64 : i64} : memref<5xf32>
  linalg.fill ins(%cst : f32) outs(%alloc : memref<5xf32>)
  %0 = scf.for %arg3 = %arg1 to %arg2 step %c1 iter_args(%arg4 = %arg0) -> (memref<5xf32, strided<[?], offset: ?>>) {
    %cast = memref.cast %alloc : memref<5xf32> to memref<5xf32, strided<[?], offset: ?>>
    scf.yield %cast : memref<5xf32, strided<[?], offset: ?>>
  }
  %cst_0 = arith.constant 1.000000e+00 : f32
  memref.store %cst_0, %alloc[%c1] : memref<5xf32>
  return %0 : memref<5xf32, strided<[?], offset: ?>>
}
```
This is not correct IR, since the loop yields the allocation.
I am using this option. What do I need to do now?
If you are using this option in isolation, you are possibly generating
incorrect IR, so you need to revisit your bufferization strategy. If you
are using it together with `copyBeforeWrite`, you simply need to retire
the `enforceAliasingInvariants` option.
Co-authored-by: Matthias Springer <mspringer@nvidia.com>
This allows specializing the implementation for different targets
without including unnecessary logic, and is similar to #111559, which
did the same for the printf Writer interface.
…os_log functions should be treated as safe in call argument checkers.
Also treat __builtin_* functions and __libcpp_verbose_abort functions as
"trivial" for the purposes of the call argument checkers.
This patch replaces the use of MachineRegisterInfo's liveIn check with
the machine basic block's liveIns, as MRI's liveIns are inconsistent
with the entry MBB liveIns when it comes to the machine verifier checks.
PS: It's an alternative solution with respect to #126926.