531002 Commits

Author SHA1 Message Date
Nikita Popov
2fbfbf499e
[cmake] Resolve symlink when finding install prefix (#124743)
When determining the install prefix in LLVMConfig.cmake etc resolve
symlinks in CMAKE_CURRENT_LIST_FILE first. The motivation for this is to
support symlinks like `/usr/lib64/cmake/llvm` to
`/usr/lib64/llvm19/lib/cmake/llvm`. This only works correctly if the
paths are relative to the resolved symlink.

It's worth noting that this *mostly* already works out of the box,
because cmake automatically does the symlink resolution when the library
is found via CMAKE_PREFIX_PATH. It just doesn't happen when it's found
via the default prefix path.
2025-03-18 14:48:40 +01:00
Maryam Moghadas
22c6674f1d
[PowerPC] Add Dense Math binary integer outer-Product accumulate to DMR Instructions (#130791)
This commit adds the following Dense Math Facility integer calculation
instructions: dmxvi8gerx4, dmxvi8gerx4pp, dmxvi8gerx4spp, pmdmxvi8gerx4,
pmdmxvi8gerx4pp, and pmdmxvi8gerx4spp, along with their corresponding
intrinsics and tests.
2025-03-18 09:40:07 -04:00
Simon Pilgrim
a2d7451a13
[CostModel][X86] merge bitreverse costs tests using -cost-kind=all (#131791) 2025-03-18 13:30:01 +00:00
Simon Pilgrim
4f5eed0a37
[CostModel][X86] merge bswap costs tests using -cost-kind=all (#131784) 2025-03-18 13:29:25 +00:00
Akash Banerjee
cbc5c11fec
[MLIR][OpenMP] Add Lowering support for implicitly linking to default declare mappers (#131006) 2025-03-18 13:17:10 +00:00
Kareem Ergawy
83658ddb1b
[flang][OpenMP] Enable delayed privatization by default for omp.distribute (#131574)
Switches delayed privatization for `omp.distribute` to be on by default:
controlled by the `-openmp-enable-delayed-privatization` instead of by
`-openmp-enable-delayed-privatization-staging`.

### GFortran & Fujitsu test suite results:

#### gfotran test-suite (this PR):
```
Testing Time: 34.51s
  Passed: 6569
```

#### Fujitsu without changes (commit: 0813c5cf5f52):
```
Testing Time: 155.39s
  Passed            : 88325
  Failed            :   156
  Executable Missing:   408
```

#### Fujitsu with changes (this PR):
```
Testing Time: 158.54s
  Passed            : 88325
  Failed            :   156
  Executable Missing:   408
```
2025-03-18 14:07:41 +01:00
Vladislav Dzhidzhoev
84e44ae6b7
[llvm-objdump] Pass MCSubtargetInfo to findPltEntries (NFC) (#131773)
It allows access to subtarget features, collected in llvm-objdump.cpp,
from findPltEntries, which will be used in
https://github.com/llvm/llvm-project/pull/130764.
2025-03-18 14:00:34 +01:00
Louis Dionne
428b320bf3
[libc++] Fix allocate_at_least test that assumes the size_type of the allocator (#131682)
If the size_type of the allocator is not the same as std::size_t, this
test would fail.
2025-03-18 08:55:43 -04:00
SivanShani-Arm
23743f5bf9
[readobj][ELFExtendedAttrParser] Add destructor with error handling (#131783)
ELFExtendedAttrParser lacked a destructor that properly handled errors,
causing `llvm-readobj --arch-specific` to crash when the AArch64 Build
Attributes section was empty.

This commit adds error handling in the destructor and introduces test
files for `--arch-specific` to cover both an empty AArch64 Build
Attributes section and a populated one.

Fixes:

b1ebfac185
2025-03-18 12:29:37 +00:00
Phoebe Wang
3d63191467
[X86] Ignore NSW when DstSVT is i32 (#131755)
We don't have PACKSS for i64->i32.

Fixes: https://godbolt.org/z/qb8nxnPbK, which was introduced by ddd2f57b
2025-03-18 20:04:23 +08:00
David Sherwood
2586e7fcd8
[LV][NFC] Tidy up partial reduction tests with filter-out-after option (#129047)
A few test files seemed to have been edited after using the
update_test_checks.py script, which can make life hard for
developers when trying to update these tests in future
patches. Also, the tests still had this comment at the top

; NOTE: Assertions have been autogenerated by ...

which could potentially be confusing, since they've not
strictly been auto-generated.

I've attempted to keep the spirit of the original tests by
excluding all CHECK lines after the scalar.ph IR block,
however I've done this by using a new option called
--filter-out-after to the update_test_checks.py script.
2025-03-18 11:39:55 +00:00
Simon Pilgrim
aea3ad8bd3
[X86] canCreateUndefOrPoisonForTargetNode - add handling for VPERMV3 intrinsic opcodes (#131768)
We already handle the X86ISD::VPERMV3 node type, but if we can handle equivalent cases before intrinsic lowering we can simplify the code further - e.g. #109272 before constant BUILD_VECTOR nodes gets lowered to constant pool loads.
2025-03-18 11:35:58 +00:00
Fabian Ritter
332f060363
[SeparateConstOffsetFromGEP] Don't set unsound inbounds flag (#130616)
The language reference says about inbounds geps that "if the
getelementptr has any non-zero indices[...] [t]he base pointer has an in
bounds address of the allocated object that it is based on [and]
[d]uring the successive addition of offsets to the address, the
resulting pointer must remain in bounds of the allocated object at each
step."

If (gep inbounds p, (a + 5)) is translated to (gep [inbounds] (gep p,
a), 5) with p pointing to the beginning of an object and a=-4, as the
example in the comments suggests, that's the case for neither of the
resulting geps. Therefore, we need to clear the inbounds flag for both
geps.

We might want to use ValueTracking to check if a is known to be
non-negative to preserve the inbounds flags.

For the AMDGPU tests with scratch instructions, removing the unsound
inbounds flag means that AMDGPUDAGToDAGISel::isFlatScratchBaseLegal sees
no NUW flag at the pointer add, which prevents generation of scratch
instructions with immediate offsets.

For SWDEV-516125.
2025-03-18 12:30:20 +01:00
Aaron Ballman
9cf46fb230
[C2y] Add octal prefixes, deprecate unprefixed octals (#131626)
WG14 N3353 added support for 0o and 0O as octal literal prefixes. It
also deprecates use of octal literals without a prefix, except for the
literal 0.

This feature is being exposed as an extension in older C language modes
as well as in all C++ language modes.
2025-03-18 07:28:59 -04:00
Simon Pilgrim
31e98c7037
[CostModel][X86] merge abs costs tests using -cost-kind=all (#131619)
Now that we have #130490 - merge the cost test files to avoid bitrot

Lots more set of files to do - but this is give an example
2025-03-18 11:19:05 +00:00
quic_hchandel
5d53a88416
[RISCV] Change RISCVMCExpr::VK_RISCV_None to RISCVMCExpr::VK_None (#131774)
Fix RISCVMCExpr::VK_RISCV_None which were added in #130779
2025-03-18 16:43:42 +05:30
Diana Picus
0a21ef9536
[AMDGPU] Add SubtargetFeature for dynamic VGPR mode (#130030)
This represents a hardware mode supported only for wave32 compute
shaders. When enabled, we set the `.dynamic_vgpr_en` field of
`.compute_registers` to true in the PAL metadata.

This will be changed to use an attribute after downstream consumers
have been migrated.
2025-03-18 11:48:01 +01:00
Balazs Benics
5865807421
Reapply "[analyzer] Delay the checker constructions after parsing" (#128369)
Reapply "[analyzer] Delay the checker constructions after parsing"
(#128350)
    
This reverts commit db836edf47f36ed04cab919a7a2c4414f4d0d7e6, as-is.

Depends on #128368
2025-03-18 11:40:39 +01:00
Matt Arsenault
c180fc80dc
AMDGPU: Replace unused permlane inputs with poison instead of undef (#131288) 2025-03-18 17:37:44 +07:00
Matt Arsenault
052eca9ff7
AMDGPU: Replace unused update.dpp inputs with poison instead of undef (#131287) 2025-03-18 17:33:58 +07:00
Matt Arsenault
8392573469
AMDGPU: Replace unused export inputs with poison instead of undef (#131286) 2025-03-18 17:30:42 +07:00
Matt Arsenault
c5fe075eaf
AMDGPU: Use freeze poison instead of undef in alloca promotion (#131285)
Previously the value created to represent the uninitialized memory
of the alloca was undef. Use freeze poison instead. Enables some
optimization improvements (which need defeating in the limit tests),
but also a few regressions. Seems to leave behind dead code in some
cases too.
2025-03-18 17:27:02 +07:00
cor3ntin
f7716047c6
[Clang][NFC] Cleanup UnaryExprOrTypeTraitExpr itanium mangling code (#131764)
Just removing some code duplication.

Extracted from #131515
2025-03-18 11:26:48 +01:00
Benjamin Maxwell
f406b28f8b
[AArch64][SVE] Fold integer lane extract and store to FPR store (#129756)
This helps avoid pointless fmovs to GPRs, which may be slow, especially
in streaming mode.
2025-03-18 10:10:23 +00:00
Valery Pykhtin
4ad0aa73b7
[SSAUpdaterBulk] Add expectedly failing loop tests. (#131761)
These tests demonstrate the issue in SSAUpdaterBulk when it calculates
incoming values from loop back edges.

The failures are marked with `EXPECT_NONFATAL_FAILURE`, which is the way
to designate an "expected fail" in the Google Test suite.
2025-03-18 10:55:56 +01:00
Kareem Ergawy
1094ffcafb
[flang][fir] Add MLIR op for do concurrent (#130893)
Adds new MLIR ops to model `do concurrent`. In order to make `do
concurrent` representation self-contained, a loop is modeled using 2
ops, one wrapper and one that contains the actual body of the loop. For
example, a 2D `do concurrent` loop is modeled as follows:

```mlir
  fir.do_concurrent {
    %i = fir.alloca i32
    %j = fir.alloca i32
    fir.do_concurrent.loop
      (%i_iv, %j_iv) = (%i_lb, %j_lb) to (%i_ub, %j_ub) step (%i_st, %j_st) {
      %0 = fir.convert %i_iv : (index) -> i32
      fir.store %0 to %i : !fir.ref<i32>

      %1 = fir.convert %j_iv : (index) -> i32
      fir.store %1 to %j : !fir.ref<i32>
    }
  }
```

The `fir.do_concurrent` wrapper op encapsulates both the actual loop and
the allocations required for the iteration variables. The
`fir.do_concurrent.loop` op is a multi-dimensional op that contains the
loop control and body. See the ops' docs for more info.
2025-03-18 10:53:44 +01:00
quic_hchandel
036c6cb37c
[RISCV] Add Qualcomm uC Xqcibi (Branch Immediate) extension (#130779)
This extension adds twelve conditional branch instructions that use an
immediate operand for the source.

The current spec can be found at:
https://github.com/quic/riscv-unified-db/releases/tag/Xqci-0.7.0

This patch adds assembler only support.

Co-authored-by: Sudharsan Veeravalli <quic_svs@quicinc.com>
2025-03-18 15:18:43 +05:30
David Sherwood
194eceff43
update_test_checks: add new --filter-out-after option (#129739)
Whilst trying to clean up some loop vectoriser IR tests (see
test/Transforms/LoopVectorize/AArch64/partial-reduce-chained.ll
for example) a reviewer on PR #129047 suggested it would be
nice to have an option to stop generating CHECK lines after a
certain point. Typically when performing a transformation with
the loop vectoriser we don't usually care about any CHECK lines
generated for the scalar tail of the loop, since the scalar
loop is kept intact. Previously if you wanted to eliminate such
unwanted CHECK lines you had to run the update script, then
manually delete all the lines corresponding to the scalar loop.
This can be very time consuming if the tests ever need changing.

What I've tried to do here is add a new --filter-out-after
option alongside the existing --filter* options that provides
support for stopping the generation of any CHECK lines beyond
the line that matches the filter. With the existing filter
options we never generate CHECK-NEXT lines, but we still care
about ordering with --filter-out-after so I've amended the
code to ensure we treat this filter differently.
2025-03-18 09:46:43 +00:00
Srinivasa Ravi
c42952a782
[MLIR][NVVM] Add support for match.sync Op (#130718)
This change adds the `match.sync` Op to the MLIR NVVM dialect to
generate the `match.sync` PTX instruction.

PTX Spec Reference:

https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-match-sync
2025-03-18 14:54:24 +05:30
Kareem Ergawy
49b8d8472f
[OpenMP][MLIR] Support LLVM translation for distribute with delayed privatization (#131564)
Adds support for tranlating delayed privatization (`private` and
`firstprivate`) for `omp.distribute` ops.
2025-03-18 10:14:42 +01:00
Lucas Duarte Prates
44e4b27aec
[clang] Fix darwin-related tests' REQUIRES annotation (#130138)
The tests updated by this commit were designed to check features in the
clang's driver and index that require clang to be targgeting a darwin
platform while running on a darwin host. For that, their execution is
currently gated by the `REQUIRES: system-darwin` annotation.

This approach becomes a problem when trying to run such tests on a
cross-compiling build of clang on a darwin platform. When the default
target is not darwin (e.g. via `LLVM_DEFAULT_TARGET_TRIPLE `), the
tests will still run on a darwin host and fail spuriously because of the
mismatch with the target detection.

To fix this issue, this patch introduces an extra condition to the
tests' REQUIRES annotation, `target={{.*}}-{{darwin|macos}}{{.*}}`,
ensuring they only run when the relevant target is present.
2025-03-18 09:11:43 +00:00
David Green
bd1be8a242
[CodeGen][GlobalISel] Add a getVectorIdxWidth and getVectorIdxLLT. (#131526)
From #106446, this adds a variant of getVectorIdxTy that returns an LLT.
Many uses only look at the width, so a getVectorIdxWidth was added as
the common base.
2025-03-18 08:31:11 +00:00
Matthias Springer
e614e840bc
[mlir][memref] Add runtime verification for memref.dim (#130410)
Add runtime verification for `memref.dim`: check that the index is in
bounds.

Also simplify the pass pipeline for all memref runtime verification
checks.
2025-03-18 09:10:49 +01:00
Mel Chen
489d1e764e
[LV][NFC] Pre-commit test for supporting strided accesses. (#130563)
Duplicate riscv-vector-reverse.ll as riscv-vector-reverse-output.ll to
verify all generated IR, not just debug output.
Pre-commit for #128718.
2025-03-18 16:08:42 +08:00
lorenzo chelini
57dc71352c
[MLIR][Bufferization] Retire enforce-aliasing-invariants (#130929)
Why? This option can lead to incorrect IR if used in isolation, for
example, consider the IR below:

```mlir
func.func @loop_with_aliasing(%arg0: tensor<5xf32>, %arg1: index, %arg2: index) -> tensor<5xf32> {
  %c1 = arith.constant 1 : index
  %cst = arith.constant 1.000000e+00 : f32
  %0 = tensor.empty() : tensor<5xf32>
  %1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<5xf32>) -> tensor<5xf32>
  // The BufferizableOpInterface says that %2 alias with %arg0 or be a newly
  // allocated buffer
  %2 = scf.for %arg3 = %arg1 to %arg2 step %c1 iter_args(%arg4 = %arg0) -> (tensor<5xf32>) {
    scf.yield %1 : tensor<5xf32>
  }
  %cst_0 = arith.constant 1.000000e+00 : f32
  %inserted = tensor.insert %cst_0 into %1[%c1] : tensor<5xf32>
  return %2 : tensor<5xf32>
}
```

If we bufferize with: enforce-aliasing-invariants=false, we get:

```
func.func @loop_with_aliasing(%arg0: memref<5xf32, strided<[?], offset: ?>>, %arg1: index, %arg2: index) -> memref<5xf32, strided<[?], offset: ?>> {
  %c1 = arith.constant 1 : index
  %cst = arith.constant 1.000000e+00 : f32
  %alloc = memref.alloc() {alignment = 64 : i64} : memref<5xf32>
  linalg.fill ins(%cst : f32) outs(%alloc : memref<5xf32>)
  %0 = scf.for %arg3 = %arg1 to %arg2 step %c1 iter_args(%arg4 = %arg0) -> (memref<5xf32, strided<[?], offset: ?>>) {
    %cast = memref.cast %alloc : memref<5xf32> to memref<5xf32, strided<[?], offset: ?>>
    scf.yield %cast : memref<5xf32, strided<[?], offset: ?>>
  }
  %cst_0 = arith.constant 1.000000e+00 : f32
  memref.store %cst_0, %alloc[%c1] : memref<5xf32>
  return %0 : memref<5xf32, strided<[?], offset: ?>>
}
```
Which is not correct IR since the loop yields the allocation.

I am using this option. What do I need to do now?

If you are using this option in isolation, you are possibly generating
incorrect IR, so you need to revisit your bufferization strategy. If you
are using it together with `copyBeforeWrite,` you simply need to retire
the `enforceAliasingInvariants` option.

Co-authored-by: Matthias Springer <mspringer@nvidia.com>
2025-03-18 08:42:43 +01:00
Kazu Hirata
58dd3eda4e
[Utils] Avoid repeated hash lookups (NFC) (#131723) 2025-03-18 00:27:23 -07:00
Kazu Hirata
62204482c0
[CodeGen] Avoid repeated hash lookups (NFC) (#131722) 2025-03-18 00:26:59 -07:00
Kazu Hirata
f6ad65a824
[ADT] Add SmallPtrSet::insert_range (#131716)
This pach adds SmallPtrSet::insert_range for consistency with
DenseSet::insert_range and std::set::insert_range from C++23.
2025-03-18 00:21:07 -07:00
Kazu Hirata
2df0254828
[ADT] Add SmallSet::insert_range (#131717)
This patch adds SmallSet::insert_range for consistency with
DenseSet::insert_range and std::set::insert_range from C++23.
2025-03-18 00:20:15 -07:00
Kazu Hirata
fc38982e93
[ADT] Add SetVector::insert_range (#131715)
This patch adds SetVector::insert_range for consistency with
DenseSet::insert_range and std::set::insert_range from C++23.
2025-03-18 00:19:48 -07:00
Petr Hosek
1fbfef9b8a
[libc] Templatize the scanf Reader interface (#131037)
This allows specializing the implementation for different targets
without including unnecessary logic and is similar to #111559 which did
the same for printf Writer interface.
2025-03-17 23:51:24 -07:00
Ryosuke Niwa
4781941160
[alpha.webkit.UncountedCallArgsChecker] os_log functions should be treated as safe. (#131500)
…os_log functions should be treated as safe in call arguments checkers.

Also treat __builtin_* functions and __libcpp_verbose_abort functions as
"trivial" for the purpose in call argument checkers.
2025-03-17 23:47:10 -07:00
Timm Baeder
2f808dd070
[clang][bytecode] Compile most recent function decl (#131730)
We used to always do this because all calls went through the code path
that calls getMostRecentDecl(). Do it now, too.
2025-03-18 07:29:38 +01:00
Akshat Oke
6be6400848
[LiveDebugValues][NFC] Remove TargetPassConfig from LDVImpl (#131562)
TPC is only used to access the option `ShouldEmitDebugEntryValues`.
2025-03-18 11:04:54 +05:30
Sushant Gokhale
0f34eba48a
[NFC][AArch64] test for fixed-width vector signed division with pow2-divisor and SVE enabled (#130252)
With SVE enabled, this should generate asrd instruction. Subsequent
patch will address this.
2025-03-17 22:31:22 -07:00
Vikash Gupta
bdb63208b4
[AMDGPU][CodeGen] Using MBB's liveIn check in tandem with MCRegAliasIterator in SILowerSGPRSpills (#129848)
This patch replaces use of MachineRegisterInfo's liveIn check with the
machine basicBlock's liveIn. As the MRI's liveIn is inconsistent with
the entry MBB liveIns, when it comes to the machine verifier checks.

PS: Its an alternative solution with respect to #126926.
2025-03-18 10:51:07 +05:30
Craig Topper
0813c5cf5f [RISCV] Accept '0(reg)' in addition to '(reg)' for vle1.v/vse1.v 2025-03-17 20:51:16 -07:00
Fangrui Song
b9d27ac252 [MC] Fix formatting of a comment 2025-03-17 20:24:08 -07:00
Kazu Hirata
c72f7958b0 [BOLT] Fix the build
This is a follow-up for:

  commit 3c4b9317916ccd2e18c30b1540589518a4c7c88a
  Author: Fangrui Song <i@maskray.me>
  Date:   Mon Mar 17 20:05:28 2025 -0700
2025-03-17 20:18:34 -07:00
Fangrui Song
e758237352
[docs] Mention --discard-locals/--discard-all change for llvm-strip
PR #130704 updated llvm-strip as well.

Suggested by @nga888

Pull Request: https://github.com/llvm/llvm-project/pull/131491
2025-03-17 20:09:52 -07:00