531239 Commits

Author SHA1 Message Date
Aaron Ballman
9cf46fb230
[C2y] Add octal prefixes, deprecate unprefixed octals (#131626)
WG14 N3353 added support for 0o and 0O as octal literal prefixes. It
also deprecates use of octal literals without a prefix, except for the
literal 0.

This feature is being exposed as an extension in older C language modes
as well as in all C++ language modes.
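
A minimal sketch of the spelling rule described above, written as a standalone classifier rather than anything taken from Clang's lexer. The function names are hypothetical, and integer suffixes and digit separators are deliberately ignored:

```python
import re


def classify_octal(literal: str) -> str:
    """Classify a C integer literal spelling (suffixes/separators ignored)."""
    # New-style spelling: 0o or 0O followed by octal digits.
    if re.fullmatch(r"0[oO][0-7]+", literal):
        return "prefixed octal"
    # The bare literal 0 is exempt from the deprecation.
    if literal == "0":
        return "zero"
    # Old-style spelling: a leading 0 followed by more octal digits.
    if re.fullmatch(r"0[0-7]+", literal):
        return "unprefixed octal (deprecated)"
    return "not octal"


def octal_value(literal: str) -> int:
    """Return the numeric value of an octal literal in either spelling."""
    if classify_octal(literal) == "not octal":
        raise ValueError(f"not an octal literal: {literal!r}")
    # Strip the two-character prefix when present; int(..., 8) handles the rest.
    digits = literal[2:] if literal[1:2] in ("o", "O") else literal
    return int(digits, 8)
```

Under this model, `0o17` and `017` denote the same value; only the second spelling falls under the deprecation.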
2025-03-18 07:28:59 -04:00
Simon Pilgrim
31e98c7037
[CostModel][X86] merge abs costs tests using -cost-kind=all (#131619)
Now that we have #130490, merge the cost test files to avoid bitrot.

Many more sets of files remain to be merged; this one serves as an example.
2025-03-18 11:19:05 +00:00
quic_hchandel
5d53a88416
[RISCV] Change RISCVMCExpr::VK_RISCV_None to RISCVMCExpr::VK_None (#131774)
Fix the uses of RISCVMCExpr::VK_RISCV_None that were added in #130779.
2025-03-18 16:43:42 +05:30
Diana Picus
0a21ef9536
[AMDGPU] Add SubtargetFeature for dynamic VGPR mode (#130030)
This represents a hardware mode supported only for wave32 compute
shaders. When enabled, we set the `.dynamic_vgpr_en` field of
`.compute_registers` to true in the PAL metadata.

This will be changed to use an attribute after downstream consumers
have been migrated.
2025-03-18 11:48:01 +01:00
Balazs Benics
5865807421
Reapply "[analyzer] Delay the checker constructions after parsing" (#128369)
Reapply "[analyzer] Delay the checker constructions after parsing"
(#128350)
    
This reverts commit db836edf47f36ed04cab919a7a2c4414f4d0d7e6, as-is.

Depends on #128368
2025-03-18 11:40:39 +01:00
Matt Arsenault
c180fc80dc
AMDGPU: Replace unused permlane inputs with poison instead of undef (#131288) 2025-03-18 17:37:44 +07:00
Matt Arsenault
052eca9ff7
AMDGPU: Replace unused update.dpp inputs with poison instead of undef (#131287) 2025-03-18 17:33:58 +07:00
Matt Arsenault
8392573469
AMDGPU: Replace unused export inputs with poison instead of undef (#131286) 2025-03-18 17:30:42 +07:00
Matt Arsenault
c5fe075eaf
AMDGPU: Use freeze poison instead of undef in alloca promotion (#131285)
Previously the value created to represent the uninitialized memory
of the alloca was undef. Use freeze poison instead. Enables some
optimization improvements (which need defeating in the limit tests),
but also a few regressions. Seems to leave behind dead code in some
cases too.
2025-03-18 17:27:02 +07:00
cor3ntin
f7716047c6
[Clang][NFC] Cleanup UnaryExprOrTypeTraitExpr itanium mangling code (#131764)
Just removing some code duplication.

Extracted from #131515
2025-03-18 11:26:48 +01:00
Benjamin Maxwell
f406b28f8b
[AArch64][SVE] Fold integer lane extract and store to FPR store (#129756)
This helps avoid pointless fmovs to GPRs, which may be slow, especially
in streaming mode.
2025-03-18 10:10:23 +00:00
Valery Pykhtin
4ad0aa73b7
[SSAUpdaterBulk] Add expectedly failing loop tests. (#131761)
These tests demonstrate the issue in SSAUpdaterBulk when it calculates
incoming values from loop back edges.

The failures are marked with `EXPECT_NONFATAL_FAILURE`, which is the way
to designate an "expected fail" in the Google Test suite.
2025-03-18 10:55:56 +01:00
Kareem Ergawy
1094ffcafb
[flang][fir] Add MLIR op for do concurrent (#130893)
Adds new MLIR ops to model `do concurrent`. In order to make `do
concurrent` representation self-contained, a loop is modeled using 2
ops, one wrapper and one that contains the actual body of the loop. For
example, a 2D `do concurrent` loop is modeled as follows:

```mlir
  fir.do_concurrent {
    %i = fir.alloca i32
    %j = fir.alloca i32
    fir.do_concurrent.loop
      (%i_iv, %j_iv) = (%i_lb, %j_lb) to (%i_ub, %j_ub) step (%i_st, %j_st) {
      %0 = fir.convert %i_iv : (index) -> i32
      fir.store %0 to %i : !fir.ref<i32>

      %1 = fir.convert %j_iv : (index) -> i32
      fir.store %1 to %j : !fir.ref<i32>
    }
  }
```

The `fir.do_concurrent` wrapper op encapsulates both the actual loop and
the allocations required for the iteration variables. The
`fir.do_concurrent.loop` op is a multi-dimensional op that contains the
loop control and body. See the ops' docs for more info.
2025-03-18 10:53:44 +01:00
quic_hchandel
036c6cb37c
[RISCV] Add Qualcomm uC Xqcibi (Branch Immediate) extension (#130779)
This extension adds twelve conditional branch instructions that use an
immediate operand for the source.

The current spec can be found at:
https://github.com/quic/riscv-unified-db/releases/tag/Xqci-0.7.0

This patch adds assembler only support.

Co-authored-by: Sudharsan Veeravalli <quic_svs@quicinc.com>
2025-03-18 15:18:43 +05:30
David Sherwood
194eceff43
update_test_checks: add new --filter-out-after option (#129739)
Whilst trying to clean up some loop vectoriser IR tests (see
test/Transforms/LoopVectorize/AArch64/partial-reduce-chained.ll
for example) a reviewer on PR #129047 suggested it would be
nice to have an option to stop generating CHECK lines after a
certain point. Typically when performing a transformation with
the loop vectoriser we don't usually care about any CHECK lines
generated for the scalar tail of the loop, since the scalar
loop is kept intact. Previously if you wanted to eliminate such
unwanted CHECK lines you had to run the update script, then
manually delete all the lines corresponding to the scalar loop.
This can be very time consuming if the tests ever need changing.

What I've tried to do here is add a new --filter-out-after
option alongside the existing --filter* options that provides
support for stopping the generation of any CHECK lines beyond
the line that matches the filter. With the existing filter
options we never generate CHECK-NEXT lines, but we still care
about ordering with --filter-out-after so I've amended the
code to ensure we treat this filter differently.
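
The intended effect can be sketched as follows. This is an illustrative model only, not the code in the update script: the helper name is hypothetical, and keeping the matching line itself is an assumption about what "beyond the line that matches" means.

```python
import re
from typing import Iterable, List


def filter_out_after(lines: Iterable[str], pattern: str) -> List[str]:
    """Keep lines up to and including the first regex match; drop the rest."""
    regex = re.compile(pattern)
    kept = []
    for line in lines:
        kept.append(line)
        # Once the filter matches, nothing after this line is emitted.
        if regex.search(line):
            break
    return kept
```

Because relative order still matters for the lines that are kept, a filter of this kind differs from one that merely selects or rejects individual lines.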
2025-03-18 09:46:43 +00:00
Srinivasa Ravi
c42952a782
[MLIR][NVVM] Add support for match.sync Op (#130718)
This change adds the `match.sync` Op to the MLIR NVVM dialect to
generate the `match.sync` PTX instruction.

PTX Spec Reference:

https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-match-sync
2025-03-18 14:54:24 +05:30
Kareem Ergawy
49b8d8472f
[OpenMP][MLIR] Support LLVM translation for distribute with delayed privatization (#131564)
Adds support for translating delayed privatization (`private` and
`firstprivate`) for `omp.distribute` ops.
2025-03-18 10:14:42 +01:00
Lucas Duarte Prates
44e4b27aec
[clang] Fix darwin-related tests' REQUIRES annotation (#130138)
The tests updated by this commit were designed to check features in
clang's driver and index that require clang to be targeting a darwin
platform while running on a darwin host. For that, their execution is
currently gated by the `REQUIRES: system-darwin` annotation.

This approach becomes a problem when trying to run such tests on a
cross-compiling build of clang on a darwin platform. When the default
target is not darwin (e.g. via `LLVM_DEFAULT_TARGET_TRIPLE`), the
tests will still run on a darwin host and fail spuriously because of the
mismatch with the target detection.

To fix this issue, this patch introduces an extra condition to the
tests' REQUIRES annotation, `target={{.*}}-{{darwin|macos}}{{.*}}`,
ensuring they only run when the relevant target is present.
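
A simplified model of how such a pattern selects targets, assuming that text inside `{{...}}` is a regular expression, that everything else is literal, and that the whole feature string must match. The helper names and the example triples are illustrative and are not taken from lit:

```python
import re
from typing import Iterable


def lit_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a pattern with {{regex}} segments into one compiled regex."""
    pieces = []
    for part in re.split(r"(\{\{.+?\}\})", pattern):
        if part.startswith("{{") and part.endswith("}}"):
            # Regex segment: use its contents verbatim, grouped.
            pieces.append("(?:" + part[2:-2] + ")")
        else:
            # Literal segment: escape any regex metacharacters.
            pieces.append(re.escape(part))
    return re.compile("".join(pieces))


def requirement_met(pattern: str, features: Iterable[str]) -> bool:
    """True if any available feature fully matches the pattern."""
    regex = lit_pattern_to_regex(pattern)
    return any(regex.fullmatch(feature) for feature in features)
```

With the pattern from this patch, a feature set containing `target=arm64-apple-darwin23.0.0` satisfies the requirement, while a darwin host whose default target is `x86_64-unknown-linux-gnu` does not.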
2025-03-18 09:11:43 +00:00
David Green
bd1be8a242
[CodeGen][GlobalISel] Add a getVectorIdxWidth and getVectorIdxLLT. (#131526)
From #106446, this adds a variant of getVectorIdxTy that returns an LLT.
Many uses only look at the width, so a getVectorIdxWidth was added as
the common base.
2025-03-18 08:31:11 +00:00
Matthias Springer
e614e840bc
[mlir][memref] Add runtime verification for memref.dim (#130410)
Add runtime verification for `memref.dim`: check that the index is in
bounds.

Also simplify the pass pipeline for all memref runtime verification
checks.
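
The property being verified can be sketched in a few lines. This models only the bounds check itself, not the IR that the pass generates, and the function name is hypothetical:

```python
from typing import Sequence


def checked_dim(shape: Sequence[int], index: int) -> int:
    """Return shape[index], failing loudly if index is out of bounds."""
    # Negative indices are rejected explicitly; Python would otherwise
    # wrap them around, which is not a valid dimension query.
    if not 0 <= index < len(shape):
        raise IndexError(f"dim index {index} out of bounds for rank {len(shape)}")
    return shape[index]
```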
2025-03-18 09:10:49 +01:00
Mel Chen
489d1e764e
[LV][NFC] Pre-commit test for supporting strided accesses. (#130563)
Duplicate riscv-vector-reverse.ll as riscv-vector-reverse-output.ll to
verify all generated IR, not just debug output.
Pre-commit for #128718.
2025-03-18 16:08:42 +08:00
lorenzo chelini
57dc71352c
[MLIR][Bufferization] Retire enforce-aliasing-invariants (#130929)
Why? This option can lead to incorrect IR if used in isolation. For
example, consider the IR below:

```mlir
func.func @loop_with_aliasing(%arg0: tensor<5xf32>, %arg1: index, %arg2: index) -> tensor<5xf32> {
  %c1 = arith.constant 1 : index
  %cst = arith.constant 1.000000e+00 : f32
  %0 = tensor.empty() : tensor<5xf32>
  %1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<5xf32>) -> tensor<5xf32>
  // The BufferizableOpInterface says that %2 aliases with %arg0 or is a newly
  // allocated buffer
  %2 = scf.for %arg3 = %arg1 to %arg2 step %c1 iter_args(%arg4 = %arg0) -> (tensor<5xf32>) {
    scf.yield %1 : tensor<5xf32>
  }
  %cst_0 = arith.constant 1.000000e+00 : f32
  %inserted = tensor.insert %cst_0 into %1[%c1] : tensor<5xf32>
  return %2 : tensor<5xf32>
}
```

If we bufferize with: enforce-aliasing-invariants=false, we get:

```mlir
func.func @loop_with_aliasing(%arg0: memref<5xf32, strided<[?], offset: ?>>, %arg1: index, %arg2: index) -> memref<5xf32, strided<[?], offset: ?>> {
  %c1 = arith.constant 1 : index
  %cst = arith.constant 1.000000e+00 : f32
  %alloc = memref.alloc() {alignment = 64 : i64} : memref<5xf32>
  linalg.fill ins(%cst : f32) outs(%alloc : memref<5xf32>)
  %0 = scf.for %arg3 = %arg1 to %arg2 step %c1 iter_args(%arg4 = %arg0) -> (memref<5xf32, strided<[?], offset: ?>>) {
    %cast = memref.cast %alloc : memref<5xf32> to memref<5xf32, strided<[?], offset: ?>>
    scf.yield %cast : memref<5xf32, strided<[?], offset: ?>>
  }
  %cst_0 = arith.constant 1.000000e+00 : f32
  memref.store %cst_0, %alloc[%c1] : memref<5xf32>
  return %0 : memref<5xf32, strided<[?], offset: ?>>
}
```
This is not correct IR, since the loop yields the allocation.

I am using this option. What do I need to do now?

If you are using this option in isolation, you are possibly generating
incorrect IR, so you need to revisit your bufferization strategy. If you
are using it together with `copyBeforeWrite`, you simply need to retire
the `enforceAliasingInvariants` option.

Co-authored-by: Matthias Springer <mspringer@nvidia.com>
2025-03-18 08:42:43 +01:00
Kazu Hirata
58dd3eda4e
[Utils] Avoid repeated hash lookups (NFC) (#131723) 2025-03-18 00:27:23 -07:00
Kazu Hirata
62204482c0
[CodeGen] Avoid repeated hash lookups (NFC) (#131722) 2025-03-18 00:26:59 -07:00
Kazu Hirata
f6ad65a824
[ADT] Add SmallPtrSet::insert_range (#131716)
This patch adds SmallPtrSet::insert_range for consistency with
DenseSet::insert_range and std::set::insert_range from C++23.
2025-03-18 00:21:07 -07:00
Kazu Hirata
2df0254828
[ADT] Add SmallSet::insert_range (#131717)
This patch adds SmallSet::insert_range for consistency with
DenseSet::insert_range and std::set::insert_range from C++23.
2025-03-18 00:20:15 -07:00
Kazu Hirata
fc38982e93
[ADT] Add SetVector::insert_range (#131715)
This patch adds SetVector::insert_range for consistency with
DenseSet::insert_range and std::set::insert_range from C++23.
2025-03-18 00:19:48 -07:00
Petr Hosek
1fbfef9b8a
[libc] Templatize the scanf Reader interface (#131037)
This allows specializing the implementation for different targets
without including unnecessary logic and is similar to #111559, which did
the same for the printf Writer interface.
2025-03-17 23:51:24 -07:00
Ryosuke Niwa
4781941160
[alpha.webkit.UncountedCallArgsChecker] os_log functions should be treated as safe. (#131500)
…os_log functions should be treated as safe in call arguments checkers.

Also treat __builtin_* functions and __libcpp_verbose_abort functions as
"trivial" for the purposes of the call argument checkers.
2025-03-17 23:47:10 -07:00
Timm Baeder
2f808dd070
[clang][bytecode] Compile most recent function decl (#131730)
We used to always do this because all calls went through the code path
that calls getMostRecentDecl(). Do it now, too.
2025-03-18 07:29:38 +01:00
Akshat Oke
6be6400848
[LiveDebugValues][NFC] Remove TargetPassConfig from LDVImpl (#131562)
TPC is only used to access the option `ShouldEmitDebugEntryValues`.
2025-03-18 11:04:54 +05:30
Sushant Gokhale
0f34eba48a
[NFC][AArch64] test for fixed-width vector signed division with pow2-divisor and SVE enabled (#130252)
With SVE enabled, this should generate an asrd instruction. A subsequent
patch will address this.
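
For context, a sketch of the arithmetic behind a rounding-toward-zero shift for signed division by a power of two, which is what asrd is understood to perform. The helper names are hypothetical and this models scalar integers only, not the vector codegen:

```python
def sdiv_pow2(x: int, shift: int) -> int:
    """Signed division by 2**shift, rounding toward zero, using a shift."""
    # A plain arithmetic shift rounds toward negative infinity, so negative
    # dividends are biased by (2**shift - 1) before shifting.
    bias = (1 << shift) - 1 if x < 0 else 0
    return (x + bias) >> shift


def trunc_div(x: int, d: int) -> int:
    """Reference C-style division (truncates toward zero)."""
    q = abs(x) // abs(d)
    return q if (x < 0) == (d < 0) else -q
```

For example, dividing -7 by 4 with a bare shift would give -2, whereas the biased form gives -1, matching C semantics.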
2025-03-17 22:31:22 -07:00
Vikash Gupta
bdb63208b4
[AMDGPU][CodeGen] Using MBB's liveIn check in tandem with MCRegAliasIterator in SILowerSGPRSpills (#129848)
This patch replaces the use of MachineRegisterInfo's liveIn check with the
machine basic block's liveIn check, because the MRI's liveIns are
inconsistent with the entry MBB's liveIns when it comes to the machine
verifier checks.

PS: It's an alternative solution to #126926.
2025-03-18 10:51:07 +05:30
Craig Topper
0813c5cf5f [RISCV] Accept '0(reg)' in addition to '(reg)' for vle1.v/vse1.v 2025-03-17 20:51:16 -07:00
Fangrui Song
b9d27ac252 [MC] Fix formatting of a comment 2025-03-17 20:24:08 -07:00
Kazu Hirata
c72f7958b0 [BOLT] Fix the build
This is a follow-up for:

  commit 3c4b9317916ccd2e18c30b1540589518a4c7c88a
  Author: Fangrui Song <i@maskray.me>
  Date:   Mon Mar 17 20:05:28 2025 -0700
2025-03-17 20:18:34 -07:00
Fangrui Song
e758237352
[docs] Mention --discard-locals/--discard-all change for llvm-strip
PR #130704 updated llvm-strip as well.

Suggested by @nga888

Pull Request: https://github.com/llvm/llvm-project/pull/131491
2025-03-17 20:09:52 -07:00
Fangrui Song
3c4b931791
Rename RISCVMCExpr::VK_RISCV_ to VK_. NFC
They implement relocation operators and are named VK_RISCV_ probably to
avoid confusion with `MCSymbolRefExpr::VariantKind`.
`MCSymbolRefExpr::VariantKind` is discouraged
(https://discourse.llvm.org/t/error-expected-relocatable-expression-with-mctargetexpr/84926/2)
and targets are migrating away from `MCSymbolRefExpr::VariantKind`.

Therefore, there is no need to make the name long in the presence of the
clear `RISCVMCExpr::` prefix.

Pull Request: https://github.com/llvm/llvm-project/pull/131489
2025-03-17 20:05:28 -07:00
Louis Dionne
297f6d9f6b
[libc++] Fix check for _LIBCPP_HAS_NO_WIDE_CHARACTERS in features.py (#131675)
The patch that added the new locale Lit features was created before we
switched to a 0-1 macro for _LIBCPP_HAS_WIDE_CHARACTERS, leading to that
patch referring to the obsolete _LIBCPP_HAS_NO_WIDE_CHARACTERS macro
that is never defined nowadays.
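
The failure mode can be sketched as follows. The two macro names come from the message above; the dictionary of predefined macros and both helper names are hypothetical and do not reproduce the actual features.py code:

```python
from typing import Mapping


def obsolete_check_no_wide_chars(macros: Mapping[str, str]) -> bool:
    """Old-style test: is the negative macro defined at all?"""
    # With 0-1 macros this name is never defined, so this is always False.
    return "_LIBCPP_HAS_NO_WIDE_CHARACTERS" in macros


def has_wide_characters(macros: Mapping[str, str]) -> bool:
    """0-1 style test: the macro is always defined; inspect its value."""
    return macros.get("_LIBCPP_HAS_WIDE_CHARACTERS") == "1"
```

With a configuration that reports `_LIBCPP_HAS_WIDE_CHARACTERS` as `0`, the obsolete check still claims that nothing is disabled, which is the kind of mismatch this patch addresses.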
2025-03-17 22:13:51 -04:00
Elvis Wang
ed19620b8c
[VPlan] Make VPReductionRecipe a VPRecipeWithIRFlags. NFC (#130881)
This patch changes the parent of VPReductionRecipe from
VPSingleDefRecipe to VPRecipeWithIRFlags, and prints/gets/drops/controls
flags through VPRecipeWithIRFlags. This removes the dependency on the
underlying instruction.

This patch also adds a new function, `setFastMathFlags()`, to
VPRecipeWithIRFlags, because the entire reduction chain may contain
multiple instructions, and the underlying instruction may not carry
the corresponding flags for this reduction.

Split from #113903.
2025-03-18 10:08:23 +08:00
Jim Lin
00cad3ed22
[SDAG] Handle extract_subvector in isKnownNeverNaN (#131581)
Propagate nnan across extract_subvector.
2025-03-18 09:37:16 +08:00
Tim Gymnich
a5107be031
[NFC][AMDGPU][GlobalISel] Make LLTs constexpr (#131673)
- static const -> constexpr
2025-03-18 08:30:17 +07:00
Longsheng Mou
4cb1430c1c
[mlir][spirv] Fix a crash in spirv::ISubOp::fold (#131570)
This PR fixes a crash when `spirv.ISub` does not have an integer type.
Fixes #131283.
2025-03-18 09:18:49 +08:00
Nikolay Panchenko
745e16753f
[JSON][NFC] Move print method out of NDEBUG || DUMP (#131639) 2025-03-17 21:12:07 -04:00
William Moses
d9c65af626
[MLIR][GPUToNVVM] Support 32-bit isfinite (#131699)
Co-authored-by: Ivan Radanov Ivanov <ivanov.i.aa@m.titech.ac.jp>
2025-03-18 02:11:38 +01:00
Alexander Kornienko
d1156fcb56
Revert "[libc++] Optimize num_put integral functions" (#131613)
Reverts llvm/llvm-project#120859

This change breaks formatting of `0` with `std::showbase` + `std::hex`
or `std::oct`, as well as `+0` with `std::showpos`. I believe the new
behavior is violating the standard. See
https://github.com/llvm/llvm-project/pull/120859#issuecomment-2723970242
and later comments for details and explanation.
2025-03-18 02:05:51 +01:00
Craig Topper
50f8adb5c0 [RISCV] Accept '0(reg)' in addition to '(reg)' for vl1r.v/vl2r.v/vl4r.v/vl8r.v
This matches vl1re8.v, vl2re8.v, vl4re8.v, vl8re8.v.
2025-03-17 17:39:56 -07:00
Cyndy Ishida
cb1d640b03
[clang][DepScan] resolve dangling reference to lambda that goes out of
scope.

Fixes buildbots.
2025-03-17 17:33:58 -07:00
Farzon Lotfi
a2fbc9a8e3
[DirectX] Start the creation of a DXIL Instruction legalizer (#131221)
- Legalize i8 truncation back to original types
- Remove sext and truncs
- Legalize i64 indices for insert/extract elements to i32 indices
- Fixes https://github.com/llvm/llvm-project/issues/126323
- Fixes https://github.com/llvm/llvm-project/issues/129757
2025-03-17 20:33:02 -04:00
Matt Arsenault
092e25571c AMDGPU: Add REQUIRES: asserts to machine pass violation test
We should promote this to a proper error and not llvm_unreachable
2025-03-18 07:31:50 +07:00