529559 Commits

Author SHA1 Message Date
Nikita Popov
979c275097
[IR] Store Triple in Module (NFC) (#129868)
The module currently stores the target triple as a string. This means
that any code that wants to actually use the triple first has to
instantiate a Triple, which is somewhat expensive. The change in #121652
caused a moderate compile-time regression due to this. While it would be
easy enough to work around, I think that architecturally, it makes more
sense to store the parsed Triple in the module, so that it can always be
directly queried.

For this change, I've opted not to add any magic conversions between
std::string and Triple for backwards-compatibilty purses, and instead
write out needed Triple()s or str()s explicitly. This is because I think
a decent number of them should be changed to work on Triple as well, to
avoid unnecessary conversions back and forth.

The only interesting part in this patch is that the default triple is
Triple("") instead of Triple() to preserve existing behavior. The former
defaults to using the ELF object format instead of unknown object
format. We should fix that as well.
2025-03-06 10:27:47 +01:00
Ricardo Jesus
f01e760c08
[AArch64][SVE] Improve fixed-length addressing modes. (#129732)
When compiling VLS SVE, the compiler often replaces VL-based offsets
with immediate-based ones. This leads to a mismatch in the allowed
addressing modes due to SVE loads/stores generally expecting immediate
offsets relative to VL. For example, given:
```c

svfloat64_t foo(const double *x) {
  svbool_t pg = svptrue_b64();
  return svld1_f64(pg, x+svcntd());
}
```

When compiled with `-msve-vector-bits=128`, we currently generate:
```gas
foo:
        ptrue   p0.d
        mov     x8, #2
        ld1d    { z0.d }, p0/z, [x0, x8, lsl #3]
        ret
```

Instead, we could be generating:
```gas
foo:
        ldr     z0, [x0, #1, mul vl]
        ret
```

Likewise for other types, stores, and other VLS lengths.

This patch achieves the above by extending `SelectAddrModeIndexedSVE`
to let constants through when `vscale` is known.
2025-03-06 09:27:07 +00:00
Nikolas Klauser
0cceac6bbd
[libc++] Remove a few unused includes from <__concepts/*> (#129883) 2025-03-06 10:22:18 +01:00
Simon Pilgrim
2c7e7b5627
[X86] Extend shuf128(concat(x,y),concat(z,w)) -> shuf128(widen(y),widen(w)) folds to peek through bitcasts (#129896)
Peek through bitcasts when looking for freely accessible upper subvectors
2025-03-06 09:21:49 +00:00
Andrzej Warzyński
620c38371d
[mlir][nfc] De-duplicate tests from Type::isIntOrFloat (#129710)
This PR makes sure that we always use `Type::isIntOrFloat` rather than
re-implementing this condition inline. Also, it removes `isScalarType`
that effectively re-implemented this method.
2025-03-06 09:04:30 +00:00
lakshayk-nv
d61d219739
Adding support in llvm-exegesis for Aarch64 for handling FPR64/128, PPR16 and ZPR128 reg class. (#127564)
Current implementation (for Aarch64) in llvm-exegesis only supports
GRP32 and GPR64 bit register class, thus for opcodes variants which used
FPR64/128, PPR16 and ZPR128, llvm-exegesis throws warning "setReg is not
implemented". This code will handle the above register class and
initialize the registers using appropriate base instruction class.
2025-03-06 09:02:54 +00:00
Fraser Cormack
a2b0576172
[libclc] Stop installing CLC headers (#126908)
The libclc headers are an implementation detail and are not intended to
be used by others as OpenCL headers. The only artifacts of libclc we
want to publish are the LLVM bytecode libraries.

As the headers have been incidentally broken by recent changes, this
commit takes the step to stop installing the headers at all. Downstreams
can use clang's own OpenCL headers, and/or its -fdeclare-opencl-builtins
flag.

Fixes #119967.
2025-03-06 08:52:23 +00:00
Kunwar Grover
8e0a63ddad
[mlir][docs] Add docs on canonicalizers being folders or patterns (#129517)
If a transformation should be a canonicalization is an orthogonal
question to if a transformation should be implemented as a
`RewritePattern` or a `fold` method. The later is an implementation
detail.

This patch adds a suggestion to always implement a canonicalization as a
`fold` pattern if possible, as they are a restricted subset of a
`RewritePattern`.

This has been a common source of confusion, as to when to implement a
canonicalization as a fold method or a RewritePattern.
2025-03-06 08:41:47 +00:00
Narayan
6311e3fcc8
[ValueTracking] ComputeNumSignBitsImpl - add basic handling of BITCAST nodes (#127218)
When a wider scalar/vector type containing all sign bits is bitcast to a
narrower vector type, we can deduce that the resulting narrow elements
will also be all sign bits. This matches existing behavior in
SelectionDAG and helps optimize cases involving SSE intrinsics where
sign-extended values are bitcast between different vector types.

The current implementation fails to recognize that an arithmetic right
shift is redundant when applied to elements that are already known to be
all sign bits. This PR improves ComputeNumSignBitsImpl to track this
information through bitcasts, enabling the optimization of such cases.

```
%ext = sext <1 x i1> %cmp to <1 x i8>
%sub = bitcast <1 x i8> %ext to <4 x i2>
%sra = ashr <4 x i2> %sub, <i2 1, i2 1, i2 1, i2 1>
; Can be simplified to just:
%sub = bitcast <1 x i8> %ext to <4 x i2>
```

Closes #87624
2025-03-06 08:30:36 +00:00
Simon Pilgrim
1987d18012
[X86] isFreeToSplitVector - use a SDValue instead of SDNode argument. NFC. (#129906)
Every caller was having to call getNode() - so move that insider the helper.
2025-03-06 08:29:04 +00:00
Luke Lau
c6e2cbe5fd [LV] Regenerate select-cmp-predicated.ll with UTC. NFC
The main select-cmp.ll tests seem to be generated with UTC after
it should probably be converted to UTC beforehand.
2025-03-06 16:20:03 +08:00
Lu Weining
bae6644e12
[LoongArch] Relax the restrictions of inlineasm operand modifier 'u' and 'w' (#129864)
- Allow 'u' and 'w' on LASX, LSX or floating point register operands.
- Also add missing description in LangRef.

Fixes #129863.
2025-03-06 16:17:12 +08:00
Fangrui Song
687854aea8 [MC] Remove unneeded VK_None argument from MCSymbolRefExpr::create. NFC 2025-03-06 00:00:05 -08:00
Matthias Springer
a6151f4e23
[mlir][IR] Move match and rewrite functions into separate class (#129861)
The vast majority of rewrite / conversion patterns uses a combined
`matchAndRewrite` instead of separate `match` and `rewrite` functions.

This PR optimizes the code base for the most common case where users
implement a combined `matchAndRewrite`. There are no longer any `match`
and `rewrite` functions in `RewritePattern`, `ConversionPattern` and
their derived classes. Instead, there is a `SplitMatchAndRewriteImpl`
class that implements `matchAndRewrite` in terms of `match` and
`rewrite`.

Details:
* The `RewritePattern` and `ConversionPattern` classes are simpler
(fewer functions). Especially the `ConversionPattern` class, which now
has 5 fewer functions. (There were various `rewrite` overloads to
account for 1:1 / 1:N patterns.)
* There is a new class `SplitMatchAndRewriteImpl` that derives from
`RewritePattern` / `OpRewritePatern` / ..., along with a type alias
`RewritePattern::SplitMatchAndRewrite` for convenience.
* Fewer `llvm_unreachable` are needed throughout the code base. Instead,
we can use pure virtual functions. (In cases where users previously had
to implement `rewrite` or `matchAndRewrite`, etc.)
* This PR may also improve the number of [`-Woverload-virtual`
warnings](https://discourse.llvm.org/t/matchandrewrite-hiding-virtual-functions/84933)
that are produced by GCC. (To be confirmed...)

Note for LLVM integration: Patterns with separate `match` / `rewrite`
implementations, must derive from `X::SplitMatchAndRewrite` instead of
`X`.

---------

Co-authored-by: River Riddle <riddleriver@gmail.com>
2025-03-06 08:48:51 +01:00
Jim Lin
87976ca45f [M68k] Suppress compilation warning enumerated mismatch in conditional expression. (NFC) 2025-03-06 15:06:46 +08:00
Fangrui Song
fe56c4c019 [MC] Remove unneeded VK_None argument from MCSymbolRefExpr::create. NFC 2025-03-05 23:14:04 -08:00
Balazs Benics
7e5821bae8
Reapply "[analyzer] Handle [[assume(cond)]] as __builtin_assume(cond)" (#129234)
This is the second attempt to bring initial support for [[assume()]] in
the Clang Static Analyzer.
The first attempt (#116462) was reverted in
2b9abf0db2d106c7208b4372e662ef5df869e6f1 due to some weird failure in a
libcxx test involving `#pragma clang loop vectorize(enable)
interleave(enable)`.

The failure could be reduced into:
```c++
template <class ExecutionPolicy>
void transform(ExecutionPolicy) {
  #pragma clang loop vectorize(enable) interleave(enable)
  for (int i = 0; 0;) { // The DeclStmt of "i" would be added twice in the ThreadSafety analysis.
    // empty
  }
}
void entrypoint() {
  transform(1);
}
```

As it turns out, the problem with the initial patch was this:
```c++
for (const auto *Attr : AS->getAttrs()) {
  if (const auto *AssumeAttr = dyn_cast<CXXAssumeAttr>(Attr)) {
    Expr *AssumeExpr = AssumeAttr->getAssumption();
    if (!AssumeExpr->HasSideEffects(Ctx)) {
      childrenBuf.push_back(AssumeExpr);
    }
  }
  // Visit the actual children AST nodes.
  // For CXXAssumeAttrs, this is always a NullStmt.
  llvm::append_range(childrenBuf, AS->children()); // <--- This was not meant to be part of the "for" loop.
  children = childrenBuf;
}
return;
 ```

The solution was simple. Just hoist it from the loop.

I also had a closer look at `CFGBuilder::VisitAttributedStmt`, where I also spotted another bug.
We would have added the CFG blocks twice if the AttributedStmt would have both the `[[fallthrough]]` and the `[[assume()]]` attributes. With my fix, it will only once add the blocks. Added a regression test for this.

Co-authored-by: Vinay Deshmukh <vinay_deshmukh AT outlook DOT com>
2025-03-06 08:09:09 +01:00
Maksim Panchenko
a28daa7c1a
[BOLT][AArch64] Keep relocations for linker-relaxed instructions. NFCI (#129980)
We used to filter out relocations corresponding to NOP+ADR instruction
pairs that were a result of linker "relaxation" optimization. However,
these relocations will be useful for reversing the linker optimization.
Keep the relocations and ignore them while symbolizing ADR instruction
operands.
2025-03-05 23:06:01 -08:00
quic_hchandel
6e7e46cafe
[RISCV] Add Qualcomm uC Xqcibm (Bit Manipulation) extension (#129504)
This extension adds thirty eight  bit manipulation instructions.

The current spec can be found at:
https://github.com/quic/riscv-unified-db/releases/tag/Xqci-0.6

This patch adds assembler only support.

Co-authored-by: Sudharsan Veeravalli <quic_svs@quicinc.com>
2025-03-06 12:01:53 +05:30
Lang Hames
b18e5b6a36 Re-apply "[ORC] Remove the Triple argument from LLJITBuilder::..." with fixes.
This re-applies f905bf3e1ef860c4d6fe67fb64901b6bbe698a91, which was reverted in
c861c1a046eb8c1e546a8767e0010904a3c8c385 due to compiler errors, with a fix for
MLIR.
2025-03-06 17:17:05 +11:00
chrisPyr
038fff3f24
[NFC][BOLT] Make file-local cl::opt global variables static (#126472)
#125983
2025-03-05 22:11:05 -08:00
Fangrui Song
75f6fe2ee5 [MC] Remove unneeded MCSymbolRefExpr::create overload and add comments
The StringRef overload is often error-prone as users might forget to
register the MCSymbol.

Add comments to MCTargetExpr and MCSymbolRefExpr::VariantKind.
In the distant future the VariantKind parameter might be removed.
2025-03-05 22:10:08 -08:00
Owen Pan
a6ccda28f7 [clang-format][NFC] Use better names for a couple of data members 2025-03-05 21:45:01 -08:00
Lang Hames
c861c1a046 Revert "[ORC] Remove the Triple argument from LLJITBuilder::ObjectLinking..."
This reverts commit f905bf3e1ef860c4d6fe67fb64901b6bbe698a91 while I fix
some compile errors reported on the buildbots (see e.g.
https://lab.llvm.org/buildbot/#/builders/53/builds/13369).
2025-03-06 16:22:39 +11:00
Peter Collingbourne
c72ebeeb7f
ADT: Switch to a raw pointer for DoubleAPFloat::Floats.
In order for the union APFloat::Storage to permit access to the
semantics field when another union member is stored there, all members
of Storage must be standard layout. This is not necessarily the case
for DoubleAPFloat which may be non-standard layout because there is no
requirement that its std::unique_ptr member is standard layout. Fix this
by converting Floats to a raw pointer.

Reviewers: arsenm

Reviewed By: arsenm

Pull Request: https://github.com/llvm/llvm-project/pull/129981
2025-03-05 21:15:08 -08:00
Lang Hames
f905bf3e1e [ORC] Remove the Triple argument from LLJITBuilder::ObjectLinkingLayerCreator.
ExecutionSession can provide the Triple, so this argument has been redundant
for a while, and no in-tree clients use it.
2025-03-06 16:13:10 +11:00
Lang Hames
a22881c9db [ORC-RT] Fix type name in comment. NFC. 2025-03-06 16:13:10 +11:00
Jim Lin
316f68f7f2
[RISCV] Remove the TODO for folding bswap and shift from the test. (NFC) (#129972)
https://reviews.llvm.org/D122655 has supported it.
2025-03-06 13:02:17 +08:00
Jim Lin
463a0964f4
[RISCV] Remove the TODO for fmaximum/fminimum from the tests. (NFC) (#129969)
https://reviews.llvm.org/D156069 has supported it.
2025-03-06 13:01:58 +08:00
Jinsong Ji
e4c3d258b7
[NFC][c-index-test] factor data len out (#129971)
Follow up of #129922
2025-03-05 23:21:07 -05:00
Thurston Dang
c14d3dff97
[msan][NFCI] Add tests for Arm NEON vector load (#125267)
Forked from llvm/test/CodeGen/AArch64/arm64-ld1.ll

Incorrectly handled by handleUnknownInstruction:
- llvm.aarch64.neon.ld1x2, llvm.aarch64.neon.ld1x3,
llvm.aarch64.neon.ld1x4
- llvm.aarch64.neon.ld2, llvm.aarch64.neon.ld3, llvm.aarch64.neon.ld4
- llvm.aarch64.neon.ld2lane, llvm.aarch64.neon.ld3lane,
llvm.aarch64.neon.ld4lane
- llvm.aarch64.neon.ld2r, llvm.aarch64.neon.ld3r, llvm.aarch64.neon.ld4r
2025-03-05 20:03:38 -08:00
A. Jiang
27e686c788
[libc++] Verify that LWG4140 is implemented (#128624)
According to the commit history, the constructors removed by LWG4140
have never been added to libc++.

Existence of non-public or deleted default constructor is observable,
this patch tests that there's no such default constructor at all.
2025-03-06 11:12:17 +08:00
Matt Arsenault
a216358ce7
AMDGPU: Replace amdgpu-no-agpr with amdgpu-agpr-alloc (#129893)
This performs the minimal replacment of amdgpu-no-agpr to
amdgpu-agpr-alloc=0. Most of the test diffs are due to the new
attribute sorting later alphabetically.

We could do better by trying to perform range merging in the attributor,
and trying to pick non-0 values.
2025-03-06 09:17:51 +07:00
Matt Arsenault
f4ba2bf236
AMDGPU: Add amdgpu-agpr-alloc attribute to control AGPR allocation (#128034)
This provides a range to decide how to subdivide the vector register
budget on gfx90a+. A single value declares the minimum AGPRs that
should be allocatable. Eventually this should replace amdgpu-no-agpr.

I want this primarily for testing agpr allocation behavior. We should
have a heuristic try to detect a reasonable number of AGPRs to keep
allocatable.
2025-03-06 09:13:59 +07:00
Jinsong Ji
560cfd5099
c-index-test: fix buffer overflow (#129922)
We should not try to overwrite the pointer of struct, also need to add 1
for end of line.
2025-03-05 20:52:11 -05:00
Valentin Clement (バレンタイン クレメン)
2130285564
[flang][cuda] Make sure allocator id is set for pointer allocate (#129950) 2025-03-05 17:29:09 -08:00
Prabhuk
45ca613c13
[clang] Use TargetInfo to decide Mangling for C (#129920)
Instead of hardcoding the decision on what mangling scheme to use based
on targets, use TargetInfo to make the decision.
2025-03-05 17:26:52 -08:00
A. Jiang
c28c508962
[libc++] Implement part of P2562R1: constexpr ranges::stable_partition (#129839) 2025-03-06 09:23:55 +08:00
Joseph Huber
12c5a46c30 [Clang] Fix incorrect condition on ballot
Summary:
Somehow these got the `!` dropped and it wasn't tested because the
existing test only used the 32-bit variant.
2025-03-05 19:15:23 -06:00
Deric C.
b4ecebe745
[HLSL] [DXIL] Implement the AddUint64 HLSL function and the UAddc DXIL op (#127137)
Fixes #99205.

- Implements the HLSL intrinsic `AddUint64` used to perform unsigned
64-bit integer addition by using pairs of unsigned 32-bit integers
instead of native 64-bit types
- The LLVM intrinsic `uadd_with_overflow` is used in the implementation
of `AddUint64` in `CGBuiltin.cpp`
- The DXIL op `UAddc` was defined in `DXIL.td`, and a lowering of the
LLVM intrinsic `uadd_with_overflow` to the `UAddc` DXIL op was
implemented in `DXILOpLowering.cpp`

Notes:
- `__builtin_addc` was not able to be used to implement `AddUint64` in
`hlsl_intrinsics.h` because its `CarryOut` argument is a pointer, and
pointers are not supported in HLSL
- A lowering of the LLVM intrinsic `uadd_with_overflow` to SPIR-V
[already
exists](https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/SPIRV/llvm-intrinsics/uadd.with.overflow.ll)
- When lowering the LLVM intrinsic `uadd_with_overflow` to the `UAddc`
DXIL op, the anonymous struct type `{ i32, i1 }` is replaced with a
named struct type `%dx.types.i32c`. This aspect of the implementation
may be changed when issue #113192 gets addressed
- Fixes issues mentioned in the comments on the original PR #125319

---------

Co-authored-by: Finn Plummer <50529406+inbelic@users.noreply.github.com>
Co-authored-by: Farzon Lotfi <farzonlotfi@microsoft.com>
Co-authored-by: Chris B <beanz@abolishcrlf.org>
Co-authored-by: Justin Bogner <mail@justinbogner.com>
2025-03-05 17:04:10 -08:00
Sirraide
5b3ba261c4
[Clang] [Sema] Allow non-local/non-variable declarations in for loop (#129737)
Currently, we error on non-variable or non-local variable declarations
in `for` loops such as `for (struct S {}; 0; ) {}`. However, this is
valid in C23, so this patch changes the error to a compatibilty warning
and also allows this as an extension in earlier language modes. This
also matches GCC’s behaviour.
2025-03-06 02:03:22 +01:00
Shafik Yaghmour
4b454afc45
Convert unreachable return statement into llvm_unreachable (#129627)
Static analysis flags the final return statement in `ReadExtensionBlock`
as unreachable and indeed it is since there is no way to exit the
`while(true)` loop besides a *return statement*.

So I am converting it into a `llvm_unreachable` to explicitly document
this.
2025-03-05 16:59:44 -08:00
Jonas Devlieghere
b5e70d0682
[lldb-dap] Use LLDB_INVALID_LINE_NUMBER & LLDB_INVALID_COLUMN_NUMBER (#129948)
Consistently use LLDB_INVALID_LINE_NUMBER & LLDB_INVALID_COLUMN_NUMBER
when parsing line and column numbers respectively.
2025-03-05 16:19:36 -08:00
lntue
f2bebdc688
[libc][math] Add skip accurate pass option for exp*, log*, and powf functions. (#129831) 2025-03-05 19:18:25 -05:00
Amara Emerson
5422e2c681 [AArch64][GlobalISel] On Darwin don't fold globals into the offset of prefetches.
ld64 doesn't currently support the PAGEOFF relocations on anything but load/stores
so we need to bail-out here to fix the build failures on greendragon.

rdar://145495288
2025-03-05 16:06:08 -08:00
choikwa
45759fe5b4
[AMDGPU] Filter candidates of LiveRegOptimizer for profitable cases (#124624)
It is known that for vector whose element fits in i16 will be split and
scalarized in SelectionDag's type legalizer
(see SIISelLowering::getPreferredVectorAction).

LRO attempts to undo the scalarizing of vectors across basic block
boundary and shoehorn Values in VGPRs. LRO is beneficial for operations
that natively work on illegal vector types to prevent flip-flopping
between unpacked and packed. If we know that operations on vector will
be split and scalarized, then we don't want to shoehorn them back to
packed VGPR.

Operations that we know to work natively on illegal vector types usually
come in the form of intrinsics (MFMA, DOT8), buffer store, shuffle, phi
nodes to name a few.
2025-03-05 18:44:48 -05:00
Bruno Cardoso Lopes
aea74034eb
[MLIR][LLVMIR] Add elementtype attribute (#129918)
These are very common when using intrinsics (e.g. ARM NEON).

For more context: ClangIR has currently been blocked on such intrinsics
emission because of this lacking capability.
2025-03-05 15:21:45 -08:00
Zhen Wang
d1abbb4dc5
[flang][cuda] Change induction variable from i32 to index for doconcurrent inside cuf kernel directive (#129924)
Use `index` instead of `i32` for induction variables for doconcurrent
inside cuf kernel directive. Regular do loop inside cuf kernel directive
also uses `index`:
```
cuf.kernel<<<*, *>>> (%arg0 : index) = ...
```
2025-03-05 14:50:42 -08:00
Joseph Huber
77363f7518
[libc] Allow building the GPU targets that don't have CRT (#129945)
Summary:
If there's no subdirectory we can't make an alias. This allows it to at
least continue.
2025-03-05 16:39:35 -06:00
Vy Nguyen
275eab91ed
[LLDB]Fix test crash (#129921)
Use the `SubsystemRAII` to unregister the fake manager at end of tests
(Should fix https://github.com/llvm/llvm-project/issues/129910)
2025-03-05 17:36:06 -05:00