533385 Commits

Author SHA1 Message Date
tdanyluk
76d2e0881e
[mlir] fix references of attributes which are not defined earlier (#134364)
If an attribute is not defined earlier in the same file but is only
referenced directly from its dialect, the wrong check is currently
emitted.

What would it emit for #toy.shape<[1, 2, 3]>:
Earlier:
// CHECK: #[['?']]<[1, 2, 3]>
Now:
// CHECK: #toy.shape<[1, 2, 3]>
2025-04-08 17:34:20 +02:00
Christian Sigg
4e9cfcf6af [llvm][bazel] Fix BUILD after 561506144531cf0a760bb437fd74c683931c60ae. 2025-04-08 17:28:20 +02:00
Sirraide
6c74fe9087
[Clang] [NFC] Tablegen component diags headers (#134777)
The component diagnostic headers (i.e. `DiagnosticAST.h` and friends)
all follow the same format, and there are enough of them (and enough in
them) that updating all of them has become rather tedious (at least it
was for me while working on #132348), so this patch instead generates
all of them (or rather their contents) via Tablegen.

Also, it seems that `%enum_select` currently wouldn’t work in
`DiagnosticCommonKinds.td` because the infrastructure for that was
missing from `DiagnosticIDs.h`; this patch should fix that as well.
2025-04-08 17:21:45 +02:00
Matt Arsenault
34e8f00066
Attributor: Propagate align to cmpxchg instructions (#134838)
Fixes #134480
2025-04-08 22:15:50 +07:00
Matt Arsenault
66f0343609
Attributor: Propagate align to atomicrmw instructions (#134837)
Partially fixes #134480
2025-04-08 22:12:20 +07:00
Matt Arsenault
2cf4254466
Attributor: Add baseline tests for propagating align to atomics (#134836) 2025-04-08 22:08:11 +07:00
Adrian Prantl
5615061445
[dsymutil] Avoid copying binary swiftmodules built from textual (#134719)
.swiftinterface files into the dSYM bundle. These typically come only
from the SDK (since textual interfaces require library evolution) and
thus are a waste of space to copy into the bundle.

The information about this is being parsed out of the control block,
which means duplicating 5 constants from the Swift frontend. If a file
cannot be parsed, dsymutil errs on the side of copying the file anyway.

rdar://138186524
2025-04-08 08:03:32 -07:00
Matt Arsenault
dfe4d9187c
GCStrategy: Use Twine properly for error message (#132760) 2025-04-08 21:57:29 +07:00
Christopher McGirr
ae3faea1f2
[MLIR][mlir-opt] move action debugger hook flag (#134842)
Currently if a developer uses the flag `--mlir-enable-debugger-hook` the
debugger hook is not actually enabled. It seems the DebugConfig and the
MainMLIROptConfig are not connected.

To fix this we can move the `enableDebuggerHook` CL Option to the
DebugConfigCLOptions struct so that it can get registered and enabled
along with the other debugger flags. AFAICS there are no other uses of
the flag so this should be safe.

This also adds a small LIT test to check that the hook is enabled by
checking the std::cerr output for the log message.
2025-04-08 16:54:11 +02:00
Alan Li
b5045ae9bc
[MLIR][Fix] Fix missing dep in AMDGPUDialect. (#134862)
Issue introduced in https://github.com/llvm/llvm-project/pull/133498
2025-04-08 10:46:55 -04:00
Michael Liao
4f77e50042 [MLIR][AMDGPU] Fix shared build. NFC 2025-04-08 10:46:15 -04:00
Han-Kuan Chen
2347aa1fcc
[SLP][REVEC] Fix the mismatch between the result of getAltInstrMask and the VecTy argument of TargetTransformInfo::isLegalAltInstr. (#134795)
We cannot determine ScalarTy from VL because some ScalarTy is determined
from VL[0]->getType(), while others are determined from
getValueType(VL[0]).

Fix "Mask and VecTy are incompatible".
2025-04-08 22:29:11 +08:00
Han-Kuan Chen
97c4cb4d13
[SLP][REVEC] getNumElements should not be used as VF when REVEC is enabled. (#134763) 2025-04-08 22:29:03 +08:00
Philip Reames
c1e95b2e5e
[RISCV] Fix matching bug in VLA shuffle lowering (#134750)
Fix https://github.com/llvm/llvm-project/issues/134126.

The matching code was previously written as if we were mutating the
indices to replace undef elements with preferred values, but the actual
lowering code just took a prefix of the index vector. This resulted in
us using undef indices for lanes which should have been defined,
resulting in incorrect codegen.

Longer term, we probably should rewrite the mask, but this seemed like
an easier tactical fix.
2025-04-08 07:20:25 -07:00
Michael Kruse
8b11c39a0f
[llvm-mt] Do not build llvm-mt if not functional (#134631)
llvm-mt requires libxml2 to work, so do not even build it without
libxml2.

CMake 3.31 and later prefer llvm-mt.exe over Microsoft's mt.exe if
available and using clang-cl.exe as CMAKE_CXX_COMPILER. When CMake picks
up llvm-mt.exe without libxml2, any build will fail with the message
```
llvm-mt: error: no libxml2
```

Any test except `--help` already uses `REQUIRES: libxml2`. There is no
point in having a non-functional executable. Not building llvm-mt.exe
will force CMake to use Microsoft's `mt.exe` instead.

Fixes: #134237
2025-04-08 16:16:53 +02:00
Mircea Trofin
b2dea4fd22
[ctxprof] root autodetection mechanism (#133147)
This is an optional mechanism that automatically detects roots. It's a best-effort mechanism, and its main goal is to *avoid* pointing at the message pump function as a root. This is the function that polls message queue(s) in an infinite loop, and is thus a bad root (it never exits).

High-level, when collection is requested - which should happen when a server has already been set up and is handling requests - we spend a bit of time sampling all the server's threads. Each sample is a stack which we insert in a `PerThreadCallsiteTrie`. After a while, we run the root detection logic for each `PerThreadCallsiteTrie`. We then traverse all the `FunctionData`, find the ones matching the detected roots, and allocate a `ContextRoot` for them. From here, we special case `FunctionData` objects, in `__llvm_ctx_profile_get_context`, that have a `CtxRoot` and route them to `__llvm_ctx_profile_start_context`.

For this to work, on the llvm side, we need to have all functions call `__llvm_ctx_profile_release_context` because they _might_ be roots. This comes at a slight (percentages) penalty during collection - which we can afford since the overall technique is ~5x faster than normal instrumentation. We can later explore conditionally enabling autoroot detection and avoiding this penalty, if desired. 

Note that functions that `musttail call` can't have their return instrumented this way, and a subsequent patch will harden the mechanism against this case.

The mechanism could be used in combination with explicit root specification, too.
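
As a rough illustration of the sampling step described above, the sketch below inserts sampled call stacks into a trie and counts how often each stack prefix was observed. It is not the compiler-rt implementation: `CallsiteTrie`, `TrieNode`, `insert`, and `samplesAt` are hypothetical names, frames are plain integers standing in for return addresses, and the root detection logic itself is left out because the message does not spell it out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

// Hypothetical node: one per distinct call-stack prefix.
struct TrieNode {
  std::uint64_t samples = 0;  // number of sampled stacks passing through here
  std::map<std::uintptr_t, std::unique_ptr<TrieNode>> callees;
};

// Hypothetical stand-in for a per-thread trie of sampled stacks.
class CallsiteTrie {
public:
  // `stack` lists frames outermost-first (thread entry point at index 0).
  void insert(const std::vector<std::uintptr_t> &stack) {
    assert(!stack.empty() && "a sample needs at least one frame");
    TrieNode *node = &root_;
    for (std::uintptr_t frame : stack) {
      std::unique_ptr<TrieNode> &child = node->callees[frame];
      if (!child)
        child = std::make_unique<TrieNode>();
      node = child.get();
      ++node->samples;
    }
  }

  // Number of samples whose stack starts with the non-empty `prefix`;
  // 0 if that prefix was never observed.
  std::uint64_t samplesAt(const std::vector<std::uintptr_t> &prefix) const {
    const TrieNode *node = &root_;
    for (std::uintptr_t frame : prefix) {
      auto it = node->callees.find(frame);
      if (it == node->callees.end())
        return 0;
      node = it->second.get();
    }
    return node->samples;
  }

private:
  TrieNode root_;
};
```

In a structure like this, a message pump would presumably appear as a frame shared by nearly every sample of its thread. How the real runtime turns such counts into a root decision is outside this sketch.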
2025-04-08 06:59:38 -07:00
Shilei Tian
f19c6f23ab
[Clang][AMDGPU] Improve error message when device libraries for COV6 are missing (#134745)
#130963 switches the default to COV6, which requires ROCm 6.3.
Currently, if the device libraries for COV6 are not found, the error
message is not very helpful. This PR provides a more informative error
message in such cases.
2025-04-08 09:57:43 -04:00
Romaric Jodin
0e98817458
libclc: frexp: fix implementation regarding denormals (#134823)
Devices that do not support denormals can compare them equal to zero.
This leads to results that do not match the CTS expectation, whether or
not denormals are supported.

For example for 0x1.008p-140 we get {0x1.008p-140, 0} while the CTS
expects {0x1.008p-1, -139} when supporting denormals, or {0, 0} when not
supporting denormals (flushed to zero).

Ref #129871
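
The expected values quoted above can be checked on a host with the standard library. The sketch below is an assumption-laden illustration, not the libclc code: it uses `std::frexp` on a `float`, builds the input with `std::ldexp` instead of a hex-float literal, and assumes the host does not flush denormals to zero. `Decomposed`, `decompose`, and `denormalInput` are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>

// Result pair in the same {mantissa, exponent} order used in the message.
struct Decomposed {
  float mantissa;
  int exponent;
};

// Thin wrapper around the host std::frexp, which returns a mantissa in
// [0.5, 1) and stores the matching power-of-two exponent.
Decomposed decompose(float x) {
  int exponent = 0;
  float mantissa = std::frexp(x, &exponent);
  return {mantissa, exponent};
}

// Builds 0x1.008p-140 as a float: 0x1.008 is 4104/4096, i.e. 1 + 2^-9.
float denormalInput() {
  float x = std::ldexp(4104.0f / 4096.0f, -140);
  // Holds only when the host keeps denormals (no flush-to-zero mode).
  assert(std::fpclassify(x) == FP_SUBNORMAL);
  return x;
}
```

With denormals supported, `decompose(denormalInput())` yields a mantissa of 0x1.008p-1 (4104/8192) and an exponent of -139. A device that flushes the input to zero would instead be looking at `decompose(0.0f)`, which yields {0, 0}.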
2025-04-08 14:50:26 +01:00
Christian Sigg
3a6b9b3a87 [mlir][bazel] Fix after dae0ef53a0b99c6c2b74143baee5896e8bc5c8e7
Remove unnecessary include.
2025-04-08 15:47:14 +02:00
Hans Wennborg
35b3886382
[win/arm64] Enable tail call with inreg arguments when possible (#134671)
Tail calls were disabled from callers with inreg parameters in 5dc8aeb
with a fixme to check if the callee also takes an inreg parameter.

The issue is that inreg parameters (which are passed in x0 or x1 for
free and member functions respectively) are supposed to be returned (in
x0) at the end of the function. In case of a tail call, that means the
callee needs to return the same value as the caller would.

We can check for that case, and it's not as niche as it sounds, as
that's how Clang will lower one function with an sret return value
calling another, such as:

```
struct T { int x; };
struct S {
    T foo();
    T bar();
};
T S::foo() { return bar(); } // foo's sret argument will get passed directly to bar
```

Fixes #133098
2025-04-08 15:25:28 +02:00
wldfngrs
fdf20941a8
[libc][math] Fix signaling NaN handling for math functions. (#133347)
Add tests for signaling NaNs, and fix function behavior for handling
signaling NaN input.

Fixes https://github.com/llvm/llvm-project/issues/124812
2025-04-08 15:23:38 +02:00
Alan Li
dae0ef53a0
[MLIR][AMDGPU] Add a wrapper for global LDS load intrinsics in AMDGPU (#133498)
Define a new `amdgpu.global_load` op, a thin wrapper around the ROCDL
`global_load_lds` intrinsic, along with its lowering logic to
`rocdl.global.load.lds`.
2025-04-08 09:18:30 -04:00
Nico Weber
94b9d75c6d [gn] port 65813e0e94c04 2025-04-08 09:16:37 -04:00
Jay Foad
008c875be8
[AMDGPU] Fix excessive stack usage in SIInsertWaitcnts::run (#134835)
Noticed on Windows when running LLVM as part of a graphics driver, with
total stack usage limited to about 128 KB. In some cases this function
would overflow the stack.

On Linux this reduces stack usage in this function from about 32 KB to
about 0.5 KB.
2025-04-08 14:08:42 +01:00
Kajetan Puchalski
7e1b76c2d7
Revert "[flang] Use precompiled parsing headers" (#134851)
Reverts llvm/llvm-project#130600

Reverting on account of Windows issues with ccache, will bring it back
along with #131137 once those are resolved.
2025-04-08 13:47:25 +01:00
TatWai Chong
728320f946
[mlir][tosa] Increase test coverage for profile-based validation (#134754)
Add more tests to increase test coverage.
2025-04-08 13:33:16 +01:00
Akshat Oke
fcaefc2c19
[AMDGPU][NPM] Port SIPreEmitPeephole to NPM (#130065) 2025-04-08 17:58:48 +05:30
Joseph Huber
79cb6f05da
[Clang] Unify 'nvptx-arch' and 'amdgpu-arch' into 'offload-arch' (#134713)
Summary:
These two tools do the same thing, so we should unify them into a single
tool. We create symlinks for backward compatibility and provide a way to
get the old vendor-specific behavior with `--amdgpu-only` and
`--nvptx-only`.
2025-04-08 07:27:12 -05:00
David Spickett
db7fb704f6 [lldb][test] Explain why TestExprFromNonZeroFrame is disabled on Windows
It's not scientific but I think the PDB we produce on the Windows on Arm
bot simply doesn't have the information needed. Could also be that clang
is producing some DWARF, but link.exe is dropping it from the final executable,
the effect is the same.
2025-04-08 12:17:07 +00:00
Kajetan Puchalski
25e08c0b9c
Revert "[CMake] Fix using precompiled headers with ccache" (#134848)
Reverts llvm/llvm-project#131397

Reverting for now on account of build bot failures on certain platforms.
2025-04-08 13:13:49 +01:00
Florian Hahn
a51e282784
[LV] Check if plan has an early exit via plan's exit blocks. (NFC) (#134720)
Add a dedicated function to check if a plan is for a loop with an early
exit. This can easily be determined by checking the exit blocks.

This allows removing a use of Legal->hasUncountableEarlyExit() from
InnerLoopVectorizer.

PR: https://github.com/llvm/llvm-project/pull/134720
2025-04-08 12:52:38 +01:00
Michael Klemm
69c4e172d9
[Flang][OpenMP] Add semantic tests for threadprivate variables with host assoc (#134680) 2025-04-08 13:22:05 +02:00
Omair Javaid
c2c1031e90
[Flang][Windows] Fix test_errors.py by enforcing UTF-8 encoding (#134625)
This patch fixes UnicodeDecodeError on Windows in test_errors.py. This
issue was observed on the flang-arm64-windows-msvc buildbot.
Semantics/OpenMP/interop-construct.f90 was crashing due to Python
defaulting to the cp1252 codec on Windows.

I have fixed this by explicitly setting encoding="utf-8" when reading
source files and invoking subprocess.run() in test_errors.py

flang-arm64-windows-msvc was running on staging master, which is why
this issue was not fixed earlier.
https://lab.llvm.org/staging/#/builders/206
2025-04-08 16:16:26 +05:00
Kajetan Puchalski
e8dc8add3c
[CMake] Fix using precompiled headers with ccache (#131397)
Using precompiled headers with ccache requires special accommodations.
Add the required ccache options, clang and gcc compiler flags to CMake.
Refactor ccache configuration to pass options directly on the command line for versions of ccache that support it.

---------

Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
2025-04-08 12:09:52 +01:00
Paul Walker
e06a9ca2cb
[LLVM][CodeGen][SVE] Improve lowering of fixed length masked mem ops. (#134402)
Converting fixed length masks, as used by MLOAD, to scalable vectors is
done by comparing the mask to zero. When the mask is the result of a
compare we can instead promote the operands and regenerate the original
compare. At worst this reduces the dependency chain and in most cases
removes the need for multiple compares.
2025-04-08 12:09:10 +01:00
Nikolas Klauser
483edfeeb5
[libc++] Use __add_pointer and __remove_pointer builtins when they are fixed (#134147) 2025-04-08 13:05:24 +02:00
Nathan Gauër
739062d2c3
[SPIR-V] Add spv.gep support for ptrcast legal (#134388)
Adds support for the spv.gep intrinsic to the spv ptrcast legalization
step. Those intrinsics are generated by the backend and are thus not
directly visible in the tests.
This is a prerequisite for implementing addrspacecast legalization for
logical SPIR-V.
2025-04-08 12:55:37 +02:00
Jonathan Thackray
204d8c0d58
[clang][llvm] Fix AArch64 MOP4{A/S} intrinsic tests (NFC) (#134746)
Fix some of the recently-added tests (PRs #127797, #128854, #129226 and
#129230) which were incorrectly defined.
2025-04-08 11:45:47 +01:00
Simon Tatham
7af2b51e76
[AArch64][v8.5A] Omit BTI for non-addr-taken static fns on Linux (#134669)
This is a conditional revert of cca40aa8d8aa732, which made LLVM's
branch-target-enforcement mode generate BTI at the start of _every_
function, even in the case where the function has internal linkage and
its address is never taken for use in an indirect call.

The rationale was that it might turn out at link time that a direct call
to the function spanned a larger distance than the range of a BL
instruction (say, if the translation unit generated multiple code
sections and the linker put them a very long way apart). Then the linker
might insert a long-branch thunk using an indirect call instruction.

SYSVABI64 has now clarified that in this situation the static linker may
not assume that the target function is safe to call directly. If it
needs to use this strategy, it's responsible for also generating a
'landing pad' near the target function, with a BTI followed by a direct
branch, and using that as the target of the long-distance indirect call.

606ce44fe4

LLD complies with this spec as of commit 098b0d18add97de.

So if we're compiling in a mode that respects SYSVABI64, such as
targeting Linux, it's safe to leave out the BTI at the start of a
function with internal linkage if we can prove that its address isn't
either used in an indirect call in _this_ translation unit or passed out
of the object.

Therefore, this patch goes back to the behavior before cca40aa8d8aa732,
leaving out BTIs in functions that can't be called indirectly, but only
if the target triple is Linux. (I wasn't able to find a more precise
query for "is this a SYSVABI64-compliant platform?", but Linux certainly
is, and this check at least fails in the safe direction - if in doubt,
we put in all the BTIs that might be necessary.)
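
To make the distinction concrete, here is a small sketch of the two kinds of internal-linkage functions discussed above. The function names are made up for illustration. Whether a given compiler build actually emits or omits a BTI for each one depends on the target triple, on flags such as `-mbranch-protection=bti`, and on inlining, so the comments only restate the rule described in this message.

```cpp
#include <cassert>

// Internal linkage, address never taken: it can only be reached by direct
// calls from this translation unit. Per the description above, on a Linux
// target it would not need a BTI at its entry.
static int helper(int x) { return x + 1; }

// Internal linkage, but its address escapes below, so it may be the target
// of an indirect call and would keep its BTI.
static int callback(int x) { return x * 2; }

// Calls through a function pointer, i.e. performs an indirect call.
int apply(int (*fn)(int), int v) {
  assert(fn != nullptr && "apply needs a callable target");
  return fn(v);
}

int run(int v) {
  // Direct call to helper, indirect call to callback via apply.
  return helper(v) + apply(&callback, v);
}
```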
2025-04-08 11:44:12 +01:00
Paul Walker
1997073a54
[LLVM][InstCombine][SVE] Refactor sve.mul/fmul combines. (#134116)
After https://github.com/llvm/llvm-project/issues/126928 it's now
possible to rewrite the existing combines, which mostly only handle
cases where an operand is an identity value, to use existing simplify
code to unlock general constant folding.
2025-04-08 11:38:27 +01:00
Simon Pilgrim
83fbe67986
[X86] combineX86ShufflesRecursively - iteratively peek through bitcasts to free subvector widening/narrowing sources. (#134701)
Generalizes the existing code to repeatedly peek through mixed bitcast/insert_subvector/extract_subvector chains to find the source of the shuffle operand.
2025-04-08 11:28:40 +01:00
Anatoly Trosinenko
8521bd2424
[BOLT][AArch64] Handle PAuth call instructions in isIndirectCall (#133227)
Handle `BLRA*` opcodes in AArch64MCPlusBuilder::isIndirectCall, update
getRegUsedAsCallDest accordingly.
2025-04-08 13:23:10 +03:00
MisakaVan
ff5b649a84
[libc++] Fix a comment typo in __tree (#134831)
"Returns true **is** __root is a proper red black tree"
->
"Returns true **if** __root is a proper red black tree"
2025-04-08 12:17:43 +02:00
Ramkumar Ramachandra
6a42fb8fbf
[LV] Clarify code in isPredicatedInst (NFC) (#134251) 2025-04-08 10:46:17 +01:00
Jay Foad
6f93c0676f
[AMDGPU] Make a few WaitcntBrackets methods const. NFC. (#134824) 2025-04-08 10:44:02 +01:00
Jakub Ficek
a5509d62a7
[clang] fp options fix for __builtin_convertvector (#134102)
Add missing CGFPOptionsRAII for fptoi and itofp cases
2025-04-08 11:36:48 +02:00
Tom Eccles
4c09ae0b2e
[flang][OpenMP] Lowering for CANCEL and CANCELLATIONPOINT (#134248)
These will still hit TODOs in OpenMPToLLVMIRConversion.cpp
2025-04-08 10:29:18 +01:00
Tom Eccles
446d4f51eb
[flang][OpenMP][Lower] fix statement context cleanup insertion point (#133891)
The statement context is used for lowering clauses for openmp operations
using generalised helpers from flang lowering. The statement context
stores closures which generate code for cleaning up temporary values
generated by the lowering helper. These closures are run when the
statement construct is destroyed. Keeping the statement context local to
the clause or operation being lowered without any special handling was
not correct because any cleanup code would be generated at the insertion
point when that statement context went out of scope (which would in
general be inside of the newly created container operation). It would be
better to generate the cleanup code after the newly created operation
(clause processing is synchronous even for deferred tasks).

Currently supported clauses are mostly populated with simple scalar
values that require no cleanup. Even the simple array sections added by
#132994 needed no cleanup because indexing the right values of the array
did not create any temporaries. Supporting array sections with vector
indexing will generate hlfir.destroy operations for cleanup. This patch
fixes where those will be created. Those hlfir.destroy operations don't
generate any FIR (or LLVM) code, but the issue still exists
theoretically.

I wasn't able to find any clauses which have any cleanup to use to test
this PR. It is probably NFC for the current lowering. This will be
tested in [the PR adding vector subscripting of array
sections](https://github.com/llvm/llvm-project/pull/133892).
2025-04-08 10:27:27 +01:00
Nathan Gauër
fe4f666363
[CI] Always upload queue/running count (#134814)
Before this commit, we only pushed a queue/running count when the value
was not zero. This makes building Grafana alerting a bit harder.
Changing this to always upload a value for watched workflows.
2025-04-08 11:16:24 +02:00
David Green
c23e1cb936
[BasicAA] Treat ExtractValue(Argument) similar to Argument in relation to function-local objects. (#134716)
This is a much smaller, technically orthogonal patch similar to #134505. It
states that an extractvalue(Argument) can be treated like an Argument for alias
analysis, where the extractelement acts like a phi / copy. No inttoptr here.
2025-04-08 10:05:58 +01:00