529559 Commits

Author SHA1 Message Date
Jerry-Ge
3b38992de1
[mlir][tosa] Update AVG_POOL2D description to align with TOSAv1.0 Spec (#129782) 2025-03-05 01:18:56 +00:00
Jerry-Ge
2b5ac43359
[mlir][tosa] Update RFFT2D description to align with TOSA v1.0 spec (#129789) 2025-03-05 01:17:56 +00:00
Jerry-Ge
2ae5dedd7a
[mlir][tosa] Update ControlFlow variable names to match with TOSA v1.0 spec (#129790) 2025-03-05 01:17:42 +00:00
Matt Arsenault
91aac7c379
AMDGPU: Handle s_add_u32 in eliminateFrameIndex (#129628)
We can fold frame indexes directly into existing immediate operands,
just as is already done for s_add_i32. We happen to use s_add_i32 in
the 32-bit add case, but s_add_u32 appears in the 64-bit add sequence
of a flat pointer if an addrspacecast source is a frame index.

This avoids, but does not address, a failure exposed after
a3165398db0736588daedb07650195502592e567 where two literal operands
end up in the final instruction. The underlying issue still exists for
some instructions without special handling in eliminateFrameIndex.
2025-03-05 08:09:46 +07:00
Cyndy Ishida
b41baafbc7
[readtapi] Condense output when comparing tbd files with mismatched inlined libraries (#129754)
Previously, when an inlined library existed in TBD file A but not in file B, all of the inlined library's attributes were printed. This is noisy, since the important detail is that the library's contents are missing entirely. Instead, only print the install name of the inlined library and a marker indicating which input file it exists in.
2025-03-04 17:05:01 -08:00
Thurston Dang
8aafbfdc3a
[msan][NFC] Add arm64-vmax.ll tests (#129760)
Forked from llvm/test/CodeGen/AArch64/arm64-vmax.ll

Pairwise instructions which are handled incorrectly by heuristics:
- llvm.aarch64.neon.fmaxp (floating-point maximum pairwise)
- llvm.aarch64.neon.fminp
- llvm.aarch64.neon.fmaxnmp (floating-point maximum number pairwise)
- llvm.aarch64.neon.fminnmp
- llvm.aarch64.neon.smaxp
- llvm.aarch64.neon.sminp
- llvm.aarch64.neon.umaxp
- llvm.aarch64.neon.uminp
Future work should consider whether handlePairwiseShadowOrIntrinsic is a
more appropriate handler.
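As background for the pairwise entries above: an AArch64 pairwise max such as `smaxp` takes adjacent pairs from its first operand to form the low half of the result, and adjacent pairs from its second operand to form the high half. The sketch below models that on 4-lane arrays. It pairs this with a shadow rule that ORs the two input shadows feeding each output lane, which is the behavior the name `handlePairwiseShadowOrIntrinsic` suggests. This is an illustrative model, not MemorySanitizer code; `pairwiseMax` and `pairwiseShadowOr` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of a 4-lane pairwise max (smaxp-style): the low half of
// the result comes from adjacent pairs of A, the high half from pairs of B.
std::array<int32_t, 4> pairwiseMax(const std::array<int32_t, 4> &A,
                                   const std::array<int32_t, 4> &B) {
  return {std::max(A[0], A[1]), std::max(A[2], A[3]),
          std::max(B[0], B[1]), std::max(B[2], B[3])};
}

// Shadow propagation sketch: an output lane is (partially) uninitialized if
// either input lane that fed it was, so the two input shadows are ORed.
std::array<uint32_t, 4> pairwiseShadowOr(const std::array<uint32_t, 4> &SA,
                                         const std::array<uint32_t, 4> &SB) {
  return {SA[0] | SA[1], SA[2] | SA[3], SB[0] | SB[1], SB[2] | SB[3]};
}
```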

Other instructions which are handled correctly by heuristics:
- llvm.aarch64.neon.fmax
- llvm.aarch64.neon.fmin
- llvm.aarch64.neon.smax
- llvm.aarch64.neon.smin
- llvm.aarch64.neon.umax
- llvm.aarch64.neon.umin
2025-03-04 16:44:18 -08:00
Thurston Dang
dec4cae131
[msan][NFC] Add expand-experimental-reductions.ll (#129768)
Forked from llvm/test/CodeGen/Generic/expand-experimental-reductions.ll

Handled suboptimally by visitInstruction:
- llvm.vector.reduce.smax
- llvm.vector.reduce.smin
- llvm.vector.reduce.umax
- llvm.vector.reduce.umin
- llvm.vector.reduce.fmax
- llvm.vector.reduce.fmin
2025-03-04 16:44:10 -08:00
Mircea Trofin
2068a18c86
[ctxprof][nfc] Prepare CtxProfAnalysis for flat profiles (#129623)
Mostly remove the equivalence "no contexts == no CtxProfAnalysis result", and instead explicitly check that there are no contextual profiles.
2025-03-04 16:42:47 -08:00
A. Jiang
e739ce2e10
[libc++] Add missed constexpr to erase(_if) in <string> (#129666)
`std::erase(_if)` for `basic_string` were made `constexpr` in C++20 by
cplusplus/draft@2c1ab9775c as follow-up changes to P0980R1.

This patch implements those missed changes, which were not tracked by any
specific paper.
2025-03-05 08:31:28 +08:00
Greg Clayton
27901cec0e
Add subsection and permissions support to ObjectFileJSON. (#129801)
This patch adds the ability to create subsections in a section and
allows permissions to be specified.
2025-03-04 16:19:20 -08:00
Krzysztof Drewniak
e697c99b63
[AMDGPU] Add custom MachineValueType entries for buffer fat pointers (#127692)
The old hack of returning v5i32/v6i32 for the fat and strided buffer
pointers was causing issues during vectorization queries that expected
to be able to construct a VectorType from the return value of
`MVT getPointerType()`. One example is the test attached to this PR,
which used to crash.

Now, we define the custom MVT entries, the 160-bit
amdgpuBufferFatPointer and 192-bit amdgpuBufferStridedPointer, which are
used to represent ptr addrspace(7) and ptr addrspace(9) respectively.

Neither of these types will be present at the time of lowering to a
SelectionDAG or other MIR: MVT::amdgpuBufferFatPointer is eliminated by
the LowerBufferFatPointers pass, and MVT::amdgpuBufferStridedPointer is not
currently used outside of the SPIR-V translator (which does its own
lowering).

An alternative solution would be to add MVT::i160 and MVT::i192. We
elect not to do this now as it would require changes to unrelated code
and runs the risk of breaking any SelectionDAG code that assumes that
the MVT series are all powers of two (and so can be split apart and
merged back together) in ways that wouldn't be obvious if someone tried
to use MVT::i160 in codegen. If i160 is added at some future point,
these custom types can be retired.
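A small model illustrates the 160- and 192-bit widths. The commit does not spell out the layouts, so this assumes a buffer fat pointer is a 128-bit buffer resource plus a 32-bit offset, and that a strided pointer adds a 32-bit index. Under that assumption, the hypothetical structs below reproduce both sizes and show that neither is a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical struct models of the two pointer layouts (assumed, see above).
struct BufferResource { uint32_t Words[4]; };  // 128-bit buffer descriptor
struct FatPointer { BufferResource Rsrc; uint32_t Offset; };  // addrspace(7)
struct StridedPointer {                                       // addrspace(9)
  BufferResource Rsrc;
  uint32_t Index;
  uint32_t Offset;
};

constexpr std::size_t bitsOf(std::size_t Bytes) { return Bytes * 8; }

// Code that splits types in half and merges them back relies on this holding.
constexpr bool isPowerOf2(std::size_t N) { return N != 0 && (N & (N - 1)) == 0; }

static_assert(bitsOf(sizeof(FatPointer)) == 160, "fat pointer is 160 bits");
static_assert(bitsOf(sizeof(StridedPointer)) == 192, "strided pointer is 192 bits");
static_assert(!isPowerOf2(160) && !isPowerOf2(192), "neither width is a power of two");
```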
2025-03-04 17:19:06 -06:00
Andy Kaylor
fa072bd29a
[CIR] Add lowering for Func, Return, Alloca, Load, and Store (#129571)
Add support for lowering recently upstreamed CIR ops to LLVM IR.
2025-03-04 14:50:34 -08:00
Slava Zakharin
9b1604065e
[flang-rt] Move unit-map.cpp to host-only sources list. (#129763)
This file is not enabled for the offload builds.
This patch aligns the list with flang/runtime/CMakeLists.txt
(that is about to be removed).
2025-03-04 14:39:16 -08:00
Greg Clayton
7b596ce362
[lldb] Fix ObjectFileJSON to section addresses. (#129648)
ObjectFileJSON section addresses didn't work; they were always set to
zero. Fixed the bug and fixed the test to ensure it was testing real
values.
2025-03-04 14:35:42 -08:00
Michael Jones
ed5cd8d464
[libc] Fix casts for arm32 after Wconversion (#129771)
Followup to #127523

Some tests failed on arm32 after enabling Wconversion because of missing
casts. I also changed BigInt's `safe_get_at` back to being signed, since
it needs to accept negative values.
2025-03-04 14:32:36 -08:00
Peng Liu
a12744ff05
[libc++] Optimize ranges::swap_ranges for vector<bool>::iterator (#121150)
This PR optimizes the performance of `std::ranges::swap_ranges` for
`vector<bool>::iterator`, addressing a subtask outlined in issue #64038.
The optimizations yield performance improvements of up to **611x** for
aligned range swaps and **78x** for unaligned range swaps.
Additionally, comprehensive tests covering up to 4 storage words (256
bits) with odd and even bit sizes are provided, which validate the
proposed optimizations in this patch.
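Most of the aligned-case speedup comes from swapping whole storage words instead of individual bits. Below is a minimal sketch of that idea. It assumes 64-bit storage words and two ranges that start at the same bit offset within a word. It is not libc++'s implementation; the unaligned case, which needs shifting, is omitted, and `swapAlignedBits` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

using Word = std::uint64_t;
constexpr std::size_t kWordBits = 64;

// Swap the bits selected by Mask between two words using the XOR trick.
inline void swapMasked(Word &X, Word &Y, Word Mask) {
  Word Diff = (X ^ Y) & Mask;  // bits that differ inside the mask
  X ^= Diff;
  Y ^= Diff;
}

// Swap bits [First, First + N) of A with the same bit positions of B.
// Both ranges share the same in-word offset ("aligned"), so interior words
// are swapped whole; only the partial edge words need masks.
void swapAlignedBits(Word *A, Word *B, std::size_t First, std::size_t N) {
  std::size_t W = First / kWordBits;
  std::size_t Off = First % kWordBits;
  if (Off != 0 && N != 0) {  // leading partial word
    std::size_t Count = std::min(N, kWordBits - Off);  // always < 64 here
    swapMasked(A[W], B[W], ((Word(1) << Count) - 1) << Off);
    N -= Count;
    ++W;
  }
  for (; N >= kWordBits; N -= kWordBits, ++W)  // whole words, no masking
    std::swap(A[W], B[W]);
  if (N != 0)  // trailing partial word (N < 64)
    swapMasked(A[W], B[W], (Word(1) << N) - 1);
}
```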
2025-03-04 17:15:36 -05:00
youngd007
b08769c3ec
Modify dwarf verification JSON to include detailed counts by sub-category (#128018)
To help make better use of dwarfdump verification for identifying and
fixing issues with debug information, the JSON will now emit details
(sub-categories) where relevant. The first change concerns missing
tags, since those were recently found missing from BOLT debug names.

Test:
Test files for JSON output were added previously; they are modified here
to expect the new JSON keys. One test has sub-categories and the other is
empty.
  ninja check-llvm-tools-llvm-dwarfdump
Also build the tool and run with a local executable to verify.
  ninja llvm-dwarfdump
2025-03-04 14:00:13 -08:00
David Green
4c2d1b4c53 [AArch64] Add test for scalar copysign. NFC 2025-03-04 21:46:55 +00:00
Philip Reames
42429fedf9
[RISCV] Simplify costShuffleViaVRegSplitting [nfc] (#129766)
This code goes to some length to cost the subvector extracts, but by
construction, all of the subvector extracts are subregister extracts
from a vector register group and thus have zero cost. As a result, none
of this code is needed.
2025-03-04 13:35:52 -08:00
Philip Reames
df1c8ba26c [RISCV][CostModel] Add additional deinterleave tests with EMUL>1 2025-03-04 13:33:55 -08:00
Deric C.
1440f02259
[Scalarizer] Ensure valid VectorSplits for each struct element in visitExtractValueInst (#128538)
Fixes #127739 

The `visitExtractValueInst` is missing a check that was present in
`splitCall` / `visitCallInst`.
This check ensures that each struct element has a VectorSplit, and that
each VectorSplit contains the same number of elements packed per
fragment.
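The condition can be sketched as a standalone predicate. Scalarizing an `extractvalue` from a struct of vectors is only safe when every element has a split and all splits pack the same number of elements per fragment. `VectorSplit` here is a hypothetical, pared-down stand-in for the Scalarizer's real structure, and `std::nullopt` models an element with no split.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-in for the Scalarizer's per-type split info.
struct VectorSplit {
  unsigned NumPacked;     // elements packed into each fragment
  unsigned NumFragments;  // number of fragments the vector is split into
};

// True only if every struct element has a split and all splits agree on
// NumPacked, mirroring the condition described above.
bool canScalarizeStructElements(
    const std::vector<std::optional<VectorSplit>> &ElementSplits) {
  if (ElementSplits.empty() || !ElementSplits.front())
    return false;
  unsigned Packed = ElementSplits.front()->NumPacked;
  for (const auto &S : ElementSplits)
    if (!S || S->NumPacked != Packed)
      return false;
  return true;
}
```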

---------

Co-authored-by: Jay Foad <jay.foad@amd.com>
2025-03-04 13:10:31 -08:00
Jan Voung
d6301b218c
Revert "[clang][dataflow] Fix unsupported types always being equal" (#129761)
Reverts llvm/llvm-project#129502

Seeing new crashes around
859520eca8/nullability/test/smart_pointers_diagnosis.cc (L57)

Would like some time to investigate.
2025-03-04 15:48:42 -05:00
Alexey Bataev
855178af99
[SLP]Fix/improve getSpillCost analysis
The previous implementation could take extra time because it walked over
the same instructions several times. It also lacked proper analysis of
cross-basic-block uses of the vectorized values. This version fixes both
problems.

It walks over the tree and checks the deps between entries and their
operands. If there are non-vectorized calls in between, it adds
a single(!) spill cost, because the vector value should be
spilled/reloaded only once.

This version also caches the analysis for each entry it has already
processed and does not repeat it, reusing the data found while analyzing
previous nodes.

It also has an internal limit: if the number of instructions
between nodes and their operands is too large (greater than ScheduleRegionSizeBudget / VectorizableTree.size()), a spill is assumed to be required. This improves compile time.
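The walk described above can be modeled in a few lines. In this sketch, a basic block is a list of instruction kinds, and each tree edge links an operand position to its user position. An edge is charged one spill if any call lies between its ends, however many calls there are. Edges already analyzed are neither rescanned nor charged again, and `Budget` plays the role of ScheduleRegionSizeBudget / VectorizableTree.size(). All names are hypothetical, and cross-block handling is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

enum class Kind { Plain, Call };

// Hypothetical edge: an operand at position Op feeding a vectorized user at
// position User (Op < User) in the same basic block.
struct Edge {
  std::size_t Op, User;
};

// Counts spills: at most one per distinct edge, however many calls lie in
// between. Edges seen before are skipped (no rescan, no double charge), and
// edges spanning more than Budget instructions are assumed to spill.
unsigned countSpills(const std::vector<Kind> &Block,
                     const std::vector<Edge> &Edges, std::size_t Budget) {
  std::set<std::pair<std::size_t, std::size_t>> Seen;
  unsigned Spills = 0;
  for (const Edge &E : Edges) {
    assert(E.Op < E.User && E.User < Block.size());
    if (!Seen.insert(std::make_pair(E.Op, E.User)).second)
      continue;
    std::size_t Between = E.User - E.Op - 1;  // instructions strictly between
    bool NeedsSpill = Between > Budget;       // too far apart: assume a spill
    for (std::size_t I = E.Op + 1; !NeedsSpill && I < E.User; ++I)
      NeedsSpill = Block[I] == Kind::Call;    // stop at the first call
    Spills += NeedsSpill ? 1 : 0;
  }
  return Spills;
}
```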

Reviewers: preames, RKSimon, mikhailramalho

Reviewed By: preames

Pull Request: https://github.com/llvm/llvm-project/pull/129258
2025-03-04 15:47:23 -05:00
Brox Chen
5cc033b5f2
[AMDGPU][True16][CodeGen] fshr true16 pattern (#129085)
true16 pattern for fshr.

GlobalISel will be enabled later, once merge_value selection is
supported in true16 mode.
2025-03-04 15:43:43 -05:00
Mircea Trofin
1b46db7776
[ctxprof] ProfileWriter abstraction (#129590)
Introduce a `ProfileWriter` abstraction to replace the callback passed to `__llvm_ctx_profile_fetch`. Subsequent changes will add support for flat profile collection (that is, collection of non-contextual profiles for functions not under a contextual root), which also requires a change in the profile format. The abstraction makes it easy to add capabilities related to writing flat profiles without constantly complicating the signature of `__llvm_ctx_profile_fetch`.
2025-03-04 12:41:16 -08:00
Philip Reames
c8dd8522fa [RISCV][TTI] Use early return to simplify costShuffleViaVRegSplitting [nfc] 2025-03-04 12:27:37 -08:00
Craig Topper
6ca2a9f2df
[CodeGen] Use Register in SDep interface. NFC (#129734) 2025-03-04 12:26:28 -08:00
Jorge Gorbe Moya
423862f3d5 [bazel][libc] Add missing dep after 1e6e845d49a336e9da7ca6c576ec45c0b419b5f6 2025-03-04 12:00:40 -08:00
Jacques Pienaar
540d7ddb15
[mlir][py] Plumb OpPrintingFlags::printNameLocAsPrefix() through the C/Python APIs (#129607) 2025-03-04 11:49:34 -08:00
Lei Wang
d38380d3d8
[CSSPGO] Fix redundant reading of profile metadata (#129609)
Fix a build speed regression caused by repeatedly reading profile
metadata. Previously, `readFuncMetadata(ProfileHasAttribute, Profiles)`
read the metadata for all functions (`Profiles`). However, it is used for
on-demand loading and can be called multiple times, so the same metadata
was read redundantly, which caused the build speed regression. Now it
reads the metadata only for the newly loaded functions (the functions in
`FuncsToUse`).
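The cost difference is easy to see in a toy model where each on-demand load brings in one batch of functions. Re-reading metadata for everything loaded so far grows quadratically with the number of loads, while reading only the new batch stays linear. `OnDemandLoader` is a hypothetical class, not the CSSPGO reader.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical cost model of on-demand profile loading. Each call to load()
// brings in one batch (FuncsToUse) and then reads function metadata.
class OnDemandLoader {
public:
  explicit OnDemandLoader(bool OnlyNewFuncs) : OnlyNewFuncs(OnlyNewFuncs) {}

  void load(const std::vector<std::string> &FuncsToUse) {
    Profiles.insert(Profiles.end(), FuncsToUse.begin(), FuncsToUse.end());
    // Old behavior: re-read metadata for every profile loaded so far.
    // Fixed behavior: read metadata only for the newly loaded batch.
    MetadataReads += OnlyNewFuncs ? FuncsToUse.size() : Profiles.size();
  }
  std::size_t metadataReads() const { return MetadataReads; }

private:
  bool OnlyNewFuncs;
  std::vector<std::string> Profiles;  // all functions loaded so far
  std::size_t MetadataReads = 0;
};
```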
2025-03-04 11:39:59 -08:00
Sam Elliott
ee4bc5a8ca
[RISCV] Remove Last Traces of User Interrupts (#129300)
These were left over from when Craig removed
`__attribute__((interrupt("user")))` support in
05d0caef6081e1a6cb23a5a5afe43dc82e8ca558.

The tests change "interrupt"="user" into "interrupt"="machine", since they
are still intended to be interrupt tests. ISelLowering will now reject
"interrupt"="user". The docs no longer mention "user" as a possible
interrupt attribute argument.
2025-03-04 11:36:16 -08:00
Jorge Gorbe Moya
f9a6ea4489
[libc][bazel] Add BUILD targets for complex functions and tests. (#129618)
This involved a little bit of yak shaving because one of the new tests
depends on MPC, and we didn't have targets for it yet, so I ended up
needing to add a similar setup to what we have for MPFR.
2025-03-04 11:05:01 -08:00
Andy Kaylor
6f256145c0
[CIR] Clean up warnings (#129604)
Previous CIR commits have introduced a few warnings. This change fixes
those.

There are still warnings present when building with GCC because GCC
warns about virtual functions being hidden in the mlir::OpConversion
classes. A separate discussion will be required to decide what should be
done about those.
2025-03-04 10:50:06 -08:00
Philip Reames
9295b03e2a [RISCV] Fix a typo in fixed_m1_in_m2_tail test [nfc]
When I added these, they were supposed to be sub-vector inserts, but since
I got a couple index values wrong, they were instead general shuffles.
2025-03-04 10:47:08 -08:00
Janek van Oirschot
0a93bc7d7a
[AMDGPU] Debug dump for AMDGPU resource usage (#122952) 2025-03-04 18:15:33 +00:00
John Harrison
6e28700ab1
[lldb-dap] Improving EOF handling on stream input and adding new unit tests (#129581)
This should improve the handling of EOF on stdin, and adds some new
unit tests for malformed requests.
2025-03-04 10:09:28 -08:00
Matt Arsenault
c8f4c35a66
AMDGPU: Correctly handle folding immediates into subregister use operands (#129664)
This fixes a miscompile where a 64-bit materialize incorrectly folds
into a sub1 use operand.

We currently do not see many subregister use operands. Incidentally,
there are also SIFoldOperands bugs that prevent this fold from
appearing here. This pre-fixes folding of 32-bit subregister uses from
64-bit materializes in preparation for future patches.

The existing APIs are awkward since they expect to have a fully formed
instruction with operands to use, and not something new which needs
to be created.
2025-03-05 01:06:11 +07:00
Dominik Steenken
0f869cc336
[SystemZ] Make I5 operand of R[INOX]SGB(Z)? optional (#129512)
The I5 operand of the instructions in RIE-f format is optional and
assumed 0 when not specified. This was not properly modeled thus far,
and is corrected with this PR. In addition, assembly and disassembly
tests are updated to reflect these changes.
2025-03-04 18:53:36 +01:00
Philip Reames
863260523f [RISCV][TTI] Simplify code using getRealVLen() [NFC] 2025-03-04 09:48:06 -08:00
Alex
b8a66f50b4
[OFFLOAD] Update ffi_cif structure to match libffi (#128756)
The ffi_cif structure defined in the wrapper header is smaller than the
actual structure in libffi, which results in other structures being
overwritten when libffi is called and, eventually, in a segfault.

The patch updates the structure to the correct layout as specified in
ffi.h.
2025-03-04 11:40:12 -06:00
Nick Fitzgerald
6018930ef1
[lld][WebAssembly] Support for the custom-page-sizes WebAssembly proposal (#128942)
This commit adds support for WebAssembly's custom-page-sizes proposal to
`wasm-ld`. An overview of the proposal can be found
[here](https://github.com/WebAssembly/custom-page-sizes/blob/main/proposals/custom-page-sizes/Overview.md).
In a sentence, it allows customizing a Wasm memory's page size, enabling
Wasm to target environments with less than 64KiB of memory (the default
Wasm page size) available for Wasm memories.

This commit contains the following:

* Adds a `--page-size=N` CLI flag to `wasm-ld` for configuring the
linked Wasm binary's linear memory's page size.

* When the page size is configured to a non-default value, then the
final Wasm binary will use the encodings defined in the
custom-page-sizes proposal to declare the linear memory's page size.

* Defines a `__wasm_first_page_end` symbol, whose address points to the
end of the first page in the Wasm linear memory, i.e. it equals the Wasm
memory's page size. This allows writing code that is compatible with any
page size and doesn't require recompiling its object code. At the same
time, because it lowers to a constant rather than a memory access, it
enables link-time optimization.

* Adds tests for these new features.
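Linear memory starts at address 0, so the address of `__wasm_first_page_end` equals the page size. This lets code query the page size without hard-coding it. Below is a minimal sketch in C++, assuming the symbol is provided by `wasm-ld` as described above. The 64 KiB fallback for non-Wasm hosts is an assumption of this sketch, added so it compiles anywhere, and `wasmPageSize`/`pagesFor` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__wasm__)
// Provided by wasm-ld; only its address is meaningful, never read its value.
extern "C" unsigned char __wasm_first_page_end;
inline std::size_t wasmPageSize() {
  return reinterpret_cast<std::uintptr_t>(&__wasm_first_page_end);
}
#else
// Host fallback for this sketch only: the default Wasm page size (64 KiB).
inline std::size_t wasmPageSize() { return 65536; }
#endif

// Number of pages needed to hold Bytes bytes, rounding up.
inline std::size_t pagesFor(std::size_t Bytes) {
  std::size_t Page = wasmPageSize();
  return (Bytes + Page - 1) / Page;
}
```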

r? @sbc100 

cc @sunfishcode
2025-03-04 09:39:30 -08:00
Deric C.
bbbdb23c33
[DirectX] Set module-level flag LowPrecisionPresent in DXIL Shader Flags Analysis (#129109)
Fixes #114561
2025-03-04 09:37:59 -08:00
Tai Ly
25a29cef31
[mlir][tosa] Switch zero point of avgpool2d to input variable type (#128983)
This commit changes the TOSA operator AvgPool2d's zero point attributes
to inputs to align with TOSA 1.0 spec.

Signed-off-by: Luke Hutton <luke.hutton@arm.com>
Co-authored-by: Luke Hutton <luke.hutton@arm.com>
2025-03-04 09:34:23 -08:00
Peilin Ye
17bfc00f7c
[BPF] Add load-acquire and store-release instructions under -mcpu=v4 (#108636)
As discussed in [1], introduce BPF instructions with load-acquire and
store-release semantics under -mcpu=v4.  Define 2 new flags:

  BPF_LOAD_ACQ    0x100
  BPF_STORE_REL   0x110

A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm'
field set to BPF_LOAD_ACQ (0x100).

Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with
the 'imm' field set to BPF_STORE_REL (0x110).

Unlike existing atomic read-modify-write operations that only support
BPF_W (32-bit) and BPF_DW (64-bit) size modifiers, load-acquires and
store-releases also support BPF_B (8-bit) and BPF_H (16-bit).  An 8- or
16-bit load-acquire zero-extends the value before writing it to a 32-bit
register, just like ARM64 instruction LDAPRH and friends.

As an example (assuming little-endian):

  long foo(long *ptr) {
      return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
  }

foo() can be compiled to:

  db 10 00 00 00 01 00 00  r0 = load_acquire((u64 *)(r1 + 0x0))
  95 00 00 00 00 00 00 00  exit

  opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
  imm (0x00000100): BPF_LOAD_ACQ

Similarly:

  void bar(short *ptr, short val) {
      __atomic_store_n(ptr, val, __ATOMIC_RELEASE);
  }

bar() can be compiled to:

  cb 21 00 00 10 01 00 00  store_release((u16 *)(r1 + 0x0), w2)
  95 00 00 00 00 00 00 00  exit

  opcode (0xcb): BPF_ATOMIC | BPF_H | BPF_STX
  imm (0x00000110): BPF_STORE_REL
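The sketch below decodes one 8-byte instruction from the dumps above: the opcode byte, the dst/src register nibbles, the 16-bit offset, and the 32-bit immediate, read little-endian as in the examples. It also checks the opcode arithmetic. The class/size/mode constants are the standard BPF encoding values; `Insn` and `decode` are names introduced for this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Standard BPF opcode field values (class | size | mode).
constexpr std::uint8_t BPF_STX = 0x03;
constexpr std::uint8_t BPF_H = 0x08;
constexpr std::uint8_t BPF_DW = 0x18;
constexpr std::uint8_t BPF_ATOMIC = 0xc0;
// 'imm' values introduced by this commit.
constexpr std::int32_t BPF_LOAD_ACQ = 0x100;
constexpr std::int32_t BPF_STORE_REL = 0x110;

struct Insn {
  std::uint8_t Opcode;
  std::uint8_t Dst, Src;  // low and high nibble of byte 1
  std::int16_t Off;
  std::int32_t Imm;
};

// Decode one 8-byte instruction stored little-endian.
Insn decode(const std::array<std::uint8_t, 8> &B) {
  Insn I;
  I.Opcode = B[0];
  I.Dst = B[1] & 0x0f;
  I.Src = B[1] >> 4;
  I.Off = static_cast<std::int16_t>(B[2] | (B[3] << 8));
  I.Imm = static_cast<std::int32_t>(
      std::uint32_t(B[4]) | (std::uint32_t(B[5]) << 8) |
      (std::uint32_t(B[6]) << 16) | (std::uint32_t(B[7]) << 24));
  return I;
}
```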

Inline assembly is also supported.

Add a pre-defined macro, __BPF_FEATURE_LOAD_ACQ_STORE_REL, to let
developers detect this new feature.  It can also be disabled using a new
llc option, -disable-load-acq-store-rel.

Using __ATOMIC_RELAXED for __atomic_store{,_n}() will generate a "plain"
store (BPF_MEM | BPF_STX) instruction:

  void foo(short *ptr, short val) {
      __atomic_store_n(ptr, val, __ATOMIC_RELAXED);
  }

  6b 21 00 00 00 00 00 00  *(u16 *)(r1 + 0x0) = w2
  95 00 00 00 00 00 00 00  exit

Similarly, using __ATOMIC_RELAXED for __atomic_load{,_n}() will generate
a zero-extending, "plain" load (BPF_MEM | BPF_LDX) instruction:

  int foo(char *ptr) {
      return __atomic_load_n(ptr, __ATOMIC_RELAXED);
  }

  71 11 00 00 00 00 00 00  w1 = *(u8 *)(r1 + 0x0)
  bc 10 08 00 00 00 00 00  w0 = (s8)w1
  95 00 00 00 00 00 00 00  exit

Currently __ATOMIC_CONSUME is an alias for __ATOMIC_ACQUIRE.  Using
__ATOMIC_SEQ_CST ("sequentially consistent") is not supported yet and
will cause an error:

  $ clang --target=bpf -mcpu=v4 -c bar.c > /dev/null
  bar.c:1:5: error: sequentially consistent (seq_cst) atomic load/store is not supported
      1 | int foo(int *ptr) { return __atomic_load_n(ptr, __ATOMIC_SEQ_CST); }
        |     ^
  ...

Finally, rename those isST*() and isLD*() helper functions in
BPFMISimplifyPatchable.cpp based on what the instructions actually do,
rather than their instruction class.

[1]
https://lore.kernel.org/all/20240729183246.4110549-1-yepeilin@google.com/
2025-03-04 09:19:39 -08:00
Iris
9e1eaff95b
[clang] Fix gnu::init_priority attribute handling for reserved values (#121577)
- Added a new diagnostic group `InitPriorityReserved`
- Allow values within the range 0-100 of `init_priority` to be used
outside system libraries, but with a warning
- Updated relevant tests
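`init_priority` orders the dynamic initialization of namespace-scope objects: lower values are initialized first. The range 0-100 is the reserved range that this change now accepts with a warning. The minimal sketch below uses non-reserved values. It assumes a GCC-compatible compiler on a target that supports the attribute, such as an ELF platform; `Recorder` is a hypothetical helper.

```cpp
#include <cassert>

// Plain arrays/ints are constant-initialized, so they are ready before any
// dynamic initializer below runs.
static char InitOrder[4];
static int InitCount = 0;

struct Recorder {
  explicit Recorder(char Tag) { InitOrder[InitCount++] = Tag; }
};

// Declared in the "wrong" order on purpose: the object with the lower
// priority value (101) is constructed first, regardless of declaration order.
[[gnu::init_priority(200)]] Recorder Late('B');
[[gnu::init_priority(101)]] Recorder Early('A');
```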

Fixes #121108
2025-03-04 12:07:40 -05:00
Mariusz Sikora
cd3acd1bff
[AMDGPU] Remove unused s_barrier_{init,join,leave} instructions (#129548) 2025-03-04 17:52:43 +01:00
jeanPerier
9a659fac2f
[flang] fix MAXVAL(x%array_comp_with_custom_lower_bounds) (#129684)
The HLFIR inlining of MAXVAL kicks in at -O1 and above when the argument
is an array component reference, but the implementation did not account
for the rare cases where the array components have non-default lower
bounds.

This patch fixes the issue by using `getElementAt` to compute the
element address.
It also renames `indices` to `oneBasedIndices` for clarity.
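The index translation behind this fix can be shown on a rank-1 array. A Fortran array may declare any lower bound, so a one-based loop index `i` corresponds to the Fortran index `lb + i - 1`. Passing `i` directly to something that expects Fortran indices is only correct when `lb == 1`. This C++ model is illustrative only; `designate`, `elementAt`, and `maxval` are hypothetical stand-ins, not flang's `getElementAt` or HLFIR operations.

```cpp
#include <algorithm>
#include <cassert>

// Rank-1 view of a Fortran array with an arbitrary lower bound.
struct FortranArrayView {
  const double *Base;  // address of the first element
  long LowerBound;     // declared lower bound, e.g. -2 for a(-2:7)
};

// Address an element by its Fortran index, as a designator would.
const double &designate(const FortranArrayView &A, long FortranIndex) {
  return A.Base[FortranIndex - A.LowerBound];
}

// Translate a one-based loop index into a Fortran index before designating.
// Passing the one-based index straight to designate() would only be correct
// when LowerBound == 1.
const double &elementAt(const FortranArrayView &A, long OneBasedIndex) {
  return designate(A, A.LowerBound + OneBasedIndex - 1);
}

// MAXVAL-style loop over a rank-1 array with Extent >= 1 elements.
double maxval(const FortranArrayView &A, long Extent) {
  double M = elementAt(A, 1);
  for (long I = 2; I <= Extent; ++I)
    M = std::max(M, elementAt(A, I));
  return M;
}
```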
2025-03-04 17:52:05 +01:00
Alexander Richardson
17f0aaac57
[TTI] Assert that TargetIRAnalysis is not requested for intrinsics
This catches the bug fixed in https://github.com/llvm/llvm-project/pull/127760
and also finds another call in LowerTypeTests where we request the TTI
for intrinsics instead of skipping them.

Reviewed By: nikic

Pull Request: https://github.com/llvm/llvm-project/pull/129600
2025-03-04 08:50:38 -08:00
Alexander Richardson
3d864c4682
[LowerTypeTests] Skip declarations when determining Thumb support
When looping over all functions in a module to determine whether any of
them is built with support for B.W, we can skip declarations since those
do not have an associated target-feature attribute.
This was found by the assertion from https://github.com/llvm/llvm-project/pull/129600

Reviewed By: statham-arm

Pull Request: https://github.com/llvm/llvm-project/pull/129599
2025-03-04 08:47:18 -08:00
Lucas Ramirez
03677f63a7
[MachineScheduler] Optional scheduling of single-MI regions (#129704)
Following 15e295d, the machine scheduler no longer filters out single-MI
regions when emitting regions to schedule. While this has no functional
impact at the moment, it generally has a negative compile-time impact
(see #128739).

Since no target other than AMDGPU needs this behavior, this
introduces an off-by-default flag to `ScheduleDAGInstrs` to control
whether such regions are scheduled, effectively reverting
15e295d for all targets but AMDGPU (currently the only target enabling
this flag).
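The effect of the flag can be pictured as a filter over the collected regions: regions with fewer than two instructions are dropped unless the target opts in. `Region`, `regionsToSchedule`, and the `ScheduleSingleMIRegions` parameter are hypothetical names for this sketch. The commit only states that the flag lives in `ScheduleDAGInstrs`, is off by default, and is enabled by AMDGPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: a scheduling region is a half-open instruction range.
struct Region {
  std::size_t Begin, End;
};

// Collect the regions handed to the scheduler. Regions with fewer than two
// instructions have nothing to reorder, so they are dropped unless the
// target opts in.
std::vector<Region> regionsToSchedule(const std::vector<Region> &All,
                                      bool ScheduleSingleMIRegions) {
  std::vector<Region> Out;
  for (const Region &R : All)
    if (ScheduleSingleMIRegions || R.End - R.Begin >= 2)
      Out.push_back(R);
  return Out;
}
```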
2025-03-04 17:46:44 +01:00