llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-24 04:56:07 +00:00

Author	SHA1	Message	Date
Jerry-Ge	3b38992de1	[mlir][tosa] Update AVG_POOL2D description to align with TOSAv1.0 Spec (#129782 )	2025-03-05 01:18:56 +00:00
Jerry-Ge	2b5ac43359	[mlir][tosa] Update RFFT2D description to align with TOSA v1.0 spec (#129789 )	2025-03-05 01:17:56 +00:00
Jerry-Ge	2ae5dedd7a	[mlir][tosa] Update ControlFlow variable names to match with TOSA v1.0 spec (#129790 )	2025-03-05 01:17:42 +00:00
Matt Arsenault	91aac7c379	AMDGPU: Handle s_add_u32 in eliminateFrameIndex (#129628 ) We can fold frame indexes directly into existing immediate operands, just like is already done for s_add_i32. We happen to use s_add_i32 in the 32-bit add case, but s_add_u32 appears in the a 64-bit add sequence of a flat pointer if an addrpacecast source is a frame index. This avoids, but does not address a failure exposed after a3165398db0736588daedb07650195502592e567 where two literal operands end up in the final instruction. The underlying issue still exists for some instructions without special handling in eliminateFrameIndex.	2025-03-05 08:09:46 +07:00
Cyndy Ishida	b41baafbc7	[readtapi] Condense output when comparing tbd files with mismatched inlined libraries (#129754 ) Previously, when an inlined library existed in TBD file A but not in file B, all of the inlined library's attributes were printed. This is noisy since the important detail is the complete contents are missing. Instead, only print the install name of the inlined library and the marker for which the input file exists in.	2025-03-04 17:05:01 -08:00
Thurston Dang	8aafbfdc3a	[msan][NFC] Add arm64-vmax.ll tests (#129760 ) Forked from llvm/test/CodeGen/AArch64/arm64-vmax.ll Pairwise instructions which are handled incorrectly by heuristics: - llvm.aarch64.neon.fmaxp (floating-point maximum pairwise) - llvm.aarch64.neon.fminp - llvm.aarch64.neon.fmaxnmp (floating-point maximum number pairwise) - llvm.aarch64.neon.fminnmp - llvm.aarch64.neon.smaxp - llvm.aarch64.neon.sminp - llvm.aarch64.neon.umaxp - llvm.aarch64.neon.uminp Future work should consider whether handlePairwiseShadowOrIntrinsic is a more appropriate handler. Other instructions which are handled correctly by heuristics: - llvm.aarch64.neon.fmax - llvm.aarch64.neon.fmin - llvm.aarch64.neon.smax - llvm.aarch64.neon.smin - llvm.aarch64.neon.umax - llvm.aarch64.neon.umin	2025-03-04 16:44:18 -08:00
Thurston Dang	dec4cae131	[msan][NFC] Add expand-experimental-reductions.ll (#129768 ) Forked from llvm/test/CodeGen/Generic/expand-experimental-reductions.ll Handled suboptimally by visitInstruction: - llvm.vector.reduce.smax - llvm.vector.reduce.smin - llvm.vector.reduce.umax - llvm.vector.reduce.umin - llvm.vector.reduce.fmax - llvm.vector.reduce.fmin	2025-03-04 16:44:10 -08:00
Mircea Trofin	2068a18c86	[ctxprof][nfc] Prepare CtxProfAnalysis for flat profiles (#129623 ) Mostly remove the equivalence "no contexts == no CtxProfAnalysis result", and instead check explicitly there are no contextual profiles.	2025-03-04 16:42:47 -08:00
A. Jiang	e739ce2e10	[libc++] Add missed `constexpr` to `erase(_if)` in `<string>` (#129666 ) `std::erase(_if)` for `basic_string` were made `constexpr` in C++20 by cplusplus/draft@2c1ab9775c as follow-up changes of P0980R1. This patch implements the missed changes that were not tracked in a specific paper.	2025-03-05 08:31:28 +08:00
Greg Clayton	27901cec0e	Add subsection and permissions support to ObjectFileJSON. (#129801 ) This patch adds the ability to create subsections in a section and allows permissions to be specified.	2025-03-04 16:19:20 -08:00
Krzysztof Drewniak	e697c99b63	[AMDGPU] Add custom MachineValueType entries for buffer fat poiners (#127692 ) The old hack of returning v5/v6i32 for the fat and strided buffer pointers was causing issuse during vectorization queries that expected to be able to construct a VectorType from the return value of `MVT getPointerType()`. On example is in the test attached to this PR, which used to crash. Now, we define the custom MVT entries, the 160-bit amdgpuBufferFatPointer and 192-bit amdgpuBufferStridedPointer, which are used to represent ptr addrspace(7) and ptr addrspace(9) respectively. Neither of these types will be present at the time of lowering to a SelectionDAG or other MIR - MVT::amdgpuBufferFatPointer is eliminated by the LowerBufferFatPointers pass and amdgpu::bufferStridedPointer is not currently used outside of the SPIR-V translator (which does its own lowering). An alternative solution would be to add MVT::i160 and MVT::i192. We elect not to do this now as it would require changes to unrelated code and runs the risk of breaking any SelectionDAG code that assumes that the MVT series are all powers of two (and so can be split apart and merged back together) in ways that wouldn't be obvious if someone tried to use MVT::i160 in codegen. If i160 is added at some future point, these custom types can be retired.	2025-03-04 17:19:06 -06:00
Andy Kaylor	fa072bd29a	[CIR] Add lowering for Func, Return, Alloca, Load, and Store (#129571 ) Add support for lowering recently upstreamed CIR ops to LLVM IR.	2025-03-04 14:50:34 -08:00
Slava Zakharin	9b1604065e	[flang-rt] Move unit-map.cpp to host-only sources list. (#129763 ) This file is not enabled for the offload builds. This patch aligns the list with flang/runtime/CMakeLists.txt (that is about to be removed).	2025-03-04 14:39:16 -08:00
Greg Clayton	7b596ce362	[lldb] Fix ObjectFileJSON to section addresses. (#129648 ) ObjectFileJSON sections didn't work, they were set to zero all of the time. Fixed the bug and fixed the test to ensure it was testing real values.	2025-03-04 14:35:42 -08:00
Michael Jones	ed5cd8d464	[libc] Fix casts for arm32 after Wconversion (#129771 ) Followup to #127523 There were some test failures on arm32 after enabling Wconversion. There were some tests that were failing due to missing casts. Also I changed BigInt's `safe_get_at` back to being signed since it needed the ability to be negative.	2025-03-04 14:32:36 -08:00
Peng Liu	a12744ff05	[libc++] Optimize ranges::swap_ranges for vector<bool>::iterator (#121150 ) This PR optimizes the performance of `std::ranges::swap_ranges` for `vector<bool>::iterator`, addressing a subtask outlined in issue #64038. The optimizations yield performance improvements of up to 611x for aligned range swap and 78x for unaligned range swap comparison. Additionally, comprehensive tests covering up to 4 storage words (256 bytes) with odd and even bit sizes are provided, which validate the proposed optimizations in this patch.	2025-03-04 17:15:36 -05:00
youngd007	b08769c3ec	Modify dwarf verification JSON to include detailed counts by sub-category (#128018 ) To help make better use of dwarfdump verification for identifying and fixing issues with debug information, the JSON will now emit details (sub-categories) where relevant. First modification concerns missing tags as those were recently missing for BOLT debug names. Test: test files for JSON output were previously added, so modify here to expect the new JSON keys. One test has sub-categories and another is empty. ninja check-llvm-tools-llvm-dwarfdump Also build the tool and run with a local executable to verify. ninja llvm-dwarfdump	2025-03-04 14:00:13 -08:00
David Green	4c2d1b4c53	[AArch64] Add test for scalar copysign. NFC	2025-03-04 21:46:55 +00:00
Philip Reames	42429fedf9	[RISCV] Simplify costShuffleViaVRegSplitting [nfc] (#129766 ) This code goes to some length to cost the subvector extracts, but by construction, all of the subvector extracts are subregister extracts from a vector register group and thus have zero cost. As a result, none of this code is needed.	2025-03-04 13:35:52 -08:00
Philip Reames	df1c8ba26c	[RISCV][CostModel] Add additional deinterleave tests with EMUL>1	2025-03-04 13:33:55 -08:00
Deric C.	1440f02259	[Scalarizer] Ensure valid VectorSplits for each struct element in `visitExtractValueInst` (#128538 ) Fixes #127739 The `visitExtractValueInst` is missing a check that was present in `splitCall` / `visitCallInst`. This check ensures that each struct element has a VectorSplit, and that each VectorSplit contains the same number of elements packed per fragment. --------- Co-authored-by: Jay Foad <jay.foad@amd.com>	2025-03-04 13:10:31 -08:00
Jan Voung	d6301b218c	Revert "[clang][dataflow] Fix unsupported types always being equal" (#129761 ) Reverts llvm/llvm-project#129502 seeing new crashes around `859520eca8/nullability/test/smart_pointers_diagnosis.cc (L57)` Would like some time to investigate.	2025-03-04 15:48:42 -05:00
Alexey Bataev	855178af99	[SLP]Fix/improve getSpillCost analysis Previous implementation may took some extra time, when walked over the same instructions several times. And also it did not include proper analysis for cross-basic-block use of the vectorized values. This version fixes it. It walks over the tree and checks the deps between entries and their operands. If there are non-vectorized calls in between, it adds a single(!) spill cost, because the vector value should be spilled/reloaded only once. Also, this version caches analysis for each entries, which are detected, and do not repeats it, uses data, found during previous analysis for previous nodes. Also, it has the internal limit. If the number of instructions between nodes and their operands is too big (> than ScheduleRegionSizeBudget / VectorizableTree.size()), it is considered that the spill is required. It allows to improve compile time. Reviewers: preames, RKSimon, mikhailramalho Reviewed By: preames Pull Request: https://github.com/llvm/llvm-project/pull/129258	2025-03-04 15:47:23 -05:00
Brox Chen	5cc033b5f2	[AMDGPU][True16][CodeGen] fshr true16 pattern (#129085 ) true16 pattern for fshr. GlobalIsel will be enabled latter when merge_value selection is supported in true16 mode	2025-03-04 15:43:43 -05:00
Mircea Trofin	1b46db7776	[ctxprof] ProfileWriter abstraction (#129590 ) Introduce a `ProfileWriter` abstraction to replace the callback passed to `__llvm_ctx_profile_fetch`. Subsequent changes will add support for flat profile collection (as in, collection of non-contextual profile for those functions not under a contextual root), which require also a change in the profile format. The abstraction makes it easy to add "write flat" - related capabilities without constantly complicating the signature of `__llvm_ctx_profile_fetch`.	2025-03-04 12:41:16 -08:00
Philip Reames	c8dd8522fa	[RISCV][TTI] Use early return to simplify costShuffleViaVRegSplitting [nfc]	2025-03-04 12:27:37 -08:00
Craig Topper	6ca2a9f2df	[CodeGen] Use Register in SDep interface. NFC (#129734 )	2025-03-04 12:26:28 -08:00
Jorge Gorbe Moya	423862f3d5	[bazel][libc] Add missing dep after 1e6e845d49a336e9da7ca6c576ec45c0b419b5f6	2025-03-04 12:00:40 -08:00
Jacques Pienaar	540d7ddb15	[mlir][py] Plumb OpPrintingFlags::printNameLocAsPrefix() through the C/Python APIs (#129607 )	2025-03-04 11:49:34 -08:00
Lei Wang	d38380d3d8	[CSSPGO] Fix redundant reading of profile metadata (#129609 ) Fix a build speed regression due to repeated reading of profile metadata. Before the function `readFuncMetadata(ProfileHasAttribute, Profiles)` reads the metadata for all the functions(`Profiles`), however, it's actually used for on-demand loading, it can be called for multiple times, which leads to redundant reading that causes the build speed regression. Now fix it to read the metadata only for the new loaded functions(functions in the `FuncsToUse`).	2025-03-04 11:39:59 -08:00
Sam Elliott	ee4bc5a8ca	[RISCV] Remove Last Traces of User Interrupts (#129300 ) These were left over from when Craig removed `__attribute__((interrupt("user")))` support in 05d0caef6081e1a6cb23a5a5afe43dc82e8ca558. The tests change "interrupt"="user" into "interrupt"="machine" as they are still intending to be interrupt tests. ISelLowering will now reject "interrupt"="user". The docs no longer mention "user" as a possible interrupt attribute argument.	2025-03-04 11:36:16 -08:00
Jorge Gorbe Moya	f9a6ea4489	[libc][bazel] Add BUILD targets for complex functions and tests. (#129618 ) This involved a little bit of yak shaving because one of the new tests depends on MPC, and we didn't have targets for it yet, so I ended up needing to add a similar setup to what we have for MPFR.	2025-03-04 11:05:01 -08:00
Andy Kaylor	6f256145c0	[CIR] Clean up warnings (#129604 ) Previous CIR commits have introduced a few warnings. This change fixes those. There are still warnings present when building with GCC because GCC warns about virtual functions being hidden in the mlir::OpConversion classes. A separate discussion will be required to decide what should be done about those.	2025-03-04 10:50:06 -08:00
Philip Reames	9295b03e2a	[RISCV] Fix a typo in fixed_m1_in_m2_tail test [nfc] When I added these, they were supposed to be sub-vector inserts, but since I got a couple index values wrong, they were instead general shuffles.	2025-03-04 10:47:08 -08:00
Janek van Oirschot	0a93bc7d7a	[AMDGPU] Debug dump for AMDGPU resource usage (#122952 )	2025-03-04 18:15:33 +00:00
John Harrison	6e28700ab1	[lldb-dap] Improving EOF handling on stream input and adding new unit tests (#129581 ) This should improve the handling of EOF on stdin and adding some new unit tests to malformed requests.	2025-03-04 10:09:28 -08:00
Matt Arsenault	c8f4c35a66	AMDGPU: Correctly handle folding immediates into subregister use operands (#129664 ) This fixes a miscompile where a 64-bit materialize incorrectly folds into a sub1 use operand. We currently do not see many subregister use operands. Incidentally, there are also SIFoldOperands bugs that prevent this fold from appearing here. Pre-fix folding of 32-bit subregister uses from 64-bit materializes, in preparation for future patches. The existing APIs are awkward since they expect to have a fully formed instruction with operands to use, and not something new which needs to be created.	2025-03-05 01:06:11 +07:00
Dominik Steenken	0f869cc336	[SystemZ] Make I5 operand of R[INOX]SGB(Z)? optional (#129512 ) The I5 operand of the instructions in RIE-f format is optional and assumed 0 when not specified. This was not properly modeled thus far, and is corrected with this PR. In addition, assembly and disassembly tests are updated to reflect these changes	2025-03-04 18:53:36 +01:00
Philip Reames	863260523f	[RISCV][TTI] Simplify code using getRealVLen() [NFC]	2025-03-04 09:48:06 -08:00
Alex	b8a66f50b4	[OFFLOAD] Update ffi_cif structure to match libffi (#128756 ) The ffi_cif structure defined in the wrapper header is smaller than the actual structure in libffi which results in other structures being overwritten when libffi is called, and finally in a segfault. The patch updates the structure to the correct layout as specified in ffi.h	2025-03-04 11:40:12 -06:00
Nick Fitzgerald	6018930ef1	[lld][WebAssembly] Support for the custom-page-sizes WebAssembly proposal (#128942 ) This commit adds support for WebAssembly's custom-page-sizes proposal to `wasm-ld`. An overview of the proposal can be found [here](https://github.com/WebAssembly/custom-page-sizes/blob/main/proposals/custom-page-sizes/Overview.md). In a sentence, it allows customizing a Wasm memory's page size, enabling Wasm to target environments with less than 64KiB of memory (the default Wasm page size) available for Wasm memories. This commit contains the following: * Adds a `--page-size=N` CLI flag to `wasm-ld` for configuring the linked Wasm binary's linear memory's page size. * When the page size is configured to a non-default value, then the final Wasm binary will use the encodings defined in the custom-page-sizes proposal to declare the linear memory's page size. * Defines a `__wasm_first_page_end` symbol, whose address points to the first page in the Wasm linear memory, a.k.a. is the Wasm memory's page size. This allows writing code that is compatible with any page size, and doesn't require re-compiling its object code. At the same time, because it just lowers to a constant rather than a memory access or something, it enables link-time optimization. * Adds tests for these new features. r? @sbc100 cc @sunfishcode	2025-03-04 09:39:30 -08:00
Deric C.	bbbdb23c33	[DirectX] Set module-level flag `LowPrecisionPresent` in DXIL Shader Flags Analysis (#129109 ) Fixes #114561	2025-03-04 09:37:59 -08:00
Tai Ly	25a29cef31	[mlir][tosa] Switch zero point of avgpool2d to input variable type (#128983 ) This commit changes the TOSA operator AvgPool2d's zero point attributes to inputs to align with TOSA 1.0 spec. Signed-off-by: Luke Hutton <luke.hutton@arm.com> Co-authored-by: Luke Hutton <luke.hutton@arm.com>	2025-03-04 09:34:23 -08:00
Peilin Ye	17bfc00f7c	[BPF] Add load-acquire and store-release instructions under -mcpu=v4 (#108636 ) As discussed in [1], introduce BPF instructions with load-acquire and store-release semantics under -mcpu=v4. Define 2 new flags: BPF_LOAD_ACQ 0x100 BPF_STORE_REL 0x110 A "load-acquire" is a BPF_STX \| BPF_ATOMIC instruction with the 'imm' field set to BPF_LOAD_ACQ (0x100). Similarly, a "store-release" is a BPF_STX \| BPF_ATOMIC instruction with the 'imm' field set to BPF_STORE_REL (0x110). Unlike existing atomic read-modify-write operations that only support BPF_W (32-bit) and BPF_DW (64-bit) size modifiers, load-acquires and store-releases also support BPF_B (8-bit) and BPF_H (16-bit). An 8- or 16-bit load-acquire zero-extends the value before writing it to a 32-bit register, just like ARM64 instruction LDAPRH and friends. As an example (assuming little-endian): long foo(long ptr) { return __atomic_load_n(ptr, __ATOMIC_ACQUIRE); } foo() can be compiled to: db 10 00 00 00 01 00 00 r0 = load_acquire((u64 )(r1 + 0x0)) 95 00 00 00 00 00 00 00 exit opcode (0xdb): BPF_ATOMIC \| BPF_DW \| BPF_STX imm (0x00000100): BPF_LOAD_ACQ Similarly: void bar(short ptr, short val) { __atomic_store_n(ptr, val, __ATOMIC_RELEASE); } bar() can be compiled to: cb 21 00 00 10 01 00 00 store_release((u16 )(r1 + 0x0), w2) 95 00 00 00 00 00 00 00 exit opcode (0xcb): BPF_ATOMIC \| BPF_H \| BPF_STX imm (0x00000110): BPF_STORE_REL Inline assembly is also supported. Add a pre-defined macro, __BPF_FEATURE_LOAD_ACQ_STORE_REL, to let developers detect this new feature. It can also be disabled using a new llc option, -disable-load-acq-store-rel. Using __ATOMIC_RELAXED for __atomic_store{,_n}() will generate a "plain" store (BPF_MEM \| BPF_STX) instruction: void foo(short ptr, short val) { __atomic_store_n(ptr, val, __ATOMIC_RELAXED); } 6b 21 00 00 00 00 00 00 (u16 )(r1 + 0x0) = w2 95 00 00 00 00 00 00 00 exit Similarly, using __ATOMIC_RELAXED for __atomic_load{,_n}() will generate a zero-extending, "plain" load (BPF_MEM \| BPF_LDX) instruction: int foo(char ptr) { return __atomic_load_n(ptr, __ATOMIC_RELAXED); } 71 11 00 00 00 00 00 00 w1 = (u8 )(r1 + 0x0) bc 10 08 00 00 00 00 00 w0 = (s8)w1 95 00 00 00 00 00 00 00 exit Currently __ATOMIC_CONSUME is an alias for __ATOMIC_ACQUIRE. Using __ATOMIC_SEQ_CST ("sequentially consistent") is not supported yet and will cause an error: $ clang --target=bpf -mcpu=v4 -c bar.c > /dev/null bar.c:1:5: error: sequentially consistent (seq_cst) atomic load/store is not supported 1 \| int foo(int ptr) { return __atomic_load_n(ptr, __ATOMIC_SEQ_CST); } \| ^ ... Finally, rename those isST() and isLD*() helper functions in BPFMISimplifyPatchable.cpp based on what the instructions actually do, rather than their instruction class. [1] https://lore.kernel.org/all/20240729183246.4110549-1-yepeilin@google.com/	2025-03-04 09:19:39 -08:00
Iris	9e1eaff95b	[clang] Fix `gnu::init_priority` attribute handling for reserved values (#121577 ) - Added a new diagnostic group `InitPriorityReserved` - Allow values within the range 0-100 of `init_priority` to be used outside system library, but with a warning - Updated relavant tests Fixes #121108	2025-03-04 12:07:40 -05:00
Mariusz Sikora	cd3acd1bff	[AMDGPU] Remove unused s_barrier_{init,join,leave} instructions (#129548 )	2025-03-04 17:52:43 +01:00
jeanPerier	9a659fac2f	[flang] fix MAXVAL(x%array_comp_with_custom_lower_bounds) (#129684 ) The HLFIR inlining of MAXVAL kicks in at O1 and more when the argument is an array component reference but the implementation did not account for the rare cases where the array components have non default lower bounds. This patch fixes the issue by using `getElementAt` to compute the element address. Rename `indices` to `oneBasedIndices` for more clarity.	2025-03-04 17:52:05 +01:00
Alexander Richardson	17f0aaac57	[TTI] Assert that TargetIRAnalyis is not requested for intrinsics This catches the bug fixed in https://github.com/llvm/llvm-project/pull/127760 and also finds another call in LowerTypeTests where we request the TTI for instrinsics instead of skipping them. Reviewed By: nikic Pull Request: https://github.com/llvm/llvm-project/pull/129600	2025-03-04 08:50:38 -08:00
Alexander Richardson	3d864c4682	[LowerTypeTests] Skip declarations when determining Thumb support When looping over all functions in a module to determine whether any of them is built with support for B.W, we can skip declarations since those do not have an associated target-feature attribute. This was found by the assertion from https://github.com/llvm/llvm-project/pull/129600 Reviewed By: statham-arm Pull Request: https://github.com/llvm/llvm-project/pull/129599	2025-03-04 08:47:18 -08:00
Lucas Ramirez	03677f63a7	[MachineScheduler] Optional scheduling of single-MI regions (#129704 ) Following 15e295d the machine scheduler no longer filters-out single-MI regions when emitting regions to schedule. While this has no functional impact at the moment, it generally has a negative compile-time impact (see #128739). Since all targets but AMDGPU do not care for this behavior, this introduces an off-by-default flag to `ScheduleDAGInstrs` to control whether such regions are going to be scheduled, effectively reverting 15e295d for all targets but AMDGPU (currently the only target enabling this flag).	2025-03-04 17:46:44 +01:00

... 2 3 4 5 6 ...

529559 Commits