llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-18 07:16:40 +00:00

Author	SHA1	Message	Date
Kazu Hirata	cde58bfc16	[Transforms] Use range constructors of *Set (NFC) (#133203 )	2025-03-27 07:51:58 -07:00
Craig Topper	ba1d901967	[RISCV] Set mayRaiseFPException = 0 on FCVT_D_W(U). (#133200 ) The input is an integer which can't be NAN so the NV(invalid) exception can't be raised. The conversion is exact so it can't raise NX(inexact), UF(underflow), or OF(overflow). The instructions are not divide so they can't raise DZ(divide by zero). Fixes #133192.	2025-03-27 07:45:42 -07:00
Guray Ozen	38d9a44510	[MLIR][NVGPU] Add `tma.fence.descriptor` OP (#133218 ) When the TMA descriptor is transferred from host memory to global memory using cudaMemcpy, each thread block must insert a fence before any thread accesses the updated tensor map in global memory. Once the tensor map has been accessed, no additional fences are needed by that block unless the map is modified again. [Example from cuda programming guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide/#using-tma-to-transfer-multi-dimensional-arrays). The `tma.fence.descriptor` basically implements `ptx::fence_proxy_tensormap_generic`. ``` #include <cuda.h> #include <cuda/ptx> namespace ptx = cuda::ptx; __device__ CUtensorMap global_tensor_map; __global__ void kernel(CUtensorMap *tensor_map) { // Fence acquire tensor map: ptx::n32_t<128> size_bytes; // Since the tensor map was modified from the host using cudaMemcpy, // the scope should be .sys. ptx::fence_proxy_tensormap_generic( ptx::sem_acquire, ptx::scope_sys, tensor_map, size_bytes ); // Safe to use tensor_map after fence inside this thread.. } int main() { CUtensorMap local_tensor_map; // [ ..Initialize map.. ] cudaMemcpy(&global_tensor_map, &local_tensor_map, sizeof(CUtensorMap), cudaMemcpyHostToDevice); kernel<<<1, 1>>>(global_tensor_map); } ```	2025-03-27 15:20:19 +01:00
Guray Ozen	bc7e3915e1	[MLIR][NVGPU] Add `mbarrier.get` Op (#133221 ) The `mbarrier.create` op can create multiple mbarrier objects, and other mbarrier-related ops can access an mbarrier using a dynamic SSA value. This is especially useful when using mbarriers in dynamic loops. This PR adds the `mbarrier.get` op, which returns a pointer to a specific mbarrier object from a group of barriers created by the nvgpu.mbarrier.create operation. It is useful when composing the NVGPU and NVVM dialects. Example: ``` %mbars = nvgpu.mbarrier.create -> !nvgpu.mbarrier.group<memorySpace = #gpu.address_space<workgroup>, num_barriers = 10> %mbar_pointer = nvgpu.mbarrier.get %mbars[%c2] : !nvgpu.mbarrier.group<memorySpace = #gpu.address_space<workgroup>> -> i32 ```	2025-03-27 15:20:07 +01:00
Nikolas Klauser	427ce92ea6	[libc++][NFC] Move dylib function in <__filesystem/operations.h> together Most of the dylib functions inside `<__filesystem/operations.h>` are at the top of the file. There are a few spread out in the file for some reason, which this patch fixes.	2025-03-27 15:14:17 +01:00
Craig Topper	b9666cf203	[RISCV] Reverse the order of Base and Offset in Core-V RegReg operand. (#133209 ) This puts the base before the offset to match the order we use for base ISA where the offset is an immediate. I'm investigating using sub-operands for the base ISA loads and stores too so having a consistent operand order will allow more sharing.	2025-03-27 07:12:54 -07:00
Luke Lau	27a437108b	[InstCombine] Handle scalable splats of constants in getMinimumFPType (#132960 ) We previously handled ConstantExpr scalable splats in 5d929794a87602cfd873381e11cc99149196bb49, but only fpexts. ConstantExpr fpexts have since been removed, and simultaneously we didn't handle splats of constants that weren't extended. This updates it to remove the fpext check and instead see if we can shrink the result of getSplatValue. Note that the test case doesn't get completely folded away due to #132922	2025-03-27 13:24:00 +00:00
Peng Liu	8fdfe3f2a7	[libc++] Refactor ranges::{min, max, min_element, max_element} to use std::__min_element (#132418 ) Previously, ranges::min_element delegated to ranges::__min_element_impl, which duplicated the definition of std::__min_element. This patch updates ranges::min_element to directly call std::__min_element, which allows removing the redundant code in ranges::__min_element_impl. Upon removal of ranges::__min_element_impl, the other ranges algorithms ranges::{min,max,max_element}, which previously delegated to ranges::__min_element_impl, have been updated to call std::__min_element instead. This refactoring unifies the implementation across these algorithms, ensuring that future optimizations or maintenance work only need to be applied in one place.	2025-03-27 09:05:37 -04:00
Joseph Huber	d0aa1f9c43	[Clang] Make `--lto-partitions` only default for HIP (#133164 ) Summary: The default behavior for LTO on other targets does not specify the number of LTO partitions. Recent changes made this default to 8 on AMDGPU which had some issues with the `libc` project. The option to disable this is HIP only so I think for now we should restrict this just to HIP. I'm definitely on board with getting some more parallelism here, but I think it should probably be restricted to just offloading languages. The new driver goes through the `--target=amdgcn-amd-amdhsa` for its output, which means we'd need to forward the default somehow.	2025-03-27 07:34:57 -05:00
Nikolas Klauser	17d0569538	[libc++] Instantiate hash function externally (#127040 ) This has multiple benefits: - There is a single instance of our hash function, reducing object file size - The hash implementation isn't instantiated in every TU anymore, reducing compile times - Behind an ABI configuration macro it would be possible to salt the hash	2025-03-27 13:19:59 +01:00
Longsheng Mou	ac09b789d8	[mlir][scf] Remove redundant ensureTerminator for `scf.forall` (#133081 ) The override function `ensureTerminator` ensures that the terminator `InParallelOp` has a region. However, if the terminator of `scf.forall` is not an `InParallelOp`, calling ensureTerminator causes a crash. Since the InParallelOp builder already guarantees the existence of a region, `ForallOp::ensureTerminator` is redundant and can be safely removed. Fixes #130019.	2025-03-27 20:07:20 +08:00
Pavel Labath	17aca79d98	[lldb] Teach FuncUnwinders about discontinuous functions (#133072 ) The main change here is that we're now able to correctly look up plans for these functions. Previously, due to caching, we could end up with one entry covering most of the address space (because part of the function was at the beginning and one at the end). Now, we can correctly recognise that the part in between does not belong to that function, and we can create a different FuncUnwinders instance for it. It doesn't help the discontinuous function much (its plan will still be garbled), but we can at least properly unwind out of the simple functions in between. Fixing the unwind plans for discontinuous functions requires handling each unwind source specially, and this setup allows us to make the transition incrementally.	2025-03-27 12:51:20 +01:00
Sudharsan Veeravalli	a6e56162c2	[RISCV] Modify operand regclass in load store patterns (#133071 ) $rs1 is defined as GPRMem in the correspoding instruction definition classes.	2025-03-27 17:20:25 +05:30
Pavel Labath	39e7efe1e4	[lldb] Respect LaunchInfo::SetExecutable in ProcessLauncherPosixFork (#133093 ) Using argv[0] for this was incorrect. I'm ignoring LaunchInfo::SetArg0, as that's what darwin and windows launchers do (they use the first element of the args vector instead). I picked up the funny unit test re-exec method from the llvm unit tests.	2025-03-27 12:44:56 +01:00
Younan Zhang	a9672515ce	[Clang] Correct the DeclRefExpr's Type after the initializer gets instantiated (#133212 ) The instantiation of a VarDecl's initializer might be deferred until the variable is actually used. However, we were still building the DeclRefExpr with a type that could later be changed by the initializer's instantiation, which is incorrect when incomplete arrays are involved. Fixes #79750 Fixes #113936 Fixes #133047	2025-03-27 19:40:02 +08:00
Sudharsan Veeravalli	1a140820ab	[RISCV] Have GPRMem on the correct operand in QCIRVInstESStore (#133042 ) It should be on rs1 and not rs2.	2025-03-27 17:02:32 +05:30
Ryotaro Kasuga	6c56a842b7	[clang][CodeGen] Generate follow-up metadata for loops in correct format (#131985 ) When pragma of loop transformations is specified, follow-up metadata for loops is generated after each transformation. On the LLVM side, follow-up metadata is expected to be a list of properties, such as the following: ``` !followup = !{!"llvm.loop.vectorize.followup_all", !mp, !isvectorized} !mp = !{!"llvm.loop.mustprogress"} !isvectorized = !{"llvm.loop.isvectorized"} ``` However, on the clang side, the generated metadata contains an MDNode that has those properties, as shown below: ``` !followup = !{!"llvm.loop.vectorize.followup_all", !loop_id} !loop_id = distinct !{!loop_id, !mp, !isvectorized} !mp = !{!"llvm.loop.mustprogress"} !isvectorized = !{"llvm.loop.isvectorized"} ``` According to the [LangRef](https://llvm.org/docs/TransformMetadata.html#transformation-metadata-structure), the LLVM side is correct. Due to this inconsistency, follow-up metadata was not interpreted correctly, e.g., only one transformation is applied when multiple pragmas are used. This patch fixes clang side to emit followup metadata in correct format.	2025-03-27 20:29:37 +09:00
Fraser Cormack	3284559cca	[libclc] Move atan2/atan2pi to the CLC library (#133226 ) As with other work in this area, these builtins are now vectorized. A further table has been split into two. There was discrepancy between comments above the table describing the values as "lead" and "tail" and variables taken from the table called "head" and "tail", so these have been unified as head/tail.	2025-03-27 10:59:09 +00:00
Simon Pilgrim	6c2171672f	Fix gcc signed/unsigned comparison warning. NFC.	2025-03-27 10:50:37 +00:00
Nikolas Klauser	8abca171c3	[libc++] Introduce unversioned namespace macros (#133009 ) We've started using `_LIBCPP_BEGIN_NAMESPACE_STD` and `_LIBCPP_END_NAMESPACE_STD` for more than just the namespace for a while now. For example, we're using it to add visibility annotations to types. This works very well and avoids a bunch of annotations, but doesn't work for the few places where we have an unversioned namespace. This adds `_LIBCPP_BEGIN_UNVERSIONED_NAMESPACE_STD` and `_LIBCPP_END_UNVERSIONED_NAMESPACE_STD` to make it simpler to add new annotations consistently across the library as well as making it more explicit that the unversioned namespace is indeed intended.	2025-03-27 11:34:38 +01:00
Simon Pilgrim	491d3dfc76	[X86] combineINSERT_SUBVECTOR - fold insert_subvector(base,extract_subvector(broadcast)) -> blend shuffle(base,broadcast) (#133083 ) If the broadcast is already the full vector width, try to prefer a blend over a vector insertion which is usually a lower latency (and sometimes a lower uop count).	2025-03-27 10:29:32 +00:00
Pavel Labath	71d54cd4f1	[lldb] Remove (deprecated) Function::GetAddressRange (#132923 ) All uses have been replaced by GetAddressRanges or GetAddress. Also fix two internal uses of the range member.	2025-03-27 11:27:56 +01:00
Pavel Labath	d7cea2b187	[lldb] Remove UnwindPlan::Row shared_ptrs (#132370 ) The surrounding code doesn't use them anymore. This removes the internal usages. This patch makes the Rows actual values. An alternative would be to make them unique_ptrs. That would make vector resizes faster at the cost of more pointer chasing and heap fragmentation. I don't know which one is better so I picked the simpler option.	2025-03-27 11:26:42 +01:00
WÁNG Xuěruì	99ec6f8aec	[LoongArch][MC] Add support for disassembly option "no-aliases" (#132900 ) This parallels the GNU Binutils feature's usage. A hidden command-line option `--loongarch-no-aliases` is also added, similar to how `--loongarch-numeric-reg` is for the `numeric` option.	2025-03-27 17:43:46 +08:00
Simon Pilgrim	f7a3334016	[X86] SimplifyDemandedBitsForTargetNode - add X86ISD::BLENDI handling (#133102 )	2025-03-27 09:32:40 +00:00
Hans Wennborg	00c527abab	[Coro] Allow spilling @llvm.coro.suspend() to the coro frame (#133088 ) It was excluded from spilling in a263a60, possibly by accident. In the linked bug, we hit a situation like this: ``` %s = call @llvm.coro.suspend(...) \| switch (%s) case v1: / \ case v2: ... ... \| suspend point \| ... \ / %x = phi [v1] [v2] \| ... \| use(%x) ``` Instcombine will notice that %x correlates exactly with %s, and so use(%x) becomes use(%s). However, corosplit would substitute different values for %s when splitting the function, so even though %s had a particular value when control actually passed through the switch, it could have a different value when reaching use(%s). This illustrates that while IR from the frontend typically does not use these suspend return values across suspend points, mid-level optimizations on the presplit coroutine may introduce new uses of suspend values, so they must be considered eligible to spill to the coroutine frame. Fixes: #130326	2025-03-27 10:23:30 +01:00
Fraser Cormack	db98e2922f	[libclc] Move log1p/asinh/acosh/atanh to the CLC library (#132956 ) These four functions all related in that they share tables and helper functions. Furthermore, the acosh and atanh builtins call log1p. As with other work in this area, these builtins are now vectorized. To enable this, there are new table accessor functions which return a vector of table values using a vector of indices. These are internally scalarized, in the absence of gather operations. Some tables which were tables of multiple entries (e.g., double2) are split into two separate "low" and "high" tables. This might affect the performance of memory operations but are hopefully mitigated by better codegen overall.	2025-03-27 09:19:07 +00:00
Simon Pilgrim	1715386e80	Fix MSVC signed/unsigned comparison warning. NFC.	2025-03-27 08:56:21 +00:00
Simon Pilgrim	f15924d1d1	Fix MSVC "not all control paths return a value" warning. NFC.	2025-03-27 08:55:47 +00:00
Djordje Todorovic	58a0c9570c	Revert "[MIPS] Define SubTargetFeature for i6500 cpu" (#133215 ) Reverts llvm/llvm-project#132907 due to some test failures.	2025-03-27 09:06:02 +01:00
Owen Pan	05fb8408de	[clang-format] Allow `Language: Cpp` for C files (#133033 ) Fix #132832	2025-03-27 01:00:02 -07:00
Owen Pan	b8a0558dea	[clang-format] Fix a regression on annotating template angles (#132885 ) Annotate the angles in `A<B != A>B` as template opener/closer as it's unlikely that they are less/greater-than operators in this context. Fix #132248	2025-03-27 00:57:17 -07:00
Mallikarjuna Gouda	6294325a53	[MIPS] Define SubTargetFeature for i6500 cpu (#132907 ) PR #130587 defined same SubTargetFeature for CPUs i6400 and i6500 which resulted into following warning when -mcpu=i6500 was used: +i6500' is not a recognized feature for this target (ignoring feature) This PR fixes above issue by defining separate SubTargetFeature for i6500.	2025-03-27 08:48:34 +01:00
LU-JOHN	2df25a4733	Invalidate range metadata when folding bitcast into load (#133095 )	2025-03-27 14:10:55 +07:00
Letu Ren	9438694a54	[mlir][llvmir] Add llvm.intr.ldexp operation (#133070 ) https://llvm.org/docs/LangRef.html#llvm-ldexp-intrinsic	2025-03-27 07:50:39 +01:00
Letu Ren	ad51368881	[mlir][llvmir] add llvm.experimental.constrained.sitofp intrinsics (#133166 ) https://llvm.org/docs/LangRef.html#llvm-experimental-constrained-sitofp-intrinsic Signed-off-by: Letu Ren <fantasquex@gmail.com>	2025-03-27 07:50:04 +01:00
Jonas Devlieghere	55b95151d2	[lldb] Avoid flickering by not clearing the statusline when redrawing When redrawing the statusline, the current implementation would clear the current line before drawing the new content. Since we always overwrite the whole statusline from beginning to end, there's no need to clear it and we can avoid the potential for flickering.	2025-03-26 23:35:36 -07:00
Jonas Devlieghere	618e8b9a70	[lldb] Avoid unnecessary statusline redraw in HandleProgressEvent There's no need to call RedrawStatusline from HandleProgressEvent. The statusline gets redraw after handling all events, including progress events, in the default event handler loop.	2025-03-26 23:35:36 -07:00
Chuanqi Xu	80f216db53	[C++20] [Modules] Add a test A test from a regression in the downstream. The test should be always good.	2025-03-27 14:04:41 +08:00
Thurston Dang	68e90e4f0c	[asan][NFCI] Add ASAN_POISONING_H header guard (#133178 )	2025-03-26 22:52:34 -07:00
NAKAMURA Takumi	291fd8f2ce	clang/lib/Sema/SemaExprCXX.cpp: Suppress a warning introduced in #133113 . [-Wunused-but-set-variable]	2025-03-27 13:34:29 +09:00
Luke Lau	cc30fbacec	[Docs][RISCV] Update RISCVVectorExtension.rst to reflect RISCVVMV0Elimination. NFC (#133058 ) Also correct the old name of RISCVFoldMasks to RISCVVectorPeephole	2025-03-27 06:24:35 +02:00
Craig Topper	16d1942c06	[RISCV] Drop '-x c' from riscv-target-features.c. NFC	2025-03-26 21:20:04 -07:00
Kazu Hirata	64046e9d26	[Driver] Use a range constructor of StringSet (NFC) (#133201 ) This patch uses a range constructor to collapse: llvm::StringSet<> Dest; for (const auto &S : Src) Dest.insert(S); down to: llvm::StringSet<> Dest(llvm::from_range, Src);	2025-03-26 20:54:08 -07:00
Craig Topper	d58f57228d	[RISCV] Use named sub-operands to simplify encoding/decoding for CoreV Reg-Reg instructions. (#133181 ) We can name the sub-operands using a DAG in the 'ins'. This allows those names to be matched to the encoding fields. This removes the need for a custom encoder/decoder that treats the 2 sub-operands as a single 10-bit value. While doing this, I noticed the base and offset names in the MIOperandInfo were swapped relative to how the operands are parsed and printed. Assuming that I've correctly understood the parsing/print format as "offset(base)".	2025-03-26 19:58:35 -07:00
Wu Yingcong	e54f31a20c	[compiler-rt][builtins] Add missing flag for builtins standalone build (#133046 ) When builtins are built with runtimes, it is built before compiler-rt, and this makes some of the HAS_XXX_FLAGs missing. In this case, the COMPILER_RT_HAS_FCF_PROTECTION_FLAG is missing which makes it impossible to enable CET in this case. This patch addresses this issue by also check for such flag in standalone build instead of relying on the compiler-rt's detection.	2025-03-27 10:15:48 +08:00
Aiden Grossman	2ca27e7c3e	[CI] Fix typo in compute_projects_test.py I apparently forgot to properly name the test before submitting the last patch. This patch properly names the test.	2025-03-27 01:51:13 +00:00
Krzysztof Drewniak	263ec7221e	[mlir][NFC] Move and rename EnumAttrCase, EnumAttr C++ classes (#132650 ) This moves the EnumAttrCase and EnumAttr classes from Attribute.h/.cpp to a new EnumInfo.h/cpp and renames them to EnumCase and EnumInfo, respectively. This doesn't change any of the tablegen files or any user-facing aspects of the enum attribute generation system, just reorganizes code in order to make main PR (#132148) shorter.	2025-03-26 20:26:14 -05:00
Philip Reames	79e82b6f14	[RISCV] Use a precise size for MMO on scalable spill and fill (#133171 ) The primary effect of this is that we get proper scalable sizes printed by the assembler, but this may also enable proper aliasing analysis. I don't see any test changes resulting from the later. Getting the size is slightly tricky as we store the scalable size as a non-scalable quantity in the object size field for the frame index. We really should remove that hack at some point... For the synthetic tuple spills and fills, I dropped the size from the split loads and stores to avoid incorrect (overly large) sizes. We could also divide by the NF factor if we felt like writing the code to do so.	2025-03-26 18:25:59 -07:00
tangaac	88099a7da9	[LoongArch] Custom lower vector trunc to vector shuffle (#130938 )	2025-03-27 09:23:05 +08:00

... 3 4 5 6 7 ...

532325 Commits