llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-16 22:36:34 +00:00

Author	SHA1	Message	Date
Joseph Huber	a2432793ea	[Clang] Add 'Joseph Huber' as offloading driver maintainer (#133296 ) Summary: I am probably the person most familiar with the offloading pipeline in clang at this point.	2025-03-27 14:02:47 -05:00
Philip Reames	c90a536bcf	[CodeGen] Simplify code using TypeSize overloads of getMachineMemOperand [nfc] These were added in d584cea. This change runs through existing uses and simplifies where obvious.	2025-03-27 11:47:51 -07:00
Fabian Mora	1a7af2a90f	[mlir][DataLayout] Add `IsolatedFromAbove` to `DataLayoutOpInterface` (#132742 ) This patch adds the `IsolatedFromAbove` trait as a dependent trait to the `DataLayoutOpInterface` op interface. The motivation behind this change comes from the implementation of the `ptr` dialect, specifically the `ptr.type_offset` op. This op produces an int-like value that equates to the size of a memory element. This is useful for ptr arithmetic and indexing arrays. For example: ```mlir %f32_off = ptr.type_offset f32 : index %addr = ptr.ptradd %ptr, %f32_off : !ptr, index %x = ptr.load %addr : !ptr -> f32 // Read ptr[1] ``` Without the `IsolatedFromAbove` trait in the DL interface, the `ptr.type_offset` cannot be `ConstantLike`. Why? Take the example: ```mlir op {DL1} { %f32_off0 = ptr.type_offset f32 : index op {DL2} { %f32_off1 = ptr.type_offset f32 : index } } ``` If `ptr.type_offset` were to be `ConstantLike` then `canonicalize` would hoist and unique the value. However, that could be wrong as DL2 could have an entry to specify the size that's different from the size in DL1. The best solution to the above problem is to make `DataLayoutOpInterface` require the `IsolatedFromAbove` trait, as it preserves the constness of values in the DL with respect to the canonicalizer.	2025-03-27 14:37:37 -04:00
Florian Hahn	8ddbc01295	[VPlan] Manage FindLastIV start value in ComputeFindLastIVResult (NFC) (#132690 ) Keep the start value as operand of ComputeFindLastIVResult. A follow-up patch will use this to make sure the start value is frozen if needed. Depends on https://github.com/llvm/llvm-project/pull/132689 PR: https://github.com/llvm/llvm-project/pull/132690	2025-03-27 18:34:13 +00:00
Ariel-Burton	fb993cd229	Add guard to for loop test clang/test/Sema/for.c (#133169 ) Commit 20b7f5982622f includes a case that checks diagnostics for for loops using thread locals. This fails on platforms which do not support TLS. This change adds guards to run this part of the test iff the feature is supported.	2025-03-27 14:26:06 -04:00
Aaron Ballman	4480f26e93	Fix failing test case for _Countof Test just needs an explicit triple that was missed.	2025-03-27 14:07:57 -04:00
Philip Reames	d584cea064	[RISCV] Use TypeSize instead of uint64_t in getMachineMemOperand interface (#133274 ) The primary reason is that if you pass a TypeSize without explicitly converting to LocationSize, you otherwise implicitly convert to uint64_t to call the respective LocationSize constructor. This means that any scalable value becomes a runtime assertion failure. By replacing uint64_t with TypeSize in this API, we avoid the implicit conversion for TypeSize. uint64_t callers implicitly convert to LocationSize (via the raw constructor) which should have unchanged behavior.	2025-03-27 11:04:00 -07:00
Min-Yih Hsu	aa207c3f05	[RISCV] Update the latency of floating point load in SiFive P500 scheduling model (#133165 ) P500-series cores should have a floating point load latency closer to 5 cycles, just like P400- and P600-series cores.	2025-03-27 11:00:07 -07:00
Bruno Cardoso Lopes	08aedf7201	[MLIR][LLVM] Lift alignstack attribute ptr type restriction (#133195 ) Current usage of alignstack is restricted to LLVM pointer types, whereas when it's used in parameters it's possible to use it for other types, see examples like `{i8, i8}, [2 x float], etc` in `llvm/test/CodeGen`. This PR lifts the restriction and adds test cases.	2025-03-27 10:28:37 -07:00
David Green	c6406c8dba	[AArch64] Add getVectorInstrCost Codesize costs handling. (#130946 ) We have a lot of missing Codesize costs for vector operations. This patch starts things off by adding codesize costs for getVectorInstrCost, returning a single cost instead of the VectorInsertExtractBaseCost (which is typically 2). Inserts of a load are given a cost of 0 as they use ld1, otherwise the cost is 1.	2025-03-27 17:25:02 +00:00
cor3ntin	ae54f476f7	[Clang] Improve subsumption. (#132849 ) The main goal of this patch is to improve the performance of concept subsumption by - Making sure literal (atomic) clauses are de-duplicated (Whether 2 atomic constraint is established during the initial normal form production). - Eagerly removing duplicated clauses. This should minimize the risks of exponentially large formulas that can be produced by a naive {C,D}NF transformation. While at it, I restructured that part of the code to be a bit clearer. Subsumption of fold expanded constraint is also cached. --- Note that removing duplicated clauses seems to be necessary and sufficient to have acceptable performance on anything that could be construed as reasonable code. Ultimately, the number of clauses is always going to be fairly small (but $2^{fairly\ small}$ is quickly fairly large..). I went too far in the rabbit hole of Tseitin transformations etc, which was much faster but would then require to check satisfiability to establish subsumption between some constraints (although it was good enough to pass all but one of our tests...). It doesn't help that the C++ standard has a very specific definition of subsumption that is really more of an implication... While that sort of musing is fascinating, it was ultimately a fool's errand, at least until such time that there is more motivation for a SAT solver in clang (clang-tidy can after all use z3!). Here be dragons. Fixes #122581	2025-03-27 18:23:58 +01:00
Aaron Ballman	00c43ae235	[C2y] Implement WG14 N3369 and N3469 (_Countof) (#133125 ) C2y adds the `_Countof` operator which returns the number of elements in an array. As with `sizeof`, `_Countof` either accepts a parenthesized type name or an expression. Its operand must be (of) an array type. When passed a constant-size array operand, the operator is a constant expression which is valid for use as an integer constant expression. This is being exposed as an extension in earlier C language modes, but not in C++. C++ already has `std::extent` and `std::size` to cover these needs, so the operator doesn't seem to get the user enough benefit to warrant carrying this as an extension. Fixes #102836	2025-03-27 13:23:16 -04:00
Mikhail R. Gadelha	08bb0b86dc	[RISCV] Add test case for PR #133256	2025-03-27 14:22:13 -03:00
Aaron Ballman	85c54a519f	[Docs] Document freestanding requirements (#132232 ) This adds some initial documentation about freestanding requirements for Clang. The most critical part of the documentation is spelling out that a conforming freestanding C Standard Library is required; Clang will not be providing the headers for <string.h> in C23 which expose a number of symbols in freestanding mode. The docs also make it clear that in addition to a conforming freestanding C standard library, the library must provide some additional symbols which LLVM requires. These docs are not comprehensive, this is just getting the bare bones in place so that they can be expanded later. This also updates the C status page to make it clear that we don't have anything to do for WG14 N2524 which adds string interfaces to freestanding mode.	2025-03-27 13:17:05 -04:00
Mark de Wever	82c078c54d	[libc++] Remove official Clang 18 support. (#130142 ) Since Clang 20 has been released we no longer support Clang 18 per our policy. Note the Clang 18 workarounds will be removed in a follow-up patch.	2025-03-27 18:00:46 +01:00
Sarah Spall	f612d70525	[HLSL] Add new int overloads for math builtins (#133162 ) Add int overloads which cast the various ints to a float and call the float builtin. These overloads are conditional on hlsl version 202x or earlier. Add tests and puts tests in own files, including some of the tests added for double overloads. Closes #128229	2025-03-27 09:34:25 -07:00
Michael Jones	3a5d77608b	[libc] Update headers on aarch64 (#133180 ) The entrypoints for aarch64 are mostly up to date, but the headers are not. This patch fixes that, and also makes explicit the dependency from OSUtils/linux on sys/syscalls.h	2025-03-27 09:05:24 -07:00
Thurston Dang	e5ec87f3b6	[asan] Print diagnostic if unlimited stack size detected (#133170 ) This adds a diagnostic message if the stack size is unlimited. This would have simplified the diagnosis of https://github.com/google/sanitizers/issues/856#issuecomment-2747076811; we anticipate this may help diagnose future issues too.	2025-03-27 08:55:36 -07:00
LLVM GN Syncbot	64178316cf	[gn build] Port 59d06071e9b5	2025-03-27 15:47:30 +00:00
Farzon Lotfi	59d06071e9	[NFC][HLSL] Move emitter out of AMDGPU.cpp (#133251 ) - Move all HLSL code out of AMDGPU.cpp to CGHLSLBuiltins.cpp - Fixes accidental reorganization of HLSL code into AMDGPU caused by (https://github.com/llvm/llvm-project/pull/132252, https://github.com/llvm/llvm-project/commit/7f920e2e5f70b)	2025-03-27 11:47:14 -04:00
Philip Reames	0ae6185b45	[RISCV] Manually update MIR inputs to reflect #79e82b6 Since we've changed what gets generated, we should update the snapshots of MIR. Otherwise, we end up testing configurations which are no longer possible from codegen.	2025-03-27 08:32:53 -07:00
Simon Pilgrim	a8575b3ea8	[DAG] visitEXTRACT_SUBVECTOR - accumulate SimplifyDemandedVectorElts demanded elts across all EXTRACT_SUBVECTOR uses (#133130 ) Similar to what is done for visitEXTRACT_VECTOR_ELT - if all uses of a vector are EXTRACT_SUBVECTOR, then determine the accumulated demanded elts across all users and call SimplifyDemandedVectorElts in "AssumeSingleUse" use.	2025-03-27 15:31:06 +00:00
Philip Reames	8742022ec7	[RISCV] Canonicalize foldable branch conditions in optimizeCondBranch (#132988 ) optimizeCondBranch isn't allowed to modify the CFG, but it can rewrite the branch condition freely. However, if we could fold a conditional branch to an unconditional one (aside from that restriction), we can also rewrite it into some canonical conditional branch instead. Looking at the diffs, the only cases this catches in tree tests are cases where we could have constant folded during lowering from IR, but didn't. This is inspired by trying to salvage code from https://github.com/llvm/llvm-project/pull/131684 which might be useful. Given the test impact, it's of questionable merits. The main advantage over only the late cleanup pass is that it kills off the LIs for the constants early - which can help e.g. register allocation.	2025-03-27 08:12:03 -07:00
Philip Reames	b38c23b4c1	[RISCV] Update two autogen tests to reduce spurious diffs [NFC]	2025-03-27 08:00:40 -07:00
Fraser Cormack	d32e71d7c7	[libclc] Move fmod, remainder & remquo to the CLC library (#132054 ) These functions were already nominally in the CLC namespace; this commit just formally moves them over. Note that 'half' versions of these CLC functions are now provided. Previously the corresponding OpenCL builtins would forward directly to the 'float' versions of the CLC builtins. Now the OpenCL builtins call the 'half' CLC builtins, which themselves call the 'float' CLC versions. This keeps the interface between the OpenCL and CLC libraries neater and keeps the CLC library self-contained. No changes to the generated code for non-SPIR-V targets are observed.	2025-03-27 14:53:19 +00:00
Kazu Hirata	7cc17fb085	[ADT] Remove old range constructors of SmallSet and StringSet (#133205 ) This patch removes the old range constructors of SmallSet and StringSet that do not take the llvm::from_range tag. Since there are so few uses, this patch directly removes them without going through the deprecation process.	2025-03-27 07:52:13 -07:00
Kazu Hirata	cde58bfc16	[Transforms] Use range constructors of *Set (NFC) (#133203 )	2025-03-27 07:51:58 -07:00
Craig Topper	ba1d901967	[RISCV] Set mayRaiseFPException = 0 on FCVT_D_W(U). (#133200 ) The input is an integer which can't be NAN so the NV(invalid) exception can't be raised. The conversion is exact so it can't raise NX(inexact), UF(underflow), or OF(overflow). The instructions are not divide so they can't raise DZ(divide by zero). Fixes #133192.	2025-03-27 07:45:42 -07:00
Guray Ozen	38d9a44510	[MLIR][NVGPU] Add `tma.fence.descriptor` OP (#133218 ) When the TMA descriptor is transferred from host memory to global memory using cudaMemcpy, each thread block must insert a fence before any thread accesses the updated tensor map in global memory. Once the tensor map has been accessed, no additional fences are needed by that block unless the map is modified again. [Example from cuda programming guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide/#using-tma-to-transfer-multi-dimensional-arrays). The `tma.fence.descriptor` basically implements `ptx::fence_proxy_tensormap_generic`. ``` #include <cuda.h> #include <cuda/ptx> namespace ptx = cuda::ptx; __device__ CUtensorMap global_tensor_map; __global__ void kernel(CUtensorMap *tensor_map) { // Fence acquire tensor map: ptx::n32_t<128> size_bytes; // Since the tensor map was modified from the host using cudaMemcpy, // the scope should be .sys. ptx::fence_proxy_tensormap_generic( ptx::sem_acquire, ptx::scope_sys, tensor_map, size_bytes ); // Safe to use tensor_map after fence inside this thread.. } int main() { CUtensorMap local_tensor_map; // [ ..Initialize map.. ] cudaMemcpy(&global_tensor_map, &local_tensor_map, sizeof(CUtensorMap), cudaMemcpyHostToDevice); kernel<<<1, 1>>>(global_tensor_map); } ```	2025-03-27 15:20:19 +01:00
Guray Ozen	bc7e3915e1	[MLIR][NVGPU] Add `mbarrier.get` Op (#133221 ) The `mbarrier.create` op can create multiple mbarrier objects, and other mbarrier-related ops can access an mbarrier using a dynamic SSA value. This is especially useful when using mbarriers in dynamic loops. This PR adds the `mbarrier.get` op, which returns a pointer to a specific mbarrier object from a group of barriers created by the nvgpu.mbarrier.create operation. It is useful when composing the NVGPU and NVVM dialects. Example: ``` %mbars = nvgpu.mbarrier.create -> !nvgpu.mbarrier.group<memorySpace = #gpu.address_space<workgroup>, num_barriers = 10> %mbar_pointer = nvgpu.mbarrier.get %mbars[%c2] : !nvgpu.mbarrier.group<memorySpace = #gpu.address_space<workgroup>> -> i32 ```	2025-03-27 15:20:07 +01:00
Nikolas Klauser	427ce92ea6	[libc++][NFC] Move dylib function in <__filesystem/operations.h> together Most of the dylib functions inside `<__filesystem/operations.h>` are at the top of the file. There are a few spread out in the file for some reason, which this patch fixes.	2025-03-27 15:14:17 +01:00
Craig Topper	b9666cf203	[RISCV] Reverse the order of Base and Offset in Core-V RegReg operand. (#133209 ) This puts the base before the offset to match the order we use for base ISA where the offset is an immediate. I'm investigating using sub-operands for the base ISA loads and stores too so having a consistent operand order will allow more sharing.	2025-03-27 07:12:54 -07:00
Luke Lau	27a437108b	[InstCombine] Handle scalable splats of constants in getMinimumFPType (#132960 ) We previously handled ConstantExpr scalable splats in 5d929794a87602cfd873381e11cc99149196bb49, but only fpexts. ConstantExpr fpexts have since been removed, and simultaneously we didn't handle splats of constants that weren't extended. This updates it to remove the fpext check and instead see if we can shrink the result of getSplatValue. Note that the test case doesn't get completely folded away due to #132922	2025-03-27 13:24:00 +00:00
Peng Liu	8fdfe3f2a7	[libc++] Refactor ranges::{min, max, min_element, max_element} to use std::__min_element (#132418 ) Previously, ranges::min_element delegated to ranges::__min_element_impl, which duplicated the definition of std::__min_element. This patch updates ranges::min_element to directly call std::__min_element, which allows removing the redundant code in ranges::__min_element_impl. Upon removal of ranges::__min_element_impl, the other ranges algorithms ranges::{min,max,max_element}, which previously delegated to ranges::__min_element_impl, have been updated to call std::__min_element instead. This refactoring unifies the implementation across these algorithms, ensuring that future optimizations or maintenance work only need to be applied in one place.	2025-03-27 09:05:37 -04:00
Joseph Huber	d0aa1f9c43	[Clang] Make `--lto-partitions` only default for HIP (#133164 ) Summary: The default behavior for LTO on other targets does not specify the number of LTO partitions. Recent changes made this default to 8 on AMDGPU which had some issues with the `libc` project. The option to disable this is HIP only so I think for now we should restrict this just to HIP. I'm definitely on board with getting some more parallelism here, but I think it should probably be restricted to just offloading languages. The new driver goes through the `--target=amdgcn-amd-amdhsa` for its output, which means we'd need to forward the default somehow.	2025-03-27 07:34:57 -05:00
Nikolas Klauser	17d0569538	[libc++] Instantiate hash function externally (#127040 ) This has multiple benefits: - There is a single instance of our hash function, reducing object file size - The hash implementation isn't instantiated in every TU anymore, reducing compile times - Behind an ABI configuration macro it would be possible to salt the hash	2025-03-27 13:19:59 +01:00
Longsheng Mou	ac09b789d8	[mlir][scf] Remove redundant ensureTerminator for `scf.forall` (#133081 ) The override function `ensureTerminator` ensures that the terminator `InParallelOp` has a region. However, if the terminator of `scf.forall` is not an `InParallelOp`, calling ensureTerminator causes a crash. Since the InParallelOp builder already guarantees the existence of a region, `ForallOp::ensureTerminator` is redundant and can be safely removed. Fixes #130019.	2025-03-27 20:07:20 +08:00
Pavel Labath	17aca79d98	[lldb] Teach FuncUnwinders about discontinuous functions (#133072 ) The main change here is that we're now able to correctly look up plans for these functions. Previously, due to caching, we could end up with one entry covering most of the address space (because part of the function was at the beginning and one at the end). Now, we can correctly recognise that the part in between does not belong to that function, and we can create a different FuncUnwinders instance for it. It doesn't help the discontinuous function much (its plan will still be garbled), but we can at least properly unwind out of the simple functions in between. Fixing the unwind plans for discontinuous functions requires handling each unwind source specially, and this setup allows us to make the transition incrementally.	2025-03-27 12:51:20 +01:00
Sudharsan Veeravalli	a6e56162c2	[RISCV] Modify operand regclass in load store patterns (#133071 ) $rs1 is defined as GPRMem in the corresponding instruction definition classes.	2025-03-27 17:20:25 +05:30
Pavel Labath	39e7efe1e4	[lldb] Respect LaunchInfo::SetExecutable in ProcessLauncherPosixFork (#133093 ) Using argv[0] for this was incorrect. I'm ignoring LaunchInfo::SetArg0, as that's what darwin and windows launchers do (they use the first element of the args vector instead). I picked up the funny unit test re-exec method from the llvm unit tests.	2025-03-27 12:44:56 +01:00
Younan Zhang	a9672515ce	[Clang] Correct the DeclRefExpr's Type after the initializer gets instantiated (#133212 ) The instantiation of a VarDecl's initializer might be deferred until the variable is actually used. However, we were still building the DeclRefExpr with a type that could later be changed by the initializer's instantiation, which is incorrect when incomplete arrays are involved. Fixes #79750 Fixes #113936 Fixes #133047	2025-03-27 19:40:02 +08:00
Sudharsan Veeravalli	1a140820ab	[RISCV] Have GPRMem on the correct operand in QCIRVInstESStore (#133042 ) It should be on rs1 and not rs2.	2025-03-27 17:02:32 +05:30
Ryotaro Kasuga	6c56a842b7	[clang][CodeGen] Generate follow-up metadata for loops in correct format (#131985 ) When pragma of loop transformations is specified, follow-up metadata for loops is generated after each transformation. On the LLVM side, follow-up metadata is expected to be a list of properties, such as the following: ``` !followup = !{!"llvm.loop.vectorize.followup_all", !mp, !isvectorized} !mp = !{!"llvm.loop.mustprogress"} !isvectorized = !{!"llvm.loop.isvectorized"} ``` However, on the clang side, the generated metadata contains an MDNode that has those properties, as shown below: ``` !followup = !{!"llvm.loop.vectorize.followup_all", !loop_id} !loop_id = distinct !{!loop_id, !mp, !isvectorized} !mp = !{!"llvm.loop.mustprogress"} !isvectorized = !{!"llvm.loop.isvectorized"} ``` According to the [LangRef](https://llvm.org/docs/TransformMetadata.html#transformation-metadata-structure), the LLVM side is correct. Due to this inconsistency, follow-up metadata was not interpreted correctly, e.g., only one transformation is applied when multiple pragmas are used. This patch fixes the clang side to emit followup metadata in the correct format.	2025-03-27 20:29:37 +09:00
Fraser Cormack	3284559cca	[libclc] Move atan2/atan2pi to the CLC library (#133226 ) As with other work in this area, these builtins are now vectorized. A further table has been split into two. There was a discrepancy between comments above the table describing the values as "lead" and "tail" and variables taken from the table called "head" and "tail", so these have been unified as head/tail.	2025-03-27 10:59:09 +00:00
Simon Pilgrim	6c2171672f	Fix gcc signed/unsigned comparison warning. NFC.	2025-03-27 10:50:37 +00:00
Nikolas Klauser	8abca171c3	[libc++] Introduce unversioned namespace macros (#133009 ) We've started using `_LIBCPP_BEGIN_NAMESPACE_STD` and `_LIBCPP_END_NAMESPACE_STD` for more than just the namespace for a while now. For example, we're using it to add visibility annotations to types. This works very well and avoids a bunch of annotations, but doesn't work for the few places where we have an unversioned namespace. This adds `_LIBCPP_BEGIN_UNVERSIONED_NAMESPACE_STD` and `_LIBCPP_END_UNVERSIONED_NAMESPACE_STD` to make it simpler to add new annotations consistently across the library as well as making it more explicit that the unversioned namespace is indeed intended.	2025-03-27 11:34:38 +01:00
Simon Pilgrim	491d3dfc76	[X86] combineINSERT_SUBVECTOR - fold insert_subvector(base,extract_subvector(broadcast)) -> blend shuffle(base,broadcast) (#133083 ) If the broadcast is already the full vector width, try to prefer a blend over a vector insertion which is usually a lower latency (and sometimes a lower uop count).	2025-03-27 10:29:32 +00:00
Pavel Labath	71d54cd4f1	[lldb] Remove (deprecated) Function::GetAddressRange (#132923 ) All uses have been replaced by GetAddressRanges or GetAddress. Also fix two internal uses of the range member.	2025-03-27 11:27:56 +01:00
Pavel Labath	d7cea2b187	[lldb] Remove UnwindPlan::Row shared_ptrs (#132370 ) The surrounding code doesn't use them anymore. This removes the internal usages. This patch makes the Rows actual values. An alternative would be to make them unique_ptrs. That would make vector resizes faster at the cost of more pointer chasing and heap fragmentation. I don't know which one is better so I picked the simpler option.	2025-03-27 11:26:42 +01:00
WÁNG Xuěruì	99ec6f8aec	[LoongArch][MC] Add support for disassembly option "no-aliases" (#132900 ) This parallels the GNU Binutils feature's usage. A hidden command-line option `--loongarch-no-aliases` is also added, similar to how `--loongarch-numeric-reg` is for the `numeric` option.	2025-03-27 17:43:46 +08:00
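
The `_Countof` commit above (00c43ae235) notes that the operator is not exposed in C++ because `std::extent` and `std::size` already cover the same need. The following is a minimal sketch of those two facilities, assuming a C++17 compiler; the array names and helper functions are hypothetical and do not come from the commit.

```cpp
#include <cstddef>
#include <iterator>     // std::size (C++17)
#include <type_traits>  // std::extent_v

// Illustrative arrays; these names are hypothetical, not taken from the commit.
static int values[8];
static double grid[3][5];

// std::extent_v inspects the array *type* and yields a compile-time constant.
static_assert(std::extent_v<decltype(values)> == 8, "element count of values");
static_assert(std::extent_v<decltype(grid)> == 3, "outer dimension of grid");
static_assert(std::extent_v<decltype(grid), 1> == 5, "inner dimension of grid");

// std::size takes the array *object* by reference and returns its element count.
constexpr std::size_t value_count() { return std::size(values); }
constexpr std::size_t grid_rows() { return std::size(grid); }
constexpr std::size_t grid_cols() { return std::size(grid[0]); }
```

One difference worth keeping in mind: per the commit message, the operand of `_Countof` must be of array type, whereas `std::size` also accepts containers that provide a `size()` member.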


532151 Commits