llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-25 09:56:07 +00:00

Author	SHA1	Message	Date
Fangrui Song	e758237352	[docs] Mention --discard-locals/--discard-all change for llvm-strip PR #130704 updated llvm-strip as well. Suggested by @nga888 Pull Request: https://github.com/llvm/llvm-project/pull/131491	2025-03-17 20:09:52 -07:00
Alex Voicu	c1fabd681f	[llvm][AMDGPU] Enable FWD_PROGRESS bit for GFX10+ (#128367 ) From GFX10 onwards it is possible to employ benevolent scheduling of waves. This patch unconditionally enables, for the `amdhsa` OS, the bit which controls that capability, as it is beneficial for algorithms that rely on more complex concurrent coordination and it is generally performance neutral otherwise.	2025-03-17 23:17:46 +00:00
Alex MacLean	0191307bb2	[IR] Allow alignstack attribute on return values (#130439 ) The PTX target allows an alignment to be specified on both return values and parameters to allow for more efficient vectorized stores. Currently we represent these parameter alignments via the "alignstack" attribute, but must fall back to metadata for the return value. This PR allows "alignstack" on return values as well.	2025-03-17 14:22:08 -07:00
Shilei Tian	3959bbc134	[NFC] Remove trailing white spaces from `llvm/docs/LangRef.rst`	2025-03-17 11:19:09 -04:00
Aiden Grossman	1e02442df6	[Github][Docs] Add best practice for top level read permissions (#131470 ) This patch adds a section pointing out how permissions should be done within Github workflows. I believe all of our workflows are currently compliant with this, but it helps to have something to point to documenting the practice and especially the motivation.	2025-03-15 23:35:18 -07:00
Frederik Harwath	6962cf1700	Rename ExpandLargeFpConvertPass to ExpandFpPass (#131128 ) This is meant as a preparation for PR #130988 "[AMDGPU] Implement IR expansion for frem instruction" which implements the expansion of another instruction in this pass. The more general name seems more appropriate given this change and quite reasonable even without it.	2025-03-14 13:11:45 +01:00
Viktoria Maximova	a58a6a95b0	[SPIR-V] Support SPV_INTEL_fp_max_error extension for `!fpmath` metadata (#130619 ) Specification: https://github.khronos.org/SPIRV-Registry/extensions/INTEL/SPV_INTEL_fp_max_error.html	2025-03-14 12:12:28 +01:00
Nicolas van Kempen	0ab9f78957	[docs] Fix link appearing improperly (#131225 ) See current live website: https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access ![image](https://github.com/user-attachments/assets/f3212ca6-b2a6-4d83-9323-11ba4148d991) Building locally, it appears correctly: ![image](https://github.com/user-attachments/assets/68f5e932-1abf-48cf-93f0-30dacb0910c6) Feel free to merge immediately.	2025-03-13 14:51:35 -07:00
PiJoules	0f8c075f7c	[llvm] Match llvm.type.checked.load.relative semantics to llvm.load.r… (#129583 ) …elative The semantics of `llvm.type.checked.load.relative` seem to be a little different from that of `llvm.load.relative`. It looks like the semantics for `llvm.type.checked.load.relative` is `ptr + offset + (ptr + offset)` whereas the semantics for `llvm.load.relative` is `ptr + (ptr + offset)`. That is, the offset for the former is added to the offset address whereas the latter has the offset added to the original pointer. It really feels like the checked intrinsic was meant to match the semantics of the non-checked intrinsic, but I think for all cases the checked intrinsic is used (swift being the only use I know of), the calculation just happens to be the same because swift always uses an offset of zero. Likewise, all llvm tests for this intrinsic happen to use an offset of zero. Relative vtables in clang happens to be the first time where we're using this intrinsic and using it with non-zero values. This updates the semantics of the checked intrinsic to match the non-checked one. Effectively this shouldn't change any codegen by any users of this since all current users seem to use a zero offset. This PR also updates some tests with non-zero offsets.	2025-03-13 14:50:41 -07:00
u4f3	e61859f14d	[RISCV] Add Qualcomm uC Xqcili (load large immediates) extension (#130012 ) The Xqcili extension includes two instructions that load larger immediates than are available with the base RISC-V ISA. The current spec can be found at: https://github.com/quic/riscv-unified-db/releases/tag/Xqci-0.7.0 This patch adds assembler only support.	2025-03-13 11:13:02 -07:00
Pengcheng Wang	376e3b62cd	[TableGen] Add `!match` operator to do regex matching (#130759 ) The grammar is `!match(str, regex)` and this operator produces 1 if the `str` matches the regular expression `regex`. The format of `regex` is ERE (Extended POSIX Regular Expressions).	2025-03-13 12:13:09 +08:00
Andrzej Warzyński	77b55c76ac	[llvm][docs] Add notes on upstreaming code from downstream projects (#129743 ) For context, see: * https://discourse.llvm.org/t/code-review-process-when-upstreaming-patches	2025-03-12 11:17:19 +00:00
Carl Ritson	d921bf233c	[AMDGPU] Extend promotion of alloca to vectors (#127973 ) * Add multi dimensional array support * Make maximum vector size tunable * Make ratio of VGPRs used for vector promotion tunable * Maximum array size now based on VGPR count (32b) instead of element count	2025-03-12 15:11:30 +09:00
Fangrui Song	7a66a26658	[llvm-objcopy,ELF] --discard-locals/--discard-all: allow and keep symbols referenced by relocations In GNU objcopy, symbols referenced by relocations are retained. Our COFF (https://reviews.llvm.org/D56480) and Mach-O (https://reviews.llvm.org/D75104) ports port the behavior, but the ELF port doesn't. This PR implements the behavior for ELF. Close #47468 (tcl has a use case that requires `strip -x tclStubLib.o` to strip local symbols not referenced by relocations.) Pull Request: https://github.com/llvm/llvm-project/pull/130704	2025-03-11 21:26:13 -07:00
Alex Bradbury	d84755b10d	[docs] Fix broken formatting for link in HowToCrossCompileLLVM The usual `_ vs _` problem.	2025-03-11 20:37:41 +00:00
Alex Bradbury	ba11e1e522	[docs] Rewrite HowToCrossCompileLLVM (#129451 ) The document has had a few minor tweaks over the years, but the last major piece of work on it was 2016, after first being introduced in 2013. My aim is to provide a clear and clean recipe for cross-compiling LLVM that: * Should be achievable for anyone on common variants of Linux (_including_ the step of acquiring a working sysroot). * I think I've kept the coverage of setting up acquiring a Debian sysroot minimal enough that it can reasonably be included. `debootstrap` is packaged for most common Linux distributions including non-Debian derived distributions like Arch Linux and Fedora. * Describes a setup that we can reasonably support within the community. * I realise with the ninja symlink canonicalisation issue I haven't completely avoided hacks, but I look particularly to point 2 under hacks in the current docs which talks about libraries on the host being found by CMake and adding `-L` and `-I` to try to hack around this. We've all been there and made these kind of temporary workarounds to see if we can get further, but it's very hard to support someone who has problems with a setup that's improperly leaking between the host and target like this. The approach I describe with a clean sysroot and setting appropriate `CMAKE_FIND_ROOT_PATH_MODE_` settings doesn't have this issue. Cuts down on extraneous / outdated information, especially where it is better covered elsewhere (e.g. detailed descriptions of CMake options not directly relevant to cross compilation). I've run through the instructions for AArch64, RISC-V (64-bit), and armhf.	2025-03-11 18:33:53 +00:00
Sam Elliott	ae985267d0	[RISCV] Update to Xqciint v0.4 (#130219 ) The Xqci 0.7.0 spec just came out, with some updates to Xqciint, bringing it to v0.4. The main update of any relevance is that `qc.c.mienter` and `qc.c.mienter.nest` now update both the stack pointer and the frame pointer (before, they only updated the stack pointer). They both remain compatible with the frame pointer convention. This change bumps the Xqciint version, and ensures that we don't emit the unneeded frame pointer adjustment instruction after `qc.c.mienter(.nest)`.	2025-03-11 08:54:22 -07:00
Srinivasa Ravi	9a5a8c9a80	[NVPTX] Add intrinsics for st.bulk instruction (#128856 ) Adds NVVM intrinsics and NVPTX codegen for the `st.bulk` instruction introduced in ptx8.6 for sm_100. Tests added in `CodeGen/NVPTX/st_bulk.ll` and verified through ptxas 12.8.0. PTX Spec Reference: https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-st-bulk	2025-03-11 09:27:21 +05:30
Matt Arsenault	0d2c55cb96	AMDGPU: Move enqueued block handling into clang (#128519 ) The previous implementation wasn't maintaining a faithful IR representation of how this really works. The value returned by createEnqueuedBlockKernel wasn't actually used as a function, and hacked up later to be a pointer to the runtime handle global variable. In reality, the enqueued block is a struct where the first field is a pointer to the kernel descriptor, not the kernel itself. We were also relying on passing around a reference to a global using a string attribute containing its name. It's better to base this on a proper IR symbol reference during final emission. This now avoids using a function attribute on kernels and avoids using the additional "runtime-handle" attribute to populate the final metadata. Instead, associate the runtime handle reference to the kernel with the !associated global metadata. We can then get a final, correctly mangled name at the end. I couldn't figure out how to get rename-with-external-symbol behavior using a combination of comdats and aliases, so leaves an IR pass to externalize the runtime handles for codegen. If anything breaks, it's most likely this, so leave avoiding this for a later step. Use a special section name to enable this behavior. This also means it's possible to declare enqueuable kernels in source without going through the dedicated block syntax or other dedicated compiler support. We could move towards initializing the runtime handle in the compiler/linker. I have a working patch where the linker sets up the first field of the handle, avoiding the need to export the block kernel symbol for the runtime. We would need new relocations to get the private and group sizes, but that would avoid the runtime's special case handling that requires the device_enqueue_symbol metadata field. https://reviews.llvm.org/D141700	2025-03-10 19:54:04 +07:00
Tom Tromey	993cbead9e	Two fixes for DISubrangeType (#130345 ) My previous patch to add DISubrangeType (#126772) had a couple of minor errors. This patch corrects them. 1. When using a DISubrangeType as an array index type, the wrong tag was written into the DIE. 2. I'd intended for subranges to use bit strides, not byte strides -- but neglected to actually implement this. Ada needs bit strides. This patch adds a new test that checks both these things. Finally, this patch adds some documentation for DISubrangeType.	2025-03-07 14:23:05 -08:00
Tejas Vipin	bd5f29c008	[llvm-strip] Let llvm-strip continue on encountering an error (#129531 ) This change means that llvm-strip no longer exits immediately upon encountering an error when modifying a file and will instead continue modifying the other inputs. Fixes #129412	2025-03-07 08:34:29 +00:00
Dmitry Sidorov	7a44ff13d9	[SPIR-V] Add SPV_INTEL_memory_access_aliasing extension (#129800 ) Spec can be found here https://github.com/intel/llvm/pull/15225 TODO for future patches: - During spec review need to decide whether only FunctionCall or Atomic instructions can be decorated and if not - move the code around adding handling for other instructions; - Handle optional string metadata; - Handle LLVM atomic instructions; - Handle SPIR-V friendly atomic calls returning via sret argument. Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>	2025-03-06 19:44:21 +01:00
Andrzej Warzyński	29ff168363	[llvm][docs] Extend docs on GitHub's "squash and merge" (#129497 ) From what I’ve observed, some contributors are still unaware that in LLVM, the PR summary - not the commit messages - is used as the final commit message when merging. This is especially unclear to those without commit access, as only users with commit access can edit the commit message before merging. This PR clarifies that policy and consolidates all relevant information into `GitHub.rst`, ensuring it is no longer split between `GitHub.rst` and `Contributing.rst`. Note, a big part of this change is merely moving text between the docs.	2025-03-06 12:24:41 +00:00
Trevor Gross	5ee1c0b714	[windows] Always pass fp128 arguments indirectly (#128848 ) LLVM currently expects `__float128` to be both passed and returned in xmm registers on Windows. However, this disagrees with the Windows x86-64 calling convention [1], which indicates values larger than 64 bits should be passed indirectly. Update LLVM's default Windows calling convention to pass `fp128` indirectly. Returning in xmm0 is unchanged since this seems like a reasonable extrapolation of the ABI. With this patch, the calling convention for `i128` and `f128` is the same. GCC passes `__float128` indirectly, which this also matches. However, it also returns indirectly, which is not done here. I intend to attempt a GCC change to also return in `xmm0` rather than making that change here, given the consistency with `i128`. This corresponds to the frontend change in [2], see more details there. [1]: https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170 [2]: https://github.com/llvm/llvm-project/pull/115052	2025-03-06 20:14:11 +08:00
Lu Weining	bae6644e12	[LoongArch] Relax the restrictions of inlineasm operand modifier 'u' and 'w' (#129864 ) - Allow 'u' and 'w' on LASX, LSX or floating point register operands. - Also add missing description in LangRef. Fixes #129863.	2025-03-06 16:17:12 +08:00
quic_hchandel	6e7e46cafe	[RISCV] Add Qualcomm uC Xqcibm (Bit Manipulation) extension (#129504 ) This extension adds thirty eight bit manipulation instructions. The current spec can be found at: https://github.com/quic/riscv-unified-db/releases/tag/Xqci-0.6 This patch adds assembler only support. Co-authored-by: Sudharsan Veeravalli <quic_svs@quicinc.com>	2025-03-06 12:01:53 +05:30
Matt Arsenault	a216358ce7	AMDGPU: Replace amdgpu-no-agpr with amdgpu-agpr-alloc (#129893 ) This performs the minimal replacement of amdgpu-no-agpr to amdgpu-agpr-alloc=0. Most of the test diffs are due to the new attribute sorting later alphabetically. We could do better by trying to perform range merging in the attributor, and trying to pick non-0 values.	2025-03-06 09:17:51 +07:00
Matt Arsenault	f4ba2bf236	AMDGPU: Add amdgpu-agpr-alloc attribute to control AGPR allocation (#128034 ) This provides a range to decide how to subdivide the vector register budget on gfx90a+. A single value declares the minimum AGPRs that should be allocatable. Eventually this should replace amdgpu-no-agpr. I want this primarily for testing agpr allocation behavior. We should have a heuristic try to detect a reasonable number of AGPRs to keep allocatable.	2025-03-06 09:13:59 +07:00
Sam Elliott	e49180d84c	[RISCV] Xqccmp Code Generation (#128815 ) This adds support for Xqccmp to the following passes: - Prolog Epilog Insertion - reusing much of the existing push/pop logic, but extending it to cope with frame pointers and reorder the CFI information correctly. - Move Merger - extending it to support the `qc.` variants of the double-move instructions. - Push/Pop Optimizer - extending it to support the `qc.` variants of the pop instructions. The testing is based on existing Zcmp tests, but I have put them in separate files as some of the Zcmp tests were getting quite long.	2025-03-05 10:59:45 -08:00
Petr Penzin	b44fbdee00	[RISCV] Tune flag for fast vrgather.vv (#124664 ) Add tune knob for N*Log2(N) vrgather.vv cost.	2025-03-03 16:04:49 -08:00
Aiden Grossman	829401fe3c	[Github][Docs] Add CI Best Practices Docs (#129462 ) This is something that Tom and I have discussed briefly for a while now, a doc that lists out all of the best practices we want to adhere to surrounding CI things along with their associated motivations/any other relevant info. This patch adds that doc along with three best practices surrounding Github Workflows that we try and adhere to (although more work needs to be done to get 100% adherence).	2025-03-03 15:54:52 -08:00
Rahul Joshi	f44c18a97d	[NFC][CodingStandard] Fix text style for C++ keywords (#128932 ) Fix C++ keywords in Restrict Visibility section to use `` (inline literal) markup to be consistent with the rest of the doc.	2025-03-03 10:18:12 -08:00
Abhilash Majumder	7c580893ea	[NVPTX] Add Intrinsics for discard.* (#128404 ) This PR adds intrinsics for all variations of discard.* * These intrinsics support generic or global for all variations. * The lowering is handled from nvvm to nvptx tablegen directly. * Lit tests are added as part of discard.ll * The generated PTX is verified with a 12.3 ptxas executable. * Added docs for these intrinsics in NVPTXUsage.rst. For more information, refer to the PTX ISA `<https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-discard>`_. --------- Co-authored-by: abmajumder <abmajumder@nvidia.com> Co-authored-by: gonzalobg <65027571+gonzalobg@users.noreply.github.com>	2025-03-03 22:53:49 +05:30
Da-Viper	8ec0d60e28	[lldb-dap] Add: show return value on step out (#106907 ) https://github.com/user-attachments/assets/cff48c6f-37ae-4f72-b881-3eff4178fb3c	2025-03-01 18:21:21 -06:00
Trevor Gross	84b365c26b	[LangRef] Correct documentation for `roundeven` (#125452 ) Langref for `roundeven` implies that the C standard function `roundeven` may raise floating point exceptions. However, this is not correct; C23 does not mention exceptions for `roundeven`, and per [1] `FE_INEXACT` is never raised. Clarify that LLVM's `roundeven` behaves the same. [1]: https://en.cppreference.com/w/c/numeric/math/roundeven	2025-03-01 10:37:01 +07:00
bodqhrohro	78c96aa24f	[docs] Fix typo in GettingStarted.rst Unlinke -> Unlike (NFC) (#128616 )	2025-02-27 18:50:10 +00:00
Abhilash Majumder	241a56dfad	[NVPTX] Add Intrinsics for applypriority.* (#127989 ) [NVPTX] Add ApplyPriority intrinsics This PR adds applypriority.* intrinsics with relevant eviction priorities. * The lowering is handled from nvvm to nvptx tablegen directly. * Lit tests are added as part of applypriority.ll * The generated PTX is verified with a 12.3 ptxas executable. * Added docs for these intrinsics in NVPTXUsage.rst. For more information, refer to the PTX ISA `<https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-applypriority>`_. --------- Co-authored-by: abmajumder <abmajumder@nvidia.com>	2025-02-27 16:10:29 +05:30
Pradeep Kumar	f5ee401545	[LLVM][NVPTX] Add codegen support for tcgen05.{ld, st} instructions (#126740 ) This commit adds support for tcgen05.{ld, st} instructions with lit tests under tcgen05-ld.ll and tcgen05-st.ll and intrinsics documentation under NVPTXUsage.rst	2025-02-27 14:56:35 +05:30
Omar Hossam	fcc8802133	[Docs] Fix typo in GetElementPtr.rst (#127393 ) I couldn't find the verb "indices", and it was actually a bit confusing for me reading this. I think this should be "indexes" instead.	2025-02-27 10:21:56 +01:00
Sam Elliott	5066d7b601	[RISCV] Add Xqccmp 0.1 Assembly Support (#128731 ) Xqccmp is a new spec by Qualcomm that makes a vendor-specific effort to solve the push/pop + frame pointers issue. Broadly, it takes the Zcmp instructions and reverses the order they push/pop registers in, which ends up matching the frame pointer convention. This extension adds a new instruction not present in Zcmp, `qc.cm.pushfp`, which will set `fp` to the incoming `sp` value after it has pushed the registers. This change duplicates the Zcmp implementation, with minor changes to mnemonics (for the `qc.` prefix), predicates, and the addition of `qc.cm.pushfp`. There is also new logic to prevent combining Xqccmp and Zcmp. Xqccmp is kept separate to Xqci for decoding/encoding etc, as the specs are separate today. Specification: https://github.com/quic/riscv-unified-db/releases/tag/Xqccmp_extension-0.1.0	2025-02-26 20:03:02 -08:00
YunQiang Su	363b05944f	LangRef: Clarify llvm.minnum and llvm.maxnum about sNaN and signed zero (#112852 ) The documentation claims that these intrinsics ignore sNaN, but the current code may behave differently: - As the final fallback, it uses the libc calls fmin(3)/fmax(3), while C23 clarifies that fmin(3)/fmax(3) should return NaN for sNaN vs NUM. - On some architectures, such as aarch64, it converts to `fmaxnm`, which returns qNaN for sNaN vs NUM. - On RISC-V (SPEC 2019+), it converts to `fmax`, which returns NUM for sNaN vs NUM. Now that we have introduced llvm.minimumnum and llvm.maximumnum, which follow IEEE 754-2019's minimumNumber/maximumNumber, it is time to clarify llvm.minnum and llvm.maxnum. Since the final fallback of llvm.minnum and llvm.maxnum is fmin(3)/fmax(3), it is reasonable to follow the behavior of fmin(3)/fmax(3). C23 clarified the behavior for sNaN and +0.0/-0.0: (NUM or NaN) vs sNaN -> qNaN; +0.0 vs -0.0 -> either one of +0.0/-0.0. This is the same as IEEE754-2008's maxNUM and minNUM. However, not all implementations work as expected. Since some architectures, such as aarch64/MIPSr6/LoongArch, have instructions that implement +0.0>-0.0, let's define llvm.minnum and llvm.maxnum to follow IEEE754-2008 with +0.0>-0.0. Architectures without such instructions can implement the `NSZ` flavor to speed things up, and frontends such as clang can call them with the `nsz` attribute.	2025-02-27 11:22:30 +08:00
AdityaK	74306afe87	Fix the schedule of vectorizer improvement monthly sync	2025-02-26 10:22:25 -08:00
Justin Bogner	870b376f00	[DirectX] Support the CBufferLoadLegacy operation (#128699 ) Fixes #112992	2025-02-26 09:43:30 -08:00
Craig Topper	5d501c6137	[RISCV][Docs] RISCV -> RISC-V in RISCVUsage.rst. NFC (#128906 )	2025-02-26 09:30:09 -08:00
Philip Reames	8039f8e139	[RISCV][MC] Add assembler support for XRivosVisni (#128773 ) This implements assembler support for the XRivosVisni custom/vendor extension from Rivos Inc. which is defined in: https://github.com/rivosinc/rivos-custom-extensions (See src/xrivosvisni.adoc) Codegen support will follow in separate changes.	2025-02-26 08:55:35 -08:00
Alex MacLean	6c2e170d04	[NVPTX] Convert vector function nvvm.annotations to attributes (#127736 ) Replace some more nvvm.annotations with function attributes, auto-upgrading the annotations as needed. These new attributes will be more idiomatic and compile-time efficient than the annotations. - !"maxntid[xyz]" -> "nvvm.maxntid" - !"reqntid[xyz]" -> "nvvm.reqntid" - !"cluster_dim_[xyz]" -> "nvvm.cluster_dim"	2025-02-26 08:45:27 -08:00
Luke Quinn	aace6a2f9d	[RISCV] Xqcia 0.4 The spec was recently updated, this changes the name in the TD files associated and increments the Extension number in the clang driver. This is mostly a MC change as there is no other generated code for these instructions yet. Signed-off-by: Luke Quinn <quic_lquinn@quicinc.com>	2025-02-26 08:09:20 -05:00
Philip Reames	00f02fed88	[RISCV] Change the vendor prefix for Rivos from "rv." to "ri." (#128761 ) There had been concern raised about possible confusion with "rvv". After internal discussion, we decided to go with an alternate prefix to reduce possible confusion going forward. The specification document (https://github.com/rivosinc/rivos-custom-extensions) has been updated. And also add the XRivosVizip extension to the documentation. I'd missed that in the initial commit.	2025-02-25 11:27:18 -08:00
Florian Hahn	47822c80c1	[LangRef] Clarify that the pointer after an object must be valid. (#127892 ) In some places, we rely on the assumption that the pointer after the object must also be valid and not overflow, but it does not seem to be spelled out clearly in LangRef, unless I missed a reference. The GetElementPtr section mentions that the maximum object size is half the pointer index type space, but then the pointer past the object may wrap. Clarify that the pointer after the object must also be valid. This should match Alive2's semantics: https://alive2.llvm.org/ce/z/Dk8QFL (https://github.com/AliveToolkit/alive2/blob/master/tools/transform.cpp#L1288) PR: https://github.com/llvm/llvm-project/pull/127892	2025-02-24 21:22:59 +00:00
Andy Kaylor	20fd7f0a76	Remove floating-point working group meeting (#128258 ) This meeting never quite took off the way I had hoped, and I haven't had time for it in quite a while, so I am removing it from the Getting Involved page.	2025-02-24 09:06:44 -08:00
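
Note on commit 0f8c075f7c: the difference between the two relative-load calculations described in that message can be sketched with a toy model. This is only an illustration under stated assumptions, not LLVM code: memory is modelled as a Python dict from address to a stored 32-bit value, and the function names `load_relative` and `checked_load_relative_old` are hypothetical stand-ins for the behaviour the message describes.

```python
def load_relative(memory, ptr, offset):
    """Toy model of llvm.load.relative as described in the commit message:
    load the value stored at ptr + offset, then add the ORIGINAL ptr."""
    return ptr + memory[ptr + offset]


def checked_load_relative_old(memory, ptr, offset):
    """Toy model of the pre-patch checked intrinsic as described in the
    commit message: the loaded value is added to the OFFSET address."""
    addr = ptr + offset
    return addr + memory[addr]


# Hypothetical memory image: two relative entries, at base and base + 8.
base = 0x1000
memory = {0x1000: 0x40, 0x1008: 0x80}

# With a zero offset both calculations agree, which is why existing users
# (all using offset 0, per the commit message) would see no change.
same_at_zero = load_relative(memory, base, 0) == checked_load_relative_old(memory, base, 0)

# With a non-zero offset the two results differ by exactly the offset.
new_result = load_relative(memory, base, 8)              # base + 0x80
old_result = checked_load_relative_old(memory, base, 8)  # (base + 8) + 0x80
```

In this model the two results differ by exactly `offset`, so the mismatch only becomes visible once a caller passes a non-zero offset, as the commit message reports for relative vtables in clang.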


11478 Commits