The recently announced IBM z17 processor implements the architecture
already supported as "arch15" in LLVM. This patch adds support for "z17"
as an alternate architecture name for arch15.
This patch also adds the scheduler description for the z17 processor,
provided by Jonas Paulsson.
This is an alternative to
https://github.com/llvm/llvm-project/pull/122103
In SPIR-V, private global variables have the Private storage class. This
PR adds a new address space which allows the frontend to emit variables
with this storage class when targeting this backend.
This is covered in this proposal: llvm/wg-hlsl@4c9e11a
This PR will cause addrspacecast to show up in several cases, like class
member functions or assignment. Those will have to be handled in the
backend later on, particularly to fix up pointer storage classes in some
functions.
Before this change, global variables were emitted with the 'Function'
storage class, which was wrong: SPIR-V has strict address space rules,
and constant globals cannot be in the default address space.
The OMPIRBuilder change was required for lit tests to pass; we were
missing an addrspacecast.
---------
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Currently, when printing a template argument of expression type, the
expression is converted immediately into a string to be sent to the
diagnostic engine, using fake LangOpts.
This makes the expression printing look incorrect for the current
language, besides being inefficient, as we don't actually need to print
the expression if the diagnostic would be ignored.
This fixes a nastiness with the TemplateArgument constructor for
expressions being implicit: all current users just passing an
expression to a diagnostic were implicitly going through the template
argument path.
The expressions are also being printed unquoted. This will be fixed in a
subsequent patch, as the test churn is much larger.
- Fixes #132303
- Moves dot2add from a language builtin to a target builtin.
- Sets the scaffolding for Sema checks for DX builtins
- Setup DirectX backend as able to have target builtins
- Adds a DX TargetBuiltins emitter in
`clang/lib/CodeGen/TargetBuiltins/DirectX.cpp`
Whether the SDK supports builtin modules is a property of the SDK
itself, and really has nothing to do with the target. This was already
worked around for Mac Catalyst, but there are other, more esoteric and
non-obvious target-to-SDK mappings that aren't handled. Have the SDK
parse its OS out of CanonicalName and use that instead of the target to
determine if builtin modules are supported.
This patch introduces the `vmem-to-lds-load-insts` target feature, which
can be used to enable builtins `__builtin_amdgcn_global_load_lds` and
`__builtin_amdgcn_raw_ptr_buffer_load_lds` on platforms which have this
feature.
This feature is only available on gfx9/10.
A limitation of using a common target feature for both builtins is that
`__builtin_amdgcn_raw_ptr_buffer_load_lds` could otherwise have been
made available on gfx6/7/8.
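A hedged sketch (not from this patch) of guarding a use of the builtin
so code still compiles on targets without the feature; the argument
order and meanings (global source, LDS destination, size, offset, aux)
are assumptions here:
```
#if defined(__has_builtin) && __has_builtin(__builtin_amdgcn_global_load_lds)
void copy_dword_to_lds(__attribute__((address_space(1))) unsigned *gsrc,
                       __attribute__((address_space(3))) unsigned *ldst) {
  /* 4-byte load from global memory directly into LDS (assumed operands). */
  __builtin_amdgcn_global_load_lds(gsrc, ldst, /*size=*/4, /*offset=*/0,
                                   /*aux=*/0);
}
#endif
```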
This is a follow-up to #132129.
Currently, only `Parser` and `SemaBase` get a `DiagCompat()` helper; I’m
planning to keep refactoring compatibility warnings and add more helpers
to other classes as needed. I also refactored a single parser compat
warning just to make sure everything works properly when diagnostics
across multiple components (i.e. Sema and Parser in this case) are
involved.
We can use *Set::insert_range to collapse:
```
for (auto Elem : Range)
  Set.insert(Elem);
```
down to:
```
Set.insert_range(Range);
```
In some cases, we can further fold that into the set declaration.
The i6400 and i6500 are high-performance multi-core microprocessors from
MIPS that provide best-in-class power efficiency for use in
system-on-chip (SoC) applications. The i6400 and i6500 implement Release
6 of the MIPS64 Instruction Set Architecture with full hardware
multithreading and hardware virtualization support.
This PR removes the `FileManager` APIs that have been deprecated for a
while.
LLVM 20.1.0, released earlier this month, contains the formal
deprecation of these APIs, so they should be fine to remove in the next
major release.
This builtin is supported by GCC and is a way to improve diagnostic
behavior for va_start in C23 mode. C23 no longer requires a second
argument to the va_start macro in support of variadic functions with no
leading parameters. However, we still want to diagnose passing more than
two arguments, or diagnose when passing something other than the last
parameter in the variadic function.
This also updates the freestanding <stdarg.h> header to use the new
builtin, same as how GCC works.
Fixes #124031
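As a hedged illustration of the C23 usage this builtin supports (not
taken from the patch's tests):
```
#include <stdarg.h>

/* In C23, va_start no longer requires a second argument; passing more
   than two, or something other than the last named parameter, is what
   the new builtin lets us diagnose. */
int sum_ints(int count, ...) {
  va_list ap;
  va_start(ap);          /* C23: the traditional second argument is optional */
  int total = 0;
  for (int i = 0; i < count; ++i)
    total += va_arg(ap, int);
  va_end(ap);
  return total;
}
```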
I broke this in f3cd223838: I should have added this to the `SPIRV64`
subclass, but I accidentally added it to the base `TargetInfo`.
Using an unsupported target should error out in the driver long before
this point, though.
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
This patch makes Clang predefine `_CRT_USE_BUILTIN_OFFSETOF` in
MS-compatible modes. The macro makes the `offsetof` provided by MS
UCRT's `<stddef.h>` select the `__builtin_offsetof` version, so with
it Clang (clang-cl) can directly consume UCRT's `offsetof`.
MSVC predefines the macro as `1` since at least VS 2017 19.14, but I
think it's also OK to define it in "older" compatible modes.
Fixes #59689.
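The guard in UCRT's `<stddef.h>` is roughly of this shape (a simplified
sketch, not the verbatim header):
```
#ifdef _CRT_USE_BUILTIN_OFFSETOF
  #define offsetof(s, m) __builtin_offsetof(s, m)
#else
  #define offsetof(s, m) ((size_t)&(((s *)0)->m))
#endif
```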
- Make semantic matching case-insensitive
- Update tests to reflect semantics printed as all lowercase in error
messages
- Add new tests to show case insensitivity

Closes #128063
As discussed in [1], introduce BPF instructions with load-acquire and
store-release semantics under -mcpu=v4. Define 2 new flags:
```
BPF_LOAD_ACQ  0x100
BPF_STORE_REL 0x110
```
A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm'
field set to BPF_LOAD_ACQ (0x100).
Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with
the 'imm' field set to BPF_STORE_REL (0x110).
Unlike existing atomic read-modify-write operations that only support
BPF_W (32-bit) and BPF_DW (64-bit) size modifiers, load-acquires and
store-releases also support BPF_B (8-bit) and BPF_H (16-bit). An 8- or
16-bit load-acquire zero-extends the value before writing it to a 32-bit
register, just like ARM64 instruction LDAPRH and friends.
As an example (assuming little-endian):
```
long foo(long *ptr) {
    return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
}
```
foo() can be compiled to:
```
db 10 00 00 00 01 00 00  r0 = load_acquire((u64 *)(r1 + 0x0))
95 00 00 00 00 00 00 00  exit
```
opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
imm (0x00000100): BPF_LOAD_ACQ
Similarly:
```
void bar(short *ptr, short val) {
    __atomic_store_n(ptr, val, __ATOMIC_RELEASE);
}
```
bar() can be compiled to:
```
cb 21 00 00 10 01 00 00  store_release((u16 *)(r1 + 0x0), w2)
95 00 00 00 00 00 00 00  exit
```
opcode (0xcb): BPF_ATOMIC | BPF_H | BPF_STX
imm (0x00000110): BPF_STORE_REL
Inline assembly is also supported.
Add a pre-defined macro, __BPF_FEATURE_LOAD_ACQ_STORE_REL, to let
developers detect this new feature. It can also be disabled using a new
llc option, -disable-load-acq-store-rel.
Using __ATOMIC_RELAXED for __atomic_store{,_n}() will generate a "plain"
store (BPF_MEM | BPF_STX) instruction:
```
void foo(short *ptr, short val) {
    __atomic_store_n(ptr, val, __ATOMIC_RELAXED);
}
```
```
6b 21 00 00 00 00 00 00  *(u16 *)(r1 + 0x0) = w2
95 00 00 00 00 00 00 00  exit
```
Similarly, using __ATOMIC_RELAXED for __atomic_load{,_n}() will generate
a zero-extending, "plain" load (BPF_MEM | BPF_LDX) instruction:
```
int foo(char *ptr) {
    return __atomic_load_n(ptr, __ATOMIC_RELAXED);
}
```
```
71 11 00 00 00 00 00 00  w1 = *(u8 *)(r1 + 0x0)
bc 10 08 00 00 00 00 00  w0 = (s8)w1
95 00 00 00 00 00 00 00  exit
```
Currently __ATOMIC_CONSUME is an alias for __ATOMIC_ACQUIRE. Using
__ATOMIC_SEQ_CST ("sequentially consistent") is not supported yet and
will cause an error:
```
$ clang --target=bpf -mcpu=v4 -c bar.c > /dev/null
bar.c:1:5: error: sequentially consistent (seq_cst) atomic load/store is
not supported
    1 | int foo(int *ptr) { return __atomic_load_n(ptr, __ATOMIC_SEQ_CST); }
      |     ^
...
```
Finally, rename those isST*() and isLD*() helper functions in
BPFMISimplifyPatchable.cpp based on what the instructions actually do,
rather than their instruction class.
[1]
https://lore.kernel.org/all/20240729183246.4110549-1-yepeilin@google.com/
This patch adds a function attribute `riscv_vls_cc` for the RISC-V VLS
calling convention, which takes 0 or 1 arguments. The argument is the
`ABI_VLEN`, i.e. the `VLEN` used for passing fixed-vector arguments:
each such argument is wrapped as a scalable vector (VLA) using the
`ABI_VLEN` and handled by the corresponding mechanism. The range of
`ABI_VLEN` is [32, 65536]; if not specified, the default value is 128.
Here is an example of VLS argument passing:
Non-VLS call:
```
void original_call(__attribute__((vector_size(16))) int arg) {}
=>
define void @original_call(i128 noundef %arg) {
entry:
...
ret void
}
```
VLS call:
```
void __attribute__((riscv_vls_cc(256))) vls_call(__attribute__((vector_size(16))) int arg) {}
=>
define riscv_vls_cc void @vls_call(<vscale x 1 x i32> %arg) {
entry:
...
ret void
}
```
The first, non-VLS call passes a generic vector argument of 16 bytes as
a flattened integer. In contrast, the VLS call uses `ABI_VLEN=256`,
which wraps the vector into <vscale x 1 x i32>, where the number of
scalable vector elements is calculated by
`ORIG_ELTS * RVV_BITS_PER_BLOCK / ABI_VLEN`.
Note: ORIG_ELTS = Vector Size / Type Size = 128 / 32 = 4.
PsABI PR: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/418
C-API PR: https://github.com/riscv-non-isa/riscv-c-api-doc/pull/68
Add an option and a statement attribute for controlling the emission of
target-specific metadata on atomicrmw instructions in IR.
The RFC for this attribute and option is
https://discourse.llvm.org/t/rfc-add-clang-atomic-control-options-and-pragmas/80641.
Originally a pragma was proposed; it was later changed to a Clang
attribute.
This attribute allows users to specify one, two, or all three options
and must be applied to a compound statement. The attribute can also be
nested, with inner attributes overriding the options specified by outer
attributes or the target's default options. These options then determine
the target-specific metadata added to atomic instructions in the IR.
In addition to the attribute, three new compiler options are introduced:
`-f[no-]atomic-remote-memory`, `-f[no-]atomic-fine-grained-memory`, and
`-f[no-]atomic-ignore-denormal-mode`.
These compiler options allow users to override the default options
through the Clang driver and front end. `-m[no-]unsafe-fp-atomics` is
aliased to `-f[no-]atomic-ignore-denormal-mode`.
In terms of implementation, the atomic attribute is represented in the
AST by the existing AttributedStmt, with minimal changes to the AST and
Sema. During code generation in Clang, the CodeGenModule maintains the
current atomic options, which are used to emit the relevant metadata for
atomic instructions. RAII is used to manage the saving and restoring of
atomic options when entering and exiting nested AttributedStmts.
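A hedged sketch of the intended usage (the option spellings here are
assumed from the `-fatomic-*` flag names above, not taken from this
patch's tests):
```
/* Hedged sketch; option spellings assumed from the -fatomic-* flags. */
void add_val(float *addr, float val) {
  [[clang::atomic(no_remote_memory, no_fine_grained_memory)]] {
    __scoped_atomic_fetch_add(addr, val, __ATOMIC_RELAXED,
                              __MEMORY_SCOPE_DEVICE);
    [[clang::atomic(remote_memory)]] {
      /* An inner attribute overrides the outer no_remote_memory option. */
      __scoped_atomic_fetch_add(addr, val, __ATOMIC_RELAXED,
                                __MEMORY_SCOPE_SYSTEM);
    }
  }
}
```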
Problem identified by Joseph. The OpenMP device runtime uses
__scoped_atomic_load_n and similar, which presently hit:
```
error: large atomic operation may incur significant performance
penalty; the access size (4 bytes) exceeds the max lock-free size (0 bytes) [-Werror,-Watomic-alignment]
```
This is because the SPIRV class doesn't set the corresponding field. The
base class does, but only if there's a host toolchain, which there isn't.
When we mark a module visible, we normally mark all of its non-explicit
submodules and other exports as visible. However, when we first enter a
submodule we should not make them visible to the submodule itself until
they are actually imported. Marking exports visible before import would
cause bizarre behaviour with local submodule visibility, because it
happens before we discover the submodule's transitive imports and
could fail to make them visible in the parent module depending on
whether the submodules involved were explicitly defined (module X) or
implicitly defined from an umbrella (module *).
rdar://136524433
Add an option similar to the -qtarget option in XL to allow the user to
say they want to be able to run the generated program on an older
version of the LE environment. This option will do two things:
- set the `__TARGET_LIBS` macro so the system headers exclude newer
interfaces when targeting older environments
- set the arch level to match the minimum arch level for that older
version of LE. This doesn't happen right now, since all of the supported
LE versions have the same minimum arch level, so the option doesn't
change anything yet.
The user can specify three different kinds of arguments:
1. -mzos-target=zosv*V*r*R* - where V & R are the version and release
2. -mzos-target=0x4vrrmmmm - v, rr, and mmmm are the hex values for the
version, release, and modification level
3. -mzos-target=current - uses the latest version of LE the system
headers have support for
This fixes two bugs in the ABI for over-sized bitfields for ARM and
AArch64:
The container type picked for an over-sized bitfield already contributes
to the alignment of the structure, but it should also contribute to the
"unadjusted alignment" which is used by the ARM and AArch64 PCS.
AAPCS64 defines the bitfield layout algorithm for over-sized bitfields
as picking a container which is the fundamental integer data type with
the largest size less than or equal to the bit-field width. Since
AAPCS64 has a 128-bit integer fundamental data type, we need to consider
Int128 as a container type for AArch64.
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.
This PR removes all occurrences of gfx940/gfx941 from clang that can be
removed without changes in the llvm directory. The
target-invalid-cpu-note/amdgcn.c test is not included here since it
tests a list of targets that is defined in
llvm/lib/TargetParser/TargetParser.cpp.
For SWDEV-512631
The `-fcf-protection` flag is now also used to enable CFI features for
the RISC-V target, so it's no longer suitable to define `__CET__` solely
based on that flag. This patch moves the definition of the `__CET__`
macro into the X86 target hook, so only X86 targets with the
`-fcf-protection` flag enable the `__CET__` macro.
See https://github.com/llvm/llvm-project/pull/109784 and
https://github.com/llvm/llvm-project/pull/112477 for the adoption
of `-fcf-protection` flag for RISC-V targets.
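For illustration, a hedged sketch of checking the macro from source (the
bit values follow GCC's documentation for `-fcf-protection`, not this
patch):
```
/* Hedged illustration: __CET__ is now predefined only for X86 targets.
   GCC documents 1 for -fcf-protection=branch, 2 for =return, 3 for =full. */
#if defined(__CET__) && (__CET__ & 2)
/* shadow-stack protection is requested */
#endif
```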
The `-fcf-protection=[full|return]` flag enables a shadow stack
implementation based on the RISC-V Zicfiss extension. This patch adds
the `__riscv_shadow_stack` predefined macro, which is visible during
preprocessing when such a shadow stack implementation is enabled.
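A hedged illustration of detecting the new macro from source:
```
#ifdef __riscv_shadow_stack
/* A Zicfiss-based shadow stack is enabled (-fcf-protection=full|return). */
#endif
```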
From the OpenMP 6.0 features list:
- OpenMP directives in concurrent loop regions
- atomic constructs on concurrent loop regions
- Lift the nesting restriction on concurrent loops

Testing:
- Updated test/OpenMP/for_order_messages.cpp
- check-all
This patch does two things.
1. Previously, when checking driver arguments, we emitted an error for
unsupported values of `-mbranch-protection` when using pauthtest ABI.
The reason for that was ptrauth-returns being enabled as part of
pauthtest. This patch changes the check against pauthtest to a check
against ptrauth-returns.
2. Similarly, we check values of the following function attribute which
are unsupported with ptrauth-returns:
`__attribute__((target("branch-protection=XXX")))`. Note that the
existing `validateBranchProtection` function is used, and the current
behavior is to ignore the unsupported attribute value, so no error is
emitted.
This requires adding support to the general builtins emission for
producing prefixed builtin infos separately from un-prefixed ones, which
is a bit crufty. But we don't currently have any good way of having a more
refined model than a single hard-coded prefix string per TableGen
emission. Something more powerful and/or elegant is possible, but this
is a fairly minimal first step that at least allows factoring out the
builtin prefix for something like X86.
This moves the main builtins and several targets to use nice generated
string tables and info structures rather than X-macros. Even without
obvious prefixes factored out, the resulting tables are significantly
smaller and much cheaper to compile without all the X-macro overhead.
This leaves the X-macros in place for atomic builtins which have a wide
range of uses that don't seem reasonable to fold into TableGen.
As future work, these should move to their own file (whether as X-macros
or just generated patterns) so the AST headers don't have to include all
the data for other builtins.
This leverages the sharded structure of the builtins to make it easy to
directly tablegen most of the AArch64 and ARM builtins while still using
X-macros for a few edge cases. It also extracts common prefixes as part
of that.
This makes the string tables for these targets dramatically smaller.
This is especially important as the SVE builtins represent (by far) the
largest string table and largest builtin table across all the targets in
Clang.
This both reapplies #118734, the initial attempt at this, and updates it
significantly.
First, it uses the newly added `StringTable` abstraction for string
tables, and simplifies the construction to build the string table and
info arrays separately. This should reduce any `constexpr` compile time
memory or CPU cost of the original PR while significantly improving the
APIs throughout.
It also restructures the builtins to support sharding across several
independent tables. This accomplishes three improvements over the
original PR:
1) It improves the APIs used significantly.
2) When builtins are defined from different sources (like SVE vs MVE in
AArch64), this allows each of them to build their own string table
independently rather than having to merge the string tables and info
structures.
3) It allows each shard to factor out a common prefix, often cutting the
size of the strings needed for the builtins by a factor two.
The second point is important both to allow different mechanisms of
construction (for example, a `.def` file and a tablegen'ed `.inc` file,
or different tablegen'ed `.inc` files) and because it simply reduces the
sizes of these tables, which is valuable given how large they are in
some cases. The third point builds on that size reduction.
Initially, we use this new sharding rather than merging tables in
AArch64, LoongArch, RISCV, and X86. Mostly this helps ensure the system
works, as without further changes these still push scaling limits.
Subsequent commits will more deeply leverage the new structure,
including using the prefix capabilities which cannot be easily factored
out here and requires deep changes to the targets.
This patch adds intrinsics for the tcgen05 alloc/dealloc family of PTX
instructions. It also adds address space 6 for tensor memory, which is
used by these intrinsics.
Lit tests are added and verified with a ptxas-12.8 executable.
Documentation for these additions is also added in NVPTXUsage.rst.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
If we have +sme but not +sve, we would not set vscale_range on
functions. It should be valid to apply it with the same range with just
+sme, which can help mitigate some performance regressions in cases such
as scalable vector bitcasts (https://godbolt.org/z/exhe4jd8d).
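As a hedged sketch (the attribute spelling and the exact range are
assumptions, not taken from this patch's tests), a function like the
following should now also carry the SVE-style vscale_range(1, 16) IR
attribute when compiled with only +sme:
```
/* Hedged sketch; attribute spelling and vscale_range(1, 16) assumed. */
__attribute__((target("+sme")))
void kernel(void) {
  /* streaming/scalable-vector code elided */
}
```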