This change implements https://github.com/llvm/llvm-project/issues/70073
HLSL has a dot intrinsic defined here:
https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-dot
The intrinsic itself is defined as an HLSL_LANG LangBuiltin in
Builtins.td.
This is used to associate all the dot product typedefs defined in
hlsl_intrinsics.h
with a single intrinsic check in CGBuiltin.cpp & SemaChecking.cpp.
In IntrinsicsDirectX.td we define the LLVM IR for the dot product.
A few goals were in mind for this IR. First, it should operate only on
vectors. Second, the return type should be the vector element type.
Third, the second parameter vector should be the same size as the first
parameter. Finally, `a dot b` should be the same as `b dot a`.
In CGBuiltin.cpp, HLSL builds on top of existing Clang intrinsics via
EmitBuiltinExpr. Dot product, though, is a language-specific intrinsic
and so is guarded behind getLangOpts().HLSL.
The call chain looks like this: EmitBuiltinExpr -> EmitHLSLBuiltinExpr.
The EmitHLSLBuiltinExpr lowering of the dot product intrinsic makes a
distinction between vectors and scalars, because HLSL supports dot
product on scalars, which simplifies down to a multiply.
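As a minimal sketch of these semantics (plain C, not the actual Clang
lowering):
```
/* Scalar "dot product" is just a multiply; vector dot product sums the
   element-wise products and returns the element type. */
float dot_scalar(float a, float b) { return a * b; }

float dot_vec3(const float a[3], const float b[3]) {
  float sum = 0.0f;
  for (int i = 0; i < 3; ++i)
    sum += a[i] * b[i];
  return sum;
}
```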
Sema.h & SemaChecking.cpp saw the addition of
CheckHLSLBuiltinFunctionCall, a language-specific semantic validation
that can be expanded for other HLSL-specific intrinsics.
Fixes #70073
This patch drops the cast check (`canLosslesslyBitCastTo`) and leaves
it to the one inside `CreateBitCast`; the dropped check seems too
conservative for the use case here.
Summary:
This patch adds a new intrinsic and builtin function mirroring the
existing `__builtin_readcyclecounter`. The difference is that this
implementation targets a separate counter that some targets have, which
returns a fixed-frequency clock that can be used to determine elapsed
time; this is different from the cycle counter, which often has a
variable frequency.
This patch only adds support for the NVPTX and AMDGPU targets.
This is done as a new and separate builtin rather than an argument to
`readcyclecounter` to avoid needing to change existing code and to make
the separation more explicit.
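A hedged usage sketch, assuming the new builtin is named
`__builtin_readsteadycounter` to mirror `__builtin_readcyclecounter`
(the counter's frequency is target-defined):
```
extern void do_work(void);

/* Returns the number of fixed-frequency ticks spent in do_work();
   because the frequency is fixed, the delta is proportional to
   elapsed wall-clock time. */
unsigned long long time_work(void) {
  unsigned long long begin = __builtin_readsteadycounter();
  do_work();
  return __builtin_readsteadycounter() - begin;
}
```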
Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same
as V5 except a new "generic version" flag can be present in EFLAGS. This
is related to new generic targets that'll be added in a follow-up patch.
It's also likely V6 will have new changes (possibly new metadata
entries) added later.
The docs change is part of the follow-up patch #76955.
Since https://github.com/ARM-software/acle/pull/276 the ACLE
defines attributes to better describe the use of a given SME state.
Previously the attributes merely described the possibility of it being
'shared' or 'preserved', whereas the new attributes have more semantics
and also describe how the data flows through the program.
For ZT0 we already had to add new LLVM IR attributes:
* aarch64_new_zt0
* aarch64_in_zt0
* aarch64_out_zt0
* aarch64_inout_zt0
* aarch64_preserves_zt0
We have now done the same for ZA, such that we add:
* aarch64_new_za (previously `aarch64_pstate_za_new`)
* aarch64_in_za (more specific variation of `aarch64_pstate_za_shared`)
* aarch64_out_za (more specific variation of `aarch64_pstate_za_shared`)
* aarch64_inout_za (more specific variation of
`aarch64_pstate_za_shared`)
* aarch64_preserves_za (previously `aarch64_pstate_za_shared,
aarch64_pstate_za_preserved`)
This explicitly removes 'pstate' from the name because, with SME2 and
the new ACLE attributes, there is a difference between "sharing ZA"
(sharing the ZA matrix register with the caller) and "sharing PSTATE.ZA"
(sharing either the ZA or the ZT0 register, both part of PSTATE.ZA,
with the caller).
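For illustration, a hedged sketch of the corresponding ACLE source
annotations; the keyword spellings and the exact mapping to the IR
attributes above are my assumption based on the linked ACLE change:
```
/* Assumed mapping: __arm_in("za") -> aarch64_in_za,
   __arm_out("za") -> aarch64_out_za,
   __arm_inout("za") -> aarch64_inout_za,
   __arm_preserves("za") -> aarch64_preserves_za. */
void reads_za(void) __arm_in("za");
void writes_za(void) __arm_out("za");
void updates_za(void) __arm_inout("za");
void keeps_za(void) __arm_preserves("za");
```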
Make __builtin_cpu_{init|supports|is} target independent and provide an
opt-in query for targets that want to support it. Each target is still
responsible for its specific lowering/code-gen. Also provide code-gen
for PowerPC.
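A hedged usage sketch (feature names are target-specific; "vsx" is an
illustrative PowerPC feature name, not a guaranteed spelling):
```
extern void vsx_path(void);
extern void generic_path(void);

void dispatch(void) {
  __builtin_cpu_init();               /* populate CPU feature state */
  if (__builtin_cpu_supports("vsx"))  /* target-specific feature query */
    vsx_path();
  else
    generic_path();
}
```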
I originally proposed this in https://reviews.llvm.org/D152914 and this
addresses the comments I received there.
---------
Co-authored-by: Nemanja Ivanovic <nemanjaivanovic@nemanjas-air.kpn>
Co-authored-by: Nemanja Ivanovic <nemanja@synopsys.com>
This patch addresses an issue regarding a call to the bcopy function in
a conditional expression. It is analogous to the already-accepted patch
which deals with the same problem for the bzero function [0].
Here is the testcase which illustrates the issue:
```
void bcopy(const void *, void *, unsigned long);
void foo(void);
void test_bcopy() {
  char dst[20];
  char src[20];
  int _sz = 20, len = 20;
  return (_sz
              ? ((_sz >= len)
                     ? bcopy(src, dst, len)
                     : foo())
              : bcopy(src, dst, len));
}
```
When processing it with Clang, the following issue occurs:
```
Instruction does not dominate all uses!
  %arraydecay2 = getelementptr inbounds [20 x i8], ptr %dst, i64 0, i64 0, !dbg !38
  %cond = phi ptr [ %arraydecay2, %cond.end ], [ %arraydecay5, %cond.false3 ], !dbg !33
fatal error: error in backend: Broken module found, compilation aborted!
```
This happens because an incorrect phi node is created. It is created
because the bcopy call is lowered to a call of the llvm.memmove
intrinsic, and the memmove function returns void *. Since llvm.memmove
is called in two places in the same return statement, Clang creates a
phi node for the return value in the final basic block, and that phi
node is incorrect. However, bcopy should return void in the first
place, so this phi node is unnecessary. This is what this patch
addresses. An appropriate test is also added, and no existing tests
fail when applying this patch.
Also, this crash only happens when LLVM is configured with the
-DLLVM_ENABLE_ASSERTIONS=On option.
[0] https://reviews.llvm.org/D39746
Rename intrinsics for fcvtu to fcvtzu and fcvts to fcvtzs.
Use llvm_anyvector_ty for both multi-vector returns and operands, so
that the return and operand types can be specified in the intrinsic
call, e.g.
@llvm.aarch64.sve.scvtf.x4.nxv4f32.nxv4i32
Support new amdgcn_global_load_tr instructions for load with transpose.
* MC layer support for GLOBAL_LOAD_TR_B64/GLOBAL_LOAD_TR_B128
* Intrinsic int_amdgcn_global_load_tr
* Clang builtins amdgcn_global_load_tr*
Without the fix, GCC warned with
```
../../clang/lib/CodeGen/CGBuiltin.cpp:1022:19: warning: unused variable 'DRE' [-Wunused-variable]
 1022 |   if (const auto *DRE = dyn_cast<DeclRefExpr>(Base)) {
      |                   ^~~
```
Fix the warning by removing the unused variable and changing the
`dyn_cast` to `isa`.
The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member holding the
count of elements in the flexible array. This information is used to
improve the results of the array bound sanitizer and the
'__builtin_dynamic_object_size' builtin. The 'count' field member must
be within the same non-anonymous, enclosing struct as the flexible array
member. For example:
```
struct bar;
struct foo {
  int count;
  struct inner {
    struct {
      int count; /* The 'count' referenced by 'counted_by' */
    };
    struct {
      /* ... */
      struct bar *array[] __attribute__((counted_by(count)));
    };
  } baz;
};
```
This example specifies that the flexible array member 'array' has the
number of elements allocated for it in 'count':
```
struct bar;
struct foo {
  size_t count;
  /* ... */
  struct bar *array[] __attribute__((counted_by(count)));
};
```
This establishes a relationship between 'array' and 'count';
specifically that 'p->array' must have *at least* 'p->count' number of
elements available. It's the user's responsibility to ensure that this
relationship is maintained throughout changes to the structure.
In the following, the allocated array erroneously has fewer elements
than what's specified by 'p->count'. This would result in an
out-of-bounds access not being detected:
```
struct foo *p;
void foo_alloc(size_t count) {
  p = malloc(MAX(sizeof(struct foo),
                 offsetof(struct foo, array[0]) + count *
                     sizeof(struct bar *)));
  p->count = count + 42;
}
```
The next example updates 'p->count', breaking the relationship
requirement that 'p->array' must have at least 'p->count' number of
elements available:
```
void use_foo(int index, int val) {
  p->count += 42;
  p->array[index] = val; /* The sanitizer can't properly check this access */
}
```
In this example, an update to 'p->count' maintains the relationship
requirement:
```
void use_foo(int index, int val) {
  if (p->count == 0)
    return;
  --p->count;
  p->array[index] = val;
}
```
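As a hedged illustration of the intended effect on
`__builtin_dynamic_object_size` (exact results depend on the
implementation):
```
#include <stddef.h>

struct bar;
struct foo {
  size_t count;
  struct bar *array[] __attribute__((counted_by(count)));
};

/* Sketch: with counted_by, __bdos can derive the array's size from
   p->count instead of giving up and returning (size_t)-1. */
size_t array_bytes(struct foo *p) {
  return __builtin_dynamic_object_size(p->array, 0);
  /* expected: p->count * sizeof(struct bar *) */
}
```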
With lldb build fix.
Original message:
EnumConstantDecl is allocated by the ASTContext allocator so the
destructor is never called.
This patch takes a similar approach to IntegerLiteral by using
APIntStorage to allocate large APSInts using the ASTContext allocator as
well.
The downside is that an additional heap allocation and copy of the data
needs to be made when calling getInitValue if the APSInt is large.
Fixes #78160.
The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member holding the
count of elements in the flexible array. This information is used to
improve the results of the array bound sanitizer and the
'__builtin_dynamic_object_size' builtin. The 'count' field member must
be within the same non-anonymous, enclosing struct as the flexible array
member. For example:
```
struct bar;
struct foo {
  int count;
  struct inner {
    struct {
      int count; /* The 'count' referenced by 'counted_by' */
    };
    struct {
      /* ... */
      struct bar *array[] __attribute__((counted_by(count)));
    };
  } baz;
};
```
This example specifies that the flexible array member 'array' has the
number of elements allocated for it in 'count':
```
struct bar;
struct foo {
  size_t count;
  /* ... */
  struct bar *array[] __attribute__((counted_by(count)));
};
```
This establishes a relationship between 'array' and 'count';
specifically that 'p->array' must have *at least* 'p->count' number of
elements available. It's the user's responsibility to ensure that this
relationship is maintained throughout changes to the structure.
In the following, the allocated array erroneously has fewer elements
than what's specified by 'p->count'. This would result in an
out-of-bounds access not being detected:
```
struct foo *p;
void foo_alloc(size_t count) {
  p = malloc(MAX(sizeof(struct foo),
                 offsetof(struct foo, array[0]) + count *
                     sizeof(struct bar *)));
  p->count = count + 42;
}
```
The next example updates 'p->count', breaking the relationship
requirement that 'p->array' must have at least 'p->count' number of
elements available:
```
void use_foo(int index, int val) {
  p->count += 42;
  p->array[index] = val; /* The sanitizer can't properly check this access */
}
```
In this example, an update to 'p->count' maintains the relationship
requirement:
```
void use_foo(int index, int val) {
  if (p->count == 0)
    return;
  --p->count;
  p->array[index] = val;
}
```
This reverts commit fefdef808c230c79dca2eb504490ad0f17a765a5.
Breaks check-clang, see
https://github.com/llvm/llvm-project/pull/76348#issuecomment-1886029515
Also revert follow-on "[Clang] Update 'counted_by' documentation"
This reverts commit 4a3fb9ce27dda17e97341f28005a28836c909cfc.
The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member holding the
count of elements in the flexible array. This information is used to
improve the results of the array bound sanitizer and the
'__builtin_dynamic_object_size' builtin. The 'count' field member must
be within the same non-anonymous, enclosing struct as the flexible array
member. For example:
```
struct bar;
struct foo {
  int count;
  struct inner {
    struct {
      int count; /* The 'count' referenced by 'counted_by' */
    };
    struct {
      /* ... */
      struct bar *array[] __attribute__((counted_by(count)));
    };
  } baz;
};
```
This example specifies that the flexible array member 'array' has the
number of elements allocated for it in 'count':
```
struct bar;
struct foo {
  size_t count;
  /* ... */
  struct bar *array[] __attribute__((counted_by(count)));
};
```
This establishes a relationship between 'array' and 'count';
specifically that 'p->array' must have *at least* 'p->count' number of
elements available. It's the user's responsibility to ensure that this
relationship is maintained throughout changes to the structure.
In the following, the allocated array erroneously has fewer elements
than what's specified by 'p->count'. This would result in an
out-of-bounds access not being detected:
```
struct foo *p;
void foo_alloc(size_t count) {
  p = malloc(MAX(sizeof(struct foo),
                 offsetof(struct foo, array[0]) + count *
                     sizeof(struct bar *)));
  p->count = count + 42;
}
```
The next example updates 'p->count', breaking the relationship
requirement that 'p->array' must have at least 'p->count' number of
elements available:
```
void use_foo(int index, int val) {
  p->count += 42;
  p->array[index] = val; /* The sanitizer can't properly check this access */
}
```
In this example, an update to 'p->count' maintains the relationship
requirement:
```
void use_foo(int index, int val) {
  if (p->count == 0)
    return;
  --p->count;
  p->array[index] = val;
}
```
This patch changes the following intrinsics:
```
svst1uwq[_{d}]      replaced by svst1wq[_{d}]
svst1uwq_vnum[_{d}] replaced by svst1wq_vnum[_{d}]
svst1udq[_{d}]      replaced by svst1dq[_{d}]
svst1udq_vnum[_{d}] replaced by svst1dq_vnum[_{d}]
```
The 'u' is dropped from the quadword stores because the operation is
simply truncating the quadwords to 32 bits.
```
svextq_lane[_{d}] replaced by svextq[_{d}]
```
EXTQ follows the previously defined EXT intrinsics.
```
svdot[_{d}_{2}_{3}] replaced by svdot[_{d}_{2}]
```
Introduced with the latest SME2 ACLE change.
[1] https://github.com/ARM-software/acle/pull/257
There are many issues that popped up with the counted_by feature. The
patch #73730 has grown too large and approval is blocking Linux testing.
Includes reverts of:
commit 769bc11f684d ("[Clang] Implement the 'counted_by' attribute
(#68750)")
commit bc09ec696209 ("[CodeGen] Revamp counted_by calculations
(#70606)")
commit 1a09cfb2f35d ("[Clang] counted_by attr can apply only to C99
flexible array members (#72347)")
commit a76adfb992c6 ("[NFC][Clang] Refactor code to calculate flexible
array member size (#72790)")
commit d8447c78ab16 ("[Clang] Correct handling of negative and
out-of-bounds indices (#71877)")
Partial revert of commit b31cd07de5b7 ("[Clang] Regenerate test checks
(NFC)")
Closes #73168. Closes #75173.
The specialisation will not be valid when ConstantInt gains native
support for vector types.
This is largely a mechanical change but with extra attention paid to constant
folding, InstCombineVectorOps.cpp, LoopFlatten.cpp and Verifier.cpp to
remove the need to call `getIntegerType()`.
Co-authored-by: Nikita Popov <github@npopov.com>
This patch implements the builtins in Clang
and the LLVM IR intrinsics for the following:
```
// Variants are also available for:
// _s8, _s16, _u16, _s32, _u32, _s64, _u64,
// _f16, _f32, _f64
uint8x16_t svaddqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64
uint8x16_t svandqv[_u8](svbool_t pg, svuint8_t zn);
uint8x16_t sveorqv[_u8](svbool_t pg, svuint8_t zn);
uint8x16_t svorqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64
uint8x16_t svmaxqv[_u8](svbool_t pg, svuint8_t zn);
uint8x16_t svminqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for _f32, _f64
float16x8_t svmaxnmqv[_f16](svbool_t pg, svfloat16_t zn);
float16x8_t svminnmqv[_f16](svbool_t pg, svfloat16_t zn);
```
According to PR #257 [1], the reduction instructions use scalable
vectors as input and fixed-length vectors as output; therefore we
changed SVEEmitter to emit fixed-length vector types when the NEON
header (arm_neon.h) is not present.
[1] https://github.com/ARM-software/acle/pull/257
Co-authored-by: Dinar Temirbulatov <dinar.temirbulatov@arm.com>
Add builtin: 'svreinterpret_b' to cast from svcount_t to svbool_t.
Add builtin: 'svreinterpret_c' to cast from svbool_t to svcount_t.
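A hedged usage sketch (assuming the ACLE header and an SVE2p1/SME2
target; the prototypes follow the description above):
```
#include <arm_sve.h>

/* Round-trip between the predicate-as-counter type and a conventional
   predicate. */
svbool_t to_pred(svcount_t pn) { return svreinterpret_b(pn); }
svcount_t to_count(svbool_t pg) { return svreinterpret_c(pg); }
```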
Patch by: Hassnaa Hamdi <hassnaa.hamdi@arm.com>
Update all callers to pass through the Address.
For the older builtins such as `__sync_*` and MSVC `_Interlocked*`,
natural alignment of the atomic access is _assumed_. This change
preserves that behavior. It will pass through greater-than-required
alignments, however.
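For example, a minimal sketch of the assumed natural alignment (an
`int` object is naturally aligned, so the access lowers to a
4-byte-aligned atomic):
```
int counter;

/* The __sync_* builtins assume natural alignment of the accessed
   type; no explicit alignment argument exists. */
int bump(void) { return __sync_fetch_and_add(&counter, 1); }
```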
Clang currently implements a set of vector rotate builtins
(__builtin_s390_verll*) in terms of platform-specific LLVM
intrinsics. To simplify the IR (and allow for common code
optimizations if applicable), this patch removes those LLVM
intrinsics and implements the builtins in terms of the
platform-independent funnel shift intrinsics instead.
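A hedged sketch of the equivalence being relied on: a rotate is a
funnel shift with both inputs equal, which is what `llvm.fshl`/
`llvm.fshr` compute per element:
```
/* rotl(x, n) == fshl(x, x, n); the vector rotate builtins now lower
   to the vector form of this pattern. */
unsigned rotl32(unsigned x, unsigned n) {
  n &= 31;
  return n ? (x << n) | (x >> (32 - n)) : x;
}
```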
Also, fix the prototype of the __builtin_s390_verll*
builtins for full compatibility with GCC.
Information about code object version can be configured by the user for
AMD GPU target and it needs to be placed in LLVM IR generated by Flang.
Information about code object version in MLIR generated by the parser
can be reused by other tools. There is no need to specify extra flags if
we want to invoke MLIR tools (like fir-opt) separately.
Changes in comparison to a8ac93:
* added information about required targets for test
flang/test/Driver/driver-help.f90
GCC returns 0 for a negative index on an array in a structure. It also
returns 0 for an array index that goes beyond the extent of the array.
In addition, __bdos on a pointer to a struct field returns that field's
size, not that size plus the rest of the struct, unless it's the first
field in the struct.
```
struct s {
  int count;
  char dummy;
  int array[] __attribute__((counted_by(count)));
};

struct s *p = malloc(...);
p->count = 10;
```
A __bdos on the elements of p returns:
```
__bdos(p, 0) == 30
__bdos(p->array, 0) == 10
__bdos(&p->array[0], 0) == 10
__bdos(&p->array[-1], 0) == 0
__bdos(&p->array[42], 0) == 0
```
Also perform some refactoring, putting the "counted_by" calculations in
their own function.
The svldr_vnum and svstr_vnum builtins always modify the base register
and tile slice, and provide immediate offsets of zero, even when the
offset provided to the builtin is an immediate. This patch optimises
the output of the builtins when the offset is an immediate, passing it
directly to the instruction so that the base register and tile slice
updates are no longer needed.
When emitting LLVM IR for gather loads/scatter stores, the predicate
parameter is cast to a type that depends on the loaded, resp. stored,
type. That's correct for operations where we have a predicate per lane;
however, it is not correct for quadword loads and stores (`LD1Q`,
`ST1Q`), where the predicate is per 128-bit chunk, independent of the
ACLE intrinsic type.
This can be handled universally by casting to the corresponding
parameter type of the intrinsic. The intrinsic itself should be defined
in a way that enforces the relations between parameter types.
Add clang builtins for the new tied wmma intrinsics. These variations
tie the destination accumulator matrix to the input accumulator matrix.
See https://github.com/llvm/llvm-project/pull/69903 for context.
This reverts commit e8fe4de64ffb84924c41e54116a04570046eed74.
memcpy/memmove instrumentation for -fsanitize=alignment has been tested
on a huge code base. There were some cleanups, but the number does not
justify a workaround.