llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-05-03 20:36:05 +00:00

Author	SHA1	Message	Date
Akira Hatanaka	d9a685a9dd	[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#86721 ) To authenticate pointers, CodeGen needs access to the key and discriminators that were used to sign the pointer. That information is sometimes known from the context, but not always, which is why `Address` needs to hold that information. This patch adds methods and data members to `Address`, which will be needed in subsequent patches to authenticate signed pointers, and uses the newly added methods throughout CodeGen. Although this patch isn't strictly NFC as it causes CodeGen to use different code paths in some cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any changes in functionality as it doesn't add any information needed for authentication. In addition to the changes mentioned above, this patch introduces class `RawAddress`, which contains a pointer that we know is unsigned, and adds several new functions for creating `Address` and `LValue` objects. This reapplies 8bd1f9116aab879183f34707e6d21c7051d083b6. The commit broke msan bots because LValue::IsKnownNonNull was uninitialized.	2024-03-27 12:24:49 -07:00
Alex Voicu	ab7dba233a	[CodeGen][LLVM] Make the `va_list` related intrinsics generic. (#85460 ) Currently, the builtins used for implementing `va_list` handling unconditionally take their arguments as unqualified `ptr`s i.e. pointers to AS 0. This does not work for targets where the default AS is not 0 or AS 0 is not a viable AS (for example, a target might choose 0 to represent the constant address space). This patch changes the builtins' signature to take generic `anyptr` args, which corrects this issue. It is noisy due to the number of tests affected. A test for an upstream target which does not use 0 as its default AS (SPIRV for HIP device compilations) is added as well.	2024-03-27 11:41:34 +00:00
Changpeng Fang	d023995ae2	AMDGPU: Simplify EmitAMDGPUBuiltinExpr for load transposes, NFC (#86707 ) We should not manually get the types of the loading data. Instead, we can get the types from the intrinsics directly.	2024-03-26 17:51:03 -07:00
Akira Hatanaka	b311756450	Revert "[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#67454 )" (#86674 ) This reverts commit 8bd1f9116aab879183f34707e6d21c7051d083b6. It appears that the commit broke msan bots.	2024-03-26 07:37:57 -07:00
Akira Hatanaka	8bd1f9116a	[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#67454 ) To authenticate pointers, CodeGen needs access to the key and discriminators that were used to sign the pointer. That information is sometimes known from the context, but not always, which is why `Address` needs to hold that information. This patch adds methods and data members to `Address`, which will be needed in subsequent patches to authenticate signed pointers, and uses the newly added methods throughout CodeGen. Although this patch isn't strictly NFC as it causes CodeGen to use different code paths in some cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any changes in functionality as it doesn't add any information needed for authentication. In addition to the changes mentioned above, this patch introduces class `RawAddress`, which contains a pointer that we know is unsigned, and adds several new functions for creating `Address` and `LValue` objects.	2024-03-25 18:05:42 -07:00
Changpeng Fang	350bda4419	AMDGPU: Rename intrinsics and remove f16/bf16 versions for load transpose (#86313 ) Rename the intrinsics to close to the instruction mnemonic names: Use global_load_tr_b64 and global_load_tr_b128 instead of global_load_tr. This patch also removes f16/bf16 versions of builtins/intrinsics. To simplify the design, we should avoid enumerating all possible types in implementing builtins. We can always use bitcast.	2024-03-25 16:55:22 -07:00
Farzon Lotfi	060df78cdb	[DXIL] Add Float `Dot` Intrinsic Lowering (#86071 ) Completes #83626 - `CGBuiltin.cpp` - modify `getDotProductIntrinsic` to be able to emit `dot2`, `dot3`, and `dot4` intrinsics based on element count - `IntrinsicsDirectX.td` - for floating point add `dot2`, `dot3`, and `dot4` intrinsics - `DXIL.td` - add dxilop intrinsic lowering for `dot2`, `dot3`, & `dot4`. - `DXILOpLowering.cpp` - add vector arg flattening for dot product. - `DXILOpBuilder.h` - modify `createDXILOpCall` to take a smallVector instead of an iterator - `DXILOpBuilder.cpp` - modify `createDXILOpCall` by moving the small vector up to the calling function in `DXILOpLowering.cpp`. - Moving one function up gives us access to the `CallInst` and `Function` which were needed to distinguish the dot product intrinsics and get the operands without using the iterator.	2024-03-25 18:01:46 -04:00
Changpeng Fang	3054d0dae7	AMDGPU: Rename and add bf16 support for global_load_tr builtins (#86202 ) Make the name of a clang builtin as close to the mnemonic instruction name as possible. The data type suffix may not be enough to tell what instruction the builtin is going to produce. This patch also adds bf16 support for the global_load_tr_b128 builtins.	2024-03-22 08:51:53 -07:00
OverMighty	c1c2551a28	[clang] Implement __builtin_{clzg,ctzg} (#83431 ) Fixes #83075, fixes #83076.	2024-03-21 09:33:16 -07:00
Yeoul Na	3eb9ff3095	Turn 'counted_by' into a type attribute and parse it into 'CountAttributedType' (#78000 ) In `-fbounds-safety`, bounds annotations are considered type attributes rather than declaration attributes. Constructing them as type attributes allows us to extend the attribute to apply to nested pointers, which is essential to annotate functions that involve out parameters: `void foo(int *__counted_by(*out_count) *out_buf, int *out_count)`. We introduce a new sugar type to support bounds annotated types, `CountAttributedType`. In order to maintain extra data (the bounds expression and the dependent declaration information) that is not trackable in `AttributedType` we create a new type dedicated to this functionality. This patch also extends the parsing logic to parse the `counted_by` argument as an expression, which will allow us to extend the model to support arguments beyond an identifier, e.g., `__counted_by(n + m)` in the future as specified by `-fbounds-safety`. This also adjusts `__bdos` and array-bounds sanitizer code that already uses `CountedByAttr` to check `CountAttributedType` instead to get the field referred to by the attribute.	2024-03-20 13:36:56 +09:00
Farzon Lotfi	081a66ffac	[DXIL] implement dot intrinsic lowering for integers (#85662 ) this implements part 1 of 2 for #83626 - `CGBuiltin.cpp` - modified to have separate cases for signed and unsigned integers. - `SemaChecking.cpp` - modified to prevent the generation of a double dot product intrinsic if the builtin were to be called directly. - `IntrinsicsDirectX.td` creation of the signed and unsigned dot intrinsics needed for instruction expansion. - `DXILIntrinsicExpansion.cpp` - handle instruction expansion cases for integer dot product.	2024-03-19 12:03:43 -04:00
Farzon Lotfi	8386a388bd	[HLSL] implement `clamp` intrinsic (#85424 ) closes #70071 - `CGBuiltin.cpp` - Add the unsigned\generic clamp intrinsic emitter. - `IntrinsicsDirectX.td` - add the `dx.clamp` & `dx.uclamp` intrinsics - `DXILIntrinsicExpansion.cpp` - add the `clamp` instruction expansion while maintaining vector form. - `SemaChecking.cpp` - Add `clamp` builtin Sema Checks. - `Builtins.td` - add a `clamp` builtin - `hlsl_intrinsics.h` - add the `clamp` api Why `clamp` as instruction expansion for DXIL? 1. SPIR-V has a GLSL `clamp` extension via: - [FClamp](https://registry.khronos.org/SPIR-V/specs/1.0/GLSL.std.450.html#FClamp) - [UClamp](https://registry.khronos.org/SPIR-V/specs/1.0/GLSL.std.450.html#UClamp) - [SClamp](https://registry.khronos.org/SPIR-V/specs/1.0/GLSL.std.450.html#SClamp) 2. Further, `clamp` lowers to `min(max( x, min_range ), max_range)`, for which we have float, signed, & unsigned DXIL ops.	2024-03-15 20:57:08 -04:00
Ahmed Bougacha	0481f049c3	[AArch64][PAC] Support ptrauth builtins and -fptrauth-intrinsics. (#65996 ) This defines the basic set of pointer authentication clang builtins (provided in a new header, ptrauth.h), with diagnostics and IRGen support. The availability of the builtins is gated on a new flag, `-fptrauth-intrinsics`. Note that this only includes the basic intrinsics, and notably excludes `ptrauth_sign_constant`, `ptrauth_type_discriminator`, and `ptrauth_string_discriminator`, which need extra logic to be fully supported. This also introduces clang/docs/PointerAuthentication.rst, which describes the ptrauth model in general, in addition to these builtins. Co-Authored-By: Akira Hatanaka <ahatanaka@apple.com> Co-Authored-By: John McCall <rjmccall@apple.com>	2024-03-15 14:17:21 -07:00
Farzon Lotfi	de1a97db39	[DXIL] `exp`, `any`, `lerp`, & `rcp` Intrinsic Lowering (#84526 ) This change implements lowering for #70076, #70100, #70072, & #70102 `CGBuiltin.cpp` - simplify `lerp` intrinsic `IntrinsicsDirectX.td` - simplify `lerp` intrinsic `SemaChecking.cpp` - remove unnecessary check `DXILIntrinsicExpansion.*` - add intrinsic to instruction expansion cases `DXILOpLowering.cpp` - make sure `DXILIntrinsicExpansion` happens first `DirectX.h` - changes to support new pass `DirectXTargetMachine.cpp` - changes to support new pass Why `any`, and `lerp` as instruction expansion just for DXIL? - In SPIR-V there is an [OpAny](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpAny) - SPIR-V has a GLSL lerp extension via [FMix](https://registry.khronos.org/SPIR-V/specs/1.0/GLSL.std.450.html#FMix) Why `exp` instruction expansion? - We have an `exp2` opcode and `exp` reuses that opcode. So instruction expansion is a convenient way to do preprocessing. - Further, SPIR-V has a GLSL exp extension via [Exp](https://registry.khronos.org/SPIR-V/specs/1.0/GLSL.std.450.html#Exp) and [Exp2](https://registry.khronos.org/SPIR-V/specs/1.0/GLSL.std.450.html#Exp2) Why `rcp` as instruction expansion? This one is a bit of the odd man out and might have to move to `cgbuiltins` when we better understand SPIRV requirements. However I included it because it seems like [fast math mode has an AllowRecip flag](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_fp_fast_math_mode) which lets you compute the reciprocal without performing the division. We don't have that in DXIL so thought to include it.	2024-03-14 20:25:57 -04:00
Farzon Lotfi	d192b64370	[HLSL] implement the `isinf` intrinsic (#84927 ) This change implements part 1 of 2 for #70095 - `hlsl_intrinsics.h` - add the `isinf` api - `Builtins.td` - add an hlsl builtin for `isinf`. - `CGBuiltin.cpp` add the ir generation for `isinf` intrinsic. - `SemaChecking.cpp` - add non-math elementwise checks because this returns a bool. - `IntrinsicsDirectX.td` - add an `isinf` intrinsic. `DXIL.td` lowering is left, but changes need to be made there before we can support this case.	2024-03-14 18:07:48 -04:00
Farzon Lotfi	8f9ee39c58	[HLSL] Implement `rsqrt` intrinsic (#84820 ) This change implements #70074 - `hlsl_intrinsics.h` - add the `rsqrt` api - `DXIL.td` add the llvm intrinsic to DXIL op lowering map. - `Builtins.td` - add an hlsl builtin for rsqrt. - `CGBuiltin.cpp` add the ir generation for the rsqrt intrinsic. - `SemaChecking.cpp` - reuse the one arg float only checks. - `IntrinsicsDirectX.td` -add an `rsqrt` intrinsic.	2024-03-14 16:49:33 -04:00
Tim Northover	4299c727e4	AArch64: add __builtin_arm_trap It's useful to provide an indicator code with the trap, which the generic __builtin_trap can't do. asm("brk #N") is an option, but following that with a __builtin_unreachable() leads to two traps when the compiler doesn't know the block can't return. So compiler support like this is useful.	2024-03-14 11:32:44 +00:00
Sven van Haastregt	c7f1a987a6	[OpenCL] Elaborate about BIenqueue_kernel expansion; NFC	2024-03-12 12:53:22 +00:00
Joseph Huber	1fc5e50ceb	[AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (#83906 ) Summary: This patch implements the LLVM floating point environment control intrinsics and also exposes it through clang. We encode the floating point environment as a 64-bit value that simply concatenates the values of the mode registers and the current trap status. We only fetch the bits relevant for floating point instructions. That is, rounding mode, denormalization mode, ieee, dx10 clamp, debug, enabled traps, f16 overflow, and active exceptions.	2024-03-06 08:11:54 -06:00
Farzon Lotfi	5a5266248d	[HLSL] implement the rcp intrinsic (#83857 ) This PR implements the frontend for llvm#70100 This PR is part 1 of 2. Part 2 requires an intrinsic to instructions lowering. - `Builtins.td` - add an `rcp` builtin - `CGBuiltin.cpp` - add the builtin to intrinsic lowering - `hlsl_intrinsics.h` - add the `rcp` api - `SemaChecking.cpp` - reuse frac's sema checks - `IntrinsicsDirectX.td` - add the llvm intrinsic	2024-03-05 16:11:13 -05:00
Farzon Lotfi	2807ea6b80	[HLSL] implement the any intrinsic (#83903 ) This PR implements the frontend for #70076 This PR is part 1 of 2. Part 2 requires an intrinsic to instructions lowering. - `Builtins.td` - add an `any` builtin - `CGBuiltin.cpp` add the builtin to intrinsic lowering - `hlsl_basic_types.h` - add the `bool` vectors since that is an input for any - `hlsl_intrinsics.h` - add the `any` api - `SemaChecking.cpp` - add `any` builtin checking - `IntrinsicsDirectX.td` - add the llvm intrinsic	2024-03-05 12:46:01 -05:00
Farzon Lotfi	643b31dbe8	[HLSL] implement `mad` intrinsic (#83826 ) This change implements #83736 The dot product lowering needs a ternary multiply-add operation. DXIL has three mad opcodes for `fmad`(46), `imad`(48), and `umad`(49). Dot product in DXIL only uses `imad`\ `umad`, but for completeness and because the hlsl `mad` intrinsic requires it `fmad` was also included. Two new intrinsics needed to be created to complete this change. The `fmad` case is already supported by llvm via the `fmuladd` intrinsic. - `hlsl_intrinsics.h` - exposed mad api call. - `Builtins.td` - exposed a `mad` builtin. - `Sema.h` - make `tertiary` calls check for float types optional. - `CGBuiltin.cpp` - pick the intrinsic for signed\unsigned & float also reuse `int_fmuladd`. - `SemaChecking.cpp` - type checks for `__builtin_hlsl_mad`. - `IntrinsicsDirectX.td` create the two new intrinsics for `imad`\`umad`. - `DXIL.td` - create the llvm intrinsic to `DXIL` opcode mapping. --------- Co-authored-by: Farzon Lotfi <farzon@farzon.com>	2024-03-05 12:23:26 -05:00
Qiu Chaofan	906580bad3	[PowerPC] Add intrinsics for rldimi/rlwimi/rlwnm (#82968 ) These builtins are already there in Clang, however current codegen may produce suboptimal results due to their complex behavior. Implement them as intrinsics to ensure expected instructions are emitted.	2024-03-04 21:13:59 +08:00
Pavel Iliin	185b1df1b1	[X86][AArch64][PowerPC] __builtin_cpu_supports accepts unknown options. (#83515 ) The patch fixes https://github.com/llvm/llvm-project/issues/83407 by modifying __builtin_cpu_supports behaviour so that it returns false and issues a warning if unsupported feature names are provided in the parameter. __builtin_cpu_supports is target independent, but currently supported by X86, AArch64 and PowerPC only.	2024-03-01 10:12:19 +00:00
Farzon Lotfi	489eadd142	[HLSL] Implementation of the frac intrinsic (#83315 ) This change implements the frontend for #70099 Builtins.td - add the frac builtin CGBuiltin.cpp - add the builtin to DirectX intrinsic mapping hlsl_intrinsics.h - add the frac api SemaChecking.cpp - add type checks for builtin IntrinsicsDirectX.td - add the frac intrinsic The backend changes for this are going to be very simple: `f309a0eb55` They were not included because llvm/lib/Target/DirectX/DXIL.td is going through a major refactor.	2024-02-29 10:40:38 -08:00
Farzon Lotfi	e60ebbd000	[HLSL] implementation of lerp intrinsic (#83077 ) This is the start of implementing the lerp intrinsic https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-lerp Builtins.td - defines the builtin hlsl_intrinsics.h - defines the lerp api DiagnosticSemaKinds.td - needed a new error to be inclusive for more than two operands. CGBuiltin.cpp - add the lerp intrinsic lowering SemaChecking.cpp - type checks for lerp builtin IntrinsicsDirectX.td - define the lerp intrinsic this change implements the first half of #70102 Co-authored-by: Xiang Li <python3kgae@outlook.com>	2024-02-29 07:01:36 -08:00
OverMighty	21d83324fb	[clang] Implement __builtin_popcountg (#82359 ) Fixes #82058.	2024-02-26 13:59:42 -08:00
Farzon Lotfi	82acec15af	[HLSL] Implementation of dot intrinsic (#81190 ) This change implements https://github.com/llvm/llvm-project/issues/70073 HLSL has a dot intrinsic defined here: https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-dot The intrinsic itself is defined as a HLSL_LANG LangBuiltin in Builtins.td. This is used to associate all the dot product typedefs defined in hlsl_intrinsics.h with a single intrinsic check in CGBuiltin.cpp & SemaChecking.cpp. In IntrinsicsDirectX.td we define the llvmIR for the dot product. A few goals were in mind for this IR. First it should operate on only vectors. Second the return type should be the vector element type. Third the second parameter vector should be of the same size as the first parameter. Finally `a dot b` should be the same as `b dot a`. In CGBuiltin.cpp hlsl has built on top of existing clang intrinsics via EmitBuiltinExpr. Dot product though is a language specific intrinsic and so is guarded behind getLangOpts().HLSL. The call chain looks like this: EmitBuiltinExpr -> EmitHLSLBuiltinExpr. EmitHLSLBuiltinExpr for dot product intrinsics makes a distinction between vectors and scalars. This is because HLSL supports dot product on scalars which simplifies down to multiply. Sema.h & SemaChecking.cpp saw the addition of CheckHLSLBuiltinFunctionCall, a language specific semantic validation that can be expanded for other hlsl specific intrinsics. Fixes #70073	2024-02-26 10:08:59 -06:00
Pavel Iliin	568babab7e	[AArch64] Implement __builtin_cpu_supports, compiler-rt tests. (#82378 ) The patch complements https://github.com/llvm/llvm-project/pull/68919 and adds AArch64 support for builtin `__builtin_cpu_supports("feature1+...+featureN")` which returns true if all CPU features specified in the argument are detected. Also compiler-rt aarch64 native run tests for the feature detection mechanism were added and the 'cpu_model' check was fixed after its refactor merged in https://github.com/llvm/llvm-project/pull/75635 Original RFC was https://reviews.llvm.org/D153153	2024-02-22 23:33:54 +00:00
zhijian lin	5b8e5604c2	[AIX] Lower intrinsic __builtin_cpu_is into AIX platform-specific code. (#80069 ) On AIX OS, __builtin_cpu_is() references the runtime external variable _system_configuration from /usr/include/sys/systemcfg.h. ref issue: https://github.com/llvm/llvm-project/issues/80042	2024-02-22 08:46:08 -05:00
Pierrick Bouvier	0ea64ad88a	[COFF][Aarch64] Add _InterlockedAdd64 intrinsic (#81849 ) Found when compiling openssl master branch using clang-cl. This commit introduces usage of InterlockedAdd64: `d0e1a0ae70` https://learn.microsoft.com/en-us/cpp/intrinsics/interlockedadd-intrinsic-functions	2024-02-16 13:20:08 +02:00
Shilei Tian	630f82ec0c	[Clang][CodeGen] Loosen the cast check when emitting builtins (#81669 ) This patch loosens the cast check (`canLosslesslyBitCastTo`) and leaves it to the one inside `CreateBitCast`. It seems too conservative for the use case here.	2024-02-14 12:59:59 -05:00
Joseph Huber	11fcae69db	[LLVM] Add `__builtin_readsteadycounter` intrinsic and builtin for realtime clocks (#81331 ) Summary: This patch adds a new intrinsic and builtin function mirroring the existing `__builtin_readcyclecounter`. The difference is that this implementation targets a separate counter that some targets have which returns a fixed frequency clock that can be used to determine elapsed time, this is different compared to the cycle counter which often has variable frequency. This patch only adds support for the NVPTX and AMDGPU targets. This is done as a new and separate builtin rather than an argument to `readcyclecounter` to avoid needing to change existing code and to make the separation more explicit.	2024-02-13 10:06:25 -06:00
Shilei Tian	c4b0dfcc99	[Clang] Fix a non-effective assertion (#81083 ) `PTy` here is literally `FTy->getParamType(i)`, which makes this assertion not work as expected.	2024-02-08 09:44:42 -05:00
Mészáros Gergely	5942868a21	[clang][AMDGPU][CUDA] Handle __builtin_printf for device printf (#68515 ) Previously `__builtin_printf` would result in emitting a call to `printf`, even though directly calling `printf` was translated. Ref: #68478	2024-02-05 23:23:13 +05:30
Pierre van Houtryve	500846d2f5	[AMDGPU] Introduce Code Object V6 (#76954 ) Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same as V5 except a new "generic version" flag can be present in EFLAGS. This is related to new generic targets that'll be added in a follow-up patch. It's also likely V6 will have new changes (possibly new metadata entries) added later. Docs changes are part of the follow-up patch #76955	2024-02-05 08:19:53 +01:00
Sander de Smalen	d313614b60	[AArch64] Replace LLVM IR function attributes for PSTATE.ZA. (#79166 ) Since https://github.com/ARM-software/acle/pull/276 the ACLE defines attributes to better describe the use of a given SME state. Previously the attributes merely described the possibility of it being 'shared' or 'preserved', whereas the new attributes have more semantics and also describe how the data flows through the program. For ZT0 we already had to add new LLVM IR attributes: * aarch64_new_zt0 * aarch64_in_zt0 * aarch64_out_zt0 * aarch64_inout_zt0 * aarch64_preserves_zt0 We have now done the same for ZA, such that we add: * aarch64_new_za (previously `aarch64_pstate_za_new`) * aarch64_in_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_out_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_inout_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_preserves_za (previously `aarch64_pstate_za_shared, aarch64_pstate_za_preserved`) This explicitly removes 'pstate' from the name, because with SME2 and the new ACLE attributes there is a difference between "sharing ZA" (sharing the ZA matrix register with the caller) and "sharing PSTATE.ZA" (sharing either the ZA or ZT0 register, both part of PSTATE.ZA with the caller).	2024-02-01 13:37:37 +00:00
Nemanja Ivanovic	67c1c1dbb6	[PowerPC][X86] Make cpu id builtins target independent and lower for PPC (#68919 ) Make __builtin_cpu_{init\|supports\|is} target independent and provide an opt-in query for targets that want to support it. Each target is still responsible for their specific lowering/code-gen. Also provide code-gen for PowerPC. I originally proposed this in https://reviews.llvm.org/D152914 and this addresses the comments I received there. --------- Co-authored-by: Nemanja Ivanovic <nemanjaivanovic@nemanjas-air.kpn> Co-authored-by: Nemanja Ivanovic <nemanja@synopsys.com>	2024-01-26 11:24:50 -05:00
Vojislav Tomasevic	2a77d92e2e	[clang] Incorrect IR involving the use of bcopy (#79298 ) This patch addresses the issue regarding the call of bcopy function in a conditional expression. It is analogous to the already accepted patch which deals with the same problem, just regarding the bzero function [0]. Here is the testcase which illustrates the issue: ``` void bcopy(const void *, void *, unsigned long); void foo(void); void test_bcopy() { char dst[20]; char src[20]; int _sz = 20, len = 20; return (_sz ? ((_sz >= len) ? bcopy(src, dst, len) : foo()) : bcopy(src, dst, len)); } ``` When processing it with clang, following issue occurs: Instruction does not dominate all uses! %arraydecay2 = getelementptr inbounds [20 x i8], ptr %dst, i64 0, i64 0, !dbg !38 %cond = phi ptr [ %arraydecay2, %cond.end ], [ %arraydecay5, %cond.false3 ], !dbg !33 fatal error: error in backend: Broken module found, compilation aborted! This happens because an incorrect phi node is created. It is created because bcopy function call is lowered to the call of llvm.memmove intrinsic and function memmove returns void *. Since llvm.memmove is called in two places in the same return statement, clang creates a phi node in the final basic block for the return value and that phi node is incorrect. However, bcopy function should return void in the first place, so this phi node is unnecessary. This is what this patch addresses. An appropriate test is also added and no existing tests fail when applying this patch. Also, this crash only happens when LLVM is configured with -DLLVM_ENABLE_ASSERTIONS=On option. [0] https://reviews.llvm.org/D39746	2024-01-24 09:39:36 -08:00
Mirko Brkušanin	7fdf608cef	[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795 ) Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com> Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>	2024-01-24 13:43:07 +01:00
Matthew Devereau	6ba62f4f25	[AArch64][SME2] Refine fcvtu/fcvts/scvtf/ucvtf (#77947 ) Rename intrinsics for fcvtu to fcvtzu and fcvts to fcvtzs. Use llvm_anyvector_ty for both multi vector returns and operands, therefore the return and operands can be specified in the intrinsic call, e.g. @llvm.aarch64.sve.scvtf.x4.nxv4f32.nxv4i32	2024-01-22 15:11:49 +00:00
Piotr Sobczak	57f6a3f7ea	[AMDGPU] Add global_load_tr for GFX12 (#77772 ) Support new amdgcn_global_load_tr instructions for load with transpose. * MC layer support for GLOBAL_LOAD_TR_B64/GLOBAL_LOAD_TR_B128 * Intrinsic int_amdgcn_global_load_tr * Clang builtins amdgcn_global_load_tr*	2024-01-18 15:14:42 +01:00
Mikael Holmen	e6bd9835d9	[clang][CodeGen] Fix gcc warning about unused variable [NFC] Without the fix gcc warned with ../../clang/lib/CodeGen/CGBuiltin.cpp:1022:19: warning: unused variable 'DRE' [-Wunused-variable] 1022 \| if (const auto *DRE = dyn_cast<DeclRefExpr>(Base)) { \| ^~~ Fix the warning by removing the unused variable and change the "dyn_cast" to "isa".	2024-01-17 13:23:08 +01:00
Bill Wendling	00b6d032a2	[Clang] Implement the 'counted_by' attribute (#76348 ) The 'counted_by' attribute is used on flexible array members. The argument for the attribute is the name of the field member holding the count of elements in the flexible array. This information is used to improve the results of the array bound sanitizer and the '__builtin_dynamic_object_size' builtin. The 'count' field member must be within the same non-anonymous, enclosing struct as the flexible array member. For example: ``` struct bar; struct foo { int count; struct inner { struct { int count; /* The 'count' referenced by 'counted_by' */ }; struct { /* ... */ struct bar *array[] __attribute__((counted_by(count))); }; } baz; }; ``` This example specifies that the flexible array member 'array' has the number of elements allocated for it in 'count': ``` struct bar; struct foo { size_t count; /* ... */ struct bar *array[] __attribute__((counted_by(count))); }; ``` This establishes a relationship between 'array' and 'count'; specifically that 'p->array' must have at least 'p->count' number of elements available. It's the user's responsibility to ensure that this relationship is maintained throughout changes to the structure. In the following, the allocated array erroneously has fewer elements than what's specified by 'p->count'. 
This would result in an out-of-bounds access not being detected: ``` struct foo *p; void foo_alloc(size_t count) { p = malloc(MAX(sizeof(struct foo), offsetof(struct foo, array[0]) + count * sizeof(struct bar *))); p->count = count + 42; } ``` The next example updates 'p->count', breaking the relationship requirement that 'p->array' must have at least 'p->count' number of elements available: ``` void use_foo(int index, int val) { p->count += 42; p->array[index] = val; /* The sanitizer can't properly check this access */ } ``` In this example, an update to 'p->count' maintains the relationship requirement: ``` void use_foo(int index, int val) { if (p->count == 0) return; --p->count; p->array[index] = val; } ```	2024-01-16 14:26:12 -08:00
Craig Topper	142f270c27	Recommit "[AST] Use APIntStorage to fix memory leak in EnumConstantDecl. (#78311 )" With lldb build fix. Original message: EnumConstantDecl is allocated by the ASTContext allocator so the destructor is never called. This patch takes a similar approach to IntegerLiteral by using APIntStorage to allocate large APSInts using the ASTContext allocator as well. The downside is that an additional heap allocation and copy of the data needs to be made when calling getInitValue if the APSInt is large. Fixes #78160.	2024-01-16 13:52:17 -08:00
Craig Topper	f3d534c425	Revert "[AST] Use APIntStorage to fix memory leak in EnumConstantDecl. (#78311 )" This reverts commit 4737959d91fab7673b1bb642f88658bb2a24d723. Missed an lldb update.	2024-01-16 12:39:47 -08:00
Craig Topper	4737959d91	[AST] Use APIntStorage to fix memory leak in EnumConstantDecl. (#78311 ) EnumConstantDecl is allocated by the ASTContext allocator so the destructor is never called. This patch takes a similar approach to IntegerLiteral by using APIntStorage to allocate large APSInts using the ASTContext allocator as well. The downside is that an additional heap allocation and copy of the data needs to be made when calling getInitValue if the APSInt is large. Fixes #78160.	2024-01-16 12:10:38 -08:00
Rashmi Mudduluru	a511c1a9ec	Revert "[Clang] Implement the 'counted_by' attribute (#76348 )" This reverts commit 164f85db876e61cf4a3c34493ed11e8f5820f968.	2024-01-15 18:37:52 -08:00
Bill Wendling	164f85db87	[Clang] Implement the 'counted_by' attribute (#76348 ) The 'counted_by' attribute is used on flexible array members. The argument for the attribute is the name of the field member holding the count of elements in the flexible array. This information is used to improve the results of the array bound sanitizer and the '__builtin_dynamic_object_size' builtin. The 'count' field member must be within the same non-anonymous, enclosing struct as the flexible array member. For example: ``` struct bar; struct foo { int count; struct inner { struct { int count; /* The 'count' referenced by 'counted_by' */ }; struct { /* ... */ struct bar *array[] __attribute__((counted_by(count))); }; } baz; }; ``` This example specifies that the flexible array member 'array' has the number of elements allocated for it in 'count': ``` struct bar; struct foo { size_t count; /* ... */ struct bar *array[] __attribute__((counted_by(count))); }; ``` This establishes a relationship between 'array' and 'count'; specifically that 'p->array' must have at least 'p->count' number of elements available. It's the user's responsibility to ensure that this relationship is maintained throughout changes to the structure. In the following, the allocated array erroneously has fewer elements than what's specified by 'p->count'. 
This would result in an out-of-bounds access not being detected: ``` struct foo *p; void foo_alloc(size_t count) { p = malloc(MAX(sizeof(struct foo), offsetof(struct foo, array[0]) + count * sizeof(struct bar *))); p->count = count + 42; } ``` The next example updates 'p->count', breaking the relationship requirement that 'p->array' must have at least 'p->count' number of elements available: ``` void use_foo(int index, int val) { p->count += 42; p->array[index] = val; /* The sanitizer can't properly check this access */ } ``` In this example, an update to 'p->count' maintains the relationship requirement: ``` void use_foo(int index, int val) { if (p->count == 0) return; --p->count; p->array[index] = val; } ```	2024-01-10 22:20:31 -08:00
Nico Weber	2dce77201c	Revert "[Clang] Implement the 'counted_by' attribute (#76348 )" This reverts commit fefdef808c230c79dca2eb504490ad0f17a765a5. Breaks check-clang, see https://github.com/llvm/llvm-project/pull/76348#issuecomment-1886029515 Also revert follow-on "[Clang] Update 'counted_by' documentation" This reverts commit 4a3fb9ce27dda17e97341f28005a28836c909cfc.	2024-01-10 21:05:19 -05:00
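
The 'counted_by' commits above describe a relationship that the allocation code has to maintain: 'p->array' must hold at least 'p->count' elements. The following is a minimal sketch of an allocation that keeps that relationship, assuming an `int` flexible array rather than the `struct bar *` elements used in the commit message. The `COUNTED_BY` macro and the helpers `foo_alloc` and `foo_set` are hypothetical names introduced for illustration and do not come from the patch; the macro expands to nothing on compilers that lack the attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper macro: use the attribute only where the compiler
 * reports support for it, otherwise expand to nothing. */
#if defined(__has_attribute)
#  if __has_attribute(counted_by)
#    define COUNTED_BY(member) __attribute__((counted_by(member)))
#  endif
#endif
#ifndef COUNTED_BY
#  define COUNTED_BY(member)
#endif

struct foo {
  size_t count;                   /* number of elements in 'array' */
  int array[] COUNTED_BY(count);  /* flexible array member */
};

/* Allocate room for exactly 'count' elements and record that same count,
 * so 'array' never has fewer elements than 'count' claims. */
struct foo *foo_alloc(size_t count) {
  size_t bytes = offsetof(struct foo, array) + count * sizeof(int);
  if (bytes < sizeof(struct foo))
    bytes = sizeof(struct foo);
  struct foo *p = calloc(1, bytes);
  if (p == NULL)
    return NULL;
  p->count = count;
  return p;
}

/* Store 'val' at 'index' only when the index is inside the counted range.
 * Returns 1 on success and 0 when the index is out of bounds. */
int foo_set(struct foo *p, size_t index, int val) {
  assert(p != NULL);
  if (index >= p->count)
    return 0;
  p->array[index] = val;
  return 1;
}
```

Unlike the erroneous `foo_alloc` quoted in the commit message, this version stores the same `count` that sized the allocation, so a bounds check against `p->count` stays meaningful.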

1892 Commits