To authenticate pointers, CodeGen needs access to the key and
discriminators that were used to sign the pointer. That information is
sometimes known from the context, but not always, which is why `Address`
needs to hold that information.
This patch adds methods and data members to `Address`, which will be
needed in subsequent patches to authenticate signed pointers, and uses
the newly added methods throughout CodeGen. Although this patch isn't
strictly NFC as it causes CodeGen to use different code paths in some
cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any
changes in functionality as it doesn't add any information needed for
authentication.
In addition to the changes mentioned above, this patch introduces class
`RawAddress`, which contains a pointer that we know is unsigned, and
adds several new functions for creating `Address` and `LValue` objects.
This reapplies 8bd1f9116aab879183f34707e6d21c7051d083b6. The commit
broke msan bots because LValue::IsKnownNonNull was uninitialized.
Currently, the builtins used for implementing `va_list` handling
unconditionally take their arguments as unqualified `ptr`s, i.e. pointers
to AS 0. This does not work for targets where the default AS is not 0 or
AS 0 is not a viable AS (for example, a target might choose 0 to
represent the constant address space). This patch changes the builtins'
signature to take generic `anyptr` args, which corrects this issue. It
is noisy due to the number of tests affected. A test for an upstream
target which does not use 0 as its default AS (SPIRV for HIP device
compilations) is added as well.
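For context, a minimal C sketch of the kind of source code that reaches these builtins. The `va_list` object is a local variable, and its address is what the start/end operations receive, which is why the address space of that pointer matters on targets where locals do not live in AS 0. The function name `sum_ints` is hypothetical and only used for illustration.

```c
#include <assert.h>
#include <stdarg.h>

/* Sum `count` int arguments (hypothetical helper, for illustration only).
   va_start/va_arg/va_end are the standard C macros; the compiler
   implements them with its va_list builtins. */
static int sum_ints(int count, ...) {
    assert(count >= 0);
    va_list ap;              /* local object whose address the builtins take */
    va_start(ap, count);
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

Calling `sum_ints(3, 1, 2, 3)` walks the three variadic arguments and returns their sum.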
To authenticate pointers, CodeGen needs access to the key and
discriminators that were used to sign the pointer. That information is
sometimes known from the context, but not always, which is why `Address`
needs to hold that information.
This patch adds methods and data members to `Address`, which will be
needed in subsequent patches to authenticate signed pointers, and uses
the newly added methods throughout CodeGen. Although this patch isn't
strictly NFC as it causes CodeGen to use different code paths in some
cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any
changes in functionality as it doesn't add any information needed for
authentication.
In addition to the changes mentioned above, this patch introduces class
`RawAddress`, which contains a pointer that we know is unsigned, and
adds several new functions for creating `Address` and `LValue` objects.
Rename the intrinsics to be close to the instruction mnemonic names:
Use global_load_tr_b64 and global_load_tr_b128 instead of
global_load_tr.
This patch also removes f16/bf16 versions of builtins/intrinsics. To
simplify the design, we should avoid enumerating all possible types in
implementing builtins. We can always use bitcast.
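As an illustration of the "use bitcast" point, here is a host-side C sketch, not the builtin itself: lanes arrive as plain 16-bit integers and are reinterpreted by the caller. It relies on the fact that a bf16 value is the upper 16 bits of an IEEE binary32 float, and it assumes `float` is 32 bits wide. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret 32 bits as a float without changing them (a "bitcast").
   Assumes float is IEEE binary32. */
static float bits_to_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* A bf16 value is the upper half of a binary32 float, so widening one
   lane is a shift followed by a bitcast. */
static float bf16_lane_to_float(int16_t lane) {
    return bits_to_float((uint32_t)(uint16_t)lane << 16);
}

/* Widen `n` integer-typed lanes into floats. */
static void widen_bf16_lanes(const int16_t *lanes, float *out, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        out[i] = bf16_lane_to_float(lanes[i]);
}
```

The integer-typed result carries the same bits either way, so only the caller needs to know which 16-bit format the payload uses.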
Completes #83626
- `CGBuiltin.cpp` - modify `getDotProductIntrinsic` to be able to emit
`dot2`, `dot3`, and `dot4` intrinsics based on element count
- `IntrinsicsDirectX.td` - for floating point add `dot2`, `dot3`, and
`dot4` intrinsics
- `DXIL.td` - add dxilop intrinsic lowering for `dot2`, `dot3`, &
`dot4`.
- `DXILOpLowering.cpp` - add vector arg flattening for dot product.
- `DXILOpBuilder.h` - modify `createDXILOpCall` to take a `SmallVector`
instead of an iterator
- `DXILOpBuilder.cpp` - modify `createDXILOpCall` by moving the small
vector up to the calling function in `DXILOpLowering.cpp`.
- Moving one function up gives us access to the `CallInst` and
`Function` which were needed to distinguish the dot product intrinsics
and get the operands without using the iterator.
Make the name of a clang builtin as close to the mnemonic instruction
name as possible. The data type suffix may not be enough to tell what
instruction the builtin is going to produce.
This patch also adds bf16 support for the global_load_tr_b128 builtins.
In `-fbounds-safety`, bounds annotations are considered type attributes
rather than declaration attributes. Constructing them as type attributes
allows us to extend the attribute to apply to nested pointers, which is
essential to annotate functions that involve out parameters: `void
foo(int *__counted_by(*out_count) *out_buf, int *out_count)`.
We introduce a new sugar type to support bounds annotated types,
`CountAttributedType`. In order to maintain extra data (the bounds
expression and the dependent declaration information) that is not
trackable in `AttributedType`, we create a new type dedicated to this
functionality.
This patch also extends the parsing logic to parse the `counted_by`
argument as an expression, which will allow us to extend the model to
support arguments beyond an identifier, e.g., `__counted_by(n + m)` in
the future as specified by `-fbounds-safety`.
This also adjusts `__bdos` and array-bounds sanitizer code that already
uses `CountedByAttr` to check `CountAttributedType` instead to get the
field referred to by the attribute.
This implements part 1 of 2 for #83626
- `CGBuiltin.cpp` - modified to have separate cases for signed and
unsigned integers.
- `SemaChecking.cpp` - modified to prevent the generation of a double
dot product intrinsic if the builtin were to be called directly.
- `IntrinsicsDirectX.td` - create the signed and unsigned dot
intrinsics needed for instruction expansion.
- `DXILIntrinsicExpansion.cpp` - handle instruction expansion cases for
integer dot product.
This defines the basic set of pointer authentication clang builtins
(provided in a new header, ptrauth.h), with diagnostics and IRGen
support. The availability of the builtins is gated on a new flag,
`-fptrauth-intrinsics`.
Note that this only includes the basic intrinsics, and notably excludes
`ptrauth_sign_constant`, `ptrauth_type_discriminator`, and
`ptrauth_string_discriminator`, which need extra logic to be fully
supported.
This also introduces clang/docs/PointerAuthentication.rst, which
describes the ptrauth model in general, in addition to these builtins.
Co-Authored-By: Akira Hatanaka <ahatanaka@apple.com>
Co-Authored-By: John McCall <rjmccall@apple.com>
This change implements lowering for #70076, #70100, #70072, & #70102
`CGBuiltin.cpp` - simplify `lerp` intrinsic
`IntrinsicsDirectX.td` - simplify `lerp` intrinsic
`SemaChecking.cpp` - remove unnecessary check
`DXILIntrinsicExpansion.*` - add intrinsic to instruction expansion
cases
`DXILOpLowering.cpp` - make sure `DXILIntrinsicExpansion` happens first
`DirectX.h` - changes to support new pass
`DirectXTargetMachine.cpp` - changes to support new pass
Why `any` and `lerp` as instruction expansion just for DXIL?
- SPIR-V has an
[OpAny](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpAny)
- SPIR-V has a GLSL lerp extension via
[Fmix](https://registry.khronos.org/SPIR-V/specs/1.0/GLSL.std.450.html#FMix)
Why `exp` instruction expansion?
- We have an `exp2` opcode and `exp` reuses that opcode. So instruction
expansion is a convenient way to do preprocessing.
- Further SPIR-V has a GLSL exp extension via
[Exp](https://registry.khronos.org/SPIR-V/specs/1.0/GLSL.std.450.html#Exp)
and
[Exp2](https://registry.khronos.org/SPIR-V/specs/1.0/GLSL.std.450.html#Exp2)
Why `rcp` as instruction expansion?
This one is a bit of the odd man out and might have to move to
`cgbuiltins` when we better understand SPIRV requirements. However, I
included it because it seems like [fast math mode has an AllowRecip
flag](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_fp_fast_math_mode)
which lets you compute the reciprocal without performing the division.
We don't have that in DXIL, so I thought to include it.
This change implements part 1 of 2 for #70095
- `hlsl_intrinsics.h` - add the `isinf` api
- `Builtins.td` - add an hlsl builtin for `isinf`.
- `CGBuiltin.cpp` - add the ir generation for `isinf` intrinsic.
- `SemaChecking.cpp` - add non-math elementwise checks because this
returns a bool.
- `IntrinsicsDirectX.td` - add an `isinf` intrinsic.
`DXIL.td` lowering is left, but changes need to be made there before we
can support this case.
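A small C sketch of the semantics involved, assuming IEEE binary32 floats: the check is elementwise and yields one bool per component, which is why the usual "math" checks (result type equals argument type) do not apply. The helper names are hypothetical, and the bit-pattern test is just one way to express the check on a host CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Build a float from its raw bits (assumes IEEE binary32). */
static float float_from_bits(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* True for +inf and -inf: exponent all ones, mantissa zero. */
static bool is_inf(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (bits & 0x7FFFFFFFu) == 0x7F800000u;
}

/* Elementwise form: float vector in, bool vector out. */
static void is_inf_vec(const float *v, bool *out, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        out[i] = is_inf(v[i]);
}
```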
This change implements #70074
- `hlsl_intrinsics.h` - add the `rsqrt` api
- `DXIL.td` - add the llvm intrinsic to DXIL op lowering map.
- `Builtins.td` - add an hlsl builtin for rsqrt.
- `CGBuiltin.cpp` - add the ir generation for the rsqrt intrinsic.
- `SemaChecking.cpp` - reuse the one arg float only checks.
- `IntrinsicsDirectX.td` - add an `rsqrt` intrinsic.
It's useful to provide an indicator code with the trap, which the generic
__builtin_trap can't do. asm("brk #N") is an option, but following that with a
__builtin_unreachable() leads to two traps when the compiler doesn't know the
block can't return. So compiler support like this is useful.
Summary:
This patch implements the LLVM floating point environment control
intrinsics and also exposes them through clang. We encode the floating
point environment as a 64-bit value that simply concatenates the values
of the mode registers and the current trap status. We only fetch the
bits relevant for floating point instructions. That is, rounding mode,
denormalization mode, ieee, dx10 clamp, debug, enabled traps, f16
overflow, and active exceptions.
This PR implements the frontend for llvm#70100
This PR is part 1 of 2.
Part 2 requires an intrinsic to instructions lowering.
- `Builtins.td` - add an `rcp` builtin
- `CGBuiltin.cpp` - add the builtin to intrinsic lowering
- `hlsl_intrinsics.h` - add the `rcp` api
- `SemaChecking.cpp` - reuse frac's sema checks
- `IntrinsicsDirectX.td` - add the llvm intrinsic
This PR implements the frontend for #70076
This PR is part 1 of 2.
Part 2 requires an intrinsic to instructions lowering.
- `Builtins.td` - add an `any` builtin
- `CGBuiltin.cpp` - add the builtin to intrinsic lowering
- `hlsl_basic_types.h` - add the `bool` vectors since that is an input
for any
- `hlsl_intrinsics.h` - add the `any` api
- `SemaChecking.cpp` - add `any` builtin checking
- `IntrinsicsDirectX.td` - add the llvm intrinsic
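For reference, a C sketch of what `any` computes: it reduces a vector to a single bool that is true when at least one component is non-zero. The function names below are hypothetical. One variant takes floats and one takes bools, mirroring the `bool` vector input mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* True if at least one of the n float components is non-zero. */
static bool any_float(const float *v, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        if (v[i] != 0.0f)
            return true;
    return false;
}

/* Same reduction over a bool vector. */
static bool any_bool(const bool *v, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        if (v[i])
            return true;
    return false;
}
```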
This change implements #83736
The dot product lowering needs a ternary multiply-add operation. DXIL
has three mad opcodes: `fmad` (46), `imad` (48), and `umad` (49). Dot
product in DXIL only uses `imad`/`umad`, but for completeness, and
because the hlsl `mad` intrinsic requires it, `fmad` was also included.
Two new intrinsics had to be created to complete this change; the `fmad`
case is already supported by llvm via the `fmuladd` intrinsic.
- `hlsl_intrinsics.h` - exposed mad api call.
- `Builtins.td` - exposed a `mad` builtin.
- `Sema.h` - make `tertiary` calls check for float types optional.
- `CGBuiltin.cpp` - pick the intrinsic for signed/unsigned & float; also
reuse `int_fmuladd`.
- `SemaChecking.cpp` - type checks for `__builtin_hlsl_mad`.
- `IntrinsicsDirectX.td` - create the two new intrinsics for
`imad`/`umad`.
- `DXIL.td` - create the llvm intrinsic to `DXIL` opcode mapping.
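To show why dot product needs a multiply-add, here is a C sketch that models an integer dot product as one multiply followed by a chain of mads. The helper names (`imad`, `umad`, `idot_via_mad`, `udot_via_mad`) are hypothetical C stand-ins, and the exact sequence the expansion emits is an assumption, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* mad(a, b, c) = a * b + c, in signed and unsigned flavours. */
static int32_t imad(int32_t a, int32_t b, int32_t c) { return a * b + c; }
static uint32_t umad(uint32_t a, uint32_t b, uint32_t c) { return a * b + c; }

/* Signed dot product: first a plain multiply, then one mad per
   remaining element. (Signed overflow is undefined in C; inputs are
   assumed small enough to avoid it.) */
static int32_t idot_via_mad(const int32_t *a, const int32_t *b, int n) {
    assert(n > 0);
    int32_t acc = a[0] * b[0];
    for (int i = 1; i < n; ++i)
        acc = imad(a[i], b[i], acc);
    return acc;
}

/* Unsigned variant; arithmetic wraps modulo 2^32. */
static uint32_t udot_via_mad(const uint32_t *a, const uint32_t *b, int n) {
    assert(n > 0);
    uint32_t acc = a[0] * b[0];
    for (int i = 1; i < n; ++i)
        acc = umad(a[i], b[i], acc);
    return acc;
}
```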
---------
Co-authored-by: Farzon Lotfi <farzon@farzon.com>
These builtins are already there in Clang, however current codegen may
produce suboptimal results due to their complex behavior. Implement them
as intrinsics to ensure expected instructions are emitted.
The patch fixes https://github.com/llvm/llvm-project/issues/83407 by
modifying the behaviour of __builtin_cpu_supports so that it returns
false and issues a warning if an unsupported feature name is passed.
__builtin_cpu_supports is target independent, but currently supported by
X86, AArch64 and PowerPC only.
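A minimal usage sketch, restricted to x86 and a well-known feature name so that it compiles with both GCC and Clang. The wrapper name `cpu_has_sse2` is hypothetical, and the fallback branch for other targets is an assumption made only to keep the example portable.

```c
#include <assert.h>
#include <stdbool.h>

/* Query a CPU feature at run time; returns false on targets where this
   sketch does not use the builtin. */
static bool cpu_has_sse2(void) {
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();                      /* set up the CPU model data */
    return __builtin_cpu_supports("sse2") != 0;
#else
    return false;
#endif
}
```

Callers would typically use such a query to pick between a vectorised and a scalar code path.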
This change implements the frontend for #70099
Builtins.td - add the frac builtin
CGBuiltin.cpp - add the builtin to DirectX intrinsic mapping
hlsl_intrinsics.h - add the frac api
SemaChecking.cpp - add type checks for builtin
IntrinsicsDirectX.td - add the frac intrinsic
The backend changes for this are going to be very simple:
f309a0eb55
They were not included because llvm/lib/Target/DirectX/DXIL.td is going
through a major refactor.
This is the start of implementing the lerp intrinsic
https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-lerp
Builtins.td - defines the builtin
hlsl_intrinsics.h - defines the lerp api
DiagnosticSemaKinds.td - needed a new error to be inclusive for more
than two operands.
CGBuiltin.cpp - add the lerp intrinsic lowering
SemaChecking.cpp - type checks for lerp builtin
IntrinsicsDirectX.td - define the lerp intrinsic
This change implements the first half of #70102
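For reference, the linked HLSL documentation defines `lerp(x, y, s)` as `x + s * (y - x)`, applied per component. A C sketch of that formula follows; the function names are hypothetical.

```c
#include <assert.h>

/* Linear interpolation: x + s * (y - x). */
static float lerp_scalar(float x, float y, float s) {
    return x + s * (y - x);
}

/* Component-wise form for n-element vectors; all three operands have
   the same length. */
static void lerp_vec(const float *x, const float *y, const float *s,
                     float *out, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        out[i] = lerp_scalar(x[i], y[i], s[i]);
}
```

The three same-shaped operands are the reason a diagnostic covering more than two operands was needed.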
Co-authored-by: Xiang Li <python3kgae@outlook.com>
This change implements https://github.com/llvm/llvm-project/issues/70073
HLSL has a dot intrinsic defined here:
https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-dot
The intrinsic itself is defined as a HLSL_LANG LangBuiltin in
Builtins.td.
This is used to associate all the dot product typedefs defined in
hlsl_intrinsics.h
with a single intrinsic check in CGBuiltin.cpp & SemaChecking.cpp.
In IntrinsicsDirectX.td we define the llvmIR for the dot product.
A few goals were in mind for this IR. First it should operate on only
vectors. Second the return type should be the vector element type. Third
the second parameter vector should be of the same size as the first
parameter. Finally `a dot b` should be the same as `b dot a`.
In CGBuiltin.cpp, hlsl has been built on top of existing clang intrinsics
via EmitBuiltinExpr. Dot product, though, is a language-specific
intrinsic and so is guarded behind getLangOpts().HLSL.
The call chain looks like this: EmitBuiltinExpr -> EmitHLSLBuiltinExpr
For dot product intrinsics, EmitHLSLBuiltinExpr makes a distinction
between vectors and scalars. This is because HLSL supports dot product
on scalars, which simplifies down to a multiply.
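The design goals above can be sketched in C (hypothetical names, host-side model only): the vector form takes two operands of the same length and returns a single element-typed value, swapping the operands gives the same result, and the scalar form is just a multiply.

```c
#include <assert.h>

/* Scalar "dot product": degenerates to a plain multiply. */
static float dot_scalar(float a, float b) {
    return a * b;
}

/* Vector dot product: both operands have n elements, the result has
   the element type. */
static float dot_vec(const float *a, const float *b, int n) {
    assert(n > 0);
    float acc = 0.0f;
    for (int i = 0; i < n; ++i)
        acc += a[i] * b[i];
    return acc;
}
```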
Sema.h & SemaChecking.cpp saw the addition of
CheckHLSLBuiltinFunctionCall, a language specific semantic validation
that can be expanded for other hlsl specific intrinsics.
Fixes #70073
This patch loosens the cast check (`canLosslesslyBitCastTo`) and leaves
it to the one inside `CreateBitCast`. It seems too conservative for the
use case here.
Summary:
This patch adds a new intrinsic and builtin function mirroring the
existing `__builtin_readcyclecounter`. The difference is that this
implementation targets a separate counter, available on some targets,
that returns a fixed-frequency clock that can be used to determine
elapsed time. This differs from the cycle counter, which often has a
variable frequency.
This patch only adds support for the NVPTX and AMDGPU targets.
This is done as a new and separate builtin rather than an argument to
`readcyclecounter` to avoid needing to change existing code and to make
the separation more explicit.
Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same
as V5 except a new "generic version" flag can be present in EFLAGS. This
is related to new generic targets that'll be added in a follow-up patch.
It's also likely V6 will have new changes (possibly new metadata
entries) added later.
Docs changes are part of the follow-up patch #76955
Since https://github.com/ARM-software/acle/pull/276 the ACLE
defines attributes to better describe the use of a given SME state.
Previously the attributes merely described the possibility of it being
'shared' or 'preserved', whereas the new attributes have more semantics
and also describe how the data flows through the program.
For ZT0 we already had to add new LLVM IR attributes:
* aarch64_new_zt0
* aarch64_in_zt0
* aarch64_out_zt0
* aarch64_inout_zt0
* aarch64_preserves_zt0
We have now done the same for ZA, such that we add:
* aarch64_new_za (previously `aarch64_pstate_za_new`)
* aarch64_in_za (more specific variation of `aarch64_pstate_za_shared`)
* aarch64_out_za (more specific variation of `aarch64_pstate_za_shared`)
* aarch64_inout_za (more specific variation of
`aarch64_pstate_za_shared`)
* aarch64_preserves_za (previously `aarch64_pstate_za_shared,
aarch64_pstate_za_preserved`)
This explicitly removes 'pstate' from the name, because with SME2 and
the new ACLE attributes there is a difference between "sharing ZA"
(sharing the ZA matrix register with the caller) and "sharing PSTATE.ZA"
(sharing either the ZA or ZT0 register, both part of PSTATE.ZA, with the
caller).
Make __builtin_cpu_{init|supports|is} target independent and provide an
opt-in query for targets that want to support it. Each target is still
responsible for their specific lowering/code-gen. Also provide code-gen
for PowerPC.
I originally proposed this in https://reviews.llvm.org/D152914 and this
addresses the comments I received there.
---------
Co-authored-by: Nemanja Ivanovic <nemanjaivanovic@nemanjas-air.kpn>
Co-authored-by: Nemanja Ivanovic <nemanja@synopsys.com>
This patch addresses the issue regarding the call of bcopy function in a
conditional expression.
It is analogous to the already accepted patch which deals with the same
problem, just regarding the bzero function [0].
Here is the testcase which illustrates the issue:
```
void bcopy(const void *, void *, unsigned long);
void foo(void);
void test_bcopy() {
  char dst[20];
  char src[20];
  int _sz = 20, len = 20;
  return (_sz
          ? ((_sz >= len)
              ? bcopy(src, dst, len)
              : foo())
          : bcopy(src, dst, len));
}
```
When processing it with clang, the following issue occurs:
Instruction does not dominate all uses!
%arraydecay2 = getelementptr inbounds [20 x i8], ptr %dst, i64 0, i64 0,
!dbg !38
%cond = phi ptr [ %arraydecay2, %cond.end ], [ %arraydecay5,
%cond.false3 ], !dbg !33
fatal error: error in backend: Broken module found, compilation aborted!
This happens because an incorrect phi node is created. It is created
because the bcopy function call is lowered to a call to the llvm.memmove
intrinsic, and memmove returns void *. Since llvm.memmove is called in
two places in the same return statement, clang creates a phi node in the
final basic block for the return value, and that phi node is incorrect.
However, bcopy should return void in the first place, so this phi node
is unnecessary. This is what this patch addresses. An appropriate test
is also added and no existing tests fail when applying this patch.
Also, this crash only happens when LLVM is configured with
-DLLVM_ENABLE_ASSERTIONS=On option.
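The relationship between the two functions can be sketched in plain C: `bcopy` takes `(src, dst, len)` and returns nothing, while `memmove` takes `(dst, src, len)` and returns `dst`. A wrapper therefore has to swap the pointer arguments and discard the result. The name `bcopy_like` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* bcopy-style copy built on memmove: argument order is swapped and the
   void * result of memmove is deliberately dropped. */
static void bcopy_like(const void *src, void *dst, size_t len) {
    assert(src != NULL && dst != NULL);
    (void)memmove(dst, src, len);   /* handles overlapping regions */
}
```

Because the wrapper has no value, nothing needs to be merged when it appears in both arms of a conditional expression.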
[0] https://reviews.llvm.org/D39746
Rename intrinsics for fcvtu to fcvtzu and fcvts to fcvtzs.
Use llvm_anyvector_ty for both multi vector returns and operands,
therefore the return and operands can be specified in the intrinsic
call, e.g.
@llvm.aarch64.sve.scvtf.x4.nxv4f32.nxv4i32
Support new amdgcn_global_load_tr instructions for load with transpose.
* MC layer support for GLOBAL_LOAD_TR_B64/GLOBAL_LOAD_TR_B128
* Intrinsic int_amdgcn_global_load_tr
* Clang builtins amdgcn_global_load_tr*
Without the fix gcc warned with
../../clang/lib/CodeGen/CGBuiltin.cpp:1022:19: warning: unused variable 'DRE' [-Wunused-variable]
1022 | if (const auto *DRE = dyn_cast<DeclRefExpr>(Base)) {
| ^~~
Fix the warning by removing the unused variable and changing the "dyn_cast"
to "isa".
The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member holding the
count of elements in the flexible array. This information is used to
improve the results of the array bound sanitizer and the
'__builtin_dynamic_object_size' builtin. The 'count' field member must
be within the same non-anonymous, enclosing struct as the flexible array
member. For example:
```
struct bar;
struct foo {
  int count;
  struct inner {
    struct {
      int count; /* The 'count' referenced by 'counted_by' */
    };
    struct {
      /* ... */
      struct bar *array[] __attribute__((counted_by(count)));
    };
  } baz;
};
```
This example specifies that the flexible array member 'array' has the
number of elements allocated for it in 'count':
```
struct bar;
struct foo {
  size_t count;
  /* ... */
  struct bar *array[] __attribute__((counted_by(count)));
};
```
This establishes a relationship between 'array' and 'count';
specifically that 'p->array' must have *at least* 'p->count' number of
elements available. It's the user's responsibility to ensure that this
relationship is maintained throughout changes to the structure.
In the following, the allocated array erroneously has fewer elements
than what's specified by 'p->count'. This would result in an
out-of-bounds access not being detected:
```
struct foo *p;
void foo_alloc(size_t count) {
  p = malloc(MAX(sizeof(struct foo),
                 offsetof(struct foo, array[0]) + count *
                     sizeof(struct bar *)));
  p->count = count + 42;
}
```
```
The next example updates 'p->count', breaking the relationship
requirement that 'p->array' must have at least 'p->count' number of
elements available:
```
void use_foo(int index, int val) {
  p->count += 42;
  p->array[index] = val; /* The sanitizer can't properly check this access */
}
```
In this example, an update to 'p->count' maintains the relationship
requirement:
```
void use_foo(int index, int val) {
  if (p->count == 0)
    return;
  --p->count;
  p->array[index] = val;
}
```
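For contrast with the erroneous allocation above, here is a self-contained sketch of an allocation that keeps the relationship intact: the stored count equals the number of elements actually allocated. The `COUNTED_BY` macro, `struct int_buf`, and `int_buf_alloc` are hypothetical names; the macro expands to nothing on compilers without the attribute, and overflow of the size computation is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Use the attribute only where the compiler knows it. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif
#if __has_attribute(counted_by)
#define COUNTED_BY(member) __attribute__((counted_by(member)))
#else
#define COUNTED_BY(member)
#endif

struct int_buf {
  size_t count;                  /* number of elements in data[] */
  int data[] COUNTED_BY(count);  /* flexible array member */
};

/* Allocate room for exactly `count` elements and record that same
   number, so 'data' always has at least 'count' elements available. */
static struct int_buf *int_buf_alloc(size_t count) {
  struct int_buf *p = malloc(sizeof *p + count * sizeof p->data[0]);
  if (p != NULL)
    p->count = count;
  return p;
}
```

The caller owns the returned buffer and releases it with `free`.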
With lldb build fix.
Original message:
EnumConstantDecl is allocated by the ASTContext allocator so the
destructor is never called.
This patch takes a similar approach to IntegerLiteral by using
APIntStorage to allocate large APSInts using the ASTContext allocator as
well.
The downside is that an additional heap allocation and copy of the data
needs to be made when calling getInitValue if the APSInt is large.
Fixes #78160.
EnumConstantDecl is allocated by the ASTContext allocator so the
destructor is never called.
This patch takes a similar approach to IntegerLiteral by using
APIntStorage to allocate large APSInts using the ASTContext allocator as
well.
The downside is that an additional heap allocation and copy of the data
needs to be made when calling getInitValue if the APSInt is large.
Fixes #78160.
The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member holding the
count of elements in the flexible array. This information is used to
improve the results of the array bound sanitizer and the
'__builtin_dynamic_object_size' builtin. The 'count' field member must
be within the same non-anonymous, enclosing struct as the flexible array
member. For example:
```
struct bar;
struct foo {
  int count;
  struct inner {
    struct {
      int count; /* The 'count' referenced by 'counted_by' */
    };
    struct {
      /* ... */
      struct bar *array[] __attribute__((counted_by(count)));
    };
  } baz;
};
```
This example specifies that the flexible array member 'array' has the
number of elements allocated for it in 'count':
```
struct bar;
struct foo {
  size_t count;
  /* ... */
  struct bar *array[] __attribute__((counted_by(count)));
};
```
This establishes a relationship between 'array' and 'count';
specifically that 'p->array' must have *at least* 'p->count' number of
elements available. It's the user's responsibility to ensure that this
relationship is maintained throughout changes to the structure.
In the following, the allocated array erroneously has fewer elements
than what's specified by 'p->count'. This would result in an
out-of-bounds access not being detected:
```
struct foo *p;
void foo_alloc(size_t count) {
  p = malloc(MAX(sizeof(struct foo),
                 offsetof(struct foo, array[0]) + count *
                     sizeof(struct bar *)));
  p->count = count + 42;
}
```
The next example updates 'p->count', breaking the relationship
requirement that 'p->array' must have at least 'p->count' number of
elements available:
```
void use_foo(int index, int val) {
  p->count += 42;
  p->array[index] = val; /* The sanitizer can't properly check this access */
}
```
In this example, an update to 'p->count' maintains the relationship
requirement:
```
void use_foo(int index, int val) {
  if (p->count == 0)
    return;
  --p->count;
  p->array[index] = val;
}
```
This reverts commit fefdef808c230c79dca2eb504490ad0f17a765a5.
Breaks check-clang, see
https://github.com/llvm/llvm-project/pull/76348#issuecomment-1886029515
Also revert follow-on "[Clang] Update 'counted_by' documentation"
This reverts commit 4a3fb9ce27dda17e97341f28005a28836c909cfc.