1866 Commits

Author SHA1 Message Date
OverMighty
21d83324fb
[clang] Implement __builtin_popcountg (#82359)
Fixes #82058.
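
A minimal usage sketch (editor's illustration, not from the patch) of the
new generic builtin, which accepts any unsigned integer type rather than a
fixed width:

```
#include <stdio.h>

int main(void) {
  unsigned int x = 0xF0F0u;                                /* 8 set bits */
  unsigned __int128 y = ((unsigned __int128)1 << 100) | 1; /* 2 set bits */

  /* Unlike __builtin_popcount{,l,ll}, the generic form works on any
     unsigned integer type, including unsigned __int128. */
  printf("%d %d\n", __builtin_popcountg(x), __builtin_popcountg(y));
  return 0;
}
```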
2024-02-26 13:59:42 -08:00
Farzon Lotfi
82acec15af
[HLSL] Implementation of dot intrinsic (#81190)
This change implements https://github.com/llvm/llvm-project/issues/70073

HLSL has a dot intrinsic defined here:

https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-dot

The intrinsic itself is defined as an HLSL_LANG LangBuiltin in
Builtins.td. This is used to associate all the dot product typedefs
defined in hlsl_intrinsics.h with a single intrinsic check in
CGBuiltin.cpp & SemaChecking.cpp.

In IntrinsicsDirectX.td we define the LLVM IR for the dot product.
A few goals were in mind for this IR. First, it should operate only on
vectors. Second, the return type should be the vector's element type.
Third, the second parameter vector should be the same size as the first.
Finally, `a dot b` should be the same as `b dot a`.
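
An editor's sketch of that IR contract using Clang vector extensions (not
the HLSL surface syntax; `dot4` is a hypothetical helper name):

```
typedef float float4 __attribute__((ext_vector_type(4)));

/* Same-size vector operands, a result of the element type, and
   dot4(a, b) == dot4(b, a). */
static float dot4(float4 a, float4 b) {
  float4 m = a * b;             /* elementwise multiply */
  return m.x + m.y + m.z + m.w; /* horizontal add */
}
```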

In CGBuiltin.cpp, HLSL builds on top of existing Clang intrinsics via
EmitBuiltinExpr. Dot product, though, is a language-specific intrinsic
and so is guarded behind getLangOpts().HLSL.
The call chain looks like this: EmitBuiltinExpr -> EmitHLSLBuiltinExpr

EmitHLSLBuiltinExpr's dot product handling makes a distinction
between vectors and scalars. This is because HLSL supports dot product
on scalars, which simplifies down to a multiply.

Sema.h & SemaChecking.cpp saw the addition of
CheckHLSLBuiltinFunctionCall, a language-specific semantic validation
that can be extended for other HLSL-specific intrinsics.

Fixes #70073
2024-02-26 10:08:59 -06:00
Pavel Iliin
568babab7e
[AArch64] Implement __builtin_cpu_supports, compiler-rt tests. (#82378)
The patch complements https://github.com/llvm/llvm-project/pull/68919
and adds AArch64 support for builtin
`__builtin_cpu_supports("feature1+...+featureN")`
which returns true if all CPU features specified in the argument are
detected. Also, native compiler-rt AArch64 run tests for the feature
detection mechanism were added, and the 'cpu_model' check was fixed
after its refactoring was merged in
https://github.com/llvm/llvm-project/pull/75635. The original RFC was
https://reviews.llvm.org/D153153.
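
A brief usage sketch of the multi-feature form (the feature names below
are illustrative, not a statement of the supported spellings):

```
/* Returns nonzero only if every listed feature is detected at run time. */
int can_use_fast_path(void) {
  return __builtin_cpu_supports("sve2+bf16");
}
```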
2024-02-22 23:33:54 +00:00
zhijian lin
5b8e5604c2
[AIX] Lower intrinsic __builtin_cpu_is into AIX platform-specific code. (#80069)
On AIX, __builtin_cpu_is() references the runtime external variable
_system_configuration from /usr/include/sys/systemcfg.h.

Ref issue: https://github.com/llvm/llvm-project/issues/80042
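
A hedged sketch of the user-facing side: on AIX the check below is
lowered against _system_configuration rather than a GNU-style cpu model
variable (the CPU name is illustrative):

```
int on_power10(void) {
  return __builtin_cpu_is("power10");
}
```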
2024-02-22 08:46:08 -05:00
Pierrick Bouvier
0ea64ad88a
[COFF][Aarch64] Add _InterlockedAdd64 intrinsic (#81849)
Found when compiling openssl master branch using clang-cl.

This commit introduces usage of InterlockedAdd64:

d0e1a0ae70


https://learn.microsoft.com/en-us/cpp/intrinsics/interlockedadd-intrinsic-functions
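
A small sketch of the intrinsic (assuming clang-cl or MSVC targeting
ARM64):

```
#include <intrin.h>

static volatile __int64 counter;

/* _InterlockedAdd64 atomically adds and returns the resulting value,
   unlike _InterlockedExchangeAdd64, which returns the prior value. */
__int64 bump(__int64 delta) {
  return _InterlockedAdd64(&counter, delta);
}
```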
2024-02-16 13:20:08 +02:00
Shilei Tian
630f82ec0c
[Clang][CodeGen] Loosen the cast check when emitting builtins (#81669)
This patch loosens the cast check (`canLosslesslyBitCastTo`) and leaves
it to the one inside `CreateBitCast`; the former seems too conservative
for the use case here.
2024-02-14 12:59:59 -05:00
Joseph Huber
11fcae69db
[LLVM] Add __builtin_readsteadycounter intrinsic and builtin for realtime clocks (#81331)
Summary:
This patch adds a new intrinsic and builtin function mirroring the
existing `__builtin_readcyclecounter`. The difference is that this
implementation targets a separate counter that some targets have, which
returns a fixed-frequency clock that can be used to determine elapsed
time. This is different from the cycle counter, which often has a
variable frequency.

This patch only adds support for the NVPTX and AMDGPU targets.

This is done as a new and separate builtin rather than an argument to
`readcyclecounter` to avoid needing to change existing code and to make
the separation more explicit.
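
A sketch of how the new builtin might be used (`work` is a hypothetical
workload; only NVPTX and AMDGPU lower it so far):

```
extern void work(void); /* hypothetical workload */

/* The steady counter has a fixed frequency, so a delta measures
   elapsed time rather than elapsed cycles. */
unsigned long long timed_work(void) {
  unsigned long long start = __builtin_readsteadycounter();
  work();
  return __builtin_readsteadycounter() - start;
}
```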
2024-02-13 10:06:25 -06:00
Shilei Tian
c4b0dfcc99
[Clang] Fix a non-effective assertion (#81083)
`PTy` here is literally `FTy->getParamType(i)`, which makes this
assertion not work as expected.
2024-02-08 09:44:42 -05:00
Mészáros Gergely
5942868a21
[clang][AMDGPU][CUDA] Handle __builtin_printf for device printf (#68515)
Previously `__builtin_printf` would result in emitting a call to
`printf`, even though directly calling `printf` was translated.

Ref: #68478
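
A hedged sketch of the behavior change, for code compiled for the device
(in HIP the function would additionally be marked __device__):

```
#include <stdio.h>

void log_val(int v) {
  printf("v=%d\n", v);           /* was already translated to device printf */
  __builtin_printf("v=%d\n", v); /* now handled the same way */
}
```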
2024-02-05 23:23:13 +05:30
Pierre van Houtryve
500846d2f5
[AMDGPU] Introduce Code Object V6 (#76954)
Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same
as V5 except that a new "generic version" flag can be present in EFLAGS.
This is related to new generic targets that will be added in a follow-up
patch. It's also likely that V6 will have new changes (possibly new
metadata entries) added later.

Docs changes are part of the follow-up patch #76955.
2024-02-05 08:19:53 +01:00
Sander de Smalen
d313614b60
[AArch64] Replace LLVM IR function attributes for PSTATE.ZA. (#79166)
Since https://github.com/ARM-software/acle/pull/276 the ACLE
defines attributes to better describe the use of a given SME state.

Previously the attributes merely described the possibility of it being
'shared' or 'preserved', whereas the new attributes have more semantics
and also describe how the data flows through the program.

For ZT0 we already had to add new LLVM IR attributes:
* aarch64_new_zt0
* aarch64_in_zt0
* aarch64_out_zt0
* aarch64_inout_zt0
* aarch64_preserves_zt0

We have now done the same for ZA, such that we add:
* aarch64_new_za       (previously `aarch64_pstate_za_new`)
* aarch64_in_za (more specific variation of `aarch64_pstate_za_shared`)
* aarch64_out_za (more specific variation of `aarch64_pstate_za_shared`)
* aarch64_inout_za (more specific variation of `aarch64_pstate_za_shared`)
* aarch64_preserves_za (previously `aarch64_pstate_za_shared, aarch64_pstate_za_preserved`)

This explicitly removes 'pstate' from the name, because with SME2 and
the new ACLE attributes there is a difference between "sharing ZA"
(sharing the ZA matrix register with the caller) and "sharing PSTATE.ZA"
(sharing either the ZA or ZT0 register, both part of PSTATE.ZA, with
the caller).
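
For reference, a sketch of the ACLE source-level spellings that map onto
these IR attributes (assuming an SME-enabled Clang; see the linked ACLE
PR for the authoritative forms):

```
__arm_new("za") void fresh_za(void);              /* aarch64_new_za */
void read_za(void) __arm_in("za");                /* aarch64_in_za */
void update_za(void) __arm_inout("za");           /* aarch64_inout_za */
void keeps_za_intact(void) __arm_preserves("za"); /* aarch64_preserves_za */
```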
2024-02-01 13:37:37 +00:00
Nemanja Ivanovic
67c1c1dbb6
[PowerPC][X86] Make cpu id builtins target independent and lower for PPC (#68919)
Make __builtin_cpu_{init|supports|is} target independent and provide an
opt-in query for targets that want to support them. Each target is
still responsible for its specific lowering/code-gen. Also provide
code-gen for PowerPC.
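
An illustrative sketch of the now target-independent builtins with the
PowerPC lowering (CPU and feature names are examples, not an exhaustive
list):

```
int pick_impl(void) {
  __builtin_cpu_init(); /* required before the other two on some targets */
  if (__builtin_cpu_is("power10"))
    return 2;
  if (__builtin_cpu_supports("vsx"))
    return 1;
  return 0;
}
```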

I originally proposed this in https://reviews.llvm.org/D152914 and this
addresses the comments I received there.

---------

Co-authored-by: Nemanja Ivanovic <nemanjaivanovic@nemanjas-air.kpn>
Co-authored-by: Nemanja Ivanovic <nemanja@synopsys.com>
2024-01-26 11:24:50 -05:00
Vojislav Tomasevic
2a77d92e2e
[clang] Incorrect IR involving the use of bcopy (#79298)
This patch addresses an issue regarding a call to the bcopy function in
a conditional expression. It is analogous to the already accepted patch
which deals with the same problem, just for the bzero function [0].

Here is the testcase which illustrates the issue:

```
void bcopy(const void *, void *, unsigned long);
void foo(void);

void test_bcopy() {
  char dst[20];
  char src[20];
  int _sz = 20, len = 20;
  return (_sz
          ? ((_sz >= len)
             ? bcopy(src, dst, len)
             : foo())
          : bcopy(src, dst, len));
}
```

When processing it with clang, the following issue occurs:

Instruction does not dominate all uses!
%arraydecay2 = getelementptr inbounds [20 x i8], ptr %dst, i64 0, i64 0, !dbg !38
%cond = phi ptr [ %arraydecay2, %cond.end ], [ %arraydecay5, %cond.false3 ], !dbg !33
fatal error: error in backend: Broken module found, compilation aborted!

This happens because an incorrect phi node is created: the bcopy call
is lowered to a call to the llvm.memmove intrinsic, and memmove returns
void *. Since llvm.memmove is called in two places in the same return
statement, clang creates a phi node for the return value in the final
basic block, and that phi node is incorrect. However, bcopy should
return void in the first place, so this phi node is unnecessary. This
is what this patch addresses. An appropriate test is also added, and no
existing tests fail when applying this patch.

Also, this crash only happens when LLVM is configured with the
-DLLVM_ENABLE_ASSERTIONS=On option.

[0] https://reviews.llvm.org/D39746
2024-01-24 09:39:36 -08:00
Mirko Brkušanin
7fdf608cef
[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795)
Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>
2024-01-24 13:43:07 +01:00
Matthew Devereau
6ba62f4f25
[AArch64][SME2] Refine fcvtu/fcvts/scvtf/ucvtf (#77947)
Rename intrinsics for fcvtu to fcvtzu and fcvts to fcvtzs.

Use llvm_anyvector_ty for both multi-vector returns and operands, so
that the return and operand types can be specified in the intrinsic
call, e.g.

@llvm.aarch64.sve.scvtf.x4.nxv4f32.nxv4i32
2024-01-22 15:11:49 +00:00
Piotr Sobczak
57f6a3f7ea
[AMDGPU] Add global_load_tr for GFX12 (#77772)
Support new amdgcn_global_load_tr instructions for load with transpose.

* MC layer support for GLOBAL_LOAD_TR_B64/GLOBAL_LOAD_TR_B128
* Intrinsic int_amdgcn_global_load_tr
* Clang builtins amdgcn_global_load_tr*
2024-01-18 15:14:42 +01:00
Mikael Holmen
e6bd9835d9 [clang][CodeGen] Fix gcc warning about unused variable [NFC]
Without the fix gcc warned with
 ../../clang/lib/CodeGen/CGBuiltin.cpp:1022:19: warning: unused variable 'DRE' [-Wunused-variable]
  1022 |   if (const auto *DRE = dyn_cast<DeclRefExpr>(Base)) {
       |                   ^~~

Fix the warning by removing the unused variable and changing the
"dyn_cast" to "isa".
2024-01-17 13:23:08 +01:00
Bill Wendling
00b6d032a2 [Clang] Implement the 'counted_by' attribute (#76348)
The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member holding the
count of elements in the flexible array. This information is used to
improve the results of the array bound sanitizer and the
'__builtin_dynamic_object_size' builtin. The 'count' field member must
be within the same non-anonymous, enclosing struct as the flexible array
member. For example:

```
  struct bar;
  struct foo {
    int count;
    struct inner {
      struct {
        int count; /* The 'count' referenced by 'counted_by' */
      };
      struct {
        /* ... */
        struct bar *array[] __attribute__((counted_by(count)));
      };
    } baz;
  };
```

This example specifies that the flexible array member 'array' has the
number of elements allocated for it in 'count':

```
  struct bar;
  struct foo {
    size_t count;
     /* ... */
    struct bar *array[] __attribute__((counted_by(count)));
  };
```

This establishes a relationship between 'array' and 'count';
specifically that 'p->array' must have *at least* 'p->count' number of
elements available. It's the user's responsibility to ensure that this
relationship is maintained throughout changes to the structure.

In the following, the allocated array erroneously has fewer elements
than what's specified by 'p->count'. This would result in an
out-of-bounds access not being detected:

```
  struct foo *p;

  void foo_alloc(size_t count) {
    p = malloc(MAX(sizeof(struct foo),
                   offsetof(struct foo, array[0]) + count *
                       sizeof(struct bar *)));
    p->count = count + 42;
  }
```

The next example updates 'p->count', breaking the relationship
requirement that 'p->array' must have at least 'p->count' number of
elements available:

```
  void use_foo(int index, int val) {
    p->count += 42;
    p->array[index] = val; /* The sanitizer can't properly check this access */
  }
```

In this example, an update to 'p->count' maintains the relationship
requirement:

```
  void use_foo(int index, int val) {
    if (p->count == 0)
      return;
    --p->count;
    p->array[index] = val;
  }
```
2024-01-16 14:26:12 -08:00
Craig Topper
142f270c27 Recommit "[AST] Use APIntStorage to fix memory leak in EnumConstantDecl. (#78311)"
With lldb build fix.

Original message:

EnumConstantDecl is allocated by the ASTContext allocator so the
destructor is never called.

This patch takes a similar approach to IntegerLiteral by using
APIntStorage to allocate large APSInts using the ASTContext allocator as
well.

The downside is that an additional heap allocation and copy of the data
needs to be made when calling getInitValue if the APSInt is large.

Fixes #78160.
2024-01-16 13:52:17 -08:00
Craig Topper
f3d534c425 Revert "[AST] Use APIntStorage to fix memory leak in EnumConstantDecl. (#78311)"
This reverts commit 4737959d91fab7673b1bb642f88658bb2a24d723.

Missed an lldb update.
2024-01-16 12:39:47 -08:00
Craig Topper
4737959d91
[AST] Use APIntStorage to fix memory leak in EnumConstantDecl. (#78311)
EnumConstantDecl is allocated by the ASTContext allocator so the
destructor is never called.

This patch takes a similar approach to IntegerLiteral by using
APIntStorage to allocate large APSInts using the ASTContext allocator as
well.

The downside is that an additional heap allocation and copy of the data
needs to be made when calling getInitValue if the APSInt is large.

Fixes #78160.
2024-01-16 12:10:38 -08:00
Rashmi Mudduluru
a511c1a9ec
Revert "[Clang] Implement the 'counted_by' attribute (#76348)"
This reverts commit 164f85db876e61cf4a3c34493ed11e8f5820f968.
2024-01-15 18:37:52 -08:00
Bill Wendling
164f85db87 [Clang] Implement the 'counted_by' attribute (#76348)
The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member holding the
count of elements in the flexible array. This information is used to
improve the results of the array bound sanitizer and the
'__builtin_dynamic_object_size' builtin. The 'count' field member must
be within the same non-anonymous, enclosing struct as the flexible array
member. For example:

```
  struct bar;
  struct foo {
    int count;
    struct inner {
      struct {
        int count; /* The 'count' referenced by 'counted_by' */
      };
      struct {
        /* ... */
        struct bar *array[] __attribute__((counted_by(count)));
      };
    } baz;
  };
```

This example specifies that the flexible array member 'array' has the
number of elements allocated for it in 'count':

```
  struct bar;
  struct foo {
    size_t count;
     /* ... */
    struct bar *array[] __attribute__((counted_by(count)));
  };
```

This establishes a relationship between 'array' and 'count';
specifically that 'p->array' must have *at least* 'p->count' number of
elements available. It's the user's responsibility to ensure that this
relationship is maintained throughout changes to the structure.

In the following, the allocated array erroneously has fewer elements
than what's specified by 'p->count'. This would result in an
out-of-bounds access not being detected:

```
  struct foo *p;

  void foo_alloc(size_t count) {
    p = malloc(MAX(sizeof(struct foo),
                   offsetof(struct foo, array[0]) + count *
                       sizeof(struct bar *)));
    p->count = count + 42;
  }
```

The next example updates 'p->count', breaking the relationship
requirement that 'p->array' must have at least 'p->count' number of
elements available:

```
  void use_foo(int index, int val) {
    p->count += 42;
    p->array[index] = val; /* The sanitizer can't properly check this access */
  }
```

In this example, an update to 'p->count' maintains the relationship
requirement:

```
  void use_foo(int index, int val) {
    if (p->count == 0)
      return;
    --p->count;
    p->array[index] = val;
  }
```
2024-01-10 22:20:31 -08:00
Nico Weber
2dce77201c Revert "[Clang] Implement the 'counted_by' attribute (#76348)"
This reverts commit fefdef808c230c79dca2eb504490ad0f17a765a5.

Breaks check-clang, see
https://github.com/llvm/llvm-project/pull/76348#issuecomment-1886029515

Also revert follow-on "[Clang] Update 'counted_by' documentation"

This reverts commit 4a3fb9ce27dda17e97341f28005a28836c909cfc.
2024-01-10 21:05:19 -05:00
Bill Wendling
4a3fb9ce27 [Clang] Update 'counted_by' documentation
Describe a limitation of the 'counted_by' attribute when used in unions.
Also fix an errant typo.
2024-01-10 15:36:33 -08:00
Bill Wendling
fefdef808c
[Clang] Implement the 'counted_by' attribute (#76348)
The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member holding the
count of elements in the flexible array. This information is used to
improve the results of the array bound sanitizer and the
'__builtin_dynamic_object_size' builtin. The 'count' field member must
be within the same non-anonymous, enclosing struct as the flexible array
member. For example:

```
  struct bar;
  struct foo {
    int count;
    struct inner {
      struct {
        int count; /* The 'count' referenced by 'counted_by' */
      };
      struct {
        /* ... */
        struct bar *array[] __attribute__((counted_by(count)));
      };
    } baz;
  };
```

This example specifies that the flexible array member 'array' has the
number of elements allocated for it in 'count':

```
  struct bar;
  struct foo {
    size_t count;
     /* ... */
    struct bar *array[] __attribute__((counted_by(count)));
  };
```

This establishes a relationship between 'array' and 'count';
specifically that 'p->array' must have *at least* 'p->count' number of
elements available. It's the user's responsibility to ensure that this
relationship is maintained throughout changes to the structure.

In the following, the allocated array erroneously has fewer elements
than what's specified by 'p->count'. This would result in an
out-of-bounds access not being detected:

```
  struct foo *p;

  void foo_alloc(size_t count) {
    p = malloc(MAX(sizeof(struct foo),
                   offsetof(struct foo, array[0]) + count *
                       sizeof(struct bar *)));
    p->count = count + 42;
  }
```

The next example updates 'p->count', breaking the relationship
requirement that 'p->array' must have at least 'p->count' number of
elements available:

```
  void use_foo(int index, int val) {
    p->count += 42;
    p->array[index] = val; /* The sanitizer can't properly check this access */
  }
```

In this example, an update to 'p->count' maintains the relationship
requirement:

```
  void use_foo(int index, int val) {
    if (p->count == 0)
      return;
    --p->count;
    p->array[index] = val;
  }
```
2024-01-10 15:21:10 -08:00
CarolineConcatto
14e7dac92a
[Clang][LLVM][AArch64]SVE2.1 update the intrinsics according to acle[1] (#76844)
This patch changes the following intrinsics:

```
svst1uwq[_{d}]  replaced by svst1wq[_{d}]
svst1uwq_vnum[_{d}] replaced by svst1wq_vnum[_{d}]
svst1udq[_{d}]  replaced by svst1dq[_{d}]
svst1udq_vnum[_{d}] replaced by svst1dq_vnum[_{d}]
```

The 'u' is dropped from the quadword stores because they are simply
truncating the quadwords to 32 bits.

```
 svextq_lane[_{d}] replaced by  svextq[_{d}]
```
EXTQ follows the previously defined EXT intrinsics

```
 svdot[_{d}_{2}_{3}] replaced by svdot[_{d}_{2}]
```
Introduced with the latest SME2 ACLE change

[1]https://github.com/ARM-software/acle/pull/257
2024-01-10 17:12:14 +00:00
Sander de Smalen
5055eeea52
[Clang][AArch64] Add missing SME functions to header file. (#75791)
This includes:
* __arm_in_streaming_mode()
* __arm_has_sme()
* __arm_za_disable()
* __svundef_za()
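
A hedged sketch of how these might be used together (assuming the SME
ACLE header is exposed as <arm_sme.h> in this toolchain):

```
#include <arm_sme.h>

void drop_za_if_idle(void) {
  /* __arm_za_disable commits any lazy-saved state and turns ZA off. */
  if (__arm_has_sme() && !__arm_in_streaming_mode())
    __arm_za_disable();
}
```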
2024-01-02 09:43:30 +00:00
Dinar Temirbulatov
809f2f3d7d
[AArch64][SME2] Add builtins for FDOT, BFDOT, SUDOT, USDOT, SDOT, UDOT. (#75737)
Add SME2 DOT builtins.
2023-12-21 19:41:24 +00:00
Dinar Temirbulatov
77c5c44b01
[AArch64][SME2] Add SME2 MLA/MLS builtins. (#75584)
Add SME2 MLA/MLS builtins.
2023-12-21 16:42:24 +00:00
Bill Wendling
cca4d6cfd2
Revert counted_by attribute feature (#75857)
There are many issues that popped up with the counted_by feature. The
patch #73730 has grown too large and approval is blocking Linux testing.

Includes reverts of:
commit 769bc11f684d ("[Clang] Implement the 'counted_by' attribute
(#68750)")
commit bc09ec696209 ("[CodeGen] Revamp counted_by calculations
(#70606)")
commit 1a09cfb2f35d ("[Clang] counted_by attr can apply only to C99
flexible array members (#72347)")
commit a76adfb992c6 ("[NFC][Clang] Refactor code to calculate flexible
array member size (#72790)")
commit d8447c78ab16 ("[Clang] Correct handling of negative and
out-of-bounds indices (#71877)")
Partial commit b31cd07de5b7 ("[Clang] Regenerate test checks (NFC)")

Closes #73168
Closes #75173
2023-12-18 15:16:09 -08:00
Paul Walker
dea16ebd26
[LLVM][IR] Replace ConstantInt's specialisation of getType() with getIntegerType(). (#75217)
The specialisation will not be valid when ConstantInt gains native
support for vector types.

This is largely a mechanical change but with extra attention paid to constant
folding, InstCombineVectorOps.cpp, LoopFlatten.cpp and Verifier.cpp to
remove the need to call `getIntegerType()`.

Co-authored-by: Nikita Popov <github@npopov.com>
2023-12-18 11:58:42 +00:00
Simon Pilgrim
df3ddd78f6 CGBuiltin - fix gcc Wunused-variable warning. NFC. 2023-12-18 11:51:24 +00:00
Akira Hatanaka
31429e7a89
[CodeGen] Emit a more accurate alignment for non-temporal loads/stores (#75675)
Call EmitPointerWithAlignment to compute the alignment based on the
underlying lvalue's alignment when it's available.
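
For context, the affected builtins include the non-temporal load/store
pair; a minimal sketch (editor's illustration):

```
void stream_copy(float *restrict dst, const float *restrict src, int n) {
  for (int i = 0; i < n; ++i) {
    /* Alignment for these accesses can now come from the underlying
       lvalue instead of a worst-case assumption. */
    float v = __builtin_nontemporal_load(&src[i]);
    __builtin_nontemporal_store(v, &dst[i]);
  }
}
```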
2023-12-17 18:22:44 -08:00
Lei Huang
aaa3f72c1c
[PowerPC] Emit libcall to frexpl for calls to frexp(ppcDoubleDouble) (#75226)
On Linux PowerPC, call the library function ``frexpl`` for calls to
``frexp()`` with input of type PPCDoubleDouble.

Fixes bug: https://github.com/llvm/llvm-project/issues/64426
2023-12-15 17:23:16 -05:00
CarolineConcatto
f2464ca317
[SVE2.1][Clang][LLVM]Int/FP reduce builtin in Clang and LLVM intrinsic (#69926)
This patch implements the builtins in Clang
and the LLVM IR intrinsics for the following:

// Variants are also available for:
// _s8, _s16, _u16, _s32, _u32, _s64, _u64,
// _f16, _f32, _f64
uint8x16_t svaddqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64
uint8x16_t svandqv[_u8](svbool_t pg, svuint8_t zn);
uint8x16_t sveorqv[_u8](svbool_t pg, svuint8_t zn);
uint8x16_t svorqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64
uint8x16_t svmaxqv[_u8](svbool_t pg, svuint8_t zn);
uint8x16_t svminqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for _f32, _f64
float16x8_t svmaxnmqv[_f16](svbool_t pg, svfloat16_t zn);
float16x8_t svminnmqv[_f16](svbool_t pg, svfloat16_t zn);

According to PR #257 [1], the reduction instructions use scalable
vectors as input and fixed vectors as output; therefore we changed
SVEEmitter to emit fixed vector types in case the NEON header
(arm_neon.h) is not present.

[1] https://github.com/ARM-software/acle/pull/257

Co-authored-by: Dinar Temirbulatov <dinar.temirbulatov@arm.com>
2023-12-13 15:45:59 +00:00
Dinar Temirbulatov
49b27b150b
[AArch64][SME2] Add builtins to cast svbool from/to svcount. (#74720)
Add builtin 'svreinterpret_b' to cast from svcount_t to svbool_t.
Add builtin 'svreinterpret_c' to cast from svbool_t to svcount_t.

Patch by: Hassnaa Hamdi <hassnaa.hamdi@arm.com>
2023-12-08 16:38:29 +00:00
James Y Knight
4d4c30a37c
Use Address for CGBuilder's CreateAtomicRMW and CreateAtomicCmpXchg. (#74349)
Update all callers to pass through the Address.

For the older builtins such as `__sync_*` and MSVC `_Interlocked*`,
natural alignment of the atomic access is _assumed_. This change
preserves that behavior. It will pass through greater-than-required
alignments, however.
2023-12-04 13:37:04 -05:00
Ulrich Weigand
c61eb44005 [SystemZ] Implement vector rotate in terms of funnel shift
Clang currently implements a set of vector rotate builtins
(__builtin_s390_verll*) in terms of platform-specific LLVM
intrinsics.  To simplify the IR (and allow for common code
optimizations if applicable), this patch removes those LLVM
intrinsics and implements the builtins in terms of the
platform-independent funnel shift intrinsics instead.

Also, fix the prototype of the __builtin_s390_verll*
builtins for full compatibility with GCC.
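
The identity the new lowering relies on is that a rotate is a funnel
shift with both inputs equal: rotl(x, n) == fshl(x, x, n). A portable C
sketch of what @llvm.fshl.i32(x, x, n) computes:

```
#include <stdint.h>

uint32_t rotl32(uint32_t x, uint32_t n) {
  n &= 31;                                  /* fshl masks the shift amount */
  return (x << n) | (x >> ((32 - n) & 31)); /* & 31 avoids UB when n == 0 */
}
```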
2023-12-04 16:52:00 +01:00
Dominik Adamski
95943d2fab [Flang] Add code-object-version option (#72638)
Information about the code object version can be configured by the user
for the AMD GPU target, and it needs to be placed in the LLVM IR
generated by Flang.

Information about the code object version in MLIR generated by the
parser can be reused by other tools. There is no need to specify extra
flags if we want to invoke MLIR tools (like fir-opt) separately.

Changes in comparison to a8ac93:
 * added information about required targets for test
   flang/test/Driver/driver-help.f90
2023-11-29 03:01:01 -06:00
Dominik Adamski
f00ffcdb58 Revert "[Flang] Add code-object-version option (#72638)"
This commit causes test errors on buildbots.

This reverts commit a8ac930b99d93b2a539ada7e566993d148899144.
2023-11-28 13:18:46 -06:00
Dominik Adamski
a8ac930b99
[Flang] Add code-object-version option (#72638)
Information about the code object version can be configured by the user
for the AMD GPU target, and it needs to be placed in the LLVM IR
generated by Flang.

Information about the code object version in MLIR generated by the
parser can be reused by other tools. There is no need to specify extra
flags if we want to invoke MLIR tools (like fir-opt) separately.
2023-11-28 19:57:36 +01:00
Youngsuk Kim
10e483521a
[clang][CodeGen] Remove ptr-to-ptr bitcasts (NFC) (#73020)
Opaque ptr cleanup effort
2023-11-23 11:34:59 -05:00
Momchil Velikov
f335883808
[AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (#70474)
This patch adds a set of SVE2.1 quadword load/store intrinsics:

  * Contiguous zero-extending load to quadword (single vector)

    sv<type>_t svld1uwq[_<typ>](svbool_t, const <type>_t *ptr);
    sv<type>_t svld1uwq_vnum[_<typ>](svbool_t, const <type> *ptr, int64_t vnum);
 
    sv<type>_t svld1udq[_<typ>](svbool_t, const <type>_t *ptr);
    sv<type>_t svld1udq_vnum[_<typ>](svbool_t, const <type>_t *ptr, int64_t vnum);

  * Contiguous truncating store of a single vector operand

    void svst1uwq[_<typ>](svbool_t, const <type>_t *ptr, sv<type>_t data);
    void svst1uwq_vnum[_<typ>](svbool_t, const <type>_t *ptr, int64_t vnum, sv<type>_t data);

    void svst1udq[_<typ>](svbool_t, const <type>_t *ptr, sv<type>_t data);
    void svst1udq_vnum[_<typ>](svbool_t, const <type>_t *ptr, int64_t vnum, sv<type>_t data);

  * Gather load quadword

    sv<type>_t svld1q_gather[_u64base]_<typ>(svbool_t pg, svuint64_t zn);
    sv<type>_t svld1q_gather[_u64base]_offset_<typ>(svbool_t pg, svuint64_t zn, int64_t offset);

  * Scatter store quadword

    void svst1q_scatter[_u64base][_<typ>](svbool_t pg, svuint64_t zn, sv<type>_t data);
    void svst1q_scatter[_u64base]_offset[_<typ>](svbool_t pg, svuint64_t zn, int64_t offset, sv<type>_t data);

  * Contiguous load of two, three or four quadword structures.

    sv<type>x2_t svld2q[_<typ>](svbool_t pg, const <type>_t *rn);
    sv<type>x2_t svld2q_vnum[_<typ>](svbool_t pg, const <type>_t *rn, uint64_t vnum);
    sv<type>x3_t svld3q[_<typ>](svbool_t pg, const <type>_t *rn);
    sv<type>x3_t svld3q_vnum[_<typ>](svbool_t pg, const <type>_t *rn, uint64_t vnum);
    sv<type>x4_t svld4q[_<typ>](svbool_t pg, const <type>_t *rn);
    sv<type>x4_t svld4q_vnum[_<typ>](svbool_t pg, const <type>_t *rn, uint64_t vnum);

  * Contiguous store of two, three or four quadword structures.

    void svst2q[_<typ>](svbool_t pg, <type>_t *rn, sv<type>x2_t zt);
    void svst2q_vnum[_<typ>](svbool_t pg, <type>_t *rn, int64_t vnum, sv<type>x2_t zt);
    void svst3q[_<typ>](svbool_t pg, <type>_t *rn, sv<type>x3_t zt);
    void svst3q_vnum[_<typ>](svbool_t pg, <type>_t *rn, int64_t vnum, sv<type>x3_t zt);
    void svst4q[_<typ>](svbool_t pg, <type>_t *rn, sv<type>x4_t zt);
    void svst4q_vnum[_<typ>](svbool_t pg, <type>_t *rn, int64_t vnum, sv<type>x4_t zt);

ACLE spec: https://github.com/ARM-software/acle/pull/257

Co-authored-by: Caroline Concatto <caroline.concatto@arm.com>
Co-authored-by: Hassnaa Hamdi <hassnaa.hamdi@arm.com>
2023-11-21 15:34:59 +00:00
Bill Wendling
d8447c78ab
[Clang] Correct handling of negative and out-of-bounds indices (#71877)
GCC returns 0 for a negative index on an array in a structure. It also
returns 0 for an array index that goes beyond the extent of the array.
In addition. a pointer to a struct field returns that field's size, not
the size of it plus the rest of the struct, unless it's the first field
in the struct.

  struct s {
    int count;
    char dummy;
    int array[] __attribute((counted_by(count)));
  };

  struct s *p = malloc(...);

  p->count = 10;

A __bdos on the elements of p returns:

  __bdos(p, 0) == 30
  __bdos(p->array, 0) == 10
  __bdos(&p->array[0], 0) == 10
  __bdos(&p->array[-1], 0) == 0
  __bdos(&p->array[42], 0) == 0

Also perform some refactoring, putting the "counted_by" calculations in
their own function.
2023-11-20 09:49:20 -08:00
Sam Tebbs
f7b5c25507
[AArch64][SME] Remove immediate argument restriction for svldr and svstr (#68565)
The svldr_vnum and svstr_vnum builtins always modify the base register
and tile slice and provide immediate offsets of zero, even when the
offset provided to the builtin is an immediate. This patch optimises the
output of the builtins when the offset is an immediate, to pass it
directly to the instruction and to not need the base register and tile
slice updates.
2023-11-20 09:57:29 +00:00
Bill Wendling
a76adfb992
[NFC][Clang] Refactor code to calculate flexible array member size (#72790)
The code that calculates the flexible array member size is big enough to
warrant its own method.
2023-11-19 19:25:10 -08:00
Momchil Velikov
96ef623a75
[AArch64] Cast predicate operand of SVE gather loads/scatter stores to the parameter type of the intrinsic (NFC) (#71289)
When emitting LLVM IR for gather loads/scatter stores, the predicate
parameter is cast to a type that depends on the loaded, resp. stored,
type. That's correct for operations where we have a predicate per lane;
however, it is not correct for quadword loads and stores (`LD1Q`,
`ST1Q`), where the predicate is per 128-bit chunk, independent of the
ACLE intrinsic type.

This can be universally handled by casting to the corresponding
parameter type of the intrinsic. The intrinsic itself should be defined
in a way that enforces the relations between parameter types.
2023-11-13 16:01:07 +00:00
Jessica Del
b025864af8
[AMDGPU] - Add clang builtins for tied WMMA intrinsics (#70669)
Add clang builtins for the new tied wmma intrinsics. These variations
tie the destination accumulator matrix to the input accumulator matrix.

See https://github.com/llvm/llvm-project/pull/69903 for context.
2023-11-13 13:23:26 +01:00
Fangrui Song
65f2cf25c3 Revert "[CodeGen] -fsanitize=alignment: add cl::opt sanitize-alignment-builtin to disable memcpy instrumentation (#69240)"
This reverts commit e8fe4de64ffb84924c41e54116a04570046eed74.

memcpy/memmove instrumentation for -fsanitize=alignment has been tested
on a huge code base. There were some cleanups but the number does not
justify a workaround.
2023-11-12 22:26:27 -08:00