This implements the tand intrinsic by performing a multiplication
by pi/180 to the argument before calling tan inline.
This is a commonly provided extension that is used by OpenRadioss
Differential Revision: https://reviews.llvm.org/D154614
This patch changes the default lowering for complex operations to use
the more accurate libm operations as opposed to the mlir complex
operations. This is necessary due to precision issues in the mlir
complex dialect that cause failures in e.g. the LAPACK tests.
The mlir complex dialect lowering will still be used when
`-fapprox-func` is set (and by extension `-ffast-math` and `-Ofast`)
Differential Revision: https://reviews.llvm.org/D155310
On a subsequent commit, I intend to update the expression and call the evaluate
function more times. This refactors enables reusing some of the existing code
for that.
Differential Revision: https://reviews.llvm.org/D155197
Adapt the existing ANY/ZERO_EXTEND_VECTOR_INREG shuffle matching to also recognise SIGN_EXTEND_VECTOR_INREG patterns to handle cases where we're effectively "splatting" all-signbits sources.
Currently when compiling for an execute-only target without movt then
EmitStructByval will generate a constant pool load which isn't
compatible with execute-only. Handle this by emitting tMOVi32imm,
and also simplify the existing movt handling by emitting t2MOVi32imm
or MOVi32imm.
Differential Revision: https://reviews.llvm.org/D154944
The expansion of the various MOVi32imm pseudo-instructions works by
splitting the operand into components (either halfwords or bytes) and
emitting instructions to combine those components into the final
result. When the operand is an immediate with some components being
zero this can result in pointless instructions that just add zero.
Avoid this by restructuring things so that a separate function handles
splitting the operand into components, then don't emit the component
if it is a zero immediate. This is straightforward for movw/movt,
where we just don't emit the movt if it's zero, but the thumb1
expansion using mov/add/lsl is more complex, as even when we don't
emit a given byte we still need to get the shift correct.
Differential Revision: https://reviews.llvm.org/D154943
Currently for armv6-m and armv8-m.baseline, we emit constant pool code when we
use execute-only (XO) in combination with stack guards.
XO is a new feature for armv6-m, and this patch is part of a series of patches
that substitutes constant pool generation with the tMOVi32imm equivalent.
However XO for armv8-m.baseline has been available for about 6 years, and so
for armv8-m.baseline this is a bugfix.
Reviewed By: simonwallis2, olista01
Differential Revision: https://reviews.llvm.org/D155170
The ignore_tkr directive is applied to the third argument so that the number of
interfaces can be reduced.
Differential Revision: https://reviews.llvm.org/D155624
Some ISAInfo unittest base on experimental-zihintntl, but zihintntl will become none-experimental. This patch use another experimental extension zicond to replace zihintntl.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D155673
AArch64 introduced CMLA and CADD instructions as part of SVE2. This
change allows to generate such instructions when this architecture
feature is available.
Differential Revision: https://reviews.llvm.org/D153808
This work introduces : `wgmma.fence.aligned`, `wgmma.commit.group.sync.aligned` and `wgmma.wait.group.sync.aligned` Ops. They are used to syncronize warpgroup level matrix multiply-accumulate instructions, as known as WGMMA.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D155676
This is a reduced version of one of the tests that was broken by the
original commit of D154281 "[CodeGen] Store SP adjustment in
MachineBasicBlock. NFCI.".
Differential Revision: https://reviews.llvm.org/D155471
These vector lanes are never accessed. They are used for shifting a value into the right lane
and therefore only 1 value of the whole vector is actually used
The extension class to MatchOp has a class method called match_op_names.
The previous version of that function did not allow to specify the
result type. This, however, may be useful/necessary if the op consuming
the resulting handle requires a particular type (such as the
bufferization.EmptyTensorToAllocTensorOp). This patch adds an overload
to match_op_names that allows to specify the result type.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D155567
If the source operand is already all-signbits we don't need to create the sign extended elements - just splat the source element to the destination element width
This patch mostly renames files so it better reflects the function they declare.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D155607
Block dispositions of values defined inside the loop may change
during peeling, so clear them. We already do this for other kinds
of unrolling.
Differential Revision: https://reviews.llvm.org/D153762
When linking a big-endian image for Arm, clang has
to select between BE8 and BE32 formats. The default
is dependent on the selected target architecture.
For ARMv6 and later architectures the default is
BE8, for older architectures the default is BE32.
For BE8 and BE32, compiler outputs a big endian ELF
relocatable object file with the instructions and
data both big endian. The difference is that at
link time, for BE8 a linker must endian reverse
the instructions to little endian. For BE8, the
clang has to pass --be8 to the linker for Arm.
At the moment clang is not passing the --be8 flag
to linker for the baremetal target architectures
above ArmV6 for Arm. This patch passes through --be8
and -BE or EL to the linker, taking into account the
target and the -mbig-endian and -mlittle-endian flag.
Also there are few more changes in the baremetal
driver so that the code can cope with AArch64 being
big-endian as well.
Reviewed By: michaelplatings, MaskRay
Differential Revision: https://reviews.llvm.org/D154786
The library msvcrt.lib pulls in ucrt.lib and vcruntime.lib anyway,
there's no need to manually link against the individual dependencies.
This matches how the tests link against libraries - they only link
against msvcrt and msvcprt, not directly against ucrt and vcruntime.
Differential Revision: https://reviews.llvm.org/D155555
With the recent addition of "-mattr" and "-march" to the list of options
supported by mlir-cpu-runner [1], the SVE integration
tests can be updated to use mlir-cpu-runner instead of lli. This will
allow better code re-use and more consistency
This patch updates 2 tests to demonstrate the new logic. The remaining
tests will be updated in the follow-up patches.
[1] https://reviews.llvm.org/D146917
Depends on D155403
Differential Revision: https://reviews.llvm.org/D155405
This was left in place to reduce the risk of breaking anything,
and to keep the diff smaller, in D155233.
Differential Revision: https://reviews.llvm.org/D155431
This deprecates various compatibility APIs that have been
introduced as part of the opaque pointer migration.
These will be removed at some point after the LLVM 17 release.
Differential Revision: https://reviews.llvm.org/D155585