Generate the FP16FML intrinsics into arm_neon.h (AArch64 only for now).
Add two new type modifiers to NeonEmitter to handle the new prototypes.
Define __ARM_FEATURE_FP16FML when +fp16fml is enabled and guard the
intrinsics with the macro in arm_neon.h.
Based on a patch by Gao Yiling.
Differential Revision: https://reviews.llvm.org/D53633
llvm-svn: 345344
Similar to how ICC handles CPU-Dispatch on Windows, this patch uses the
resolver function directly to forward the call to the proper function.
This is not nearly as efficient as IFuncs of course, but is still quite
useful for large functions specifically developed for certain
processors.
This is unfortunately still limited to x86, since it depends on
__builtin_cpu_supports and __builtin_cpu_is, which are x86 builtins.
The naming for the resolver/forwarding function for cpu-dispatch was
taken from ICC's implementation, which uses the unmodified name for this
(no mangling additions). This is possible, since cpu-dispatch uses '.A'
for the 'default' version.
In 'target' multiversioning, this function keeps the '.resolver'
extension in order to keep the default function keeping the default
mangling.
Change-Id: I4731555a39be26c7ad59a2d8fda6fa1a50f73284
Differential Revision: https://reviews.llvm.org/D53586
llvm-svn: 345298
I'm unsure if KNL has this feature, but the backend never thought it did, only clang did. The predefined-arch-macros test lost the check for __RTM__ on KNL when it was removed Skylake CPUs in r344117.
I think we want to drop it from KNL for consistency with Skylake anyway regardless of how we got here.
llvm-svn: 344978
The `GNUABIN32` environment in a target triple implies using the N32
ABI. This patch adds support for this environment and switches on N32
ABI if necessary.
Patch by Patch by YunQiang Su.
Differential revision: https://reviews.llvm.org/D51464
llvm-svn: 344570
Summary:
gcc defines macros such as __code_model_small_ based on the user passed command line flag -mcmodel. clang accepts a flag with the same name and similar effects, but does not generate any macro that the user can use. This cl narrows the gap between gcc and clang behaviour.
However, achieving full compatibility with gcc is not trivial: The set of valid values for mcmodel in gcc and clang are not equal. Also, gcc defines different macros for different architectures. In this cl, we only tackle an easy part of the problem and define the macro only for x64 architecture. When the user does not specify a mcmodel, the macro for small code model is produced, as is the case with gcc.
Reviewers: compnerd, MaskRay
Reviewed By: MaskRay
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D52920
llvm-svn: 344000
These intrinsics exist in icc. They can be found on the Intel Intrinsics Guide website.
All the backend support is in place to pattern match a load+bswap or a bswap+store pattern to the MOVBE instructions. So we just need to get the frontend to emit the correct IR. The pointer arguments in icc are declared as void so I had to jump through a packed struct to forcing a specific alignment on the load/store. Same trick we use in the unaligned vector load/store intrinsics
Differential Revision: https://reviews.llvm.org/D52586
llvm-svn: 343343
My previous change (rL340911) set the two features for architectures
>= 6, which wrongly includes v6m. Now set to >= 6 and not Cortex-M.
Differential Revision: https://reviews.llvm.org/D52644
llvm-svn: 343309
This patch allows targetting Armv8.5-A from Clang. Most of the
implementation is in TargetParser, so this is mostly just adding tests.
Patch by Pablo Barrio!
Differential revision: https://reviews.llvm.org/D52491
llvm-svn: 343111
Windows uses `unsigned short` for `wint_t`. Correct the type definition as
vended by the compiler. This type is defined in corecrt.h and is
unconditionally typedef'ed. cl does not have an equivalent to `__WINT_TYPE__`
which is why this was never detected.
llvm-svn: 342557
The instruction set first appeared with Westmere, but not all processors
in that and the next few generations have the instructions. According to
Wikipedia[1], the first generation in which all SKUs have AES
instructions are Skylake and Goldmont. I can't find any Skylake,
Kabylake, Kabylake-R or Cannon Lake currently listed at
https://ark.intel.com that says "Intel® AES New Instructions" "No".
This matches GCC commit
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01940.html
[1] https://en.wikipedia.org/wiki/AES_instruction_set
Patch By: thiagomacieira
Differential Revision: https://reviews.llvm.org/D51510
llvm-svn: 341862
ARM_FEATURE_DSP is already set for targets with the +dsp feature. In
the backend, this target feature is also used to represent the
availability of the of the instructions that the ACLE guard through
the __ARM_FEATURE_SIMD32 macro. We don't have any cores that
implement one and not the other, so set this macro for cores later
than V6 or for Cortex-M cores that the target parser, or user, reports
that the 'dsp' instructions are supported.
Differential Revision: https://reviews.llvm.org/D51093
llvm-svn: 340911
As reported on http://lists.llvm.org/pipermail/cfe-dev/2018-August/058760.html,
this broke i386-freebsd11 due to its lack of atomic 64 bit primitives.
While that's not really this commit's fault, let's revert back to the old
behaviour until this can be fixed. This means generating cmpxchg8b etc for i386
and i486 which don't technically support those, but that's been the behaviour
for a long time, so a little longer probably doesn't hurt that much.
> Adjust MaxAtomicInlineWidth for i386/i486 targets.
>
> This is to fix the bug reported in https://bugs.llvm.org/show_bug.cgi?id=34347#c6.
> Currently, all MaxAtomicInlineWidth of x86-32 targets are set to 64. However,
> i386 doesn't support any cmpxchg related instructions. i486 only supports cmpxchg.
> So in this patch MaxAtomicInlineWidth is reset as follows:
> For i386, the MaxAtomicInlineWidth should be 0 because no cmpxchg is supported.
> For i486, the MaxAtomicInlineWidth should be 32 because it supports cmpxchg.
> For others 32 bits x86 cpu, the MaxAtomicInlineWidth should be 64 because of cmpxchg8b.
>
> Differential Revision: https://reviews.llvm.org/D42154
llvm-svn: 340666
subtarget features for indirect calls and indirect branches.
This is in preparation for enabling *only* the call retpolines when
using speculative load hardening.
I've continued to use subtarget features for now as they continue to
seem the best fit given the lack of other retpoline like constructs so
far.
The LLVM side is pretty simple. I'd like to eventually get rid of the
old feature, but not sure what backwards compatibility issues that will
cause.
This does remove the "implies" from requesting an external thunk. This
always seemed somewhat questionable and is now clearly not desirable --
you specify a thunk the same way no matter which set of things are
getting retpolines.
I really want to keep this nicely isolated from end users and just an
LLVM implementation detail, so I've moved the `-mretpoline` flag in
Clang to no longer rely on a specific subtarget feature by that name and
instead to be directly handled. In some ways this is simpler, but in
order to preserve existing behavior I've had to add some fallback code
so that users who relied on merely passing -mretpoline-external-thunk
continue to get the same behavior. We should eventually remove this
I suspect (we have never tested that it works!) but I've not done that
in this patch.
Differential Revision: https://reviews.llvm.org/D51150
llvm-svn: 340515
Set __mips_fpr to 0 if o32 ABI is used with either -mfpxx
or none of -mfp32, -mfpxx, -mfp64 being specified.
Introduce additional checks:
-mfpxx is only to be used in conjunction with the o32 ABI.
report an error when incompatible options are provided.
Formerly no errors were raised when combining n32/n64 ABIs
with -mfp32 and -mfpxx.
There are other cases when __mips_fpr should be set to 0
that are not covered, ex. using o32 on a mips64 cpu
which is valid but not supported in the backend as of yet.
Differential Revision: https://reviews.llvm.org/D50557
llvm-svn: 340391
Fast FMAF is not a sufficient condition to enable denormals.
Before VI, enabling denormals caused F32 instructions to
run at F64 speeds.
llvm-svn: 339278
The way address space declarations for builtins currently work
is nearly useless. The code assumes the address spaces used for
builtins is a confusingly named "target address space" from user
code using __attribute__((address_space(N))) that matches
the builtin declaration. There's no way to use this to declare
a builtin that returns a language specific address space.
The terminology used is highly cofusing since it has nothing
to do with the the address space selected by the target to use
for a language address space.
This feature is essentially unused as-is. AMDGPU and NVPTX
are the only in-tree targets attempting to use this. The AMDGPU
builtins certainly do not behave as intended (i.e. all of the
builtins returning pointers can never compile because the numbered
address space never matches the expected named address space).
The NVPTX builtins are missing tests for some, and the others
seem to rely on an implicit addrspacecast.
Change the used address space for builtins based on a target
hook to allow using a language address space for a builtin.
This allows the same builtin declaration to be used for multiple
languages with similarly purposed address spaces (e.g. the same
AMDGPU builtin can be used in OpenCL and CUDA even though the
constant address spaces are arbitarily different).
This breaks the possibility of using arbitrary numbered
address spaces alongside the named address spaces for builtins.
If this is an issue we probably need to introduce another builtin
declaration character to distinguish language address spaces from
so-called "target address spaces".
llvm-svn: 338707
This adds tests for Armv8.4-A, and also some v8.2 and v8.3 tests that were
missing.
Differential Revision: https://reviews.llvm.org/D50068
llvm-svn: 338525
Summary: Microsoft's C++ object model for ARM64 is the same as that for X86_64.
For example, small structs with non-trivial copy constructors or virtual
function tables are passed indirectly. Currently, they are passed in registers
when compiled with clang.
Reviewers: rnk, mstorsjo, TomTan, haripul, javed.absar
Reviewed By: rnk, mstorsjo
Subscribers: kristof.beyls, chrib, llvm-commits, cfe-commits
Differential Revision: https://reviews.llvm.org/D49770
llvm-svn: 338076
Changing it to unsigned long (which is 32-bit on wasm32) makes it the same
type as wasm64 (where unsigned long is 64-bit), which would eliminate the most
common cause for mangled names being different between wasm32 and wasm64. For
example, export lists containing symbol names could now often be the same
between wasm32 and wasm64.
Differential Revision: https://reviews.llvm.org/D40526
llvm-svn: 337783
As documented here: https://software.intel.com/en-us/node/682969 and
https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning
is an ICC feature that provides for function multiversioning.
This feature is implemented with two attributes: First, cpu_specific,
which specifies the individual function versions. Second, cpu_dispatch,
which specifies the location of the resolver function and the list of
resolvable functions.
This is valuable since it provides a mechanism where the resolver's TU
can be specified in one location, and the individual implementions
each in their own translation units.
The goal of this patch is to be source-compatible with ICC, so this
implementation diverges from the ICC implementation in a few ways:
1- Linux x86/64 only: This implementation uses ifuncs in order to
properly dispatch functions. This is is a valuable performance benefit
over the ICC implementation. A future patch will be provided to enable
this feature on Windows, but it will obviously more closely fit ICC's
implementation.
2- CPU Identification functions: ICC uses a set of custom functions to identify
the feature list of the host processor. This patch uses the cpu_supports
functionality in order to better align with 'target' multiversioning.
1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function
marked cpu_dispatch be an empty definition. This patch supports that as well,
however declarations are also permitted, since the linker will solve the
issue of multiple emissions.
Differential Revision: https://reviews.llvm.org/D47474
llvm-svn: 337552
The SPIR target currently allows for half precision floating point types to be
emitted using the LLVM intrinsic functions which convert half types to floats
and doubles. However, this is illegal in SPIR as the only intrinsic allowed by
SPIR is memcpy, as per section 3 of the SPIR specification. Currently this is
leading to an assert being hit in the Clang CodeGen when attempting to emit a
constant or literal _Float16 type in a comparison operation on a SPIR or SPIR64
target. This assert stems from the CodeGen attempting to emit a constant half
value as an integer because the backend has specified that it is using these
half conversion intrinsics (which represents half as i16). This patch prevents
SPIR targets from using these intrinsics by overloading the responsible target
info method, marks SPIR targets as having a legal half type and provides
additional regression testing for the _Float16 type on SPIR targets.
Patch by: Stephen McGroarty
Differential Revision: https://reviews.llvm.org/D48188
llvm-svn: 335111
Diasble the use of the type __float128 for PPC machines older
than Power9.
The use of -mfloat128 for PPC machine older than Power9 will result
in an error.
Differential Revision: https://reviews.llvm.org/D48088
llvm-svn: 334613
Adding __attribute__((aligned(32))) to __m256 breaks the implementation
of _mm256_loadu_ps on Windows. On Windows, alignment attributes have
higher precedence than packing attributes.
We also might want to carefully consider the consequences of changing
our vector typedefs, since many users copy them and invent their own
new, non-Intel specific vector type names.
llvm-svn: 333958
This fixes two major problems:
- We were not capping vector alignment as desired on 32-bit ARM.
- We were using different alignments based on the AVX settings on
Intel, so we did not have a consistent ABI.
This is an ABI break, but we think we can get away with it because
vectors tend to be used mostly in inline code (which is why not having
a consistent ABI has not proven disastrous on Intel).
Intel's AVX types are specified as having 32-byte / 64-byte alignment,
so align them explicitly instead of relying on the base ABI rule.
Note that this sort of attribute is stripped from template arguments
in template substitution, so there's a possibility that code templated
over vectors will produce inadequately-aligned objects. The right
long-term solution for this is for alignment attributes to be
interpreted as true qualifiers and thus preserved in the canonical type.
llvm-svn: 333791
An intrinsic for an old instruction, as described in the Intel SDM.
Reviewers: craig.topper, rnk
Reviewed By: craig.topper, rnk
Differential Revision: https://reviews.llvm.org/D47142
llvm-svn: 333256
in gcc by https://gcc.gnu.org/ml/gcc-cvs/2018-04/msg00534.html.
The -mibt feature flag is being removed, and the -fcf-protection
option now also defines a CET macro and causes errors when used
on non-X86 targets, while X86 targets no longer check for -mibt
and -mshstk to determine if -fcf-protection is supported. -mshstk
is now used only to determine availability of shadow stack intrinsics.
Comes with an LLVM patch (D46882).
Patch by mike.dvoretsky
Differential Revision: https://reviews.llvm.org/D46881
llvm-svn: 332704
When looking at lib/Basic/Targets/OSTargets.h, I noticed that _REENTRANT is defined
unconditionally on Solaris, unlike all other targets and what either Studio cc (only define
it with -mt) or gcc (only define it with -pthread) do.
This patch follows that lead.
Differential Revision: https://reviews.llvm.org/D41241
llvm-svn: 332343