mirror of
https://github.com/llvm/llvm-project.git
synced 2025-04-25 02:46:11 +00:00

We have had quite a few issues created around how Clang treats intrinsics vs how MSVC treats intrinsics. While I was writing this I also added some sections on behaviour changes that caught me while porting my MSVC codebase to clang-cl. Hopefully we can point issues around intrinsics to this doc and hopefully it is useful to others who run into similar behaviour differences. The behaviour differences highlighted here are differences, as far as I am aware, that we do not intend to change or fix for MSVC.
287 lines
12 KiB
ReStructuredText
.. raw:: html

  <style type="text/css">
    .none { background-color: #FFCCCC }
    .partial { background-color: #FFFF99 }
    .good { background-color: #CCFF99 }
  </style>

.. role:: none
.. role:: partial
.. role:: good

==================
MSVC compatibility
==================

When Clang compiles C++ code for Windows, it attempts to be compatible with
MSVC. There are multiple dimensions to compatibility.

First, Clang attempts to be ABI-compatible, meaning that Clang-compiled code
should be able to link against MSVC-compiled code successfully. However, C++
ABIs are particularly large and complicated, and Clang's support for MSVC's C++
ABI is a work in progress. If you don't require MSVC ABI compatibility or don't
want to use Microsoft's C and C++ runtimes, the mingw32 toolchain might be a
better fit for your project.

Second, Clang implements many MSVC language extensions, such as
``__declspec(dllexport)`` and a handful of pragmas. These are typically
controlled by ``-fms-extensions``.

Third, MSVC accepts some C++ code that Clang will typically diagnose as
invalid. When these constructs are present in widely included system headers,
Clang attempts to recover and continue compiling the user's program. Most
parsing and semantic compatibility tweaks are controlled by
``-fms-compatibility`` and ``-fdelayed-template-parsing``, and they are a work
in progress.

Finally, there is :ref:`clang-cl`, a driver program for clang that attempts to
be compatible with MSVC's cl.exe.

ABI features
============

The status of major ABI-impacting C++ features:

* Record layout: :good:`Complete`. We've tested this with a fuzzer and have
  fixed all known bugs.

* Class inheritance: :good:`Mostly complete`. This covers all of the standard
  OO features you would expect: virtual method inheritance, multiple
  inheritance, and virtual inheritance. Every so often we uncover a bug where
  our tables are incompatible, but this is pretty well in hand. This feature
  has also been fuzz tested.

* Name mangling: :good:`Ongoing`. Every new C++ feature generally needs its own
  mangling. For example, member pointer template arguments have an interesting
  and distinct mangling. Fortunately, incorrect manglings usually do not result
  in runtime errors. Non-inline functions with incorrect manglings usually
  result in link errors, which are relatively easy to diagnose. Incorrect
  manglings for inline functions and templates result in multiple copies in the
  final image. The C++ standard requires that those addresses be equal, but few
  programs rely on this.

* Member pointers: :good:`Mostly complete`. Standard C++ member pointers are
  fully implemented and should be ABI compatible. Both `#pragma
  pointers_to_members`_ and the `/vm`_ flags are supported. However, MSVC
  supports an extension to allow creating a `pointer to a member of a virtual
  base class`_. Clang does not yet support this.

.. _#pragma pointers_to_members:
  https://msdn.microsoft.com/en-us/library/83cch5a6.aspx
.. _/vm: https://msdn.microsoft.com/en-us/library/yad46a6z.aspx
.. _pointer to a member of a virtual base class: https://llvm.org/PR15713

* Debug info: :good:`Mostly complete`. Clang emits relatively complete CodeView
  debug information if ``/Z7`` or ``/Zi`` is passed. Microsoft's link.exe will
  transform the CodeView debug information into a PDB that works in Windows
  debuggers and other tools that consume PDB files like ETW. Work to teach lld
  about CodeView and PDBs is ongoing.

* RTTI: :good:`Complete`. Generation of RTTI data structures has been
  finished, along with support for the ``/GR`` flag.

* C++ Exceptions: :good:`Mostly complete`. Support for
  C++ exceptions (``try`` / ``catch`` / ``throw``) has been implemented for
  x86 and x64. Our implementation has been well tested but we still get the
  odd bug report now and again.
  C++ exception specifications are ignored, but this is `consistent with Visual
  C++`_.

.. _consistent with Visual C++:
  https://msdn.microsoft.com/en-us/library/wfa0edys.aspx

* Asynchronous Exceptions (SEH): :partial:`Partial`.
  Structured exceptions (``__try`` / ``__except`` / ``__finally``) mostly
  work on x86 and x64.
  LLVM does not model asynchronous exceptions, so it is currently impossible to
  catch an asynchronous exception generated in the same frame as the catching
  ``__try``.

* Thread-safe initialization of local statics: :good:`Complete`. MSVC 2015
  added support for thread-safe initialization of such variables by taking an
  ABI break.
  We are ABI compatible with both the MSVC 2013 and 2015 ABI for static local
  variables.

* Lambdas: :good:`Mostly complete`. Clang is compatible with Microsoft's
  implementation of lambdas except for providing overloads for conversion to
  function pointer for different calling conventions. However, Microsoft's
  extension is non-conforming.

Template instantiation and name lookup
======================================

MSVC allows many invalid constructs in class templates that Clang has
historically rejected. In order to parse widely distributed headers for
libraries such as the Active Template Library (ATL) and Windows Runtime Library
(WRL), some template rules have been relaxed or extended in Clang on Windows.

The first major semantic difference is that MSVC appears to defer all parsing
and analysis of inline method bodies in class templates until instantiation
time. By default on Windows, Clang attempts to follow suit. This behavior is
controlled by the ``-fdelayed-template-parsing`` flag. While Clang delays
parsing of method bodies, it still parses the bodies *before* template argument
substitution, which is not what MSVC does. The following compatibility tweaks
are necessary to parse the template in those cases.

MSVC allows some name lookup into dependent base classes. Even on other
platforms, this has been a `frequently asked question`_ for Clang users. A
dependent base class is a base class that depends on the value of a template
parameter. Clang cannot see any of the names inside dependent bases while it
is parsing your template, so the user is sometimes required to use the
``typename`` keyword to assist the parser. On Windows, Clang attempts to
follow the normal lookup rules, but if lookup fails, it will assume that the
user intended to find the name in a dependent base. While parsing the
following program, Clang will recover as if the user had written the
commented-out code:

.. _frequently asked question:
  https://clang.llvm.org/compatibility.html#dep_lookup

.. code-block:: c++

  template <typename T>
  struct Foo : T {
    void f() {
      /*typename*/ T::UnknownType x = /*this->*/unknownMember;
    }
  };

After recovery, Clang warns the user that this code is non-standard and issues
a hint suggesting how to fix the problem.

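For code that should build without relying on this recovery, the two
commented-out fragments can be written out explicitly. The following is a
minimal sketch of that fix; ``Base`` and ``callF`` are hypothetical names
introduced only to make the example complete:

.. code-block:: c++

  // Hypothetical base class supplying the names used inside Foo<T>::f().
  struct Base {
    typedef int UnknownType;
    int unknownMember = 42;
  };

  template <typename T>
  struct Foo : T {
    typename T::UnknownType f() {
      // 'typename' marks the dependent name as a type, and 'this->' defers
      // the member lookup until T is known at instantiation time.
      typename T::UnknownType x = this->unknownMember;
      return x;
    }
  };

  // Instantiates Foo<Base>::f() and returns the value it read.
  inline int callF() { return Foo<Base>().f(); }

Written this way, the template should be accepted by conforming compilers
without the compatibility warning.
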
As of this writing, Clang is able to compile a simple ATL hello world
application. There are still issues parsing WRL headers for modern Windows 8
apps, but they should be addressed soon.

__forceinline behavior
======================

``__forceinline`` behaves like ``[[clang::always_inline]]``.
Inlining is always attempted regardless of optimization level.

This differs from MSVC, where ``__forceinline`` is only respected once inline
expansion is enabled, which allows any function marked implicitly or explicitly
``inline`` or ``__forceinline`` to be expanded. Therefore, Clang expands
functions marked ``__forceinline`` even when the optimization level is ``/Od``,
whereas MSVC does not expand them under ``/Od``.

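Code that depended on MSVC leaving ``__forceinline`` functions out of line in
unoptimized builds (for example, to step into them in a debugger) can make the
keyword conditional. The following is only a sketch: ``MY_FORCEINLINE`` and
``Square`` are hypothetical names, and it assumes the ``__OPTIMIZE__`` macro,
which Clang and GCC define only when optimization is enabled:

.. code-block:: c++

  #if defined(__clang__) && !defined(__OPTIMIZE__)
  // Unoptimized Clang build: use plain 'inline' so the function is not
  // force-inlined, approximating what MSVC does under /Od.
  #define MY_FORCEINLINE inline
  #elif defined(_MSC_VER)
  // MSVC itself, or clang-cl with optimization enabled.
  #define MY_FORCEINLINE __forceinline
  #else
  // Other GCC-compatible compilers (assumption: always_inline is available).
  #define MY_FORCEINLINE inline __attribute__((always_inline))
  #endif

  MY_FORCEINLINE int Square(int x) { return x * x; }

Whether this is worthwhile depends on how much the project relies on the MSVC
behavior. Many codebases can simply accept the extra inlining.
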
SIMD and instruction set intrinsic behavior
===========================================

Clang follows the GCC model for intrinsics and not the MSVC model.
There are currently no plans to support the MSVC model.

MSVC intrinsics always emit the machine instruction the intrinsic models,
regardless of the compile-time options specified. For example, ``__popcnt``
always emits the x86 popcnt instruction even if the compiler does not have the
option enabled to emit popcnt of its own volition.

There are two common cases where code that compiles with MSVC will need
reworking to build with Clang. Assume the examples are built only with
``-msse2``, so the popcnt and SSE3 target features are not enabled at compile
time.

.. code-block:: c++

  unsigned PopCnt(unsigned v) {
    if (HavePopCnt)
      return __popcnt(v);
    else
      return GenericPopCnt(v);
  }

.. code-block:: c++

  __m128 dot4_sse3(__m128 v0, __m128 v1) {
    __m128 r = _mm_mul_ps(v0, v1);
    r = _mm_hadd_ps(r, r);
    r = _mm_hadd_ps(r, r);
    return r;
  }

Clang expects one of the following: the target features are enabled at compile
time (``-msse3`` and ``-mpopcnt``), the function is marked with the expected
target feature, or runtime detection is used with an indirect call.

.. code-block:: c++

  __attribute__((__target__("sse3"))) __m128 dot4_sse3(__m128 v0, __m128 v1) {
    __m128 r = _mm_mul_ps(v0, v1);
    r = _mm_hadd_ps(r, r);
    r = _mm_hadd_ps(r, r);
    return r;
  }

The SSE3 dot product can be easily fixed by either building the translation
unit with SSE3 support or using ``__target__`` to compile that specific
function with SSE3 support.

.. code-block:: c++

  unsigned PopCnt(unsigned v) {
    if (HavePopCnt)
      return __popcnt(v);
    else
      return GenericPopCnt(v);
  }

The above ``PopCnt`` example must be changed to work with Clang. If we mark the
function with ``__target__("popcnt")``, the compiler is free to emit popcnt
anywhere in that function, which we do not want. While this isn't a concern in
our small example, it is a concern in larger functions with other code
surrounding the intrinsics. The same reasoning applies to compiling the
translation unit with ``-mpopcnt``.
We must split each branch into its own function that can be called indirectly
instead of using the intrinsic directly.

.. code-block:: c++

  __attribute__((__target__("popcnt"))) unsigned hwPopCnt(unsigned v) { return __popcnt(v); }
  unsigned (*PopCnt)(unsigned) = HavePopCnt ? hwPopCnt : GenericPopCnt;

.. code-block:: c++

  __attribute__((__target__("popcnt"))) unsigned hwPopCnt(unsigned v) { return __popcnt(v); }
  unsigned PopCnt(unsigned v) {
    if (HavePopCnt)
      return hwPopCnt(v);
    else
      return GenericPopCnt(v);
  }

In the above example, ``hwPopCnt`` will not be inlined into ``PopCnt`` since
``PopCnt`` doesn't have the popcnt target feature. With a larger function that
does real work, the function call overhead is negligible. However, in our
popcnt example the call overhead cannot be avoided. There is no analog for this
specific MSVC behavior in Clang.

With Clang, we effectively have to write the dispatch function to each specific
implementation ourselves.

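The dispatch pattern described above can be sketched end to end as follows.
This is one possible arrangement, not the only one. It assumes a
GCC-compatible compiler, uses ``__builtin_cpu_supports`` for runtime detection
(on Windows targets this may need extra runtime support, in which case a
CPUID-based check would be needed instead), and uses ``_mm_popcnt_u32`` in
place of ``__popcnt``. The names ``HwPopCnt``, ``SelectPopCnt``, and
``SKETCH_X86`` are hypothetical:

.. code-block:: c++

  #if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
  #define SKETCH_X86 1
  #include <immintrin.h>
  #endif

  // Portable fallback: clears the lowest set bit until none remain.
  static unsigned GenericPopCnt(unsigned v) {
    unsigned n = 0;
    for (; v != 0; v &= v - 1)
      ++n;
    return n;
  }

  #ifdef SKETCH_X86
  // Only this function is compiled with the popcnt feature enabled.
  __attribute__((target("popcnt")))
  static unsigned HwPopCnt(unsigned v) {
    return static_cast<unsigned>(_mm_popcnt_u32(v));
  }
  #endif

  typedef unsigned (*PopCntFn)(unsigned);

  // Picks an implementation once, based on what the running CPU supports.
  static PopCntFn SelectPopCnt() {
  #ifdef SKETCH_X86
    if (__builtin_cpu_supports("popcnt"))
      return HwPopCnt;
  #endif
    return GenericPopCnt;
  }

  // Dispatch function: every call goes through the selected pointer.
  unsigned PopCnt(unsigned v) {
    static const PopCntFn impl = SelectPopCnt();
    return impl(v);
  }

Caching the selection in a function-local static keeps the detection cost to
the first call. The indirect call itself remains, as discussed above.
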
SIMD vector types
=================

Clang's SIMD vector types are builtin types and not user-defined types as in
MSVC. This causes some observable behavior changes.
We will look at the x86 ``__m128`` type for the examples below, but the
statements apply to all vector types, including ARM's ``float32x4_t``.

There are no members that can be accessed on the vector types. Vector types are
not structs in Clang.
You cannot use ``__m128.m128_f32[0]`` to access the first element of the
``__m128``.
This also means struct initialization like
``__m128{ { 0.0f, 0.0f, 0.0f, 0.0f } }`` will not compile with Clang.

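One way to avoid both member access and struct initialization is to go through
the SSE intrinsics, which should behave the same with either compiler. A
minimal sketch, assuming an x86 target; ``MakeVec`` and ``GetLane`` are
hypothetical helper names:

.. code-block:: c++

  #include <xmmintrin.h>  // SSE intrinsics; assumes an x86 target

  // Builds a vector without brace-initializing struct members.
  inline __m128 MakeVec(float x, float y, float z, float w) {
    return _mm_setr_ps(x, y, z, w);  // lane 0 = x, ..., lane 3 = w
  }

  // Reads lane 'i' (0..3) by storing the vector to a plain array first.
  inline float GetLane(__m128 v, int i) {
    float lanes[4];
    _mm_storeu_ps(lanes, v);
    return lanes[i];
  }

For the common case of reading only the first element, ``_mm_cvtss_f32`` avoids
the temporary array.
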
Since vector types are builtin types, Clang implements operators on them
natively.

.. code-block:: c++

  #ifdef _MSC_VER
  __m128 operator+(__m128 a, __m128 b) { return _mm_add_ps(a, b); }
  #endif

The above code will fail to compile since an overloaded ``operator+`` must have
at least one parameter of class or enumeration type.
You will need to fix such code to have the check
``#if defined(_MSC_VER) && !defined(__clang__)``.

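With that check in place, the same expression can use the user-provided
overload on MSVC and the builtin vector operator on Clang. A minimal sketch,
assuming an x86 target; ``AddFirstLanes`` is a hypothetical helper used only to
exercise the operator:

.. code-block:: c++

  #include <xmmintrin.h>  // SSE intrinsics; assumes an x86 target

  #if defined(_MSC_VER) && !defined(__clang__)
  // MSVC only: __m128 is a class type there, so this overload is legal.
  inline __m128 operator+(__m128 a, __m128 b) { return _mm_add_ps(a, b); }
  #endif

  // Adds two vectors and returns the first lane of the result.
  inline float AddFirstLanes(__m128 a, __m128 b) {
    return _mm_cvtss_f32(a + b);
  }
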
Since ``__m128`` is not a class type in Clang, any overloads declared after a
template definition will not be considered.

.. code-block:: c++

  template<class T>
  void foo(T) {}

  template<class T>
  void bar(T t) {
    foo(t);
  }

  void foo(__m128) {}

  int main() {
    bar(_mm_setzero_ps());
  }

With MSVC, ``foo(__m128)`` will be selected, but with Clang ``foo<__m128>()``
will be selected, since on Clang ``__m128`` is a builtin type.

In general, the takeaway is that ``__m128`` is a builtin type on Clang but a
class type on MSVC.

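One way to make the selection consistent is to declare the ``__m128`` overload
before the template that calls it, so ordinary lookup finds it when the
template is defined. The sketch below assumes an x86 target and changes the
return types to ``int`` purely so the chosen overload is observable;
``callBar`` is a hypothetical helper:

.. code-block:: c++

  #include <xmmintrin.h>  // SSE intrinsics; assumes an x86 target

  template <class T>
  int foo(T) { return 0; }  // generic fallback

  // Declared before bar(), so it is visible at bar's point of definition.
  inline int foo(__m128) { return 1; }

  template <class T>
  int bar(T t) {
    return foo(t);  // the non-template overload is the better match
  }

  inline int callBar() { return bar(_mm_setzero_ps()); }
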