
The MMX instruction set is legacy, and the SSE2 variants are superior in every way when they are available -- and they have been available since the Pentium 4 was released, 20 years ago. Therefore, we are switching the "MMX" intrinsics to depend on SSE2, unconditionally.

This change entirely drops the ability to generate vectorized code using compiler intrinsics for chips with MMX but without SSE2: the Intel Pentium MMX, Pentium II, and Pentium III (released 1997-1999), as well as AMD K6 and K7 series chips of around the same timeframe. Targeting these older CPUs remains supported -- simply without the ability to use MMX compiler intrinsics.

Migrating away from the use of MMX registers also fixes a rather non-obvious requirement. The long-standing programming model for these MMX intrinsics requires that the programmer be aware of the x87/MMX mode-switching semantics, and manually call `_mm_empty()` between using any MMX instruction and any x87 FPU instruction. If this call is neglected, every subsequent x87 operation will return a NaN result. This requirement is not at all obvious to users of these intrinsic functions, and causes bugs that are very difficult to detect. Worse, even if the user did write code that correctly calls `_mm_empty()` in the right places, LLVM may sometimes reorder x87 and MMX operations around each other, unaware of this mode-switching issue. Eliminating the use of MMX registers eliminates this problem.

This change also deletes the now-unnecessary MMX `__builtin_ia32_*` functions from Clang. Only three MMX-related builtins remain in use -- `__builtin_ia32_emms`, used by `_mm_empty`, and `__builtin_ia32_vec_{ext,set}_v4si`, used by `_mm_insert_pi16` and `_mm_extract_pi16`. Note particularly that the latter two lower to generic, non-MMX, IR. Support for the LLVM intrinsics underlying these removed builtins still remains, for the moment.

The file `clang/www/builtins.py` has been updated with mappings from the newly-removed `__builtin_ia32` functions to the still-supported equivalents in `mmintrin.h`.

(Originally uploaded at https://reviews.llvm.org/D86855 and https://reviews.llvm.org/D94252)

Fixes issue #41665
Works towards #98272
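To make the mode-switching hazard concrete, here is a minimal sketch of the calling pattern the old programming model demanded. The function and values are illustrative, not part of this change:

#include <mmintrin.h>

// Illustrative only: under the legacy MMX-register lowering, omitting
// the _mm_empty() call below could leave the FPU in MMX mode, so the
// subsequent x87 double arithmetic would produce NaN.
double scale_after_mmx(__m64 a, __m64 b, double x) {
  __m64 sum = _mm_add_pi16(a, b);  // MMX operation
  (void)sum;
  _mm_empty();                     // mandatory before any x87 FPU use
  return x * 2.0;                  // x87 FP operation on i386
}

With the new SSE2-based lowering, the `_mm_empty()` call becomes unnecessary, since the intrinsics no longer touch the MMX register state at all.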
// RUN: %clang_cc1 -emit-llvm -triple i386 -target-feature +sse2 %s -o - | FileCheck %s

#include <mmintrin.h>

void shift(__m64 a, __m64 b, int c) {
  // CHECK: <8 x i16> @llvm.x86.sse2.pslli.w(<8 x i16> %{{.*}}, i32 {{.*}})
  _mm_slli_pi16(a, c);
  // CHECK: <4 x i32> @llvm.x86.sse2.pslli.d(<4 x i32> %{{.*}}, i32 {{.*}})
  _mm_slli_pi32(a, c);
  // CHECK: <2 x i64> @llvm.x86.sse2.pslli.q(<2 x i64> %{{.*}}, i32 {{.*}})
  _mm_slli_si64(a, c);

  // CHECK: <8 x i16> @llvm.x86.sse2.psrli.w(<8 x i16> %{{.*}}, i32 {{.*}})
  _mm_srli_pi16(a, c);
  // CHECK: <4 x i32> @llvm.x86.sse2.psrli.d(<4 x i32> %{{.*}}, i32 {{.*}})
  _mm_srli_pi32(a, c);
  // CHECK: <2 x i64> @llvm.x86.sse2.psrli.q(<2 x i64> %{{.*}}, i32 {{.*}})
  _mm_srli_si64(a, c);

  // CHECK: <8 x i16> @llvm.x86.sse2.psrai.w(<8 x i16> %{{.*}}, i32 {{.*}})
  _mm_srai_pi16(a, c);
  // CHECK: <4 x i32> @llvm.x86.sse2.psrai.d(<4 x i32> %{{.*}}, i32 {{.*}})
  _mm_srai_pi32(a, c);
}
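For comparison, a short usage sketch (hypothetical, not part of the committed test) showing the practical consequence the test above verifies: with the SSE2-based lowering, these shift intrinsics can be mixed freely with ordinary floating-point code, with no `_mm_empty()` call in between:

#include <mmintrin.h>

// Hypothetical example: the shift intrinsics now lower to SSE2
// operations on XMM registers, so their results can flow into
// ordinary FP code without a preceding _mm_empty().
float shift_then_scale(__m64 v, int amount, float x) {
  __m64 shifted = _mm_slli_pi16(v, amount);
  (void)shifted;
  return x * 0.5f;
}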