mirror of
https://github.com/llvm/llvm-project.git
synced 2025-04-23 21:56:04 +00:00

These functions all map to the corresponding LLVM intrinsics, but the vector intrinsics weren't being generated. The intrinsic mapping from CLC vector function to vector intrinsic was working correctly, but the mapping from OpenCL builtin to CLC function was suboptimally recursively splitting vectors in halves. For example, with this change, `ceil(float16)` calls `llvm.ceil.v16f32` directly once optimizations are applied. Now also, instead of generating LLVM intrinsics through `__asm` we now call clang elementwise builtins for each CLC builtin. This should be a more standard way of achieving the same result The CLC versions of each of these builtins are also now built and enabled for SPIR-V targets. The LLVM -> SPIR-V translator maps the intrinsics to the appropriate OpExtInst, so there should be no difference in semantics, despite the newly introduced indirection from OpenCL builtin through the CLC builtin to the intrinsic. The AMDGPU targets make use of the same `_CLC_DEFINE_UNARY_BUILTIN` macro to override `sqrt`, so those functions also appear more optimal with this change, calling the vector `llvm.sqrt.vXf32` intrinsics directly.