mirror of
https://github.com/llvm/llvm-project.git
synced 2025-04-29 10:16:05 +00:00

This patch adds FeatureSVEB16B16 to the AArch64 backend to represent the new behavior of FEAT_SVE_B16B16 (as described in the latest [Armv9.4 extensions documentation](https://developer.arm.com/documentation/109697/0100/Feature-descriptions/The-Armv9-4-architecture-extension?lang=en#md461-the-armv94-architecture-extension__FEAT_SVE_B16B16)), together with a 'sve-b16b16' flag to enable it.

The predication of non-widening SVE BFloat16 instructions has changed to require this feature instead of the previously required, soon-to-be-removed FeatureB16B16, which is enabled by the 'b16b16' flag. This change therefore weakens the 'b16b16' flag in favour of 'sve-b16b16'.

Existing tests that are affected by this have been modified to use and/or expect 'sve-b16b16', and new tests have been added to verify the behavior and implementation of 'sve-b16b16'.

This patch is in response to the following changes. The set of architecture features enabled by FEAT_SVE_B16B16 has been relaxed, such that FEAT_SVE_B16B16 now implements:

- With FEAT_SVE2: SVE non-widening BFloat16 instructions in Non-streaming SVE mode
- With FEAT_SME2: SVE non-widening BFloat16 instructions when the PE is in Streaming SVE mode, and SME Z-targeting multi-vector non-widening BFloat16 instructions
- **It no longer implements** SME ZA-targeting non-widening BFloat16 instructions.

The SME ZA-targeting non-widening BFloat16 instructions are implemented by the new FEAT_SME_B16B16; **this patch does not change how this architecture feature is enabled** ('+b16b16+sme2'). Only the instructions implemented by FEAT_SVE_B16B16 have been changed to require 'sve-b16b16' instead of 'b16b16'.

New flags must be created to represent FEAT_SVE_B16B16 and FEAT_SME_B16B16:

- 'sve-b16b16' enables the updated FEAT_SVE_B16B16 (described here)
- 'sme-b16b16' will enable the new FEAT_SME_B16B16
- **This patch includes 'sve-b16b16' only**

A future patch will add 'sme-b16b16'. SME ZA-targeting non-widening BFloat16 instructions would then be guarded by '+sme-b16b16+sme2', and 'b16b16' can be removed.
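
The flag requirements described above, as they stand after this patch, can be summarised as a small truth-table model. This is only an illustrative sketch and not code from the LLVM tree: the `Bf16Group` enum and the `isEnabled` function are hypothetical names, the flag strings are the ones quoted in this description, and the model assumes flags are checked purely by presence, ignoring any features that one flag might imply.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical instruction groups, named after the wording of this
// description. These are NOT identifiers from the LLVM source tree.
enum class Bf16Group {
  SveNonWidening,  // SVE non-widening BFloat16 instructions
  SmeZMultiVector, // SME Z-targeting multi-vector non-widening BFloat16
  SmeZaTargeting   // SME ZA-targeting non-widening BFloat16
};

using Flags = std::set<std::string>;

// True if the named flag is present in the set (no implied features).
static bool has(const Flags &F, const std::string &Name) {
  return F.count(Name) != 0;
}

// Models which flag combinations guard each group after this patch.
bool isEnabled(Bf16Group G, const Flags &F) {
  switch (G) {
  case Bf16Group::SveNonWidening:
    // FEAT_SVE_B16B16 with FEAT_SVE2 (Non-streaming SVE mode) or with
    // FEAT_SME2 (Streaming SVE mode).
    return has(F, "sve-b16b16") && (has(F, "sve2") || has(F, "sme2"));
  case Bf16Group::SmeZMultiVector:
    // FEAT_SVE_B16B16 with FEAT_SME2.
    return has(F, "sve-b16b16") && has(F, "sme2");
  case Bf16Group::SmeZaTargeting:
    // Unchanged by this patch: still '+b16b16+sme2'.
    return has(F, "b16b16") && has(F, "sme2");
  }
  return false;
}
```

Under this model, `isEnabled(Bf16Group::SveNonWidening, {"sve2", "b16b16"})` is false, which reflects the weakening of 'b16b16'. Once a future patch adds 'sme-b16b16', the `SmeZaTargeting` case would check that flag in place of 'b16b16'.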