Investigating #96612 showed that our implementation differed from the
Standard and could cause UB. Testing the codegen also showed that quite
a bit of assembly was generated for these functions. The functions have
been rewritten so that Clang can optimize them into simple CPU rotate
instructions.
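
As a rough illustration of the kind of UB-free formulation meant here,
the sketch below reduces the shift count modulo the bit width before
shifting, since shifting by a negative amount or by the full width is
undefined in C++. The names `sketch_rotl` and `sketch_rotr` are
hypothetical and this is not the code from the patch; it only assumes
that `T` is an unsigned integer type.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not the libc++ code): rotate x left by s bits.
// Assumes T is an unsigned integer type.
template <class T>
T sketch_rotl(T x, int s) {
  const int N = std::numeric_limits<T>::digits; // bit width of T
  int r = s % N;                                // r is in (-N, N)
  if (r == 0)
    return x;                                   // never shift by 0 or N bits
  if (r < 0)
    r += N;                                     // rotl by -k equals rotl by N - k
  // Both shift counts are now in [1, N - 1], so neither shift is UB.
  return static_cast<T>((x << r) | (x >> (N - r)));
}

// Hypothetical helper: rotate x right by s bits, same approach.
template <class T>
T sketch_rotr(T x, int s) {
  const int N = std::numeric_limits<T>::digits;
  int r = s % N;
  if (r == 0)
    return x;
  if (r < 0)
    r += N;                                     // rotr by -k equals rotr by N - k
  return static_cast<T>((x >> r) | (x << (N - r)));
}
```

With these helpers, `sketch_rotl<std::uint32_t>(0x80000001u, 1)` gives
`0x00000003u`, and a count of -1 behaves like a right rotation by 1.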

Fixes: https://github.com/llvm/llvm-project/issues/96612

This "bug" was probably not noticed because it doesn't affect any integer
type we currently support: it requires integers more than twice the size
of `unsigned long long`. With such types, however, the algorithm that
breaks the large integer down into `unsigned long long`-sized groups
didn't work because we rotated in the wrong direction.

For example, the 256-bit number (1 << 255) would yield the wrong answer
when used with the algorithm before this patch.

In particular, note that the pre-patch rotation happens to work for
128-bit integers because it just swaps the halves in that case.
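
The effect of the rotation direction can be reproduced with a
scaled-down model. The sketch below is an assumption-laden illustration,
not the patched code: it counts leading zeros by inspecting 8-bit groups
instead of `unsigned long long`-sized ones, so a 16-bit value stands in
for a 128-bit integer (two groups) and a 32-bit value for a 256-bit one
(four groups). The names `clz8` and `chunked_countl_zero` are
hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: number of leading zero bits in an 8-bit group.
inline int clz8(std::uint8_t c) {
  int n = 0;
  for (int bit = 7; bit >= 0 && ((c >> bit) & 1) == 0; --bit)
    ++n;
  return n;
}

// Scaled-down model: count leading zeros of x by rotating one 8-bit group
// at a time into the low bits and inspecting it. rotate_left == true is the
// correct direction; rotate_left == false models the wrong direction.
// Assumes T is an unsigned integer type wider than 8 bits.
template <class T>
int chunked_countl_zero(T x, bool rotate_left) {
  const int N = std::numeric_limits<T>::digits;
  int count = 0;
  for (int i = 0; i < N / 8; ++i) {
    x = rotate_left ? static_cast<T>((x << 8) | (x >> (N - 8)))
                    : static_cast<T>((x >> 8) | (x << (N - 8)));
    const std::uint8_t low = static_cast<std::uint8_t>(x & 0xFF);
    if (low != 0)
      return count + clz8(low); // first non-zero group found
    count += 8;                 // the whole group was zero
  }
  return count;                 // x was zero
}
```

With two groups both directions agree, because rotating by half the
width only swaps the halves: `chunked_countl_zero<std::uint16_t>(0x8000, false)`
returns 0. With four groups the wrong direction visits the groups in the
wrong order: `chunked_countl_zero<std::uint32_t>(0x80000000u, false)`
returns 16, while the left rotation returns the expected 0.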

Differential Revision: https://reviews.llvm.org/D134625
Co-authored-by: Louis Dionne <ldionne.2@gmail.com>