.enable lowers to Unroll LoopControl
.disable lowers to DontUnroll LoopControl
.count lowers to PartialCount LoopControl
.full lowers to Unroll LoopControl
TODO in future patches: enable structurizer for non-vulkan targets.
---------
Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
After https://github.com/llvm/llvm-project/pull/130605
structurizer/cf.switch.ifstmt.simple2.ll test case starts failing with
the "PHI operand is not live-out from predecessor" diagnostic message.
This test case didn't include usual "-verify-machineinstrs" argument and
the fail was missed before
https://github.com/llvm/llvm-project/pull/130605 merging.
The problem looks not blocking, because the test case successfully
passes its CHECK's. This PR is to fix the build process by mark the test
case as XFAIL when LLVM_ENABLE_EXPENSIVE_CHECKS is enabled.
Investigation of the Machine Verifier complaint is to do. The issue is
created: https://github.com/llvm/llvm-project/issues/133141
When retrieving the ExplicitSpecifier from a CXXConstructorDecl, one of
its canonical declarations is returned. To correctly write the
declaration record the ExplicitSpecifier of the current declaration must
be used.
Failing to do so results in a crash during deserialization.
Added pattern so s_nor is selected for ((i1 x or i1 y) xor -1) instead
of s_or and s_xor . This patch is for i1 divergent. The ballot in the
test is added for the retrieval of lanemask. The control flow is needed
because the combiner can't pass through phi instructions.
When building with asserts enabled, this can actually cause strange
miscompilations because an incorrect llvm.assume is generated at the
point of the assertion.
Previously `optin.taint.GenericTaint` misinterpreted the parameter
indices and produced false positives in situations when a [format
attribute](https://clang.llvm.org/docs/AttributeReference.html#format)
is applied on a non-static method. This commit fixes this bug
Relands #132907 with a fix in the testcase:
clang/test/CodeGen/Mips/subtarget-feature-test.c
enable this test for only mips64 target
PR #130587 defined same SubTargetFeature for CPUs i6400 and i6500 which
resulted into following warning when -mcpu=i6500 was used:
+i6500' is not a recognized feature for this target (ignoring feature)
This PR fixes above issue by defining separate SubTargetFeature for
i6500.
Add a new VPIRPhi subclass of VPIRInstruction, that purely serves as an
overlay, to provide more convenient checking (via directly doing
isa/dyn_cast/cast) and specialied execute/print implementations.
Both VPIRInstruction and VPIRPhi share the same VPDefID, and are
differentiated by the backing IR instruction.
This pattern could alos be used to provide more specialized interfaces
for some VPInstructions ocpodes, without introducing new, completely
spearate recipes. An example would be modeling VPWidenPHIRecipe &
VPScalarPHIRecip using VPInstructions opcodes and providing an interface
to retrieve incoming blocks and values through a VPInstruction subclass
similar to VPIRPhi.
PR: https://github.com/llvm/llvm-project/pull/129387
The format is: `!instances<T>([regex])`.
This operator produces a list of records whose type is `T`. If
`regex` is provided, only records whose name matches the regular
expression `regex` will be included. The format of `regex` is ERE
(Extended POSIX Regular Expressions).
These functions were already nominally in the CLC library.
Similar to others, these builtins are now vectorized and are not broken
down into scalar types.
The libclc build system isn't well set up to pass arbitrary options to
arbitrary source files in a non-intrusive way. There isn't currently any
other motivating example to warrant rewriting the build system just to
satisfy this requirement. So this commit uses a filename-based approach
to inserting this option into the list of compile flags.
Inspired by https://reviews.llvm.org/D130755.
I don't know the logic behind the value 5, it is copied from AArch64.
For some tests, I have to change the trip count so that we don't
break what they are testing.
Set `addiu` as `isAsCheapAsAMove` only when the src register or imm is
zero only.
If other cases are set `isAsCheapAsAMove`, MachineLICM will reject to
hoist it.
In lite mode, we only emit code for a subset of functions while
preserving the original code in .bolt.org.text. This requires updating
code references in non-emitted functions to ensure that:
* Non-optimized versions of the optimized code never execute.
* Function pointer comparison semantics is preserved.
On x86-64, we can update code references in-place using "pending
relocations" added in scanExternalRefs(). However, on AArch64, this is
not always possible due to address range limitations and linker address
"relaxation".
There are two types of code-to-code references: control transfer (e.g.,
calls and branches) and function pointer materialization.
AArch64-specific control transfer instructions are covered by #116964.
For function pointer materialization, simply changing the immediate
field of an instruction is not always sufficient. In some cases, we need
to modify a pair of instructions, such as undoing linker relaxation and
converting NOP+ADR into ADRP+ADD sequence.
To achieve this, we use the instruction patch mechanism instead of
pending relocations. Instruction patches are emitted via the regular MC
layer, just like regular functions. However, they have a fixed address
and do not have an associated symbol table entry. This allows us to make
more complex changes to the code, ensuring that function pointers are
correctly updated. Such mechanism should also be portable to RISC-V and
other architectures.
To summarize, for AArch64, we extend the scanExternalRefs() process to
undo linker relaxation and use instruction patches to partially
overwrite unoptimized code.
We can use *Set::insert_range to collapse:
for (auto Elem : Range)
Set.insert(E.first);
down to:
Set.insert_range(llvm::make_first_range(Range));
In some cases, we can further fold that into the set declaration.
This patch combines:
DenseMap<MachineBasicBlock *, bool> ReachableMap;
SmallVector<MachineBasicBlock *, 4> ReachableOrdered;
into:
MapVector<MachineBasicBlock *, bool> ReachableMap;
because we add elements to the two data structures in lockstep, and we
care about preserving the insertion order.
As a side benefit, we get to avoid hash lookups at:
ReachableMap[MBB] = true;
This also fixes errors when using Clang with step-by-step compilation.
Because the optimization will pass relocation information to memory
access instructions. For example:
t.c:
```
float f = 0.1;
float foo() { return f;}
```
```
clang --target=loongarch64 -O2 -c t.c --save-temps
```
Reviewed By: tangaac, SixWeining
Pull Request: https://github.com/llvm/llvm-project/pull/133225
We can use *Set::insert_range to collapse:
for (auto Elem : Range)
Set.insert(E);
down to:
Set.insert_range(Range);
In some cases, we can further fold that into the set declaration.
Clean the unneeded field 'TotalCollectedSamples' and the unnecessary
loop.
The field seems introduced in:https://reviews.llvm.org/D31952, and its
uses were removed in: https://reviews.llvm.org/D19287, but this field
and unnecessary calculation were not cleaned up.
This patch will remove these unneeded codes.
v2048i1 is an MVT, but v2048i8 is not so we don't support i8 vectors
with more than 1024 elements. Lowering a v2048i1 shufflevector would
requires promoting to v2048i8. Since v2048i8 isn't legal and isn't an
MVT this leads to a crash.
To fix the crash, this patch makes v2048i1 an illegal type.
I'm looking into using sub-operands for memory operands. This would use
MIOperandInfo to create a single operand that contains a register and
immediate as sub-operands. We can treat this as a single operand for
parsing and matching in the assembler. I believe this will provide some
simplifications like removing the InstAliases we need to support "(rs1)"
without an immediate.
Doing this requires making CompressInstEmitter aware of sub-operands.
I've chosen to use a flat list of operands in the CompressPats so each
sub-operand is represented individually.
Fixes a regression introduced in
https://github.com/llvm/llvm-project/pull/130537 and reported here
https://github.com/llvm/llvm-project/issues/133144
This fixes a crash in ASTStructuralEquivalence where the non-null
precondition for IsStructurallyEquivalent would be violated, when
comparing member pointers with a dependent class.
This also drive-by fixes the ast node traverser for member pointers so
it doesn't traverse into the qualifier in case it's not a type, or the
class declaration in case it would be equivalent to what the qualifier
refers.
This avoids printing of `<<<NULL>>>` on the text node dumper, which is
redundant.
No release notes since the regression was never released.
Fixes https://github.com/llvm/llvm-project/issues/133144
… unnecessary FunctionSanitizer construction (NFC)
This patch moves several early-exit checks (e.g., empty function, etc.)
out of `AddressSanitizer::instrumentFunction` and into the caller. This
change avoids unnecessary construction of FunctionSanitizer when
instrumentation is not needed.
Adding wide integer emulation support for `arith.fpto*i` operations. As
the other emulated operations, the upper and lower `N` bits of the `i2N`
integer result are emitted separately.
For the unsigned case we use the following emulation
```c
// example is 64 -> 32 bit emulation, but the implementation is generalized to any 2N -> N case
const double TWO_POW_N = (uint_64_t(1) << N); // 2^N, N is the bitwidth of the widest int supported
// f is a floating-point value representing the input of the fptoui op.
uint32_t hi = (uint32_t)(f / TWO_POW_N); // Truncates the division result
uint32_t lo = (uint32_t)(f - hi * TWO_POW_N); // Subtracts to get the lower bits.
```
For the signed case, we defer the emulation of the absolute value to
`fptoui` and handle the sign:
```
fptosi(fp) = sign(fp) * fptoui(abs(fp))
```
The edge cases of `NaNs, +-inf` and overflows/underflows are undefined
behaviour and the resulting numbers are the combination of the lower
bitwidth UB values. These operations also propagate poison values.
Signed-off-by: Ege Beysel <beysel@roofline.ai>
Reverts llvm/llvm-project#108880 .
The patch has no regression test, no description of why the fix is
necessary, and the code is modifying MC datastructures in a way that's
forbidden in the AsmPrinter.
Fixes#132055.