Add a pattern so that s_nor is selected for ((i1 x or i1 y) xor -1) instead
of s_or followed by s_xor. This patch covers the divergent i1 case. The
ballot in the test is added to retrieve the lane mask. The control flow is
needed because the combiner can't look through phi instructions.
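As a rough illustration of why the fold is valid, the sketch below models a divergent i1 as one bit per lane and checks that OR followed by XOR with all ones equals a single NOR. The 64-lane width and the names `LaneMask`, `orThenXorAllOnes`, `norMask`, and `formsAgree` are assumptions made for this example, not names from the patch.

```cpp
#include <cassert>
#include <cstdint>

// One bit per lane; the 64-lane width is an assumption for illustration.
using LaneMask = std::uint64_t;

// The unfused form: an OR followed by an XOR with all ones (-1).
LaneMask orThenXorAllOnes(LaneMask X, LaneMask Y) {
  return (X | Y) ^ ~LaneMask{0};
}

// The fused form: a single NOR.
LaneMask norMask(LaneMask X, LaneMask Y) { return ~(X | Y); }

// Returns true when both forms produce the same mask for this input pair.
bool formsAgree(LaneMask X, LaneMask Y) {
  LaneMask Fused = norMask(X, Y);
  assert((Fused & (X | Y)) == 0 && "NOR result never overlaps either input");
  return orThenXorAllOnes(X, Y) == Fused;
}
```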
When building with asserts enabled, this can actually cause strange
miscompilations because an incorrect llvm.assume is generated at the
point of the assertion.
Previously `optin.taint.GenericTaint` misinterpreted the parameter
indices and produced false positives when a [format
attribute](https://clang.llvm.org/docs/AttributeReference.html#format)
was applied to a non-static method. This commit fixes this bug.
Relands #132907 with a fix in the testcase
clang/test/CodeGen/Mips/subtarget-feature-test.c:
the test is now enabled only for the mips64 target.
PR #130587 defined the same SubTargetFeature for the i6400 and i6500 CPUs,
which resulted in the following warning when -mcpu=i6500 was used:
'+i6500' is not a recognized feature for this target (ignoring feature)
This PR fixes the above issue by defining a separate SubTargetFeature for
i6500.
Add a new VPIRPhi subclass of VPIRInstruction, that purely serves as an
overlay, to provide more convenient checking (via directly doing
isa/dyn_cast/cast) and specialized execute/print implementations.
Both VPIRInstruction and VPIRPhi share the same VPDefID, and are
differentiated by the backing IR instruction.
This pattern could also be used to provide more specialized interfaces
for some VPInstruction opcodes, without introducing new, completely
separate recipes. An example would be modeling VPWidenPHIRecipe &
VPScalarPHIRecipe using VPInstruction opcodes and providing an interface
to retrieve incoming blocks and values through a VPInstruction subclass
similar to VPIRPhi.
PR: https://github.com/llvm/llvm-project/pull/129387
The format is: `!instances<T>([regex])`.
This operator produces a list of records whose type is `T`. If
`regex` is provided, only records whose name matches the regular
expression `regex` will be included. The format of `regex` is POSIX
Extended Regular Expressions (ERE).
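The filtering described above can be modeled outside TableGen. The following is only a sketch of the semantics, not the TableGen implementation: `Record` and `instances` are hypothetical names, the type check is simplified to an exact string comparison, and it assumes the pattern may match anywhere in the record name (whether the real operator anchors the pattern is not stated here). An empty pattern stands in for the omitted `regex` argument.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for a TableGen record: just a name and a type name.
struct Record {
  std::string Name;
  std::string Type;
};

// Returns the names of all records of type `Type`, optionally filtered by a
// POSIX ERE pattern. An empty `Pattern` means "no filter" (an assumption of
// this sketch, modeling the optional argument).
std::vector<std::string> instances(const std::vector<Record> &Records,
                                   const std::string &Type,
                                   const std::string &Pattern = "") {
  assert(!Type.empty() && "a record type is required");
  const bool Filter = !Pattern.empty();
  std::regex Re;
  if (Filter)
    Re.assign(Pattern, std::regex::extended); // POSIX ERE grammar
  std::vector<std::string> Result;
  for (const Record &R : Records) {
    if (R.Type != Type)
      continue;
    if (Filter && !std::regex_search(R.Name, Re))
      continue;
    Result.push_back(R.Name);
  }
  return Result;
}
```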
These functions were already nominally in the CLC library.
Similar to others, these builtins are now vectorized and are not broken
down into scalar types.
The libclc build system isn't well set up to pass arbitrary options to
arbitrary source files in a non-intrusive way. There isn't currently any
other motivating example to warrant rewriting the build system just to
satisfy this requirement. So this commit uses a filename-based approach
to inserting this option into the list of compile flags.
Inspired by https://reviews.llvm.org/D130755.
I don't know the logic behind the value 5; it is copied from AArch64.
For some tests, I have to change the trip count so that we don't
break what they are testing.
Mark `addiu` as `isAsCheapAsAMove` only when the source register or the
immediate is zero.
If other cases are marked `isAsCheapAsAMove`, MachineLICM will refuse to
hoist them.
In lite mode, we only emit code for a subset of functions while
preserving the original code in .bolt.org.text. This requires updating
code references in non-emitted functions to ensure that:
* Non-optimized versions of the optimized code never execute.
* Function pointer comparison semantics is preserved.
On x86-64, we can update code references in-place using "pending
relocations" added in scanExternalRefs(). However, on AArch64, this is
not always possible due to address range limitations and linker address
"relaxation".
There are two types of code-to-code references: control transfer (e.g.,
calls and branches) and function pointer materialization.
AArch64-specific control transfer instructions are covered by #116964.
For function pointer materialization, simply changing the immediate
field of an instruction is not always sufficient. In some cases, we need
to modify a pair of instructions, such as undoing linker relaxation and
converting NOP+ADR into an ADRP+ADD sequence.
To achieve this, we use the instruction patch mechanism instead of
pending relocations. Instruction patches are emitted via the regular MC
layer, just like regular functions. However, they have a fixed address
and do not have an associated symbol table entry. This allows us to make
more complex changes to the code, ensuring that function pointers are
correctly updated. Such a mechanism should also be portable to RISC-V and
other architectures.
To summarize, for AArch64, we extend the scanExternalRefs() process to
undo linker relaxation and use instruction patches to partially
overwrite unoptimized code.
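To make the NOP+ADR versus ADRP+ADD distinction concrete, here is a small sketch of the address arithmetic involved. It relies only on the architectural ranges (ADR reaches +-1 MiB from PC; ADRP adds a signed 21-bit page count to the 4 KiB page of PC, and ADD supplies the low 12 bits). The helper names `adrReaches`, `splitForAdrpAdd`, and `materialize` are made up for this example and are not BOLT APIs.

```cpp
#include <cassert>
#include <cstdint>

// ADR encodes a signed 21-bit byte offset, so it reaches +-1 MiB from PC.
bool adrReaches(std::uint64_t PC, std::uint64_t Target) {
  std::int64_t Delta = static_cast<std::int64_t>(Target - PC);
  return Delta >= -(std::int64_t{1} << 20) && Delta < (std::int64_t{1} << 20);
}

// Operands for an ADRP+ADD pair: a signed page delta and a 12-bit page offset.
struct AdrpAdd {
  std::int64_t PageDelta; // in 4 KiB pages, relative to the page of PC
  std::uint32_t Lo12;     // low 12 bits of the target, added by ADD
};

AdrpAdd splitForAdrpAdd(std::uint64_t PC, std::uint64_t Target) {
  std::int64_t PageDelta =
      static_cast<std::int64_t>((Target >> 12) - (PC >> 12));
  // ADRP encodes a signed 21-bit page count, i.e. roughly +-4 GiB.
  assert(PageDelta >= -(std::int64_t{1} << 20) &&
         PageDelta < (std::int64_t{1} << 20));
  return {PageDelta, static_cast<std::uint32_t>(Target & 0xFFF)};
}

// Recomputes the address that the ADRP+ADD pair would produce at PC.
std::uint64_t materialize(std::uint64_t PC, AdrpAdd P) {
  return (PC & ~std::uint64_t{0xFFF}) +
         (static_cast<std::uint64_t>(P.PageDelta) << 12) + P.Lo12;
}
```

When the new location of a function is farther than 1 MiB from the referencing instruction, `adrReaches` is false and changing only the ADR immediate cannot work, which is why a two-instruction rewrite is needed in that case.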
We can use *Set::insert_range to collapse:
for (auto Elem : Range)
  Set.insert(Elem.first);
down to:
Set.insert_range(llvm::make_first_range(Range));
In some cases, we can further fold that into the set declaration.
This patch combines:
DenseMap<MachineBasicBlock *, bool> ReachableMap;
SmallVector<MachineBasicBlock *, 4> ReachableOrdered;
into:
MapVector<MachineBasicBlock *, bool> ReachableMap;
because we add elements to the two data structures in lockstep, and we
care about preserving the insertion order.
As a side benefit, we get to avoid hash lookups at:
ReachableMap[MBB] = true;
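For readers unfamiliar with the container, the sketch below shows the general idea behind a map that preserves insertion order: a hash map from key to index plus a vector of entries. `TinyMapVector` is a simplified stand-in written for this note using only the standard library; it is not llvm::MapVector and omits most of its interface.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Minimal insertion-ordered map: one structure replaces a separate
// map + "ordered" vector pair that would otherwise be updated in lockstep.
template <typename K, typename V> class TinyMapVector {
  std::unordered_map<K, std::size_t> Index;  // key -> position in Entries
  std::vector<std::pair<K, V>> Entries;      // entries in insertion order

public:
  // Inserts a default-constructed value on first use, like a map's operator[].
  V &operator[](const K &Key) {
    auto Res = Index.emplace(Key, Entries.size());
    if (Res.second)
      Entries.emplace_back(Key, V());
    return Entries[Res.first->second].second;
  }

  // Positional access in insertion order.
  const std::pair<K, V> &at(std::size_t I) const {
    assert(I < Entries.size() && "index out of range");
    return Entries[I];
  }

  std::size_t size() const { return Entries.size(); }
  typename std::vector<std::pair<K, V>>::const_iterator begin() const {
    return Entries.begin();
  }
  typename std::vector<std::pair<K, V>>::const_iterator end() const {
    return Entries.end();
  }
};
```

Iterating from `begin()` to `end()` visits keys in the order they were first inserted, which is the property the separate ordered vector used to provide.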
This also fixes errors when using Clang with step-by-step compilation,
because the optimization will pass relocation information to memory
access instructions. For example:
t.c:
```
float f = 0.1;
float foo() { return f;}
```
```
clang --target=loongarch64 -O2 -c t.c --save-temps
```
Reviewed By: tangaac, SixWeining
Pull Request: https://github.com/llvm/llvm-project/pull/133225
We can use *Set::insert_range to collapse:
for (auto Elem : Range)
  Set.insert(Elem);
down to:
Set.insert_range(Range);
In some cases, we can further fold that into the set declaration.
Remove the unneeded field 'TotalCollectedSamples' and the unnecessary
loop.
The field seems to have been introduced in https://reviews.llvm.org/D31952,
and its uses were removed in https://reviews.llvm.org/D19287, but the field
and the unnecessary calculation were not cleaned up.
This patch removes this unneeded code.
v2048i1 is an MVT, but v2048i8 is not, so we don't support i8 vectors
with more than 1024 elements. Lowering a v2048i1 shufflevector would
require promoting to v2048i8. Since v2048i8 isn't legal and isn't an
MVT, this leads to a crash.
To fix the crash, this patch makes v2048i1 an illegal type.
I'm looking into using sub-operands for memory operands. This would use
MIOperandInfo to create a single operand that contains a register and
immediate as sub-operands. We can treat this as a single operand for
parsing and matching in the assembler. I believe this will provide some
simplifications like removing the InstAliases we need to support "(rs1)"
without an immediate.
Doing this requires making CompressInstEmitter aware of sub-operands.
I've chosen to use a flat list of operands in the CompressPats so each
sub-operand is represented individually.
Fixes a regression introduced in
https://github.com/llvm/llvm-project/pull/130537 and reported here
https://github.com/llvm/llvm-project/issues/133144
This fixes a crash in ASTStructuralEquivalence where the non-null
precondition for IsStructurallyEquivalent would be violated, when
comparing member pointers with a dependent class.
As a drive-by, this also fixes the AST node traverser for member pointers
so it doesn't traverse into the qualifier when it's not a type, or into the
class declaration when it would be equivalent to what the qualifier
refers to.
This avoids printing of `<<<NULL>>>` on the text node dumper, which is
redundant.
No release notes since the regression was never released.
Fixes https://github.com/llvm/llvm-project/issues/133144
… unnecessary FunctionSanitizer construction (NFC)
This patch moves several early-exit checks (e.g., an empty function)
out of `AddressSanitizer::instrumentFunction` and into the caller. This
change avoids unnecessary construction of FunctionSanitizer when
instrumentation is not needed.
Add wide integer emulation support for `arith.fpto*i` operations. As
with the other emulated operations, the upper and lower `N` bits of the
`i2N` integer result are emitted separately.
For the unsigned case we use the following emulation:
```c
// example is 64 -> 32 bit emulation, but the implementation is generalized to any 2N -> N case
const double TWO_POW_N = (double)((uint64_t)1 << N); // 2^N, N is the bitwidth of the widest int supported
// f is a floating-point value representing the input of the fptoui op.
uint32_t hi = (uint32_t)(f / TWO_POW_N); // Truncates the division result
uint32_t lo = (uint32_t)(f - hi * TWO_POW_N); // Subtracts to get the lower bits.
```
For the signed case, we defer the emulation of the absolute value to
`fptoui` and handle the sign:
```
fptosi(fp) = sign(fp) * fptoui(abs(fp))
```
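A runnable sketch of both cases for the 64 -> 32 bit example follows. The unsigned part mirrors the pseudocode above; the signed part realizes `sign(fp) * fptoui(abs(fp))` as a two's-complement negation across the two halves, which is an assumption of this sketch and not necessarily how the actual lowering is written. The names `Halves`, `emulatedFpToUi`, `emulatedFpToSi`, and `combine` are illustrative only.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// High and low 32-bit halves of an emulated 64-bit integer result.
struct Halves {
  std::uint32_t Hi;
  std::uint32_t Lo;
};

constexpr double TwoPowN = 4294967296.0; // 2^32, i.e. N = 32

// Unsigned case: divide to get the upper half, subtract to get the lower half.
Halves emulatedFpToUi(double F) {
  // Outside [0, 2^64) the conversion is undefined, so this sketch rejects it.
  assert(F >= 0.0 && F < TwoPowN * TwoPowN);
  std::uint32_t Hi = static_cast<std::uint32_t>(F / TwoPowN); // truncates
  std::uint32_t Lo = static_cast<std::uint32_t>(F - Hi * TwoPowN);
  return {Hi, Lo};
}

// Signed case: convert the magnitude, then negate the (Hi:Lo) pair if needed.
Halves emulatedFpToSi(double F) {
  Halves Mag = emulatedFpToUi(std::fabs(F));
  if (!std::signbit(F))
    return Mag;
  std::uint32_t Lo = ~Mag.Lo + 1u;                 // negate the low half
  std::uint32_t Hi = ~Mag.Hi + (Lo == 0 ? 1u : 0u); // propagate the carry
  return {Hi, Lo};
}

// Helper for checking results against a native 64-bit value.
std::uint64_t combine(Halves H) {
  return (static_cast<std::uint64_t>(H.Hi) << 32) | H.Lo;
}
```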
The edge cases of `NaN`, `+-inf`, and overflows/underflows are undefined
behaviour, and the resulting numbers are the combination of the lower
bitwidth UB values. These operations also propagate poison values.
Signed-off-by: Ege Beysel <beysel@roofline.ai>
Reverts llvm/llvm-project#108880.
The patch has no regression test, no description of why the fix is
necessary, and the code is modifying MC datastructures in a way that's
forbidden in the AsmPrinter.
Fixes #132055.
The `qc.cm.pushfp` instruction is like `cm.pushfp` except in one
important way: `qc.cm.pushfp {ra}, -N*16` is not a valid encoding,
because it would update `s0`/`fp`/`x8` without saving it.
This change now correctly rejects this variant of the instruction, both
during parsing and during disassembly. I also implemented validation for
immediates that represent register lists (both kinds), which may help to
catch bugs in the future.
This commit pulls apart the inherent attribute dependence of classes
like EnumAttrInfo and EnumAttrCase, factoring them out into simpler
EnumCase and EnumInfo variants. This allows specifying the cases of an
enum without needing to make the cases, or the EnumInfo itself, a
subclass of SignlessIntegerAttrBase.
The existing classes are retained as subclasses of the new ones, both
for backwards compatibility and to allow attribute-specific information.
In addition, the new BitEnum class changes its default printer/parser
behavior: cases where multiple keywords appear, like having both nuw and
nsw in overflow flags, will no longer be quoted by the operator<<, and
the FieldParser instance will now expect multiple keywords. All
instances of BitEnumAttr retain the old behavior.