The existing Create method took a path to the ORC runtime and created a
StaticLibraryDefinitionGenerator for it. The new overload takes a
std::unique_ptr<DefinitionGenerator> directly instead. This provides more
flexibility when constructing MachOPlatforms. E.g. the runtime archive can be
embedded in a special section in the ORC controller executable or library,
rather than being on-disk.
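The shape of the change can be illustrated with a minimal sketch. All types here (Generator, FileArchiveGenerator, InMemoryArchiveGenerator, Platform) are hypothetical stand-ins, not the ORC API, whose MachOPlatform::Create and DefinitionGenerator signatures may differ. In this sketch the path-based factory is kept as a thin wrapper over a new overload that accepts any generator, so an in-memory (e.g. embedded) runtime archive can be supplied instead.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a definition generator: supplies runtime symbols.
struct Generator {
  virtual ~Generator() = default;
  virtual std::string describe() const = 0;
};

// Generator backed by an archive on disk (what the path-based factory builds).
struct FileArchiveGenerator : Generator {
  std::string Path;
  explicit FileArchiveGenerator(std::string P) : Path(std::move(P)) {}
  std::string describe() const override { return "file:" + Path; }
};

// Generator backed by bytes already in memory, e.g. an archive embedded in a
// section of the controller executable (hypothetical).
struct InMemoryArchiveGenerator : Generator {
  std::string Bytes;
  explicit InMemoryArchiveGenerator(std::string B) : Bytes(std::move(B)) {}
  std::string describe() const override {
    return "memory:" + std::to_string(Bytes.size()) + " bytes";
  }
};

struct Platform {
  std::unique_ptr<Generator> Runtime;

  // New-style factory: the caller decides where the runtime comes from.
  static std::unique_ptr<Platform> create(std::unique_ptr<Generator> G) {
    assert(G && "a runtime generator is required");
    auto P = std::make_unique<Platform>();
    P->Runtime = std::move(G);
    return P;
  }

  // Path-based factory, implemented here by delegating to the new overload.
  static std::unique_ptr<Platform> create(const std::string &RuntimePath) {
    return create(std::make_unique<FileArchiveGenerator>(RuntimePath));
  }
};
```

Delegating the old overload to the new one keeps a single construction path, so any future setup logic only has to live in the generator-taking factory.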
Noticed while working on Issue #59867 and Issue #53419. There's still more to do here, but for "all vector" comparisons, we should try to cast to a scalar integer for sub-128-bit types.
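The idea can be sketched outside the backend: for an integer vector that fits in a scalar register, "all lanes equal" is the same predicate as "the two bit patterns are equal". The sketch below models a 64-bit <8 x i8> vector as a std::array (an assumption for illustration, not the DAG code) and compares both forms. The trick only holds for integer equality; floating-point lanes (NaN, -0.0) break it.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// An 8 x i8 (64-bit, i.e. sub-128-bit) vector, modelled as a plain array.
using V8x8 = std::array<std::uint8_t, 8>;

// Lane-by-lane "all equal" comparison: the form the vector code expresses.
bool allEqualLanewise(const V8x8 &A, const V8x8 &B) {
  for (std::size_t I = 0; I < A.size(); ++I)
    if (A[I] != B[I])
      return false;
  return true;
}

// Same predicate after bitcasting each vector to a 64-bit scalar integer:
// every lane matches exactly when the two 64-bit patterns match.
bool allEqualAsScalar(const V8x8 &A, const V8x8 &B) {
  static_assert(sizeof(V8x8) == sizeof(std::uint64_t), "expects a 64-bit vector");
  std::uint64_t SA, SB;
  std::memcpy(&SA, A.data(), sizeof SA);
  std::memcpy(&SB, B.data(), sizeof SB);
  return SA == SB;
}
```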
This reverts commit c52255d26a23df6ecf09f60ca3e3615467f16bbe.
That commit caused certain files (in ffmpeg, libvpx and libaom) to hang
while compiling, see https://reviews.llvm.org/D143143 for repro.
We should be able to use load(literal) to access the constant pool under
the tiny code model.
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D132536
Renames the existing allocateString method to allocateContent and adds a pair of
allocateCString methods.
The previous allocateString method did not include a null-terminator. It behaved
the same as allocateContent except with a Twine input, rather than an
ArrayRef<char>. Renaming allocateString to allocateContent (overloading the
existing method) makes this clearer.
The new allocateCString methods allocate the given content plus a
null-terminator character, and return a buffer covering both the string and
null-terminator. This makes them suitable for creating c-string content for
jitlink::Blocks.
Existing users of the old allocateString method have been updated to use the
new allocateContent overload.
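The difference between the two method families can be shown with a minimal sketch. ContentAllocator is a hypothetical stand-in for the graph allocator, and std::string_view is used for brevity; the real LinkGraph methods' parameter and return types may differ. allocateContent copies the bytes verbatim, while allocateCString appends a '\0' and returns a buffer one byte longer that covers the terminator.

```cpp
#include <cassert>
#include <cstring>
#include <deque>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical allocator: buffers stay alive as long as the allocator does.
class ContentAllocator {
  std::deque<std::vector<char>> Buffers; // deque never moves existing elements

public:
  // Copies Content verbatim; no null terminator is appended.
  std::string_view allocateContent(std::string_view Content) {
    Buffers.emplace_back(Content.begin(), Content.end());
    return {Buffers.back().data(), Buffers.back().size()};
  }

  // Copies Content and appends '\0'; the returned buffer covers both the
  // string and the terminator, so its size is Content.size() + 1.
  std::string_view allocateCString(std::string_view Content) {
    std::vector<char> Buf(Content.begin(), Content.end());
    Buf.push_back('\0');
    Buffers.push_back(std::move(Buf));
    return {Buffers.back().data(), Buffers.back().size()};
  }
};
```

Because the terminator is inside the returned range, a block created from it can be read directly as a C string.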
getFauxShuffleMask can't handle ISD::TRUNCATE itself, as it can't handle inputs that are larger than the output.
Another step towards removing combineX86ShuffleChainWithExtract
combineX86ShuffleChain and combineX86ShuffleChainWithExtract no longer require the shuffle inputs to be the same width as the root vector, so we can stop generating widening nodes on the fly (combineX86ShuffleChain should handle all of this).
This requires a couple of additional folds to avoid a couple of notable regressions:
getFauxShuffleMask - recognise INSERT_SUBVECTOR(X,Y,C) as a shuffle pattern as long as it's not just widening the subvector.
combineConcatVectorOps - folds CONCAT_VECTORS(AssertSext(X,Ty),AssertSext(Y,Ty)) -> AssertSext(CONCAT_VECTORS(X,Y),Ty)
One of the final stages towards fixing Issue #45319 and addressing the regressions in the interleaved tests in D127115
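Viewed as a shuffle, INSERT_SUBVECTOR(X,Y,C) takes lane i from X except for lanes C..C+NumSubElts-1, which come from Y. The sketch below builds that two-input mask, with indices >= NumElts referring to Y (conceptually widened to NumElts lanes). It is not the X86 lowering code, and treating "just widening" as the case where X is undef and C == 0 is an assumption made for illustration.

```cpp
#include <cassert>
#include <vector>

// Two-input shuffle mask for INSERT_SUBVECTOR(X, Y, C).
// X has NumElts lanes, Y has NumSubElts lanes and is written at lane C.
// Mask values < NumElts select from X; values >= NumElts select from Y.
// Returns an empty mask for the (assumed) pure-widening case: X undef, C == 0.
std::vector<int> insertSubvectorMask(unsigned NumElts, unsigned NumSubElts,
                                     unsigned C, bool XIsUndef) {
  assert(NumSubElts <= NumElts && C + NumSubElts <= NumElts);
  if (XIsUndef && C == 0)
    return {}; // only widening Y: not treated as a shuffle here
  std::vector<int> Mask(NumElts);
  for (unsigned I = 0; I < NumElts; ++I)
    Mask[I] = (I >= C && I < C + NumSubElts) ? int(NumElts + (I - C)) : int(I);
  return Mask;
}
```

For example, inserting a 2-lane Y into a 4-lane X at lane 2 gives the mask {0, 1, 4, 5}.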
This change moves "DefaultVLIWScheduler" class declaration from
DFAPacketizer.cpp to DFAPacketizer.h.
This is needed because there is a protected class member of
type "DefaultVLIWScheduler*" in "VLIWPacketizerList" class.
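The derived classes cannot use this member unless the declaration
is available to them. More specifically:
```
// Without this change:
#include "llvm/CodeGen/DFAPacketizer.h"

class HexagonPacketizerList : public llvm::VLIWPacketizerList {
public:
  HexagonPacketizerList(llvm::MachineFunction &MF, llvm::MachineLoopInfo &MLI,
                        llvm::AAResults *AA)
      : VLIWPacketizerList(MF, MLI, AA) {
    // The line below causes an incomplete type error, since the declaration
    // of DefaultVLIWScheduler was not available through the header.
    VLIWScheduler->schedule();
  }
};
```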
Reviewed By: kparzysz
Differential Revision: https://reviews.llvm.org/D139767
These are similar to hardware registers already added for GFX940,
but with different numbers and slightly different names.
Differential Revision: https://reviews.llvm.org/D143740
Summary:
This is part of the leftover work for https://reviews.llvm.org/D143138.
In this work, we pass the code object version as an argument when initializing
the target ID and use it when dumping the target ID.
Reviewers: arsenm
Differential Revision: https://reviews.llvm.org/D143293
Without this patch, migrateDebugInfo doesn't understand how to handle existing
fragments that are smaller than the to-be-split store. This can occur
if, e.g., a vector store (1 dbg.assign) is split (many dbg.assigns - 1 fragment
for each scalar) and later those stores are re-vectorized (many dbg.assigns),
and then SROA runs on that.
The approach taken in this patch is to drop intrinsics with fragments outside
of the slice.
For example, starting with:
store <2 x float> %v, ptr %dest !DIAssignID !1
call void @llvm.dbg.assign(..., DIExpression(DW_OP_LLVM_fragment, 0, 32), !1, ...)
call void @llvm.dbg.assign(..., DIExpression(DW_OP_LLVM_fragment, 32, 32), !1, ...)
When visiting the slice of bits 0 to 31 we get:
store float %v.extract.0, ptr %dest !DIAssignID !2
call void @llvm.dbg.assign(..., DIExpression(DW_OP_LLVM_fragment, 0, 32), !2, ...)
The other dbg.assign associated with the currently-split store is dropped for
this split part. And visiting bits 32 to 63 we get the following:
store float %v.extract.1, ptr %adjusted.dest !DIAssignID !3
call void @llvm.dbg.assign(..., DIExpression(DW_OP_LLVM_fragment, 32, 32), !3, ...)
I've added two tests that cover this case.
Implementing this meant rewriting the fragment-calculation part of
migrateDebugInfo to work with the absolute offset of the new slice in terms of
the base alloca (instead of the offset of the slice into the new alloca), the
fragment (if any) of the variable associated with the base alloca, and the
fragment associated with the split store. Because we need the offset into the
base alloca for the variables being split, some careful wiring is required for
memory intrinsics due to the fact that memory intrinsics can be split when
either the source or dest allocas are split. In the case where the source
alloca drives the splitting, we need to be careful to pass migrateDebugInfo the
information in relation to the dest alloca.
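The keep-or-drop decision from the example above can be sketched on plain bit ranges. This is a simplification, not the SROA code: it assumes the slice has already been converted to absolute bits of the variable (with any base-alloca fragment folded in), and keeping the overlapping part on a partial overlap is this sketch's own choice, since the text above only specifies dropping fragments outside the slice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// A bit range of a source variable, like DW_OP_LLVM_fragment (offset, size).
struct Fragment {
  std::uint64_t OffsetInBits;
  std::uint64_t SizeInBits;
  bool operator==(const Fragment &O) const {
    return OffsetInBits == O.OffsetInBits && SizeInBits == O.SizeInBits;
  }
};

// Which fragment should a dbg.assign carry on one split-off slice?
// Slice: bits covered by the new store, in absolute variable bits (assumed).
// Existing: fragment already on the dbg.assign, if any.
// Returns std::nullopt when the dbg.assign lies outside the slice (drop it).
std::optional<Fragment> fragmentForSlice(Fragment Slice,
                                         std::optional<Fragment> Existing) {
  if (!Existing)
    return Slice; // whole-variable assignment: narrow it to the slice
  std::uint64_t Begin = std::max(Slice.OffsetInBits, Existing->OffsetInBits);
  std::uint64_t End = std::min(Slice.OffsetInBits + Slice.SizeInBits,
                               Existing->OffsetInBits + Existing->SizeInBits);
  if (Begin >= End)
    return std::nullopt; // no overlap with this slice: drop
  return Fragment{Begin, End - Begin}; // keep the overlapping part
}
```

With the two fragments from the example, the slice for bits 0 to 31 keeps (0, 32) and drops (32, 32), and the slice for bits 32 to 63 does the reverse.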
Reviewed By: StephenTozer
Differential Revision: https://reviews.llvm.org/D143146
The mid end will reassociate sub(sub(x, m1), m2) to sub(x, add(m1, m2)). This
reassociates it back to allow the creation of more mls instructions.
Differential Revision: https://reviews.llvm.org/D143143
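A multiply-subtract (mls) computes acc - a*b. Assuming m1 = a*b and m2 = c*d are both multiplies, the sketch below (plain C++ with hypothetical helper names, not the DAG combine itself) checks that the mid end's sub(x, add(m1, m2)) form and the reassociated sub(sub(x, m1), m2) form agree under wrapping unsigned arithmetic, and that the latter maps onto two chained mls operations.

```cpp
#include <cassert>
#include <cstdint>

// mls-style multiply-subtract: Acc - A * B, wrapping modulo 2^32.
std::uint32_t mls(std::uint32_t Acc, std::uint32_t A, std::uint32_t B) {
  return Acc - A * B;
}

// Mid-end form: x - (a*b + c*d), i.e. sub(x, add(m1, m2)).
std::uint32_t midEndForm(std::uint32_t X, std::uint32_t A, std::uint32_t B,
                         std::uint32_t C, std::uint32_t D) {
  return X - (A * B + C * D);
}

// Reassociated form: (x - a*b) - c*d, which is two chained mls operations.
std::uint32_t asTwoMls(std::uint32_t X, std::uint32_t A, std::uint32_t B,
                       std::uint32_t C, std::uint32_t D) {
  return mls(mls(X, A, B), C, D);
}
```

Since wrapping integer subtraction and addition are associative, the rewrite is always value-preserving; the benefit is purely in instruction selection.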
operator~ promotes the single-bit input to int. The ~ causes the upper
31 bits to become 1s, making the value negative, and left-shifting a
negative value is undefined.
Mask it back down to a single bit.
The extra 1s were being shifted into bit 8 and above, and those bits aren't
used by the emitByte call, so this shouldn't be a functional change.
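The promotion and the fix can be seen in isolation. The helper names below are hypothetical and not taken from the emitter: ~ on a bool first promotes it to int, so ~true is -2 and ~false is -1, and left-shifting such a negative int is undefined in C and in C++ before C++20. Masking with & 1 restores a 0/1 value before the shift.

```cpp
#include <cassert>
#include <cstdint>

// Shows the promotion: Bit becomes int 0 or 1, then all 32 bits are inverted.
int invertedBitUnmasked(bool Bit) { return ~Bit; }

// Fixed form: mask the complement back down to bit 0, then shift it into
// place within a byte.
std::uint8_t invertedBitAt(bool Bit, unsigned Shift) {
  assert(Shift < 8 && "result must fit in one byte");
  return static_cast<std::uint8_t>((~Bit & 1) << Shift);
}
```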
If a query uses an exclusion set but we haven't used it to determine the
result, we can also cache the query without the exclusion set. When we look up
a cached result, we can check for the version without the exclusion set first.
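A minimal sketch of such a cache, using hypothetical int node ids and a std::map keyed on (From, To, ExclusionSet), with an empty set meaning "no exclusion set". One extra assumption is spelled out in the lookup: excluding nodes can only remove paths, so a cached "unreachable" answer without an exclusion set stays valid for any exclusion set, whereas a cached "reachable" answer is only reused for the plain query itself.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <tuple>

using ExclusionSet = std::set<int>;

// Hypothetical reachability-query cache; an empty set means "no exclusions".
class ReachabilityCache {
  std::map<std::tuple<int, int, ExclusionSet>, bool> Cache;

public:
  // UsedExclusionSet: whether the exclusion set influenced the answer.
  void insert(int From, int To, const ExclusionSet &Excl, bool Reachable,
              bool UsedExclusionSet) {
    Cache[{From, To, Excl}] = Reachable;
    // The exclusion set was never consulted, so the same answer holds for the
    // query without it: cache that version too.
    if (!Excl.empty() && !UsedExclusionSet)
      Cache[{From, To, ExclusionSet{}}] = Reachable;
  }

  std::optional<bool> lookup(int From, int To, const ExclusionSet &Excl) const {
    // Check the version without an exclusion set first. "Unreachable" is safe
    // to reuse for any exclusion set; "reachable" only for the plain query.
    auto Plain = Cache.find({From, To, ExclusionSet{}});
    if (Plain != Cache.end() && (!Plain->second || Excl.empty()))
      return Plain->second;
    auto It = Cache.find({From, To, Excl});
    if (It != Cache.end())
      return It->second;
    return std::nullopt;
  }
};
```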
This relands the commit previously reverted in
`8570bee53a8ce0c5d04bc11f288e19a457474c4c` due to failures on linux.
The problem was that the test executable was built with absolute
OSO prefix paths. This re-commit adds a modified version of the
executable that strips the absolute OSO prefix paths and makes
sure the test appends the OSO prefix appropriately (via the corresponding
dsymutil flags).
Differential Revision: https://reviews.llvm.org/D143458
As shown in issue #60649, the new shuffles were
being inserted before a phi, which is invalid because
PHI nodes must be grouped at the start of a basic block.
It seems like most test coverage for this fold
(foldSelectShuffle) lives in the AArch64 dir,
but this doesn't repro there for a base target.
This seems to cause large regressions in existing code, as much as 75% slower
(4x the time taken). Small always inline functions seem to be used a lot in the
cmsis-dsp library.
I would add a phase ordering test to show the problems, but one already exists!
The llvm/test/Transforms/PhaseOrdering/ARM/arm_mult_q15.ll was just changed by
removing alwaysinline to hide the problems that existed.
This reverts commit cae033dcf227aeecf58fca5af6fc7fde1fd2fb4f.
This reverts commit 8e33c41e72ad42e4c27f8cbc3ad2e02b169637a1.
This fixes a few places where the addrx3 and strx3 forms were missed.
Previously this meant if one of these forms appeared somewhere various
errors could occur. This now also adds an extra test case for the addrx3
form (which previously failed).
Differential Revision: https://reviews.llvm.org/D143488
Without this patch `getDerefOffsetInBytes` incorrectly always returns
`std::nullopt` for expressions with fragments due to an off-by-one error with
fragment element indices.
Reviewed By: StephenTozer
Differential Revision: https://reviews.llvm.org/D143567
Similar to 62a0a1b9eea7788c1f9dbae -
We have pow math intrinsics in IR, but no ldexp intrinsics
to handle vector types.
A patch for that was proposed in D14327, but it was not completed.
Issue #60605
We have exp2 math intrinsics in IR, but no ldexp intrinsics
to handle vector types.
A patch for that was proposed in D14327, but it was not completed.
Issue #60605
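For background on why ldexp comes up here: ldexp(x, n) computes x * 2^n, so for an integer-valued exponent n, both exp2(n) and pow(2.0, n) equal ldexp(1.0, n). The sketch below (hypothetical helper names, standard <cmath> functions) checks that identity on scalars; it does not reproduce the InstCombine folds themselves, which the messages above do not spell out.

```cpp
#include <cassert>
#include <cmath>

// Three ways to compute 2^N for an integer N.
double viaExp2(int N) { return std::exp2(static_cast<double>(N)); }
double viaPow(int N) { return std::pow(2.0, static_cast<double>(N)); }
double viaLdexp(int N) { return std::ldexp(1.0, N); } // 1.0 * 2^N
```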
When working out whether we can see a compressible jump-table pattern during
ConstantIslands, we were stopping when we saw a debug instruction. Instead it's
better to keep iterating backwards to the first real instruction.
https://reviews.llvm.org/D142019
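The backwards walk can be sketched over a simplified instruction list. Instr and prevRealInstr are hypothetical stand-ins rather than the ConstantIslands code (LLVM has helpers such as skipDebugInstructionsBackward for this). The point is that debug instructions are skipped rather than ending the search.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified instruction record.
struct Instr {
  std::string Name;
  bool IsDebug; // e.g. a DBG_VALUE, which emits no code
};

// Starting just before index Pos, walk backwards past debug instructions and
// return the index of the first real instruction, if any.
std::optional<std::size_t> prevRealInstr(const std::vector<Instr> &Block,
                                         std::size_t Pos) {
  assert(Pos <= Block.size());
  while (Pos > 0) {
    --Pos;
    if (!Block[Pos].IsDebug)
      return Pos;
  }
  return std::nullopt;
}
```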
Currently, the default SIMD alignment is defined by the Clang-specific TargetInfo
class. This class cannot be reused for LLVM Flang. That's why the default SIMD
alignment calculation has been moved to OMPIRBuilder, which is shared by Flang and Clang.
Previous attempt: https://reviews.llvm.org/D138496 was wrong because
the default alignment depended on the number of built LLVM targets.
If we wanted to calculate the default alignment for PPC and the PPC LLVM target
had not been enabled in the build, then we would get 0 as the alignment, because
OMPIRBuilder couldn't create a PPCTargetMachine object and returned 0 as
the default value.
If the PPC LLVM target had been built, then OMPIRBuilder could have created
a PPCTargetMachine object and it would have returned 128.
Differential Revision: https://reviews.llvm.org/D141910
Reviewed By: jdoerfert
This effectively reverts 5c38c6a and 4f772b0.
A recently introduced LazyValueInfo::getConstantRangeAtUse returns incorrect
ranges for values in certain cases. One such example is described in PR60629.
The issue has something to do with traversing PHI uses of a value transitively.
As nikic pointed out, we're effectively reasoning about values from different
loop iterations.
In the faulting test case, CVP miscompiled the code because the calculated
range for a shift argument was incorrect. It returned the empty set; however,
the code is clearly not dead. CVP then erased the shift instruction because
of the empty range.