166738 Commits

Author SHA1 Message Date
Florian Hahn
2e6430666c
[LV] Update recipe builder functions to pass VPlan directly (NFC).
Passing VPlanPtr requires a dereference of std::unique_ptr on each
access, which is unnecessary. Just pass the plan by reference.
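To make the calling-convention change concrete, here is a minimal sketch with hypothetical `Plan`, `PlanPtr`, `addRecipeOld` and `addRecipeNew` names (not the real VPlan or VPRecipeBuilder API). The builder takes the plan by reference, so the caller dereferences its owning `std::unique_ptr` once at the call site instead of the builder going through the smart pointer on every access.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for VPlan: just records the recipes added to it.
struct Plan {
  std::vector<std::string> Recipes;
};

// Hypothetical stand-in for the VPlanPtr alias.
using PlanPtr = std::unique_ptr<Plan>;

// Before: every access goes through the unique_ptr (P->...).
void addRecipeOld(PlanPtr &P, std::string Name) {
  P->Recipes.push_back(std::move(Name));
}

// After: the plan is passed by reference; the callee does not care how the
// caller owns it.
void addRecipeNew(Plan &P, std::string Name) {
  P.Recipes.push_back(std::move(Name));
}

// Usage: the caller dereferences its owning pointer once.
std::size_t buildExample() {
  PlanPtr Owned = std::make_unique<Plan>();
  addRecipeOld(Owned, "widen");
  addRecipeNew(*Owned, "reduction");
  return Owned->Recipes.size();
}
```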
2023-02-12 22:35:14 +00:00
Kazu Hirata
15cb5ebed7 [Support] Use llvm::popcount (NFC)
This should fix builds on Windows.
2023-02-12 13:39:18 -08:00
Lang Hames
be2fc577c3 [ORC] Add MachOPlatform::Create overload -- Pass ORC runtime as def generator.
The existing Create method took a path to the ORC runtime and created a
StaticLibraryDefinitionGenerator for it. The new overload takes a
std::unique_ptr<DefinitionGenerator> directly instead. This provides more
flexibility when constructing MachOPlatforms. For example, the runtime archive can be
embedded in a special section in the ORC controller executable or library,
rather than being on-disk.
2023-02-12 13:30:37 -08:00
Simon Pilgrim
19c1682b6a [X86] combineConcatVectorOps - concatenate 512-bit VPERMILPS nodes. 2023-02-12 18:26:28 +00:00
Simon Pilgrim
faf5616e11 BlockFrequencyInfoImpl.cpp - add missing closing namespace comment. NFC
Fixes clang-tidy llvm-namespace-comment warning
2023-02-12 16:42:28 +00:00
Simon Pilgrim
1bb95a3a99 [X86] combinePredicateReduction - attempt to fold subvector all_of(icmp_eq()) / any_of(icmp_ne()) to integers
Noticed while working on Issue #59867 and Issue #53419. There's still more to do here, but for "all vector" comparisons we should try to cast to a scalar integer for sub-128-bit types
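Why an "all vector" comparison can become a scalar one: for a vector of at most 64 bits, every lane is equal exactly when the whole bit pattern is equal. The sketch below illustrates that equivalence in plain C++ for a `<4 x i16>` value, with hypothetical function names; it is not the X86 DAG combine itself.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Lane-by-lane form: all_of(icmp_eq(A, B)) over a 64-bit <4 x i16> vector.
bool allLanesEqual(const int16_t (&A)[4], const int16_t (&B)[4]) {
  for (int I = 0; I < 4; ++I)
    if (A[I] != B[I])
      return false;
  return true;
}

// Scalar form: reinterpret each 64-bit vector as one i64 and compare once.
// All lanes being equal is exactly equality of the whole bit pattern.
bool allLanesEqualAsInt(const int16_t (&A)[4], const int16_t (&B)[4]) {
  uint64_t X, Y;
  std::memcpy(&X, A, sizeof(X));
  std::memcpy(&Y, B, sizeof(Y));
  return X == Y;
}

// any_of(icmp_ne(A, B)) is the negation of the all-equal test.
bool anyLaneDifferentAsInt(const int16_t (&A)[4], const int16_t (&B)[4]) {
  return !allLanesEqualAsInt(A, B);
}
```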
2023-02-12 15:23:47 +00:00
Simon Pilgrim
738370ae0e DemandedBits.cpp - use auto* when initializing from cast<>. NFC.
Silence clang-tidy warnings
2023-02-12 14:57:11 +00:00
Simon Pilgrim
1300a4fdae Revert rG23cb32c6d5bda0919cc1ef129917ceb2dbf1b1b8 "[X86] combineX86ShufflesRecursively - treat ISD::TRUNCATE as faux shuffle"
This is causing a miscompile - waiting on a regression test from @bkramer
2023-02-12 14:46:08 +00:00
Martin Storsjö
7717e1114a Revert "[AArch64] Reassociate sub(x, add(m1, m2)) to sub(sub(x, m1), m2)"
This reverts commit c52255d26a23df6ecf09f60ca3e3615467f16bbe.

That commit caused certain files (in ffmpeg, libvpx and libaom) to hang
while compiling, see https://reviews.llvm.org/D143143 for repro.
2023-02-12 16:00:32 +02:00
Sanjay Patel
f48f178717 [InstCombine] canonicalize cmp+select as smin/smax
(V == SMIN) ? SMIN+1 : V --> smax(V, SMIN+1)
(V == SMAX) ? SMAX-1 : V --> smin(V, SMAX-1)

https://alive2.llvm.org/ce/z/d5bqjy

Follow-up for the unsigned variants added with:
86b4d8645fc1b866

issue #60374
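As a quick sanity check in the spirit of the linked Alive2 proof (not a replacement for it), the sketch below exhaustively compares both select forms with their smax/smin rewrites over every `int8_t` value. The function names are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// (V == SMIN) ? SMIN+1 : V, on 8-bit signed values.
int8_t selectMinForm(int8_t V) {
  const int8_t SMin = std::numeric_limits<int8_t>::min();
  return V == SMin ? static_cast<int8_t>(SMin + 1) : V;
}

// (V == SMAX) ? SMAX-1 : V, on 8-bit signed values.
int8_t selectMaxForm(int8_t V) {
  const int8_t SMax = std::numeric_limits<int8_t>::max();
  return V == SMax ? static_cast<int8_t>(SMax - 1) : V;
}

// Returns true if both rewrites agree with the select forms for every i8 value.
bool checkAllI8() {
  const int8_t SMin = std::numeric_limits<int8_t>::min();
  const int8_t SMax = std::numeric_limits<int8_t>::max();
  const int8_t Lo = static_cast<int8_t>(SMin + 1); // SMIN+1
  const int8_t Hi = static_cast<int8_t>(SMax - 1); // SMAX-1
  for (int I = SMin; I <= SMax; ++I) {
    const int8_t V = static_cast<int8_t>(I);
    if (selectMinForm(V) != std::max(V, Lo)) // --> smax(V, SMIN+1)
      return false;
    if (selectMaxForm(V) != std::min(V, Hi)) // --> smin(V, SMAX-1)
      return false;
  }
  return true;
}
```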
2023-02-12 07:54:43 -05:00
NAKAMURA Takumi
0e18b5feaa LLVMFuzzerCLI: [CMake] Prune the last PARTIAL_SOURCES_INTENDED to cover all sources. 2023-02-12 20:12:37 +09:00
Hsiangkai Wang
c9a7b92a23 [AArch64] Consider tiny code model in emitLoadFromConstantPool.
We should be able to use load(literal) to access the constant pool under
the tiny code model.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D132536
2023-02-12 06:02:47 +00:00
Kazu Hirata
df3b703a4c [AArch64] Use llvm::countr_{zero,one} (NFC) 2023-02-11 17:53:01 -08:00
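For reference, a portable sketch of what these helpers compute on a 32-bit value. The `*Ref` names are hypothetical; LLVM's versions are intended to mirror C++20 `std::countr_zero` / `std::countr_one`, which return the bit width for an all-zero (respectively all-one) input.

```cpp
#include <cassert>
#include <cstdint>

// Number of consecutive 0 bits starting at the least significant bit;
// 32 for V == 0.
int countrZeroRef(uint32_t V) {
  int N = 0;
  // N < 32 is tested first, so V is never shifted by 32.
  while (N < 32 && ((V >> N) & 1u) == 0)
    ++N;
  return N;
}

// Number of consecutive 1 bits starting at the least significant bit:
// the trailing ones of V are the trailing zeros of ~V.
int countrOneRef(uint32_t V) { return countrZeroRef(~V); }
```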
Craig Topper
c8ad1de4f0 [RISCV] Remove dead code from RISCVDAGToDAGISel::selectVSETVLI. NFC
vsetvli no longer has side effects, so we don't need code for
handling INTRINSIC_W_CHAIN.
2023-02-11 16:51:35 -08:00
Craig Topper
7e772e12d1 [RISCV] Fix mistake in comment. NFC 2023-02-11 12:32:54 -08:00
Lang Hames
10b5fec256 [JITLink][ORC] Add LinkGraph::allocateCString method.
Renames the existing allocateString method to allocateContent and adds a pair of
allocateCString methods.

The previous allocateString method did not include a null-terminator. It behaved
the same as allocateContent except with a Twine input, rather than an
ArrayRef<char>. Renaming allocateString to allocateContent (overloading the
existing method) makes this clearer.

The new allocateCString methods allocate the given content plus a
null-terminator character, and return a buffer covering both the string and
null-terminator. This makes them suitable for creating c-string content for
jitlink::Blocks.

Existing users of the old allocateString method have been updated to use the
new allocateContent overload.
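A minimal sketch of the difference between the two allocation flavours, using a hypothetical `StringArena` and `Buffer` in place of `LinkGraph` and its allocator (these are not the real JITLink signatures). The C-string variant copies one extra `'\0'` and reports a size that includes it; the content variant does not.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical non-owning view standing in for the returned buffer.
struct Buffer {
  const char *Data;
  std::size_t Size;
};

// Hypothetical arena (not the real JITLink API). std::deque::push_back never
// relocates existing elements, so previously returned buffers stay valid.
class StringArena {
  std::deque<std::string> Storage;

public:
  // Like allocateContent: copies exactly the given bytes; the returned size
  // does not count any terminator.
  Buffer allocateContent(const std::string &Src) {
    Storage.push_back(Src);
    return {Storage.back().data(), Storage.back().size()};
  }

  // Like allocateCString: copies the bytes plus a trailing '\0' and returns a
  // buffer covering both, so Size == Src.size() + 1.
  Buffer allocateCString(const std::string &Src) {
    Storage.push_back(Src);
    Storage.back().push_back('\0');
    return {Storage.back().data(), Storage.back().size()};
  }
};
```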
2023-02-11 12:05:28 -08:00
Simon Pilgrim
23cb32c6d5 [X86] combineX86ShufflesRecursively - treat ISD::TRUNCATE as faux shuffle
getFauxShuffleMask can't handle ISD::TRUNCATE itself as it can't handle inputs that are larger than the output

Another step towards removing combineX86ShuffleChainWithExtract
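Conceptually, a truncate keeps the low part of each wide lane, so on a little-endian target it can be written as a shuffle of the source reinterpreted as narrower elements. The hypothetical helper below builds that lane mapping; note that the reinterpreted source has `NumOutElts * Scale` elements, i.e. it is larger than the output, which is the situation the message says getFauxShuffleMask cannot represent on its own.

```cpp
#include <cassert>
#include <vector>

// Conceptual sketch only (not the X86 combine): truncating NumOutElts wide
// lanes by a factor of Scale, with the source viewed as NumOutElts * Scale
// narrow elements, selects narrow element I * Scale for output lane I
// (the low part of each wide lane on a little-endian target).
std::vector<int> truncateAsShuffleMask(int NumOutElts, int Scale) {
  std::vector<int> Mask;
  for (int I = 0; I < NumOutElts; ++I)
    Mask.push_back(I * Scale);
  return Mask;
}
```

For example, a `v4i32 -> v4i16` truncate corresponds to `truncateAsShuffleMask(4, 2)`, i.e. `{0, 2, 4, 6}` over the source viewed as `v8i16`.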
2023-02-11 19:16:08 +00:00
Lang Hames
9eccc6cce0 [JITLink] Add a predicate to test for C-string blocks. 2023-02-11 10:51:50 -08:00
Lang Hames
3d4e9d5eb0 [ORC] Move ORC-specific object format details into OrcShared.
This allows these details to be shared with JITLink, which is allowed to
depend on the OrcShared library (but not on OrcJIT).
2023-02-11 10:51:38 -08:00
Simon Pilgrim
a55b35dbee [X86] combineVectorInsert - pull out Vec/Scl/Idx operands. NFC.
These will be reused in a future patch
2023-02-11 14:02:00 +00:00
Simon Pilgrim
0b0a38a7a2 [X86] combineX86ShufflesRecursively - don't widen shuffle subvector inputs
combineX86ShuffleChain and combineX86ShuffleChainWithExtract no longer require the shuffle inputs to be the same width as the root vector, so we can stop generating widening nodes on the fly (combineX86ShuffleChain should handle all of this).

This requires a couple of additional folds to avoid a couple of notable regressions:

getFauxShuffleMask - recognise INSERT_SUBVECTOR(X,Y,C) as a shuffle pattern as long as it's not just widening the subvector.

combineConcatVectorOps - fold CONCAT_VECTORS(AssertSext(X,Ty),AssertSext(Y,Ty)) -> AssertSext(CONCAT_VECTORS(X,Y),Ty)

One of the final stages towards fixing Issue #45319 and addressing the regressions in the interleaved tests in D127115
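For the INSERT_SUBVECTOR fold, the shuffle view can be sketched as a two-input mask where indices `[0, NumX)` pick lanes of X and `[NumX, NumX + NumY)` pick lanes of Y. The helper name is hypothetical, and the comment about the excluded widening case reflects an assumption (that "just widening" means inserting Y into an undef X at index 0).

```cpp
#include <cassert>
#include <vector>

// Conceptual sketch: INSERT_SUBVECTOR(X, Y, C) viewed as a two-input shuffle.
// Mask values in [0, NumX) select lanes of X; values in [NumX, NumX + NumY)
// select lanes of Y. Result lanes [C, C + NumY) come from Y, the rest from X.
// Assumption: the "just widening" case excluded by the commit corresponds to
// X being undef with C == 0; this helper does not check for it.
std::vector<int> insertSubvectorMask(int NumX, int NumY, int C) {
  std::vector<int> Mask;
  for (int I = 0; I < NumX; ++I) {
    bool FromY = I >= C && I < C + NumY;
    Mask.push_back(FromY ? NumX + (I - C) : I);
  }
  return Mask;
}
```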
2023-02-11 13:23:04 +00:00
Darshan Bhat
19c42f672f [DFAPacketizer] Move DefaultVLIWScheduler class declaration to header file
This change moves the "DefaultVLIWScheduler" class declaration from
DFAPacketizer.cpp to DFAPacketizer.h.
This is needed because there is a protected class member of
type "DefaultVLIWScheduler*" in the "VLIWPacketizerList" class.
Derived classes cannot use this member unless the declaration
is available to them. More specifically:

// Without this change

```
class HexagonPacketizerList : public VLIWPacketizerList {
public:
  HexagonPacketizerList() {
    // The line below causes an incomplete-type error, since the
    // declaration was not available through the header.
    VLIWScheduler->schedule();
  }
};
```
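For contrast, a self-contained sketch of the situation after the change, with simplified, hypothetical class bodies: once the full definition of the scheduler class is visible, the derived class can call through the protected pointer. With only a forward declaration (`class DefaultVLIWScheduler;`), the same call fails to compile because the type is incomplete.

```cpp
#include <cassert>

// Simplified, hypothetical stand-in: the full definition is visible here, as
// it is for users of DFAPacketizer.h after the change.
class DefaultVLIWScheduler {
public:
  int ScheduleCalls = 0;
  void schedule() { ++ScheduleCalls; }
};

class VLIWPacketizerList {
protected:
  DefaultVLIWScheduler *VLIWScheduler;

public:
  VLIWPacketizerList() : VLIWScheduler(new DefaultVLIWScheduler()) {}
  virtual ~VLIWPacketizerList() { delete VLIWScheduler; }
  // Owns a raw pointer, so copying is disabled to avoid a double delete.
  VLIWPacketizerList(const VLIWPacketizerList &) = delete;
  VLIWPacketizerList &operator=(const VLIWPacketizerList &) = delete;
  int scheduleCalls() const { return VLIWScheduler->ScheduleCalls; }
};

class HexagonPacketizerList : public VLIWPacketizerList {
public:
  HexagonPacketizerList() {
    VLIWScheduler->schedule(); // Compiles: DefaultVLIWScheduler is complete.
  }
};
```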

Reviewed By: kparzysz

Differential Revision: https://reviews.llvm.org/D139767
2023-02-11 14:31:58 +05:30
Jay Foad
811d11b064 [AMDGPU] Add GFX11 HW_REG_PERF_SNAPSHOT_*
These are similar to hardware registers already added for GFX940,
but with different numbers and slightly different names.

Differential Revision: https://reviews.llvm.org/D143740
2023-02-10 20:28:14 +00:00
Alex Brachet
3e57aa304f [llvm-driver] Reinvoke clang as described by llvm driver extra args
Differential Revision: https://reviews.llvm.org/D137800
2023-02-10 19:42:32 +00:00
Changpeng Fang
7ca3444fba AMDGPU: Use module flag to get code object version at IR level follow-up
Summary:
This is part of the leftover work for https://reviews.llvm.org/D143138.
In this work, we pass the code object version as an argument to initialize the target ID
and use it for the targetID dump.

Reviewers: arsenm

Differential Revision: https://reviews.llvm.org/D143293
2023-02-10 11:16:38 -08:00
Arthur Eubanks
c8b8d6badd [Passes] Remove some legacy passes
Namely CrossDSOCFI and GlobalSplit.

These are part of the optimization pipeline, of which the legacy pass manager version is deprecated.
2023-02-10 10:46:45 -08:00
OCHyams
295f5fafcb [Assignment Tracking] Fix migrateDebugInfo in SROA
Without this patch, migrateDebugInfo doesn't understand how to handle existing
fragments that are smaller than the to-be-split store. This can occur
if, e.g., a vector store (1 dbg.assign) is split (many dbg.assigns, 1 fragment
for each scalar), later those stores are re-vectorized (many dbg.assigns),
and then SROA runs on the result.

The approach taken in this patch is to drop intrinsics with fragments outside
of the slice.

For example, starting with:

  store <2 x float> %v, ptr %dest !DIAssignID !1
  call void @llvm.dbg.assign(..., DIExpression(DW_OP_LLVM_fragment, 0, 32), !1, ...)
  call void @llvm.dbg.assign(..., DIExpression(DW_OP_LLVM_fragment, 32, 32), !1, ...)

When visiting the slice of bits 0 to 31 we get:

  store float %v.extract.0, ptr %dest !DIAssignID !2
  call void @llvm.dbg.assign(..., DIExpression(DW_OP_LLVM_fragment, 0, 32), !2, ...)

The other dbg.assign associated with the currently-split store is dropped for
this split part. And visiting bits 32 to 63 we get the following:

  store float %v.extract.1, ptr %adjusted.dest !DIAssignID !3
  call void @llvm.dbg.assign(..., DIExpression(DW_OP_LLVM_fragment, 32, 32), !3, ...)

I've added two tests that cover this case.

Implementing this meant re-writing the fragment-calculation part of
migrateDebugInfo to work with the absolute offset of the new slice in terms of
the base alloca (instead of the offset of the slice into the new alloca), the
fragment (if any) of the variable associated with the base alloca, and the
fragment associated with the split store. Because we need the offset into the
base alloca for the variables being split, some careful wiring is required for
memory intrinsics due to the fact that memory intrinsics can be split when
either the source or dest allocas are split. In the case where the source
alloca drives the splitting, we need to be careful to pass migrateDebugInfo the
information in relation to the dest alloca.
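A simplified model of the dropping rule described above, using hypothetical `Fragment` and `fragmentsForSlice` names rather than LLVM's types. As an assumption of this sketch, a fragment that only partially overlaps the slice is treated as outside it and dropped; offsets are absolute bit offsets relative to the base alloca.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a dbg.assign fragment, in bits relative to the
// base alloca's variable.
struct Fragment {
  uint64_t OffsetInBits;
  uint64_t SizeInBits;
};

// For the slice [SliceOffset, SliceOffset + SliceSize) of the split store,
// keep only the fragments lying entirely inside the slice; the rest are
// dropped for this split part.
std::vector<Fragment> fragmentsForSlice(const std::vector<Fragment> &Frags,
                                        uint64_t SliceOffset,
                                        uint64_t SliceSize) {
  std::vector<Fragment> Kept;
  for (const Fragment &F : Frags) {
    bool Inside = F.OffsetInBits >= SliceOffset &&
                  F.OffsetInBits + F.SizeInBits <= SliceOffset + SliceSize;
    if (Inside)
      Kept.push_back(F);
  }
  return Kept;
}
```

Applied to the `<2 x float>` example, the slice of bits 0 to 31 keeps only the `(0, 32)` fragment and the slice of bits 32 to 63 keeps only the `(32, 32)` fragment.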

Reviewed By: StephenTozer

Differential Revision: https://reviews.llvm.org/D143146
2023-02-10 18:10:11 +00:00
David Green
c52255d26a [AArch64] Reassociate sub(x, add(m1, m2)) to sub(sub(x, m1), m2)
The mid end will reassociate sub(sub(x, m1), m2) to sub(x, add(m1, m2)). This
reassociates it back to allow the creation of more mls instructions.
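The reassociation is sound in wrapping integer arithmetic, and each `sub(acc, mul)` step maps onto one multiply-subtract. A sketch with hypothetical helper names, modelling values as `uint32_t` so overflow wraps as it does in LLVM IR:

```cpp
#include <cassert>
#include <cstdint>

// Multiply-subtract: Acc - A * B, wrapping modulo 2^32.
uint32_t mls(uint32_t Acc, uint32_t A, uint32_t B) { return Acc - A * B; }

// Form produced by the mid end: sub(x, add(m1, m2)) with m1 = a*b, m2 = c*d.
uint32_t subOfAdd(uint32_t X, uint32_t A, uint32_t B, uint32_t C, uint32_t D) {
  return X - (A * B + C * D);
}

// Reassociated form sub(sub(x, m1), m2): each sub of a product is one mls.
uint32_t chainedMls(uint32_t X, uint32_t A, uint32_t B, uint32_t C,
                    uint32_t D) {
  return mls(mls(X, A, B), C, D);
}
```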

Differential Revision: https://reviews.llvm.org/D143143
2023-02-10 18:09:11 +00:00
Craig Topper
d37a31cf23 [X86] Attempt to fix ubsan failure.
operator~ promotes the single-bit input to int. The ~ causes the upper
31 bits to become 1s, making it a negative value. Left-shifting a negative
value is undefined behavior.

Mask it back down to a single bit.

The extra 1s were being shifted to bit 8 and above, and they aren't
used by the emitByte call, so this shouldn't be a functional change.
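A small illustration of the promotion and the fix, using a hypothetical `encodeInvertedBitMasked` helper rather than the actual X86 encoder code:

```cpp
#include <cassert>
#include <cstdint>

// Bit holds a single-bit value (0 or 1) in a narrow unsigned type.
// ~Bit first promotes Bit to int, so ~0 is -1 (all bits set); left-shifting
// that negative int is undefined behavior before C++20.
// Masking with 1 restores a single non-negative bit before the shift.
// Shift is expected to be below 8 so the result fits in one byte.
uint8_t encodeInvertedBitMasked(uint8_t Bit, unsigned Shift) {
  return static_cast<uint8_t>((~Bit & 1) << Shift);
}
```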
2023-02-10 10:02:51 -08:00
Johannes Doerfert
1763c63254 [Attributor][NFCI] Use a set to track dependences 2023-02-10 11:56:09 -06:00
Johannes Doerfert
86cce90e21 [Attributor][NFCI] Avoid AAIntraFnReachability updates if possible
Even if liveness changed, we only care about certain dead edges in
AAIntraFnReachability. If those are still dead, we can avoid an update.
2023-02-10 11:56:09 -06:00
Johannes Doerfert
a9557aacd1 [Attributor][NFCI] Use queries without exclusion set whenever possible
If a query uses an exclusion set but we haven't used it to determine the
result, we can cache the query without exclusion set too. When we lookup
a cached result we can check for the non-exclusion set version first.
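A hypothetical, simplified model of this caching scheme (not the Attributor's data structures). As a design assumption of this sketch, only an *unreachable* plain answer is reused for a query that has an exclusion set: excluding instructions can turn a reachable target into an unreachable one, but never the reverse.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <tuple>

enum class Answer { Unknown, Yes, No };

// Key: (From, To, ExclusionSet). An empty set means "no exclusion set".
using Key = std::tuple<int, int, std::set<int>>;

class ReachabilityCache {
  std::map<Key, bool> Results;

public:
  void store(int From, int To, const std::set<int> &Exclusion, bool Reachable,
             bool UsedExclusion) {
    Results[Key(From, To, Exclusion)] = Reachable;
    // The exclusion set did not influence the answer, so the same answer
    // also holds for the plain query.
    if (!Exclusion.empty() && !UsedExclusion)
      Results[Key(From, To, std::set<int>())] = Reachable;
  }

  Answer lookup(int From, int To, const std::set<int> &Exclusion) const {
    if (!Exclusion.empty()) {
      // Check the plain query first: unreachable with nothing excluded
      // implies unreachable with any exclusion set.
      auto Plain = Results.find(Key(From, To, std::set<int>()));
      if (Plain != Results.end() && !Plain->second)
        return Answer::No;
    }
    auto It = Results.find(Key(From, To, Exclusion));
    if (It == Results.end())
      return Answer::Unknown;
    return It->second ? Answer::Yes : Answer::No;
  }
};
```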
2023-02-10 11:56:09 -06:00
Johannes Doerfert
76a1919026 [Attributor][NFC] Avoid unnecessary string operations
This caused multiple string operations which we don't need if we do not
create a profile.
2023-02-10 11:56:09 -06:00
Johannes Doerfert
bf9964fb13 [Attributor][NFCI] Create an AAIsDead for the function eagerly 2023-02-10 11:56:09 -06:00
Johannes Doerfert
8bc0bee2f8 [Attributor][NFCI] Avoid a temporary vector and exit early
This change simply avoids the temporary vector and processes the elements
right away.
2023-02-10 11:56:09 -06:00
Michael Buch
b8ef007fca Reland "[llvm][dsymutil] Add DW_TAG_imported_declaration to accelerator table"
This relands the commit previously reverted in
`8570bee53a8ce0c5d04bc11f288e19a457474c4c` due to failures on linux.

The problem was that the test executable was built with absolute
OSO prefix paths. This re-commit adds a modified version of the
executable that strips the absolute OSO prefix paths and makes
sure the test appends the OSO prefix appropriately (via the corresponding
dsymutil flags).

Differential Revision: https://reviews.llvm.org/D143458
2023-02-10 17:19:07 +00:00
Sanjay Patel
af39acda88 [VectorCombine] fix insertion point of shuffles
As shown in issue #60649, the new shuffles were
being inserted before a phi, and that is invalid.

It seems like most test coverage for this fold
(foldSelectShuffle) lives in the AArch64 dir,
but this doesn't repro there for a base target.
2023-02-10 10:57:11 -05:00
Sanjay Patel
78056e2f2d [InstCombine] propagate FMF in exp2->ldexp fold 2023-02-10 10:02:25 -05:00
Sanjay Patel
3abea2b544 [InstCombine] copy tail markings in exp2->ldexp fold 2023-02-10 10:02:25 -05:00
David Green
86bfeb906e Revert "Inlining: Run the legacy AlwaysInliner before the regular inliner."
This seems to cause large regressions in existing code, as much as 75% slower
(4x the time taken). Small always inline functions seem to be used a lot in the
cmsis-dsp library.

I would add a phase ordering test to show the problems, but one already exists!
The llvm/test/Transforms/PhaseOrdering/ARM/arm_mult_q15.ll was just changed by
removing alwaysinline to hide the problems that existed.

This reverts commit cae033dcf227aeecf58fca5af6fc7fde1fd2fb4f.
This reverts commit 8e33c41e72ad42e4c27f8cbc3ad2e02b169637a1.
2023-02-10 15:01:49 +00:00
Juan Manuel MARTINEZ CAAMAÑO
c4a250ecea [AMDGPU][MC] Generate relative relocations for allocatable (more particularly, eh_frame) sections
Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D142453
2023-02-10 15:54:43 +01:00
Benjamin Maxwell
f1837c7074 [DebugInfo] Handle missed DW_FORM_addrx3 and DW_FORM_strx3 cases
This fixes a few places where the addrx3 and strx3 forms were missed.
Previously, if one of these forms appeared anywhere, various
errors could occur. This commit also adds an extra test case for the addrx3
form (which previously failed).

Differential Revision: https://reviews.llvm.org/D143488
2023-02-10 14:44:18 +00:00
Simon Pilgrim
a3060f0f37 [X86] combineConcatVectorOps - concatenate AVX512 vselect nodes. NFC.
This also requires us to constant fold vXi1 concat_vector nodes
2023-02-10 14:05:35 +00:00
OCHyams
25d0f3c4d0 [Assignment Tracking] Fix fragment index error in getDerefOffsetInBytes
Without this patch `getDerefOffsetInBytes` incorrectly always returns
`std::nullopt` for expressions with fragments due to an off-by-one error with
fragment element indices.

Reviewed By: StephenTozer

Differential Revision: https://reviews.llvm.org/D143567
2023-02-10 13:49:05 +00:00
Sanjay Patel
9dcd7195a2 [InstCombine] avoid crashing in pow->ldexp
Similar to 62a0a1b9eea7788c1f9dbae -

We have pow math intrinsics in IR, but no ldexp intrinsics
to handle vector types.

A patch for that was proposed in D14327, but it was not completed.

Issue #60605
2023-02-10 08:03:13 -05:00
Sanjay Patel
62a0a1b9ee [InstCombine] avoid crashing in exp2->ldexp
We have exp2 math intrinsics in IR, but no ldexp intrinsics
to handle vector types.

A patch for that was proposed in D14327, but it was not completed.

Issue #60605
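Both folds rest on the scalar identity `exp2(n) == pow(2.0, n) == ldexp(1.0, n)` for an integer `n` converted to floating point; the crash fixes are about not applying it to vector types, for which no ldexp intrinsic exists. A scalar sketch with a hypothetical helper name:

```cpp
#include <cassert>
#include <cmath>

// 2^N computed by adjusting the floating-point exponent directly, which is
// what the exp2(sitofp(N)) -> ldexp(1.0, N) and pow(2.0, sitofp(N)) ->
// ldexp(1.0, N) folds rely on. Exact while 2^N is representable as a double.
double powerOfTwo(int N) { return std::ldexp(1.0, N); }
```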
2023-02-10 07:35:39 -05:00
Tim Northover
c4ce967e34 ARM: skip debug instructions when matching jump-table patterns.
When working out whether we can see a compressible jump-table pattern during
ConstantIslands, we were stopping when we saw a debug instruction. Instead it's
better to keep iterating backwards to the first real instruction.

https://reviews.llvm.org/D142019
2023-02-10 12:27:59 +00:00
Ivan Kosarev
f0f8ae7596 [AMDGPU][AsmParser] Fix matching immediate literals.
Prevents potential matching of literal offsets to non-literal operands.

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D142194
2023-02-10 11:36:07 +00:00
Dominik Adamski
baca3c1507 Move SIMD alignment calculation to LLVM Frontend
Currently, the default SIMD alignment is defined by the Clang-specific TargetInfo class.
This class cannot be reused for LLVM Flang. That's why the default SIMD alignment
calculation has been moved to OMPIRBuilder, which is common to Flang and Clang.

Previous attempt: https://reviews.llvm.org/D138496 was wrong because
the default alignment depended on which LLVM targets were built.

If we wanted to calculate the default alignment for PPC and we hadn't specified
PPC LLVM target to build, then we would get 0 as the alignment because
OMPIRBuilder couldn't create PPCTargetMachine object and it returned 0 as
the default value.

If PPC LLVM target had been built earlier, then OMPIRBuilder could have created
PPCTargetMachine object and it would have returned 128.

Differential Revision: https://reviews.llvm.org/D141910

Reviewed By: jdoerfert
2023-02-10 04:11:54 -06:00
Dmitry Makogon
c77c186a64 [LVI] Don't traverse uses when calculating range at use
This effectively reverts 5c38c6a and 4f772b0.

A recently introduced LazyValueInfo::getConstantRangeAtUse returns incorrect
ranges for values in certain cases. One such example is described in PR60629.
The issue has something to do with traversing PHI uses of a value transitively.
As nikic pointed out, we're effectively reasoning about values from different
loop iterations.

In the faulting test case, CVP miscompiled the code because the calculated
range for a shift argument was incorrect. It returned empty-set; however, it is
clearly not dead code. CVP then erased the shift instruction because
of the empty range.
2023-02-10 17:06:36 +07:00