177445 Commits

Author SHA1 Message Date
Kazu Hirata
fd358997b3 [IR] Use range-based for loops (NFC) 2024-01-14 00:53:28 -08:00
Kazu Hirata
c0cb80338f [IR] Use StringRef::consume_front (NFC) 2024-01-14 00:53:26 -08:00
Nicholas Mosier
49138d97c0
[X86] Fix SLH crash on llvm.eh.sjlh.longjmp (#77959)
Fix #60081.
2024-01-14 12:03:18 +08:00
Kazu Hirata
e4a6be0fc0 [CodeGen] Use getConstantOperandVal (NFC) 2024-01-13 18:18:51 -08:00
Kazu Hirata
b5d6ea4d8b [Support] Use StringRef::consume_front (NFC) 2024-01-13 18:18:49 -08:00
Kazu Hirata
96f14ea618 [llvm] Use range-based for loops with llvm::drop_begin (NFC) 2024-01-13 18:18:47 -08:00
Heejin Ahn
d871f40deb
[WebAssembly] Use DebugValueManager only when subprogram exists (#77978)
We previously scanned the whole BB for `DBG_VALUE` instruction even when
the program doesn't have debug info, i.e., the function doesn't have a
subprogram associated with it, which can make compilation unnecessarily
slow. This disables `DebugValueManager` when a `DISubprogram` doesn't
exist for a function.

This only reduces unnecessary work in non-debug mode and does not change
output, so it's hard to add a test to test this behavior.

Test changes were necessary because their `DISubprogram`s were not
correctly linked with the functions, so with this PR the compiler
incorrectly assumed the functions didn't have a subprogram and the tests
started to fail.

Fixes https://github.com/emscripten-core/emscripten/issues/21048.
2024-01-13 14:55:54 -08:00
Vitaly Buka
253d2f931e
Revert "[InstCombine] Fold icmp pred (inttoptr X), (inttoptr Y) -> icmp pred X, Y" (#78023)
Reverts llvm/llvm-project#77832

To fix https://lab.llvm.org/buildbot/#/builders/236/builds/8673

Also truncation to shorter type looks incorrect.

Issue for tracking #78024 .
2024-01-13 11:15:30 -08:00
Durgadoss R
8d817f6479
[LLVM][NVPTX]: Add aligned versions of cluster barriers (#77940) 2024-01-13 10:41:19 +01:00
paperchalice
8566cd6124
[CodeGen] Let PassBuilder support machine passes (#76320)
`PassBuilder` would be a better place to parse MIR pipeline. We can
reuse the code to support parsing pass with parameters and targets can
reuse `registerPassBuilderCallbacks` to register the target specific
passes. `PassBuilder` also has ability to check whether a Pass is a
machine pass.
2024-01-13 11:28:07 +08:00
Kazu Hirata
1df4fb9881 [Support] Use StringRef::ltrim (NFC) 2024-01-12 18:39:53 -08:00
Kazu Hirata
15179aa433 [llvm] Use llvm::is_contained (NFC) 2024-01-12 18:39:48 -08:00
Koakuma
5fa4b1d83c
[SPARC] Consume tune-cpu directive in the backend (#77195)
This lets the backend read the `tune-cpu` directive that is emitted by the frontend.

No changes are needed for clang as it is already emits it.
2024-01-12 21:21:32 -05:00
Min-Yih Hsu
3edf82d556
[XRay] Reserve memory space ahead-of-time when reading native format log (#76853)
XRay used to struggle reading large log files. It turned out the
bottleneck was primarily caused by the reallocation happens when
appending log entries into a std::vector.
This patch reserves the memory space ahead-of-time since the number of
entries is known for most cases. Making llvm-xray runs 1.8 times faster
and uses 1.4 times less physical memory when reading large (~2.6GB) log
files.
2024-01-12 16:34:39 -08:00
Reid Kleckner
21a77e8a92 [IR] Reorder Value fields to put the SubclassID first (#53520)
Placing the class id at offset 0 should make `isa` and `dyn_cast` faster
by eliminating the field offset (previously 0x10) from the memory
operand, saving encoding space on x86, and, in theory, an add micro-op.
You can see the load encodes one byte smaller here:
https://godbolt.org/z/Whvz4can9

The compile time tracker shows some modestly positive results in the
on the `cycle` metric and in the final clang binary size metric:
https://llvm-compile-time-tracker.com/compare.php?from=33b54f01fe32030ff60d661a7a951e33360f82ee&to=2530347a57401744293c54f92f9781fbdae3d8c2&stat=cycles
Clicking through to the per-library size breakdown shows that
instcombine size reduces by 0.68%, which is meaningful, and I believe
instcombine is known to be a hotspot.

It is, however, potentially noise. I still think we should do this,
because notionally, the class id really acts as the vptr of the Value,
and conventionally the vptr is always at offset 0.
2024-01-12 23:13:01 +00:00
Greg Clayton
4618ef8cf5
Allow the dumping of .dwo files contents to show up when dumping an executable with split DWARF. (#66726)
Allow the dumping of .dwo files contents to show up when dumping an
executable with split DWARF.

Currently if you run llvm-dwarfdump on a binary that has skeleton
compile units, you only see the skeleton compile units. Since the main
binary has the linked addresses it would be nice to be able to dump
DWARF from the .dwo files and how the resolved addresses instead of
showing the address index and "<unresolved>" in the output. This patch
adds an option that can be specified to dump the non skeleton DIEs named
--dwo.

Added the ability to use the following options with split dwarf as well:
  --name <name>
  --lookup <addr>
  --debug-info <die-offset>
2024-01-12 13:31:55 -08:00
Craig Topper
8cd956197f
[RISCV] Update descriptions for Zvk* shorthands. (#77961)
This makes them more consistent with other extensions so they appear
move similar in the -print-supported-extensions output.
2024-01-12 11:56:36 -08:00
Usman Nadeem
792fa23c1b
[AArch64][SVE2] Lower OR to SLI/SRI (#77555)
Code builds on NEON code and the tests are adapted from NEON tests
minus the tests for illegal types.
2024-01-12 11:23:56 -08:00
Igor Kudrin
9f8c818141
[CommandLine][NFCI] Do not add 'All' to 'RegisteredSubCommands' (#77722)
After #75679, it is no longer necessary to add the `All` pseudo
subcommand to the list of registered subcommands. The change causes the
list to contain only real subcommands, i.e. an unnamed top-level
subcommand and named ones. This simplifies the code a bit by removing
some checks for this special case.

This is a fixed version of #77041, where options of the 'All' subcommand
were not added to subcommands defined after them.
2024-01-13 02:19:42 +07:00
Krzysztof Drewniak
88871784fd
[AMDGPU] Allow buffer intrinsics to be marked volatile at the IR level (#77847)
In order to ensure the correctness of ptr addrspace(7) lowering, we need
a backwards-compatible way to flag buffer intrinsics as volatile that
can't be dropped (unlike metadata).

To acheive this in a backwards-compatible way, we use bit 31 of the
auxilliary immediates of buffer intrinsics as the volatile flag. When
this bit is set, the MachineMemOperand for said intrinsic is marked
volatile. Existing code will ensure that this results in the appropriate
use of flags like glc and dlc.

This commit also harmorizes the handling of the auxilliary immediate for
atomic intrinsics, which new go through extract_cpol like loads and
stores, which masks off the volatile bit.
2024-01-12 11:20:01 -06:00
Jay Foad
9d8e53818d
[AMDGPU] Refactor getNonSoftWaitcntOpcode and its callers (#77933)
This avoids listing all soft waitcnt opcodes in two places
(getNonSoftWaitcntOpcode and isSoftWaitcnt) and avoids the need for
helpers isWaitcnt and isWaitcntVsCnt.
2024-01-12 17:12:09 +00:00
Jay Foad
dec74a8347
[AMDGPU] Fix VS_CNT overflow assertion (#77935)
Always set the upper bound for VS_CNT higher than the lower bound.
Before #77439 this code was only executed on function entry where the
lower bound was 0 so it was not a problem.

Fixes #77931
2024-01-12 17:11:19 +00:00
Paschalis Mpeis
a300b24037 Revert "[TLI] Fix replace-with-veclib crash with invalid arguments (#77112)"
This reverts commit 9fdc568824b0992d48704dfa530a12073cc02f5e,
as it linker crashes on some platforms.
2024-01-12 15:49:15 +00:00
Philip Reames
e4d01bb227
[SCEV] Special case sext in isKnownNonZero (#77834)
The existing logic in isKnownNonZero relies on unsigned ranges, which
can be problematic when our range calculation is imprecise. Consider the
following:
  %offset.nonzero = or i32 %offset, 1
  -->  %offset.nonzero U: [1,0) S: [1,0)
  %offset.i64 = sext i32 %offset.nonzero to i64
  -->  (sext i32 %offset.nonzero to i64) U: [-2147483648,2147483648)
                                         S: [-2147483648,2147483648)

Note that the unsigned range for the sext does contain zero in this case
despite the fact that it can never actually be zero.

Instead, we can push the query down one level - relying on the fact that
the sext is an invertible operation and that the result can only be zero
if the input is. We could likely generalize this reasoning for other
invertible operations, but special casing sext seems worthwhile.
2024-01-12 07:45:28 -08:00
Paschalis Mpeis
9fdc568824
[TLI] Fix replace-with-veclib crash with invalid arguments (#77112)
Fix a crash of `replace-with-veclib` pass, when the arguments of the TLI
mapping do not match the original call.
Now, it simply ignores such cases.

Test require assertions as it accesses programmatically the debug log.
2024-01-12 15:19:52 +00:00
Alexey Bataev
6fdc2ce8c5 [SLP]Fix PR77916: transform the whole mask, not only the elements for
the second vector.

Need to transform all elements in the long mask, if we decided to
produce shorter version, some elements may still have incorrect inifices
after transformation for the first vector in the permutation.
2024-01-12 07:07:43 -08:00
Natalie Chouinard
4f47372f8c
[SPIR-V] Add Float16 support when targeting Vulkan (#77115)
Add Float16 to Vulkan's available capabilities, and guard Float16Buffer
(Kernel-only capability) against being added outside OpenCL
environments.

Add tests to verify half and half vector types, and validate with
spirv-val.

Fixes #66398
2024-01-12 10:03:48 -05:00
Yingwei Zheng
2aae304cbc
[InstCombine] Fold icmp pred (inttoptr X), (inttoptr Y) -> icmp pred X, Y (#77832)
NOTE: Alive2 proofs are unavailable because `inttoptr` is unsupported.
2024-01-12 23:03:07 +08:00
Alexander Yermolovich
d199ab4699
[LLVM][DWARF] Fix accelerator table switching between CU and TU (#77511)
Bug 1 is triggered when a TU is already created, and we process the same
DICompositeType at a top level. We would switch to TU accelerator table,
but
would not switch back on early exit. As the result we would add CU
entries to the TU
accelerator table. When we try to write out TUs and normalize entries,
the
offsets for DIEs that are part of a CU would not have been computed, and
it
would assert on getOffset().

Bug 2 is triggered when processing nested TUs. When we exit from
addDwarfTypeUnitType we switched back to CU accelerator table. If we
were processing nested TUs, the rest of the entries from TUs would be
added to CU accelerator table. When we write out TUs, all the DIE
pointers will become invalid. Eventually it will assert during
normalization step after CU is processed.
2024-01-12 07:01:17 -08:00
Qiongsi Wu
39bb790b90
[SimplifyCFG] switch: Do Not Transform the Default Case if the Condition is Too Wide (#77831)
https://github.com/llvm/llvm-project/pull/76669 taught SimplifyCFG to
handle switches when `default` has only one case. When the `switch`'s
condition is wider than 64 bit, the current implementation can calculate
the wrong default value. This PR skips cases where the condition is too
wide.
2024-01-12 08:54:35 -05:00
Nikita Popov
6c2fbc3a68
[IRBuilder] Add CreatePtrAdd() method (NFC) (#77582)
This abstracts over the common pattern of creating a gep with i8 element
type.
2024-01-12 14:21:21 +01:00
Florian Hahn
59d6f033a2
[VPlan] Support narrowing widened loads in truncateToMinimimalBitwidths.
MinBWs may also contain widened load instructions, handle them by only
narrowing their result.

Fixes https://github.com/llvm/llvm-project/issues/77468
2024-01-12 13:14:13 +00:00
Alexey Bataev
39b2104b4a [SLP]Fix a crash for reduced values with minbitwidth, which are reused.
If the reduced values are additionally affected by minbitwidth analysis,
need to cast them to a proper type before doing any math, if they are
reused.
2024-01-12 04:49:48 -08:00
Alexey Lapshin
35708b0754
[DWARFLinker][NFC] Rename libraries to match with directories name. (#77592)
It was noted that new DWARFLinker libraries do not follow naming
agreement -
https://github.com/llvm/llvm-project/pull/75925#issuecomment-1883301659
This patch rename libraries to match with the agreement.

Rename LLVMDWARFLinkerBase library into the LLVMDWARFLinker. Rename
LLVMDWARFLinker library into the LLVMDWARFLinkerClassic. Correct include
path according to the new directory structure.
2024-01-12 15:36:44 +03:00
Mirko Brkušanin
2adbf254a1
[AMDGPU][NFC] Rename DotIUVOP3PMods to VOP3PModsNeg (#77785)
This is used to select the source modifier (neg) from the immediate
operand. After a follow up commit this will no longer be DOTIU specific.

Co-authored-by: Changpeng Fang <changpeng.fang@amd.com>
2024-01-12 10:57:24 +01:00
ostannard
9c9bffe213
[AArch64] Disable FP loads/stores when fp-armv8 not enabled (#77817)
Most of the floating-point instructions are already gated on the
fp-armv8 subtarget feature (or some other feature), but most of the load
and store instructions, and one move instruction, were not.

I found this list of instructions with a script which consumes the
output of llvm-tblgen --dump-json, looking for instructions which have
an FPR operand but no predicate. That script now finds zero
instructions.

This only affects assembly, not codegen, because the floating-point
types and registers are already not marked as legal when the FPU is
disabled, so it is impossible for any of these to be selected.
2024-01-12 09:15:11 +00:00
Craig Topper
2e78c220fc
[RISCV] Simplify the description for ssaia and smaia. (#77870)
It feels more important to expand out Advanced Interrupt Architecture
for users than to have a description that explains how one extension is
different from the other.
2024-01-11 23:53:00 -08:00
Kazu Hirata
a5dc3f68a8 [llvm] Use SmallString::operator std::string() (NFC) 2024-01-11 23:32:44 -08:00
Mariusz Sikora
2b83ceee3d
[AMDGPU][GFX12] Default component broadcast store (#76212)
For image and buffer stores the default behaviour on GFX12 is to set all
unset components to the value of the first component. So if we pass only
X component, it will be the same as XXXX, or XY same as XYXX.

This patch simplifies the passed vector of components in InstCombine by
removing components from the end that are equal to the first component.

For image stores it also trims DMask if necessary.

---------

Co-authored-by: Mateja Marjanovic <mmarjano@amd.com>
2024-01-12 08:26:08 +01:00
Kazu Hirata
7b9bc4729b [IPO] Use a range-based for loop (NFC) 2024-01-11 22:48:22 -08:00
Kazu Hirata
5e9da33b87 [llvm] Use StringRef::consume_front_insensitive (NFC) 2024-01-11 22:48:20 -08:00
Amara Emerson
a946934a12 [GlobalISel][NFC] Use GPhi wrapper in more places instead of iterating over operands. 2024-01-11 22:25:53 -08:00
Stanislav Mekhanoshin
369981181f
[AMDGPU] Handle bf16 operands the same way as f16. NFC. (#77826)
This is infrastructure change which shall allow use of bf16 operands
with instruction definitions.
2024-01-11 21:08:19 -08:00
Shengchen Kan
4f71068b72 [X86] Correct the asm comment for compression NF_ND -> NF 2024-01-12 12:55:11 +08:00
Carl Ritson
6752f1517d
[TwoAddressInstruction] Recompute live intervals for partial defs (#74431)
Force live interval recomputation for a register if its definition is
narrowed to become partial. The live interval repair process cannot
otherwise detect these changes.
2024-01-12 13:26:01 +09:00
Emil J
3baedb4111
[GISel] Fix #77762: extend correct source registers in combiner helper rule extend_through_phis (#77765)
Since we already know which register we want to extend, we don't have to
ask its defining MI about it

---------

Co-authored-by: Emil Tywoniak <Emil.Tywoniak@hightec-rt.com>
2024-01-12 12:09:58 +08:00
darkbuck
54c19546ba
[GlobalISel] Revise 'assignCustomValue' interface (#77824)
- Previously, 'assignCustomValue' requests the number of assigned VAs
minus 1 is returned and treats 0 as the assignment failure. However,
under that arrangment, we cannot tell a successful *single* VA custom
assignment from the failure case.
- This change requests that 'assignCustomValue' just return the number
of all VAs assigned, including the first WA so that it won't be ambigous
to tell the failure case from the single VA custom assignment.
2024-01-12 10:41:55 +07:00
Wang Pengcheng
a2af374284
[SelectionDAG] Add space-optimized forms of OPC_CheckPredicate (#77763)
We record the usage of each `Predicate` and sort them by usage.

For the top 8 `Predicate`s, we will emit a `PC_CheckPredicateN` to
save one byte.

Overall this reduces the llc binary size with all in-tree targets by
about 61K.

This is a recommit of 1a57927, which was reverted in bc98c31.

The CI failures occurred when doing expensive checks (with option
`LLVM_ENABLE_EXPENSIVE_CHECKS` being ON).

The key point here is that we need stable sorting result in the
test, but doing expensive checks uncovered the non-determinism of
`llvm::sort`. So `llvm::sort` is changed to `llvm::stable_sort`
in this revised patch.

And we use `llvm::MapVector` to keep insertion order.
2024-01-12 11:38:05 +08:00
Craig Topper
9e40ba0c2d [RISCV] Remove period from Zvbb extension description.
No other instruction extension has a period.

There are also periods in 'ssaia' and 'smaia', but those descriptions
need a different update.
2024-01-11 19:28:05 -08:00
Matt Arsenault
5e5e98e36e
AMDGPU: Cleanup MAIFrag predicate code (#77734)
Move the complex predicates into separate variables.
2024-01-12 10:09:56 +07:00