423 Commits

Author SHA1 Message Date
Nikita Popov
74a76a2885
[BranchFolding] Remove dubious assert from operator< (#71639)
`MergePotentialElts::operator<` asserts that the two elements being
compared are not equal. However, sorting functions are allowed to invoke
the comparison function with equal arguments (though they usually don't
for efficiency reasons).

There is an existing special-case that disables the assert if
_GLIBCXX_DEBUG is used, which may invoke the comparator with equal args
to verify strict weak ordering. I believe libc++ also has strict weak
ordering checks under some options nowadays.

Recently, #71312 was reported, where a change to glibc's qsort_r
implementation can also result in comparison between equal elements.
From what I understood, this is an inefficiency that will be fixed on
the glibc side as well, but I think at this point we should just remove
this assertion.

Fixes https://github.com/llvm/llvm-project/issues/71312.
2023-11-09 10:23:10 +01:00
Jay Foad
5022fc2ad3 [CodeGen] Make use of MachineInstr::all_defs and all_uses. NFCI.
Differential Revision: https://reviews.llvm.org/D151424
2023-06-01 19:17:34 +01:00
Jay Foad
47d3cbcf84 [BranchFolder] Skip redundant IMPLICIT_DEFs of subregs
Differential Revision: https://reviews.llvm.org/D148509
2023-04-27 09:40:06 +01:00
Jay Foad
14bc374810 [MC] Use subregs/superregs instead of MCSubRegIterator/MCSuperRegIterator. NFC.
Differential Revision: https://reviews.llvm.org/D148613
2023-04-18 13:29:41 +01:00
Phoebe Wang
0efe111365 Reland "[Windows SEH]: HARDWARE EXCEPTION HANDLING (MSVC -EHa) - Part 2"
This reverts commit db6a979ae82410e42430e47afa488936ba8e3025.

Reland D102817 without any change. The previous revert was a mistake.

Differential Revision: https://reviews.llvm.org/D102817
2023-03-29 08:59:56 +08:00
Noah Goldstein
ee5585ed09 Recommit "Improve and enable folding of conditional branches with tail calls." (2nd Try)
Improve and enable folding of conditional branches with tail calls.

1. Make it so that conditional tail calls can be emitted even when
   there are multiple predecessors.

2. Don't guard the transformation behind -Os. The rationale for
   guarding it was static-prediction can be affected by whether the
   branch is forward of backward. This is no longer true for almost any
   X86 cpus (anything newer than `SnB`) so is no longer a meaningful
   concern.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D140931
2023-02-06 14:09:17 -06:00
Mikhail Goncharov
3dcbbddf16 Revert "Improve and enable folding of conditional branches with tail calls."
This reverts commit c05ddc9cbc12b1f2038380f57a16c4ca98c614b7.

Fails under asan:

https://lab.llvm.org/buildbot/#/builders/168/builds/11637

Failed Tests (3):
  LLVM :: CodeGen/X86/jump_sign.ll
  LLVM :: CodeGen/X86/or-branch.ll
  LLVM :: CodeGen/X86/tailcall-extract.ll
2023-02-01 16:09:35 +01:00
Noah Goldstein
c05ddc9cbc Improve and enable folding of conditional branches with tail calls.
Improve and enable folding of conditional branches with tail calls.

1. Make it so that conditional tail calls can be emitted even when
   there are multiple predecessors.

2. Don't guard the transformation behind -Os. The rationale for
   guarding it was static-prediction can be affected by whether the
   branch is forward of backward. This is no longer true for almost any
   X86 cpus (anything newer than `SnB`) so is no longer a meaningful
   concern.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D140931
2023-02-01 01:26:06 -06:00
Craig Topper
e72ca520bb [CodeGen] Remove uses of Register::isPhysicalRegister/isVirtualRegister. NFC
Use isPhysical/isVirtual methods.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D141715
2023-01-13 14:38:08 -08:00
tentzen
db6a979ae8 Revert "[Windows SEH]: HARDWARE EXCEPTION HANDLING (MSVC -EHa) - Part 2"
This reverts commit 1a949c871ab4a6b6d792849d3e8c0fa6958d27f5.
2022-12-02 02:44:18 -08:00
tentzen
1a949c871a [Windows SEH]: HARDWARE EXCEPTION HANDLING (MSVC -EHa) - Part 2
This patch is the Part-2 (BE LLVM) implementation of HW Exception handling.
Part-1 (FE Clang) was committed in 797ad701522988e212495285dade8efac41a24d4.

This new feature adds the support of Hardware Exception for Microsoft Windows
SEH (Structured Exception Handling).

Compiler options:
  For clang-cl.exe, the option is -EHa, the same as MSVC.
  For clang.exe, the extra option is -fasync-exceptions,
  plus -triple x86_64-windows -fexceptions and -fcxx-exceptions as usual.

NOTE:: Without the -EHa or -fasync-exceptions, this patch is a NO-DIFF change.

The rules for C code:
For C-code, one way (MSVC approach) to achieve SEH -EHa semantic is to follow three rules:
  First, no exception can move in or out of _try region., i.e., no "potential faulty
    instruction can be moved across _try boundary.
  Second, the order of exceptions for instructions 'directly' under a _try must be preserved
    (not applied to those in callees).
  Finally, global states (local/global/heap variables) that can be read outside of _try region
    must be updated in memory (not just in register) before the subsequent exception occurs.

The impact to C++ code:
  Although SEH is a feature for C code, -EHa does have a profound effect on C++
  side. When a C++ function (in the same compilation unit with option -EHa ) is
  called by a SEH C function, a hardware exception occurs in C++ code can also
  be handled properly by an upstream SEH _try-handler or a C++ catch(...).
  As such, when that happens in the middle of an object's life scope, the dtor
  must be invoked the same way as C++ Synchronous Exception during unwinding process.

Design:
A natural way to achieve the rules above in LLVM today is to allow an EH edge
added on memory/computation instruction (previous iload/istore idea) so that
exception path is modeled in Flow graph preciously. However, tracking every
single memory instruction and potential faulty instruction can create many
Invokes, complicate flow graph and possibly result in negative performance
impact for downstream optimization and code generation. Making all
optimizations be aware of the new semantic is also substantial.

This design does not intend to model exception path at instruction level.
Instead, the proposed design tracks and reports EH state at BLOCK-level to
reduce the complexity of flow graph and minimize the performance-impact on CPP
code under -EHa option.
One key element of this design is the ability to compute State number at
block-level. Our algorithm is based on the following rationales:

A _try scope is always a SEME (Single Entry Multiple Exits) region as jumping
into a _try is not allowed. The single entry must start with a seh_try_begin()
invoke with a correct State number that is the initial state of the SEME.
Through control-flow, state number is propagated into all blocks. Side exits
marked by seh_try_end() will unwind to parent state based on existing SEHUnwindMap[].
Note side exits can ONLY jump into parent scopes (lower state number).
Thus, when a block succeeds various states from its predecessors, the lowest
State triumphs others.  If some exits flow to unreachable, propagation on those
paths terminate, not affecting remaining blocks.
For CPP code, object lifetime region is usually a SEME as SEH _try.
However there is one rare exception: jumping into a lifetime that has Dtor but
has no Ctor is warned, but allowed:

Warning: jump bypasses variable with a non-trivial destructor

In that case, the region is actually a MEME (multiple entry multiple exits).
Our solution is to inject a eha_scope_begin() invoke in the side entry block to
ensure a correct State.
Implementation:
Part-1: Clang implementation (already in):
Please see commit 797ad701522988e212495285dade8efac41a24d4).

Part-2 : LLVM implementation described below.

For both C++ & C-code, the state of each block is computed at the same place in
BE (WinEHPreparing pass) where all other EH tables/maps are calculated.
In addition to _scope_begin & _scope_end, the computation of block state also
rely on the existing State tracking code (UnwindMap and InvokeStateMap).

For both C++ & C-code, the state of each block with potential trap instruction
is marked and reported in DAG Instruction Selection pass, the same place where
the state for -EHsc (synchronous exceptions) is done.
If the first instruction in a reported block scope can trap, a Nop is injected
before this instruction. This nop is needed to accommodate LLVM Windows EH
implementation, in which the address in IPToState table is offset by +1.
(note the purpose of that is to ensure the return address of a call is in the
same scope as the call address.

The handler for catch(...) for -EHa must handle HW exception. So it is
'adjective' flag is reset (it cannot be IsStdDotDot (0x40) that only catches
C++ exceptions).
Suppress push/popTerminate() scope (from noexcept/noTHrow) so that HW
exceptions can be passed through.

Original llvm-dev [RFC] discussions can be found in these two threads below:
https://lists.llvm.org/pipermail/llvm-dev/2020-March/140541.html
https://lists.llvm.org/pipermail/llvm-dev/2020-April/141338.html

Differential Revision: https://reviews.llvm.org/D102817/new/
2022-12-01 23:44:25 -08:00
Matt Arsenault
4cb722acbc BranchFolder: Require NoPHIs
The pass doesn't handle SSA and breaks any phis.
2022-06-01 21:14:49 -04:00
Matt Arsenault
7762a3ce18 Revert "BranchFolder: Assert on SSA functions"
This reverts commit 6ff91d17d66da46572e97f9a0b042182762cbe9e.
2022-04-27 19:02:15 -04:00
Matt Arsenault
6ff91d17d6 BranchFolder: Assert on SSA functions
We probably should have the opposite of getRequiredProperties for this
2022-04-27 18:51:37 -04:00
serge-sans-paille
989f1c72e0 Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169

after:  1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681
2022-03-16 08:43:00 +01:00
Nico Weber
a278250b0f Revert "Cleanup codegen includes"
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169
2022-03-10 07:59:22 -05:00
serge-sans-paille
7f230feeea Cleanup codegen includes
after:  1061034926
before: 1063332844

Differential Revision: https://reviews.llvm.org/D121169
2022-03-10 10:00:30 +01:00
Kazu Hirata
3aed282257 [CodeGen] Use range-based for loops (NFC) 2021-12-03 20:45:59 -08:00
Kazu Hirata
fd7d40640d [llvm] Use range-based for loops (NFC) 2021-11-28 18:14:49 -08:00
Kazu Hirata
48719e3b18 [CodeGen] Use make_early_inc_range (NFC) 2021-09-18 09:29:24 -07:00
Kazu Hirata
cfc7402419 [llvm] Use drop_begin (NFC) 2021-09-16 08:46:26 -07:00
Kazu Hirata
c9fca53af1 [CodeGen, Target] Use pred_empty and succ_empty (NFC) 2021-09-10 11:11:31 -07:00
Hongtao Yu
bd52495518 [CSSPGO] Undoing the concept of dangling pseudo probe
As a follow-up to https://reviews.llvm.org/D104129, I'm cleaning up the danling probe related code in both the compiler and llvm-profgen.

I'm seeing a 5% size win for the pseudo_probe section for SPEC2017 and 10% for Ciner. Certain benchmark such as 602.gcc has a 20% size win. No obvious difference seen on build time for SPEC2017 and Cinder.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D104477
2021-06-18 15:14:11 -07:00
Hongtao Yu
1812319292 [CSSPGO] Flip SkipPseudoOp to true for MIR APIs.
Flipping the default value of SkipPseudoOp to true for those MIR APIs to favor maximum performance. Note that certain spots like branch folding and MIR if-conversion is are disabled for better counts quality. For these two optimizations, this is a no-diff change.

The counts quality with SPEC2017 before/after this change is unchanged.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D100332
2021-04-19 17:55:34 -07:00
Hongtao Yu
8985515822 [CSSPGO] Unblocking optimizations by dangling pseudo probes.
This change fixes a couple places where the pseudo probe intrinsic blocks optimizations because they are not naturally removable. To unblock those optimizations, the blocking pseudo probes are moved out of the original blocks and tagged dangling, instead of allowing pseudo probes to be literally removed. The reason is that when the original block is removed, we won't be able to sample it. Instead of assigning it a zero weight, moving all its pseudo probes into another block and marking them dangling should allow the counts inference a chance to assign them a more reasonable weight. We have not seen counts quality degradation from our experiments.

The optimizations being unblocked are:

	1. Removing conditional probes for if-converted branches. Conditional probes are tagged dangling when their homing branch arms are folded so that they will not be over-counted.
	2. Unblocking jump threading from removing empty blocks. Pseudo probe prevents jump threading from removing logically empty blocks that only has one unconditional jump instructions.
	3. Unblocking SimplifyCFG and MIR tail duplicate to thread empty blocks and blocks with redundant branch checks.

Since dangling probes are logically deleted, they should not consume any samples in LTO postLink. This can be achieved by setting their distribution factors to zero when dangled.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D97481
2021-03-03 22:44:42 -08:00
Kazu Hirata
d61b4cb9d8 [CodeGen] Use range-based for loops (NFC) 2021-02-11 23:31:31 -08:00
Kazu Hirata
2082b10d10 [llvm] Use *::empty (NFC) 2021-01-16 09:40:55 -08:00
Fangrui Song
6913812abc Fix some clang-tidy bugprone-argument-comment issues 2020-09-19 20:41:25 -07:00
Simon Pilgrim
8a41a1f567 BranchFolding.cpp - removes includes already included by BranchFolding.h. NFC. 2020-08-13 12:14:31 +01:00
James Y Knight
4b0aa5724f Change the INLINEASM_BR MachineInstr to be a non-terminating instruction.
Before this instruction supported output values, it fit fairly
naturally as a terminator. However, being a terminator while also
supporting outputs causes some trouble, as the physreg->vreg COPY
operations cannot be in the same block.

Modeling it as a non-terminator allows it to be handled the same way
as invoke is handled already.

Most of the changes here were created by auditing all the existing
users of MachineBasicBlock::isEHPad() and
MachineBasicBlock::hasEHPadSuccessor(), and adding calls to
isInlineAsmBrIndirectTarget or mayHaveInlineAsmBr, as appropriate.

Reviewed By: nickdesaulniers, void

Differential Revision: https://reviews.llvm.org/D79794
2020-07-01 12:51:50 -04:00
Yuanfang Chen
78c69a00a4 [NFC] Clean up uses of MachineModuleInfoWrapperPass 2020-07-01 09:45:05 -07:00
Matt Arsenault
af1eeaf380 BranchFolding: Use Register 2020-06-30 12:13:08 -04:00
Zequan Wu
80e107ccd0 Add NoMerge MIFlag to avoid MIR branch folding
Let the codegen recognized the nomerge attribute and disable branch folding when the attribute is given

Differential Revision: https://reviews.llvm.org/D79537
2020-05-29 12:31:06 -07:00
Arthur Eubanks
fc937806ef Don't jump to landing pads in Control Flow Optimizer
Summary: Likely fixes https://bugs.llvm.org/show_bug.cgi?id=45858.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80047
2020-05-21 15:19:10 -07:00
James Y Knight
7af9d386da Correctly modify the CFG in IfConverter, and then remove the
CorrectExtraCFGEdges function.

The latter was a workaround for "Various pieces of code" leaving bogus
extra CFG edges in place. Where by "various" it meant only
IfConverter::MergeBlocks, which failed to clear all of the successors
of dead blocks it emptied out. This wouldn't matter a whole lot,
except that the dead blocks remained listed as predecessors of
still-useful blocks, inhibiting optimizations.

This fix slightly changed two thumb tests, because the correct CFG
successors allowed for the "diamond" if-conversion pattern to be
detected, when it could only use "simple" before.

Additionally, the removal of a now-redundant call to analyzeBranch
(with AllowModify=true) in BranchFolder::OptimizeFunction caused a
later check for an empty block in BranchFolder::OptimizeBlock to
fail. Correct this by moving the call to analyzeBranch in
OptimizeBlock higher.

Differential Revision: https://reviews.llvm.org/D79527
2020-05-07 18:17:07 -04:00
Vedant Kumar
10ce1bc8d0 [MachineBasicBlock] Add helpers for skipping debug instructions [1/14]
Summary:
These helpers are exercised by follow-up commits in this patch series,
which is all about removing CodeGen differences with vs. without debug
info in the AArch64 backend.

Reviewers: fhahn, aprantl, jpaquette, paquette

Subscribers: kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78260
2020-04-22 17:03:39 -07:00
Simon Pilgrim
6cb204eb64 BranchFolding.h - cleanup includes and forward declarations. NFC.
Push MBFIWrapper.h include down to BranchFolding.cpp/IfConversion.cpp
2020-04-20 15:59:39 +01:00
Djordje Todorovic
016d91ccbd [CallSiteInfo] Handle bundles when updating call site info
This will address the issue: P8198 and P8199 (from D73534).

The methods was not handle bundles properly.

Differential Revision: https://reviews.llvm.org/D74904
2020-02-27 13:57:06 +01:00
Djordje Todorovic
68908993eb [CSInfo] Use isCandidateForCallSiteEntry() when updating the CSInfo
Use the isCandidateForCallSiteEntry().
This should mostly be an NFC, but there are some parts ensuring
the moveCallSiteInfo() and copyCallSiteInfo() operate with call site
entry candidates (both Src and Dest should be the call site entry
candidates).

Differential Revision: https://reviews.llvm.org/D74122
2020-02-10 10:03:14 +01:00
Hiroshi Yamauchi
ac8da31a0f [PGO][PGSO] Handle MBFIWrapper
Some code gen passes use MBFIWrapper to keep track of the frequency of new
blocks. This was not taken into account and could lead to incorrect frequencies
as MBFI silently returns zero frequency for unknown/new blocks.

Add a variant for MBFIWrapper in the PGSO query interface.

Depends on D73494.
2020-01-31 09:36:55 -08:00
Hiroshi Yamauchi
2c03c899d5 [MBFI] Move BranchFolding::MBFIWrapper to its own files. NFC.
Summary:
To avoid header file circular dependency issues in passing updated MBFI (in
MBFIWrapper) to the interface of profile guided size optimizations.

A prep step for (and split off of) D73381.

Reviewers: davidxl

Subscribers: mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73494
2020-01-28 10:58:46 -08:00
Krzysztof Parzyszek
020041d99b Update spelling of {analyze,insert,remove}Branch in strings and comments
These names have been changed from CamelCase to camelCase, but there were
many places (comments mostly) that still used the old names.

This change is NFC.
2020-01-21 10:15:38 -06:00
Jay Foad
c5c935ab66 Make more use of MachineInstr::mayLoadOrStore. 2019-12-19 11:51:52 +00:00
Hiroshi Yamauchi
d9ae493937 [PGO][PGSO] Instrument the code gen / target passes.
Summary:
Split off of D67120.

Add the profile guided size optimization instrumentation / queries in the code
gen or target passes. This doesn't enable the size optimizations in those passes
yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass
queries).

A second try after reverted D71072.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71149
2019-12-09 12:42:59 -08:00
Hiroshi Yamauchi
2eb30fafa5 Revert "[PGO][PGSO] Instrument the code gen / target passes."
This reverts commit 9a0b5e14075a1f42a72eedb66fd4fde7985d37ac.

This seems to break buildbots.
2019-12-06 12:17:32 -08:00
Hiroshi Yamauchi
9a0b5e1407 [PGO][PGSO] Instrument the code gen / target passes.
Summary:
Split off of D67120.

Add the profile guided size optimization instrumentation / queries in the code
gen or target passes. This doesn't enable the size optimizations in those passes
yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass
queries).

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71072
2019-12-06 10:43:39 -08:00
Bjorn Pettersson
898de30291 [BranchFolding] Fix PR43964 about branch folder not being debug invariant
Summary:
The fix in BranchFolder related to non debug invariant problems
done in commit ec32dff0b075055 actually introduced some new
problems with debug invariance.

Before that patch ComputeCommonTailLength would move iterators
back, past debug instructions, in order to make ProfitableToMerge
make consistent answers "when one block differs from the other
only by whether debugging pseudos are present at the beginning".
But the changes in ec32dff0b075055 undid that by moving the iterators
forward again.

This patch refactors ComputeCommonTailLength. The function was
really complex, considering that the SkipTopCFIAndReturn part
always moved the iterators forward to the first "real" instruction
in the found tail after ec32dff0b075055.

The patch also restores the logic to "back past possible debugging
pseudos at beginning of block" to make sure ProfitableToMerge
gives consistent answers independent of DBG_VALUE instructions
before the tail. That is now done by ProfitableToMerge instead of
being hidden as a side-effect in ComputeCommonTailLength.

Reviewers: probinson, yechunliang, jmorse

Reviewed By: jmorse

Subscribers: Orlando, mehdi_amini, dexonsmith, aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70091
2019-11-21 18:13:32 +01:00
Reid Kleckner
05da2fe521 Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.

I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
  recompiles    touches affected_files  header
  342380        95      3604    llvm/include/llvm/ADT/STLExtras.h
  314730        234     1345    llvm/include/llvm/InitializePasses.h
  307036        118     2602    llvm/include/llvm/ADT/APInt.h
  213049        59      3611    llvm/include/llvm/Support/MathExtras.h
  170422        47      3626    llvm/include/llvm/Support/Compiler.h
  162225        45      3605    llvm/include/llvm/ADT/Optional.h
  158319        63      2513    llvm/include/llvm/ADT/Triple.h
  140322        39      3598    llvm/include/llvm/ADT/StringRef.h
  137647        59      2333    llvm/include/llvm/Support/Error.h
  131619        73      1803    llvm/include/llvm/Support/FileSystem.h

Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.

Reviewers: bkramer, asbirlea, bollu, jdoerfert

Differential Revision: https://reviews.llvm.org/D70211
2019-11-13 16:34:37 -08:00
Jeremy Morse
ec32dff0b0 [BranchFolding] skip debug instr to avoid code change
Use the existing helper function in BranchFolding, "countsAsInstruction",
to skip over non-instructions. Otherwise debug instructions can be
identified as the last real instruction in a block, leading to different
codegen decisions when debug is enabled as demonstrated by the test case.

Patch by: yechunliang (Chris Ye)!

Differential Revision: https://reviews.llvm.org/D66467
2019-10-29 11:45:38 +00:00
Nikola Prica
98603a8153 [DebugInfo][If-Converter] Update call site info during the optimization
During the If-Converter optimization pay attention when copying or
deleting call instructions in order to keep call site information in
valid state.

Reviewers: aprantl, vsk, efriedma

Reviewed By: vsk, efriedma

Differential Revision: https://reviews.llvm.org/D66955

llvm-svn: 374068
2019-10-08 15:43:12 +00:00