431 Commits

Author SHA1 Message Date
NAKAMURA Takumi
c1221251fb Restore CodeGen/MachineValueType.h from Support
This is rework of;

  - rG13e77db2df94 (r328395; MVT)

Since `LowLevelType.h` has been restored to `CodeGen`, `MachinveValueType.h`
can be restored as well.

Depends on D148767

Differential Revision: https://reviews.llvm.org/D149024
2023-05-03 00:13:20 +09:00
jacquesguan
50f2ce49e7 [MachineScheduler] Rename postprocessDAG to postProcessDAG. NFC
Rename postprocessDAG to camel case.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D146795
2023-03-30 10:47:20 +08:00
Francesco Petrogalli
557ea9867f [MISched] Dump the execution trace of the schedule.
The traces are printed only for bottom-up and top-down scheduling
because the values of TopReadyCycle and BottomReadyCycle are
inconsistent when obtained via bidirectional scheduling (see
`BIDIRECTIONAL` checks in the test).

Differential Revision: https://reviews.llvm.org/D142529
2023-01-26 17:54:55 +01:00
NAKAMURA Takumi
292019e931 MachineScheduler.cpp: Fixup D141707, suppress MISchedDumpReservedCycles conditionally.
It is used in `LLVM_ENABLE_DUMP` regardless of `NDEBUG`.
2023-01-14 10:04:23 +09:00
Craig Topper
e72ca520bb [CodeGen] Remove uses of Register::isPhysicalRegister/isVirtualRegister. NFC
Use isPhysical/isVirtual methods.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D141715
2023-01-13 14:38:08 -08:00
Francesco Petrogalli
c3c6d47c45 [CodeGen] Fix build failure due to missing declaration.
The failure was reported in https://github.com/llvm/llvm-project/issues/60011

FAILED: lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/MachineScheduler.cpp.o
"/build/llvm-toolchain-snapshot-16~++20230113111109+aba8983c9d86/build-llvm/./bin/clang++" -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I"/build/llvm-toolchain-snapshot-16~++20230113111109+aba8983c9d86/build-llvm/tools/clang/stage2-bins/lib/CodeGen" -I"/build/llvm-toolchain-snapshot-16~++20230113111109+aba8983c9d86/llvm/lib/CodeGen" -I"/build/llvm-toolchain-snapshot-16~++20230113111109+aba8983c9d86/build-llvm/tools/clang/stage2-bins/include" -I"/build/llvm-toolchain-snapshot-16~++20230113111109+aba8983c9d86/llvm/include" -fstack-protector-strong -Wformat -Werror=format-security -Wno-unused-command-line-argument -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -ffile-prefix-map=/build/llvm-toolchain-snapshot-16~++20230113111109+aba8983c9d86/build-llvm/tools/clang/stage2-bins=build-llvm/tools/clang/stage2-bins -ffile-prefix-map=/build/llvm-toolchain-snapshot-16~++20230113111109+aba8983c9d86/= -no-canonical-prefixes -O2 -DNDEBUG -g1  -fno-exceptions -std=c++17 -MD -MT lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/MachineScheduler.cpp.o -MF lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/MachineScheduler.cpp.o.d -o lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/MachineScheduler.cpp.o -c '/build/llvm-toolchain-snapshot-16~++20230113111109+aba8983c9d86/llvm/lib/CodeGen/MachineScheduler.cpp'
/build/llvm-toolchain-snapshot-16~++20230113111109+aba8983c9d86/llvm/lib/CodeGen/MachineScheduler.cpp:2639:7: error: use of undeclared identifier 'MISchedDumpReservedCycles'
  if (MISchedDumpReservedCycles)
      ^
1 error generated.

Fixes #60011

Differential Revision: https://reviews.llvm.org/D141707
2023-01-13 19:43:56 +01:00
Francesco Petrogalli
aba8983c9d Recommit [SchedBoundary] Add dump method for resource usage.
Summary:
As supporting information, I have added an example that describes how
the indexes of the vector of resources SchedBoundary::ReservedCycles
are tracked by the field SchedBoundary::ReservedCyclesIndex.

This has a minor rework of
b39a9a94f4
which was reverted in
df6ae1779f
becasue the llc invocation of the test was missing the argument
`-mtriple`.

See for example the failure at
https://lab.llvm.org/buildbot#builders/231/builds/7245 that reported
the following when targeting a non-aarch64 native build:

    'cortex-a55' is not a recognized processor for this target (ignoring processor)

Reviewers: jroelofs

Subscribers:

Differential Revision: https://reviews.llvm.org/D141367
2023-01-13 11:42:05 +01:00
Francesco Petrogalli
df6ae1779f Revert "[SchedBoundary] Add dump method for resource usage."
Reverting because of https://lab.llvm.org/buildbot#builders/16/builds/41860

When building on x86, I need to specify also -mtriple in the
invocation of llc otherwise the folllowing error shows up:

    'cortex-a55' is not a recognized processor for this target (ignoring processor)

This reverts commit b39a9a94f420a25a239ae03097c255900cbd660e.
2023-01-13 11:14:20 +01:00
Francesco Petrogalli
b39a9a94f4 [SchedBoundary] Add dump method for resource usage.
As supporting information, I have added an example that describes how
the indexes of the vector of resources SchedBoundary::ReservedCycles
are tracked by the field SchedBoundary::ReservedCyclesIndex.

Reviewed By: jroelofs

Differential Revision: https://reviews.llvm.org/D141367
2023-01-13 10:38:43 +01:00
Dmitry Vassiliev
adc387460d [CodeGen] Fixed undeclared MISchedCutoff in case of NDEBUG and LLVM_ENABLE_ABI_BREAKING_CHECKS
This patch fixes the error llvm/lib/CodeGen/MachineScheduler.cpp(755): error C2065: 'MISchedCutoff': undeclared identifier in case of NDEBUG and LLVM_ENABLE_ABI_BREAKING_CHECKS.
Note MISchedCutoff is declared under #ifndef NDEBUG.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130425
2022-07-30 18:24:50 +02:00
Kazu Hirata
9e6d1f4b5d [CodeGen] Qualify auto variables in for loops (NFC) 2022-07-17 01:33:28 -07:00
Jannik Silvanus
e5c4cde451 [AMDGPU] SIMachineScheduler: Add support for several MachineScheduler features
The SI machine scheduler inherits from ScheduleDAGMI.
This patch adds support for a few features that are implemented
in ScheduleDAGMI (or its base classes) that were missing so far
because their support is implemented in overridden functions.

* Support cl::opt -view-misched-dags
  This option allows to open a graphical window of the scheduling DAG.

* Support cl::opt -misched-print-dags
  This option allows to print the scheduling DAG in text form.

* After constructing the scheduling DAG, call postprocessDAG()
  to apply any registered DAG mutations.
  Note that currently there are no mutations defined in AMDGPUTargetMachine.cpp
  in case SIScheduler is used.
  Still add this to avoid surprises in the future in case mutations are added.

Differential Revision: https://reviews.llvm.org/D128808
2022-07-14 09:45:31 +02:00
Daniil Kovalev
c53cbce45e [CodeGen] Define ABI breaking class members correctly
Non-static class members declared under #ifndef NDEBUG should be declared
under #if LLVM_ENABLE_ABI_BREAKING_CHECKS to make headers library-friendly and
allow cross-linking, as discussed in D120714.

Differential Revision: https://reviews.llvm.org/D121549
2022-03-24 12:42:59 +03:00
serge-sans-paille
989f1c72e0 Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169

after:  1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681
2022-03-16 08:43:00 +01:00
Nico Weber
a278250b0f Revert "Cleanup codegen includes"
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169
2022-03-10 07:59:22 -05:00
serge-sans-paille
7f230feeea Cleanup codegen includes
after:  1061034926
before: 1063332844

Differential Revision: https://reviews.llvm.org/D121169
2022-03-10 10:00:30 +01:00
Vang Thao
570471199b [AMDGPU] Fix debug values in scheduler not placed correctly when reverting
Debug position data is cleared after ScheduleDAGMILive::schedule() due to it also calling placeDebugValues(). Make it so the data is not cleared after initial call to placeDebugValues since we will call it again after reverting a schedule.

Secondly, since we skip debug instructions when reverting the schedule on AMDGPU, all debug instructions are now moved to the end of the scheduling region. RegionEnd points to the beginning of this chunk of debug instructions since it was not incremented when a debug instruction was skipped. RegionBegin may also point to the same debug instruction if Unsched.front() is a debug instruction thus shrinking the region to 1. Fix RegionBegin and RegionEnd so that they point to the current beginning and ending before calling placeDebugValues() since both vars will be used as reference points to move debug instructions back.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D119022
2022-02-07 11:01:13 -08:00
James Nagurne
cc3bb85580 [llvm][Hexagon] Generalize VLIWResourceModel, VLIWMachineScheduler, and ConvergingVLIWScheduler
The Pre-RA VLIWMachineScheduler used by Hexagon is a relatively generic
implementation that would make sense to use on other VLIW targets.

This commit lifts those classes into their own header/source file with the
root VLIWMachineScheduler. I chose this path rather than adding the
strategy et al. into MachineScheduler to avoid bloating the file with other
implementations.

Target-specific behaviors have been captured and replicated through
function overloads.

- Added an overloadable DFAPacketizer creation member function. This is
  mainly done for our downstream, which has the capability to override
  the DFAPacketizer with custom implementations. This is an upstreamable
  TODO on our end. Currently, it always returns the result of
  TargetInstrInfo::CreateTargetScheduleState
- Added an extra helper which returns the number of instructions in the
  current packet. This is used in our downstream, and may be useful
  elsewhere.
- Placed the priority heuristic values into the ConvergingVLIWscheduler
  class instead of defining them as local statics in the implementation
- Added a overridable helper in ConvergingVLIWScheduler so that targets
  can create their own VLIWResourceModel

Differential Revision: https://reviews.llvm.org/D113150
2021-12-06 16:23:48 -06:00
Kazu Hirata
ca2f53897a [CodeGen] Use range-based for loops (NFC) 2021-12-04 08:48:05 -08:00
Jay Foad
985eb25546 [MachineScheduler] Fix tracing
Consistently print a newline before "RegionInstrs:".
2021-08-26 09:27:01 +01:00
Qiu Chaofan
07f0faed11 [NFC][Scheduler] Refactor tryCandidate to return boolean
This patch changes return type of tryCandidate from void to bool:

1. Methods in some targets already follow this convention.
2. This would help if some target wants to re-use generic code.
3. It looks more intuitive if these try-method returns the same type.

We may need to change return type of them from bool to some enum
further, to make it less confusing.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D103951
2021-07-01 14:31:47 +08:00
Hongtao Yu
b98807df05 [CSSPGO] Exclude pseudo probes from slot index
Pseudo probe are currently given a slot index like other regular instructions. This affects register pressure and lifetime weight computation because of enlarged lifetime length with pseudo probe instructions. As a consequence, program could get different code generated w/ and w/o pseudo probes. I'm closing the gap by excluding pseudo probes from stack index and downstream register allocation related passes.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D100334
2021-04-19 17:55:35 -07:00
David Penry
ca8eef7e3d [CodeGen] Use ProcResGroup information in SchedBoundary
When the ProcResGroup has BufferSize=0,

1. if there is a subunit in the list of write resources for the
   scheduling class, do not attempt to schedule the ProcResGroup.
2. if there is not a subunit in the list of write resources for the
   scheduling class, choose a subunit to use instead of the ProcResGroup.
3. having both the ProcResGroup and any of its subunits in the resources
   implied by a InstRW is not supported.

Used to model parallel uses from a pool of resources.

Differential Revision: https://reviews.llvm.org/D98976
2021-04-19 21:27:45 +01:00
Kazu Hirata
3279943adf [CodeGen] Use range-based for loops (NFC) 2021-02-16 23:23:08 -08:00
Bardia Mahjour
6eff12788e [DDG] Data Dependence Graph - DOT printer - recommit
This is being recommitted to try and address the MSVC complaint.

This patch implements a DDG printer pass that generates a graph in
the DOT description language, providing a more visually appealing
representation of the DDG. Similar to the CFG DOT printer, this
functionality is provided under an option called -dot-ddg and can
be generated in a less verbose mode under -dot-ddg-only option.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D90159
2020-12-16 12:37:36 -05:00
Bardia Mahjour
a29ecca781 Revert "[DDG] Data Dependence Graph - DOT printer"
This reverts commit fd4a10732c8bd646ccc621c0a9af512be252f33a, to
investigate the failure on windows: http://lab.llvm.org:8011/#/builders/127/builds/3274
2020-12-14 16:54:20 -05:00
Bardia Mahjour
fd4a10732c [DDG] Data Dependence Graph - DOT printer
This patch implements a DDG printer pass that generates a graph in
the DOT description language, providing a more visually appealing
representation of the DDG. Similar to the CFG DOT printer, this
functionality is provided under an option called -dot-ddg and can
be generated in a less verbose mode under -dot-ddg-only option.

Differential Revision: https://reviews.llvm.org/D90159
2020-12-14 16:41:14 -05:00
Jon Roelofs
a461e76b6f [MachineScheduler] Inform pass infra of post-ra scheduler's dependencies
Differential Revision: https://reviews.llvm.org/D91561
2020-11-17 10:56:12 -08:00
QingShan Zhang
1d178d600a [Scheduling] Fall back to the fast cluster algorithm if the DAG is too complex
We have added a new load/store cluster algorithm in D85517. However, AArch64 see
some compiling deg with the new algorithm as the IsReachable() is not cheap if
the DAG is complex. O(M+N) See https://bugs.llvm.org/show_bug.cgi?id=47966
So, this patch added a heuristic to switch to old cluster algorithm if the DAG is too complex.

Reviewed By: Owen Anderson

Differential Revision: https://reviews.llvm.org/D90144
2020-11-02 02:11:52 +00:00
Mircea Trofin
f0a98ad820 [NFC] Use Register in RegisterPressure APIs
Some related changes as well.

Differential Revision: https://reviews.llvm.org/D90268
2020-10-28 12:14:08 -07:00
Jon Roelofs
db80cc397e [CodeGen][MachineSched] Fixup function name typo. NFC 2020-10-05 12:43:50 -07:00
Alexander Belyaev
17dc729bd4 Revert "[NFC][ScheduleDAG] Remove unused EntrySU SUnit"
This reverts commit 0345d88de654259ae90494bf9b015416e2cccacb.

Google internal backend uses EntrySU, we are looking into removing
dependency on it.

Differential Revision: https://reviews.llvm.org/D88018
2020-09-21 13:33:05 +02:00
Francis Visoiu Mistrih
0345d88de6 [NFC][ScheduleDAG] Remove unused EntrySU SUnit
EntrySU doesn't seem to be used at all when building the ScheduleDAG.

Differential Revision: https://reviews.llvm.org/D87867
2020-09-18 09:50:47 -07:00
QingShan Zhang
ebf3b188c6 [Scheduling] Implement a new way to cluster loads/stores
Before calling target hook to determine if two loads/stores are clusterable,
we put them into different groups to avoid fake cluster due to dependency.
For now, we are putting the loads/stores into the same group if they have
the same predecessor. We assume that, if two loads/stores have the same
predecessor, it is likely that, they didn't have dependency for each other.

However, one SUnit might have several predecessors and for now, we just
pick up the first predecessor that has non-data/non-artificial dependency,
which is too arbitrary. And we are struggling to fix it.

So, I am proposing some better implementation.
1. Collect all the loads/stores that has memory info first to reduce the complexity.
2. Sort these loads/stores so that we can stop the seeking as early as possible.
3. For each load/store, seeking for the first non-dependency instruction with the
   sorted order, and check if they can cluster or not.

Reviewed By: Jay Foad

Differential Revision: https://reviews.llvm.org/D85517
2020-08-26 12:33:59 +00:00
QingShan Zhang
2b2bfdb474 [NFC] Add the stats for load/store cluster
We have the stats for MacroFusion but miss it for load/store cluster.
2020-08-07 07:09:48 +00:00
QingShan Zhang
3359ea62ed [Scheduling] Create the missing dependency edges for store cluster
If it is load cluster, we don't need to create the dependency edges(SUb->reg) from SUb to SUa
as they both depend on the base register "reg"

     +-------+
+---->  reg  |
|    +---+---+
|        ^
|        |
|        |
|        |
|    +---+---+
|    |  SUa  |  Load 0(reg)
|    +---+---+
|        ^
|        |
|        |
|    +---+---+
+----+  SUb  |  Load 4(reg)
     +-------+

But if it is store cluster, we need to create it as follow shows to avoid the instruction store
depend on scheduled in-between SUb and SUa.

     +-------+
+---->  reg  |
|    +---+---+
|        ^
|        |         Missing       +-------+
|        | +-------------------->+   y   |
|        | |                     +---+---+
|    +---+-+-+                       ^
|    |  SUa  |  Store x 0(reg)       |
|    +---+---+                       |
|        ^                           |
|        |  +------------------------+
|        |  |
|    +---+--++
+----+  SUb  |  Store y 4(reg)
     +-------+

Reviewed By: evandro, arsenm, rampitec, foad, fhahn

Differential Revision: https://reviews.llvm.org/D72031
2020-08-07 04:58:03 +00:00
Jon Roelofs
7f1556f292 Fix typo: s/epomymous/eponymous/ NFC 2020-08-03 14:09:46 -06:00
QingShan Zhang
a6e9f5264c [Scheduling] Improve group algorithm for store cluster
Store Addr and Store Addr+8 are clusterable pair. They have memory(ctrl) dependency on different loads.
Current implementation will put these two stores into different group and miss to cluster them.

Reviewed By: evandro

Differential Revision: https://reviews.llvm.org/D84139
2020-07-27 02:02:40 +00:00
Jay Foad
62fd7f767c [MachineScheduler] Fix the TopDepth/BotHeightReduce latency heuristics
tryLatency compares two sched candidates. For the top zone it prefers
the one with lesser depth, but only if that depth is greater than the
total latency of the instructions we've already scheduled -- otherwise
its latency would be hidden and there would be no stall.

Unfortunately it only tests the depth of one of the candidates. This can
lead to situations where the TopDepthReduce heuristic does not kick in,
but a lower priority heuristic chooses the other candidate, whose depth
*is* greater than the already scheduled latency, which causes a stall.

The fix is to apply the heuristic if the depth of *either* candidate is
greater than the already scheduled latency.

All this also applies to the BotHeightReduce heuristic in the bottom
zone.

Differential Revision: https://reviews.llvm.org/D72392
2020-07-17 11:02:13 +01:00
hsmahesha
5832950adb [AMDGPU/MemOpsCluster] Compute width for MIMG instruction class.
Summary:
`width` computation is missing for newly added `MIMG`
instruction class. Add it.

Reviewers: foad, rampitec, arsenm

Reviewed By: foad

Subscribers: MatzeB, javed.absar, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81649
2020-06-23 17:32:17 +05:30
David Green
2fea3fe41c [MachineScheduler] Update available queue on the first mop of a new cycle
If a resource can be held for multiple cycles in the schedule model
then an instruction can be placed into the available queue, another
instruction can be scheduled, but the first will not be taken back out if
the two instructions hazard. To fix this make sure that we update the
available queue even on the first MOp of a cycle, pushing available
instructions back into the pending queue if they now conflict.

This happens with some downstream schedules we have around MVE
instruction scheduling where we use ResourceCycles=[2] to show the
instruction executing over two beats. Apparently the test changes here
are OK too.

Differential Revision: https://reviews.llvm.org/D76909
2020-06-09 19:13:53 +01:00
hsmahesha
0ed2c04636 [AMDGPU/MemOpsCluster] Let mem ops clustering logic also consider number of clustered bytes
Summary:
While clustering mem ops, AMDGPU target needs to consider number of clustered bytes
to decide on max number of mem ops that can be clustered. This patch adds support to pass
number of clustered bytes to target mem ops clustering logic.

Reviewers: foad, rampitec, arsenm, vpykhtin, javedabsar

Reviewed By: foad

Subscribers: MatzeB, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80545
2020-06-01 22:52:34 +05:30
hsmahesha
09f7dcb64e [AMDGPU/MemOpsCluster] Code clean-up around mem ops clustering logic
Summary:
Clean-up code around mem ops clustering logic. This patch cleans up code within
the function clusterNeighboringMemOps(). It is WIP, and this patch is a first cut.

Reviewers: foad, rampitec, arsenm, vpykhtin, javedabsar

Reviewed By: foad

Subscribers: MatzeB, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80119
2020-05-26 15:49:21 +05:30
Mark Lacey
328bb446dd Add a policy to enable computing SchedDFSResult.
Summary:
Make GenericScheduler compute SchedDFSResult on initialization if
the policy is set. This makes it possible to create classes
that extend GenericScheduler and rely on the results of SchedDFSResult,
e.g. to perform subtree scheduling.

NFC unless the policy is set.

Subscribers: MatzeB, hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78432
2020-04-22 16:36:11 -07:00
Sander de Smalen
8fbc925807 Add OffsetIsScalable to getMemOperandWithOffset
Summary:
Making `Scale` a `TypeSize` in AArch64InstrInfo::getMemOpInfo,
has the effect that all places where this information is used
(notably, TargetInstrInfo::getMemOperandWithOffset) will need
to consider Scale - and derived, Offset - possibly being scalable.

This patch adds a new operand `bool &OffsetIsScalable` to
TargetInstrInfo::getMemOperandWithOffset and fixes up all
the places where this function is used, to consider the
offset possibly being scalable.

In most cases, this means bailing out because the algorithm does not
(or cannot) support scalable offsets in places where it does some
form of alias checking for example.

Reviewers: rovka, efriedma, kristof.beyls

Reviewed By: efriedma

Subscribers: wuzish, kerbowa, MatzeB, arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, javed.absar, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72758
2020-02-18 15:53:29 +00:00
Jay Foad
0d7bd34312 [MachineScheduler] Ignore artificial edges when forming store chains
Summary:
BaseMemOpClusterMutation::apply forms store chains by looking for
control (i.e. non-data) dependencies from one mem op to another.

In the test case, clusterNeighboringMemOps successfully clusters the
loads, and then adds artificial edges to the loads' successors as
described in the comment:
      // Copy successor edges from SUa to SUb. Interleaving computation
      // dependent on SUa can prevent load combining due to register reuse.
The effect of this is that *data* dependencies from one load to a store
are copied as *artificial* dependencies from a different load to the
same store.

Then when BaseMemOpClusterMutation::apply looks at the stores, it finds
that some of them have a control dependency on a previous load, which
breaks the chains and means that the stores are not all considered part
of the same chain and won't all be clustered.

The fix is to only consider non-artificial control dependencies when
forming chains.

Subscribers: MatzeB, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71717
2020-01-29 16:23:01 +00:00
Benjamin Kramer
adcd026838 Make llvm::StringRef to std::string conversions explicit.
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.

This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.

This doesn't actually modify StringRef yet, I'll do that in a follow-up.
2020-01-28 23:25:25 +01:00
Stanislav Mekhanoshin
be8e38cbd9 Correct NumLoads in clustering
Scheduler sends NumLoads argument into shouldClusterMemOps()
one less the actual cluster length. So for 2 instructions
it will pass just 1. Correct this number.

This is NFC for in tree targets.

Differential Revision: https://reviews.llvm.org/D73292
2020-01-24 12:45:28 -08:00
Jay Foad
e0f0d0e55c [MachineScheduler] Allow clustering mem ops with complex addresses
The generic BaseMemOpClusterMutation calls into TargetInstrInfo to
analyze the address of each load/store instruction, and again to decide
whether two instructions should be clustered. Previously this had to
represent each address as a single base operand plus a constant byte
offset. This patch extends it to support any number of base operands.

The old target hook getMemOperandWithOffset is now a convenience
function for callers that are only prepared to handle a single base
operand. It calls the new more general target hook
getMemOperandsWithOffset.

The only requirements for the base operands returned by
getMemOperandsWithOffset are:
- they can be sorted by MemOpInfo::Compare, such that clusterable ops
  get sorted next to each other, and
- shouldClusterMemOps knows what they mean.

One simple follow-on is to enable clustering of AMDGPU FLAT instructions
with both vaddr and saddr (base register + offset register). I've left
a FIXME in the code for this case.

Differential Revision: https://reviews.llvm.org/D71655
2020-01-22 14:28:24 +00:00
Jinsong Ji
c65ac2ba78 [MachineScheduler][NFC] Don't swap when we can't cluster
https://reviews.llvm.org/D72706 tried to reduce reordering due to mem op
clustering. This patch avoid doing the swap when we can't cluster.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D72800
2020-01-15 21:55:31 +00:00