182 Commits

Author SHA1 Message Date
Andrew Trick
880e573d98 MI-Sched: handle latency of in-order operations with the new machine model.
The per-operand machine model allows the target to define "unbuffered"
processor resources. This change is a quick, cheap way to model stalls
caused by the latency of operations that use such resources. This only
applies when the processor's micro-op buffer size is non-zero
(Out-of-Order). We can't precisely model in-order stalls during
out-of-order execution, but this is an easy and effective
heuristic. It benefits cortex-a9 scheduling when using the new
machine model, which is not yet on by default.

MI-Sched for armv7 was evaluated on Swift (and only not enabled because
of a performance bug related to predication). However, we never
evaluated Cortex-A9 performance on MI-Sched in its current form. This
change adds MI-Sched functionality to reach performance goals on
A9. The only remaining change is to allow MI-Sched to run as a PostRA
pass.

I evaluated performance using a set of options to estimate the performance impact once MI sched is default on armv7:
-mcpu=cortex-a9 -disable-post-ra -misched-bench -scheditins=false

For a simple saxpy loop I see a 1.7x speedup. Here are the llvm-testsuite results:
(min run time over 2 runs, filtering tiny changes)

Speedups:
| Benchmarks/BenchmarkGame/recursive         |  52.39% |
| Benchmarks/VersaBench/beamformer           |  20.80% |
| Benchmarks/Misc/pi                         |  19.97% |
| Benchmarks/Misc/mandel-2                   |  19.95% |
| SPEC/CFP2000/188.ammp                      |  18.72% |
| Benchmarks/McCat/08-main/main              |  18.58% |
| Benchmarks/Misc-C++/Large/sphereflake      |  18.46% |
| Benchmarks/Olden/power                     |  17.11% |
| Benchmarks/Misc-C++/mandel-text            |  16.47% |
| Benchmarks/Misc/oourafft                   |  15.94% |
| Benchmarks/Misc/flops-7                    |  14.99% |
| Benchmarks/FreeBench/distray               |  14.26% |
| SPEC/CFP2006/470.lbm                       |  14.00% |
| mediabench/mpeg2/mpeg2dec/mpeg2decode      |  12.28% |
| Benchmarks/SmallPT/smallpt                 |  10.36% |
| Benchmarks/Misc-C++/Large/ray              |   8.97% |
| Benchmarks/Misc/fp-convert                 |   8.75% |
| Benchmarks/Olden/perimeter                 |   7.10% |
| Benchmarks/Bullet/bullet                   |   7.03% |
| Benchmarks/Misc/mandel                     |   6.75% |
| Benchmarks/Olden/voronoi                   |   6.26% |
| Benchmarks/Misc/flops-8                    |   5.77% |
| Benchmarks/Misc/matmul_f64_4x4             |   5.19% |
| Benchmarks/MiBench/security-rijndael       |   5.15% |
| Benchmarks/Misc/flops-6                    |   5.10% |
| Benchmarks/Olden/tsp                       |   4.46% |
| Benchmarks/MiBench/consumer-lame           |   4.28% |
| Benchmarks/Misc/flops-5                    |   4.27% |
| Benchmarks/mafft/pairlocalalign            |   4.19% |
| Benchmarks/Misc/himenobmtxpa               |   4.07% |
| Benchmarks/Misc/lowercase                  |   4.06% |
| SPEC/CFP2006/433.milc                      |   3.99% |
| Benchmarks/tramp3d-v4                      |   3.79% |
| Benchmarks/FreeBench/pifft                 |   3.66% |
| Benchmarks/Ptrdist/ks                      |   3.21% |
| Benchmarks/Adobe-C++/loop_unroll           |   3.12% |
| SPEC/CINT2000/175.vpr                      |   3.12% |
| Benchmarks/nbench                          |   2.98% |
| SPEC/CFP2000/183.equake                    |   2.91% |
| Benchmarks/Misc/perlin                     |   2.85% |
| Benchmarks/Misc/flops-1                    |   2.82% |
| Benchmarks/Misc-C++-EH/spirit              |   2.80% |
| Benchmarks/Misc/flops-2                    |   2.77% |
| Benchmarks/NPB-serial/is                   |   2.42% |
| Benchmarks/ASC_Sequoia/CrystalMk           |   2.33% |
| Benchmarks/BenchmarkGame/n-body            |   2.28% |
| Benchmarks/SciMark2-C/scimark2             |   2.27% |
| Benchmarks/Olden/bh                        |   2.03% |
| skidmarks10/skidmarks                      |   1.81% |
| Benchmarks/Misc/flops                      |   1.72% |

Slowdowns:
| Benchmarks/llubenchmark/llu                | -14.14% |
| Benchmarks/Polybench/stencils/seidel-2d    |  -5.67% |
| Benchmarks/Adobe-C++/functionobjects       |  -5.25% |
| Benchmarks/Misc-C++/oopack_v1p8            |  -5.00% |
| Benchmarks/Shootout/hash                   |  -2.35% |
| Benchmarks/Prolangs-C++/ocean              |  -2.01% |
| Benchmarks/Polybench/medley/floyd-warshall |  -1.98% |
| Polybench/linear-algebra/kernels/3mm       |  -1.95% |
| Benchmarks/McCat/09-vor/vor                |  -1.68% |

llvm-svn: 196516
2013-12-05 17:55:58 +00:00
Andrew Trick
bb1247b9f0 comment typo and reformat
llvm-svn: 196513
2013-12-05 17:55:47 +00:00
Juergen Ributzka
d12ccbd343 [weak vtables] Remove a bunch of weak vtables
This patch removes most of the trivial cases of weak vtables by pinning them to
a single object file. The memory leaks in this version have been fixed. Thanks
Alexey for pointing them out.

Differential Revision: http://llvm-reviews.chandlerc.com/D2068

Reviewed by Andy

llvm-svn: 195064
2013-11-19 00:57:56 +00:00
Alexey Samsonov
49109a279c Revert r194865 and r194874.
This change is incorrect. If you delete virtual destructor of both a base class
and a subclass, then the following code:
  Base *foo = new Child();
  delete foo;
will not cause the destructor for members of Child class. As a result, I observe
plently of memory leaks. Notable examples I investigated are:
ObjectBuffer and ObjectBufferStream, AttributeImpl and StringSAttributeImpl.

llvm-svn: 194997
2013-11-18 09:31:53 +00:00
Juergen Ributzka
dbedae89b9 [weak vtables] Remove a bunch of weak vtables
This patch removes most of the trivial cases of weak vtables by pinning them to
a single object file.

Differential Revision: http://llvm-reviews.chandlerc.com/D2068

Reviewed by Andy

llvm-svn: 194865
2013-11-15 22:34:48 +00:00
Matthias Braun
88dd0abd2d Pass LiveQueryResult by value
This makes the API a bit more natural to use and makes it easier to make
LiveRanges implementation details private.

llvm-svn: 192394
2013-10-10 21:28:52 +00:00
Andrew Trick
dc4c1adfc7 Comment typo.
llvm-svn: 191312
2013-09-24 17:11:19 +00:00
Andrew Trick
978674b2bc Allow subtarget selection of the default MachineScheduler and document the interface.
The global registry is used to allow command line override of the
scheduler selection, but does not work well as the normal selection
API. For example, the same LLVM process should be able to target
multiple targets or subtargets.

llvm-svn: 191071
2013-09-20 05:14:41 +00:00
Andrew Trick
665d3ec3d3 Rename ConvergingScheduler to GenericScheduler.
This was an experimental scheduler a year ago. It's now used by
several subtargets, both in-order and out-of-order, and it
is about to be enabled by default for x86 and armv7. It will be the
new GenericScheduler for subtargets that don't provide their own
SchedulingStrategy.

llvm-svn: 191051
2013-09-19 23:10:59 +00:00
Andrew Trick
6c88b35090 Enable -misched-cyclicpath by default.
llvm-svn: 190367
2013-09-09 23:31:14 +00:00
Andrew Trick
e1f7bf2c02 mi-sched: smooth out the cyclicpath heuristic.
Arnold's idea.

I generally try to avoid stateful heuristics because it can make
debugging harder. However, we need a way to prevent the latency
priority from dominating, and it somewhat makes sense to schedule
aggressively for latency only within an issue group.

Swift in particular likes this, and it doesn't hurt anyone else:
| Benchmarks/MiBench/consumer-lame              |  10.39% |
| Benchmarks/Misc/himenobmtxpa                  |   9.63% |

llvm-svn: 190360
2013-09-09 22:28:08 +00:00
Andrew Trick
b248b4a1de mi-sched: cleanup register pressure update, remove a FIXME.
llvm-svn: 190181
2013-09-06 17:32:47 +00:00
Andrew Trick
c573cd905a mi-sched: improve regpressure tracing.
llvm-svn: 190180
2013-09-06 17:32:44 +00:00
Andrew Trick
7609b7d1b5 mi-sched: print tree size in -view-misched-dags
llvm-svn: 190179
2013-09-06 17:32:42 +00:00
Andrew Trick
ffdbefb90c mi-sched: register pressure update tracing.
llvm-svn: 190178
2013-09-06 17:32:39 +00:00
Andrew Trick
ddffae9027 mi-sched: Reorder Cyclicpath (latency) and CriticalMax (pressure) heuristics.
The latency based scheduling could induce spills in some cases.

llvm-svn: 190177
2013-09-06 17:32:36 +00:00
Andrew Trick
75e411cc8e Added MachineSchedPolicy.
Allow subtargets to customize the generic scheduling strategy.
This is convenient for targets that don't need to add new heuristics
by specializing the strategy.

llvm-svn: 190176
2013-09-06 17:32:34 +00:00
Andrew Trick
ed20075d19 mi-sched: Force bottom up scheduling for generic targets.
Fast register pressure tracking currently only takes effect during
bottom up scheduling. Forcing this is a bit faster and simpler for
targets that don't have many scheduling constraints and don't need
top-down scheduling.

llvm-svn: 190014
2013-09-04 23:54:00 +00:00
Andrew Trick
b05db8e0b9 comment typo
llvm-svn: 189997
2013-09-04 21:12:05 +00:00
Andrew Trick
2a749ee0b9 Remove dead subtree limit code.
llvm-svn: 189995
2013-09-04 21:00:20 +00:00
Andrew Trick
856ecd9ab3 -view-misched-dags, better pruning.
llvm-svn: 189994
2013-09-04 21:00:18 +00:00
Andrew Trick
ef54c59490 mi-sched: DEBUG cleanup, call tracePick for unidirectional scheduling.
llvm-svn: 189993
2013-09-04 21:00:16 +00:00
Andrew Trick
1ab16d9ecf 80 columns
llvm-svn: 189992
2013-09-04 21:00:13 +00:00
Andrew Trick
66c3dfbf8c mi-sched: Suppress register pressure tracking when the scheduling window is too small.
If the instruction window is < NumRegs/2, pressure tracking is not
likely to be effective. The scheduler has to process a very large
number of tiny blocks. We want this to be fast.

llvm-svn: 189991
2013-09-04 21:00:11 +00:00
Andrew Trick
a6e877707f mi-sched: Load clustering is a bit to expensive to enable unconditionally.
llvm-svn: 189990
2013-09-04 21:00:08 +00:00
Andrew Trick
8c699c93b2 mi-sched: Reuse an invalid HazardRecognizer to save compile time.
llvm-svn: 189989
2013-09-04 21:00:05 +00:00
Andrew Trick
310190e21f mi-sched: bypass heuristic checks when regpressure tracking is disabled.
llvm-svn: 189988
2013-09-04 21:00:02 +00:00
Andrew Trick
b6e74712b6 Added -misched-regpressure option.
Register pressure tracking is half the complexity of the
scheduler. It's useful to be able to turn it off for compile time and
performance comparisons.

llvm-svn: 189987
2013-09-04 20:59:59 +00:00
Andrew Trick
2c4f8b7ee8 Fix my previous checkin to updatePressureDiffs.
There was one case that we could hit a DebugValue where I didn't think
to check. DebugValues are evil. No checkinable test case, sorry. It's
an obvious fix.

llvm-svn: 189717
2013-08-31 05:17:58 +00:00
Andrew Trick
2bc74c2887 mi-sched: update PressureDiffs on-the-fly for liveness.
This removes all expensive pressure tracking logic from the scheduling
critical path of node comparison.

llvm-svn: 189643
2013-08-30 04:36:57 +00:00
Andrew Trick
b1a45b6c61 mi-sched: improve the generic register pressure comparison.
Only compare pressure within the same set. When multiple sets are
affected, we prioritize the most constrained set.

llvm-svn: 189641
2013-08-30 04:27:29 +00:00
Andrew Trick
1a8313458f mi-sched: Precompute a PressureDiff for each instruction, adjust for liveness later.
Created SUPressureDiffs array to hold the per node PDiff computed during DAG building.

Added a getUpwardPressureDelta API that will soon replace the old
one. Compute PressureDelta here from the precomputed PressureDiffs.

Updating for liveness will come next.

llvm-svn: 189640
2013-08-30 03:49:48 +00:00
Andrew Trick
ef80f50058 comment typo
llvm-svn: 189635
2013-08-30 02:02:12 +00:00
Andrew Trick
483f4199f3 Comment and revise the cyclic critical path code.
This should be much more clear now. It's still disabled pending testing.

llvm-svn: 189597
2013-08-29 18:04:49 +00:00
Andrew Trick
c01b00400d Adds cyclic critical path computation and heuristics, temporarily disabled.
Estimate the cyclic critical path within a single block loop. If the
acyclic critical path is longer, then the loop will exhaust OOO
resources after some number of iterations. If lag between the acyclic
critical path and cyclic critical path is longer the the time it takes
to issue those loop iterations, then aggressively schedule for
latency.

llvm-svn: 189120
2013-08-23 17:48:43 +00:00
Andrew Trick
a53e101627 mi-sched: Don't call MBB.size() in initSUnits. The driver already has instr count.
This fixes a pathological compile time problem with very large blocks
and lots of scheduling boundaries.

llvm-svn: 189116
2013-08-23 17:48:33 +00:00
Andrew Trick
2f7667e018 Confusing comment typo.
llvm-svn: 187895
2013-08-07 17:20:32 +00:00
Andrew Trick
9c17eab761 MI Sched: Track live-thru registers.
When registers must be live throughout the scheduling region, increase
the limit for the register class. Once we exceed the original limit,
they will be spilled, and there's no point further reducing pressure.

This isn't a perfect heuristics but avoids a situation where the
scheduler could become trapped by trying to achieve the impossible.

llvm-svn: 187436
2013-07-30 19:59:12 +00:00
Andrew Trick
d9761776bc MI Sched fix: assert "Disconnected LRG within the scheduling region."
llvm-svn: 187435
2013-07-30 19:59:08 +00:00
Andrew Trick
401b6959ae MI Sched: Register pressure heuristics.
Consider which set is being increased or decreased before comparing.

llvm-svn: 187110
2013-07-25 07:26:35 +00:00
Andrew Trick
9706496b0d Dump LIS before regalloc. MI sched changes them.
llvm-svn: 187107
2013-07-25 07:26:26 +00:00
Alexey Samsonov
64c391dbe4 Fix uninitialized memory read found by MemorySanitizer: always set output parameter of ConvergingScheduler::SchedBoundary::getOtherResourceCount
llvm-svn: 186658
2013-07-19 08:55:18 +00:00
Andrew Trick
b13ef17a14 MI Sched: Update the way resources are tracked so the current heuristics make more sense.
llvm-svn: 186632
2013-07-19 00:20:07 +00:00
Andrew Trick
b55db58edf MI-Sched: cleanup DEBUG output.
llvm-svn: 184565
2013-06-21 18:33:01 +00:00
Andrew Trick
736dd9a255 MI-Sched: Adjust regpressure limits for reserved regs.
llvm-svn: 184564
2013-06-21 18:32:58 +00:00
Andrew Trick
71f08a3e74 Give RegMax higher priority.
llvm-svn: 184133
2013-06-17 21:45:13 +00:00
Andrew Trick
3c3a40e4c6 Remove compareRPDelta.
A complex, expensive heuristic with little value in the current design.

llvm-svn: 184132
2013-06-17 21:45:11 +00:00
Andrew Trick
7e63046ce9 MI-Sched: Remove another heuristic that is sensitive to queue order.
llvm-svn: 184130
2013-06-17 21:45:07 +00:00
Andrew Trick
d40d0f2c1b MI-Sched: Track multiple candidates with the same priority level.
This eliminates the MultiPressure scheduling "reason". It was
sensitive to queue order. We don't like being sensitive to queue
order.

llvm-svn: 184129
2013-06-17 21:45:05 +00:00
Andrew Trick
8e8415f5ab Missing NDEBUGs.
llvm-svn: 184039
2013-06-15 05:46:47 +00:00