llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-05-01 04:16:08 +00:00

Author	SHA1	Message	Date
Andrew Trick	880e573d98	MI-Sched: handle latency of in-order operations with the new machine model.
The per-operand machine model allows the target to define "unbuffered" processor resources. This change is a quick, cheap way to model stalls caused by the latency of operations that use such resources. This only applies when the processor's micro-op buffer size is non-zero (Out-of-Order). We can't precisely model in-order stalls during out-of-order execution, but this is an easy and effective heuristic.
It benefits cortex-a9 scheduling when using the new machine model, which is not yet on by default.
MI-Sched for armv7 was evaluated on Swift (and only not enabled because of a performance bug related to predication). However, we never evaluated Cortex-A9 performance on MI-Sched in its current form. This change adds MI-Sched functionality to reach performance goals on A9. The only remaining change is to allow MI-Sched to run as a PostRA pass.
I evaluated performance using a set of options to estimate the performance impact once MI sched is default on armv7:
-mcpu=cortex-a9 -disable-post-ra -misched-bench -scheditins=false
For a simple saxpy loop I see a 1.7x speedup.
Here are the llvm-testsuite results:
(min run time over 2 runs, filtering tiny changes)
Speedups:
| Benchmarks/BenchmarkGame/recursive | 52.39% |
| Benchmarks/VersaBench/beamformer | 20.80% |
| Benchmarks/Misc/pi | 19.97% |
| Benchmarks/Misc/mandel-2 | 19.95% |
| SPEC/CFP2000/188.ammp | 18.72% |
| Benchmarks/McCat/08-main/main | 18.58% |
| Benchmarks/Misc-C++/Large/sphereflake | 18.46% |
| Benchmarks/Olden/power | 17.11% |
| Benchmarks/Misc-C++/mandel-text | 16.47% |
| Benchmarks/Misc/oourafft | 15.94% |
| Benchmarks/Misc/flops-7 | 14.99% |
| Benchmarks/FreeBench/distray | 14.26% |
| SPEC/CFP2006/470.lbm | 14.00% |
| mediabench/mpeg2/mpeg2dec/mpeg2decode | 12.28% |
| Benchmarks/SmallPT/smallpt | 10.36% |
| Benchmarks/Misc-C++/Large/ray | 8.97% |
| Benchmarks/Misc/fp-convert | 8.75% |
| Benchmarks/Olden/perimeter | 7.10% |
| Benchmarks/Bullet/bullet | 7.03% |
| Benchmarks/Misc/mandel | 6.75% |
| Benchmarks/Olden/voronoi | 6.26% |
| Benchmarks/Misc/flops-8 | 5.77% |
| Benchmarks/Misc/matmul_f64_4x4 | 5.19% |
| Benchmarks/MiBench/security-rijndael | 5.15% |
| Benchmarks/Misc/flops-6 | 5.10% |
| Benchmarks/Olden/tsp | 4.46% |
| Benchmarks/MiBench/consumer-lame | 4.28% |
| Benchmarks/Misc/flops-5 | 4.27% |
| Benchmarks/mafft/pairlocalalign | 4.19% |
| Benchmarks/Misc/himenobmtxpa | 4.07% |
| Benchmarks/Misc/lowercase | 4.06% |
| SPEC/CFP2006/433.milc | 3.99% |
| Benchmarks/tramp3d-v4 | 3.79% |
| Benchmarks/FreeBench/pifft | 3.66% |
| Benchmarks/Ptrdist/ks | 3.21% |
| Benchmarks/Adobe-C++/loop_unroll | 3.12% |
| SPEC/CINT2000/175.vpr | 3.12% |
| Benchmarks/nbench | 2.98% |
| SPEC/CFP2000/183.equake | 2.91% |
| Benchmarks/Misc/perlin | 2.85% |
| Benchmarks/Misc/flops-1 | 2.82% |
| Benchmarks/Misc-C++-EH/spirit | 2.80% |
| Benchmarks/Misc/flops-2 | 2.77% |
| Benchmarks/NPB-serial/is | 2.42% |
| Benchmarks/ASC_Sequoia/CrystalMk | 2.33% |
| Benchmarks/BenchmarkGame/n-body | 2.28% |
| Benchmarks/SciMark2-C/scimark2 | 2.27% |
| Benchmarks/Olden/bh | 2.03% |
| skidmarks10/skidmarks | 1.81% |
| Benchmarks/Misc/flops | 1.72% |
Slowdowns:
| Benchmarks/llubenchmark/llu | -14.14% |
| Benchmarks/Polybench/stencils/seidel-2d | -5.67% |
| Benchmarks/Adobe-C++/functionobjects | -5.25% |
| Benchmarks/Misc-C++/oopack_v1p8 | -5.00% |
| Benchmarks/Shootout/hash | -2.35% |
| Benchmarks/Prolangs-C++/ocean | -2.01% |
| Benchmarks/Polybench/medley/floyd-warshall | -1.98% |
| Polybench/linear-algebra/kernels/3mm | -1.95% |
| Benchmarks/McCat/09-vor/vor | -1.68% |
llvm-svn: 196516	2013-12-05 17:55:58 +00:00
Andrew Trick	bb1247b9f0	comment typo and reformat llvm-svn: 196513	2013-12-05 17:55:47 +00:00
Juergen Ributzka	d12ccbd343	[weak vtables] Remove a bunch of weak vtables This patch removes most of the trivial cases of weak vtables by pinning them to a single object file. The memory leaks in this version have been fixed. Thanks Alexey for pointing them out. Differential Revision: http://llvm-reviews.chandlerc.com/D2068 Reviewed by Andy llvm-svn: 195064	2013-11-19 00:57:56 +00:00
Alexey Samsonov	49109a279c	Revert r194865 and r194874. This change is incorrect. If you delete the virtual destructor of both a base class and a subclass, then the following code: Base *foo = new Child(); delete foo; will not run the destructors for members of the Child class. As a result, I observe plenty of memory leaks. Notable examples I investigated are: ObjectBuffer and ObjectBufferStream, AttributeImpl and StringSAttributeImpl. llvm-svn: 194997	2013-11-18 09:31:53 +00:00
Juergen Ributzka	dbedae89b9	[weak vtables] Remove a bunch of weak vtables This patch removes most of the trivial cases of weak vtables by pinning them to a single object file. Differential Revision: http://llvm-reviews.chandlerc.com/D2068 Reviewed by Andy llvm-svn: 194865	2013-11-15 22:34:48 +00:00
Matthias Braun	88dd0abd2d	Pass LiveQueryResult by value This makes the API a bit more natural to use and makes it easier to make LiveRanges implementation details private. llvm-svn: 192394	2013-10-10 21:28:52 +00:00
Andrew Trick	dc4c1adfc7	Comment typo. llvm-svn: 191312	2013-09-24 17:11:19 +00:00
Andrew Trick	978674b2bc	Allow subtarget selection of the default MachineScheduler and document the interface. The global registry is used to allow command line override of the scheduler selection, but does not work well as the normal selection API. For example, the same LLVM process should be able to target multiple targets or subtargets. llvm-svn: 191071	2013-09-20 05:14:41 +00:00
Andrew Trick	665d3ec3d3	Rename ConvergingScheduler to GenericScheduler. This was an experimental scheduler a year ago. It's now used by several subtargets, both in-order and out-of-order, and it is about to be enabled by default for x86 and armv7. It will be the new GenericScheduler for subtargets that don't provide their own SchedulingStrategy. llvm-svn: 191051	2013-09-19 23:10:59 +00:00
Andrew Trick	6c88b35090	Enable -misched-cyclicpath by default. llvm-svn: 190367	2013-09-09 23:31:14 +00:00
Andrew Trick	e1f7bf2c02	mi-sched: smooth out the cyclicpath heuristic. Arnold's idea. I generally try to avoid stateful heuristics because it can make debugging harder. However, we need a way to prevent the latency priority from dominating, and it somewhat makes sense to schedule aggressively for latency only within an issue group. Swift in particular likes this, and it doesn't hurt anyone else: \| Benchmarks/MiBench/consumer-lame \| 10.39% \| \| Benchmarks/Misc/himenobmtxpa \| 9.63% \| llvm-svn: 190360	2013-09-09 22:28:08 +00:00
Andrew Trick	b248b4a1de	mi-sched: cleanup register pressure update, remove a FIXME. llvm-svn: 190181	2013-09-06 17:32:47 +00:00
Andrew Trick	c573cd905a	mi-sched: improve regpressure tracing. llvm-svn: 190180	2013-09-06 17:32:44 +00:00
Andrew Trick	7609b7d1b5	mi-sched: print tree size in -view-misched-dags llvm-svn: 190179	2013-09-06 17:32:42 +00:00
Andrew Trick	ffdbefb90c	mi-sched: register pressure update tracing. llvm-svn: 190178	2013-09-06 17:32:39 +00:00
Andrew Trick	ddffae9027	mi-sched: Reorder Cyclicpath (latency) and CriticalMax (pressure) heuristics. The latency based scheduling could induce spills in some cases. llvm-svn: 190177	2013-09-06 17:32:36 +00:00
Andrew Trick	75e411cc8e	Added MachineSchedPolicy. Allow subtargets to customize the generic scheduling strategy. This is convenient for targets that don't need to add new heuristics by specializing the strategy. llvm-svn: 190176	2013-09-06 17:32:34 +00:00
Andrew Trick	ed20075d19	mi-sched: Force bottom up scheduling for generic targets. Fast register pressure tracking currently only takes effect during bottom up scheduling. Forcing this is a bit faster and simpler for targets that don't have many scheduling constraints and don't need top-down scheduling. llvm-svn: 190014	2013-09-04 23:54:00 +00:00
Andrew Trick	b05db8e0b9	comment typo llvm-svn: 189997	2013-09-04 21:12:05 +00:00
Andrew Trick	2a749ee0b9	Remove dead subtree limit code. llvm-svn: 189995	2013-09-04 21:00:20 +00:00
Andrew Trick	856ecd9ab3	-view-misched-dags, better pruning. llvm-svn: 189994	2013-09-04 21:00:18 +00:00
Andrew Trick	ef54c59490	mi-sched: DEBUG cleanup, call tracePick for unidirectional scheduling. llvm-svn: 189993	2013-09-04 21:00:16 +00:00
Andrew Trick	1ab16d9ecf	80 columns llvm-svn: 189992	2013-09-04 21:00:13 +00:00
Andrew Trick	66c3dfbf8c	mi-sched: Suppress register pressure tracking when the scheduling window is too small. If the instruction window is < NumRegs/2, pressure tracking is not likely to be effective. The scheduler has to process a very large number of tiny blocks. We want this to be fast. llvm-svn: 189991	2013-09-04 21:00:11 +00:00
Andrew Trick	a6e877707f	mi-sched: Load clustering is a bit too expensive to enable unconditionally. llvm-svn: 189990	2013-09-04 21:00:08 +00:00
Andrew Trick	8c699c93b2	mi-sched: Reuse an invalid HazardRecognizer to save compile time. llvm-svn: 189989	2013-09-04 21:00:05 +00:00
Andrew Trick	310190e21f	mi-sched: bypass heuristic checks when regpressure tracking is disabled. llvm-svn: 189988	2013-09-04 21:00:02 +00:00
Andrew Trick	b6e74712b6	Added -misched-regpressure option. Register pressure tracking is half the complexity of the scheduler. It's useful to be able to turn it off for compile time and performance comparisons. llvm-svn: 189987	2013-09-04 20:59:59 +00:00
Andrew Trick	2c4f8b7ee8	Fix my previous checkin to updatePressureDiffs. There was one case that we could hit a DebugValue where I didn't think to check. DebugValues are evil. No checkinable test case, sorry. It's an obvious fix. llvm-svn: 189717	2013-08-31 05:17:58 +00:00
Andrew Trick	2bc74c2887	mi-sched: update PressureDiffs on-the-fly for liveness. This removes all expensive pressure tracking logic from the scheduling critical path of node comparison. llvm-svn: 189643	2013-08-30 04:36:57 +00:00
Andrew Trick	b1a45b6c61	mi-sched: improve the generic register pressure comparison. Only compare pressure within the same set. When multiple sets are affected, we prioritize the most constrained set. llvm-svn: 189641	2013-08-30 04:27:29 +00:00
Andrew Trick	1a8313458f	mi-sched: Precompute a PressureDiff for each instruction, adjust for liveness later. Created SUPressureDiffs array to hold the per node PDiff computed during DAG building. Added a getUpwardPressureDelta API that will soon replace the old one. Compute PressureDelta here from the precomputed PressureDiffs. Updating for liveness will come next. llvm-svn: 189640	2013-08-30 03:49:48 +00:00
Andrew Trick	ef80f50058	comment typo llvm-svn: 189635	2013-08-30 02:02:12 +00:00
Andrew Trick	483f4199f3	Comment and revise the cyclic critical path code. This should be much more clear now. It's still disabled pending testing. llvm-svn: 189597	2013-08-29 18:04:49 +00:00
Andrew Trick	c01b00400d	Adds cyclic critical path computation and heuristics, temporarily disabled. Estimate the cyclic critical path within a single block loop. If the acyclic critical path is longer, then the loop will exhaust OOO resources after some number of iterations. If the lag between the acyclic critical path and the cyclic critical path is longer than the time it takes to issue those loop iterations, then aggressively schedule for latency. llvm-svn: 189120	2013-08-23 17:48:43 +00:00
Andrew Trick	a53e101627	mi-sched: Don't call MBB.size() in initSUnits. The driver already has instr count. This fixes a pathological compile time problem with very large blocks and lots of scheduling boundaries. llvm-svn: 189116	2013-08-23 17:48:33 +00:00
Andrew Trick	2f7667e018	Confusing comment typo. llvm-svn: 187895	2013-08-07 17:20:32 +00:00
Andrew Trick	9c17eab761	MI Sched: Track live-thru registers. When registers must be live throughout the scheduling region, increase the limit for the register class. Once we exceed the original limit, they will be spilled, and there's no point further reducing pressure. This isn't a perfect heuristic but avoids a situation where the scheduler could become trapped by trying to achieve the impossible. llvm-svn: 187436	2013-07-30 19:59:12 +00:00
Andrew Trick	d9761776bc	MI Sched fix: assert "Disconnected LRG within the scheduling region." llvm-svn: 187435	2013-07-30 19:59:08 +00:00
Andrew Trick	401b6959ae	MI Sched: Register pressure heuristics. Consider which set is being increased or decreased before comparing. llvm-svn: 187110	2013-07-25 07:26:35 +00:00
Andrew Trick	9706496b0d	Dump LIS before regalloc. MI sched changes them. llvm-svn: 187107	2013-07-25 07:26:26 +00:00
Alexey Samsonov	64c391dbe4	Fix uninitialized memory read found by MemorySanitizer: always set output parameter of ConvergingScheduler::SchedBoundary::getOtherResourceCount llvm-svn: 186658	2013-07-19 08:55:18 +00:00
Andrew Trick	b13ef17a14	MI Sched: Update the way resources are tracked so the current heuristics make more sense. llvm-svn: 186632	2013-07-19 00:20:07 +00:00
Andrew Trick	b55db58edf	MI-Sched: cleanup DEBUG output. llvm-svn: 184565	2013-06-21 18:33:01 +00:00
Andrew Trick	736dd9a255	MI-Sched: Adjust regpressure limits for reserved regs. llvm-svn: 184564	2013-06-21 18:32:58 +00:00
Andrew Trick	71f08a3e74	Give RegMax higher priority. llvm-svn: 184133	2013-06-17 21:45:13 +00:00
Andrew Trick	3c3a40e4c6	Remove compareRPDelta. A complex, expensive heuristic with little value in the current design. llvm-svn: 184132	2013-06-17 21:45:11 +00:00
Andrew Trick	7e63046ce9	MI-Sched: Remove another heuristic that is sensitive to queue order. llvm-svn: 184130	2013-06-17 21:45:07 +00:00
Andrew Trick	d40d0f2c1b	MI-Sched: Track multiple candidates with the same priority level. This eliminates the MultiPressure scheduling "reason". It was sensitive to queue order. We don't like being sensitive to queue order. llvm-svn: 184129	2013-06-17 21:45:05 +00:00
Andrew Trick	8e8415f5ab	Missing NDEBUGs. llvm-svn: 184039	2013-06-15 05:46:47 +00:00
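
The weak-vtable entries above (r194865, its revert r194997, and the re-landed r195064) turn on two C++ details: pinning a class's vtable to one object file, and keeping the destructor virtual so that deleting through a base pointer still destroys the subclass's members. The following is a minimal sketch of both, not code from the LLVM tree. The names `Base`, `Child`, `Tracker`, `anchor`, and `destroyThroughBase` are hypothetical, and the pinning behaviour assumes an Itanium-style C++ ABI, where the vtable is emitted alongside the first non-inline virtual function.

```cpp
#include <cassert>

// Hypothetical illustration only; none of these names come from LLVM.
static int ChildMembersDestroyed = 0;

// A member whose destructor records that it ran.
struct Tracker {
  ~Tracker() { ++ChildMembersDestroyed; }
};

class Base {
public:
  // The destructor stays virtual. Removing it is the mistake described in
  // the revert: "delete" through a Base* would then be undefined behaviour
  // and would typically skip the destructors of Child's members.
  virtual ~Base() = default;
  virtual int id() const { return 0; }

private:
  // Declared here, defined out of line below. Under an Itanium-style ABI the
  // first non-inline virtual function decides which object file gets the
  // vtable, so the vtable is no longer emitted as a weak symbol everywhere.
  virtual void anchor();
};

void Base::anchor() {}

class Child : public Base {
public:
  int id() const override { return 1; }

private:
  void anchor() override;
  Tracker Member; // must be destroyed when a Child is deleted via Base*
};

void Child::anchor() {}

// Mirrors the snippet quoted in the revert message and reports how many
// Child members were destroyed by the delete.
int destroyThroughBase() {
  ChildMembersDestroyed = 0;
  Base *Foo = new Child();
  delete Foo;
  return ChildMembersDestroyed;
}
```

With the virtual destructor in place, `destroyThroughBase()` reports that the `Tracker` member was destroyed. In a real project `Base::anchor` and `Child::anchor` would each live in a single `.cpp` file; they share one file here only to keep the sketch self-contained.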


182 Commits