llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-05-05 16:16:05 +00:00

Author	SHA1	Message	Date
Hans Wennborg	abe01e17f6	Revert "[llvm-exegesis] Improve error reporting" and follow-up. It broke e.g. all tests under tools/llvm-exegesis/X86/ when libpfm is not available, see comment on D74085. This reverts commit b3576f60ebc8f660afad8120a72473be47517573 and 141915963b6ab36ee4e577d1b27673fa4d05b409.	2020-02-06 12:53:16 +01:00
Miloš Stojanović	141915963b	[llvm-exegesis] Improve error reporting in Target.cpp Followup to D74085. Replace the use of `report_fatal_error()` with returning the error to `llvm-exegesis.cpp` and handling it there. Differential Revision: https://reviews.llvm.org/D74113	2020-02-06 12:26:08 +01:00
Miloš Stojanović	b3576f60eb	[llvm-exegesis] Improve error reporting Fix inconsistencies in error reporting created by mixing `report_fatal_error()` and `ExitOnErr()`, and add additional information to the error message to make it more user friendly. Minimize the use `report_fatal_error()` because it's meant for use in very rare cases and it results in low information density of the error messages. Summary of the new design: * For command line argument errors output `llvm-exegesis: <error_message>`, which is consistent with the error output format emitted by the backend which checks correctness of the command line arguments. * For other errors the format `llvm-exegesis error: <error_message>` is used. ** If the error occurred during file access `<error_message>` will have of two parts: `'<file_name>': <rest_of_the_error_message>` Differential Revision: https://reviews.llvm.org/D74085	2020-02-06 12:26:08 +01:00
Miloš Stojanović	c7dc4734d2	[llvm-exegesis] Check counters before running Check if the appropriate counters for the specified mode are defined on the target. This is checked before any other work is done. Differential Revision: https://reviews.llvm.org/D71927	2019-12-31 14:17:24 +01:00
Mark de Wever	536c9a604e	[Tools] Fixes -Wrange-loop-analysis warnings This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall. Differential Revision: https://reviews.llvm.org/D71808	2019-12-22 19:11:17 +01:00
Guillaume Chatelet	32d384c020	[llvm-exegesis][NFC] internal changes Summary: BitVectors are now cached to lower memory utilization. Instructions have reference semantics. Reviewers: courbet Subscribers: sdardis, tschuett, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71653	2019-12-18 17:24:07 +01:00
Simon Pilgrim	aedb528d43	llvm-exegesis - fix shadow variable warnings. NFCI.	2019-11-09 13:43:09 +00:00
Clement Courbet	50cdd56beb	[llvm-exegesis][NFC] Remove extra `llvm::` qualifications. Summary: Second patch: in the lib. Reviewers: gchatelet Subscribers: nemanjai, tschuett, MaskRay, mgrang, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68692 llvm-svn: 374158	2019-10-09 11:58:42 +00:00
Clement Courbet	2cd0f28959	[llvm-exegesis] Add options to SnippetGenerator. Summary: This adds a `-max-configs-per-opcode` option to limit the number of configs per opcode. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68642 llvm-svn: 374054	2019-10-08 14:30:24 +00:00
Clement Courbet	03a3d29541	[llvm-exegesis][NFC] Move BenchmarkFailure to own file. Summary: And rename to exegesis::Failure, as it's used everytwhere. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68217 llvm-svn: 373209	2019-09-30 13:53:50 +00:00
Clement Courbet	3e13816be2	[llvm-exegesis][NFC] Refactor snippet file reading out of tool main. Summary: Add unit tests. Reviewers: gchatelet Subscribers: mgorny, tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68212 llvm-svn: 373202	2019-09-30 12:50:25 +00:00
Clement Courbet	9431b72ce9	[llvm-exegesis] Add loop mode for repeating the snippet. Summary: Before this change the Executable function was made by duplicating the snippet. This change adds a --repetion-mode={loop\|duplicate} flag that allows choosing between this behaviour and wrapping the snippet instructions in a loop. The new mode can help measurements when the snippet fits in the DSB by short-cirtcuiting decoding. The loop adds a dec + jmp to the measurements, but since these are not part of the critical path, they execute in parallel with the measured code and do not impact measurements in practice. Overview of the change: - New SnippetRepetitor abstraction that handles repeating the snippet. The assembler delegates repeating the instructions to this class. - ExegesisTarget learns how to decrement loop counter and jump. - Some refactoring of the assembler into FunctionFiller/BasicBlockFiller. Reviewers: gchatelet Subscribers: mgorny, tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68125 llvm-svn: 373083	2019-09-27 12:56:24 +00:00
Sylvestre Ledru	375297f38f	fix a typo unavaliable=>unavailable llvm-svn: 362878	2019-06-08 15:07:55 +00:00
Clement Courbet	b9274f2694	[llvm-exegesis] Move native target initialization code to a separate file. Summary: This helps building internal tools on top of the library. Reviewers: gchatelet Subscribers: tschuett, llvm-commits, bdb, ondrasej Tags: #llvm Differential Revision: https://reviews.llvm.org/D62239 llvm-svn: 361385	2019-05-22 13:50:16 +00:00
Roman Lebedev	eb1a156d7f	[llvm-exegesis] benchmarkMain(): less cryptic error if built w/o libpfm Wanted to check if inablility to measure latency of CMOV32rm is a regression from D60041 / D60138, but unable to do that because the llvm-exegesis-{8,9} from debian sid fails with that cryptic, unhelpful error. I suspect this will be a better error. llvm-svn: 357900	2019-04-08 10:50:31 +00:00
Guillaume Chatelet	848df5b509	Add an option do not dump the generated object on disk Reviewers: courbet Subscribers: llvm-commits, bdb Tags: #llvm Differential Revision: https://reviews.llvm.org/D60317 llvm-svn: 357769	2019-04-05 15:18:59 +00:00
Roman Lebedev	c2423fe689	[llvm-exegesis] Introduce a 'naive' clustering algorithm (PR40880) Summary: This is an alternative to D59539. Let's suppose we have measured 4 different opcodes, and got: `0.5`, `1.0`, `1.5`, `2.0`. Let's suppose we are using `-analysis-clustering-epsilon=0.5`. By default now we will start processing the `0.5` point, find that `1.0` is it's neighbor, add them to a new cluster. Then we will notice that `1.5` is a neighbor of `1.0` and add it to that same cluster. Then we will notice that `2.0` is a neighbor of `1.5` and add it to that same cluster. So all these points ended up in the same cluster. This may or may not be a correct implementation of dbscan clustering algorithm. But this is rather horribly broken for the reasons of comparing the clusters with the LLVM sched data. Let's suppose all those opcodes are currently in the same sched cluster. If i specify `-analysis-inconsistency-epsilon=0.5`, then no matter the LLVM values this cluster will never match the LLVM values, and thus this cluster will always be displayed as inconsistent. The solution is obviously to split off some of these opcodes into different sched cluster. But how do i do that? Out of 4 opcodes displayed in the inconsistency report, which ones are the "bad ones"? Which ones are the most different from the checked-in data? I'd need to go in to the `.yaml` and look it up manually. The trivial solution is to, when creating clusters, don't use the full dbscan algorithm, but instead "pick some unclustered point, pick all unclustered points that are it's neighbor, put them all into a new cluster, repeat". And just so as it happens, we can arrive at that algorithm by not performing the "add neighbors of a neighbor to the cluster" step. But that won't work well once we teach analyze mode to operate in on-1D mode (i.e. on more than a single measurement type at a time), because the clustering would depend on the order of the measurements. Instead, let's just create a single cluster per opcode, and put all the points of that opcode into said cluster. And simultaneously check that every point in that cluster is a neighbor of every other point in the cluster, and if they are not, the cluster (==opcode) is unstable. This is //yet another// step to bring me closer to being able to continue cleanup of bdver2 sched model.. Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40880 \| PR40880 ]]. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, jdoerfert, RKSimon, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59820 llvm-svn: 357152	2019-03-28 08:55:01 +00:00
Roman Lebedev	23629385f1	[llvm-exegesis] Separate tool options into three categories. Results in much nicer -help output: ``` $ ./bin/llvm-exegesis -help USAGE: llvm-exegesis [options] OPTIONS: Color Options: -color - Use colors in output (default=autodetect) General options: -enable-cse-in-irtranslator - Should enable CSE in irtranslator -enable-cse-in-legalizer - Should enable CSE in Legalizer Generic Options: -help - Display available options (-help-hidden for more) -help-list - Display list of available options (-help-list-hidden for more) -version - Display the version of this program llvm-exegesis analysis options: -analysis-clustering-epsilon=<number> - dbscan epsilon for benchmark point clustering -analysis-clusters-output-file=<string> - -analysis-display-unstable-clusters - if there is more than one benchmark for an opcode, said benchmarks may end up not being clustered into the same cluster if the measured performance characteristics are different. by default all such opcodes are filtered out. this flag will instead show only such unstable opcodes -analysis-inconsistencies-output-file=<string> - -analysis-inconsistency-epsilon=<number> - epsilon for detection of when the cluster is different from the LLVM schedule profile values -analysis-numpoints=<uint> - minimum number of points in an analysis cluster llvm-exegesis benchmark options: -ignore-invalid-sched-class - ignore instructions that do not define a sched class -mode=<value> - the mode to run =latency - Instruction Latency =inverse_throughput - Instruction Inverse Throughput =uops - Uop Decomposition =analysis - Analysis -num-repetitions=<uint> - number of time to repeat the asm snippet -opcode-index=<int> - opcode to measure, by index -opcode-name=<string> - comma-separated list of opcodes to measure, by name -snippets-file=<string> - code snippets to measure llvm-exegesis options: -benchmarks-file=<string> - File to read (analysis mode) or write (latency/uops/inverse_throughput modes) benchmark results. “-” uses stdin/stdout. -mcpu=<string> - cpu name to use for pfm counters, leave empty to autodetect ``` llvm-svn: 356364	2019-03-18 11:32:37 +00:00
Roman Lebedev	542e5d7bb5	[llvm-exegesis] Split Epsilon param into two (PR40787) Summary: This eps param is used for two distinct things: * initial point clusterization * checking clusters against the llvm values What if one wants to only look at highly different clusters, without changing the clustering itself? In particular, this helps to weed out noisy measurements (since the clusterization epsilon is still small, so there is a better chance that noisy measurements from the same opcode will go into different clusters) By splitting it into two params it is now possible. This is nearly-free performance-wise: Old: ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 10099 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs): 390.01 msec task-clock # 0.998 CPUs utilized ( +- 0.25% ) 12 context-switches # 31.735 M/sec ( +- 27.38% ) 0 cpu-migrations # 0.000 K/sec 4745 page-faults # 12183.732 M/sec ( +- 0.54% ) 1562711900 cycles # 4012303.327 GHz ( +- 0.24% ) (82.90%) 185567822 stalled-cycles-frontend # 11.87% frontend cycles idle ( +- 0.52% ) (83.30%) 392106234 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.31% ) (33.79%) 1839236666 instructions # 1.18 insn per cycle # 0.21 stalled cycles per insn ( +- 0.15% ) (50.37%) 407035764 branches # 1045074878.710 M/sec ( +- 0.12% ) (66.80%) 10896459 branch-misses # 2.68% of all branches ( +- 0.17% ) (83.20%) 0.390629 +- 0.000972 seconds time elapsed ( +- 0.25% ) ``` ``` $ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 50572 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs): 6803.36 msec task-clock # 0.999 CPUs utilized ( +- 0.96% ) 262 context-switches # 38.546 M/sec ( +- 23.06% ) 0 cpu-migrations # 0.065 M/sec ( +- 76.03% ) 13287 page-faults # 1953.206 M/sec ( +- 0.32% ) 27252537904 cycles # 4006024.257 GHz ( +- 0.95% ) (83.31%) 1496314935 stalled-cycles-frontend # 5.49% frontend cycles idle ( +- 0.97% ) (83.32%) 16128404524 stalled-cycles-backend # 59.18% backend cycles idle ( +- 0.30% ) (33.37%) 17611143370 instructions # 0.65 insn per cycle # 0.92 stalled cycles per insn ( +- 0.05% ) (50.04%) 3894906599 branches # 572537147.437 M/sec ( +- 0.03% ) (66.69%) 116314514 branch-misses # 2.99% of all branches ( +- 0.20% ) (83.35%) 6.8118 +- 0.0689 seconds time elapsed ( +- 1.01%) ``` New: ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 10099 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (25 runs): 400.14 msec task-clock # 0.998 CPUs utilized ( +- 0.66% ) 12 context-switches # 29.429 M/sec ( +- 25.95% ) 0 cpu-migrations # 0.100 M/sec ( +-100.00% ) 4714 page-faults # 11796.496 M/sec ( +- 0.55% ) 1603131306 cycles # 4011840.105 GHz ( +- 0.66% ) (82.85%) 199538509 stalled-cycles-frontend # 12.45% frontend cycles idle ( +- 2.40% ) (83.10%) 402249109 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.19% ) (34.05%) 1847783963 instructions # 1.15 insn per cycle # 0.22 stalled cycles per insn ( +- 0.18% ) (50.64%) 407162722 branches # 1018925730.631 M/sec ( +- 0.12% ) (67.02%) 10932779 branch-misses # 2.69% of all branches ( +- 0.51% ) (83.28%) 0.40077 +- 0.00267 seconds time elapsed ( +- 0.67% ) lebedevri@pini-pini:/build/llvm-build-Clang-release$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 50572 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (9 runs): 6947.79 msec task-clock # 1.000 CPUs utilized ( +- 0.90% ) 217 context-switches # 31.236 M/sec ( +- 36.16% ) 1 cpu-migrations # 0.096 M/sec ( +- 50.00% ) 13258 page-faults # 1908.389 M/sec ( +- 0.34% ) 27830796523 cycles # 4006032.286 GHz ( +- 0.89% ) (83.30%) 1504554006 stalled-cycles-frontend # 5.41% frontend cycles idle ( +- 2.10% ) (83.32%) 16716574843 stalled-cycles-backend # 60.07% backend cycles idle ( +- 0.65% ) (33.38%) 17755545931 instructions # 0.64 insn per cycle # 0.94 stalled cycles per insn ( +- 0.09% ) (50.04%) 3897255686 branches # 560980426.597 M/sec ( +- 0.06% ) (66.70%) 117045395 branch-misses # 3.00% of all branches ( +- 0.47% ) (83.34%) 6.9507 +- 0.0627 seconds time elapsed ( +- 0.90% ) ``` I.e. it's +2.6% slowdown for one whole sweep, or +2% for 5 whole sweeps. Within noise i'd say. Should help with [[ https://bugs.llvm.org/show_bug.cgi?id=40787 \| PR40787 ]]. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, RKSimon, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58476 llvm-svn: 354767	2019-02-25 09:36:12 +00:00
Roman Lebedev	69716394f3	[llvm-exegesis] Opcode stabilization / reclusterization (PR40715) Summary: Given an instruction `Opcode`, we can make benchmarks (measurements) of the instruction characteristics/performance. Then, to facilitate further analysis we group the benchmarks with similar characteristics into clusters. Now, this is all not entirely deterministic. Some instructions have variable characteristics, depending on their arguments. And thus, if we do several benchmarks of the same instruction `Opcode`, we may end up with different performance characteristics measurements. And when we then do clustering, these several benchmarks of the same instruction `Opcode` may end up being clustered into different clusters. This is not great for further analysis. We shall find every `Opcode` with benchmarks not in just one cluster, and move all the benchmarks of said `Opcode` into one new unstable cluster per `Opcode`. I have solved this by making `ClusterId` a bit field, adding a `IsUnstable` bit, and introducing `-analysis-display-unstable-clusters` switch to toggle between displaying stable-only clusters and unstable-only clusters. The reclusterization is deterministically stable, produces identical reports between runs. (Or at least that is what i'm seeing, maybe it isn't) Timings/comparisons: old (current trunk/head) {F8303582} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs): 6624.73 msec task-clock # 0.999 CPUs utilized ( +- 0.53% ) 172 context-switches # 25.965 M/sec ( +- 29.89% ) 0 cpu-migrations # 0.042 M/sec ( +- 56.54% ) 31073 page-faults # 4690.754 M/sec ( +- 0.08% ) 26538711696 cycles # 4006230.292 GHz ( +- 0.53% ) (83.31%) 2017496807 stalled-cycles-frontend # 7.60% frontend cycles idle ( +- 0.93% ) (83.32%) 13403650062 stalled-cycles-backend # 50.51% backend cycles idle ( +- 0.33% ) (33.37%) 19770706799 instructions # 0.74 insn per cycle # 0.68 stalled cycles per insn ( +- 0.04% ) (50.04%) 4419821812 branches # 667207369.714 M/sec ( +- 0.03% ) (66.69%) 121741669 branch-misses # 2.75% of all branches ( +- 0.28% ) (83.34%) 6.6283 +- 0.0358 seconds time elapsed ( +- 0.54% ) ``` patch, with reclustering but without filtering (i.e. outputting all the stable and unstable clusters) {F8303586} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html' (25 runs): 6475.29 msec task-clock # 0.999 CPUs utilized ( +- 0.31% ) 213 context-switches # 32.952 M/sec ( +- 23.81% ) 1 cpu-migrations # 0.130 M/sec ( +- 43.84% ) 31287 page-faults # 4832.057 M/sec ( +- 0.08% ) 25939086577 cycles # 4006160.279 GHz ( +- 0.31% ) (83.31%) 1958812858 stalled-cycles-frontend # 7.55% frontend cycles idle ( +- 0.68% ) (83.32%) 13218961512 stalled-cycles-backend # 50.96% backend cycles idle ( +- 0.29% ) (33.37%) 19752995402 instructions # 0.76 insn per cycle # 0.67 stalled cycles per insn ( +- 0.04% ) (50.04%) 4417079244 branches # 682195472.305 M/sec ( +- 0.03% ) (66.70%) 121510065 branch-misses # 2.75% of all branches ( +- 0.19% ) (83.34%) 6.4832 +- 0.0229 seconds time elapsed ( +- 0.35% ) ``` Funnily, this measurement shows that said reclustering actually improved performance. patch, with reclustering, only the stable clusters {F8303594} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html' (25 runs): 6387.71 msec task-clock # 0.999 CPUs utilized ( +- 0.13% ) 133 context-switches # 20.792 M/sec ( +- 23.39% ) 0 cpu-migrations # 0.063 M/sec ( +- 61.24% ) 31318 page-faults # 4903.256 M/sec ( +- 0.08% ) 25591984967 cycles # 4006786.266 GHz ( +- 0.13% ) (83.31%) 1881234904 stalled-cycles-frontend # 7.35% frontend cycles idle ( +- 0.25% ) (83.33%) 13209749965 stalled-cycles-backend # 51.62% backend cycles idle ( +- 0.16% ) (33.36%) 19767554347 instructions # 0.77 insn per cycle # 0.67 stalled cycles per insn ( +- 0.04% ) (50.03%) 4417480305 branches # 691618858.046 M/sec ( +- 0.03% ) (66.68%) 118676358 branch-misses # 2.69% of all branches ( +- 0.07% ) (83.33%) 6.3954 +- 0.0118 seconds time elapsed ( +- 0.18% ) ``` Performance improved even further?! Makes sense i guess, less clusters to print. patch, with reclustering, only the unstable clusters {F8303601} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters' (25 runs): 6124.96 msec task-clock # 1.000 CPUs utilized ( +- 0.20% ) 194 context-switches # 31.709 M/sec ( +- 20.46% ) 0 cpu-migrations # 0.039 M/sec ( +- 49.77% ) 31413 page-faults # 5129.261 M/sec ( +- 0.06% ) 24536794267 cycles # 4006425.858 GHz ( +- 0.19% ) (83.31%) 1676085087 stalled-cycles-frontend # 6.83% frontend cycles idle ( +- 0.46% ) (83.32%) 13035595603 stalled-cycles-backend # 53.13% backend cycles idle ( +- 0.16% ) (33.36%) 18260877653 instructions # 0.74 insn per cycle # 0.71 stalled cycles per insn ( +- 0.05% ) (50.03%) 4112411983 branches # 671484364.603 M/sec ( +- 0.03% ) (66.68%) 114066929 branch-misses # 2.77% of all branches ( +- 0.11% ) (83.32%) 6.1278 +- 0.0121 seconds time elapsed ( +- 0.20% ) ``` This tells us that the actual `-analysis-inconsistencies-output-file=` outputting only takes ~0.4 sec for 43970 benchmark points (3 whole sweeps) (Also, wow this is fast, it used to take several minutes originally) Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40715 \| PR40715 ]]. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, jdoerfert, llvm-commits, RKSimon Tags: #llvm Differential Revision: https://reviews.llvm.org/D58355 llvm-svn: 354441	2019-02-20 09:14:04 +00:00
Andrea Di Biagio	edbf06a767	[AsmPrinter] Remove hidden flag -print-schedule. This patch removes hidden codegen flag -print-schedule effectively reverting the logic originally committed as r300311 (https://llvm.org/viewvc/llvm-project?view=revision&revision=300311). Flag -print-schedule was originally introduced by r300311 to address PR32216 (https://bugs.llvm.org/show_bug.cgi?id=32216). That bug was about adding "Better testing of schedule model instruction latencies/throughputs". These days, we can use llvm-mca to test scheduling models. So there is no longer a need for flag -print-schedule in LLVM. The main use case for PR32216 is now addressed by llvm-mca. Flag -print-schedule is mainly used for debugging purposes, and it is only actually used by x86 specific tests. We already have extensive (latency and throughput) tests under "test/tools/llvm-mca" for X86 processor models. That means, most (if not all) existing -print-schedule tests for X86 are redundant. When flag -print-schedule was first added to LLVM, several files had to be modified; a few APIs gained new arguments (see for example method MCAsmStreamer::EmitInstruction), and MCSubtargetInfo/TargetSubtargetInfo gained a couple of getSchedInfoStr() methods. Method getSchedInfoStr() had to originally work for both MCInst and MachineInstr. The original implmentation of getSchedInfoStr() introduced a subtle layering violation (reported as PR37160 and then fixed/worked-around by r330615). In retrospect, that new API could have been designed more optimally. We can always query MCSchedModel to get the latency and throughput. More importantly, the "sched-info" string should not have been generated by the subtarget. Note, r317782 fixed an issue where "print-schedule" didn't work very well in the presence of inline assembly. That commit is also reverted by this change. Differential Revision: https://reviews.llvm.org/D57244 llvm-svn: 353043	2019-02-04 12:51:26 +00:00
Roman Lebedev	21193f4b7e	[llvm-exegesis] Don't default to running&dumping all analyses to '-' Summary: Up until the point i have looked in the source, i didn't even understood that i can disable 'cluster' output. I have always silenced it via ` &> /dev/null`. (And hoped it wasn't contributing much of the run time.) While i expect that it has it's use-cases i never once needed it so far. If i forget to silence it, console is completely flooded with that output. How about not expecting users to opt-out of analyses, but to explicitly specify the analyses that should be performed? Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, RKSimon, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57648 llvm-svn: 353021	2019-02-04 09:12:08 +00:00
Clement Courbet	362653f7af	[llvm-exegesis] Add throughput mode. Summary: This just uses the latency benchmark runner on the parallel uops snippet generator. Fixes PR37698. Reviewers: gchatelet Subscribers: tschuett, RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D57000 llvm-svn: 352632	2019-01-30 16:02:20 +00:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Clement Courbet	0d79aaf1a7	Revert "[llvm-exegesis] Add a snippet generator to generate snippets to compute ROB sizes." This reverts accidental commit rL346394. llvm-svn: 346398	2018-11-08 12:09:45 +00:00
Clement Courbet	c0950ae990	[llvm-exegesis] Add a snippet generator to generate snippets to compute ROB sizes. llvm-svn: 346394	2018-11-08 11:45:14 +00:00
Clement Courbet	41c8af3924	[MCSched] Bind PFM Counters to the CPUs instead of the SchedModel. Summary: The pfm counters are now in the ExegesisTarget rather than the MCSchedModel (PR39165). This also compresses the pfm counter tables (PR37068). Reviewers: RKSimon, gchatelet Subscribers: mgrang, llvm-commits Differential Revision: https://reviews.llvm.org/D52932 llvm-svn: 345243	2018-10-25 07:44:01 +00:00
Guillaume Chatelet	da11b85606	[llvm-exegesis] Implements a cache of Instruction objects. llvm-svn: 345130	2018-10-24 11:55:06 +00:00
Fangrui Song	32401afd8c	[llvm-exegesis] Move namespace exegesis inside llvm:: Summary: This allows simplifying references of llvm::foo with foo when the needs come in the future. Reviewers: courbet, gchatelet Reviewed By: gchatelet Subscribers: javed.absar, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53455 llvm-svn: 344922	2018-10-22 17:10:47 +00:00
Guillaume Chatelet	6a208e8c5f	[llvm-exegesis] Fix off by one error llvm-svn: 344731	2018-10-18 08:20:50 +00:00
Clement Courbet	f973c2df9d	[llvm-exegesis] Allow measuring several instructions in a single run. Summary: We try to recover gracefully on instructions that would crash the program. This includes some refactoring of runMeasurement() implementations. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53371 llvm-svn: 344695	2018-10-17 15:04:15 +00:00
Guillaume Chatelet	9b59238822	[llvm-exegesis][NFC] Pass Instruction instead of bare Opcode llvm-svn: 344145	2018-10-10 14:57:32 +00:00
Fangrui Song	22438a844c	[llvm-exegesis] Remove unused headers and fix naming issues Reviewers: courbet Reviewed By: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D52565 llvm-svn: 343177	2018-09-27 06:10:15 +00:00
Guillaume Chatelet	55ad087a4c	[llvm-exegesis][NFC] Rewrite of the YAML serialization. Summary: This is a NFC in preparation of exporting the initial registers as part of the YAML dump Reviewers: courbet Reviewed By: courbet Subscribers: mgorny, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D52427 llvm-svn: 342967	2018-09-25 12:18:08 +00:00
Clement Courbet	78b2e73d15	[llvm-exegesis] Allow benchmarking arbitrary code snippets. Summary: This is a step towards fixing PR38048. Note that right now the measurements are given per instruction. We'll need to give measurements a per code snippet and update the analysis (PR38731). Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D52041 llvm-svn: 342947	2018-09-25 07:31:44 +00:00
Clement Courbet	7958735e45	[llvm-exegesis][NFC] Remove dead parameter. llvm-svn: 342118	2018-09-13 08:06:29 +00:00
Clement Courbet	d939f6d013	[llvm-exegesis][NFC] Split BenchmarkRunner class Summary: The snippet-generation part goes to the SnippetGenerator class. This will allow benchmarking arbitrary code (see PR38437). Reviewers: gchatelet Subscribers: mgorny, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D51979 llvm-svn: 342117	2018-09-13 07:40:53 +00:00
John Brawn	8fc5ec78d5	[llvm-exegesis] Delegate the decision of cycle counter name to the target Currently the cycle counter is taken from the subtarget schedule model, which isn't any use if the subtarget doesn't have one. Delegate the decision to the target benchmark runner, as it may know better what to do in that case, with the default being the current behaviour. Differential Revision: https://reviews.llvm.org/D48779 llvm-svn: 336099	2018-07-02 13:14:49 +00:00
Clement Courbet	4860b98443	[llvm-exegesis] Get the BenchmarkRunner from the ExegesisTarget. Summary: This allows targets to override code generation for some instructions. As an example of override, this also moves ad-hoc instruction filtering for X86 into the X86 ExegesisTarget. Reviewers: gchatelet Subscribers: mgorny, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D48587 llvm-svn: 335582	2018-06-26 08:49:30 +00:00
Clement Courbet	44b4c54e26	Re-land r335038 "[llvm-exegesis] A mechanism to add target-specific functionality."" Fix typo: LLVM_NATIVE_ARCH -> LLVM_EXEGESIS_NATIVE_ARCH. llvm-svn: 335041	2018-06-19 11:28:59 +00:00
Clement Courbet	46751785ee	Revert r335038 "[llvm-exegesis] A mechanism to add target-specific functionality." Breaks buildbots. llvm-svn: 335040	2018-06-19 10:54:12 +00:00
Clement Courbet	6780b5f97d	[llvm-exegesis] A mechanism to add target-specific functionality. Summary: This is a step towards implementing memory operands and X87. Reviewers: gchatelet Subscribers: mgorny, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D48210 llvm-svn: 335038	2018-06-19 10:39:50 +00:00
Clement Courbet	e752fd65e8	[llvm-exegesis] Optionally ignore instructions without a sched class. Summary: See PR37602. Reviewers: RKSimon Subscribers: llvm-commits, tschuett Differential Revision: https://reviews.llvm.org/D48267 llvm-svn: 334932	2018-06-18 11:27:47 +00:00
Clement Courbet	4273e1e828	[llvm-exegesis] Print the whole snippet in analysis. Summary: On hover, the whole asm snippet is displayed, including operands. This requires the actual assembly output instead of just the MCInsts: This is because some pseudo-instructions get lowered to actual target instructions during codegen (e.g. ABS_Fp32 -> SSE or X87). Reviewers: gchatelet Subscribers: mgorny, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D48164 llvm-svn: 334805	2018-06-15 07:30:45 +00:00
Guillaume Chatelet	015b3e5be4	[llvm-exegesis] Fix unhandled error. Summary: Fixing an unhandled error when calling writeYaml. Reviewers: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D48022 llvm-svn: 334405	2018-06-11 14:10:10 +00:00
Guillaume Chatelet	6416592909	[llvm-exegesis] Program should succeed if benchmark returns StringError. Summary: Fix for https://bugs.llvm.org/show_bug.cgi?id=37759. Reviewers: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D48004 llvm-svn: 334395	2018-06-11 09:18:01 +00:00
Zachary Turner	1f67a3cba9	[FileSystem] Split up the OpenFlags enumeration. This breaks the OpenFlags enumeration into two separate enumerations: OpenFlags and CreationDisposition. The first controls the behavior of the API depending on whether or not the target file already exists, and is not a flags-based enum. The second controls more flags-like values. This yields a more easy to understand API, while also allowing flags to be passed to the openForRead api, where most of the values didn't make sense before. This also makes the apis more testable as it becomes easy to enumerate all the configurations which make sense, so I've added many new tests to exercise all the different values. llvm-svn: 334221	2018-06-07 19:58:58 +00:00
Guillaume Chatelet	b4f1582ac5	[llvm-exegesis] Make BenchmarkRunner handle multiple configurations. Summary: BenchmarkRunner subclasses can now create many configurations - although this patch still generates one. Reviewers: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D47877 llvm-svn: 334197	2018-06-07 14:00:29 +00:00
Guillaume Chatelet	8c91d4cb04	[llvm-exegesis] Improve error reporting. Summary: BenchmarkResult IO functions now return an Error or Expected so caller can deal take proper action. Reviewers: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D47868 llvm-svn: 334167	2018-06-07 07:51:16 +00:00
Clement Courbet	53d35d2dc4	[llvm-exegesis] Add instructions to BenchmarkResult Key. We want llvm-exegesis to explore instructions (effect of initial register values, effect of operand selection). To enable this a BenchmarkResult muststore all the relevant data in its key. This patch starts adding such data. Here we simply allow to store the generated instructions, following patches will add operands and initial values for registers. https://reviews.llvm.org/D47764 Authored by: Guilluame Chatelet llvm-svn: 334008	2018-06-05 10:56:19 +00:00

1 2

66 Commits