53 Commits

Author SHA1 Message Date
Sjoerd Meijer
6720ce75f6
[Docs][llvm-exegesis] Clarify AArch64 support (#114989)
Claiming AArch64 support for llvm-exegesis is a bit of a stretch in my
opinion as only a couple of opcodes with GPR64 operands will work for
snippet benchmarking, so I propose to clarify that AArch64 support is
very experimental. Also added some clarifications about its libpfm4
dependency.
2024-11-07 10:48:52 +00:00
Aiden Grossman
1d1186de34
[llvm-exegesis] Add loop-register snippet annotation (#82873)
This patch adds a LLVM-EXEGESIS-LOOP-REGISTER snippet annotation which
allows a user to specify the register to use for the loop counter in the
loop repetition mode. This allows for executing snippets that don't work
with the default value (currently R8 on X86).
2024-02-27 12:28:25 -08:00
Aiden Grossman
0c1f62073f
[Docs][llvm-exegesis] Add documentation on validation counters option (#82132)
This patch documents the --validation-counter flag.
2024-02-19 01:39:01 -08:00
Aiden Grossman
415bf200a7
[llvm-exegesis] Replace --num-repetitions with --min-instructions (#77153)
This patch replaces --num-repetitions with --min-instructions to make it
more clear that the value refers to the minimum number of instructions
in the final assembled snippet rather than the number of repetitions of
the snippet. This patch also refactors some llvm-exegesis internal
variable names to reflect the name change.

Fixes #76890.
2024-02-01 01:58:27 -08:00
Aiden Grossman
d8b61d7168
[llvm-exegesis] Add middle half repetition mode (#77020)
This patch adds two new repetition modes to llvm-exegesis, particularly
loop and duplicate repetition modes of what I am terming the middle half
repetition mode. The middle half repetition mode essentially runs each
measurement twice, one with twice the number of iterations of the other.
These two measurements are then agregated by taking their difference.
This subtracts away any setup/overhead that is unrelated to the code in
the snippet, providing more accurate results.

Using this mode on a couple toy examples, I am able to get exact
(integer) throughput values on all of them in contrast to the default
duplicate/loop repetition modes which show a little bit of noise on the
snippet value.
2024-01-30 12:42:35 -08:00
Aiden Grossman
c7c2bbba93 [Docs][llvm-exegesis] Minor adjustments for clarity
This patch makes minor adjustments to the llvm-exegesis docs for
clarity. Particularly, an update is made to the list of snippet
annotations to list the correct number of annotations that was not
updated when the docs were originally updated for the snippet address
annotation. In addition, this patch changes a decimal value for the
snippet memory annotation example for an explicit hex value to emphasize
that the LLVM-EXEGESIS-MEM-DEF annotation takes a hex value for the
memory value.
2023-12-30 09:55:46 -08:00
Aiden Grossman
46fe85446d
[Docs][llvm-exegesis] Add documentation on recently added options (#75408)
This patch adds information on the new LLVM-EXEGESIS-SNIPPET-ADDRESS
annotation and on the --benchmark-repeat-count flag.
2023-12-14 12:07:17 -08:00
Aiden Grossman
d8ab8b95ba [Docs][llvm-exegesis] Fix minor issues in llvm-exegesis docs 2023-11-16 23:41:16 -08:00
Aiden Grossman
43f4e2be89 [Docs][llvm-exegesis] Use double dash long options
Currently the llvm-exegesis docs use a mix of double dash and single
dash options with seemingly no pattern. This patch makes everything
double dash options as it has been suggested that we should be
advertising double dash long options exclusively in the documentation.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D157641
2023-08-11 23:17:07 +00:00
Aiden Grossman
3336836dc2 [Docs][llvm-exegesis] Add documentation for memory annotations
Reviewed By: gchatelet

Differential Revision: https://reviews.llvm.org/D151039
2023-07-18 10:23:07 -07:00
Aiden Grossman
05cca8a1bc [Docs][llvm-exegesis] Specify platform support for different modes
llvm-exegesis has both a capture mode and an analysis mode that can be
used independently of each other. This patch makes it clear that
analysis mode will work on other platforms that LLVM supports in the
documentation which was unclear before.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D150536
2023-05-16 03:28:06 +00:00
Aiden Grossman
9fe45fcba7 [Docs][llvm-exegesis] Specify supported platforms and architectures
Currently, there is no documentation on what platforms and architectures
llvm-exegesis is supported on. This patch adds in user-facing
documentation in the CommandGuide about what architectures are supported
as well as developer facing documentation detailing the technical
reasons for why certain platforms are supported and some aren't.

This is a follow-up after discussion in
https://discourse.llvm.org/t/clarification-on-platform-support-for-llvm-exegesis/70206.

Reviewed By: kpdev42

Differential Revision: https://reviews.llvm.org/D149378
2023-05-13 09:05:39 +00:00
Aiden Grossman
c19bc611bd [Docs][llvm-exegesis] Add documentation for --use-dummy-perf-counters
This patch adds in documentation for the --use-dummy-perf-counters
option (introduced in D146301).

Reviewed By: kpdev42

Differential Revision: https://reviews.llvm.org/D147842
2023-04-10 19:09:35 +00:00
Aiden Grossman
8d3a09daca [Docs][llvm-exegesis] Refactor snippet annotations in documentation
Currently, the llvm-exegesis documentation page has all
snippet annotation information under an example. This patch refactors
the annotation documentation to a separate section to make things more
clear and to make adding future annotations easier. This patch also
significantly expands the documentation on the memory scratch space to
which a pointer can be passed through a register as the documentation on
this was quite sparse previously.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D146890
2023-03-27 08:22:25 +00:00
Roman Lebedev
e0ad2af691
[exegesis] "Skip codegen" dry-run mode
While "skip measurements mode" is super useful for test coverage,
i've come to discover it's trade-offs. It still calls back-end
to actually codegen the target assembly, and that is what is taking
80%+ of the time regardless of whether or not we skip the measurements.

On the other hand, just being able to see that exegesis can come up
with a snippet to measure something, is already very useful,
and takes maybe a second for a all-opcode sweep.

Reviewed By: gchatelet

Differential Revision: https://reviews.llvm.org/D140702
2023-01-05 17:47:17 +03:00
Roman Lebedev
6a67b633b9
[exegesis] Analysis: filtering for benchmark results
By default, all benchmark results are analysed, but sometimes it may be useful
to only look at those that to not involve memory, or vice versa. This option
allows to either keep all benchmarks, or filter out (ignore) either all the
ones that do involve memory (involve instructions that may read or write to
memory), or the opposite, to only keep such benchmarks.

Personally, so far i have found the benchmarks that do involve memory
to have dubious results. But the ones that do not involve memory,
are generally actionable. So i would like to have a toggle to declutter results.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D140734
2023-01-04 21:16:11 +03:00
Simon Pilgrim
b723d5a625 [llvm-exegesis][x86] Add option to prevent use of xmm8-xmm15 upper SSE registers
Noticed while trying to use llvm-exegesis to get some accurate capture numbers on some old Atom/Silverment hardware as part of the work with D103695.

These targets' frontends are particularly poor and the use of the xmm8-xmm15 SSE registers results in longer instruction encodings which were affecting the latency/throughput estimates.

Thanks to @lebedev.ri for the --skip-measurements command line argument which made testing much easier!

Differential Revision: https://reviews.llvm.org/D138832
2022-12-07 17:54:09 +00:00
Roman Lebedev
7a76140220
[llvm-exegesis] Dry run mode
Sometimes we only want to ensure that we can produce snippets (all the way
through `SnippetRepetitor`!), but don't care for the execution.
E.g. all of our tests are this way.

I've built LLVM without PFM and removed my CPU from `X86PfmCounters.td`,
and this produces the expected results in that configuration.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D139448
2022-12-07 20:15:43 +03:00
Roman Lebedev
78eaff2ef8
[llvm-exegesis] Loop unrolling for loop snippet repetitor mode
I really needed this, like, factually, yesterday,
when verifying dependency breaking idioms for AMD Zen 3 scheduler model.

Consider the following example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=duplicate
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-4a7e50.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 0.31025, per_snippet_value: 0.31025 }
error:           ''
info:            ''
assembled_snippet: C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C3
...

```
What does it tell us?
So wait, it can only execute ~3 x86 AVX YMM PXOR zero-idioms per cycle?
That doesn't seem right. That's even less than there are pipes supporting this type of op.

Now, second example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-2418b5.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 1.00011, per_snippet_value: 1.00011 }
error:           ''
info:            ''
assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3
...
```
Now that's just worse. Due to the looping, the throughput completely plummeted,
and now we can only do a single instruction/cycle!?

That's not great.
And final example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop --loop-body-size=1000
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-c402e2.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 0.167087, per_snippet_value: 0.167087 }
error:           ''
info:            ''
assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3
...
```

So if we merge the previous two approaches, do duplicate this single-instruction snippet 1000x
(loop-body-size/instruction count in snippet), and run a loop with 1000 iterations
over that duplicated/unrolled snippet, the measured throughput goes through the roof,
up to 5.9 instructions/cycle, which finally tells us that this idiom is zero-cycle!

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D102522
2021-05-25 12:08:27 +03:00
Clement Courbet
a098f32a1f [llvm-exegesis][doc] Remove old FIXME.
This was fixed in a previous commit, the previous line in the
documentation explains how to proceed.
2020-10-28 10:53:23 +01:00
Clement Courbet
992da89450 [llvm-exegesis] Update doc.
We don't need an external script to scan all opcodes anymore, just use
`-opcode-index=-1`.
2020-10-28 08:42:38 +01:00
Simon Pilgrim
feb9d8bd8e Fix sphinx indentation warning.
Don't double indent and make it clear we're referting to the latency mode.
2020-08-04 15:57:46 +01:00
Vy Nguyen
ee7caa7593 Reland [llvm-exegesis] Add benchmark latency option on X86 that uses LBR for more precise measurements.
Starting with Skylake, the LBR contains the precise number of cycles between the two
        consecutive branches.
        Making use of this will hopefully make the measurements more precise than the
        existing methods of using RDTSC.

                Differential Revision: https://reviews.llvm.org/D77422

New change: check for existence of field `cycles` in perf_branch_entry before enabling this mode.
This should prevent compilation errors when building for older kernel whose headers don't support it.
2020-07-27 12:38:05 -04:00
Clement Courbet
6bddd099ac Revert "[llvm-exegesis] Add benchmark latency option on X86 that uses LBR for more precise measurements."
From @erichkeane:
```
This patch doesn't seem to build for me:
/iusers/ekeane1/workspaces/llvm-project/llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp: In function ‘llvm::Error llvm::exegesis::parseDataBuffer(const char*, size_t, const void*, const void*, llvm::SmallVector<long int, 4>*)’:
/iusers/ekeane1/workspaces/llvm-project/llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp:99:37: error: ‘struct perf_branch_entry’ has no member named ‘cycles’

CycleArray->push_back(Entry.cycles);
I'm on RHEL7, so I have kernel 3.10, so it doesn't have 'cycles'.

According ot this: https://elixir.bootlin.com/linux/v4.3/source/include/uapi/linux/perf_event.h#L963 kernel 4.3 is the first time that 'cycles' appeared in this structure.
```
2020-07-17 16:55:17 +02:00
Jinsong Ji
32d36d9edc [docs] fix ident in llvm-exegesis.rst 2020-07-16 17:30:09 +00:00
Vy Nguyen
1360e140cc [llvm-exegesis] Add benchmark latency option on X86 that uses LBR for more precise measurements.
Starting with Skylake, the LBR contains the precise number of cycles between the two
    consecutive branches.
    Making use of this will hopefully make the measurements more precise than the
    existing methods of using RDTSC.

            Differential Revision: https://reviews.llvm.org/D77422
2020-07-16 12:12:46 -04:00
Roman Lebedev
de22d7154b
[llvm-exegesis] 'Min' repetition mode
Summary:
As noted in documentation, different repetition modes have different trade-offs:

> .. option:: -repetition-mode=[duplicate|loop]
>
>  Specify the repetition mode. `duplicate` will create a large, straight line
>  basic block with `num-repetitions` copies of the snippet. `loop` will wrap
>  the snippet in a loop which will be run `num-repetitions` times. The `loop`
>  mode tends to better hide the effects of the CPU frontend on architectures
>  that cache decoded instructions, but consumes a register for counting
>  iterations.

Indeed. Example:

>>! In D74156#1873657, @lebedev.ri wrote:
> At least for `CMOV`, i'm seeing wildly different results
> |           | Latency | RThroughput |
> | duplicate | 1       | 0.8         |
> | loop      | 2       | 0.6         |
> where latency=1 seems correct, and i'd expect the througput to be close to 1/2 (since there are two execution units).

This isn't great for analysis, at least for schedule model development.

As discussed in excruciating detail in

>>! In D74156#1924514, @gchatelet wrote:
>>>! In D74156#1920632, @lebedev.ri wrote:
>> ... did that explanation of the question i'm having made any sense?
>
> Thx for digging in the conversation !
> Ok it makes more sense now.
>
> I discussed it a bit with @courbet:
>  - We want the analysis tool to stay simple so we'd rather not make it knowledgeable of the repetition mode.
>  - We'd like to still be able to select either repetition mode to dig into special cases
>
> So we could add a third `min` repetition mode that would run both and take the minimum. It could be the default option.
> Would you have some time to look what it would take to add this third mode?

there appears to be an agreement that it is indeed sub-par,
and that we should provide an optional, measurement (not analysis!) -time
way to rectify the situation.

However, the solutions isn't entirely straight-forward.

We can just add an actual 'multiplexer' `MinSnippetRepetitor`, because
if we just concatenate snippets produced by `DuplicateSnippetRepetitor`
and `LoopSnippetRepetitor` and run+measure that, the measurement will
naturally be different from what we'd get by running+measuring
them separately and taking the min.
([[ https://www.wolframalpha.com/input/?i=%28x%2By%29%2F2+%21%3D+min%28x%2C+y%29 | `time(D+L)/2 != min(time(D), time(L))` ]])

Also, it seems best to me to have a single snippet instead of generating
a snippet per repetition mode, since the only difference here is that the
loop repetition mode reserves one register for loop counter.

As far as i can tell, we can either teach `BenchmarkRunner::runConfiguration()`
to produce a single report given multiple repetitors (as in the patch),
or do that one layer higher - don't modify `BenchmarkRunner::runConfiguration()`,
produce multiple reports, don't actually print each one, but aggregate them somehow
and only print the final one.

Initially i've gone ahead with the latter approach, but it didn't look like a natural fit;
the former (as in the diff) does seem like a better fit to me.

There's also a question of the test coverage. It sure currently does work here:
```
$ ./bin/llvm-exegesis --opcode-name=CMOV64rr --mode=inverse_throughput --repetition-mode=duplicate
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-8fb949.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'CMOV64rr RAX RAX R11 i_0x0'
    - 'CMOV64rr RBP RBP R15 i_0x0'
    - 'CMOV64rr RBX RBX RBX i_0x0'
    - 'CMOV64rr RCX RCX RBX i_0x0'
    - 'CMOV64rr RDI RDI R10 i_0x0'
    - 'CMOV64rr RDX RDX RAX i_0x0'
    - 'CMOV64rr RSI RSI RAX i_0x0'
    - 'CMOV64rr R8 R8 R8 i_0x0'
    - 'CMOV64rr R9 R9 RDX i_0x0'
    - 'CMOV64rr R10 R10 RBX i_0x0'
    - 'CMOV64rr R11 R11 R14 i_0x0'
    - 'CMOV64rr R12 R12 R9 i_0x0'
    - 'CMOV64rr R13 R13 R12 i_0x0'
    - 'CMOV64rr R14 R14 R15 i_0x0'
    - 'CMOV64rr R15 R15 R13 i_0x0'
  config:          ''
  register_initial_values:
    - 'RAX=0x0'
    - 'R11=0x0'
    - 'EFLAGS=0x0'
    - 'RBP=0x0'
    - 'R15=0x0'
    - 'RBX=0x0'
    - 'RCX=0x0'
    - 'RDI=0x0'
    - 'R10=0x0'
    - 'RDX=0x0'
    - 'RSI=0x0'
    - 'R8=0x0'
    - 'R9=0x0'
    - 'R14=0x0'
    - 'R12=0x0'
    - 'R13=0x0'
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: inverse_throughput, value: 0.819, per_snippet_value: 12.285 }
error:           ''
info:            instruction has tied variables, using static renaming.
assembled_snippet: 5541574156415541545348B8000000000000000049BB00000000000000004883EC08C7042400000000C7442404000000009D48BD000000000000000049BF000000000000000048BB000000000000000048B9000000000000000048BF000000000000000049BA000000000000000048BA000000000000000048BE000000000000000049B8000000000000000049B9000000000000000049BE000000000000000049BC000000000000000049BD0000000000000000490F40C3490F40EF480F40DB480F40CB490F40FA480F40D0480F40F04D0F40C04C0F40CA4C0F40D34D0F40DE4D0F40E14D0F40EC4D0F40F74D0F40FD490F40C35B415C415D415E415F5DC3
...
$ ./bin/llvm-exegesis --opcode-name=CMOV64rr --mode=inverse_throughput --repetition-mode=loop
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-051eb3.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'CMOV64rr RAX RAX R11 i_0x0'
    - 'CMOV64rr RBP RBP RSI i_0x0'
    - 'CMOV64rr RBX RBX R9 i_0x0'
    - 'CMOV64rr RCX RCX RSI i_0x0'
    - 'CMOV64rr RDI RDI RBP i_0x0'
    - 'CMOV64rr RDX RDX R9 i_0x0'
    - 'CMOV64rr RSI RSI RDI i_0x0'
    - 'CMOV64rr R9 R9 R12 i_0x0'
    - 'CMOV64rr R10 R10 R11 i_0x0'
    - 'CMOV64rr R11 R11 R9 i_0x0'
    - 'CMOV64rr R12 R12 RBP i_0x0'
    - 'CMOV64rr R13 R13 RSI i_0x0'
    - 'CMOV64rr R14 R14 R14 i_0x0'
    - 'CMOV64rr R15 R15 R10 i_0x0'
  config:          ''
  register_initial_values:
    - 'RAX=0x0'
    - 'R11=0x0'
    - 'EFLAGS=0x0'
    - 'RBP=0x0'
    - 'RSI=0x0'
    - 'RBX=0x0'
    - 'R9=0x0'
    - 'RCX=0x0'
    - 'RDI=0x0'
    - 'RDX=0x0'
    - 'R12=0x0'
    - 'R10=0x0'
    - 'R13=0x0'
    - 'R14=0x0'
    - 'R15=0x0'
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: inverse_throughput, value: 0.6083, per_snippet_value: 8.5162 }
error:           ''
info:            instruction has tied variables, using static renaming.
assembled_snippet: 5541574156415541545348B8000000000000000049BB00000000000000004883EC08C7042400000000C7442404000000009D48BD000000000000000048BE000000000000000048BB000000000000000049B9000000000000000048B9000000000000000048BF000000000000000048BA000000000000000049BC000000000000000049BA000000000000000049BD000000000000000049BE000000000000000049BF000000000000000049B80200000000000000490F40C3480F40EE490F40D9480F40CE480F40FD490F40D1480F40F74D0F40CC4D0F40D34D0F40D94C0F40E54C0F40EE4D0F40F64D0F40FA4983C0FF75C25B415C415D415E415F5DC3
...
$ ./bin/llvm-exegesis --opcode-name=CMOV64rr --mode=inverse_throughput --repetition-mode=min
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-c7a47d.o
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-2581f1.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'CMOV64rr RAX RAX R11 i_0x0'
    - 'CMOV64rr RBP RBP R10 i_0x0'
    - 'CMOV64rr RBX RBX R10 i_0x0'
    - 'CMOV64rr RCX RCX RDX i_0x0'
    - 'CMOV64rr RDI RDI RAX i_0x0'
    - 'CMOV64rr RDX RDX R9 i_0x0'
    - 'CMOV64rr RSI RSI RAX i_0x0'
    - 'CMOV64rr R9 R9 RBX i_0x0'
    - 'CMOV64rr R10 R10 R12 i_0x0'
    - 'CMOV64rr R11 R11 RDI i_0x0'
    - 'CMOV64rr R12 R12 RDI i_0x0'
    - 'CMOV64rr R13 R13 RDI i_0x0'
    - 'CMOV64rr R14 R14 R9 i_0x0'
    - 'CMOV64rr R15 R15 RBP i_0x0'
  config:          ''
  register_initial_values:
    - 'RAX=0x0'
    - 'R11=0x0'
    - 'EFLAGS=0x0'
    - 'RBP=0x0'
    - 'R10=0x0'
    - 'RBX=0x0'
    - 'RCX=0x0'
    - 'RDX=0x0'
    - 'RDI=0x0'
    - 'R9=0x0'
    - 'RSI=0x0'
    - 'R12=0x0'
    - 'R13=0x0'
    - 'R14=0x0'
    - 'R15=0x0'
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: inverse_throughput, value: 0.6073, per_snippet_value: 8.5022 }
error:           ''
info:            instruction has tied variables, using static renaming.
assembled_snippet: 5541574156415541545348B8000000000000000049BB00000000000000004883EC08C7042400000000C7442404000000009D48BD000000000000000049BA000000000000000048BB000000000000000048B9000000000000000048BA000000000000000048BF000000000000000049B9000000000000000048BE000000000000000049BC000000000000000049BD000000000000000049BE000000000000000049BF0000000000000000490F40C3490F40EA490F40DA480F40CA480F40F8490F40D1480F40F04C0F40CB4D0F40D44C0F40DF4C0F40E74C0F40EF4D0F40F14C0F40FD490F40C3490F40EA5B415C415D415E415F5DC35541574156415541545348B8000000000000000049BB00000000000000004883EC08C7042400000000C7442404000000009D48BD000000000000000049BA000000000000000048BB000000000000000048B9000000000000000048BA000000000000000048BF000000000000000049B9000000000000000048BE000000000000000049BC000000000000000049BD000000000000000049BE000000000000000049BF000000000000000049B80200000000000000490F40C3490F40EA490F40DA480F40CA480F40F8490F40D1480F40F04C0F40CB4D0F40D44C0F40DF4C0F40E74C0F40EF4D0F40F14C0F40FD4983C0FF75C25B415C415D415E415F5DC3
...
```
but i open to suggestions as to how test that.

I also have gone with the suggestion to default to this new mode.
This was irking me for some time, so i'm happy to finally see progress here.
Looking forward to feedback.

Reviewers: courbet, gchatelet

Reviewed By: courbet, gchatelet

Subscribers: mstojanovic, RKSimon, llvm-commits, courbet, gchatelet

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76921
2020-04-02 09:28:35 +03:00
Roman Lebedev
cc5549dbc2
[NFC][llvm-exegesis] Docs/help: opcode-index=-1 means measure everything 2020-02-13 12:46:12 +03:00
Clement Courbet
89a66474b6 [llvm-exegesis] Document repetition-mode.
Reviewers: gchatelet

Subscribers: tschuett, mstojanovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74114
2020-02-06 13:42:12 +01:00
Clement Courbet
2cd0f28959 [llvm-exegesis] Add options to SnippetGenerator.
Summary:
This adds a `-max-configs-per-opcode` option to limit the number of
configs per opcode.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68642

llvm-svn: 374054
2019-10-08 14:30:24 +00:00
Alex Brachet
fa9d232e43 [docs] [NFC] Removed excess spacing
Summary: Removed excess new lines from documentations. As far as I can tell, it seems as though restructured text is agnostic to new lines, the use of new lines was inconsistent and had no effect on how the files were being displayed.

Reviewers: jhenderson, rupprecht, JDevlieghere

Reviewed By: jhenderson

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63971

llvm-svn: 365105
2019-07-04 04:41:06 +00:00
James Henderson
a056684c33 [docs][tools] Add missing "program" tags to rst files
Sphinx allows for definitions of command-line options using
`.. option <name>` and references to those options via `:option:<name>`.
However, it looks like there is no scoping of these options by default,
meaning that links can end up pointing to incorrect documents. See for
example the llvm-mca document, which contains references to -o that,
prior to this patch, pointed to a different document. What's worse is
that these links appear to be non-deterministic in which one is picked
(on my machine, some references end up pointing to opt, whereas on the
live docs, they point to llvm-dwarfdump, for example).

The fix is to add the .. program <name> tag. This essentially namespaces
the options (definitions and references) to the named program, ensuring
that the links are kept correct.

Reviwed by: andreadb

Differential Revision: https://reviews.llvm.org/D63873

llvm-svn: 364538
2019-06-27 13:24:46 +00:00
Guillaume Chatelet
848df5b509 Add an option do not dump the generated object on disk
Reviewers: courbet

Subscribers: llvm-commits, bdb

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60317

llvm-svn: 357769
2019-04-05 15:18:59 +00:00
Roman Lebedev
c2423fe689 [llvm-exegesis] Introduce a 'naive' clustering algorithm (PR40880)
Summary:
This is an alternative to D59539.

Let's suppose we have measured 4 different opcodes, and got: `0.5`, `1.0`, `1.5`, `2.0`.
Let's suppose we are using `-analysis-clustering-epsilon=0.5`.
By default now we will start processing the `0.5` point, find that `1.0` is it's neighbor, add them to a new cluster.
Then we will notice that `1.5` is a neighbor of `1.0` and add it to that same cluster.
Then we will notice that `2.0` is a neighbor of `1.5` and add it to that same cluster.
So all these points ended up in the same cluster.
This may or may not be a correct implementation of dbscan clustering algorithm.

But this is rather horribly broken for the reasons of comparing the clusters with the LLVM sched data.
Let's suppose all those opcodes are currently in the same sched cluster.
If i specify `-analysis-inconsistency-epsilon=0.5`, then no matter
the LLVM values this cluster will **never** match the LLVM values,
and thus this cluster will **always** be displayed as inconsistent.

The solution is obviously to split off some of these opcodes into different sched cluster.
But how do i do that? Out of 4 opcodes displayed in the inconsistency report,
which ones are the "bad ones"? Which ones are the most different from the checked-in data?
I'd need to go in to the `.yaml` and look it up manually.

The trivial solution is to, when creating clusters, don't use the full dbscan algorithm,
but instead "pick some unclustered point, pick all unclustered points that are it's neighbor,
put them all into a new cluster, repeat". And just so as it happens, we can arrive
at that algorithm by not performing the "add neighbors of a neighbor to the cluster" step.

But that won't work well once we teach analyze mode to operate in on-1D mode
(i.e. on more than a single measurement type at a time), because the clustering would
depend on the order of the measurements.

Instead, let's just create a single cluster per opcode, and put all the points of that opcode into said cluster.
And simultaneously check that every point in that cluster is a neighbor of every other point in the cluster,
and if they are not, the cluster (==opcode) is unstable.

This is //yet another// step to bring me closer to being able to continue cleanup of bdver2 sched model..

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40880 | PR40880 ]].

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, jdoerfert, RKSimon, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59820

llvm-svn: 357152
2019-03-28 08:55:01 +00:00
Roman Lebedev
542e5d7bb5 [llvm-exegesis] Split Epsilon param into two (PR40787)
Summary:
This eps param is used for two distinct things:
* initial point clusterization
* checking clusters against the llvm values

What if one wants to only look at highly different clusters, without changing
the clustering itself? In particular, this helps to weed out noisy measurements
(since the clusterization epsilon is still small, so there is a better chance
that noisy measurements from the same opcode will go into different clusters)

By splitting it into two params it is now possible.

This is nearly-free performance-wise:
Old:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 10099 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs):

            390.01 msec task-clock                #    0.998 CPUs utilized            ( +-  0.25% )
                12      context-switches          #   31.735 M/sec                    ( +- 27.38% )
                 0      cpu-migrations            #    0.000 K/sec
              4745      page-faults               # 12183.732 M/sec                   ( +-  0.54% )
        1562711900      cycles                    # 4012303.327 GHz                   ( +-  0.24% )  (82.90%)
         185567822      stalled-cycles-frontend   #   11.87% frontend cycles idle     ( +-  0.52% )  (83.30%)
         392106234      stalled-cycles-backend    #   25.09% backend cycles idle      ( +-  1.31% )  (33.79%)
        1839236666      instructions              #    1.18  insn per cycle
                                                  #    0.21  stalled cycles per insn  ( +-  0.15% )  (50.37%)
         407035764      branches                  # 1045074878.710 M/sec              ( +-  0.12% )  (66.80%)
          10896459      branch-misses             #    2.68% of all branches          ( +-  0.17% )  (83.20%)

          0.390629 +- 0.000972 seconds time elapsed  ( +-  0.25% )
```
```
$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 50572 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs):

           6803.36 msec task-clock                #    0.999 CPUs utilized            ( +-  0.96% )
               262      context-switches          #   38.546 M/sec                    ( +- 23.06% )
                 0      cpu-migrations            #    0.065 M/sec                    ( +- 76.03% )
             13287      page-faults               # 1953.206 M/sec                    ( +-  0.32% )
       27252537904      cycles                    # 4006024.257 GHz                   ( +-  0.95% )  (83.31%)
        1496314935      stalled-cycles-frontend   #    5.49% frontend cycles idle     ( +-  0.97% )  (83.32%)
       16128404524      stalled-cycles-backend    #   59.18% backend cycles idle      ( +-  0.30% )  (33.37%)
       17611143370      instructions              #    0.65  insn per cycle
                                                  #    0.92  stalled cycles per insn  ( +-  0.05% )  (50.04%)
        3894906599      branches                  # 572537147.437 M/sec               ( +-  0.03% )  (66.69%)
         116314514      branch-misses             #    2.99% of all branches          ( +-  0.20% )  (83.35%)

            6.8118 +- 0.0689 seconds time elapsed  ( +-  1.01%)
```
New:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 10099 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
...
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (25 runs):

            400.14 msec task-clock                #    0.998 CPUs utilized            ( +-  0.66% )
                12      context-switches          #   29.429 M/sec                    ( +- 25.95% )
                 0      cpu-migrations            #    0.100 M/sec                    ( +-100.00% )
              4714      page-faults               # 11796.496 M/sec                   ( +-  0.55% )
        1603131306      cycles                    # 4011840.105 GHz                   ( +-  0.66% )  (82.85%)
         199538509      stalled-cycles-frontend   #   12.45% frontend cycles idle     ( +-  2.40% )  (83.10%)
         402249109      stalled-cycles-backend    #   25.09% backend cycles idle      ( +-  1.19% )  (34.05%)
        1847783963      instructions              #    1.15  insn per cycle
                                                  #    0.22  stalled cycles per insn  ( +-  0.18% )  (50.64%)
         407162722      branches                  # 1018925730.631 M/sec              ( +-  0.12% )  (67.02%)
          10932779      branch-misses             #    2.69% of all branches          ( +-  0.51% )  (83.28%)

           0.40077 +- 0.00267 seconds time elapsed  ( +-  0.67% )

lebedevri@pini-pini:/build/llvm-build-Clang-release$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 50572 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
...
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (9 runs):

           6947.79 msec task-clock                #    1.000 CPUs utilized            ( +-  0.90% )
               217      context-switches          #   31.236 M/sec                    ( +- 36.16% )
                 1      cpu-migrations            #    0.096 M/sec                    ( +- 50.00% )
             13258      page-faults               # 1908.389 M/sec                    ( +-  0.34% )
       27830796523      cycles                    # 4006032.286 GHz                   ( +-  0.89% )  (83.30%)
        1504554006      stalled-cycles-frontend   #    5.41% frontend cycles idle     ( +-  2.10% )  (83.32%)
       16716574843      stalled-cycles-backend    #   60.07% backend cycles idle      ( +-  0.65% )  (33.38%)
       17755545931      instructions              #    0.64  insn per cycle
                                                  #    0.94  stalled cycles per insn  ( +-  0.09% )  (50.04%)
        3897255686      branches                  # 560980426.597 M/sec               ( +-  0.06% )  (66.70%)
         117045395      branch-misses             #    3.00% of all branches          ( +-  0.47% )  (83.34%)

            6.9507 +- 0.0627 seconds time elapsed  ( +-  0.90% )
```

I.e. it's +2.6% slowdown for one whole sweep, or +2% for 5 whole sweeps.
Within noise i'd say.

Should help with [[ https://bugs.llvm.org/show_bug.cgi?id=40787 | PR40787 ]].

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, RKSimon, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58476

llvm-svn: 354767
2019-02-25 09:36:12 +00:00
Roman Lebedev
69716394f3 [llvm-exegesis] Opcode stabilization / reclusterization (PR40715)
Summary:
Given an instruction `Opcode`, we can make benchmarks (measurements) of the
instruction characteristics/performance. Then, to facilitate further analysis
we group the benchmarks with *similar* characteristics into clusters.
Now, this is all not entirely deterministic. Some instructions have variable
characteristics, depending on their arguments. And thus, if we do several
benchmarks of the same instruction `Opcode`, we may end up with *different*
performance characteristics measurements. And when we then do clustering,
these several benchmarks of the same instruction `Opcode` may end up being
clustered into *different* clusters. This is not great for further analysis.

We shall find every `Opcode` with benchmarks not in just one cluster, and move
*all* the benchmarks of said `Opcode` into one new unstable cluster per `Opcode`.

I have solved this by making `ClusterId` a bit field, adding a `IsUnstable` bit,
and introducing `-analysis-display-unstable-clusters` switch to toggle between
displaying stable-only clusters and unstable-only clusters.

The reclusterization is deterministically stable, produces identical reports
between runs. (Or at least that is what i'm seeing, maybe it isn't)

Timings/comparisons:
old (current trunk/head) {F8303582}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs):

           6624.73 msec task-clock                #    0.999 CPUs utilized            ( +-  0.53% )
               172      context-switches          #   25.965 M/sec                    ( +- 29.89% )
                 0      cpu-migrations            #    0.042 M/sec                    ( +- 56.54% )
             31073      page-faults               # 4690.754 M/sec                    ( +-  0.08% )
       26538711696      cycles                    # 4006230.292 GHz                   ( +-  0.53% )  (83.31%)
        2017496807      stalled-cycles-frontend   #    7.60% frontend cycles idle     ( +-  0.93% )  (83.32%)
       13403650062      stalled-cycles-backend    #   50.51% backend cycles idle      ( +-  0.33% )  (33.37%)
       19770706799      instructions              #    0.74  insn per cycle
                                                  #    0.68  stalled cycles per insn  ( +-  0.04% )  (50.04%)
        4419821812      branches                  # 667207369.714 M/sec               ( +-  0.03% )  (66.69%)
         121741669      branch-misses             #    2.75% of all branches          ( +-  0.28% )  (83.34%)

            6.6283 +- 0.0358 seconds time elapsed  ( +-  0.54% )
```

patch, with reclustering but without filtering (i.e. outputting all the stable *and* unstable clusters) {F8303586}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html' (25 runs):

           6475.29 msec task-clock                #    0.999 CPUs utilized            ( +-  0.31% )
               213      context-switches          #   32.952 M/sec                    ( +- 23.81% )
                 1      cpu-migrations            #    0.130 M/sec                    ( +- 43.84% )
             31287      page-faults               # 4832.057 M/sec                    ( +-  0.08% )
       25939086577      cycles                    # 4006160.279 GHz                   ( +-  0.31% )  (83.31%)
        1958812858      stalled-cycles-frontend   #    7.55% frontend cycles idle     ( +-  0.68% )  (83.32%)
       13218961512      stalled-cycles-backend    #   50.96% backend cycles idle      ( +-  0.29% )  (33.37%)
       19752995402      instructions              #    0.76  insn per cycle
                                                  #    0.67  stalled cycles per insn  ( +-  0.04% )  (50.04%)
        4417079244      branches                  # 682195472.305 M/sec               ( +-  0.03% )  (66.70%)
         121510065      branch-misses             #    2.75% of all branches          ( +-  0.19% )  (83.34%)

            6.4832 +- 0.0229 seconds time elapsed  ( +-  0.35% )
```
Funnily, *this* measurement shows that said reclustering actually improved performance.

patch, with reclustering, only the stable clusters {F8303594}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html' (25 runs):

           6387.71 msec task-clock                #    0.999 CPUs utilized            ( +-  0.13% )
               133      context-switches          #   20.792 M/sec                    ( +- 23.39% )
                 0      cpu-migrations            #    0.063 M/sec                    ( +- 61.24% )
             31318      page-faults               # 4903.256 M/sec                    ( +-  0.08% )
       25591984967      cycles                    # 4006786.266 GHz                   ( +-  0.13% )  (83.31%)
        1881234904      stalled-cycles-frontend   #    7.35% frontend cycles idle     ( +-  0.25% )  (83.33%)
       13209749965      stalled-cycles-backend    #   51.62% backend cycles idle      ( +-  0.16% )  (33.36%)
       19767554347      instructions              #    0.77  insn per cycle
                                                  #    0.67  stalled cycles per insn  ( +-  0.04% )  (50.03%)
        4417480305      branches                  # 691618858.046 M/sec               ( +-  0.03% )  (66.68%)
         118676358      branch-misses             #    2.69% of all branches          ( +-  0.07% )  (83.33%)

            6.3954 +- 0.0118 seconds time elapsed  ( +-  0.18% )
```
Performance improved even further?! Makes sense i guess, less clusters to print.

patch, with reclustering, only the unstable clusters {F8303601}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters' (25 runs):

           6124.96 msec task-clock                #    1.000 CPUs utilized            ( +-  0.20% )
               194      context-switches          #   31.709 M/sec                    ( +- 20.46% )
                 0      cpu-migrations            #    0.039 M/sec                    ( +- 49.77% )
             31413      page-faults               # 5129.261 M/sec                    ( +-  0.06% )
       24536794267      cycles                    # 4006425.858 GHz                   ( +-  0.19% )  (83.31%)
        1676085087      stalled-cycles-frontend   #    6.83% frontend cycles idle     ( +-  0.46% )  (83.32%)
       13035595603      stalled-cycles-backend    #   53.13% backend cycles idle      ( +-  0.16% )  (33.36%)
       18260877653      instructions              #    0.74  insn per cycle
                                                  #    0.71  stalled cycles per insn  ( +-  0.05% )  (50.03%)
        4112411983      branches                  # 671484364.603 M/sec               ( +-  0.03% )  (66.68%)
         114066929      branch-misses             #    2.77% of all branches          ( +-  0.11% )  (83.32%)

            6.1278 +- 0.0121 seconds time elapsed  ( +-  0.20% )
```
This tells us that the actual `-analysis-inconsistencies-output-file=` outputting only takes ~0.4 sec for 43970 benchmark points (3 whole sweeps)
(Also, wow this is fast, it used to take several minutes originally)

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40715 | PR40715 ]].

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, jdoerfert, llvm-commits, RKSimon

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58355

llvm-svn: 354441
2019-02-20 09:14:04 +00:00
Guillaume Chatelet
bd604e011f [llvm-exegesis] [NFC] Fixing typo.
Reviewers: courbet, gchatelet

Reviewed By: courbet, gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54895

llvm-svn: 354250
2019-02-18 10:08:20 +00:00
Roman Lebedev
21193f4b7e [llvm-exegesis] Don't default to running&dumping all analyses to '-'
Summary:
Up until the point i have looked in the source, i didn't even understood that
i can disable 'cluster' output. I have always silenced it via ` &> /dev/null`.
(And hoped it wasn't contributing much of the run time.)

While i expect that it has it's use-cases i never once needed it so far.
If i forget to silence it, console is completely flooded with that output.

How about not expecting users to opt-out of analyses,
but to explicitly specify the analyses that should be performed?

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, RKSimon, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57648

llvm-svn: 353021
2019-02-04 09:12:08 +00:00
Clement Courbet
362653f7af [llvm-exegesis] Add throughput mode.
Summary:
This just uses the latency benchmark runner on the parallel uops snippet
generator.

Fixes PR37698.

Reviewers: gchatelet

Subscribers: tschuett, RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D57000

llvm-svn: 352632
2019-01-30 16:02:20 +00:00
Clement Courbet
41c8af3924 [MCSched] Bind PFM Counters to the CPUs instead of the SchedModel.
Summary:
The pfm counters are now in the ExegesisTarget rather than the
MCSchedModel (PR39165).

This also compresses the pfm counter tables (PR37068).

Reviewers: RKSimon, gchatelet

Subscribers: mgrang, llvm-commits

Differential Revision: https://reviews.llvm.org/D52932

llvm-svn: 345243
2018-10-25 07:44:01 +00:00
Clement Courbet
f973c2df9d [llvm-exegesis] Allow measuring several instructions in a single run.
Summary:
We try to recover gracefully on instructions that would crash the
program.

This includes some refactoring of runMeasurement() implementations.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D53371

llvm-svn: 344695
2018-10-17 15:04:15 +00:00
Simon Pilgrim
c4976f6b9f The llvm-exegesis output file is a html file not a txt file.
llvm-svn: 343215
2018-09-27 13:49:52 +00:00
Clement Courbet
86ecf46fb4 [llvm-exegesis] Fix doc in r342947.
llvm-exegesis.rst was using invalid indentation for bullet points.

llvm-svn: 342948
2018-09-25 07:48:38 +00:00
Clement Courbet
78b2e73d15 [llvm-exegesis] Allow benchmarking arbitrary code snippets.
Summary:

This is a step towards fixing PR38048.

Note that right now the measurements are given per instruction. We'll
need to give measurements a per code snippet and update the analysis (PR38731).

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D52041

llvm-svn: 342947
2018-09-25 07:31:44 +00:00
Simon Pilgrim
a5638431dc [docs] Fix indentation of llvm-exegesis command line arguments
llvm-svn: 334976
2018-06-18 20:05:02 +00:00
Clement Courbet
e752fd65e8 [llvm-exegesis] Optionally ignore instructions without a sched class.
Summary: See PR37602.

Reviewers: RKSimon

Subscribers: llvm-commits, tschuett

Differential Revision: https://reviews.llvm.org/D48267

llvm-svn: 334932
2018-06-18 11:27:47 +00:00
Clement Courbet
6eb680a40d [llvm-exegesis] Fix off-by-one in llvm-exegesis documentation.
llvm-svn: 333759
2018-06-01 14:49:06 +00:00
Clement Courbet
2637e5f828 [llvm-exegesis] Show sched class details in analysis.
Summary: And update docs.

Reviewers: gchatelet

Subscribers: tschuett, craig.topper, RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D47254

llvm-svn: 333169
2018-05-24 10:47:05 +00:00
Clement Courbet
488ebfb732 [llvm-exegesis] Update doc to mention that the output is in html.
llvm-svn: 332980
2018-05-22 13:36:29 +00:00
Clement Courbet
5ec03cdaa3 [llvm-exegesis] Improve documentation.
Summary:
- Better flag names.
- Fix flag reference in doc.
- Add usage examples in doc.

Fixes PR37497.

Reviewers: gchatelet

Subscribers: llvm-commits, tschuett

Differential Revision: https://reviews.llvm.org/D47015

llvm-svn: 332708
2018-05-18 12:33:57 +00:00