
llvm-profdata - Profile data tool
=================================


.. program:: llvm-profdata

SYNOPSIS
--------

:program:`llvm-profdata` *command* [*args...*]


DESCRIPTION
-----------

The :program:`llvm-profdata` tool is a small utility for working with profile
data files.


COMMANDS
--------

* :ref:`merge <profdata-merge>`
* :ref:`show <profdata-show>`
* :ref:`overlap <profdata-overlap>`
* :ref:`order <profdata-order>`


.. program:: llvm-profdata merge

.. _profdata-merge:

MERGE
-----

SYNOPSIS
^^^^^^^^

:program:`llvm-profdata merge` [*options*] [*filename...*]


DESCRIPTION
^^^^^^^^^^^

:program:`llvm-profdata merge` takes several profile data files
generated by PGO instrumentation and merges them together into a single
indexed profile data file.


By default, profile data is merged without modification. This means that the
relative importance of each input file is proportional to the number of samples
or counts it contains. In general, the input from a longer training run will be
interpreted as relatively more important than a shorter run. Depending on the
nature of the training runs, it may be useful to adjust the weight given to each
input file by using the ``--weighted-input`` option.

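
The effect of weighting can be pictured with a small Python sketch. It models a profile as a hypothetical dictionary mapping a function name to a list of counter values, multiplies every counter of an input by that input's weight, and sums counters position by position. This is an illustrative model under those assumptions, not llvm-profdata's internal data structures or merge code; ``merge_weighted`` is a made-up name.

```python
from collections import defaultdict


def merge_weighted(inputs):
    """Merge (weight, profile) pairs into one profile.

    Hypothetical model: a profile maps a function name to a list of counters,
    and counters of the same function line up position by position.
    """
    merged = defaultdict(list)
    for weight, profile in inputs:
        if not isinstance(weight, int) or weight < 1:
            raise ValueError("weight must be an integer >= 1")
        for func, counts in profile.items():
            acc = merged[func]
            # Pad the accumulator if this input has more counters than seen so far.
            acc.extend([0] * (len(counts) - len(acc)))
            for i, count in enumerate(counts):
                acc[i] += weight * count
    return dict(merged)


short_run = {"foo": [1, 2]}
long_run = {"foo": [10, 20], "bar": [5]}
print(merge_weighted([(10, short_run), (1, long_run)]))
# {'foo': [20, 40], 'bar': [5]}
```

In this model, passing the same file twice with weight 1 gives the same result as passing it once with weight 2.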

Profiles passed in via ``--weighted-input``, ``--input-files``, or via positional
arguments are processed once for each time they are seen, so an input that is
listed twice contributes its counts twice.


OPTIONS
^^^^^^^

.. option:: --help

 Print a summary of command line options.


.. option:: --output=<output>, -o

 Specify the output file name. *Output* cannot be ``-`` as the resulting
 indexed profile data can't be written to standard output.


.. option:: --weighted-input=<weight,filename>

 Specify an input file name along with a weight. The profile counts of the
 supplied ``filename`` will be scaled (multiplied) by the supplied
 ``weight``, where ``weight`` is a decimal integer >= 1.
 Input files specified without using this option are assigned a default
 weight of 1. Examples are shown below.


.. option:: --input-files=<path>, -f

 Specify a file which contains a list of files to merge. The entries in this
 file are newline-separated. Lines starting with ``#`` are skipped. Entries may
 be of the form ``<filename>`` or ``<weight>,<filename>``.

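
A minimal Python sketch of reading such a list file follows. It assumes that surrounding whitespace is trimmed, that blank lines are ignored, and that a weighted entry is split at its first comma; those details, and the ``parse_input_files`` name, are assumptions rather than documented behavior.

```python
def parse_input_files(text):
    """Parse the contents of an --input-files list into (weight, filename) pairs.

    Assumptions: whitespace around each entry is trimmed, blank lines are
    ignored, and a weighted entry is split at its first comma.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # '#' lines are comments
        if "," in line:
            weight, filename = line.split(",", 1)
            entries.append((int(weight), filename))
        else:
            entries.append((1, line))  # default weight of 1
    return entries


listing = """# nightly training runs
10,foo.profdata
bar.profdata
"""
print(parse_input_files(listing))
# [(10, 'foo.profdata'), (1, 'bar.profdata')]
```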

.. option:: --remapping-file=<path>, -r

 Specify a file which contains a remapping from symbol names in the input
 profile to the symbol names that should be used in the output profile. The
 file should consist of lines of the form ``<input-symbol> <output-symbol>``.
 Blank lines and lines starting with ``#`` are skipped.

 The :doc:`llvm-cxxmap <llvm-cxxmap>` tool can be used to generate the symbol
 remapping file.

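
The remapping file format is simple enough to sketch in a few lines of Python. This hypothetical parser builds a lookup table and rejects lines that do not contain exactly two whitespace-separated fields; the exact error handling in llvm-profdata is not described here, so that part, the symbol names in the example, and the ``parse_remapping``/``remap_name`` helpers are assumptions.

```python
def parse_remapping(text):
    """Build an input-symbol -> output-symbol table from remapping-file text."""
    table = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are skipped
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected '<input-symbol> <output-symbol>'")
        table[fields[0]] = fields[1]
    return table


def remap_name(name, table):
    """Names without an entry keep their original spelling (assumption)."""
    return table.get(name, name)


table = parse_remapping("# old -> new\n\n_Z3fooi _Z3fool\n")
print(remap_name("_Z3fooi", table), remap_name("main", table))  # _Z3fool main
```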

.. option:: --instr (default)

 Specify that the input profile is an instrumentation-based profile.

|
2015-05-28 21:57:17 +00:00
|
|
|
|
2021-12-30 10:37:17 -08:00
|
|
|
.. option:: --sample
|
2015-05-28 21:57:17 +00:00
|
|
|
|
2015-11-24 20:48:25 +00:00
|
|
|
Specify that the input profile is a sample-based profile.
|
2021-11-15 09:17:08 +08:00
|
|
|
|
2015-11-24 20:48:25 +00:00
|
|
|
The format of the generated file can be generated in one of three ways:

.. option:: --binary (default)

 Emit the profile using a binary encoding. For instrumentation-based profiles,
 the output format is the indexed binary format.


.. option:: --extbinary

 Emit the profile using an extensible binary encoding. This option can only
 be used with sample-based profiles. The extensible binary encoding can be
 more compact with compression enabled and can be loaded faster than the
 default binary encoding.


.. option:: --text

 Emit the profile in text mode. This option can be used with both
 sample-based and instrumentation-based profiles. When this option is used,
 the profile is dumped in the text format that is parsable by the profile
 reader.


.. option:: --gcc

 Emit the profile using GCC's gcov format (not yet supported).


.. option:: --sparse[=true|false]

 Do not emit function records with 0 execution count. Can only be used in
 conjunction with ``--instr``. Defaults to false, since it can inhibit compiler
 optimization during PGO.

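
As a sketch of what ``--sparse`` removes, the Python function below treats a function record as having a zero execution count when all of its counters are zero. The dictionary-of-lists profile model and the ``sparse_filter`` name are assumptions for illustration.

```python
def sparse_filter(profile):
    """Drop records whose counters are all zero (hypothetical profile model)."""
    return {name: counts for name, counts in profile.items() if any(counts)}


print(sparse_filter({"never_called": [0, 0], "called": [4, 0]}))
# {'called': [4, 0]}
```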

.. option:: --num-threads=<N>, -j

 Use N threads to perform profile merging. When N=0, llvm-profdata auto-detects
 an appropriate number of threads to use. This is the default.


.. option:: --failure-mode=[any|all]

 Set the failure mode. There are two options: ``any`` causes the merge command
 to fail if any profiles are invalid, and ``all`` causes the merge command to
 fail only if all profiles are invalid. If ``all`` is set, information from any
 invalid profiles is excluded from the final merged product. The default
 failure mode is ``any``.

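
The difference between the two failure modes can be sketched as follows. Each input is a hypothetical ``(filename, profile)`` pair where ``None`` stands for a profile that failed to read, and the made-up ``select_inputs`` function returns the profiles that would go on to be merged. This models only the policy described above, not llvm-profdata's error reporting.

```python
def select_inputs(results, mode="any"):
    """Return the profiles to merge under a --failure-mode style policy.

    results: list of (filename, profile) pairs; profile is None when the
    input could not be read (hypothetical model).
    """
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    valid = [profile for _, profile in results if profile is not None]
    invalid = [name for name, profile in results if profile is None]
    if mode == "any" and invalid:
        # 'any': a single invalid input fails the whole merge.
        raise RuntimeError("invalid profile(s): " + ", ".join(invalid))
    if mode == "all" and results and not valid:
        # 'all': fail only when every input is invalid.
        raise RuntimeError("all profiles are invalid")
    # Invalid inputs contribute nothing to the merged result.
    return valid


results = [("a.profraw", {"f": [1]}), ("b.profraw", None)]
print(select_inputs(results, mode="all"))  # [{'f': [1]}]
```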

.. option:: --prof-sym-list=<path>

 Specify a file which contains the list of symbols used to generate the profile
 symbol list in the output profile. This option can only be used with
 sample-based profiles in extbinary format. The entries in this file are
 newline-separated.


.. option:: --compress-all-sections=[true|false]

 Compress all sections when writing the profile. This option can only be used
 with sample-based profiles in extbinary format.


.. option:: --use-md5=[true|false]

 Use MD5 hashes to represent strings in the name table when writing the
 profile. This option can only be used with sample-based profiles in
 extbinary format.


.. option:: --gen-partial-profile=[true|false]

 Mark the profile as a partial profile, which provides only partial profile
 coverage for the optimized target. This option can only be used with
 sample-based profiles in extbinary format.


.. option:: --split-layout=[true|false]

 Split the profile data section in two: one containing sample profiles with
 inlined functions and the other containing sample profiles without them. This
 option can only be used with sample-based profiles in extbinary format.


.. option:: --convert-sample-profile-layout=[nest|flat]

 Convert the merged profile into a profile with a new layout. Supported
 layouts are ``nest`` (nested profile; the input should be a CS flat profile)
 and ``flat`` (profile with nested inlinees flattened out).


.. option:: --supplement-instr-with-sample=<file>

|
Supplement instr profile with sample profile.
PGO profile is usually more precise than sample profile. However, PGO profile
needs to be collected from loadtest and loadtest may not be representative
enough to the production workload. Sample profile collected from production
can be used as a supplement -- for functions cold in loadtest but warm/hot
in production, we can scale up the related function in PGO profile if the
function is warm or hot in sample profile.
The implementation contains changes in compiler side and llvm-profdata side.
Given an instr profile and a sample profile, for a function cold in PGO
profile but warm/hot in sample profile, llvm-profdata will either mark
all the counters in the profile to be -1 or scale up the max count in the
function to be above hot threshold, depending on the zero counter ratio in
the profile. The assumption is if there are too many counters being zero
in the function profile, the profile is more likely to cause harm than good,
then llvm-profdata will mark all the counters to be -1 indicating the
function is hot but the profile is unaccountable. In compiler side, if a
function profile with all -1 counters is seen, the function entry count will
be set to be above hot threshold but its internal profile will be dropped.
In the long run, it may be useful to let compiler support using PGO profile
and sample profile at the same time, but that requires more careful design
and more substantial changes to make two profiles work seamlessly. The patch
here serves as a simple intermediate solution.
Differential Revision: https://reviews.llvm.org/D81981

 Supplement an instrumentation profile with a sample profile. The sample
 profile is the argument of this option. The output is in instrumentation
 format (only works with ``--instr``).


.. option:: --zero-counter-threshold=<float>


 For a function that is cold in the instrumentation profile but hot in the
 sample profile, if the ratio of the number of zero counters to the total
 number of counters is above the threshold, the profile of the function is
 regarded as harmful for performance and is dropped.

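
The zero-counter test reduces to a ratio check. The Python sketch below assumes a function's counters are available as a list of integers and uses an arbitrary threshold of 0.7 in the usage lines; it is not taken from llvm-profdata's source, and both the ``should_drop`` name and the treatment of a function with no counters are assumptions.

```python
def should_drop(counters, zero_counter_threshold):
    """Return True if the share of zero counters exceeds the threshold.

    Intended for a function that is cold in the instrumentation profile but hot
    in the sample profile. A function with no counters is assumed not to be dropped.
    """
    if not counters:
        return False
    zero_ratio = sum(1 for c in counters if c == 0) / len(counters)
    return zero_ratio > zero_counter_threshold


print(should_drop([0, 0, 0, 5], 0.7))  # True: 3/4 = 0.75 > 0.7
print(should_drop([0, 5, 5, 5], 0.7))  # False: 1/4 = 0.25
```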

.. option:: --instr-prof-cold-threshold=<int>


 User-specified cold threshold for the instrumentation profile, which overrides
 the cold threshold obtained from the profile summary.


.. option:: --suppl-min-size-threshold=<int>


 If the size of a function is smaller than the threshold, assume it can be
 inlined by the PGO early inliner and do not adjust it based on the sample
 profile.

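
Taken together with ``--instr-prof-cold-threshold``, the size threshold acts as a gate before any adjustment is considered. The Python sketch below is a hypothetical reading of the option descriptions, not llvm-profdata's code. It assumes that "cold" means the function's maximum instrumentation counter does not exceed the cold threshold, and it takes whether the function is hot in the sample profile as a precomputed flag. The ``may_adjust_from_samples`` name is made up.

```python
def may_adjust_from_samples(func_size, max_instr_count, hot_in_sample,
                            min_size_threshold, cold_threshold):
    """Hypothetical gate before a sample profile adjusts an instr profile."""
    if func_size < min_size_threshold:
        # Assumed to be inlined by the PGO early inliner: leave it alone.
        return False
    # Assumption: "cold" means the max instr counter is at most the threshold.
    return hot_in_sample and max_instr_count <= cold_threshold


print(may_adjust_from_samples(4, 0, True, min_size_threshold=10, cold_threshold=0))
# False: too small
print(may_adjust_from_samples(40, 0, True, min_size_threshold=10, cold_threshold=0))
# True
```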

.. option:: --debug-info=<path>

 Specify the executable or ``.dSYM`` that contains debug info for the raw profile.


 When ``--debug-info-correlate`` or ``--profile-correlate=debug-info`` was used
 for instrumentation, use this option to correlate the raw profile.


.. option:: --binary-file=<path>

 Specify the executable that contains profile data and profile name sections for
 the raw profile. When ``--profile-correlate=binary`` was used for
 instrumentation, use this option to correlate the raw profile.


.. option:: --debuginfod

 Use debuginfod to find the associated executables that contain profile data and
 name sections for the raw profiles to correlate them.
 When ``--profile-correlate=binary`` was used for instrumentation, this option can
 be used for correlation.


.. option:: --debug-file-directory=<dir>

 Use the provided local directories to search for executables that contain
 profile data and name sections for the raw profiles to correlate them.
 When ``--profile-correlate=binary`` was used for instrumentation, this option can
 be used for correlation.


.. option:: --correlate=<kind>

 Specify the correlation kind (``debug_info`` or ``binary``) to use when the
 ``--debuginfod`` or ``--debug-file-directory=<dir>`` option is provided.


.. option:: --temporal-profile-trace-reservoir-size

 The maximum number of temporal profile traces to be stored in the output
 profile. If more traces are added, we will use reservoir sampling to select
 which traces to keep. Note that changing this value between different merge
 invocations on the same indexed profile could result in sample bias. The
 default value is 100.

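
Reservoir sampling keeps a uniformly random subset of a stream without knowing its length in advance. The Python class below implements the classic variant (Algorithm R) and also truncates each trace to a maximum length, in the spirit of ``--temporal-profile-max-trace-length``. Whether llvm-profdata uses exactly this variant is an assumption, as are the ``TraceReservoir`` class and its attribute names. Because the replacement probability depends on both the reservoir size and the running ``seen`` count, changing the size partway through a stream skews the selection, which is one plausible reading of the bias warning above.

```python
import random


class TraceReservoir:
    """Keep at most `size` temporal traces using reservoir sampling (Algorithm R).

    Illustrative sketch only; names and structure are hypothetical.
    """

    def __init__(self, size, max_trace_length, seed=None):
        self.size = size
        self.max_trace_length = max_trace_length
        self.traces = []
        self.seen = 0  # number of traces offered so far, kept or not
        self.rng = random.Random(seed)

    def add(self, trace):
        # Truncate long traces, in the spirit of --temporal-profile-max-trace-length.
        trace = list(trace)[: self.max_trace_length]
        self.seen += 1
        if len(self.traces) < self.size:
            self.traces.append(trace)
            return
        # The i-th trace replaces a random slot with probability size / i.
        slot = self.rng.randrange(self.seen)
        if slot < self.size:
            self.traces[slot] = trace


reservoir = TraceReservoir(size=3, max_trace_length=2, seed=0)
for i in range(10):
    reservoir.add([f"f{i}", f"g{i}", f"h{i}"])
print(len(reservoir.traces), reservoir.seen)  # 3 10
```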

.. option:: --temporal-profile-max-trace-length

 The maximum number of functions in a single temporal profile trace. Longer
 traces will be truncated. The default value is 1000.


.. option:: --function=<string>

 Only keep functions matching the regex in the output; all others are erased
 from the profile.


.. option:: --no-function=<string>

 Remove functions matching the regex from the profile. If both ``--function``
 and ``--no-function`` are specified and a function matches both, it is
 removed.

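
The combined effect of ``--function`` and ``--no-function`` can be sketched in Python with the standard ``re`` module. The sketch assumes unanchored matching (a pattern matches if it is found anywhere in the name, as with ``re.search``); the profile model and the ``filter_functions`` name are hypothetical.

```python
import re


def filter_functions(profile, function=None, no_function=None):
    """Apply --function / --no-function style filters to a name -> counters dict.

    Assumes unanchored matching, as with re.search.
    """
    keep = re.compile(function) if function is not None else None
    drop = re.compile(no_function) if no_function is not None else None
    result = {}
    for name, counts in profile.items():
        if keep is not None and not keep.search(name):
            continue  # not selected by --function
        if drop is not None and drop.search(name):
            continue  # --no-function wins when both filters match
        result[name] = counts
    return result


profile = {"foo_init": [1], "foo_run": [7], "bar": [3]}
print(filter_functions(profile, function="^foo", no_function="run"))
# {'foo_init': [1]}
```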

EXAMPLES
^^^^^^^^

Basic Usage
+++++++++++

Merge three profiles:

::

  llvm-profdata merge foo.profdata bar.profdata baz.profdata --output merged.profdata


Weighted Input
++++++++++++++

The input file ``foo.profdata`` is especially important; multiply its counts by 10:

::

  llvm-profdata merge --weighted-input=10,foo.profdata bar.profdata baz.profdata --output merged.profdata


Exactly equivalent to the previous invocation (explicit form; useful for programmatic invocation):

::

  llvm-profdata merge --weighted-input=10,foo.profdata --weighted-input=1,bar.profdata --weighted-input=1,baz.profdata --output merged.profdata


.. program:: llvm-profdata show

.. _profdata-show:

SHOW
----

SYNOPSIS
^^^^^^^^

:program:`llvm-profdata show` [*options*] [*filename*]


DESCRIPTION
^^^^^^^^^^^

:program:`llvm-profdata show` takes a profile data file and displays
information about the profile counters for this file and
for any of the specified function(s).

If *filename* is omitted or is ``-``, then :program:`llvm-profdata show` reads its
input from standard input.


OPTIONS
^^^^^^^

.. option:: --all-functions

 Print details for every function.


.. option:: --binary-ids

 Print the binary IDs embedded in a profile.


.. option:: --counts

 Print the counter values for the displayed functions.


.. option:: --show-format=<text|json|yaml>

 Emit output in the selected format if supported by the provided profile type.


.. option:: --function=<string>

 Print details for a function if the function's name contains the given string.


.. option:: --help

 Print a summary of command line options.


.. option:: --output=<output>, -o

 Specify the output file name. If *output* is ``-`` or it isn't specified,
 then the output is sent to standard output.


.. option:: --instr (default)

 Specify that the input profile is an instrumentation-based profile.


.. option:: --text

 Instruct the profile dumper to show profile counts in the text format of the
 instrumentation-based profile data representation. By default, the profile
 information is dumped in a more human-readable form (also in text) with
 annotations.


.. option:: --topn=<n>

 Instruct the profile dumper to show the top ``n`` functions with the
 hottest basic blocks in the summary section. By default, the top ``n``
 functions are not dumped.

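
A rough Python model of the ``--topn`` selection is shown below. It ranks functions by their largest counter value, which is an assumed stand-in for "hottest basic block"; the profile model and the ``top_n_functions`` name are illustrative only.

```python
import heapq


def top_n_functions(profile, n):
    """Return up to n (name, max_count) pairs, hottest first."""
    candidates = ((name, max(counts, default=0)) for name, counts in profile.items())
    return heapq.nlargest(n, candidates, key=lambda item: item[1])


print(top_n_functions({"a": [1, 9], "b": [4], "c": []}, 2))
# [('a', 9), ('b', 4)]
```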

.. option:: --sample

 Specify that the input profile is a sample-based profile.


.. option:: --memop-sizes

 Show the profiled sizes of the memory intrinsic calls for shown functions.


.. option:: --value-cutoff=<n>

 Show only those functions whose max count values are greater than or equal to
 ``n``. By default, the value-cutoff is set to 0.


.. option:: --list-below-cutoff

 Only output names of functions whose max count values are below the cutoff
 value.

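
The following Python sketch models ``--value-cutoff`` and ``--list-below-cutoff`` over a hypothetical dictionary that maps function names to counter lists. A function with no counters is assumed to have a max count of 0, and ``apply_value_cutoff`` is a made-up name.

```python
def apply_value_cutoff(profile, cutoff=0, list_below_cutoff=False):
    """Return function names selected by a --value-cutoff style test."""
    above, below = [], []
    for name, counts in profile.items():
        max_count = max(counts, default=0)  # assumption: no counters means 0
        (above if max_count >= cutoff else below).append(name)
    return below if list_below_cutoff else above


profile = {"hot": [100, 3], "warm": [10], "cold": [0, 1]}
print(apply_value_cutoff(profile, cutoff=10))                          # ['hot', 'warm']
print(apply_value_cutoff(profile, cutoff=10, list_below_cutoff=True))  # ['cold']
```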

.. option:: --profile-version

 Print profile version.


.. option:: --showcs

 Only show context-sensitive profile counts. The default is to filter out all
 context-sensitive profile counts.


.. option:: --show-prof-sym-list=[true|false]

 Show the profile symbol list if it exists in the profile. This option is only
 meaningful for sample-based profiles in extbinary format.


.. option:: --show-sec-info-only=[true|false]

 Show basic information about each section in the profile. This option is
 only meaningful for sample-based profiles in extbinary format.


.. option:: --debug-info=<path>

 Specify the executable or ``.dSYM`` that contains debug info for the raw profile.

When ``--debug-info-correlate`` or ``--profile-correlate=debug-info`` was used
for instrumentation, use this option to show the correlated functions from the
raw profile.

.. option:: --covered

Show only the functions that have been executed, i.e., functions with non-zero
counts.

.. program:: llvm-profdata overlap

.. _profdata-overlap:

OVERLAP
-------

SYNOPSIS
^^^^^^^^

:program:`llvm-profdata overlap` [*options*] [*base profile file*] [*test profile file*]

DESCRIPTION
^^^^^^^^^^^

:program:`llvm-profdata overlap` takes two profile data files and displays the
*overlap* of their counter distributions, both for the whole files and for any
specified functions.

In this command, *overlap* is defined as follows. Suppose *base profile file*
has the following counts:
{c1_1, c1_2, ..., c1_n, c1_u_1, c1_u_2, ..., c1_u_s},
and *test profile file* has
{c2_1, c2_2, ..., c2_n, c2_v_1, c2_v_2, ..., c2_v_t}.

Here c1_i and c2_i (i = 1 .. n) are matched counters, while c1_u_i (i = 1 .. s)
and c2_v_i (i = 1 .. t) are unmatched counters, i.e., counters that exist only
in *base profile file* or only in *test profile file*, respectively.

Let sum_1 = c1_1 + c1_2 + ... + c1_n + c1_u_1 + c1_u_2 + ... + c1_u_s, and
sum_2 = c2_1 + c2_2 + ... + c2_n + c2_v_1 + c2_v_2 + ... + c2_v_t. Then
*overlap* = min(c1_1/sum_1, c2_1/sum_2) + min(c1_2/sum_1, c2_2/sum_2) + ...
+ min(c1_n/sum_1, c2_n/sum_2).

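A direct transcription of this definition into Python may help when checking
results by hand. Each profile is modeled here as a hypothetical mapping from a
counter key (for example a ``(function, index)`` pair) to its count; keys present
in both mappings play the role of matched counters, and all counts, matched or
not, contribute to sum_1 and sum_2. Returning 0.0 when either profile has no
counts is an assumption of this sketch, not documented tool behavior.

.. code-block:: python

   def overlap(base, test):
       """Return the overlap (0.0 to 1.0) of two counter distributions."""
       sum_1 = sum(base.values())  # matched counters plus unmatched c1_u_i
       sum_2 = sum(test.values())  # matched counters plus unmatched c2_v_i
       if sum_1 == 0 or sum_2 == 0:
           return 0.0  # assumption: an empty profile overlaps with nothing
       matched = base.keys() & test.keys()
       # Sum of the smaller normalized share for each matched counter.
       return sum(min(base[k] / sum_1, test[k] / sum_2) for k in matched)

For counts {400, 600} and {60000, 40000} on the same two counters, the sketch
returns 0.8, i.e., an overlap of 80%.
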
The resulting overlap is a percentage ranging from 0.0% to 100.0%, where 0.0%
means there is no overlap and 100.0% means a perfect overlap.

For example, if *base profile file* has counts {400, 600} and *test profile
file* has the matched counts {60000, 40000}, the *overlap* is 80%.

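Plugging these counts into the definition, with sum_1 = 1000 and
sum_2 = 100000, gives the following worked computation:

.. math::

   \mathit{overlap} = \min\left(\tfrac{400}{1000}, \tfrac{60000}{100000}\right)
     + \min\left(\tfrac{600}{1000}, \tfrac{40000}{100000}\right)
     = \min(0.4, 0.6) + \min(0.6, 0.4) = 0.4 + 0.4 = 0.8 = 80\%
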
OPTIONS
^^^^^^^

.. option:: --function=<string>

Print details for a function if the function's name contains the given string.

.. option:: --help

Print a summary of command line options.

.. option:: --output=<output>, -o

Specify the output file name. If *output* is ``-`` or it isn't specified,
then the output is sent to standard output.

.. option:: --value-cutoff=<n>

Show only those functions whose max count values are greater than or equal to ``n``.
By default, the value cutoff is set to the maximum value of ``unsigned long long``.

.. option:: --cs

Only show overlap for the context-sensitive profile counts. By default, overlap
is shown for the non-context-sensitive profile counts.

.. program:: llvm-profdata order

.. _profdata-order:

ORDER
-----

SYNOPSIS
^^^^^^^^

:program:`llvm-profdata order` [*options*] [*filename*]

DESCRIPTION
^^^^^^^^^^^

:program:`llvm-profdata order` uses temporal profiling traces from a profile and
finds a function order that reduces the number of page faults for those traces.
The output can be passed directly to ``lld`` via ``--symbol-ordering-file=``
for ELF or ``-order-file`` for Mach-O. If the traces found in the profile are
representative of real-world usage, then this order should improve startup
performance.

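The intuition behind trace-based ordering can be sketched with a toy heuristic:
functions that tend to run early in the traces are placed first, so the pages
touched at startup are packed together. This Python sketch is a deliberately
simplified stand-in chosen for illustration, not the algorithm llvm-profdata
uses; it assumes each trace is a list of function names in first-execution
order, and ``order_functions`` is a hypothetical helper.

.. code-block:: python

   from collections import defaultdict


   def order_functions(traces):
       """Toy ordering: sort functions by their mean position across the
       traces that contain them (earlier first, ties broken by name)."""
       positions = defaultdict(list)
       for trace in traces:
           for index, name in enumerate(trace):
               positions[name].append(index)
       return sorted(
           positions,
           key=lambda name: (sum(positions[name]) / len(positions[name]), name),
       )

For traces ``[["main", "init", "parse"], ["main", "parse", "render"]]`` the
sketch yields ``["main", "init", "parse", "render"]``. Writing that list with
one symbol per line gives a file in the layout that ``--symbol-ordering-file=``
expects.
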
OPTIONS
^^^^^^^

.. option:: --help

Print a summary of command line options.

.. option:: --output=<output>, -o

Specify the output file name. If *output* is ``-`` or it isn't specified,
then the output is sent to standard output.

EXIT STATUS
-----------

:program:`llvm-profdata` returns 1 if the command is omitted or invalid, if it
cannot read input files, or if there is a mismatch between their data.