===================================
Instrumentation Profile Format
===================================

.. contents::
   :local:

Overview
=========

Clang supports two types of profiling via instrumentation [1]_: frontend-based
and IR-based, and both can support a variety of use cases [2]_.
This document describes two binary serialization formats (raw and indexed) to
store instrumented profiles, with a specific emphasis on the IRPGO use case:
when a header field or payload section is interpreted differently across use
cases, this document describes the IRPGO interpretation.

.. note::
   Frontend-generated profiles are used together with coverage mapping for
   `source-based code coverage`_. The `coverage mapping format`_ is different from
   the profile format.

.. _`source-based code coverage`: https://clang.llvm.org/docs/SourceBasedCodeCoverage.html
.. _`coverage mapping format`: https://llvm.org/docs/CoverageMappingFormat.html

Raw Profile Format
===================

The raw profile is generated by running the instrumented binary. The raw profile
data from an executable or a shared library [3]_ consists of a header and
multiple sections, with each section as a memory dump. The raw profile data needs
to be reasonably compact and fast to generate.

There are no backward or forward version compatibility guarantees for the raw profile
format. That is, compilers and tools `require`_ a specific raw profile version
to parse the profiles.

.. _`require`: https://github.com/llvm/llvm-project/blob/bffdde8b8e5d9a76a47949cd0f574f3ce656e181/llvm/lib/ProfileData/InstrProfReader.cpp#L551-L558

To feed profiles back into compilers for an optimized build (e.g., via
``-fprofile-use`` for IR instrumentation), a raw profile must be converted into
the indexed format.

General Storage Layout
-----------------------

The storage layout of the raw profile data format is illustrated below. Basically,
when the raw profile is read into a memory buffer, the actual byte offset of a
section is inferred from the section's order in the layout and the size information
of all the sections ahead of it.

::

  +----+-----------------------+
  |    |         Magic         |
  |    +-----------------------+
  |    |        Version        |
  |    +-----------------------+
  H    |     Size Info for     |
  E    |       Section 1       |
  A    +-----------------------+
  D    |     Size Info for     |
  E    |       Section 2       |
  R    +-----------------------+
  |    |          ...          |
  |    +-----------------------+
  |    |     Size Info for     |
  |    |       Section N       |
  +----+-----------------------+
  P    |       Section 1       |
  A    +-----------------------+
  Y    |       Section 2       |
  L    +-----------------------+
  O    |          ...          |
  A    +-----------------------+
  D    |       Section N       |
  +----+-----------------------+

.. note::
   Sections might be padded to meet specific alignment requirements. For
   simplicity, header fields and data sections that exist solely for padding are
   omitted from the layout diagram above and from the rest of this document.

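
The offset inference described above amounts to a running sum over the section
sizes. The following is a minimal sketch, not the reader's actual implementation:
``sectionOffsets`` and its parameters are hypothetical names, the header size is
passed in rather than derived from the real header struct, and padding is ignored.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the byte offset of each payload section,
// measured from the start of the profile buffer. SectionSizes holds the byte
// size of each section in layout order. Padding is deliberately ignored.
inline std::vector<uint64_t>
sectionOffsets(uint64_t HeaderSize, const std::vector<uint64_t> &SectionSizes) {
  std::vector<uint64_t> Offsets;
  Offsets.reserve(SectionSizes.size());
  uint64_t Cursor = HeaderSize; // the first section starts right after the header
  for (uint64_t Size : SectionSizes) {
    Offsets.push_back(Cursor);
    Cursor += Size; // the next section starts where this one ends
  }
  return Offsets;
}
```

For example, with an assumed 16-byte header and section sizes of 8, 24 and 4
bytes, the sections would start at byte offsets 16, 24 and 48.
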

Header
-------

``Magic``
  The magic number encodes the profile format (raw, indexed or text). For the raw
  format, the magic number also encodes the endianness (big or little) and C pointer
  size (4 or 8 bytes) of the platform on which the profile is generated.

  A factory method reads the magic number to construct the proper reader and returns
  an error for an unrecognized format. Specifically, the factory method and raw profile
  reader implementation make sure that a raw profile file can be read back on
  a platform with the opposite endianness and/or the other C pointer size.

``Version``
  The lower 32 bits specify the actual version and the most significant 32 bits
  specify the variant types of the profile. IR-based instrumentation PGO and
  context-sensitive IR-based instrumentation PGO are two variant types.

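
As a rough illustration of the ``Version`` split described above, the sketch
below separates a 64-bit value into its two halves. ``DecodedVersion`` and
``decodeVersion`` are hypothetical names introduced here; the meaning of
individual variant bits is defined in the LLVM sources and is not assumed.

```cpp
#include <cstdint>

// Hypothetical helper type, not part of LLVM.
struct DecodedVersion {
  uint32_t Number;      // lower 32 bits: the actual format version
  uint32_t VariantBits; // most significant 32 bits: variant type flags
};

// Splits the 64-bit Version header field into version number and variant bits.
inline DecodedVersion decodeVersion(uint64_t RawVersion) {
  DecodedVersion D;
  D.Number = static_cast<uint32_t>(RawVersion & 0xffffffffULL);
  D.VariantBits = static_cast<uint32_t>(RawVersion >> 32);
  return D;
}
```

For instance, ``decodeVersion((0x3ULL << 32) | 9)`` yields ``Number == 9`` and
``VariantBits == 3``.
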

``BinaryIdsSize``
  The byte size of the `binary id`_ section.

``NumData``
  The number of profile metadata records. The byte size of the `profile metadata`_
  section can be computed from this field.

``NumCounter``
  The number of entries in the profile counter section. The byte size of the
  `counter`_ section can be computed from this field.

``NumBitmapBytes``
  The number of bytes in the profile `bitmap`_ section.

``NamesSize``
  The number of bytes in the name section.

.. _`CountersDelta`:

``CountersDelta``
  This field records the in-memory address difference between the `profile metadata`_
  and counter section in the instrumented binary, i.e., ``start(__llvm_prf_cnts) - start(__llvm_prf_data)``.

  It's used jointly with the `CounterPtr`_ field to compute the counter offset
  relative to ``start(__llvm_prf_cnts)``. Check out calculation-of-counter-offset_
  for a visualized explanation.

  .. note::
     The ``__llvm_prf_data`` object file section might not be loaded into memory
     when the instrumented binary runs, or might not be generated in the instrumented
     binary in the first place. In those cases, ``CountersDelta`` is not used and
     other mechanisms are used to match counters with instrumented code. See
     `lightweight instrumentation`_ and `binary profile correlation`_ for examples.

``BitmapDelta``
  This field records the in-memory address difference between the `profile metadata`_
  and bitmap section in the instrumented binary, i.e., ``start(__llvm_prf_bits) - start(__llvm_prf_data)``.

  It's used jointly with the `BitmapPtr`_ field to find the bitmap of a profile data
  record, in a similar way to how counters are referenced, as explained by
  calculation-of-counter-offset_.

  Similar to the `CountersDelta`_ field, this field may not be used in non-PGO variants
  of profiles.

``NamesDelta``
  Records the in-memory address of the name section. Not used except for raw profile
  reader error checking.

``NumVTables``
  Records the number of instrumented vtable entries in the binary. Used for
  `type profiling`_.

``VNamesSize``
  Records the byte size of the virtual table names section. Used for `type profiling`_.

``ValueKindLast``
  Records the number of value kinds. The macro `VALUE_PROF_KIND`_ defines the value
  kinds, each with a description.

.. _`VALUE_PROF_KIND`: https://github.com/llvm/llvm-project/blob/7e405eb722e40c79b7726201d0f76b5dab34ba0f/compiler-rt/include/profile/InstrProfData.inc#L184-L186

Payload Sections
------------------

Binary Ids
^^^^^^^^^^^
Stores the binary ids of the instrumented binaries to associate binaries with
profiles for source code coverage. See the `binary id`_ RFC for the design.

.. _`profile metadata`:

Profile Metadata
^^^^^^^^^^^^^^^^^^

This section stores the metadata to map counters and value profiles back to
instrumented code regions (e.g., LLVM IR for IRPGO).

The in-memory representation of the metadata is `__llvm_profile_data`_.
Some fields are used to reference data from other sections in the profile.
The fields are documented as follows:

.. _`__llvm_profile_data`: https://github.com/llvm/llvm-project/blob/7c3b67d2038cfb48a80299089f6a1308eee1df7f/compiler-rt/include/profile/InstrProfData.inc#L65-L95

``NameRef``
  The MD5 of the function's PGO name. The PGO name has the format
  ``[<filepath><delimiter>]<mangled-name>``, where ``<filepath>`` and
  ``<delimiter>`` are provided for local-linkage functions to distinguish
  functions whose names might otherwise be identical.

.. _FuncHash:

``FuncHash``
  A checksum of the function's IR, taking the control flow graph and instrumented
  value sites into account. See `computeCFGHash`_ for details.

  .. _`computeCFGHash`: https://github.com/llvm/llvm-project/blob/7c3b67d2038cfb48a80299089f6a1308eee1df7f/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp#L616-L685

.. _`CounterPtr`:

``CounterPtr``
  The in-memory address difference between the profile data and the start of the
  corresponding counters. The counter position is stored this way (as a link-time
  constant) to reduce instrumented binary size compared with snapshotting the
  address of symbols directly. See `commit a1532ed`_ for further information.

  .. _`commit a1532ed`: https://github.com/llvm/llvm-project/commit/a1532ed27582038e2d9588108ba0fe8237f01844

  .. note::
     ``CounterPtr`` might represent a different value for non-IRPGO use cases. For
     example, for `binary profile correlation`_, it represents the absolute address
     of the counter. When in doubt, check the source code.

.. _`BitmapPtr`:

``BitmapPtr``
  The in-memory address difference between the profile data and the start address of
  the corresponding bitmap.

  .. note::
     Similar to `CounterPtr`_, this field may represent a different value for non-IRPGO use cases.

``FunctionPointer``
  Records the function address when the instrumented binary runs. This is used to
  map the profiled callee address of indirect calls to the ``NameRef`` during
  conversion from raw to indexed profiles.

``Values``
  Represents value profiles in a two-dimensional array. The number of elements
  in the first dimension is the number of instrumented value sites across all
  kinds. Each element in the first dimension is the head of a linked list, and
  each element in the second dimension is a linked list element, carrying
  ``<profiled-value, count>`` as payload. This is used by the compiler runtime when
  writing out value profiles.

  .. note::
     Value profiling is supported by frontend and IR PGO instrumentation,
     but it's not supported in all cases (e.g., `lightweight instrumentation`_).

``NumCounters``
  The number of counters for the instrumented function.

``NumValueSites``
  This is an array of counters, and each counter represents the number of
  instrumented sites for a kind of value in the function.

``NumBitmapBytes``
  The number of bitmap bytes for the function.

.. _`counter`:

Profile Counters
^^^^^^^^^^^^^^^^^

For PGO [4]_, the counters within an instrumented function of a specific `FuncHash`_
are stored contiguously and in an order that is consistent with instrumentation point selection.

.. _calculation-of-counter-offset:

As mentioned above, the recorded counter offset is relative to the profile metadata.
So how are function counters located in the raw profile data?

Basically, the profile reader iterates over the profile metadata (from the
`profile metadata`_ section) and makes use of the recorded relative distances,
as illustrated below.

::

         + --> start(__llvm_prf_data) --> +---------------------+ ------------+
         |                                |       Data 1        |             |
         |                                +---------------------+  =====||    |
         |                                |       Data 2        |       ||    |
         |                                +---------------------+       ||    |
         |                                |        ...          |       ||    |
  Counter|                                +---------------------+       ||    |
  Delta  |                                |       Data N        |       ||    |
         |                                +---------------------+       ||    |   CounterPtr1
         |                                                              ||    |
         |                                              CounterPtr2     ||    |
         |                                                              ||    |
         |                                                              ||    |
         + --> start(__llvm_prf_cnts) --> +---------------------+       ||    |
                                          |        ...          |       ||    |
                                          +---------------------+  -----||----+
                                          |    Counter for      |       ||
                                          |       Data 1        |       ||
                                          +---------------------+       ||
                                          |        ...          |       ||
                                          +---------------------+  =====||
                                          |    Counter for      |
                                          |       Data 2        |
                                          +---------------------+
                                          |        ...          |
                                          +---------------------+
                                          |    Counter for      |
                                          |       Data N        |
                                          +---------------------+

In the diagram:

* The profile header records ``CountersDelta`` with the value
  ``start(__llvm_prf_cnts) - start(__llvm_prf_data)``.
  We will call it ``CounterDeltaInitVal`` below for convenience.
* For each profile data record ``ProfileDataN``, ``CounterPtr`` is recorded as
  ``start(CounterN) - start(ProfileDataN)``, where ``ProfileDataN`` is the N-th
  entry in ``__llvm_prf_data``, and ``CounterN`` represents the corresponding
  profile counters.

Each time the reader advances to the next data record, it `updates`_ ``CountersDelta``
by subtracting the size of one ``ProfileData``.

.. _`updates`: https://github.com/llvm/llvm-project/blob/17ff25a58ee4f29816d932fdb75f0d305718069f/llvm/include/llvm/ProfileData/InstrProfReader.h#L439-L444

For the counter corresponding to the first data record, the byte offset
relative to the start of the counter section is calculated as ``CounterPtr1 - CounterDeltaInitVal``.
When the profile reader advances to the second data record, ``CountersDelta``
has been updated to ``CounterDeltaInitVal - sizeof(ProfileData)``.
Thus the byte offset relative to the start of the counter section is calculated
as ``CounterPtr2 - (CounterDeltaInitVal - sizeof(ProfileData))``.

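
The calculation above can be sketched as a loop that adjusts the delta once per
record. This is an illustrative sketch only, not the reader's code:
``counterOffsets`` and its parameter names are hypothetical, and the record size
is passed in instead of being taken from the real ``ProfileData`` layout.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper. CounterPtrs[I] is the CounterPtr value stored in the
// I-th data record, i.e. start(Counter_I) - start(ProfileData_I).
// Returns each record's counter byte offset relative to the start of the
// counter section.
inline std::vector<int64_t>
counterOffsets(int64_t CounterDeltaInitVal, int64_t SizeOfProfileData,
               const std::vector<int64_t> &CounterPtrs) {
  std::vector<int64_t> Offsets;
  Offsets.reserve(CounterPtrs.size());
  int64_t Delta = CounterDeltaInitVal;
  for (int64_t CounterPtr : CounterPtrs) {
    Offsets.push_back(CounterPtr - Delta);
    // Advancing to the next record moves the reference point forward by one
    // record, so the delta shrinks by the record size.
    Delta -= SizeOfProfileData;
  }
  return Offsets;
}
```

As a worked example with made-up addresses: suppose the data section starts at
``0x1000``, the counter section starts at ``0x3000`` and each record occupies 64
bytes, so ``CounterDeltaInitVal`` is 8192. If the counters of the first three
records start at ``0x3000``, ``0x3010`` and ``0x3028``, the recorded
``CounterPtr`` values are 8192, 8144 and 8104, and the computed offsets are 0, 16
and 40 bytes.
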

.. _`bitmap`:

Bitmap
^^^^^^^
This section is used for source-based `Modified Condition/Decision Coverage`_ code
coverage. Check out the `Bitmap RFC`_ for the design.

.. _`Modified Condition/Decision Coverage`: https://en.wikipedia.org/wiki/Modified_condition/decision_coverage
.. _`Bitmap RFC`: https://discourse.llvm.org/t/rfc-source-based-mc-dc-code-coverage/59244

.. _`function names`:

Names
^^^^^^

This section contains the concatenated string of functions' PGO names, possibly
compressed. If compressed, the zlib library is used.

Function names serve as keys in the PGO data hash table when raw profiles are
converted into indexed profiles. They are also crucial for ``llvm-profdata`` to
show the profiles in a human-readable way.

Virtual Table Profile Data
^^^^^^^^^^^^^^^^^^^^^^^^^^^

This section is used for `type profiling`_. Each entry corresponds to one virtual
table and is defined by the following C++ struct:

.. code-block:: c++

  struct VTableProfData {
    // The start address of the vtable, collected at runtime.
    uint64_t StartAddress;
    // The byte size of the vtable. `StartAddress` and `ByteSize` specify an address range to look up.
    uint32_t ByteSize;
    // The hash of the vtable's (PGO) name.
    uint64_t MD5HashOfName;
  };

At profile use time, the compiler looks up a profiled address in the sorted vtable
address ranges and maps the address to a specific vtable through its hashed name.

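
The lookup described above can be sketched as a binary search over ranges sorted
by start address. This is an illustrative sketch that assumes the ranges do not
overlap; ``VTableRange`` mirrors the fields of the struct above, and
``findVTable`` is a hypothetical helper rather than an LLVM API.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

// Mirrors the fields of VTableProfData shown above.
struct VTableRange {
  uint64_t StartAddress;
  uint32_t ByteSize;
  uint64_t MD5HashOfName;
};

// Hypothetical helper. Sorted must be ordered by StartAddress and must not
// contain overlapping ranges. Returns the range containing Address, or
// nullptr if no range contains it.
inline const VTableRange *findVTable(const std::vector<VTableRange> &Sorted,
                                     uint64_t Address) {
  // Find the first range that starts strictly after Address.
  auto It = std::upper_bound(
      Sorted.begin(), Sorted.end(), Address,
      [](uint64_t Addr, const VTableRange &R) { return Addr < R.StartAddress; });
  if (It == Sorted.begin())
    return nullptr; // Address is below every range.
  const VTableRange &Candidate = *std::prev(It);
  // The range covers [StartAddress, StartAddress + ByteSize).
  if (Address - Candidate.StartAddress < Candidate.ByteSize)
    return &Candidate;
  return nullptr;
}
```

A caller would read ``MD5HashOfName`` from the returned range and use it to
identify the vtable; an address that falls between two ranges yields ``nullptr``.
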

Virtual Table Names
^^^^^^^^^^^^^^^^^^^^

This section is similar to the `function names`_ section above, except that it
contains the PGO names of profiled virtual tables. It's a standalone section so
that raw profile readers can directly find each name set by accessing the
corresponding profile data section.

This section is stored in raw profiles so that ``llvm-profdata`` can show the
profiles in a human-readable way.

Value Profile Data
^^^^^^^^^^^^^^^^^^^^

This section contains the profile data for value profiling.

The value profiles corresponding to one profile metadata record are serialized
contiguously as one record, and value profile records are stored in the same order
as the respective profile data. A raw profile reader therefore `advances`_ the
pointer to profile data and the pointer to value profile records simultaneously [5]_
to find the value profiles for each per-function, per-`FuncHash`_ profile data record.

.. _`advances`: https://github.com/llvm/llvm-project/blob/7e15fa9161eda7497a5d6abf0d951a1d12d86550/llvm/include/llvm/ProfileData/InstrProfReader.h#L456-L457

Indexed Profile Format
===========================

Indexed profiles are generated by ``llvm-profdata``. In indexed profiles,
function data is organized as an on-disk hash table so that compilers can
look up profile data for functions in an IR module.

Compilers and tools must retain backward compatibility with indexed profiles.
That is, a tool or a compiler built from a newer version of the code must understand
profiles generated by older tools or compilers.

General Storage Layout
-----------------------

The ASCII art below depicts the general storage layout of indexed profiles.
Specifically, the indexed profile header describes the byte offset of individual
payload sections.

::

                  +-----------------------+---+
                  |         Magic         |   |
                  +-----------------------+   |
                  |        Version        |   |
                  +-----------------------+   |
                  |       HashType        |   H
                  +-----------------------+   E
                  |      Byte Offset      |   A
          +------ |     of section A      |   D
          |       +-----------------------+   E
          |       |      Byte Offset      |   R
      +-----------|     of section B      |   |
      |   |       +-----------------------+   |
      |   |       |          ...          |   |
      |   |       +-----------------------+   |
      |   |       |      Byte Offset      |   |
  +---------------|     of section Z      |   |
  |   |   |       +-----------------------+---+
  |   |   |       |    Profile Summary    |   |
  |   |   |       +-----------------------+   P
  |   |   +------>|       Section A       |   A
  |   |           +-----------------------+   Y
  |   +---------->|       Section B       |   L
  |               +-----------------------+   O
  |               |          ...          |   A
  |               +-----------------------+   D
  +-------------->|       Section Z       |   |
                  +-----------------------+---+

.. note::

   The profile summary section is at the beginning of the payload. It's right after
   the header, so its position is implicitly known after reading the header.

Header
--------

The `Header struct`_ is the source of truth, and its fields should explain
what's in the header. At a high level, ``*Offset`` fields record section byte
offsets, which are used by readers to locate interesting sections and skip
uninteresting ones.

.. note::

   To maintain backward compatibility of the indexed profiles, existing fields
   shouldn't be deleted from the struct definition, and the field order shouldn't be
   modified. New fields should be appended.

.. _`Header struct`: https://github.com/llvm/llvm-project/blob/1a2960bab6381f2b288328e2371829b460ac020c/llvm/include/llvm/ProfileData/InstrProf.h#L1053-L1080

Payload Sections
------------------

(CS) Profile Summary
^^^^^^^^^^^^^^^^^^^^^
This section is right after the profile header. It stores the serialized profile
summary. For context-sensitive IR-based instrumentation PGO, this section stores
an additional profile summary corresponding to the context-sensitive profiles.

.. _`function data`:

Function data
^^^^^^^^^^^^^^^^^^
This section stores functions and their profiling data as an on-disk hash table.
Profile data for functions with the same name are grouped together and share one
hash table entry (the functions may come from different shared libraries, for
instance). The profile data for them are organized as a sequence of key-value
pairs where the key is `FuncHash`_ and the value is the profiled information
(represented by `InstrProfRecord`_) for the function.

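
The grouping described above can be modeled in memory as a map from function
name to a list of ``(FuncHash, record)`` pairs. The sketch below is only a
conceptual model and does not reflect the on-disk hash table encoding;
``ProfileRecord``, ``FunctionData`` and ``findRecord`` are hypothetical names,
with ``ProfileRecord`` standing in for ``InstrProfRecord``.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for InstrProfRecord; only counters are modeled.
struct ProfileRecord {
  std::vector<uint64_t> Counts;
};

// One entry per function name; each entry holds (FuncHash, record) pairs.
using FunctionData =
    std::map<std::string, std::vector<std::pair<uint64_t, ProfileRecord>>>;

// Looks up the record for a function by name first and by FuncHash second.
// Returns nullptr if either the name or the hash has no match.
inline const ProfileRecord *findRecord(const FunctionData &Table,
                                       const std::string &Name,
                                       uint64_t FuncHash) {
  auto Entry = Table.find(Name);
  if (Entry == Table.end())
    return nullptr;
  for (const auto &HashAndRecord : Entry->second)
    if (HashAndRecord.first == FuncHash)
      return &HashAndRecord.second;
  return nullptr;
}
```

The two-level lookup reflects why a name match alone is not sufficient: several
functions can share a name, and the ``FuncHash`` selects the record that matches
the function being compiled.
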

.. _`InstrProfRecord`: https://github.com/llvm/llvm-project/blob/7e405eb722e40c79b7726201d0f76b5dab34ba0f/llvm/include/llvm/ProfileData/InstrProf.h#L693

MemProf Profile data
^^^^^^^^^^^^^^^^^^^^^^
This section stores functions' memory profiling data. See the
`MemProf binary serialization format RFC`_ for the design.

.. _`MemProf binary serialization format RFC`: https://lists.llvm.org/pipermail/llvm-dev/2021-September/153007.html

Binary Ids
^^^^^^^^^^^^^^^^^^^^^^
This section is used to carry over `binary id`_ information from raw profiles.

Temporal Profile Traces
^^^^^^^^^^^^^^^^^^^^^^^^
This section is used to carry over temporal profile information from raw profiles.
See `temporal profiling`_ for the design.

Virtual Table Names
^^^^^^^^^^^^^^^^^^^^
This section is used to store the names of vtables from the raw profile in the
indexed profile.

Unlike function names, which are stored as keys of the `function data`_ hash table,
vtable names need to be stored in a standalone section in indexed profiles.
This way, ``llvm-profdata`` can show the profiled vtable information in a
human-readable way.

Profile Data Usage
=======================================

``llvm-profdata`` is the command line tool to display and process
instrumentation-based profile data. For supported usages, check out the
`llvm-profdata documentation <https://llvm.org/docs/CommandGuide/llvm-profdata.html>`_.

.. [1] For usage, see https://clang.llvm.org/docs/UsersManual.html#profiling-with-instrumentation
.. [2] For example, IR-based instrumentation supports `lightweight instrumentation`_
   and `temporal profiling`_. Frontend instrumentation could support `single-byte counters`_.
.. [3] A raw profile file could contain the concatenation of multiple raw
   profiles, for example, from an executable and its shared libraries. The raw
   profile reader can parse all raw profiles from the file correctly.
.. [4] The counter section is used by a few variant types (like temporal
   profiling) and might have different semantics there.
.. [5] The step size of the data pointer is ``sizeof(ProfileData)``, and the step
   size of the value profile pointer is calculated based on the number of collected
   values.

.. _`lightweight instrumentation`: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
.. _`temporal profiling`: https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068
.. _`single-byte counters`: https://discourse.llvm.org/t/rfc-single-byte-counters-for-source-based-code-coverage/75685
.. _`binary profile correlation`: https://discourse.llvm.org/t/rfc-add-binary-profile-correlation-to-not-load-profile-metadata-sections-into-memory-at-runtime/74565
.. _`binary id`: https://lists.llvm.org/pipermail/llvm-dev/2021-June/151154.html
.. _`type profiling`: https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600