llvm-mca - LLVM Machine Code Analyzer
=====================================

SYNOPSIS
--------

:program:`llvm-mca` [*options*] [input]

DESCRIPTION
-----------

:program:`llvm-mca` is a performance analysis tool that uses information
available in LLVM (e.g. scheduling models) to statically measure the performance
of machine code on a specific CPU.

Performance is measured in terms of throughput as well as processor resource
consumption. The tool currently works for processors with an out-of-order
backend, for which there is a scheduling model available in LLVM.

The main goal of this tool is not just to predict the performance of the code
when run on the target, but also to help diagnose potential performance
issues.

Given an assembly code sequence, :program:`llvm-mca` estimates the IPC
(Instructions Per Cycle), as well as hardware resource pressure. The analysis
and reporting style were inspired by the IACA tool from Intel.

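The IPC figure is the number of simulated instructions divided by the number
of simulated cycles. The sketch below assumes that the input sequence is
simulated once per iteration (see :option:`-iterations`), so the instruction
count is the sequence length multiplied by the iteration count. The helper name
``estimate_ipc`` and the numbers used are hypothetical and do not come from a
real run of the tool.

.. code-block:: python

  def estimate_ipc(instructions_per_iteration: int, iterations: int,
                   total_cycles: int) -> float:
      """Return the average number of instructions executed per cycle."""
      if total_cycles <= 0:
          raise ValueError("total_cycles must be positive")
      # Assumption: the whole sequence is simulated once per iteration.
      total_instructions = instructions_per_iteration * iterations
      return total_instructions / total_cycles


  # Hypothetical run: a 3-instruction sequence, 100 iterations, 103 cycles.
  print(round(estimate_ipc(3, 100, 103), 2))  # prints 2.91
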
:program:`llvm-mca` allows the use of special code comments to mark regions of
the assembly code to be analyzed. A comment starting with substring
``LLVM-MCA-BEGIN`` marks the beginning of a code region. A comment starting with
substring ``LLVM-MCA-END`` marks the end of a code region. For example:

.. code-block:: none

  # LLVM-MCA-BEGIN My Code Region
    ...
  # LLVM-MCA-END

Multiple regions can be specified provided that they do not overlap. A code
region can have an optional description. If no user-defined region is
specified, then :program:`llvm-mca` assumes a default region which contains
every instruction in the input file. Every region is analyzed in isolation, and
the final performance report is the union of all the reports generated for
every code region.

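The region rules above can be modeled with a short script. The following is a
hypothetical Python sketch, not the parser that :program:`llvm-mca` uses. It
assumes that ``#`` starts a comment (as in x86 AT&T syntax), treats a comment
whose text starts with ``LLVM-MCA-BEGIN`` or ``LLVM-MCA-END`` as a marker,
skips regions that contain no instructions, and falls back to a single default
region when the input has no markers. The names ``Region`` and
``split_regions`` are illustrative only. The sketch simply raises an error on
overlapping or unbalanced markers; the tool itself may handle such input
differently.

.. code-block:: python

  from dataclasses import dataclass, field
  from typing import List, Optional

  BEGIN = "LLVM-MCA-BEGIN"
  END = "LLVM-MCA-END"


  @dataclass
  class Region:
      description: str
      instructions: List[str] = field(default_factory=list)


  def split_regions(source: str) -> List[Region]:
      """Group instruction lines into code regions delimited by markers."""
      user_regions: List[Region] = []
      current: Optional[Region] = None
      all_instructions: List[str] = []
      for raw in source.splitlines():
          line = raw.strip()
          if not line:
              continue
          if line.startswith("#"):  # assumption: '#' starts a comment
              comment = line[1:].strip()
              if comment.startswith(BEGIN):
                  if current is not None:
                      raise ValueError("code regions must not overlap")
                  current = Region(comment[len(BEGIN):].strip())
                  user_regions.append(current)
              elif comment.startswith(END):
                  if current is None:
                      raise ValueError("LLVM-MCA-END without a matching BEGIN")
                  current = None
              continue  # other comments are not instructions
          all_instructions.append(line)
          if current is not None:
              current.instructions.append(line)
      if current is not None:
          raise ValueError("LLVM-MCA-BEGIN without a matching END")
      if not user_regions:
          # No markers at all: one default region holds every instruction.
          return [Region("Default", all_instructions)]
      # Regions that contain no instructions are skipped.
      return [r for r in user_regions if r.instructions]


  asm = """
      # LLVM-MCA-BEGIN My Code Region
      add %eax, %ebx
      imul %ecx, %ecx
      # LLVM-MCA-END
      # LLVM-MCA-BEGIN
      # LLVM-MCA-END
  """
  print([(r.description, len(r.instructions)) for r in split_regions(asm)])
  # prints [('My Code Region', 2)]

Each returned region would then be analyzed on its own, which is why the final
report is a union of per-region reports.
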
OPTIONS
-------

If ``input`` is "``-``" or omitted, :program:`llvm-mca` reads from standard
input. Otherwise, it will read from the specified filename.

If the :option:`-o` option is omitted, then :program:`llvm-mca` will send its output
to standard output if the input is from standard input. If the :option:`-o`
option specifies "``-``", then the output will also be sent to standard output.

.. option:: -help

 Print a summary of command line options.

.. option:: -mtriple=<target triple>

 Specify a target triple string.

.. option:: -march=<arch>

 Specify the architecture for which to analyze the code. It defaults to the
 host default target.

.. option:: -mcpu=<cpuname>

 Specify the processor for which to run the analysis. By default, this is a
 "generic" processor. It is not autodetected from the current architecture.

.. option:: -output-asm-variant=<variant id>

 Specify the output assembly variant for the report generated by the tool.
 On x86, possible values are [0, 1]. A value of 0 selects the AT&T assembly
 format, and a value of 1 selects the Intel assembly format, for the code
 printed out by the tool in the analysis report.

.. option:: -dispatch=<width>

 Specify a different dispatch width for the processor. The dispatch width
 defaults to the ``IssueWidth`` field in the processor scheduling model. If
 width is zero, then the default dispatch width is used.

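 Assuming the dispatch width is counted in micro-ops, the width alone bounds how
 fast a sequence can run: ``N`` micro-ops need at least ``ceil(N / width)``
 cycles to be dispatched. The hypothetical Python sketch below applies the
 zero-means-default rule described above and computes that bound;
 ``issue_width`` stands for the model's ``IssueWidth`` value. The bound ignores
 every other pipeline constraint, so simulated results are expected to be at or
 above it.

 .. code-block:: python

   import math


   def effective_dispatch_width(requested: int, issue_width: int) -> int:
       """A requested width of zero means: use the model's IssueWidth."""
       return issue_width if requested == 0 else requested


   def min_cycles_per_iteration(num_micro_ops: int, width: int) -> int:
       """Dispatch-only lower bound; ignores all other pipeline limits."""
       return math.ceil(num_micro_ops / width)


   # Hypothetical model with IssueWidth = 4 and a 10 micro-op sequence.
   width = effective_dispatch_width(0, issue_width=4)
   print(width, min_cycles_per_iteration(10, width))  # prints "4 3"
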
.. option:: -register-file-size=<size>

 Specify the size of the register file. When specified, this flag limits how
 many temporary registers are available for register renaming purposes. A value
 of zero for this flag means "unlimited number of temporary registers".

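 Conceptually, a finite register file acts like a pool of temporary registers:
 dispatching an instruction that writes a register takes an entry from the
 pool, the entry is returned when that instruction retires, and dispatch stalls
 while the pool is exhausted. The ``RenamePool`` class below is a hypothetical
 Python sketch of that simplified model, not the tool's implementation; as with
 this flag, a size of zero means the pool never runs out.

 .. code-block:: python

   from typing import Optional


   class RenamePool:
       """Hypothetical model of the temporary registers used for renaming."""

       def __init__(self, size: int) -> None:
           # A size of zero models an unlimited number of temporary registers.
           self.capacity: Optional[int] = None if size == 0 else size
           self.in_use = 0

       def can_allocate(self, count: int = 1) -> bool:
           if self.capacity is None:
               return True
           return self.in_use + count <= self.capacity

       def allocate(self, count: int = 1) -> bool:
           """Reserve registers at dispatch; False means dispatch must stall."""
           if not self.can_allocate(count):
               return False
           self.in_use += count
           return True

       def release(self, count: int = 1) -> None:
           """Return registers when the writing instruction retires."""
           self.in_use = max(0, self.in_use - count)


   pool = RenamePool(size=2)
   print(pool.allocate(), pool.allocate(), pool.allocate())  # True True False
   pool.release()
   print(pool.allocate())  # True
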
.. option:: -iterations=<number of iterations>

 Specify the number of iterations to run. If this flag is set to 0, then the
 tool sets the number of iterations to a default value (i.e. 70).

.. option:: -noalias=<bool>

 If set, the tool assumes that loads and stores don't alias. This is the
 default behavior.

.. option:: -lqueue=<load queue size>

 Specify the size of the load queue in the load/store unit emulated by the tool.
 By default, the tool assumes an unbounded number of entries in the load queue.
 A value of zero for this flag is ignored, and the default load queue size is
 used instead.

.. option:: -squeue=<store queue size>

 Specify the size of the store queue in the load/store unit emulated by the
 tool. By default, the tool assumes an unbounded number of entries in the store
 queue. A value of zero for this flag is ignored, and the default store queue
 size is used instead.

.. option:: -verbose

 Enable verbose output. In particular, this flag enables a number of extra
 statistics and performance counters for the dispatch logic, the reorder
 buffer, the retire control unit and the register file.

.. option:: -timeline

 Enable the timeline view.

.. option:: -timeline-max-iterations=<iterations>

 Limit the number of iterations to print in the timeline view. By default, the
 timeline view prints information for up to 10 iterations.

.. option:: -timeline-max-cycles=<cycles>

 Limit the number of cycles in the timeline view. By default, the number of
 cycles is set to 80.

.. option:: -resource-pressure

 Enable the resource pressure view. This is enabled by default.

.. option:: -register-file-stats

 Enable register file usage statistics.

.. option:: -instruction-info

 Enable the instruction info view. This is enabled by default.

.. option:: -instruction-tables

 Print resource pressure information based on the static information
 available from the processor model. This differs from the resource pressure
 view because it does not require the code to be simulated. Instead, it prints
 the theoretical uniform distribution of resource pressure for every
 instruction in the sequence.

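 As an illustration of the uniform distribution, assume an instruction occupies
 a resource for ``C`` cycles and that the resource is a group of ``N``
 interchangeable units (for example, several execution ports). Spreading the
 usage uniformly charges ``C / N`` cycles to each unit. The Python sketch below
 implements that simplified rule; the function name, port names, and cycle
 counts are hypothetical and not taken from a real scheduling model.

 .. code-block:: python

   from collections import defaultdict
   from fractions import Fraction
   from typing import Dict, List, Tuple

   # (units that can serve the instruction, cycles consumed on the resource)
   Usage = Tuple[List[str], int]


   def uniform_pressure(usages: List[Usage]) -> Dict[str, Fraction]:
       """Spread each usage evenly over the units of its resource group."""
       pressure: Dict[str, Fraction] = defaultdict(Fraction)
       for units, cycles in usages:
           share = Fraction(cycles, len(units))
           for unit in units:
               pressure[unit] += share
       return dict(pressure)


   # Hypothetical instruction: 1 cycle on either of two ALU ports,
   # plus 1 cycle on a single store port.
   table = uniform_pressure([(["P0", "P1"], 1), (["P4"], 1)])
   print({unit: float(p) for unit, p in table.items()})
   # prints {'P0': 0.5, 'P1': 0.5, 'P4': 1.0}

 In this example, the instruction can issue on either ``P0`` or ``P1``, so each
 port is charged half a cycle, while ``P4`` is charged the full cycle.
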
EXIT STATUS
-----------

:program:`llvm-mca` returns 0 on success. Otherwise, an error message is printed
to standard error, and the tool returns 1.