This patch implements an LLVM IR pass, named kernel-info, that reports various statistics for codes compiled for GPUs. The ultimate goal of these statistics is to help identify bad code patterns and ways to mitigate them. The pass operates at the LLVM IR level so that it can, in theory, support any LLVM-based compiler for programming languages supporting GPUs. It has been tested so far with LLVM IR generated by Clang for OpenMP offload codes targeting NVIDIA GPUs and AMD GPUs. By default, the pass runs at the end of LTO, and options like ``-Rpass=kernel-info`` enable its remarks. Example `opt` and `clang` command lines appear in `llvm/docs/KernelInfo.rst`. Remarks include summary statistics (e.g., total size of static allocas) and individual occurrences (e.g., source location of each alloca). Examples of its output appear in tests in `llvm/test/Analysis/KernelInfo`.
==========
KernelInfo
==========

.. contents::
   :local:

Introduction
============

This LLVM IR pass reports various statistics for codes compiled for GPUs. The
goal of these statistics is to help identify bad code patterns and ways to
mitigate them. The pass operates at the LLVM IR level so that it can, in
theory, support any LLVM-based compiler for programming languages supporting
GPUs.

By default, the pass runs at the end of LTO, and options like
``-Rpass=kernel-info`` enable its remarks. Example ``opt`` and ``clang``
command lines appear in the next section.

Remarks include summary statistics (e.g., total size of static allocas) and
individual occurrences (e.g., source location of each alloca). Examples of the
output appear in tests in ``llvm/test/Analysis/KernelInfo``.

Example Command Lines
=====================

To analyze a C program as it appears to an LLVM GPU backend at the end of LTO:

.. code-block:: shell

  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
      -Rpass=kernel-info

To analyze specified LLVM IR, perhaps previously generated by something like
``clang -save-temps -g -fopenmp --offload-arch=native test.c``:

.. code-block:: shell

  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
      -pass-remarks=kernel-info -passes=kernel-info

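
When several bitcode files are on hand, the ``opt`` invocation above can be
wrapped in a small shell function. The following is only a sketch: the function
name ``kernel_info_remarks`` and the ``OPT`` variable are hypothetical helpers,
not part of LLVM, and ``opt`` is assumed to be on ``PATH`` unless ``OPT`` names
another binary.

.. code-block:: shell

  # Hypothetical helper, not part of LLVM: run kernel-info on each bitcode
  # file passed as an argument.  OPT may name a specific opt binary.
  OPT=${OPT:-opt}

  kernel_info_remarks() {
    for bc in "$@"; do
      # Same flags as the single-file command above; stop at the first failure.
      "$OPT" -disable-output "$bc" \
        -pass-remarks=kernel-info -passes=kernel-info || return 1
    done
  }

For example, ``kernel_info_remarks test-openmp-nvptx64-nvidia-cuda-sm_70.bc``
runs the same analysis as the command above.
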
When specifying an LLVM pass pipeline on the command line, ``kernel-info`` still
runs at the end of LTO by default. ``-no-kernel-info-end-lto`` disables that
behavior so you can position ``kernel-info`` explicitly:

.. code-block:: shell

  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
      -Rpass=kernel-info \
      -Xoffload-linker --lto-newpm-passes='lto<O2>'

  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
      -Rpass=kernel-info -mllvm -no-kernel-info-end-lto \
      -Xoffload-linker --lto-newpm-passes='module(kernel-info),lto<O2>'

  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
      -pass-remarks=kernel-info \
      -passes='lto<O2>'

  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
      -pass-remarks=kernel-info -no-kernel-info-end-lto \
      -passes='module(kernel-info),lto<O2>'

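
One possible use of explicit positioning is to compare the remarks reported
before and after the ``lto<O2>`` pipeline for the same bitcode file. The sketch
below reuses the two ``opt`` pipelines shown above. The function name
``kernel_info_before_after`` and the ``OPT`` variable are hypothetical, and the
sketch assumes that ``opt`` writes its remarks to standard error and that
``mktemp`` and ``diff`` are available.

.. code-block:: shell

  # Hypothetical helper, not part of LLVM.  OPT may name a specific opt binary.
  OPT=${OPT:-opt}

  kernel_info_before_after() {
    bc=$1
    before=$(mktemp) && after=$(mktemp) || return 1
    # kernel-info placed explicitly ahead of lto<O2>; the end-of-LTO run is
    # disabled so the pass reports only once.
    "$OPT" -disable-output "$bc" -pass-remarks=kernel-info \
      -no-kernel-info-end-lto -passes='module(kernel-info),lto<O2>' 2>"$before"
    # Default placement: kernel-info runs at the end of lto<O2>.
    "$OPT" -disable-output "$bc" -pass-remarks=kernel-info \
      -passes='lto<O2>' 2>"$after"
    # diff exits with status 1 when the two sets of remarks differ.
    diff "$before" "$after"
    status=$?
    rm -f "$before" "$after"
    return "$status"
  }

Lines prefixed with ``<`` in the ``diff`` output come from the run with
``kernel-info`` ahead of ``lto<O2>``, and lines prefixed with ``>`` come from
the default end-of-LTO run.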