mirror of
https://github.com/llvm/llvm-project.git
synced 2025-04-15 21:36:31 +00:00

Summary: These flags were used in the very early days while we were trying to port stuff. Now that we just pass bitcode to the device link job it can be easily replaced by `-Xoffload-linker foo.bc`.
136 lines
5.8 KiB
ReStructuredText
136 lines
5.8 KiB
ReStructuredText
====================
|
|
Clang Linker Wrapper
|
|
====================
|
|
|
|
.. contents::
|
|
:local:
|
|
|
|
.. _clang-linker-wrapper:
|
|
|
|
Introduction
|
|
============
|
|
|
|
This tool works as a wrapper of the normal host linking job. This tool is used
|
|
to create linked device images for offloading and the necessary runtime calls to
|
|
register them. It works by first scanning the linker's input for embedded device
|
|
offloading data stored at the ``.llvm.offloading`` section. This section
|
|
contains binary data created by the :doc:`ClangOffloadPackager`. The extracted
|
|
device files will then be linked. The linked modules will then be wrapped into a
|
|
new object file containing the code necessary to register it with the offloading
|
|
runtime.
|
|
|
|
Usage
|
|
=====
|
|
|
|
This tool can be used with the following options. Any arguments not intended
|
|
only for the linker wrapper will be forwarded to the wrapped linker job.
|
|
|
|
.. code-block:: console
|
|
|
|
USAGE: clang-linker-wrapper [options] -- <options to passed to the linker>
|
|
|
|
OPTIONS:
|
|
--cuda-path=<dir> Set the system CUDA path
|
|
--device-debug Use debugging
|
|
--device-linker=<value> or <triple>=<value>
|
|
Arguments to pass to the device linker invocation
|
|
--dry-run Print program arguments without running
|
|
--help-hidden Display all available options
|
|
--help Display available options (--help-hidden for more)
|
|
--host-triple=<triple> Triple to use for the host compilation
|
|
--linker-path=<path> The linker executable to invoke
|
|
-L <dir> Add <dir> to the library search path
|
|
-l <libname> Search for library <libname>
|
|
--opt-level=<O0, O1, O2, or O3>
|
|
Optimization level for LTO
|
|
--override-image=<kind=file>
|
|
Uses the provided file as if it were the output of the device link step
|
|
-o <path> Path to file to write output
|
|
--pass-remarks-analysis=<value>
|
|
Pass remarks for LTO
|
|
--pass-remarks-missed=<value>
|
|
Pass remarks for LTO
|
|
--pass-remarks=<value> Pass remarks for LTO
|
|
--print-wrapped-module Print the wrapped module's IR for testing
|
|
--ptxas-arg=<value> Argument to pass to the 'ptxas' invocation
|
|
--relocatable Link device code to create a relocatable offloading application
|
|
--save-temps Save intermediate results
|
|
--sysroot<value> Set the system root
|
|
--verbose Verbose output from tools
|
|
--v Display the version number and exit
|
|
-- The separator for the wrapped linker arguments
|
|
|
|
Relocatable Linking
|
|
===================
|
|
|
|
The ``clang-linker-wrapper`` handles linking embedded device code and then
|
|
registering it with the appropriate runtime. Normally, this is only done when
|
|
the executable is created so other files containing device code can be linked
|
|
together. This can be somewhat problematic for users who wish to ship static
|
|
libraries that contain offloading code to users without a compatible offloading
|
|
toolchain.
|
|
|
|
When using a relocatable link with ``-r``, the ``clang-linker-wrapper`` will
|
|
perform the device linking and registration eagerly. This will remove the
|
|
embedded device code and register it correctly with the runtime. Semantically,
|
|
this is similar to creating a shared library object. If standard relocatable
|
|
linking is desired, simply do not run the binaries through the
|
|
``clang-linker-wrapper``. This will simply append the embedded device code so
|
|
that it can be linked later.
|
|
|
|
Matching
|
|
========
|
|
|
|
The linker wrapper will link extracted device code that is compatible with each
|
|
other. Generally, this requires that the target triple and architecture match.
|
|
An exception is made when the architecture is listed as ``generic``, which will
|
|
cause it be linked with any other device code with the same target triple.
|
|
|
|
Debugging
|
|
=========
|
|
|
|
The linker wrapper performs a lot of steps internally, such as input matching,
|
|
symbol resolution, and image registration. This makes it difficult to debug in
|
|
some scenarios. The behavior of the linker-wrapper is controlled mostly through
|
|
metadata, described in `clang documentation
|
|
<https://clang.llvm.org/docs/OffloadingDesign.html>`_. Intermediate output can
|
|
be obtained from the linker-wrapper using the ``--save-temps`` flag. These files
|
|
can then be modified.
|
|
|
|
.. code-block:: sh
|
|
|
|
$> clang openmp.c -fopenmp --offload-arch=gfx90a -c
|
|
$> clang openmp.o -fopenmp --offload-arch=gfx90a -Wl,--save-temps
|
|
$> ; Modify temp files.
|
|
$> llvm-objcopy --update-section=.llvm.offloading=out.bc openmp.o
|
|
|
|
Doing this will allow you to override one of the input files by replacing its
|
|
embedded offloading metadata with a user-modified version. However, this will be
|
|
more difficult when there are multiple input files. For a very large hammer, the
|
|
``--override-image=<kind>=<file>`` flag can be used.
|
|
|
|
In the following example, we use the ``--save-temps`` to obtain the LLVM-IR just
|
|
before running the backend. We then modify it to test altered behavior, and then
|
|
compile it to a binary. This can then be passed to the linker-wrapper which will
|
|
then ignore all embedded metadata and use the provided image as if it were the
|
|
result of the device linking phase.
|
|
|
|
.. code-block:: sh
|
|
|
|
$> clang openmp.c -fopenmp --offload-arch=gfx90a -Wl,--save-temps
|
|
$> ; Modify temp files.
|
|
$> clang --target=amdgcn-amd-amdhsa -mcpu=gfx90a -nogpulib out.bc -o a.out
|
|
$> clang openmp.c -fopenmp --offload-arch=gfx90a -Wl,--override-image=openmp=a.out
|
|
|
|
Example
|
|
=======
|
|
|
|
This tool links object files with offloading images embedded within it using the
|
|
``-fembed-offload-object`` flag in Clang. Given an input file containing the
|
|
magic section we can pass it to this tool to extract the data contained at that
|
|
section and run a device linking job on it.
|
|
|
|
.. code-block:: console
|
|
|
|
clang-linker-wrapper --host-triple=x86_64 --linker-path=/usr/bin/ld -- <Args>
|