Summary: The `clang-nvlink-wrapper` is a utility that I removed a while back during the transition to the new driver. This patch adds back a new, upgraded version that does LTO and archive linking. It's not an easy choice to reintroduce something I happily deleted, but this is the only way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is very difficult to work with. It does not provide LTO or static linking, requires all files to use the non-standard `.cubin` extension, and rejects link jobs that other linkers would accept (e.g. empty ones). I have spent a great deal of time hacking around this in the GPU `libc` implementation, where I deliberately avoid LTO and static linking and have about 100 lines of hacky CMake dedicated to storing these files in a format that the clang-linker-wrapper accepts to avoid this limitation.

The main reason I want to reintroduce this tool is that I am planning on creating a more standard C/C++ toolchain for GPUs to use. This will install files like the following.

```
<install>/lib/nvptx64-nvidia-cuda/libc.a
<install>/lib/nvptx64-nvidia-cuda/libc++.a
<install>/lib/nvptx64-nvidia-cuda/libomp.a
<install>/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```

Linking in these libraries will then simply require passing `-lc`, as is already done for non-GPU toolchains. However, this doesn't work with the currently deficient `nvlink` linker, so I consider this a blocking issue for massively improving the state of building GPU libraries. In the future we may be able to convince NVIDIA to port their linker to `ld.lld`, but for now this is the only workable solution that allows us to hack around the weird behavior of their closed-source software.

This also copies some amount of logic from the clang-linker-wrapper, but not enough, I feel, for it to be worthwhile to merge them. In the future it may be possible to delete that handling from there entirely.

====================
Clang nvlink Wrapper
====================

.. contents::
   :local:

.. _clang-nvlink-wrapper:

Introduction
============

This tool works as a wrapper around the NVIDIA ``nvlink`` linker. Its purpose
is to provide an interface similar to the ``ld.lld`` linker while still relying
on NVIDIA's proprietary linker to produce the final output.

``nvlink`` has a number of known quirks that make it difficult to use in a
unified offloading setting. For example, it does not accept ``.o`` files because
inputs must be named ``.cubin``. Static archives do not work, so passing a
``.a`` file produces a linker error. ``nvlink`` also does not support link-time
optimization and ignores many standard linker arguments. This tool works around
these issues.

Usage
=====

This tool can be used with the following options. Any arguments not intended
only for the linker wrapper will be forwarded to ``nvlink``.

.. code-block:: console

  OVERVIEW: A utility that wraps around the NVIDIA 'nvlink' linker.
            This enables static linking and LTO handling for NVPTX targets.

  USAGE: clang-nvlink-wrapper [options] <options to be passed to nvlink>

  OPTIONS:
    --arch <value>       Specify the 'sm_' name of the target architecture.
    --cuda-path=<dir>    Set the system CUDA path
    --dry-run            Print generated commands without running.
    --feature <value>    Specify the '+ptx' feature to use for LTO.
    -g                   Specify that this was a debug compile.
    -help-hidden         Display all available options
    -help                Display available options (--help-hidden for more)
    -L <dir>             Add <dir> to the library search path
    -l <libname>         Search for library <libname>
    -mllvm <arg>         Arguments passed to LLVM, including Clang invocations,
                         for which the '-mllvm' prefix is preserved. Use '-mllvm
                         --help' for a list of options.
    -o <path>            Path to file to write output
    --plugin-opt=jobs=<value>
                         Number of LTO codegen partitions
    --plugin-opt=lto-partitions=<value>
                         Number of LTO codegen partitions
    --plugin-opt=O<O0, O1, O2, or O3>
                         Optimization level for LTO
    --plugin-opt=thinlto<value>
                         Enable the thin-lto backend
    --plugin-opt=<value> Arguments passed to LLVM, including Clang invocations,
                         for which the '-mllvm' prefix is preserved. Use '-mllvm
                         --help' for a list of options.
    --save-temps         Save intermediate results
    --version            Display the version number and exit
    -v                   Print verbose information

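As a rough illustration of how the options above combine, the following sketch
assembles a direct ``clang-nvlink-wrapper`` invocation that uses ``--dry-run``
to print the generated commands instead of running them. The architecture
``sm_70``, the library directory ``build/lib``, and the input ``main.o`` are
hypothetical placeholders and are not taken from this document. Only flags
listed in the help text are used.

.. code-block:: bash

  #!/usr/bin/env bash
  # Sketch only: all paths and the architecture below are assumed placeholders.
  set -eu

  gpu_arch="sm_70"     # hypothetical target architecture ('sm_' name)
  lib_dir="build/lib"  # hypothetical directory that holds a GPU libc.a
  input="main.o"       # hypothetical NVPTX object or LLVM bitcode input

  # Build the command as an array so every argument stays a separate word.
  cmd=(clang-nvlink-wrapper --arch "$gpu_arch" --dry-run -v
       -L "$lib_dir" -l c -o a.out "$input")

  # Show the command line that would be used.
  echo "${cmd[*]}"

  # Only run it when both the tool and the input file are actually present.
  if command -v clang-nvlink-wrapper >/dev/null 2>&1 && [ -f "$input" ]; then
    "${cmd[@]}" || echo "dry run failed; check the inputs and CUDA path" >&2
  else
    echo "skipping: clang-nvlink-wrapper or $input is not available" >&2
  fi

Dropping ``--dry-run`` would perform the link. Any argument the wrapper does not
consume itself is expected to be forwarded to ``nvlink``, as described above.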

Example
=======

This tool is intended to be invoked when targeting the NVPTX toolchain directly
as a cross-compiling target. This can be used to create standalone GPU
executables with normal linking semantics similar to standard compilation.

.. code-block:: console

  clang --target=nvptx64-nvidia-cuda -march=native -flto=full input.c

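To inspect what the driver would do without performing the link, the standard
Clang ``-###`` flag prints the planned jobs without executing them. The sketch
below is a minimal example under stated assumptions: it writes a trivial
``input.c`` to a temporary directory and uses an explicit ``sm_70``
architecture, a hypothetical choice so that no GPU needs to be present, in
place of ``-march=native``. What the driver prints depends on the installed
Clang version and CUDA setup, so no particular output is shown.

.. code-block:: bash

  #!/usr/bin/env bash
  # Sketch only: the architecture and the source file are assumed placeholders.
  set -eu

  workdir="$(mktemp -d)"
  trap 'rm -rf "$workdir"' EXIT

  # A trivial translation unit so the driver has a real input file.
  printf 'int main(void) { return 0; }\n' > "$workdir/input.c"

  gpu_arch="sm_70"  # hypothetical; -march=native needs a GPU on the host

  # '-###' asks the Clang driver to print its jobs instead of running them.
  cmd=(clang --target=nvptx64-nvidia-cuda "-march=$gpu_arch" -flto=full
       "$workdir/input.c" -o "$workdir/a.out" '-###')

  echo "${cmd[*]}"

  if command -v clang >/dev/null 2>&1; then
    "${cmd[@]}" || echo "clang could not plan the link; check the toolchain" >&2
  else
    echo "skipping: clang is not available" >&2
  fi

On a toolchain that includes this tool, the printed link job is where a
``clang-nvlink-wrapper`` invocation would be expected to appear.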