[docs] Rewrite HowToCrossCompileLLVM (#129451)

The document has had a few minor tweaks over the years, but the last major piece of work on it was in 2016, after it was first introduced in 2013. My aim is to provide a clear and clean recipe for cross-compiling LLVM that:

* Should be achievable for anyone on common variants of Linux (_including_ the step of acquiring a working sysroot).
  * I think I've kept the coverage of acquiring a Debian sysroot minimal enough that it can reasonably be included. `debootstrap` is packaged for most common Linux distributions, including non-Debian-derived distributions like Arch Linux and Fedora.
* Describes a setup that we can reasonably support within the community.
  * I realise that with the ninja symlink canonicalisation issue I haven't completely avoided hacks, but I look particularly to point 2 under hacks in the current docs, which talks about libraries on the host being found by CMake and adding `-L` and `-I` to try to hack around this. We've all been there and made these kinds of temporary workarounds to see if we can get further, but it's very hard to support someone who has problems with a setup that's improperly leaking between the host and target like this. The approach I describe with a clean sysroot and appropriate `CMAKE_FIND_ROOT_PATH_MODE_*` settings doesn't have this issue.
* Cuts down on extraneous / outdated information, especially where it is better covered elsewhere (e.g. detailed descriptions of CMake options not directly relevant to cross compilation).

I've run through the instructions for AArch64, RISC-V (64-bit), and armhf.
2025-04-26 08:36:07 +00:00 · 2025-03-11 18:33:53 +00:00 · 2025-03-11 18:33:53 +00:00 · ba11e1e522
commit ba11e1e522
parent 5686786c55
1 changed files with 212 additions and 189 deletions
--- a/llvm/docs/HowToCrossCompileLLVM.rst
+++ b/llvm/docs/HowToCrossCompileLLVM.rst
@@ -1,215 +1,238 @@
 ===================================================================
-How To Cross-Compile Clang/LLVM using Clang/LLVM
+How to cross-compile Clang/LLVM using Clang/LLVM
 ===================================================================

 Introduction
-============
+------------

 This document contains information about building LLVM and
-Clang on host machine, targeting another platform.
+Clang on a host machine, targeting another platform.

 For more information on how to use Clang as a cross-compiler,
 please check https://clang.llvm.org/docs/CrossCompilation.html.

-TODO: Add MIPS and other platforms to this document.
+This document describes cross-building a compiler in a single stage, using an
+existing ``clang`` install as the host compiler.

-Cross-Compiling from x86_64 to ARM
-==================================
+.. note::
+  These instructions have been tested for targeting 32-bit ARM, AArch64, or
+  64-bit RISC-V from an x86_64 Linux host, but should be equally applicable to
+  any other target.

-In this use case, we'll be using CMake and Ninja, on a Debian-based Linux
-system, cross-compiling from an x86_64 host (most Intel and AMD chips
-nowadays) to a hard-float ARM target (most ARM targets nowadays).
-
-The packages you'll need are:
-
- * ``cmake``
- * ``ninja-build`` (from backports in Ubuntu)
- * ``gcc-4.7-arm-linux-gnueabihf``
- * ``gcc-4.7-multilib-arm-linux-gnueabihf``
- * ``binutils-arm-linux-gnueabihf``
- * ``libgcc1-armhf-cross``
- * ``libsfgcc1-armhf-cross``
- * ``libstdc++6-armhf-cross``
- * ``libstdc++6-4.7-dev-armhf-cross``
-
-Configuring CMake
-----------------
-
-For more information on how to configure CMake for LLVM/Clang,
-see :doc:`CMake`.
-
-The CMake options you need to add are:
-
- * ``-DCMAKE_SYSTEM_NAME=<target-system>``
- * ``-DCMAKE_INSTALL_PREFIX=<install-dir>``
- * ``-DLLVM_HOST_TRIPLE=arm-linux-gnueabihf``
- * ``-DLLVM_TARGETS_TO_BUILD=ARM``
-
-Note: ``CMAKE_CROSSCOMPILING`` is always set automatically when ``CMAKE_SYSTEM_NAME`` is set. Don't put ``-DCMAKE_CROSSCOMPILING=TRUE`` in your options.
-
-Also note that ``LLVM_HOST_TRIPLE`` specifies the triple of the system
-that the cross built LLVM is going to run on - the flag is named based
-on the autoconf build/host/target nomenclature. (This flag implicitly sets
-other defaults, such as ``LLVM_DEFAULT_TARGET_TRIPLE``.)
-
-If you're compiling with GCC, you can use architecture options for your target,
-and the compiler driver will detect everything that it needs:
-
- * ``-DCMAKE_CXX_FLAGS='-march=armv7-a -mcpu=cortex-a9 -mfloat-abi=hard'``
-
-However, if you're using Clang, the driver might not be up-to-date with your
-specific Linux distribution, version or GCC layout, so you'll need to fudge.
-
-In addition to the ones above, you'll also need:
-
- * ``--target=arm-linux-gnueabihf`` or whatever is the triple of your cross GCC.
- * ``'--sysroot=/usr/arm-linux-gnueabihf'``, ``'--sysroot=/opt/gcc/arm-linux-gnueabihf'``
-   or whatever is the location of your GCC's sysroot (where /lib, /bin etc are).
- * Appropriate use of ``-I`` and ``-L``, depending on how the cross GCC is installed,
-   and where are the libraries and headers.
-
-You may also want to set the ``LLVM_NATIVE_TOOL_DIR`` option - pointing
-at a directory with prebuilt LLVM tools (``llvm-tblgen``, ``clang-tblgen``
-etc) for the build host, allowing you to them reuse them if available.
-E.g. ``-DLLVM_NATIVE_TOOL_DIR=<path-to-native-llvm-build>/bin``.
-If the option isn't set (or the directory doesn't contain all needed tools),
-the LLVM cross build will automatically launch a nested build to build the
-tools that are required.
-
-The CXX flags define the target, cpu (which in this case
-defaults to ``fpu=VFP3`` with NEON), and forcing the hard-float ABI. If you're
-using Clang as a cross-compiler, you will *also* have to set ``--sysroot``
-to make sure it picks the correct linker.
-
-When using Clang, it's important that you choose the triple to be *identical*
-to the GCC triple and the sysroot. This will make it easier for Clang to
-find the correct tools and include headers. But that won't mean all headers and
-libraries will be found. You'll still need to use ``-I`` and ``-L`` to locate
-those extra ones, depending on your distribution.
-
-Most of the time, what you want is to have a native compiler to the
-platform itself, but not others. So there's rarely a point in compiling
-all back-ends. For that reason, you should also set the
-``TARGETS_TO_BUILD`` to only build the back-end you're targeting to.
-
-You must set the ``CMAKE_INSTALL_PREFIX``, otherwise a ``ninja install``
-will copy ARM binaries to your root filesystem, which is not what you
-want.
-
-Hacks
-----
-
-There are some bugs in current LLVM, which require some fiddling before
-running CMake:
-
-#. If you're using Clang as the cross-compiler, there is a problem in
-   the LLVM ARM back-end that is producing absolute relocations on
-   position-independent code (``R_ARM_THM_MOVW_ABS_NC``), so for now, you
-   should disable PIC:
-
-   .. code-block:: bash
-
-      -DLLVM_ENABLE_PIC=False
-
-   This is not a problem, since Clang/LLVM libraries are statically
-   linked anyway, it shouldn't affect much.
-
-#. The ARM libraries won't be installed in your system.
-   But the CMake prepare step, which checks for
-   dependencies, will check the *host* libraries, not the *target*
-   ones. Below there's a list of some dependencies, but your project could
-   have more, or this document could be outdated. You'll see the errors
-   while linking as an indication of that.
-
-   Debian based distros have a way to add ``multiarch``, which adds
-   a new architecture and allows you to install packages for those
-   systems. See https://wiki.debian.org/Multiarch/HOWTO for more info.
-
-   But not all distros will have that, and possibly not an easy way to
-   install them in any anyway, so you'll have to build/download
-   them separately.
-
-   A quick way of getting the libraries is to download them from
-   a distribution repository, like Debian (http://packages.debian.org/jessie/),
-   and download the missing libraries. Note that the ``libXXX``
-   will have the shared objects (``.so``) and the ``libXXX-dev`` will
-   give you the headers and the static (``.a``) library. Just in
-   case, download both.
-
-   The ones you need for ARM are: ``libtinfo``, ``zlib1g``,
-   ``libxml2`` and ``liblzma``. In the Debian repository you'll
-   find downloads for all architectures.
-
-   After you download and unpack all ``.deb`` packages, copy all
-   ``.so`` and ``.a`` to a directory, make the appropriate
-   symbolic links (if necessary), and add the relevant ``-L``
-   and ``-I`` paths to ``-DCMAKE_CXX_FLAGS`` above.
-
-
-Running CMake and Building
--------------------------
-
-Finally, if you're using your platform compiler, run:
-
-   .. code-block:: bash
-
-     $ cmake -G Ninja <source-dir> -DCMAKE_BUILD_TYPE=<type> <options above>
-
-If you're using Clang as the cross-compiler, run:
-
-   .. code-block:: bash
-
-     $ CC='clang' CXX='clang++' cmake -G Ninja <source-dir> -DCMAKE_BUILD_TYPE=<type> <options above>
-
-If you have ``clang``/``clang++`` on the path, it should just work, and special
-Ninja files will be created in the build directory. I strongly suggest
-you to run ``cmake`` on a separate build directory, *not* inside the
-source tree.
-
-To build, simply type:
-
-   .. code-block:: bash
-
-     $ ninja
-
-It should automatically find out how many cores you have, what are
-the rules that needs building and will build the whole thing.
-
-You can't run ``ninja check-all`` on this tree because the created
-binaries are targeted to ARM, not x86_64.
-
-Installing and Using
+Setting up a sysroot
 --------------------

-After the LLVM/Clang has built successfully, you should install it
-via:
+You will need a sysroot that contains essential build dependencies compiled
+for the target architecture. In this case, we will be using CMake and Ninja on
+a Linux host and compiling against a Debian sysroot. Detailed instructions on
+producing sysroots are outside of the scope of this documentation, but the
+following instructions should work on any Linux distribution with these
+pre-requisites:
+
+ * ``binfmt_misc`` configured to execute ``qemu-user`` for binaries of the
+   target architecture. This is done by installing the ``qemu-user-static``
+   and ``binfmt-support`` packages on Debian-derived distributions.
+ * Root access (setups involving ``proot`` or other tools to avoid this
+   requirement may be possible, but aren't described here).
+ * The ``debootstrap`` tool. This is available in most distributions.
+
+The following snippet will initialise sysroots for 32-bit Arm, AArch64, and
+64-bit RISC-V (just pick the target(s) you are interested in):

   .. code-block:: bash

-     $ ninja install
+    sudo debootstrap --arch=armhf --variant=minbase --include=build-essential,symlinks stable sysroot-deb-armhf-stable
+    sudo debootstrap --arch=arm64 --variant=minbase --include=build-essential,symlinks stable sysroot-deb-arm64-stable
+    sudo debootstrap --arch=riscv64 --variant=minbase --include=build-essential,symlinks unstable sysroot-deb-riscv64-unstable

-which will create a sysroot on the install-dir. You can then tar
-that directory into a binary with the full triple name (for easy
-identification), like:
+The created sysroot may contain absolute symlinks, which will resolve to a
+location within the host when accessed during compilation, so we must convert
+any absolute symlinks to relative ones:

   .. code-block:: bash

-     $ ln -sf <install-dir> arm-linux-gnueabihf-clang
-     $ tar zchf arm-linux-gnueabihf-clang.tar.gz arm-linux-gnueabihf-clang
+    sudo chroot sysroot-of-your-choice symlinks -cr .
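
To illustrate why absolute symlinks matter, here is a minimal sketch that builds a throwaway directory and lists the absolute symlinks inside it. ``demo_sysroot`` and the ``libfoo`` files are hypothetical stand-ins, not part of a real sysroot, and ``find -lname`` is assumed to be available (it is provided by GNU findutils).

```shell
#!/bin/sh
# Sketch only: demo_sysroot and libfoo are made-up names for illustration.
set -eu

demo_sysroot=$(mktemp -d)
mkdir -p "$demo_sysroot/usr/lib" "$demo_sysroot/lib"
: > "$demo_sysroot/usr/lib/libfoo.so.1"

# Absolute target: when followed from outside a chroot, this resolves
# against the host's /usr/lib rather than the sysroot's.
ln -s /usr/lib/libfoo.so.1 "$demo_sysroot/lib/libfoo.so"

# Relative target: resolves inside the sysroot wherever it is located.
ln -s ../usr/lib/libfoo.so.1 "$demo_sysroot/lib/libfoo-rel.so"

# List symlinks whose target starts with '/', printed relative to the sysroot.
absolute_links=$(find "$demo_sysroot" -type l -lname '/*' | sed "s|^$demo_sysroot||")
echo "$absolute_links"

rm -rf "$demo_sysroot"
```

Running the same ``find`` expression against a real sysroot before and after the ``symlinks -cr`` step should show whether any absolute links remain.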

-If you copy that tarball to your target board, you'll be able to use
-it for running the test-suite, for example. Follow the guidelines at
-https://llvm.org/docs/lnt/quickstart.html, unpack the tarball in the
-test directory, and use options:
+
+Configuring CMake and building
+------------------------------
+
+For more information on how to configure CMake for LLVM/Clang,
+see :doc:`CMake`. Following CMake's recommended practice, we will create a
+`toolchain file
+<https://cmake.org/cmake/help/book/mastering-cmake/chapter/Cross%20Compiling%20With%20CMake.html#toolchain-files>`_. 
+
+The following assumes you have a system install of ``clang`` and ``lld`` that
+will be used for cross compiling and that the listed commands are executed
+from within the root of a checkout of the ``llvm-project`` git repository.
+
+First, set variables in your shell session that will be used throughout the
+build instructions:

   .. code-block:: bash

-     $ ./sandbox/bin/python sandbox/bin/lnt runtest nt \
-         --sandbox sandbox \
-         --test-suite `pwd`/test-suite \
-         --cc `pwd`/arm-linux-gnueabihf-clang/bin/clang \
-         --cxx `pwd`/arm-linux-gnueabihf-clang/bin/clang++
+    SYSROOT=$HOME/sysroot-deb-arm64-stable
+    TARGET=aarch64-linux-gnu
+    CFLAGS=""

-Remember to add the ``-jN`` options to ``lnt`` to the number of CPUs
-on your board. Also, the path to your clang has to be absolute, so
-you'll need the `pwd` trick above.
+To customise details of the compilation target or choose a different
+architecture altogether, change the ``SYSROOT``,
+``TARGET``, and ``CFLAGS`` variables to something matching your target. For
+example, for 64-bit RISC-V you might set
+``SYSROOT=$HOME/sysroot-deb-riscv64-unstable``, ``TARGET=riscv64-linux-gnu``
+and ``CFLAGS="-march=rva20u64"``. Refer to documentation such as your target's
+compiler documentation or processor manual for guidance on which ``CFLAGS``
+settings may be appropriate. The specified ``TARGET`` should match the triple
+used within the sysroot (i.e. ``$SYSROOT/usr/lib/$TARGET`` should exist).
+
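The requirement that ``$SYSROOT/usr/lib/$TARGET`` exists can be checked before configuring. The helper below is a hypothetical sketch (``check_sysroot_triple`` is not part of LLVM or CMake); it only tests for the directory and reports the result.

```shell
# check_sysroot_triple is a hypothetical helper, named here for illustration.
# Usage: check_sysroot_triple <sysroot-path> <target-triple>
check_sysroot_triple() {
  if [ -d "$1/usr/lib/$2" ]; then
    echo "ok: $1/usr/lib/$2 exists"
  else
    echo "error: $1/usr/lib/$2 not found; TARGET may not match the sysroot" >&2
    return 1
  fi
}
```

With the variables above set, it would be invoked as ``check_sysroot_triple "$SYSROOT" "$TARGET"``.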
+Then execute the following snippet to create a toolchain file:
+
+   .. code-block:: bash
+
+    cat - <<EOF > $TARGET-clang.cmake
+    set(CMAKE_SYSTEM_NAME Linux)
+    set(CMAKE_SYSROOT "$SYSROOT")
+    set(CMAKE_C_COMPILER_TARGET $TARGET)
+    set(CMAKE_CXX_COMPILER_TARGET $TARGET)
+    set(CMAKE_C_FLAGS_INIT "$CFLAGS")
+    set(CMAKE_CXX_FLAGS_INIT "$CFLAGS")
+    set(CMAKE_LINKER_TYPE LLD)
+    set(CMAKE_C_COMPILER clang)
+    set(CMAKE_CXX_COMPILER clang++)
+    set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
+    set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
+    set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
+    set(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)
+    EOF
+
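The snippet above relies on the here-document delimiter being unquoted, so the shell substitutes ``$SYSROOT``, ``$TARGET`` and ``$CFLAGS`` before the file is written. The following sketch contrasts that with a quoted delimiter; ``demo_target`` is a hypothetical stand-in for ``$TARGET``.

```shell
#!/bin/sh
# Sketch only: demo_target stands in for the TARGET variable.
demo_target=aarch64-linux-gnu

# Unquoted delimiter: variables are expanded by the shell.
expanded=$(cat <<EOF
set(CMAKE_C_COMPILER_TARGET $demo_target)
EOF
)

# Quoted delimiter: the text is written literally, with no expansion.
literal=$(cat <<'EOF'
set(CMAKE_C_COMPILER_TARGET $demo_target)
EOF
)

echo "$expanded"
echo "$literal"
```

If the generated toolchain file contains a literal ``$TARGET`` or an empty sysroot path, the variables were probably not set in the shell session that ran the snippet.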
+
+Then configure and build by invoking ``cmake``:
+
+   .. code-block:: bash
+
+    cmake -G Ninja \
+      -DCMAKE_BUILD_TYPE=Release \
+      -DLLVM_ENABLE_PROJECTS="lld;clang" \
+      -DCMAKE_TOOLCHAIN_FILE=$(pwd)/$TARGET-clang.cmake \
+      -DLLVM_HOST_TRIPLE=$TARGET \
+      -DCMAKE_INSTALL_PREFIX=$HOME/clang-$TARGET \
+      -S llvm \
+      -B build/$TARGET
+    cmake --build build/$TARGET
+
+These options from the toolchain file and ``cmake`` invocation above are
+important:
+
+ * ``CMAKE_SYSTEM_NAME``: Perhaps surprisingly, explicitly setting this
+   variable `causes CMake to set
+   CMAKE_CROSSCOMPILING <https://cmake.org/cmake/help/latest/variable/CMAKE_CROSSCOMPILING.html#variable:CMAKE_CROSSCOMPILING>`_.
+ * ``CMAKE_{C,CXX}_COMPILER_TARGET``: This will be used to set the
+   ``--target`` argument to ``clang``. The triple should match the triple used
+   within the sysroot (i.e. ``$SYSROOT/usr/lib/$TARGET`` should exist).
+ * ``CMAKE_FIND_ROOT_PATH_MODE_*``: These `control the search behaviour for
+   finding libraries, includes or binaries
+   <https://cmake.org/cmake/help/book/mastering-cmake/chapter/Cross%20Compiling%20With%20CMake.html#finding-external-libraries-programs-and-other-files>`_.
+   Setting these prevents files for the host being used in the build.
+ * ``LLVM_HOST_TRIPLE``: Specifies the target triple of the system the built
+   LLVM will run on, which also implicitly sets other defaults such as
+   ``LLVM_DEFAULT_TARGET_TRIPLE``. For example, if you are using an x86_64
+   host to compile for RISC-V, this will be a RISC-V triple.
+ * ``CMAKE_SYSROOT``: The path to the sysroot containing libraries and headers
+   for the target.
+ * ``CMAKE_INSTALL_PREFIX``: Setting this avoids installing binaries compiled
+   for the target system into system directories for the host system. It is
+   not required unless you are going to use the ``install`` target.
+
+See `LLVM's build documentation
+<https://llvm.org/docs/CMake.html#frequently-used-cmake-variables>`_ for more
+guidance on CMake variables (e.g. ``LLVM_TARGETS_TO_BUILD`` may be useful if
+your cross-compiled binaries only need to support compiling for one target).
+
+Working around a ninja dependency issue
+---------------------------------------
+
+If you followed the instructions above to create a sysroot, you may run into a
+`longstanding problem related to path canonicalization in ninja
+<https://github.com/ninja-build/ninja/issues/1330>`_. GCC canonicalizes system
+headers in dependency files, so when ninja reads them it does not need to do
+so. Clang does not do this, and unfortunately ninja does not implement the
+canonicalization logic at all, meaning for some system headers with symlinks
+in the paths, it can incorrectly compute a non-existing path and consider it
+as always modified.
+
+If you are suffering from this issue, you will find any attempt at an
+incremental build (including the suggested command to build the ``install``
+target in the next section) results in recompiling everything.  ``ninja -C
+build/$TARGET -t deps`` shows files in ``$SYSROOT/include/*`` that
+do not exist (as the ``$SYSROOT/include`` folder does not exist) and you can
+further confirm these files are causing ``ninja`` to determine a rebuild is
+necessary with ``ninja -C build/$TARGET -d deps``.
+
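Scanning the ``-t deps`` output by eye can be tedious. As a rough sketch, the hypothetical helper below (``report_missing_deps`` is not a ninja feature) reads that output on standard input and prints each dependency path that does not exist. It assumes dependency paths appear on indented lines beneath each target line.

```shell
# report_missing_deps is a hypothetical helper, named here for illustration.
# It assumes dependency paths are the indented lines of `ninja -t deps` output.
report_missing_deps() {
  while IFS= read -r line; do
    case "$line" in
      ' '*) ;;          # indented line: treat as a dependency path
      *) continue ;;    # target header or blank line: skip
    esac
    dep=${line#"${line%%[! ]*}"}   # strip the leading spaces
    [ -n "$dep" ] || continue
    [ -e "$dep" ] || printf '%s\n' "$dep"
  done | sort -u
}
```

Because recorded paths may be relative to the build directory, it would be run from there, for example ``(cd build/$TARGET && ninja -t deps | report_missing_deps)``.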
+A workaround is to create a symlink so that the incorrect
+``$SYSROOT/include/*`` dependencies resolve to files within
+``$SYSROOT/usr/include/*``. This works in practice for the simple
+cross-compilation use case described here, but is not a general solution.
+
+   .. code-block:: bash
+
+    sudo ln -s usr/include $SYSROOT/include
+
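The effect of that relative symlink can be seen in a scratch directory without root access. In this sketch ``demo_sysroot`` and the placeholder ``stdio.h`` are hypothetical; only the ``ln -s usr/include`` form is taken from the workaround above.

```shell
#!/bin/sh
# Sketch only: demo_sysroot and the placeholder header are made up.
set -eu

demo_sysroot=$(mktemp -d)
mkdir -p "$demo_sysroot/usr/include"
echo '/* placeholder header */' > "$demo_sysroot/usr/include/stdio.h"

# Same relative link as the workaround: include -> usr/include.
ln -s usr/include "$demo_sysroot/include"

# A path through the new link now resolves to the file under usr/include.
if [ -f "$demo_sysroot/include/stdio.h" ]; then resolved=yes; else resolved=no; fi
link_target=$(readlink "$demo_sysroot/include")
echo "resolved=$resolved link_target=$link_target"

rm -rf "$demo_sysroot"
```

Because the link target is relative, it keeps working if the sysroot directory is moved or accessed from inside a chroot.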
+Testing the just-built compiler
+-------------------------------
+
+Confirm the ``clang`` binary was built for the expected target architecture:
+
+   .. code-block:: bash
+
+    $ file -L ./build/aarch64-linux-gnu/bin/clang
+    ./build/aarch64-linux-gnu/bin/clang: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, for GNU/Linux 3.7.0, BuildID[sha1]=516b8b366a790fcd3563bee4aec0cdfcb90bb1c7, not stripped
+
+If you have ``qemu-user`` installed you can test the produced target binary
+either by invoking ``qemu-{target}-static`` directly:
+
+   .. code-block:: bash
+
+    $ qemu-aarch64-static -L $SYSROOT ./build/aarch64-linux-gnu/bin/clang --version
+    clang version 21.0.0git (https://github.com/llvm/llvm-project cedfdc6e889c5c614a953ed1f44bcb45a405f8da)
+    Target: aarch64-unknown-linux-gnu
+    Thread model: posix
+    InstalledDir: /home/asb/llvm-project/build/aarch64-linux-gnu/bin
+
+Or, if ``binfmt_misc`` is configured (as was necessary for ``debootstrap``):
+
+   .. code-block:: bash
+
+    $ export QEMU_LD_PREFIX=$SYSROOT; ./build/aarch64-linux-gnu/bin/clang --version
+    clang version 21.0.0git (https://github.com/llvm/llvm-project cedfdc6e889c5c614a953ed1f44bcb45a405f8da)
+    Target: aarch64-unknown-linux-gnu
+    Thread model: posix
+    InstalledDir: /home/asb/llvm-project/build/aarch64-linux-gnu/bin
+
+Installing and using
+--------------------
+
+.. note::
+  Use of the ``install`` target requires that you have set
+  ``CMAKE_INSTALL_PREFIX``; otherwise it will attempt to install in
+  directories under ``/`` on your host.
+
+If you want to transfer a copy of the built compiler to another machine, you
+can first install it to a location on the host via:
+
+   .. code-block:: bash
+
+    cmake --build build/$TARGET --target=install
+
+This will install the LLVM/Clang headers, binaries, libraries, and other files
+to paths within ``CMAKE_INSTALL_PREFIX``. Then tar that directory for transfer
+to a device that runs the target architecture natively:
+
+   .. code-block:: bash
+
+    tar -czvf clang-$TARGET.tar.gz -C $HOME clang-$TARGET
+
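The ``-C`` option in the command above makes ``tar`` change directory before archiving, so the archive stores paths relative to ``$HOME``. The sketch below shows this on a throwaway directory; ``demo_home``, ``demo_target`` and the empty ``bin/clang`` file are hypothetical stand-ins for a real install tree.

```shell
#!/bin/sh
# Sketch only: demo_home and demo_target stand in for $HOME and $TARGET.
set -eu

demo_home=$(mktemp -d)
demo_target=aarch64-linux-gnu
mkdir -p "$demo_home/clang-$demo_target/bin"
: > "$demo_home/clang-$demo_target/bin/clang"

# Archive the install directory using paths relative to demo_home.
tar -czf "$demo_home/clang-$demo_target.tar.gz" -C "$demo_home" "clang-$demo_target"

# The first archive entry is the top-level directory, with no leading path.
first_entry=$(tar -tzf "$demo_home/clang-$demo_target.tar.gz" | head -n 1)
echo "$first_entry"

rm -rf "$demo_home"
```

On the target machine, extracting with ``tar -xzf clang-$TARGET.tar.gz`` should therefore produce a ``clang-$TARGET`` directory in the current working directory.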
+The generated toolchain is portable, but requires compatible versions of any
+shared libraries it links against. This means using a sysroot that is as
+similar to your target operating system as possible is desirable. Other `CMake
+variables <https://llvm.org/docs/CMake.html#frequently-used-cmake-variables>`_
+may be helpful, for instance ``LLVM_STATIC_LINK_CXX_STDLIB``.