[libc][Docs] Begin improving documentation for the GPU libc

This patch updates some of the documentation for the GPU libc project.
There is a lot of work still to be done, but this sets the general
outline.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D149194
Joseph Huber 2023-04-25 16:23:07 -05:00
parent b56b15ed71
commit 807f058487
7 changed files with 243 additions and 170 deletions

18
libc/docs/gpu/index.rst Normal file

@ -0,0 +1,18 @@
.. _libc_gpu:
=============
libc for GPUs
=============
.. note:: This feature is very experimental and may change in the future.
The *GPU* support for LLVM's libc project aims to make a subset of the standard
C library available on GPU-based accelerators. Navigate using the links below to
learn more about this project.
.. toctree::
using
support
testing
rpc

17
libc/docs/gpu/rpc.rst Normal file

@ -0,0 +1,17 @@
.. _libc_gpu_rpc:
======================
Remote Procedure Calls
======================
.. contents:: Table of Contents
:depth: 4
:local:
Remote Procedure Call Implementation
====================================
Certain features from the standard C library, such as allocation or printing,
require support from the operating system. Since these services are not
available on the GPU, we instead implement a remote procedure call (RPC)
interface that lets the GPU submit work to a host server, which forwards it to
the host system.
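To make the idea concrete, here is a toy, single-threaded sketch of a client
handing a request to a server through one shared slot. It is not the interface
used by LLVM's libc: every name in it (``toy_mailbox``, ``toy_client_submit``,
``toy_server_poll``, ``TOY_PUTS``) is hypothetical, and a real implementation
would need proper synchronization between the GPU and the host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical opcodes, for illustration only. */
enum toy_opcode { TOY_NOOP = 0, TOY_PUTS = 1 };

/* One slot standing in for memory visible to both client and server. */
struct toy_mailbox {
  bool pending;            /* set by the client, cleared by the server */
  enum toy_opcode opcode;  /* which host service is requested */
  char payload[64];        /* NUL-terminated argument */
  size_t result;           /* filled in by the server */
};

/* Client side: post a request. Returns false if the slot is busy or the
   argument does not fit into the payload buffer. */
bool toy_client_submit(struct toy_mailbox *box, enum toy_opcode op,
                       const char *arg) {
  assert(box != NULL && arg != NULL);
  size_t len = strlen(arg);
  if (box->pending || len >= sizeof(box->payload))
    return false;
  memcpy(box->payload, arg, len + 1);
  box->opcode = op;
  box->result = 0;
  box->pending = true;
  return true;
}

/* Server side: handle at most one pending request by forwarding it to the
   host C library. Returns true if a request was handled. */
bool toy_server_poll(struct toy_mailbox *box) {
  assert(box != NULL);
  if (!box->pending)
    return false;
  switch (box->opcode) {
  case TOY_PUTS:
    fputs(box->payload, stdout);         /* the "host system" does the work */
    box->result = strlen(box->payload);  /* report bytes forwarded */
    break;
  case TOY_NOOP:
  default:
    box->result = 0;
    break;
  }
  box->pending = false;
  return true;
}
```

In this sketch the client would call ``toy_client_submit`` and then wait until
the server's ``toy_server_poll`` has cleared ``pending`` before reading
``result``.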

88
libc/docs/gpu/support.rst Normal file

@ -0,0 +1,88 @@
.. _libc_gpu_support:
===================
Supported Functions
===================
.. include:: ../check.rst
.. contents:: Table of Contents
:depth: 4
:local:
The following functions and headers are supported at least partially on the
device. Some functions are implemented fully on the GPU, while others require a
:ref:`remote procedure call <libc_gpu_rpc>`.
ctype.h
-------
============= ========= ============
Function Name Available RPC Required
============= ========= ============
isalnum |check|
isalpha |check|
isascii |check|
isblank |check|
iscntrl |check|
isdigit |check|
isgraph |check|
islower |check|
isprint |check|
ispunct |check|
isspace |check|
isupper |check|
isxdigit |check|
toascii |check|
tolower |check|
toupper |check|
============= ========= ============
string.h
--------
============= ========= ============
Function Name Available RPC Required
============= ========= ============
bcmp |check|
bzero |check|
memccpy |check|
memchr |check|
memcmp |check|
memcpy |check|
memmove |check|
mempcpy |check|
memrchr |check|
memset |check|
stpcpy |check|
stpncpy |check|
strcat |check|
strchr |check|
strcmp |check|
strcpy |check|
strcspn |check|
strlcat |check|
strlcpy |check|
strlen |check|
strncat |check|
strncmp |check|
strncpy |check|
strnlen |check|
strpbrk |check|
strrchr |check|
strspn |check|
strstr |check|
strtok |check|
strtok_r |check|
strdup
strndup
============= ========= ============
stdlib.h
--------
============= ========= ============
Function Name Available RPC Required
============= ========= ============
atoi |check|
============= ========= ============

32
libc/docs/gpu/testing.rst Normal file

@ -0,0 +1,32 @@
.. _libc_gpu_testing:
============================
Testing the GPU libc library
============================
.. contents:: Table of Contents
:depth: 4
:local:
Testing Infrastructure
======================
The testing support in LLVM's libc implementation for GPUs is designed to mimic
the standard unit tests as much as possible. We use the :ref:`remote procedure call
<libc_gpu_rpc>` support to provide the necessary utilities like printing from
the GPU. Execution is performed by emitting a ``_start`` kernel from the GPU
that is then called by an external loader utility. This is an example of how
this can be done manually:
.. code-block:: sh
$> clang++ crt1.o test.cpp --target=amdgcn-amd-amdhsa -mcpu=gfx90a -flto
$> ./amdhsa_loader --threads 1 --blocks 1 a.out
Test Passed!
Unlike the exported ``libcgpu.a``, the testing infrastructure can only support a
single architecture at a time. This is either detected automatically, or set
manually by the user using ``LIBC_GPU_TEST_ARCHITECTURE``. The latter is useful
in cases where the user does not build LLVM's libc on a machine with the GPU to
use for testing.

87
libc/docs/gpu/using.rst Normal file

@ -0,0 +1,87 @@
.. _libc_gpu_usage:
===================
Using libc for GPUs
===================
.. contents:: Table of Contents
:depth: 4
:local:
Building the GPU library
========================
LLVM's libc GPU support *must* be built with an up-to-date ``clang`` compiler
due to heavy reliance on ``clang``'s GPU support. This can be done automatically
using the ``LLVM_ENABLE_RUNTIMES=libc`` option. To enable libc for the GPU,
enable the ``LIBC_GPU_BUILD`` option. By default, ``libcgpu.a`` will be built
using every supported GPU architecture. To restrict the number of architectures
built, either set ``LLVM_LIBC_GPU_ARCHITECTURES`` to the list of desired
architectures manually or use ``native`` to detect the GPUs on your system. A
typical ``cmake`` configuration will look like this:
.. code-block:: sh
$> cd llvm-project # The llvm-project checkout
$> mkdir build
$> cd build
$> cmake ../llvm -G Ninja \
-DLLVM_ENABLE_PROJECTS="clang;lld;compiler-rt" \
-DLLVM_ENABLE_RUNTIMES="libc;openmp" \
-DCMAKE_BUILD_TYPE=<Debug|Release> \ # Select build type
-DLIBC_GPU_BUILD=ON \ # Build in GPU mode
-DLLVM_LIBC_GPU_ARCHITECTURES=all \ # Build all supported architectures
-DCMAKE_INSTALL_PREFIX=<PATH> # Where 'libcgpu.a' will live
$> ninja install
Since we want to include ``clang``, ``lld`` and ``compiler-rt`` in our
toolchain, we list them in ``LLVM_ENABLE_PROJECTS``. To ensure ``libc`` is built
using a compatible compiler and to support ``openmp`` offloading, we list them
in ``LLVM_ENABLE_RUNTIMES`` to build them after the enabled projects using the
newly built compiler. ``CMAKE_INSTALL_PREFIX`` specifies the installation
directory in which to install the ``libcgpu.a`` library and headers along with
LLVM. The generated headers will be placed in ``include/gpu-none-llvm``.
Usage
=====
Once the ``libcgpu.a`` static archive has been built it can be linked directly
with offloading applications as a standard library. This process is described in
the `clang documentation <https://clang.llvm.org/docs/OffloadingDesign.html>`_.
This linking mode is used by the OpenMP toolchain, but is currently opt-in for
the CUDA and HIP toolchains through the ``--offload-new-driver`` and
``-fgpu-rdc`` flags. A typical usage will look like this:
.. code-block:: sh
$> clang foo.c -fopenmp --offload-arch=gfx90a -lcgpu
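As an illustration of what a file like ``foo.c`` might contain, the following is
a minimal sketch of a function that calls ``string.h`` routines from inside an
OpenMP target region. The function name ``count_char_on_device`` is
hypothetical, and the sketch assumes that calls inside the region resolve
against ``libcgpu.a`` when the program is linked as shown above. Without
``-fopenmp`` the pragma is ignored and the same code runs on the host.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts how often `needle` occurs in `text`.
   With offloading enabled, the loop body executes on the device, so the
   strchr calls there are expected to come from the GPU libc. */
int count_char_on_device(const char *text, char needle) {
  assert(text != NULL);
  if (needle == '\0')
    return 0; /* the terminator is not counted as a match */

  size_t len = strlen(text); /* host-side call, used to size the mapping */
  int count = 0;

  /* Copy the string (including its terminator) to the device and bring
     the counter back afterwards. */
#pragma omp target map(to : text[0 : len + 1]) map(tofrom : count)
  {
    for (const char *p = strchr(text, needle); p != NULL;
         p = strchr(p + 1, needle))
      ++count;
  }
  return count;
}
```

A ``main`` function that calls this helper would complete the translation unit
so that it can be compiled and linked with the command above.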
The ``libcgpu.a`` static archive is a fat-binary containing LLVM-IR for each
supported target device. The supported architectures can be seen using LLVM's
``llvm-objdump`` with the ``--offloading`` flag:
.. code-block:: sh
$> llvm-objdump --offloading libcgpu.a
libcgpu.a(strcmp.cpp.o): file format elf64-x86-64
OFFLOADING IMAGE [0]:
kind llvm ir
arch gfx90a
triple amdgcn-amd-amdhsa
producer none
Because the device code is stored inside a fat binary, it can be difficult to
inspect the resulting code. This can be done using the following utilities:
.. code-block:: sh
$> llvm-ar x libcgpu.a strcmp.cpp.o
$> clang-offload-packager strcmp.cpp.o --image=arch=gfx90a,file=gfx90a.bc
$> opt -S gfx90a.bc
...
Please note that this fat binary format is provided for compatibility with
existing offloading toolchains. The implementation in ``libc`` does not depend
on any existing offloading languages and is completely freestanding.


@ -1,169 +0,0 @@
.. _GPU_mode:
==============
GPU Mode
==============
.. include:: check.rst
.. contents:: Table of Contents
:depth: 4
:local:
.. note:: This feature is very experimental and may change in the future.
The *GPU* mode of LLVM's libc is an experimental mode used to support calling
libc routines during GPU execution. The goal of this project is to provide
access to the standard C library on systems running accelerators. To begin using
this library, build and install the ``libcgpu.a`` static archive following the
instructions in :ref:`building_gpu_mode` and link with your offloading
application.
.. _building_gpu_mode:
Building the GPU library
========================
LLVM's libc GPU support *must* be built using the same compiler as the final
application to ensure relative LLVM bitcode compatibility. This can be done
automatically using the ``LLVM_ENABLE_RUNTIMES=libc`` option. Furthermore,
building for the GPU is only supported in :ref:`fullbuild_mode`. To enable the
GPU build, set the target OS to ``gpu`` via ``LLVM_LIBC_TARGET_OS=gpu``. By
default, ``libcgpu.a`` will be built using every supported GPU architecture. To
restrict the number of architectures build, set ``LLVM_LIBC_GPU_ARCHITECTURES``
to the list of desired architectures or use ``all``. A typical ``cmake``
configuration will look like this:
.. code-block:: sh
$> cd llvm-project # The llvm-project checkout
$> mkdir build
$> cd build
$> cmake ../llvm -G Ninja \
-DLLVM_ENABLE_PROJECTS="clang;lld;compiler-rt" \
-DLLVM_ENABLE_RUNTIMES="libc;openmp" \
-DCMAKE_BUILD_TYPE=<Debug|Release> \ # Select build type
-DLLVM_LIBC_FULL_BUILD=ON \ # We need the full libc
-DLIBC_GPU_BUILD=ON \ # Build in GPU mode
-DLLVM_LIBC_GPU_ARCHITECTURES=all \ # Build all supported architectures
-DCMAKE_INSTALL_PREFIX=<PATH> \ # Where 'libcgpu.a' will live
$> ninja install
Since we want to include ``clang``, ``lld`` and ``compiler-rt`` in our
toolchain, we list them in ``LLVM_ENABLE_PROJECTS``. To ensure ``libc`` is built
using a compatible compiler and to support ``openmp`` offloading, we list them
in ``LLVM_ENABLE_RUNTIMES`` to build them after the enabled projects using the
newly built compiler. ``CMAKE_INSTALL_PREFIX`` specifies the installation
directory in which to install the ``libcgpu.a`` library along with LLVM.
Usage
=====
Once the ``libcgpu.a`` static archive has been built in
:ref:`building_gpu_mode`, it can be linked directly with offloading applications
as a standard library. This process is described in the `clang documentation
<https://clang.llvm.org/docs/OffloadingDesign.html>_`. This linking mode is used
by the OpenMP toolchain, but is currently opt-in for the CUDA and HIP toolchains
using the ``--offload-new-driver``` and ``-fgpu-rdc`` flags. A typical usage
will look this this:
.. code-block:: sh
$> clang foo.c -fopenmp --offload-arch=gfx90a -lcgpu
The ``libcgpu.a`` static archive is a fat-binary containing LLVM-IR for each
supported target device. The supported architectures can be seen using LLVM's
objdump with the ``--offloading`` flag:
.. code-block:: sh
$> llvm-objdump --offloading libcgpu.a
libcgpu.a(strcmp.cpp.o): file format elf64-x86-64
OFFLOADING IMAGE [0]:
kind llvm ir
arch gfx90a
triple amdgcn-amd-amdhsa
producer <none>
Because the device code is stored inside a fat binary, it can be difficult to
inspect the resulting code. This can be done using the following utilities:
.. code-block:: sh
$> llvm-ar x libcgpu.a strcmp.cpp.o
$> clang-offload-packager strcmp.cpp.o --image=arch=gfx90a,file=gfx90a.bc
$> opt -S out.bc
...
Supported Functions
===================
The following functions and headers are supported at least partially on the
device. Currently, only basic device functions that do not require an operating
system are supported on the device. Supporting functions like `malloc` using an
RPC mechanism is a work-in-progress.
ctype.h
-------
============= =========
Function Name Available
============= =========
isalnum |check|
isalpha |check|
isascii |check|
isblank |check|
iscntrl |check|
isdigit |check|
isgraph |check|
islower |check|
isprint |check|
ispunct |check|
isspace |check|
isupper |check|
isxdigit |check|
toascii |check|
tolower |check|
toupper |check|
============= =========
string.h
--------
============= =========
Function Name Available
============= =========
bcmp |check|
bzero |check|
memccpy |check|
memchr |check|
memcmp |check|
memcpy |check|
memmove |check|
mempcpy |check|
memrchr |check|
memset |check|
stpcpy |check|
stpncpy |check|
strcat |check|
strchr |check|
strcmp |check|
strcpy |check|
strcspn |check|
strlcat |check|
strlcpy |check|
strlen |check|
strncat |check|
strncmp |check|
strncpy |check|
strnlen |check|
strpbrk |check|
strrchr |check|
strspn |check|
strstr |check|
strtok |check|
strtok_r |check|
strdup
strndup
============= =========


@ -52,7 +52,7 @@ stages there is no ABI stability in any form.
usage_modes
overlay_mode
fullbuild_mode
gpu_mode
gpu/index.rst
.. toctree::
:hidden: