Summary:
These flags were used in the very early days while we were trying to
port stuff. Now that we just pass bitcode to the device link job it
can be easily replaced by `-Xoffload-linker foo.bc`.
Summary:
One of the downsides of the linker wrapper is that it made debugging
more difficult. It is very powerful in that it can resolve a lot of
input matching and library handling that could not be done before.
However, the old method allowed users to simply copy-paste the script
files to modify the output and test it.
This patch attempts to make it easier to debug changes by letting the
user override all the linker inputs. That is, we provide a user-created
binary that is treated like the final output of the device link step.
The intended use-case is for using `-save-temps` to get some IR, then
modifying the IR and sticking it back in to see if it exhibits the old
failures.
Summary:
The standard GPU compilation process embeds each intermediate object
file into the host file at the `.llvm.offloading` section so it can be
linked later. We also use a special section called something like
`omp_offloading_entries` to store all the globals that need to be
registered by the runtime. The linker-wrapper's job is to link the
embedded device code stored at this section and then emit code to
register the linked image and the kernels and globals in the offloading
entry section.
One downside to RDC linking is that it can become quite big for very
large projects that wish to make use of static linking. This patch
changes the support for relocatable linking via `-r` to support a kind
of "partial" RDC compilation for offloading languages.
This primarily requires manually editing the embedded data in the
output object file for the relocatable link. We need to rename the
output section to make it distinct from the input sections that will be
merged. We then delete the old embedded object code so it won't be
linked further. We then need to rename the old offloading section so
that it is private to the module. A runtime solution could also be done
to defer entries that don't belong to the given GPU executable, but this
is easier. Note that this does not work with COFF linking, only the ELF
method for handling offloading entries, that could be made to work
similarly.
Given this support, the following compilation path should produce two
distinct images for OpenMP offloading.
```
$ clang foo.c -fopenmp --offload-arch=native -c
$ clang foo.c -lomptarget.devicertl --offload-link -r -o merged.o
$ clang main.c merged.o -fopenmp --offload-arch=native
$ ./a.out
```
Or similarly for HIP to effectively perform non-RDC mode compilation for
a subset of files.
```
$ clang -x hip foo.c --offload-arch=native --offload-new-driver -fgpu-rdc -c
$ clang -x hip foo.c -lomptarget.devicertl --offload-link -r -o merged.o
$ clang -x hip main.c merged.o --offload-arch=native --offload-new-driver -fgpu-rdc
$ ./a.out
```
One question is whether or not this should be the default behavior of
`-r` when run through the linker-wrapper or a special option. Standard
`-r` behavior is still possible if used without invoking the
linker-wrapper and it guaranteed to be correct.
This patch adds more documentation for the OpenMP offloading driver.
This includes a new file that describes the overall pipeline becuase
that was not previously explained in full elsewhere.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D118815
This patch introduces a linker wrapper tool that allows us to preprocess
files before they are sent to the linker. This adds a dummy action and
job to the driver stage that builds the linker command as usual and then
replaces the command line with the wrapper tool.
Depends on D116543
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D116544