llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-05-01 21:26:05 +00:00

Author	SHA1	Message	Date
Joseph Huber	ed801ad5e5	[Clang] Use metadata to make identifying embedded objects easier Currently we use the `embedBufferInModule` function to store binary strings containing device offloading data inside the host object to create a fatbinary. In the case of LTO, we need to extract this object from the LLVM-IR. This patch adds a metadata node for the embedded objects containing the embedded pointers and the sections they were stored at. This should create a cleaner interface for identifying these values. In the future it may be worthwhile to also encode an `ID` in the metadata corresponding to the object's special section type if relevant. This would allow us to extract the data from an object file and LLVM-IR using the same ID. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D129033	2022-07-07 12:20:25 -04:00
Joseph Huber	0bb1bf1b17	[LinkerWrapper] Add AMDGPU specific options to the LLD invocation We use LLD to perform AMDGPU linking. This linker accepts some arguments through the `-plugin-opt` facilities. These options match what `Clang` will output when given the same input. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D128923	2022-07-05 13:43:51 -04:00
Joseph Huber	1dcbe03c32	[Binary] Further improve malformed input handling for the OffloadBinary Summary: This patch adds some new sanity checks to make sure that the sizes of the offsets are within the bounds of the file or what is expected by the binary. This also improves the error handling of the version structure to be built into the binary itself so we can change it more easily.	2022-06-24 09:57:44 -04:00
Joseph Huber	6e6889288c	[Offloading] Embed the target features in the OffloadBinary The target features are necessary for correctly compiling most programs in LTO mode. Currently, these are derived in clang at link time and passed as an argument to the linker wrapper. This is problematic because it requires knowing the required toolchain at link time, which should not be necessary. Instead, these features should be embedded into the offloading binary so we can unify them in the linker wrapper for LTO. This also required changing the offload packager to interpret multiple arguments as concatenation with a comma. This is so we can still use the `,` separator for the argument list. Depends on D127246 Reviewed By: tra Differential Revision: https://reviews.llvm.org/D127686	2022-06-23 13:15:01 -04:00
Joseph Huber	7597988729	[LinkerWrapper][NFC] Change interface to use a StringRef to TempFiles Summary: Currently we use temporary files to write the intermediate results to. However, these are stored as regular strings and we do a few unnecessary copies and conversions of them. This patch simply replaces these strings with a reference to the filename stored in the list of temporary files. The temporary files will stay alive during the whole linking phase and have stable pointers, so we should be able to cheaply pass references to them rather than copying them every time.	2022-06-22 13:16:37 -04:00
Joseph Huber	a9fd8b9113	[LinkerWrapper] Fix calls to deleted Error constructor on older compilers Summary: A recent patch added some new code paths to the linker wrapper. Older compilers seem to have problems with returning errors wrapped in an Expected type without explicitly moving them. This caused failures in some of the buildbots. This patch fixes that.	2022-06-22 09:39:23 -04:00
Joseph Huber	958a885050	[LinkerWrapper] Rework the linker wrapper and use owning binaries The linker wrapper currently eagerly extracts all identified offloading binaries to a file. This isn't ideal because we will soon open these files again to examine their symbols for LTO and other things. Additionally, we may not use every extracted file in the case of static libraries. This would be very noisy in the case of static libraries that may contain code for several targets not participating in the current link. Recent changes allow us to treat an Offloading binary as a standard binary class. So that allows us to use an OwningBinary to model the file. Now we keep it in memory and only write it once we know which files will be participating in the final link job. This also reworks a lot of the structure around how we handle this by removing the old DeviceFile class. The main benefit from this is that the following doesn't output 32+ files and instead will only output a single temp file for the linked module. ``` $ clang input.c -fopenmp --offload-arch=sm_70 -foffload-lto -save-temps ``` Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D127246	2022-06-22 09:24:10 -04:00
Fangrui Song	95a134254a	Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options	2022-06-05 01:07:51 -07:00
Fangrui Song	d0d1c416cb	Remove unneeded cl::ZeroOrMore for cl::list options	2022-06-04 23:51:13 -07:00
Fangrui Song	734c223445	[clang-link-wrapper] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC Similar to 557efc9a8b68628c2c944678c6471dac30ed9e8e	2022-06-03 22:02:11 -07:00
Joseph Huber	3723868d9e	[OpenMP] Fix file arguments for embedding bitcode in the linker wrapper Summary: The linker wrapper supports embedding bitcode images instead of linked device images to facilitate JIT in the device runtime. However, we were incorrectly passing in the file twice when this option was set. This patch makes sure we only use the intermediate result of the LTO pass and don't add the final output to the full job. In the future we will want to add both of these and handle that accordingly to allow the runtime to either use the AoT compiled version or JIT compile the bitcode version if available.	2022-05-24 13:45:52 -04:00
Joseph Huber	f37101983f	[OpenMP] Add `-Xoffload-linker` to forward input to the device linker We use the clang-linker-wrapper to perform device linking of embedded offloading object files. This is done by generating those jobs inside of the linker-wrapper itself. This patch adds an argument in Clang and the linker-wrapper that allows users to forward input to the device linking phase. This can either be done for every device linker, or for a specific target triple. We use the `-Xoffload-linker <arg>` and the `-Xoffload-linker-<triple> <arg>` syntax to accomplish this. Reviewed By: markdewing, tra Differential Revision: https://reviews.llvm.org/D126226	2022-05-24 09:11:02 -04:00
Joseph Huber	8a0fb965f6	[LinkerWrapper] Group static libraries in their own buffer Summary: Static libraries need to be handled differently from regular input files, namely they are loaded lazily. Previously we used a flag to indicate a file came from a static library. This patch simplifies this by simply keeping a different array that contains the static libraries so we don't need to parse them out again.	2022-05-12 20:45:49 -04:00
Joseph Huber	1bfa88d0c5	[LinkerWrapper] Remove stripping features from the linker wrapper Summary: The linker wrapper previously had functionality to strip the sections manually. We don't use this at all because this is much better done by the linker via the `SHF_EXCLUDE` flag. This patch simply removes the support for this feature to simplify the code.	2022-05-12 20:45:49 -04:00
Joseph Huber	42a1fb5ca5	[LinkerWrapper][Fix] Fix bad alignment from extracted archive members Summary: We use embedded binaries to extract offloading device code from the host fatbinary. This uses a binary format whose necessary alignment is eight bytes. The alignment is included within the ELF section type so the data extracted from the ELF should always be aligned at that amount. However, if this file was extracted from a static archive, it was being sent as an offset in the archive file which did not have the same alignment guarantees as the ELF file. This was causing errors in the UB-sanitizer build as it would occasionally try to access a misaligned address. To fix this, I simply copy the memory directly to a new buffer which is guaranteed to have worst-case alignment of 16 in the case that it's not properly aligned.	2022-05-11 16:56:41 -04:00
Joseph Huber	f49d576a88	[CUDA] Add wrapper code generation for registering CUDA images This patch adds the necessary code generation to create the wrapper code that registers all the globals in CUDA. We create the necessary functions and iterate through the list of `__start_cuda_offloading_entries` to find which globals must be registered. This is very similar to the code generation done currently in Clang for non-rdc builds, but here we are registering a fully linked fatbinary and finding the globals via the above sections. With this we should be able to fully support basic RDC / LTO building of CUDA code. It's also worth noting that this does not include the necessary PTX to JIT the image, so to use this support the offloading architecture must match the system's architecture. Depends on D123810 Reviewed By: tra Differential Revision: https://reviews.llvm.org/D123812	2022-05-11 07:30:25 -04:00
Joseph Huber	e7858a9fab	[Cuda] Add initial support for wrapping CUDA images in the new driver. This patch adds the initial support for wrapping CUDA images. This requires changing some of the logic for how we bundle images. We now need to copy the image for all kinds that are active for the architecture. Then we need to run a separate wrapping job if the Kind is Cuda. For cuda wrapping we need to use the `fatbinary` program from the CUDA SDK to bundle all the binaries together. This is then passed to a new function to perform the actual module code generation that will be implemented in a later patch. Depends on D120273 D123471 Reviewed By: tra Differential Revision: https://reviews.llvm.org/D123810	2022-05-11 07:30:23 -04:00
Joseph Huber	e12905b4d5	[OpenMP] Add basic support for properly handling static libraries Currently we handle static libraries like any other object in the linker wrapper. However, this does not preserve the semantics that dictate static libraries should be lazily loaded as the symbols are needed. This allows us to ignore linking in architectures that are not used by the main application being compiled. This patch adds the basic support for detecting if a file came from a static library, and only including it in the link job if it's used by other object files. This patch only adds the basic support, to be more correct we should check the symbols and only include the library if the link job contains symbols that are needed. Ideally we could just put this on the linker itself, but nvlink doesn't seem to support `.a` files. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D125092	2022-05-06 11:20:58 -04:00
Joseph Huber	46a5a8029e	[OpenMP] Fix save-temps name in linker wrapper Summary: The wrapped registration code had a typo in the save-temps version of the name.	2022-05-03 20:51:05 -04:00
Joseph Huber	9f7ac522ae	[OpenMP] Fix printing commands twice in verbose mode Summary: A previous patch merged the command execution and printing into a helper function. The old printing code wasn't removed causing each to be printed twice.	2022-04-29 23:06:22 -04:00
Joseph Huber	d9c64d33b9	[OpenMP] Allow CUDA to be linked with OpenMP using the new driver After basic support for embedding and handling CUDA files was added to the new driver, we should be able to call CUDA functions from OpenMP code. This patch makes the necessary changes to successfully link in CUDA programs that were compiled using the new driver. With this patch it should be possible to compile device-only CUDA code (no kernels) and call it from OpenMP as follows: ``` $ clang++ cuda.cu -fopenmp-new-driver -offload-arch=sm_70 -c $ clang++ openmp.cpp cuda.o -fopenmp-new-driver -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target=nvptx64 -march=sm_70 ``` Currently this requires using a host variant to suppress the generation of a CPU-side fallback call. Depends on D120272 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D120273	2022-04-29 11:38:40 -04:00
Joseph Huber	2fb131668f	[OpenMP] Fix incorrect path taken when searching for LLD for offloading Summary: A previous patch updated the path searching in the linker wrapper. I made an error and caused `lld`, which is necessary to link AMDGPU images, to not be found on some systems. This patch fixes this by correctly searching the linker wrapper's binary path first again.	2022-04-26 10:51:04 -04:00
Joseph Huber	3530c35c66	[OpenMP] Use CUDA's non-RDC mode when LTO has whole program visibility When we do LTO we consider ourselves to have whole program visibility if every single input file we have contains LLVM bitcode. If we have whole program visibility then we can create a single image and utilize CUDA's non-RDC mode by not passing `-c` to `ptxas` and ignoring the `nvlink` job. This should be faster for some situations and also saves us the time executing `nvlink`. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D124292	2022-04-23 12:42:40 -04:00
Joseph Huber	dbb10f7097	[OpenMP] Fix deleted move constructor failing on some compilers Summary: A previous commit added some new errors that were not correctly cast to an r-value. This doesn't work on some compilers.	2022-04-19 18:40:15 -04:00
Joseph Huber	260c5df2d5	[OpenMP] Add better testing for the linker wrapper The linker wrapper is used to perform linking and wrapping of embedded device object files. Currently its internals are not able to be tested easily. This patch adds the `--dry-run` and `--print-wrapped-module` options to investigate the link jobs that will be run along with the wrapped code that will be created to register the binaries. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D124039	2022-04-19 18:37:09 -04:00
Joseph Huber	33b604d1c3	[OpenMP] Fix linting diagnostics in the linker wrapper Summary: A previous patch had some linter warnings that should've been addressed.	2022-04-15 21:19:29 -04:00
Joseph Huber	984a0dc386	[OpenMP] Use new offloading binary when embedding offloading images The previous patch introduced the offloading binary format so we can store some metadata along with the binary image. This patch introduces using this inside the linker wrapper and Clang instead of the previous method that embedded the metadata in the section name. Differential Revision: https://reviews.llvm.org/D122683	2022-04-15 20:35:26 -04:00
Joseph Huber	cac81161ed	[OpenMP] Don't manually strip sections in the linker wrapper Summary: The changes in D122987 ensure that the offloading sections always have the SHF_EXCLUDE flag. This means that we do not need to manually strip these sections for ELF or COFF targets.	2022-04-15 20:35:25 -04:00
Joseph Huber	a1d57fc225	[OpenMP] Do not use the default pipeline without optimizations Summary: A previous patch added the option to use the default pipeline when performing LTO rather than the regular LTO pipeline. This greatly reduced the performance regressions we were observing with the LTO pipeline. However, this should not be used if the user explicitly disables optimizations as the default pipeline expects some optimizations to be performed.	2022-04-11 17:27:38 -04:00
Joseph Huber	69a77771a9	[OpenMP] Make linker wrapper thin-lto default thread count use all Summary: Currently there is no option to configure the number of thin-backend threads to use when performing thin-lto on the device, but we should default to use all the threads rather than just one. In the future we should use the same arguments that gold / lld use and parse it here.	2022-04-01 09:44:28 -04:00
Joseph Huber	5856f30b5a	[LTO] Add configuration option to use default optimization pipeline This patch adds a configuration option to simply use the default pass pipeline in favor of the LTO-specific one. We observed some severe performance penalties when using device-side LTO for OpenMP offloading applications caused by the LTO-pass pipeline. This is primarily because OpenMP uses an LLVM bitcode library to implement a GPU runtime library. In a standard compilation we link this bitcode library into each source file and optimize it with the default pipeline. When performing LTO we link it late with all the files, but the bitcode library never has the regular optimization pipeline applied to it so we miss a few optimizations just using the LTO pipeline to optimize it. I'm not committed to this solution, but it's the easiest method to solve this performance regression when using LTO without changing the optimization pipeline for other users. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D122133	2022-03-22 09:28:45 -04:00
Joseph Huber	9f89769cd7	[Clang] Add offload kind to embedded offload object This patch adds the offload kind to the embedded section name in preparation for offloading to different kinds like CUDA or HIP. Depends on D120288 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D120271	2022-03-14 20:08:27 -04:00
Joseph Huber	06b336c4cd	[OpenMP] Implement dense map info for device file This patch implements a DenseMap info struct for the device file type. This is used to help grouping device files that have the same triple and architecture. Because of this the filename, which will always be unique for each file, is not used. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D120288	2022-03-14 20:08:26 -04:00
Joseph Huber	3f7c3ff90e	[OpenMP] Handle sysroot option in offloading linker wrapper Summary: This patch correctly handles the `--sysroot=` option when passed to the linker wrapper. This allows users to correctly find libraries that may contain offloading code if using this option.	2022-03-02 13:02:41 -05:00
Joseph Huber	d5b2055769	[OpenMP] Add verbose output for linker wrapper Summary: This patch adds printing support for the linker wrapper. When the user passes `-v` it will now print the commands used by the linker wrapper to indicate to the user what is happening during the linking.	2022-02-28 13:28:19 -05:00
Joseph Huber	6a0b78af91	[OpenMP] Remove static allocator in linker wrapper Summary: We don't need this static allocator to survive the entire file, the strings stored have a defined lifetime.	2022-02-22 21:22:19 -05:00
Joseph Huber	55cb84d9fb	[OpenMP] Unrecognized objects should not be considered failure Summary: This patch removes the error we receive when attempting to extract offloading sections. We shouldn't consider this a failure because extracting bitcode isn't necessarily required.	2022-02-22 21:22:18 -05:00
Joseph Huber	55639c2f7c	[OpenMP] Properly save strings when doing LTO Summary: We were not previously saving strings when saving symbol names during LTO symbol resolution. This caused a crash inside the dense set when some of the strings would rarely be moved internally by the object file class.	2022-02-16 16:40:39 -05:00
Joseph Huber	24ecafb413	[OpenMP] Add support for CPU offloading in new driver This patch adds support for linking CPU offloading applications in the linker wrapper. We generate the necessary linking job using the host linker's path and library arguments. This may not be true for more complex offloading schemes, but this is sufficient for now. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119613	2022-02-15 15:05:30 -05:00
Joseph Huber	7ee8bd60f2	[OpenMP] Use executable path when searching for lld Summary: This patch changes the ClangLinkerWrapper to use the executable path when searching for the lld binary. Previously we relied on the program name. Also not finding 'llvm-strip' is not considered an error anymore because it is an optional optimization.	2022-02-07 15:09:51 -05:00
Kelvin Li	8ea4aed50a	[OpenMP] Add search path for llvm-strip Add the build directory to the search path for llvm-strip instead of solely relying on the PATH environment variable setting. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D118965	2022-02-04 22:15:14 -05:00
Joseph Huber	8cc4ca95b0	[OpenMP] Add Cuda path to linker wrapper tool The linker wrapper tool uses the 'nvlink' and 'ptxas' binaries to link and assemble device files. Previously we searched for this using the binaries in the user's path. This didn't work in cases where the user passed in a specific Cuda path to Clang. This patch changes the linker wrapper to accept an argument for the Cuda path we can get from Clang. This should fix #53573. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D118944	2022-02-03 20:39:18 -05:00
Joseph Huber	19fac745e3	[OpenMP] Remove call to 'clang-offload-wrapper' binary Summary: This patch removes the system call to the `clang-offload-wrapper` tool by replicating its functionality in a new file. This improves performance and makes the future wrapping functionality easier to change. Differential Revision: https://reviews.llvm.org/D118198	2022-01-31 23:11:43 -05:00
Joseph Huber	eb6ddf288c	[OpenMP] Replace system call to `llc` with target machine Summary: This patch replaces the system call to the `llc` binary with a library call to the target machine interface. This should be faster than relying on an external system call to compile the final wrapper binary. Differential Revision: https://reviews.llvm.org/D118197	2022-01-31 23:11:42 -05:00
Joseph Huber	9375f1563e	[OpenMP] Cleanup the Linker Wrapper Summary: Various changes and cleanup for the Linker Wrapper tool.	2022-01-31 23:11:42 -05:00
Joseph Huber	58dc981e08	[OpenMP] Include the executable name in the temporary files Summary: This parses the executable name out of the linker arguments so we can use it to give more informative temporary file names and so we don't accidentally use it for device linking.	2022-01-31 23:11:42 -05:00
Joseph Huber	bf499c58af	[OpenMP] Implement save temps functionality in linker wrapper Summary: This patch implements the `-save-temps` flag for the linker wrapper. This allows the user to inspect the intermediate output that the linker wrapper creates.	2022-01-31 23:11:42 -05:00
Joseph Huber	a47b1cf306	[OpenMP] Embed bitcode after optimizations instead of linking Summary: Various changes to the linker wrapper, and the bitcode embedding is now done after the optimizations have run rather than after linking is done. This saves time when doing JIT.	2022-01-31 23:11:42 -05:00
Joseph Huber	46d019041c	[OpenMP] Improve symbol resolution for OpenMP Offloading LTO This patch improves the symbol resolution done for LTO with offloading applications. The symbol resolution done here allows the LTO backend to internalize more functions. The symbol resolution done is a simplified view that does not take into account various options like `--wrap` or `--dynamic-list` and always assumes we are creating a shared object. The actual target may be an executable, but semantically it is used as a shared object because certain objects need to be visible outside of the executable when they are read by the OpenMP plugin. Depends on D117246 Differential Revision: https://reviews.llvm.org/D118155	2022-01-31 23:11:42 -05:00
Joseph Huber	ce16ca3c74	[OpenMP] Add support for linking AMDGPU images This patch adds support for linking AMDGPU images using the LLD binary. AMDGPU files are always bitcode images and will always use the LTO backend. Additionally we now pass the default architecture found with the `amdgpu-arch` tool to the argument list. Depends on D117156 Differential Revision: https://reviews.llvm.org/D117246	2022-01-31 23:11:42 -05:00
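
Commit 42a1fb5ca5 describes copying a misaligned archive member into a fresh buffer so the offload binary's 8-byte alignment requirement holds. The C++ sketch below illustrates that check-then-copy technique under stated assumptions: `ensureAligned` and its `Storage` parameter are hypothetical names, not the LinkerWrapper's actual code, and the fresh buffer is a heap array of `std::uint64_t`, whose allocation is assumed to be at least 8-byte aligned (true for the default `operator new[]` on mainstream platforms).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string_view>

// Alignment required by the binary format, per the commit message.
constexpr std::size_t RequiredAlign = 8;

// Hypothetical helper (not the LinkerWrapper's real API): returns a view of
// the same bytes whose data pointer is RequiredAlign-aligned. An aligned
// input is returned unchanged; a misaligned one is copied into Storage,
// which the caller must keep alive as long as the returned view is used.
std::string_view ensureAligned(std::string_view Input,
                               std::unique_ptr<std::uint64_t[]> &Storage) {
  auto Addr = reinterpret_cast<std::uintptr_t>(Input.data());
  if (Addr % RequiredAlign == 0)
    return Input;

  // Round up to whole 64-bit words; make_unique zero-fills the tail.
  std::size_t Words =
      (Input.size() + sizeof(std::uint64_t) - 1) / sizeof(std::uint64_t);
  Storage = std::make_unique<std::uint64_t[]>(Words == 0 ? 1 : Words);
  std::memcpy(Storage.get(), Input.data(), Input.size());
  assert(reinterpret_cast<std::uintptr_t>(Storage.get()) % RequiredAlign == 0);
  return std::string_view(reinterpret_cast<const char *>(Storage.get()),
                          Input.size());
}
```

Copying only when the address is actually misaligned avoids an extra allocation in the common case, where data read straight out of the ELF section already has the required alignment.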
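
Commit f37101983f introduces `-Xoffload-linker <arg>`, which forwards an argument to every device linker, and `-Xoffload-linker-<triple> <arg>`, which forwards it only to the linker for that target triple. A minimal sketch of that selection rule, assuming the options have already been parsed into (flag, value) pairs; `ParsedOption` and `collectDeviceLinkerArgs` are hypothetical names, not the linker wrapper's actual implementation:

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// A parsed command-line option: the flag spelling and its argument.
// This representation is an assumption made for the sketch.
using ParsedOption = std::pair<std::string, std::string>;

// Returns the arguments to forward to the device linker for `Triple`,
// preserving their command-line order.
std::vector<std::string>
collectDeviceLinkerArgs(const std::vector<ParsedOption> &Options,
                        std::string_view Triple) {
  assert(!Triple.empty() && "a device link job always has a target triple");
  const std::string Generic = "-Xoffload-linker";
  const std::string Specific = Generic + "-" + std::string(Triple);

  std::vector<std::string> Forwarded;
  for (const auto &[Flag, Value] : Options) {
    // The bare flag applies to every device linker; the suffixed form
    // applies only when the suffix matches this job's triple exactly.
    if (Flag == Generic || Flag == Specific)
      Forwarded.push_back(Value);
  }
  return Forwarded;
}
```

Matching the full `-Xoffload-linker-<triple>` spelling, rather than a prefix, keeps an argument meant for one target from leaking into another target's link job.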
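
Commit 3530c35c66 states the rule for choosing CUDA's non-RDC mode: if every device input is LLVM bitcode, LTO has whole-program visibility, so `ptxas` is run without `-c` and the `nvlink` job is skipped. The sketch below encodes only that decision; `InputKind`, `DeviceInput`, `CudaLinkPlan`, and `planCudaLink` are hypothetical names, and the argument list deliberately omits every other `ptxas` option.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical classification of a device input file.
enum class InputKind { Bitcode, Object };

struct DeviceInput {
  std::string Path;
  InputKind Kind;
};

// The subset of the CUDA device link that this decision affects.
struct CudaLinkPlan {
  std::vector<std::string> PtxasArgs;
  bool RunNvlink = true;
};

// Whole-program visibility: every device input is LLVM bitcode.
bool hasWholeProgramVisibility(const std::vector<DeviceInput> &Inputs) {
  return std::all_of(Inputs.begin(), Inputs.end(),
                     [](const DeviceInput &In) {
                       return In.Kind == InputKind::Bitcode;
                     });
}

CudaLinkPlan planCudaLink(const std::vector<DeviceInput> &Inputs) {
  assert(!Inputs.empty() && "nothing to link");
  CudaLinkPlan Plan;
  if (hasWholeProgramVisibility(Inputs)) {
    // Non-RDC: ptxas emits the final image directly; no nvlink job.
    Plan.RunNvlink = false;
  } else {
    // RDC: ptxas emits relocatable code (-c) that nvlink must link.
    Plan.PtxasArgs.push_back("-c");
  }
  return Plan;
}
```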


58 Commits