100 Commits

Author SHA1 Message Date
Joseph Huber
75f810e025
[Offload] Guard HSA implicit arguments if they aren't created (#133073)
Summary:
We conditionally allocate the implicit arguments, so they possibly are
null. The flang compiler seems to hit this case, even though it
shouldn't when it's supposed to conform to the HSA code object. For now
guard this to fix the regression and cover a case in the future where
someone rolls a fully custom implementatation.

Fixes: https://github.com/llvm/llvm-project/issues/132982
2025-03-26 08:54:33 -05:00
Joseph Huber
25bf4e262c
[Offload] Remove handling for COV4 binaries from offload/ (#131033)
Summary:
We moved from cov4 to cov5 a long time ago, and it guards simplifying
some front end code, so we should be able to move up with this.
2025-03-24 18:58:20 -05:00
Ethan Luis McDonough
c50d39f073
[PGO][Offload] Allow PGO flags to be used on GPU targets (#94268)
This pull request is the third part of an ongoing effort to extends PGO
instrumentation to GPU device code and depends on
https://github.com/llvm/llvm-project/pull/93365. This PR makes the
following changes:

- Allows PGO flags to be supplied to GPU targets
- Pulls version global from device
- Modifies `__llvm_write_custom_profile` and `lprofWriteDataImpl` to
allow the PGO version to be overridden
2025-03-19 19:01:38 -05:00
Joseph Huber
8437b7f558
[libc] Make RPC server handling header only (#131205)
Summary:
This patch moves the RPC server handling to be a header only utility
stored in the `shared/` directory. This is intended to be shared within
LLVM for the loaders and `offload/` handling.

Generally, this makes it easier to share code without weird
cross-project binaries being plucked out of the build system. It also
allows us to soon move the loader interface out of the `libc` project so
that we don't need to bootstrap those and can build them in LLVM.
2025-03-13 19:23:21 -05:00
Nikita Popov
f137c3d592
[TargetRegistry] Accept Triple in createTargetMachine() (NFC) (#130940)
This avoids doing a Triple -> std::string -> Triple round trip in lots
of places, now that the Module stores a Triple.
2025-03-12 17:35:09 +01:00
Nikita Popov
4f469ae046 [offload] Fix build after Module::getTargetTriple() change
Adjust for #129868.
2025-03-06 11:04:00 +01:00
Nikita Popov
979c275097
[IR] Store Triple in Module (NFC) (#129868)
The module currently stores the target triple as a string. This means
that any code that wants to actually use the triple first has to
instantiate a Triple, which is somewhat expensive. The change in #121652
caused a moderate compile-time regression due to this. While it would be
easy enough to work around, I think that architecturally, it makes more
sense to store the parsed Triple in the module, so that it can always be
directly queried.

For this change, I've opted not to add any magic conversions between
std::string and Triple for backwards-compatibilty purses, and instead
write out needed Triple()s or str()s explicitly. This is because I think
a decent number of them should be changed to work on Triple as well, to
avoid unnecessary conversions back and forth.

The only interesting part in this patch is that the default triple is
Triple("") instead of Triple() to preserve existing behavior. The former
defaults to using the ELF object format instead of unknown object
format. We should fix that as well.
2025-03-06 10:27:47 +01:00
Alex
b8a66f50b4
[OFFLOAD] Update ffi_cif structure to match libffi (#128756)
The ffi_cif structure defined in the wrapper header is smaller than the
actual structure in libffi which results in other structures being
overwritten when libffi is called, and finally in a segfault.

The patch updates the structure to the correct layout as specified in
ffi.h
2025-03-04 11:40:12 -06:00
Jan Patrick Lehr
17ccaf4fa8
[NFC][Offload] Fix typo to output architecture (#128527) 2025-02-24 16:54:21 +01:00
Fabian Ritter
a2f9ae1421
[AMDGPU] Replace gfx940 and gfx941 with gfx942 in offload and libclc (#125826)
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

For SWDEV-512631 and SWDEV-512633
2025-02-19 09:56:04 +01:00
Ethan Luis McDonough
9e5c136d5a
[PGO][Offload] Profile profraw generation for GPU instrumentation #76587 (#93365)
This pull request is the second part of an ongoing effort to extends PGO
instrumentation to GPU device code and depends on #76587. This PR makes
the following changes:

- Introduces `__llvm_write_custom_profile` to PGO compiler-rt library.
This is an external function that can be used to write profiles with
custom data to target-specific files.
- Adds `__llvm_write_custom_profile` as weak symbol to libomptarget so
that it can write the collected data to a profraw file.
- Adds `PGODump` debug flag and only displays dump when the
aforementioned flag is set
2025-02-11 23:30:54 -06:00
Joseph Huber
baf7a3c1e5
[Offload] Properly guard modifications to the RPC device array (#126790)
Summary:
If the user deallocates an RPC device this can sometimes fail if the RPC
server is still running. This will happen if the modification happens
while the server is still checking it. This patch adds a mutex to guard
modifications to it.
2025-02-11 14:57:31 -06:00
Joseph Huber
5812d0bf8e
[Offload] Make only a single thread handle the RPC server thread (#126067)
Summary:
This patch just changes the interface to make starting the thread
multiple times permissable since it will only be done the first time.
Note that this does not refcount it or anything, so it's onto the user
to make sure that they don't shut down the thread before everyone is
done using it. That is the case today because the shutDown portion is
run by a single thread in the destructor phase.

Another question is if we should make this thread truly global state,
because currently it will be private to each plugin instance, so if you
have an AMD and NVIDIA image there will be two, similarly if you have
those inside of a shared library.
2025-02-06 11:38:14 -06:00
Joseph Huber
7a8779422d
[Offload] Stop the RPC server faiilng with more than one GPU (#125982)
Summary:
Pretty dumb mistake of me, forgot that this is run per-device and
per-plugin, which fell through the cracks with my testing because I have
two GPUs that use different plugins.
2025-02-05 20:51:28 -06:00
Joseph Huber
a284a6ed17 [OpenMP] Guard OpenMP specific entry handling 2025-02-03 16:16:18 -06:00
Michał Górny
359a913170
[offload] gnu::format with variadic template functions is Clang-only (#124406)
Use `gnu::format` attribute only when compiling with Clang, as using it
against variadic template functions is a Clang extension and is not
supported by GCC.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77958

Fixes #119069
2025-02-02 15:55:22 +00:00
Christian Clauss
1f56bb3137
[Offload][NFC] Fix typos discovered by codespell (#125119)
https://github.com/codespell-project/codespell

% `codespell
--ignore-words-list=archtype,hsa,identty,inout,iself,nd,te,ths,vertexes
--write-changes`
2025-01-31 09:35:29 -06:00
Joseph Huber
38b3f45a81 [Offload] Fix offload-info interface
Summary:
The offload info tool doesn't initialize things properly, just check
this first instead.
2025-01-27 10:36:09 -06:00
Joseph Huber
f07505849c [Offload] Fix server thread from being shut down if unused 2025-01-27 08:29:41 -06:00
Joseph Huber
e7592d83e0 [Offload][NFC] Make sure the thread is not running already 2025-01-27 08:06:29 -06:00
Joseph Huber
bd8a818128 [Offload] Add cuLaunchHostFunc to dynamic cuda
Summary:
This was missing, causing non-directly linked builds to fail.
2025-01-24 11:41:20 -06:00
Joseph Huber
134401deea
[Offload] Move RPC server handling to a dedicated thread (#112988)
Summary:
Handling the RPC server requires running through list of jobs that the
device has requested to be done. Currently this is handled by the thread
that does the waiting for the kernel to finish. However, this is not
sound on NVIDIA architectures and only works for async launches in the
OpenMP model that uses helper threads.

However, we also don't want to have this thread doing work
unnnecessarily. For this reason we track the execution of kernels and
cause the thread to sleep via a condition variable (usually backed by
some kind of futex or other intelligent sleeping mechanism) so that the
thread will be idle while no kernels are running.
2025-01-24 11:36:45 -06:00
hidekisaito
ed512710a5
[Offload] Make MemoryManager threshold ENV var size_t type. (#124063) 2025-01-23 11:46:56 -06:00
Joseph Huber
6518b121f0
[Offload][NFC] Factor out and rename the __tgt_offload_entry struct (#123785)
Summary:
This patch is an NFC renaming to make using the offloading entry type
more portable between other targets. Right now this is just moving its
definition to LLVM so others can use it. Future work will rework the
struct layout.
2025-01-21 12:05:24 -06:00
Jinsong Ji
8d1d67ec4d
[Offload][PGO] Fix dump of array in ProfData (#122039)
Exposed by -Warray-bounds:

In file included from
../../../../../../../llvm/offload/plugins-nextgen/common/src/GlobalHandler.cpp:252:

../../../../../../../llvm/llvm/include/llvm/ProfileData/InstrProfData.inc:109:1:
error: array index 4 is past the end of the array (that has type 'const
std::remove_const<const uint16_t>::type[4]' (aka 'const unsigned
short[4]')) [-Werror,-Warray-bounds]
109 | INSTR_PROF_DATA(const uint16_t, Int16ArrayTy,
NumValueSites[IPVK_Last+1], \
| ^ ~~~~~~~~~~~

../../../../../../../llvm/offload/plugins-nextgen/common/src/GlobalHandler.cpp:250:15:
note: expanded from macro 'INSTR_PROF_DATA'
250 | outs() << ProfData.Name << " "; \
      |               ^        ~~~~

../../../../../../../llvm/llvm/include/llvm/ProfileData/InstrProfData.inc:109:1:
note: array 'NumValueSites' declared here
109 | INSTR_PROF_DATA(const uint16_t, Int16ArrayTy,
NumValueSites[IPVK_Last+1], \
      | ^

../../../../../../../llvm/offload/plugins-nextgen/common/include/GlobalHandler.h:62:3:
note: expanded from macro 'INSTR_PROF_DATA'
   62 |   std::remove_const<Type>::type Name;

Avoid accessing out-of-bound data, but skip printing array data for now.
As there is no simple way to do this without hardcoding the
NumValueSites field.

---------

Co-authored-by: Ethan Luis McDonough <ethanluismcdonough@gmail.com>
2025-01-14 15:46:27 -05:00
wanglei
bdf727065b
[Offload] Add support for loongarch64 to host plugin
This adds support for the loongarch64 architecture to the offload host
plugin.

Similar to #115773

To fix some test issues, I've had to add the LoongArch64 target to:

- CompilerInvocation::ParseLangArgs
- linkDevice in ClangLinuxWrapper.cpp
- OMPContext::OMPContext (to set the device_kind_cpu trait)

Reviewed By: jhuber6

Pull Request: https://github.com/llvm/llvm-project/pull/120173
2024-12-17 19:06:10 +08:00
Jinsong Ji
e85a9f5540
libc: Prefix RPC Status code to avoid conflict in windows build (#119991)
Somehow conflict with define in wingdi.h.

Fix build failures:

[ 52%] Building CXX object
projects/offload/plugins-nextgen/common/CMakeFiles/PluginCommon.dir/src/RPC.cpp.obj
In file included from
...llvm\offload\plugins-nextgen\common\src\RPC.cpp:16:
...\llvm\libc\shared\rpc.h(48,3): error: expected identifier
   48 |   ERROR = 0x1000,
      |   ^
c:\Program files (x86)\Windows
Kits\10\include\10.0.22000.0\um\wingdi.h(118,29): note: expanded from
macro 'ERROR'
  118 | #define ERROR               0
      |                             ^
...\llvm\offload\plugins-nextgen\common\src\RPC.cpp(75,17): error:
expected unqualified-id
   75 |     return rpc::ERROR;
      |                 ^
c:\Program files (x86)\Windows
Kits\10\include\10.0.22000.0\um\wingdi.h(118,29): note: expanded from
macro 'ERROR'
  118 | #define ERROR               0
      |                             ^
2 errors generated.
2024-12-15 09:35:44 -05:00
hidekisaito
f2bceb2311
[Offload][AMDGPU] accept generic target (#118919)
Enables generic ISA, e.g., "--offload-arch=gfx11-generic" device code to
run on gfx11-generic ISA capable device.

Executable may contain one ELF that has specific target ISA and another
ELF that has compatible generic ISA.
Under that circumstance, this code should say both ELFs are compatible,
leaving the rest to PluginManager to handle.
Suggestions on how best to address that is welcome.
2024-12-09 19:11:38 -05:00
Shilei Tian
92376c3ff5
[Offload][OMPX] Add the runtime support for multi-dim grid and block (#118042) 2024-12-06 09:07:50 -05:00
Shilei Tian
eb49788bd9
[Offload][AMDGPU] Allow COV6 images (#118909) 2024-12-05 20:05:39 -05:00
Callum Fare
fd3907ccb5
Reland #118503: [Offload] Introduce offload-tblgen and initial new API implementation (#118614)
Reland #118503. Added a fix for builds with `-DBUILD_SHARED_LIBS=ON`
(see last commit). Otherwise the changes are identical.

---


### New API

Previous discussions at the LLVM/Offload meeting have brought up the
need for a new API for exposing the functionality of the plugins. This
change introduces a very small subset of a new API, which is primarily
for testing the offload tooling and demonstrating how a new API can fit
into the existing code base without being too disruptive. Exact designs
for these entry points and future additions can be worked out over time.

The new API does however introduce the bare minimum functionality to
implement device discovery for Unified Runtime and SYCL. This means that
the `urinfo` and `sycl-ls` tools can be used on top of Offload. A
(rough) implementation of a Unified Runtime adapter (aka plugin) for
Offload is available
[here](https://github.com/callumfare/unified-runtime/tree/offload_adapter).
Our intention is to maintain this and use it to implement and test
Offload API changes with SYCL.

### Demoing the new API

```sh
# From the runtime build directory
$ ninja LibomptUnitTests
$ OFFLOAD_TRACE=1 ./offload/unittests/OffloadAPI/offload.unittests 
```


### Open questions and future work
* Only some of the available device info is exposed, and not all the
possible device queries needed for SYCL are implemented by the plugins.
A sensible next step would be to refactor and extend the existing device
info queries in the plugins. The existing info queries are all strings,
but the new API introduces the ability to return any arbitrary type.
* It may be sensible at some point for the plugins to implement the new
API directly, and the higher level code on top of it could be made
generic, but this is more of a long-term possibility.
2024-12-05 09:34:04 +01:00
hidekisaito
8c36a823c2
Fix to account for multiple ISA enumeration (#118676) 2024-12-04 12:11:16 -06:00
Michał Górny
04b26f0eb7
[offload] Standalone build fixes (#118173)
A fair number of fixes to get standalone builds of offload working —
mostly copying missing bits from openmp. It's almost ready — I still
need to figure out why some of the tsts aren't linking to the right
libraries.
2024-12-04 11:27:37 +00:00
Shilei Tian
68bcba6d7a Revert "[AMDGPU] Use COV6 by default (#118515)"
This reverts commit 410cbe3cf28913cca2fc61b3437306b841d08172 because some
buildbots are not ready yet.
2024-12-03 20:17:06 -05:00
Shilei Tian
410cbe3cf2
[AMDGPU] Use COV6 by default (#118515) 2024-12-03 19:38:35 -05:00
Jan Patrick Lehr
51553227f0
Revert "Reland of #108413: [Offload] Introduce offload-tblgen and initial new API implementation" (#118541)
Reverts llvm/llvm-project#118503

Broke bot
https://lab.llvm.org/staging/#/builders/131/builds/9701/steps/5/logs/stdio
2024-12-03 21:42:38 +01:00
Callum Fare
8da490320f
Reland of #108413: [Offload] Introduce offload-tblgen and initial new API implementation (#118503)
This is another attempt to reland the changes from #108413

The previous two attempts introduced regressions and were reverted. This
PR has been more thoroughly tested with various configurations so
shouldn't cause any problems this time. If anyone is aware of any likely
remaining problems then please let me know.

The changes are identical other than the fixes contained in the last 5
commits.

---


### New API

Previous discussions at the LLVM/Offload meeting have brought up the
need for a new API for exposing the functionality of the plugins. This
change introduces a very small subset of a new API, which is primarily
for testing the offload tooling and demonstrating how a new API can fit
into the existing code base without being too disruptive. Exact designs
for these entry points and future additions can be worked out over time.

The new API does however introduce the bare minimum functionality to
implement device discovery for Unified Runtime and SYCL. This means that
the `urinfo` and `sycl-ls` tools can be used on top of Offload. A
(rough) implementation of a Unified Runtime adapter (aka plugin) for
Offload is available
[here](https://github.com/callumfare/unified-runtime/tree/offload_adapter).
Our intention is to maintain this and use it to implement and test
Offload API changes with SYCL.

### Demoing the new API

```sh
# From the runtime build directory
$ ninja LibomptUnitTests
$ OFFLOAD_TRACE=1 ./offload/unittests/OffloadAPI/offload.unittests 
```


### Open questions and future work
* Only some of the available device info is exposed, and not all the
possible device queries needed for SYCL are implemented by the plugins.
A sensible next step would be to refactor and extend the existing device
info queries in the plugins. The existing info queries are all strings,
but the new API introduces the ability to return any arbitrary type.
* It may be sensible at some point for the plugins to implement the new
API directly, and the higher level code on top of it could be made
generic, but this is more of a long-term possibility.
2024-12-03 16:28:35 +00:00
Joseph Huber
a6ef0debb1 [libc][NFC] Rename RPC opcodes to better reflect their usage
Summary:
RPC_ is a generic prefix here, use LIBC_ to indicate that these are
opcodes used to implement the C library
2024-12-02 15:35:08 -06:00
Joseph Huber
91f5f974cb
[OpenMP] Unconditionally provide an RPC client interface for OpenMP (#117933)
Summary:
This patch adds an RPC interface that lives directly in the OpenMP
device runtime. This allows OpenMP to implement custom opcodes.
Currently this is only providing the host call interface, which is the
raw version of reverse offloading. Previously this lived in `libc/` as
an extension which is not the correct place.

The interface here uses a weak symbol for the RPC client by the same
name that the `libc` interface uses. This means that it will defer to
the libc one if both are present so we don't need to set up multiple
instances.

The presense of this symbol is what controls whether or not we set up
the RPC server. Because this is an external symbol it normally won't be
optimized out, so there's a special pass in OpenMPOpt that deletes this
symbol if it is unused during linking. That means at `O0` the RPC server
will always be present now, but will be removed trivially if it's not
used at O1 and higher.
2024-12-02 14:31:51 -06:00
Jan Patrick Lehr
5208bc3694
Revert "Reland #2 - [Offload] Introduce offload-tblgen and initial new API implementation (#108413. #117704)" (#117995)
Reverts llvm/llvm-project#117894

Buildbot failures in OpenMP/Offload bots.
https://lab.llvm.org/buildbot/#/builders/30/builds/11193
2024-11-28 12:58:17 +01:00
Callum Fare
992b00020f
Reland #2 - [Offload] Introduce offload-tblgen and initial new API implementation (#108413. #117704) (#117894)
Relands #117704, which relanded changes from #108413 - this was reverted
due to build issues. The new offload library did not build with
`LIBOMPTARGET_OMPT_SUPPORT` enabled, which was not picked up by
pre-merge testing.

The last commit contains the fix; everything else is otherwise identical
to the approved PR.
___

### New API

Previous discussions at the LLVM/Offload meeting have brought up the
need for a new API for exposing the functionality of the plugins. This
change introduces a very small subset of a new API, which is primarily
for testing the offload tooling and demonstrating how a new API can fit
into the existing code base without being too disruptive. Exact designs
for these entry points and future additions can be worked out over time.

The new API does however introduce the bare minimum functionality to
implement device discovery for Unified Runtime and SYCL. This means that
the `urinfo` and `sycl-ls` tools can be used on top of Offload. A
(rough) implementation of a Unified Runtime adapter (aka plugin) for
Offload is available
[here](https://github.com/callumfare/unified-runtime/tree/offload_adapter).
Our intention is to maintain this and use it to implement and test
Offload API changes with SYCL.

### Demoing the new API

```sh
# From the runtime build directory
$ ninja LibomptUnitTests
$ OFFLOAD_TRACE=1 ./offload/unittests/OffloadAPI/offload.unittests 
```


### Open questions and future work
* Only some of the available device info is exposed, and not all the
possible device queries needed for SYCL are implemented by the plugins.
A sensible next step would be to refactor and extend the existing device
info queries in the plugins. The existing info queries are all strings,
but the new API introduces the ability to return any arbitrary type.
* It may be sensible at some point for the plugins to implement the new
API directly, and the higher level code on top of it could be made
generic, but this is more of a long-term possibility.
2024-11-28 10:19:37 +00:00
Joseph Huber
a24aa7dfa5
[Offload] Use libc 'hand-in-hand' module to find RPC header (#117928)
Summary:
We should now use the official™ way to include the files from
`libc/shared`. This required some code to make sure that it's not
included twice if multiple people use it as well as a sanity check on
the directory.
2024-11-27 20:14:13 -06:00
Joseph Huber
1d810ece2b
[libc] Move libc server handlers to a shared header (#117908)
Summary:
We can simply include this header from the shared directory now and do
not need to have this level of indirection. Simply stash it with the
other libc opcode handlers.

If we were able to move the printf handlers to the shared directory then
this could just be a header as well, which would HEAVILY simplify the
mess associated with building the RPC server first in the projects
build, then copying it to the runtimes build.
2024-11-27 14:57:52 -06:00
Joseph Huber
89d8e70031
[libc] Export a pointer to the RPC client directly (#117913)
Summary:
We currently have an unnecessary level of indirection when initializing
the RPC client. This is a holdover from when the RPC client was not
trivially copyable and simply makes it more complicated. Here we use the
`asm` syntax to give the C++ variable a valid name so that we can just
copy to it directly.

Another advantage to this, is that if users want to piggy-back on the
same RPC interface they need only declare theirs as extern with the same
symbol name, or make it weak to optionally use it if LIBC isn't
avaialb.e
2024-11-27 14:57:38 -06:00
Fraser Cormack
0cb5846a68 Revert "Reland - [Offload] Introduce offload-tblgen and initial new API implementation (#108413) (#117704)"
This reverts commit c979ec05642f292737d250c6682d85ed49bc7b6e.

This showed failures in the post-merge CI.
2024-11-27 10:49:01 +00:00
Callum Fare
c979ec0564
Reland - [Offload] Introduce offload-tblgen and initial new API implementation (#108413) (#117704)
Relands changes from #108413 - this was reverted due to build issues.
The problem was just that the `offload-tblgen` tool was behind recent
changes to tablegen that ensure `const` records. This has been fixed and
the PR is otherwise identical.

___

### New API

Previous discussions at the LLVM/Offload meeting have brought up the
need for a new API for exposing the functionality of the plugins. This
change introduces a very small subset of a new API, which is primarily
for testing the offload tooling and demonstrating how a new API can fit
into the existing code base without being too disruptive. Exact designs
for these entry points and future additions can be worked out over time.

The new API does however introduce the bare minimum functionality to
implement device discovery for Unified Runtime and SYCL. This means that
the `urinfo` and `sycl-ls` tools can be used on top of Offload. A
(rough) implementation of a Unified Runtime adapter (aka plugin) for
Offload is available
[here](https://github.com/callumfare/unified-runtime/tree/offload_adapter).
Our intention is to maintain this and use it to implement and test
Offload API changes with SYCL.

### Demoing the new API

```sh
# From the runtime build directory
$ ninja LibomptUnitTests
$ OFFLOAD_TRACE=1 ./offload/unittests/OffloadAPI/offload.unittests 
```


### Open questions and future work
* Only some of the available device info is exposed, and not all the
possible device queries needed for SYCL are implemented by the plugins.
A sensible next step would be to refactor and extend the existing device
info queries in the plugins. The existing info queries are all strings,
but the new API introduces the ability to return any arbitrary type.
* It may be sensible at some point for the plugins to implement the new
API directly, and the higher level code on top of it could be made
generic, but this is more of a long-term possibility.
2024-11-27 10:39:07 +00:00
Joseph Huber
d047bee496 Revert "[Offload] Introduce offload-tblgen and initial new API implementation (#108413)"
This reverts commit 8a2311c4bf9993230e37dc20b57973dc917f2338.
2024-11-25 12:16:46 -06:00
Joseph Huber
d7c20a6f0c [libc][NFC] Move RPC opcodes to the 'shared/' directory as well 2024-11-25 12:04:10 -06:00
Callum Fare
8a2311c4bf
[Offload] Introduce offload-tblgen and initial new API implementation (#108413)
Introduce `offload-tblgen` and an initial implementation of a subset of
the new API. The tablegen files are intended to be the single source of
truth for the new API, with the header files, documentation, and others
bits of source all automatically generated.

**TODO** (based on review feedback so far):
- [x] Check in the generated headers
- [x] Add an `offload-generate` target to trigger the generation rather
than building them every time
- [x] Decide how error handling should work
  - [x] Finish up new error handling implementation 
- [x] Decide naming convention
- [x] Add testing for the new API
- [x] Add tablegen specific testing
- [x] clang-tidy and use llvm:: types when possible
- [x] Add optional code location arguments
- [x] Avoid multiple returns from one function

### offload-tblgen

See the included
[README](d80db06491/offload/new-api/API/README.md)
for more information on how the API definition and generation works. I'm
happy to answer any questions about it and plan to walk through it in a
future LLVM Offload call.

It should be noted that struct definitions have not been fully
implemented/tested as they aren't used by the initial API definitions,
but finishing that off in the future shouldn't be too much work.

The tablegen tooling has been designed to be easily extended with new
backends, using the classes in `RecordTypes.hpp` to abstract over the
tablegen records.

### New API

Previous discussions at the LLVM/Offload meeting have brought up the
need for a new API for exposing the functionality of the plugins. This
change introduces a very small subset of a new API, which is primarily
for testing the offload tooling and demonstrating how a new API can fit
into the existing code base without being too disruptive. Exact designs
for these entry points and future additions can be worked out over time.

The new API does however introduce the bare minimum functionality to
implement device discovery for Unified Runtime and SYCL. This means that
the `urinfo` and `sycl-ls` tools can be used on top of Offload. A
(rough) implementation of a Unified Runtime adapter (aka plugin) for
Offload is available
[here](https://github.com/callumfare/unified-runtime/tree/offload_adapter).
Our intention is to maintain this and use it to implement and test
Offload API changes with SYCL.

### Demoing the new API

```sh
$ git clone -b offload_adapter https://github.com/callumfare/unified-runtime.git
$ cd unified-runtime
$ mkdir build
$ cd build
$ cmake .. -GNinja -DUR_BUILD_ADAPTER_OFFLOAD=ON \
    -DUR_OFFLOAD_INSTALL_DIR=<offload build dir containing liboffload_new.so> \
    -DUR_OFFLOAD_INCLUDE_DIR=<offload build dir containing 'offload' headers directory>
$ ninja urinfo
export LD_LIBRARY_PATH=<offload build dir containing offload plugin libraries>
$ UR_ADAPTERS_FORCE_LOAD=$PWD/lib/libur_adapter_offload.so ./bin/urinfo
[cuda:gpu][cuda:0] CUDA, NVIDIA GeForce GT 1030  [12030]
# Demo with tracing
$ OFFLOAD_TRACE=1 UR_ADAPTERS_FORCE_LOAD=$PWD/lib/libur_adapter_offload.so ./bin/urinfo
---> offloadPlatformGet(.NumEntries = 0, .phPlatforms = {}, .pNumPlatforms = 0x7ffd05e4d6e0 (2))-> OFFLOAD_RESULT_SUCCESS
---> offloadPlatformGet(.NumEntries = 2, .phPlatforms = {0x564bf4040220, 0x564bf4040240}, .pNumPlatforms = nullptr)-> OFFLOAD_RESULT_SUCCESS
...
```


### Open questions and future work
* The new API is implemented in a separate library
(`liboffload_new.so`). It could just as easily be part of the existing
`libomptarget` library - I have no strong feelings on which is better.
* Only some of the available device info is exposed, and not all the
possible device queries needed for SYCL are implemented by the plugins.
A sensible next step would be to refactor and extend the existing device
info queries in the plugins. The existing info queries are all strings,
but the new API introduces the ability to return any arbitrary type.
* It may be sensible at some point for the plugins to implement the new
API directly, and the higher level code on top of it could be made
generic, but this is more of a long-term possibility.
2024-11-25 11:34:14 -06:00
Joseph Huber
b4d49fb52e
[libc] Remove RPC server API and use the header directly (#117075)
Summary:
This patch removes much of the `llvmlibc_rpc_server` interface. This
pretty much deletes all of this code and just replaces it with including
`rpc.h` directly. We still maintain the file to let `libc` handle the
opcodes, since those depend on the `printf` impelmentation.

This will need to be cleaned up more, but I don't want to put too much
into a single patch.
2024-11-25 07:13:28 -06:00