llvm-project/clang/lib/CodeGen/CGCUDARuntime.h
Yuxuan Chen e17a39bc31
[Clang] C++20 Coroutines: Introduce Frontend Attribute [[clang::coro_await_elidable]] (#99282)
This patch is the frontend implementation of the coroutine elide
improvement project detailed in this discourse post:
https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044

This patch proposes a C++ struct/class attribute
`[[clang::coro_await_elidable]]`. This notion of an await-elidable task
gives developers and library authors certainty that coroutine heap
elision happens in a predictable way.

Currently, after we lower a coroutine to LLVM IR, the CoroElide pass is
responsible for analyzing whether an elision can happen. Take this as
an example:
```
Task foo();
Task bar() {
  co_await foo();
}
```
For CoroElide to happen, the ramp function of `foo` must be inlined into
`bar`. This inlining happens after `foo` has been split but `bar` is
usually still a presplit coroutine. If `foo` is indeed a coroutine, the
inlined `coro.id` intrinsics of `foo` are visible within `bar`. CoroElide
then runs an analysis to figure out whether the SSA value of
`coro.begin()` of `foo` gets destroyed before `bar` terminates.

`Task` types are rarely simple enough for the destroy logic of the task
to reference the SSA value from `coro.begin()` directly. Hence, the pass
is largely ineffective even for the most trivial C++ `Task` types. Improving
CoroElide by implementing more powerful analyses is possible; however, that
still would not give us predictability about when elision happens. A typical
`Task` wrapper looks like the sketch below.
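A minimal sketch of such a wrapper (hypothetical, not taken from this patch or
any particular library), complete enough to serve as the `Task` in the
`foo`/`bar` example above; the destructor is the relevant part:
```
#include <coroutine>
#include <utility>

struct Task {
  struct promise_type {
    Task get_return_object() {
      return Task{std::coroutine_handle<promise_type>::from_promise(*this)};
    }
    std::suspend_always initial_suspend() noexcept { return {}; }
    std::suspend_always final_suspend() noexcept { return {}; }
    void return_void() {}
    void unhandled_exception() {}
  };

  explicit Task(std::coroutine_handle<promise_type> h) : handle(h) {}
  Task(Task &&other) noexcept : handle(std::exchange(other.handle, nullptr)) {}
  ~Task() {
    // The frame is destroyed through the stored handle member, i.e. through a
    // load and an indirect destroy, not through the SSA value returned by
    // coro.begin(); this is what CoroElide's analysis usually fails to match.
    if (handle)
      handle.destroy();
  }

  // Trivial awaiter members so that `co_await foo()` is well-formed in this
  // sketch; a real library would resume the awaiting coroutine on completion.
  bool await_ready() const noexcept { return false; }
  void await_suspend(std::coroutine_handle<>) const noexcept {}
  void await_resume() const noexcept {}

  std::coroutine_handle<promise_type> handle;
};
```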

The approach we want to take with this language extension generally
originates from the philosophy that library implementations of `Task`
types have control over the structured concurrency guarantees we demand
for elision to happen: namely, that the lifetime of the callee's frame is
shorter than that of the caller.

``[[clang::coro_await_elidable]]`` is a class attribute which can be
applied to a coroutine return type.

When a coroutine function that returns such a type calls another
coroutine function, the compiler performs heap allocation elision when
all of the following conditions are met (see the sketch after the list):
- the callee coroutine function returns a type that is annotated with
``[[clang::coro_await_elidable]]``;
- in the caller coroutine, the return value of the callee is a prvalue
that is immediately `co_await`ed.
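Reusing the hypothetical `Task` type sketched earlier, now assumed to be
declared as `struct [[clang::coro_await_elidable]] Task { ... };`, only the
first `co_await` below meets both conditions:
```
Task foo();

Task bar() {
  co_await foo(); // the callee returns an annotated type and the result is a
                  // prvalue that is immediately awaited: elision applies.

  Task t = foo();
  co_await t;     // the operand is an lvalue, not a prvalue: elision is not
                  // guaranteed for this call.
}
```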

From the C++ perspective, this makes sense because we can ensure the
lifetime of the elided callee cannot exceed that of the caller if we can
guarantee that the caller coroutine is never destroyed earlier than the
callee coroutine. This is not generally true for arbitrary C++ programs.
However, a library that implements `Task` types and executors may
provide this guarantee to the compiler, giving users certainty that
HALO will work on their programs.

After this patch, when compiling coroutines that return a type with such
an attribute, the frontend checks the type of the operand of `co_await`
expressions (not the result of `operator co_await`). If that type is also
attributed with `[[clang::coro_await_elidable]]`, the frontend emits
metadata on the call or invoke instruction as a hint for a later middle-end
pass to perform the elision, as illustrated below.
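For illustration only (the type and function names below are hypothetical;
only the attribute comes from this patch): when a task type provides an
`operator co_await` that returns a separate awaiter, the attribute is looked
up on the type of the written operand, not on the awaiter type:
```
#include <coroutine>

// Awaiter returned by operator co_await; the frontend does NOT look for the
// attribute here.
struct TaskAwaiter {
  bool await_ready() const noexcept;
  void await_suspend(std::coroutine_handle<>) const noexcept;
  void await_resume() const noexcept;
};

// The attribute is checked on the operand type itself, i.e. on EagerTask in
// an expression like `co_await makeEagerTask()`.
struct [[clang::coro_await_elidable]] EagerTask {
  TaskAwaiter operator co_await() const noexcept;
};

EagerTask makeEagerTask();
```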

The original patch version is
https://github.com/llvm/llvm-project/pull/94693; as suggested there, the
work has been split into frontend and middle-end parts as stacked PRs.

The middle-end CoroSplit patch can be found at
https://github.com/llvm/llvm-project/pull/99283
The middle-end transformation that performs the elision can be found at
https://github.com/llvm/llvm-project/pull/99285
2024-09-08 23:08:58 -07:00


//===----- CGCUDARuntime.h - Interface to CUDA Runtimes ---------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This provides an abstract class for CUDA code generation. Concrete
// subclasses of this implement code generation for specific CUDA
// runtime libraries.
//
//===----------------------------------------------------------------------===//
#ifndef LLVM_CLANG_LIB_CODEGEN_CGCUDARUNTIME_H
#define LLVM_CLANG_LIB_CODEGEN_CGCUDARUNTIME_H

#include "clang/AST/GlobalDecl.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Frontend/Offloading/Utility.h"
#include "llvm/IR/GlobalValue.h"

namespace llvm {
class CallBase;
class Function;
class GlobalVariable;
}

namespace clang {

class CUDAKernelCallExpr;
class NamedDecl;
class VarDecl;

namespace CodeGen {

class CodeGenFunction;
class CodeGenModule;
class FunctionArgList;
class ReturnValueSlot;
class RValue;

class CGCUDARuntime {
protected:
  CodeGenModule &CGM;

public:
  // Global variable properties that must be passed to CUDA runtime.
  class DeviceVarFlags {
  public:
    enum DeviceVarKind {
      Variable, // Variable
      Surface,  // Builtin surface
      Texture,  // Builtin texture
    };

  private:
    LLVM_PREFERRED_TYPE(DeviceVarKind)
    unsigned Kind : 2;
    LLVM_PREFERRED_TYPE(bool)
    unsigned Extern : 1;
    LLVM_PREFERRED_TYPE(bool)
    unsigned Constant : 1;   // Constant variable.
    LLVM_PREFERRED_TYPE(bool)
    unsigned Managed : 1;    // Managed variable.
    LLVM_PREFERRED_TYPE(bool)
    unsigned Normalized : 1; // Normalized texture.
    int SurfTexType;         // Type of surface/texture.

  public:
    DeviceVarFlags(DeviceVarKind K, bool E, bool C, bool M, bool N, int T)
        : Kind(K), Extern(E), Constant(C), Managed(M), Normalized(N),
          SurfTexType(T) {}

    DeviceVarKind getKind() const { return static_cast<DeviceVarKind>(Kind); }
    bool isExtern() const { return Extern; }
    bool isConstant() const { return Constant; }
    bool isManaged() const { return Managed; }
    bool isNormalized() const { return Normalized; }
    int getSurfTexType() const { return SurfTexType; }
  };

  CGCUDARuntime(CodeGenModule &CGM) : CGM(CGM) {}
  virtual ~CGCUDARuntime();

  virtual RValue
  EmitCUDAKernelCallExpr(CodeGenFunction &CGF, const CUDAKernelCallExpr *E,
                         ReturnValueSlot ReturnValue,
                         llvm::CallBase **CallOrInvoke = nullptr);

  /// Emits a kernel launch stub.
  virtual void emitDeviceStub(CodeGenFunction &CGF, FunctionArgList &Args) = 0;

  /// Check whether a variable is a device variable and register it if true.
  virtual void handleVarRegistration(const VarDecl *VD,
                                     llvm::GlobalVariable &Var) = 0;

  /// Finalize generated LLVM module. Returns a module constructor function
  /// to be added or a null pointer.
  virtual llvm::Function *finalizeModule() = 0;

  /// Returns function or variable name on device side even if the current
  /// compilation is for host.
  virtual std::string getDeviceSideName(const NamedDecl *ND) = 0;

  /// Get kernel handle by stub function.
  virtual llvm::GlobalValue *getKernelHandle(llvm::Function *Stub,
                                             GlobalDecl GD) = 0;

  /// Get kernel stub by kernel handle.
  virtual llvm::Function *getKernelStub(llvm::GlobalValue *Handle) = 0;

  /// Adjust linkage of shadow variables in host compilation.
  virtual void
  internalizeDeviceSideVar(const VarDecl *D,
                           llvm::GlobalValue::LinkageTypes &Linkage) = 0;
};

/// Creates an instance of a CUDA runtime class.
CGCUDARuntime *CreateNVCUDARuntime(CodeGenModule &CGM);
}
}
#endif