ByVal arguments and Swifterror require special handling in the coroutine
passes. The goal of this section is to provide a description of how
these parameter attributes are handled.
Update the llvm/docs/Coroutines.rst docs to include a full description
of Custom ABI objects. This documentation describes the how ABI objects
allow users (plugin libraries) to create custom ABI objects for their
needs.
This patch is episode two of the coroutine HALO improvement project
published on discourse:
https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044
Previously CoroElide depends on inlining, and its analysis does not work
very well with code generated by the C++ frontend due the existence of
many customization points. There has been issue reported to upstream how
ineffective the original CoroElide was in real world applications.
For C++ users, this set of patches aim to fix this problem by providing
library authors and users deterministic HALO behaviour for some
well-behaved coroutine `Task` types. The stack begins with a library
side attribute on the `Task` class that guarantees no unstructured
concurrency when coroutines are awaited directly with `co_await`ed as a
prvalue. This attribute on Task types gives us lifetime guarantees and
makes C++ FE capable to telling the ME which coroutine calls are
elidable. We convey such information from FE through the attribute
`coro_elide_safe`.
This patch modifies CoroSplit to create a variant of the coroutine ramp
function that 1) does not use heap allocated frame, instead take an
additional parameter as the pointer to the frame. Such parameter is
attributed with `dereferenceble` and `align` to convey size and align
requirements for the frame. 2) always stores cleanup instead of destroy
address for `coro.destroy()` actions.
In a later patch, we will have a new pass that runs right after
CoroSplit to find usages of the callee coroutine attributed
`coro_elide_safe` in presplit coroutine callers, allocates the frame on
its "stack", transform those usages to call the `noalloc` ramp function
variant.
(note I put quotes on the word "stack" here, because for presplit
coroutine, any alloca will be spilled into the frame when it's being
split)
The C++ Frontend attribute implementation that works with this change
can be found at https://github.com/llvm/llvm-project/pull/99282
The pass that makes use of the new `noalloc` split can be found at
https://github.com/llvm/llvm-project/pull/99285
The C++ standard requires that symmetric transfer from one coroutine to
another is performed via a tail call. Failure to do so is a miscompile
and often breaks programs by quickly overflowing the stack.
Until now, the coro split pass tried to ensure this in the
`addMustTailToCoroResumes()` function by searching for
`llvm.coro.resume` calls to lower as tail calls if the conditions were
right: the right function arguments, attributes, calling convention
etc., and if a `ret void` was sure to be reached after traversal with
some ad-hoc constant folding following the call.
This was brittle, as the kind of implicit variants required for a tail
call to happen could easily be broken by other passes (e.g. if some
instruction got in between the `resume` and `ret`), see for example
9d1cb18d19862fc0627e4a56e1e491a498e84c71 and
284da049f5feb62b40f5abc41dda7895e3d81d72.
Also the logic seemed backwards: instead of searching for possible tail
call candidates and doing them if the circumstances are right, it seems
better to start with the intention of making the tail calls we need, and
forcing the circumstances to be right.
Now that we have the `llvm.coro.await.suspend.handle` intrinsic (since
f78688134026686288a8d310b493d9327753a022) which corresponds exactly to
symmetric transfer, change the lowering of that to also include the
`resume` part, always lowered as a tail call.
Implement `llvm.coro.await.suspend` intrinsics, to deal with performance
regression after prohibiting `.await_suspend` inlining, as suggested in
#64945.
Actually, there are three new intrinsics, which directly correspond to
each of three forms of `await_suspend`:
```
void llvm.coro.await.suspend.void(ptr %awaiter, ptr %frame, ptr @wrapperFunction)
i1 llvm.coro.await.suspend.bool(ptr %awaiter, ptr %frame, ptr @wrapperFunction)
ptr llvm.coro.await.suspend.handle(ptr %awaiter, ptr %frame, ptr @wrapperFunction)
```
There are three different versions instead of one, because in `bool`
case it's result is used for resuming via a branch, and in
`coroutine_handle` case exceptions from `await_suspend` are handled in
the coroutine, and exceptions from the subsequent `.resume()` are
propagated to the caller.
Await-suspend block is simplified down to intrinsic calls only, for
example for symmetric transfer:
```
%id = call token @llvm.coro.save(ptr null)
%handle = call ptr @llvm.coro.await.suspend.handle(ptr %awaiter, ptr %frame, ptr @wrapperFunction)
call void @llvm.coro.resume(%handle)
%result = call i8 @llvm.coro.suspend(token %id, i1 false)
switch i8 %result, ...
```
All await-suspend logic is moved out into a wrapper function, generated
for each suspension point.
The signature of the function is `<type> wrapperFunction(ptr %awaiter,
ptr %frame)` where `<type>` is one of `void` `i1` or `ptr`, depending on
the return type of `await_suspend`.
Intrinsic calls are lowered during `CoroSplit` pass, right after the
split.
Because I'm new to LLVM, I'm not sure if the helper function generation,
calls to them and lowering are implemented in the right way, especially
with regard to various metadata and attributes, i. e. for TBAA. All
things that seemed questionable are marked with `FIXME` comments.
There is another detail: in case of symmetric transfer raw pointer to
the frame of coroutine, that should be resumed, is returned from the
helper function and a direct call to `@llvm.coro.resume` is generated.
C++ standard demands, that `.resume()` method is evaluated. Not sure how
important is this, because code has been generated in the same way
before, sans helper function.
Close https://github.com/llvm/llvm-project/issues/56980.
This patch tries to introduce a light-weight optimization attribute for
coroutines which are guaranteed to only be destroyed after it reached
the final suspend.
The rationale behind the patch is simple. See the example:
```C++
A foo() {
dtor d;
co_await something();
dtor d1;
co_await something();
dtor d2;
co_return 43;
}
```
Generally the generated .destroy function may be:
```C++
void foo.destroy(foo.Frame *frame) {
switch(frame->suspend_index()) {
case 1:
frame->d.~dtor();
break;
case 2:
frame->d.~dtor();
frame->d1.~dtor();
break;
case 3:
frame->d.~dtor();
frame->d1.~dtor();
frame->d2.~dtor();
break;
default: // coroutine completed or haven't started
break;
}
frame->promise.~promise_type();
delete frame;
}
```
Since the compiler need to be ready for all the cases that the coroutine
may be destroyed in a valid state.
However, from the user's perspective, we can understand that certain
coroutine types may only be destroyed after it reached to the final
suspend point. And we need a method to teach the compiler about this.
Then this is the patch. After the compiler recognized that the
coroutines can only be destroyed after complete, it can optimize the
above example to:
```C++
void foo.destroy(foo.Frame *frame) {
frame->promise.~promise_type();
delete frame;
}
```
I spent a lot of time experimenting and experiencing this in the
downstream. The numbers are really good. In a real-world coroutine-heavy
workload, the size of the build dir (including .o files) reduces 14%.
And the size of final libraries (excluding the .o files) reduces 8% in
Debug mode and 1% in Release mode.
When dealing with short-circuiting coroutines (e.g. expected), the
deferred calls that resolve the get_return_object are currently being
emitted after we delete the coroutine frame.
This was caught by ASAN when using optimizations -O1 and above:
optimizations after inlining would place the __coro_gro in the heap, and
subsequent delete of the coroframe followed by the conversion -> BOOM.
This patch forbids the GRO to be placed in the coroutine frame, by
adding a new metadata node that can be attached to `alloca`
instructions.
Fix#49843
One of the main user of these kind of coroutines is swift. There yield-once (`retcon.once`) coroutines are used to temporary "expose" pointers to internal fields of various objects creating borrow scopes.
However, in some cases it might be useful also to allow these coroutines to produce a normal result, but there is no convenient way to represent this (as compared to switched-resume kind of coroutines where C++ `co_return`
is transformed to a member / callback call on promise object).
The extension is simple: we allow continuation function to have a non-void result and accept optional extra arguments via a special `llvm.coro.end.result` intrinsic that would essentially forward them as normal results.
There is a note in coroutines.rst to say the coroutine intrinsics are
WIP. This is not close to the status quo. The status of LLVM coroutines
have been into maintaining stage. So it looks better to remove the WIP
note in the document. The warning to compatibility remains there.
To solve the readnone problems in coroutines. See
https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015
for details.
According to the discussion, we decide to fix the problem by inserting
isPresplitCoroutine() checks in different passes instead of
wrapping/unwrapping readnone attributes in CoroEarly/CoroCleanup passes.
In this direction, we might not be able to cover every case at first.
Let's take a "find and fix" strategy.
Reviewed By: nikic, nhaehnle, jyknight
Differential Revision: https://reviews.llvm.org/D127383
It is illegal to merge two `llvm.coro.save` calls unless their
`llvm.coro.suspend` users are also merged. Marks it "nomerge" for
the moment.
This reverts D129025.
Alternative to D129025, which affects other token type users like WinEH.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D129530
It is a known problem that we can't align the switch-based coroutine
frame if the alignment exceeds std::max_align_t (which is 16 usually).
We could solve the problem on the middle-end by dynamically transforming
or in the frontend by emitting aligned allocation function.
If we need to solve it in the frontend, the middle end need to offer an
intrinsic to tell the alignment at least. This patch tries to offer such
an intrinsic called llvm.coro.align.
Reviewed By: https://reviews.llvm.org/D117542
Differential revision: https://reviews.llvm.org/D117542
This fixes bug49264.
Simply, coroutine shouldn't be inlined before CoroSplit. And the marker
for pre-splited coroutine is created in CoroEarly pass, which ran after
AlwaysInliner Pass in O0 pipeline. So that the AlwaysInliner couldn't
detect it shouldn't inline a coroutine. So here is the error.
This patch set the presplit attribute in clang and mlir. So the inliner
would always detect the attribute before splitting.
Reviewed By: rjmccall, ezhulenev
Differential Revision: https://reviews.llvm.org/D115790
According to [dcl.fct.def.coroutine]/p14:
> If the evaluation of the expression promise.unhandled_exception()
> exits via an exception, the coroutine is considered suspended at the
> final suspend point.
But this is not implemented in clang before. This patch would implement
this feature by marking the coroutine as done at the place of
coro.end(frame, /*InUnwindPath=*/true ).
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D115219
I found that the coroutine intrinsic llvm.coro.param in documentation
(https://llvm.org/docs/Coroutines.html#id101) didn't get used actually
since there isn't lowering codes in LLVM. I also checked the
implementation of libstdc++ and libc++. Both of them didn't use
llvm.coro.param. So I am pretty sure that the llvm.coro.param intrinsic
is unused. I think it would be better t to remove it to avoid possible
misleading understandings.
Note: according to [class.copy.elision]/p1.3, this optimization is
allowed by the C++ language specification. Let's make it someday.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D115222
See pr46990(https://bugs.llvm.org/show_bug.cgi?id=46990). LICM should not sink store instructions to loop exit blocks which cross coro.suspend intrinsics. This breaks semantic of coro.suspend intrinsic which return to caller directly. Also this leads to use-after-free if the coroutine is freed before control returns to the caller in multithread environment.
This patch disable promotion by check whether loop contains coro.suspend intrinsics.
This is a resubmit of D86190.
Disabling LICM for loops with coroutine suspension is a better option not only for correctness purpose but also for performance purpose.
In most cases LICM sinks memory operations. In the case of coroutine, sinking memory operation out of the loop does not improve performance since coroutien needs to get data from the frame anyway. In fact LICM would hurt coroutine performance since it adds more entries to the frame.
Differential Revision: https://reviews.llvm.org/D96928
The llvm.coro.end.async intrinsic allows to specify a function that is
to be called as the last action before returning. This function will be
inlined after coroutine splitting.
This function can contain a 'musttail' call to allow for guaranteed tail
calling as the last action.
Differential Revision: https://reviews.llvm.org/D93568
The `llvm.coro.suspend.async` intrinsic takes a function pointer as its
argument that describes how-to restore the current continuation's
context from the context argument of the continuation function. Before
we assumed that the current context can be restored by loading from the
context arguments first pointer field (`first_arg->caller_context`).
This allows for defining suspension points that reuse the current
context for example.
Also:
llvm.coro.id.async lowering: Add llvm.coro.preprare.async intrinsic
Blocks inlining until after the async coroutine was split.
Also, change the async function pointer's context size position
struct async_function_pointer {
uint32_t relative_function_pointer_to_async_impl;
uint32_t context_size;
}
And make the position of the `async context` argument configurable. The
position is specified by the `llvm.coro.id.async` intrinsic.
rdar://70097093
Differential Revision: https://reviews.llvm.org/D90783
This patch adds the `async` lowering of coroutines.
This will be used by the Swift frontend to lower async functions. In
contrast to the `retcon` lowering the frontend needs to be in control
over control-flow at suspend points as execution might be suspended at
these points.
This is very much work in progress and the implementation will change as
it evolves with the frontend. As such the documentation is lacking
detail as some of it might change.
rdar://70097093
Reapply with fix for memory sanitizer failure and sphinx failure.
Differential Revision: https://reviews.llvm.org/D90612
This patch adds the `async` lowering of coroutines.
This will be used by the Swift frontend to lower async functions. In
contrast to the `retcon` lowering the frontend needs to be in control
over control-flow at suspend points as execution might be suspended at
these points.
This is very much work in progress and the implementation will change as
it evolves with the frontend. As such the documentation is lacking
detail as some of it might change.
rdar://70097093
Differential Revision: https://reviews.llvm.org/D90612
This has already been done by @rjmccall in D76526 (49e5a97ec363), and
9514c048d89e. We should remove this from the docs.
Differential Revision: https://reviews.llvm.org/D90550
A quick contrast of this ABI with the currently-implemented ABI:
- Allocation is implicitly managed by the lowering passes, which is fine
for frontends that are fine with assuming that allocation cannot fail.
This assumption is necessary to implement dynamic allocas anyway.
- The lowering attempts to fit the coroutine frame into an opaque,
statically-sized buffer before falling back on allocation; the same
buffer must be provided to every resume point. A buffer must be at
least pointer-sized.
- The resume and destroy functions have been combined; the continuation
function takes a parameter indicating whether it has succeeded.
- Conversely, every suspend point begins its own continuation function.
- The continuation function pointer is directly returned to the caller
instead of being stored in the frame. The continuation can therefore
directly destroy the frame when exiting the coroutine instead of having
to leave it in a defunct state.
- Other values can be returned directly to the caller instead of going
through a promise allocation. The frontend provides a "prototype"
function declaration from which the type, calling convention, and
attributes of the continuation functions are taken.
- On the caller side, the frontend can generate natural IR that directly
uses the continuation functions as long as it prevents IPO with the
coroutine until lowering has happened. In combination with the point
above, the frontend is almost totally in charge of the ABI of the
coroutine.
- Unique-yield coroutines are given some special treatment.
llvm-svn: 368788
Summary:
A recent addition to Coroutines TS (https://wg21.link/p0913) adds a pre-defined coroutine noop_coroutine that does nothing.
To implement this feature, we implemented an llvm.coro.noop intrinsic that returns a coroutine handle to a coroutine that does nothing when resumed or destroyed.
Reviewers: EricWF, modocache, rnk, lewissbaker
Reviewed By: modocache
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45114
llvm-svn: 328986
Summary:
The purpose of coro.end intrinsic is to allow frontends to mark the cleanup and
other code that is only relevant during the initial invocation of the coroutine
and should not be present in resume and destroy parts.
In landing pads coro.end is replaced with an appropriate instruction to unwind to
caller. The handling of coro.end differs depending on whether the target is
using landingpad or WinEH exception model.
For landingpad based exception model, it is expected that frontend uses the
`coro.end`_ intrinsic as follows:
```
ehcleanup:
%InResumePart = call i1 @llvm.coro.end(i8* null, i1 true)
br i1 %InResumePart, label %eh.resume, label %cleanup.cont
cleanup.cont:
; rest of the cleanup
eh.resume:
%exn = load i8*, i8** %exn.slot, align 8
%sel = load i32, i32* %ehselector.slot, align 4
%lpad.val = insertvalue { i8*, i32 } undef, i8* %exn, 0
%lpad.val29 = insertvalue { i8*, i32 } %lpad.val, i32 %sel, 1
resume { i8*, i32 } %lpad.val29
```
The `CoroSpit` pass replaces `coro.end` with ``True`` in the resume functions,
thus leading to immediate unwind to the caller, whereas in start function it
is replaced with ``False``, thus allowing to proceed to the rest of the cleanup
code that is only needed during initial invocation of the coroutine.
For Windows Exception handling model, a frontend should attach a funclet bundle
referring to an enclosing cleanuppad as follows:
```
ehcleanup:
%tok = cleanuppad within none []
%unused = call i1 @llvm.coro.end(i8* null, i1 true) [ "funclet"(token %tok) ]
cleanupret from %tok unwind label %RestOfTheCleanup
```
The `CoroSplit` pass, if the funclet bundle is present, will insert
``cleanupret from %tok unwind to caller`` before
the `coro.end`_ intrinsic and will remove the rest of the block.
Reviewers: majnemer
Reviewed By: majnemer
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D25543
llvm-svn: 297223
Summary:
[Coroutines] Part 9: Add cleanup subfunction.
This patch completes coroutine heap allocation elision. Now, the heap elision example from docs\Coroutines.rst compiles and produces expected result (see test/Transform/Coroutines/ex3.ll)
Intrinsic Changes:
* coro.free gets a token parameter tying it to coro.id to allow reliably discovering all coro.frees associated with a particular coroutine.
* coro.id gets an extra parameter that points back to a coroutine function. This allows to check whether a coro.id describes the enclosing function or it belongs to a different function that was later inlined.
CoroSplit now creates three subfunctions:
# f$resume - resume logic
# f$destroy - cleanup logic, followed by a deallocation code
# f$cleanup - just the cleanup code
CoroElide pass during devirtualization replaces coro.destroy with either f$destroy or f$cleanup depending whether heap elision is performed or not.
Other fixes, improvements:
* Fixed buglet in Shape::buildFrame that was not creating coro.save properly if coroutine has more than one suspend point.
* Switched to using variable width suspend index field (no longer limited to 32 bit index field can be as little as i1 or as large as i<whatever-size_t-is>)
Reviewers: majnemer
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23844
llvm-svn: 279971
Summary:
1. Make coroutine representation more robust against optimization that may duplicate instruction by introducing coro.id intrinsics that returns a token that will get fed into coro.alloc and coro.begin. Due to coro.id returning a token, it won't get duplicated and can be used as reliable indicator of coroutine identify when a particular coroutine call gets inlined.
2. Move last three arguments of coro.begin into coro.id as they will be shared if coro.begin will get duplicated.
3. doc + test + code updated to support the new intrinsic.
Reviewers: mehdi_amini, majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D23412
llvm-svn: 278481
Summary:
A particular coroutine usage pattern, where a coroutine is created, manipulated and
destroyed by the same calling function, is common for coroutines implementing
RAII idiom and is suitable for allocation elision optimization which avoid
dynamic allocation by storing the coroutine frame as a static `alloca` in its
caller.
coro.free and coro.alloc intrinsics are used to indicate which code needs to be suppressed
when dynamic allocation elision happens:
```
entry:
%elide = call i8* @llvm.coro.alloc()
%need.dyn.alloc = icmp ne i8* %elide, null
br i1 %need.dyn.alloc, label %coro.begin, label %dyn.alloc
dyn.alloc:
%alloc = call i8* @CustomAlloc(i32 4)
br label %coro.begin
coro.begin:
%phi = phi i8* [ %elide, %entry ], [ %alloc, %dyn.alloc ]
%hdl = call i8* @llvm.coro.begin(i8* %phi, i32 0, i8* null,
i8* bitcast ([2 x void (%f.frame*)*]* @f.resumers to i8*))
```
and
```
%mem = call i8* @llvm.coro.free(i8* %hdl)
%need.dyn.free = icmp ne i8* %mem, null
br i1 %need.dyn.free, label %dyn.free, label %if.end
dyn.free:
call void @CustomFree(i8* %mem)
br label %if.end
if.end:
...
```
If heap allocation elision is performed, we replace coro.alloc with a static alloca on the caller frame and coro.free with null constant.
Also, we need to make sure that if there are any tail calls referencing the coroutine frame, we need to remote tail call attribute, since now coroutine frame lives on the stack.
Documentation and overview is here: http://llvm.org/docs/Coroutines.html.
Upstreaming sequence (rough plan)
1.Add documentation. (https://reviews.llvm.org/D22603)
2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659)
3.Add empty coroutine passes. (https://reviews.llvm.org/D22847)
4.Add coroutine devirtualization + tests.
ab) Lower coro.resume and coro.destroy (https://reviews.llvm.org/D22998)
c) Do devirtualization (https://reviews.llvm.org/D23229)
5.Add CGSCC restart trigger + tests. (https://reviews.llvm.org/D23234)
6.Add coroutine heap elision + tests. <= we are here
7.Add the rest of the logic (split into more patches)
Reviewers: mehdi_amini, majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D23245
llvm-svn: 278242
This adds boilerplate code for all coroutine passes,
the passes are no-ops for now.
Also, a small test has been added to verify that passes execute in
the expected order or not at all if coroutine support is disabled.
Patch by Gor Nishanov!
Differential Revision: https://reviews.llvm.org/D22847
llvm-svn: 277033
This is the second patch in the coroutine series. It adds coroutine
intrinsics and updates intrinsic cost in TargetTransformInfoImpl.h.
Patch by Gor Nishanov!
Differential Revision: https://reviews.llvm.org/D22659
llvm-svn: 276839