OpenACC 3.3-NEXT has changed the way tags for copy, copyin, copyout, and
create clauses are specified, and end up adding a few extras, and
permits them as a list. This patch encodes these as bitmask enum so
they can be stored succinctly, but still diagnose reasonably.
This introduces a new class 'UnsignedOrNone', which models a lite
version of `std::optional<unsigned>`, but has the same size as
'unsigned'.
This replaces most uses of `std::optional<unsigned>`, and similar
schemes utilizing 'int' and '-1' as sentinel.
Besides the smaller size advantage, this is simpler to serialize, as its
internal representation is a single unsigned int as well.
This is the last item of the OpenACC 3.3 spec. It includes the
implicit-name version of 'routine', plus significant refactorings to
make the two work together. The implicit name version is represented as
an attribute on the function call. This patch also implements the
clauses for the implicit-name version, as well as the A.3.4 warning.
The 'bind' clause allows the renaming of a function during code
generation. There are a few rules about when this can/cannot happen,
and it takes either a string or identifier (previously mis-implemetned
as ID-expression) argument.
Note there are additional rules to this in the implicit-function routine
case, but that isn't implemented in this patch, as implicit-function
routine is not yet implemented either.
'nohost' is only valid on routine, and states that the compiler
shouldn't compile this routine for the host. It has no arguments, so no
checking is required besides putting it in the AST.
The 'routine' construct has two forms, one which takes the name of a
function that it applies to, and another where it implicitly figures it
out based on the next declaration. This patch implements the former with
the required restrictions on the name and the function-static-variables
as specified.
What has not been implemented is any clauses for this, any of the A.3.4
warnings, or the other form.
This statement level construct takes no clauses and has no associated
statement, and simply labels a number of array elements as valid for
caching. The implementation here is pretty simple, but it is a touch of
a special case for parsing, so the parsing code reflects that.
The 'declare' construct is the first of two 'declaration' level
constructs, so it is legal in any place a declaration is, including as a
statement, which this accomplishes by wrapping it in a DeclStmt. All
clauses on this have a 'same scope' requirement, which this enforces as
declaration context instead, which makes it possible to implement these
as a template.
The 'link' and 'device_resident' clauses are also added, which have some
similar/small restrictions, but are otherwise pretty rote.
This patch implements all of the above.
This patch allows using fpfeatures pragmas with __builtin_convertvector:
- added TrailingObjects with FPOptionsOverride and methods for handling
it to ConvertVectorExpr
- added support for codegen, node dumping, and serialization of
fpfeatures contained in ConvertVectorExpr
This commit adds "instantiated_from" to the AST dump for EnumDecl,
improving consistency with CXXRecordDecl and FunctionDecl, which also
include this information. To achieve this, TextNodeDumper::VisitEnumDecl
is updated with analogous lines found in
TextNodeDumper::VisitFunctionDecl and
TextNodeDumper::VisitCXXRecordDecl.
Class templates might be only instantiated when they are required to be
complete, but checking the template args against the primary template is
immediate.
This result is cached so that later when the class is instantiated,
checking against the primary template is not repeated.
The 'MatchedPackOnParmToNonPackOnArg' flag is also produced upon
checking against the primary template, so it needs to be cached in the
specialziation as well.
This fixes a bug which has not been in any release, so there are no
release notes.
Fixes#125290
The atomic construct is a particularly complicated one. The directive
itself is pretty simple, it has 5 options for the 'atomic-clause'.
However, the associated statement is fairly complicated.
'read' accepts:
v = x;
'write' accepts:
x = expr;
'update' (or no clause) accepts:
x++;
x--;
++x;
--x;
x binop= expr;
x = x binop expr;
x = expr binop x;
'capture' accepts either a compound statement, or:
v = x++;
v = x--;
v = ++x;
v = --x;
v = x binop= expr;
v = x = x binop expr;
v = x = expr binop x;
IF 'capture' has a compound statement, it accepts:
{v = x; x binop= expr; }
{x binop= expr; v = x; }
{v = x; x = x binop expr; }
{v = x; x = expr binop x; }
{x = x binop expr ;v = x; }
{x = expr binop x; v = x; }
{v = x; x = expr; }
{v = x; x++; }
{v = x; ++x; }
{x++; v = x; }
{++x; v = x; }
{v = x; x--; }
{v = x; --x; }
{x--; v = x; }
{--x; v = x; }
While these are all quite complicated, there is a significant amount
of similarity between the 'capture' and 'update' lists, so this patch
reuses a lot of the same functions.
This patch implements the entirety of 'atomic', creating a new Sema file
for the sema for it, as it is fairly sizable.
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses dyn_cast
because we expect C to be nonnull.
This reverts commit f949f876daeda520a5b7dbeb2cbb35b8c4383acb.
This commit introduces an llvm_unreachable call that is actually
reachable. I posted a reproducer on the pull request discussion.
ast-print: A DeclRef to an anonymous NTTP will print
'value-parameter-DEPTH-INDEX',
similar to how type parameters are printed.
ast-dump: A bareDeclRef to an anonymous entity will print some extra
identifying information,
instead of an empty name, like indexes.
Falls back to source locations if nothing else is available.
These two clauses just take a 'var-list' and specify where the variables
should be copied from/to. This patch implements the AST nodes for them
and ensures they properly take a var-list.
This executable construct has a larger list of clauses than some of the
others, plus has some additional restrictions. This patch implements
the AST node, plus the 'cannot be the body of a if, while, do, switch,
or label' statement restriction. Future patches will handle the
rest of the restrictions, which are based on clauses.
A fairly simple one, only valid on the 'set' construct, this clause
takes an int expression. Most of the work was already done as a part of
parsing, so this patch ends up being a lot of infrastructure.
The 'set' construct is another fairly simple one, it doesn't have an
associated statement and only a handful of allowed clauses. This patch
implements it and all the rules for it, allowing 3 of its for clauses.
The only exception is default_async, which will be implemented in a
future patch, because it isn't just being enabled, it needs a complete
new implementation.
This is a very simple sema implementation, and just required AST node
plus the existing diagnostics. This patch adds tests and adds the AST
node required, plus enables it for 'init' and 'shutdown' (only!)
These two constructs are very simple and similar, and only support 3
different clauses, two of which are already implemented. This patch
adds AST nodes for both constructs, and leaves the device_num clause
unimplemented, but enables the other two.
The arguments to this are the same as for the 'wait' clause, so this
reuses all of that infrastructure. So all this has to do is support a
pair of clauses that are already implemented (if and async), plus create
an AST node. This patch does so, and adds proper testing.
This is a clause that is only valid on 'host_data' constructs, and
identifies variables which it should use the current device address.
From a Sema perspective, the only thing novel here is mild changes to
how ActOnVar works for this clause, else this is very much like the rest
of the 'var-list' clauses.
'delete' is another clause that has very little compile-time
implication, but needs a full AST that takes a var list. This patch
ipmlements it fully, plus adds sufficient test coverage.
This is another new clause specific to 'exit data' that takes a pointer
argument. This patch implements this the same way we do a few other
clauses (like attach) that have the same restrictions.
The 'if_present' clause controls the replacement of addresses in the
var-list in current device memory. This clause can only go on
'host_device'. From a Sema perspective, there isn't anything to do
beyond add this to AST and pass it on.
This is a very simple clause as far as sema is concerned. It is only
valid on 'exit data', and doesn't have any rules involving it, so it is
simply applied and passed onto the MLIR.
These constructs are all very similar and closely related, so this patch
creates the AST nodes for them, serialization, printing/etc.
Additionally the restrictions are all added as tests/todos in the tests,
as those will have to be implemented once we get those clauses implemented.
Combined constructs (OpenACC 3.3 section 2.11) are a short-cut for
writing a `loop` construct immediately inside of a `compute` construct.
However, this interaction requires we do additional work to ensure that
we get the semantics between the two correct, as well as diagnostics.
This patch adds the semantic analysis for the constructs (but no
clauses), as well as the AST nodes.
After implementing 'loop', we determined that the link to its parent
only ever uses the type, not the construct itself. This patch removes
it, as it is both a waste and causes problems with serialization.
The 'vector' clause specifies the iterations to be executed in vector or
SIMD mode. There are some limitations on which associated compute
contexts may be associated with this and have arguments, but otherwise
this is a fairly unrestricted clause.
It DOES have region limits like 'gang' and 'worker'.
The worker clause specifies iterations of the loop/ that are executed in
parallel by distributing the iterations among the multiple works within
a single gang.
The sema rules for this type are simply that it cannot be combined with
a `kernel` construct with a `num_workers` clause, child `loop` clauses
cannot contain a `gang` or `worker` clause, and that the argument is oly
allowed when associated with a `kernel`.
The 'gang' clause is used to specify parallel execution of loops, thus
has some complicated rules depending on the 'loop's associated compute
construct. This patch implements all of those.
The 'tile' clause shares quite a bit of the rules with 'collapse', so a
followup patch will add those tests/behaviors. This patch deals with
adding the AST node.
The 'tile' clause takes a series of integer constant expressions, or *.
The asterisk is now represented by a new OpenACCAsteriskSizeExpr node,
else this clause is very similar to others.
The 'collapse' clause on a 'loop' construct is used to specify how many
nested loops are associated with the 'loop' construct. It takes an
optional 'force' tag, and an integer constant expression as arguments.
There are many other restrictions based on the contents of the loop/etc,
but those are implemented in followup patches, for now, this patch just
adds the AST node and does basic argument checking on the loop-count.
This extends default argument deduction to cover class templates as
well, applying only to partial ordering, adding to the provisional
wording introduced in https://github.com/llvm/llvm-project/pull/89807.
This solves some ambuguity introduced in P0522 regarding how template
template parameters are partially ordered, and should reduce the
negative impact of enabling `-frelaxed-template-template-args` by
default.
Given the following example:
```C++
template <class T1, class T2 = float> struct A;
template <class T3> struct B;
template <template <class T4> class TT1, class T5> struct B<TT1<T5>>; // #1
template <class T6, class T7> struct B<A<T6, T7>>; // #2
template struct B<A<int>>;
```
Prior to P0522, `#2` was picked. Afterwards, this became ambiguous. This
patch restores the pre-P0522 behavior, `#2` is picked again.
HLSL output parameters are denoted with the `inout` and `out` keywords
in the function declaration. When an argument to an output parameter is
constructed a temporary value is constructed for the argument.
For `inout` pamameters the argument is initialized via copy-initialization
from the argument lvalue expression to the parameter type. For `out`
parameters the argument is not initialized before the call.
In both cases on return of the function the temporary value is written
back to the argument lvalue expression through an implicit assignment
binary operator with casting as required.
This change introduces a new HLSLOutArgExpr ast node which represents
the output argument behavior. The OutArgExpr has three defined children:
- An OpaqueValueExpr of the argument lvalue expression.
- An OpaqueValueExpr of the copy-initialized parameter.
- A BinaryOpExpr assigning the first with the value of the second.
Fixes#87526
---------
Co-authored-by: Damyan Pepper <damyanp@microsoft.com>
Co-authored-by: John McCall <rjmccall@gmail.com>
In 4198576157bfd0d08c08b784220d6132b709ae2c, we add support for dumping
builtin name for AtomicExpr in JSON dump. This change syncs
`TextNodeDumper` with `JSONNodeDumper`, makes `TextNodeDumper` learned
to dump builtin name for AtomicExpr.
Currently, `NamespaceDecl` has a member `AnonOrFirstNamespaceAndFlags`
which stores a few pieces of data:
- a bit indicating whether the namespace was declared `inline`, and
- a bit indicating whether the namespace was declared as a
_nested-namespace-definition_, and
- a pointer a `NamespaceDecl` that either stores:
- a pointer to the first declaration of that namespace if the
declaration is no the first declaration, or
- a pointer to the unnamed namespace that inhabits the namespace
otherwise.
`Redeclarable` already stores a pointer to the first declaration of an
entity, so it's unnecessary to store this in `NamespaceDecl`.
`DeclContext` has 8 bytes in which various bitfields can be stored for a
declaration, so it's not necessary to store these in `NamespaceDecl`
either. We only need to store a pointer to the unnamed namespace that
inhabits the first declaration of a namespace. This patch moves the two
bits currently stored in `NamespaceDecl` to `DeclContext`, and only
stores a pointer to the unnamed namespace that inhabits a namespace in
the first declaration of that namespace. Since `getOriginalNamespace`
always returns the same `NamespaceDecl` as `getFirstDecl`, this function
is removed to avoid confusion.
This commit implements the entirety of the now-accepted [N3017
-Preprocessor
Embed](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm) and
its sister C++ paper [p1967](https://wg21.link/p1967). It implements
everything in the specification, and includes an implementation that
drastically improves the time it takes to embed data in specific
scenarios (the initialization of character type arrays). The mechanisms
used to do this are used under the "as-if" rule, and in general when the
system cannot detect it is initializing an array object in a variable
declaration, will generate EmbedExpr AST node which will be expanded by
AST consumers (CodeGen or constant expression evaluators) or expand
embed directive as a comma expression.
This reverts commit
682d461d5a.
---------
Co-authored-by: The Phantom Derpstorm <phdofthehouse@gmail.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
Co-authored-by: H. Vetinari <h.vetinari@gmx.com>