This patch adds `VisitBinAssign` and `VisitBinComma` to the ClangIR
`ScalarExprEmitter` to enable assignments and the comma operator.
---------
Co-authored-by: Morris Hafner <mhafner@nvidia.com>
`Sema::getCurFunctionDecl(AllowLambda = false)` returns a nullptr when
the lambda declaration is outside a function (for example, when
assigning a lambda to a static constexpr variable).
This triggered an assertion in
`SemaAMDGPU::CheckAMDGCNBuiltinFunctionCall`.
Using `Sema::getCurFunctionDecl(AllowLambda = true)` returns the
declaration of the enclosing lambda.
Stumbled with this issue when refactoring some code in CK.
With implicitly-built modules, seeing something like:
```
fatal error: module 'X' is defined in both '<cache>/HASH1/X-HASH2.pcm' and '<cache>/HASH1/X-HASH3.pcm'
```
is super confusing and not actionable, because the module cache tends to
be hidden from the developer.
This PR adds a note to that diagnostic that names the module map files
the PCM files were compiled from, hopefully giving a good enough hint
for further investigation:
```
note: compiled from '<build>/X.framework/Modules/module.modulemap' and '<SDK>/X.framework/Modules/module.modulemap'
```
(I had to replace the mechanism used to convert `DiagnosticError` into
something `DiagnosticsEngine` can understand, because it seemingly did
not support notes.)
This PR implements a CC1 flag `-dump-minimization-hints`.
The flag allows to specify a file path to dump ranges of deserialized
declarations in `ASTReader`. Example usage:
```
clang -Xclang=-dump-minimization-hints=/tmp/decls -c file.cc -o file.o
```
Example output:
```
// /tmp/decls
{
"required_ranges": [
{
"file": "foo.h",
"range": [
{
"from": {
"line": 26,
"column": 1
},
"to": {
"line": 27,
"column": 77
}
}
]
},
{
"file": "bar.h",
"range": [
{
"from": {
"line": 30,
"column": 1
},
"to": {
"line": 35,
"column": 1
}
},
{
"from": {
"line": 92,
"column": 1
},
"to": {
"line": 95,
"column": 1
}
}
]
}
]
}
```
Specifying the flag creates an instance of
`DeserializedDeclsSourceRangePrinter`, which dumps ranges of deserialized
declarations to aid debugging and bug minimization (we use is as input to [C-Vise](https://github.com/emaxx-google/cvise/tree/multifile-hints).
Required ranges are computed from source ranges of Decls.
`TranslationUnitDecl`, `LinkageSpecDecl` and `NamespaceDecl` are ignored
for the sake of this PR.
Technical details:
* `DeserializedDeclsSourceRangePrinter` implements `ASTConsumer` and
`ASTDeserializationListener`, so that an object of
`DeserializedDeclsSourceRangePrinter` registers as its own listener.
* `ASTDeserializationListener` interface provides the `DeclRead`
callback that we use to collect the deserialized Decls.
Printing or otherwise processing them as this point is dangerous, since
that could trigger additional deserialization and crash compilation.
* The collected Decls are processed in `HandleTranslationUnit` method of
`ASTConsumer`. This is a safe point, since we know that by this point
all the Decls needed by the compiler frontend have been deserialized.
* In case our processing causes further deserialization, `DeclRead` from
the listener might be called again. However, at that point we don't
accept any more Decls for processing.
Fix comparing type id pointers, add mor info when print()ing them, use
the most derived type in GetTypeidPtr() and the canonically unqualified
type when we know the type statically.
The NVVMReflect pass simply replaces calls to nvvm-reflect functions
with the appropriate constant, either the architecture number, or
nvvm-reflect-ftz, found in the module's metadata.
The implementation is inefficient and does this by traversing through
all instructions to find calls. The common case is that you never call
nvvm-reflect, so this traversal is costly.
This PR:
- Updates the pass so that it finds the reflect functions by name, and
then traverses through their uses to find the calls directly.
- Adds a line (245) to make sure the dead nvvm-reflect definitions are
erased.
- Adds the ability to set reflect values via command line
In the LLVM middle-end we want to fold `gep inbounds null, idx -> null`:
https://alive2.llvm.org/ce/z/5ZkPx-
This pattern is common in real-world programs
(https://github.com/dtcxzyw/llvm-opt-benchmark/pull/55#issuecomment-1870963906).
Generally, it exists in some (actually) unreachable blocks, which is
introduced by JumpThreading.
However, some old-style offsetof macros are still widely used in
real-world C/C++ code (e.g., hwloc/slurm/luajit). To avoid breaking
existing code and inconvenience to downstream users, this patch removes
the inbounds flag from the struct gep if the base pointer is null.
The 'set' lowering is pretty trivial. 'device_type' is a little more
restricted since both the MLIR-Dialect and language limit it to only 1
value (as confirmed by standards-discussion).
This patch implements 'set', with 'device_type', since 'set' requires at
least 1 clause, and this is the least difficult to implement at the
moment.
This is a basic implementation of P2719: "Type-aware allocation and
deallocation functions" described at http://wg21.link/P2719
The proposal includes some more details but the basic change in
functionality is the addition of support for an additional implicit
parameter in operators `new` and `delete` to act as a type tag. Tag is
of type `std::type_identity<T>` where T is the concrete type being
allocated. So for example, a custom type specific allocator for `int`
say can be provided by the declaration of
void *operator new(std::type_identity<int>, size_t, std::align_val_t);
void operator delete(std::type_identity<int>, void*, size_t, std::align_val_t);
However this becomes more powerful by specifying templated declarations,
for example
template <typename T> void *operator new(std::type_identity<T>, size_t, std::align_val_t);
template <typename T> void operator delete(std::type_identity<T>, void*, size_t, std::align_val_t););
Where the operators being resolved will be the concrete type being
operated over (NB. A completely unconstrained global definition as above
is not recommended as it triggers many problems similar to a general
override of the global operators).
These type aware operators can be declared as either free functions or
in class, and can be specified with or without the other implicit
parameters, with overload resolution performed according to the existing
standard parameter prioritisation, only with type parameterised
operators having higher precedence than non-type aware operators. The
only exception is destroying_delete which for reasons discussed in the
paper we do not support type-aware variants by default.
Fixes#112270
Completed ACs:
- `-res-may-alias` clang-dxc command-line option added
- It inserts and sets a module metadata flag `dx.resmayalias` to 1
- Shader flag set appropriately:
- The flag IS NOT set if DXIL Version <= 1.6 OR the command-line option
`-res-may-alias` is specified
- Otherwise the flag IS set when:
- DXIL Version > 1.7 AND function uses UAVs, OR
- DXIL Version <= 1.7 AND UAVs present globally
- Add tests
- Tests for Shader Models 6.6, 6.7, and 6.8 corresponding to DXIL
Versions 1.6, 1.7, and 1.8
- Tests (`res-may-alias-0.ll`/`res-may-alias-1.ll`) for when the module
metadata flag `dx.resmayalias` is set to 0 or 1 respectively
- A frontend test (`res-may-alias.hlsl`) for testing that that the
command-line option `-res-may-alias` inserts `dx.resmayalias` module
metadata correctly
This PR fixes a bug that when a template specialization is declared with
a forward declaration of a template, the checker fails to find its
definition in the same translation unit and erroneously emit an unsafe
forward declaration warning.
This PR adds the support for recognizing calling adoptCF/adoptNS on the
result of a cast operation on the return value of a function which
creates NS or CF types. It also fixes a bug that we weren't reporting
memory leaks when CF types are created without ever calling RetainPtr's
constructor, adoptCF, or adoptNS.
To do this, this PR adds a new mechanism to report a memory leak
whenever create or copy CF functions are invoked unless this CallExpr
has already been visited while validating a call to adoptCF. Also added
an early exit when isOwned returns IsOwnedResult::Skip due to an
unresolved template argument.
The recently announced IBM z17 processor implements the architecture
already supported as "arch15" in LLVM. This patch adds support for "z17"
as an alternate architecture name for arch15.
This patch also add the scheduler description for the z17 processor,
provided by Jonas Paulsson.
Discussions with the OpenACC Standard folks and the dialect folks showed
that the ability to have 'set' have a 'device_type' with more than one
architecture was a mistake, and one that will be fixed in future
revisions of the standard. Since the dialect requires this anyway,
we'll implement this in advance of standardization.
We execute tests in read only environment which leads to test failure
when tests try to write to the current directory. Either they should
write to a temporary directory or not write if output is not needed.
Fallback from #134717
…utdown'
This patch emits the lowering for 'device_type' on an 'init' or
'shutdown'. This one is fairly unique, as these directives have it as an
attribute, rather than as a component of the individual operands, like
the rest of the constructs.
So this patch implements the lowering as an attribute.
In order to do tis, a few refactorings had to happen: First, the
'emitOpenACCOp' functions needed to pick up th edirective kind/location
so that the NYI diagnostic could be reasonable.
Second, and most impactful, the `applyAttributes` function ends up
needing to encode some of the appertainment rules, thanks to the way the
OpenACC-MLIR operands get their attributes attached. Since they each use
a special function (rather than something that can be legalized at
runtime), the forms of 'setDefaultAttr' is only valid for some ops. SO
this patch uses some `if constexpr` and a small type-trait to help
legalize these.
In the 'single-file-parse' mode, seeing `#include UNDEFINED_IDENTIFIER`
should not be treated as an error. The identifier might be defined in a
header that we decided to skip, resulting in a nonsensical diagnostic
from the user point of view.
fixes#135122
SemaExpr.cpp - Make all doubles fail. Add sema support for float scalars
and vectors when language mode is HLSL.
CGExprScalar.cpp - Allow emit frem when language mode is HLSL.
This changes the TemplateArgument representation to hold a flag
indicating whether a template argument of expression type is supposed to
be canonical or not.
This gets one step closer to solving
https://github.com/llvm/llvm-project/issues/92292
This still doesn't try to unique as-written TSTs. While this would
increase the amount of memory savings and make code dealing with the AST
more well-behaved, profiling template argument lists is still too
expensive for this to be worthwhile, at least for now. Without this
uniquing, this patch stands neutral in terms of performance impact.
This also fixes the context creation of TSTs, so that they don't in some
cases get incorrectly flagged as sugar over their own canonical form.
This is captured in the test expectation change of some AST dumps.
This fixes some places which were unnecessarily canonicalizing these
TSTs.
This patch upstreams initial support for making function calls in CIR.
Function arguments and return values are not included to keep the patch
small for review.
Related to #132487
This feature largely models the same behavior as in C++11. It is
technically a breaking change between C99 and C11, so the paper is not
being backported to older language modes.
One difference between C++ and C is that things which are rvalues in C
are often lvalues in C++ (such as the result of a ternary operator or a
comma operator).
Fixes#96486
This is an alternative to
https://github.com/llvm/llvm-project/pull/122103
In SPIR-V, private global variables have the Private storage class. This
PR adds a new address space which allows frontend to emit variable with
this storage class when targeting this backend.
This is covered in this proposal: llvm/wg-hlsl@4c9e11a
This PR will cause addrspacecast to show up in several cases, like class
member functions or assignment. Those will have to be handled in the
backend later on, particularly to fixup pointer storage classes in some
functions.
Before this change, global variable were emitted with the 'Function'
storage class, which was wrong.
The Pointer class already has the capability to be a function pointer,
but we still classifed function pointers as PT_FnPtr/FunctionPointer.
This means when converting from a Pointer to a FunctionPointer, we lost
the information of what the original Pointer pointed to.
When copying unions, we need to only copy the active field of the source
union, which we were already doing. However, we also need to zero out
the (now) inactive fields, so we don't end up with dangling pointers in
those inactive fields.
This patch adds `memory(argmem: read, inaccessiblemem: readwrite)
mustprogress` to **recoverable** ubsan handlers in order to unblock some
memory/loop optimizations. It provides an average of 3% performance
improvement on llvm-test-suite (except for 49 test failures due to ubsan
diagnostics).
Closes https://github.com/llvm/llvm-project/issues/130093.
https://github.com/llvm/llvm-project/pull/120464 (and earlier CLs) added -fsanitize-merge functionality, which is intended to work for all "sanitizers". It is nearly correct for CFI.
This patch precommits some tests for CFI, to track the progress of future -fsanitize-merge fixes for CFI.
Fixes#112267
Implement the shader flag analysis to set the UseNativeLowPrecision DXIL
module flag.
The flag is only able to be set when the command-line flag
`-enable-16bit-types` is passed to clang-dxc, or equivalently
`-fnative-half-type` is passed to clang.
When the command-line flag is passed, a module metadata flag called
"dx.nativelowprec" is set to 1.
The DXILShaderFlags shader flags analysis checks that the module
metadata flag "dx.nativelowprec" is set to 1 and the DXIL Version is 1.2
or greater before setting the UseNativeLowPrecision DXIL module flag.
A NestedNameSpecifier of TypeSpec kind can be non-dependent even if its
prefix is dependent, when for example the prefix is an injected class
type but the type itself is a simple alias to a non-dependent type.
This issue was a bit hard to observe because if it is an alias to a
class type, then we (for some unknown reason) ignored that the NNS was
dependent in the first place, which wouldn't happen with an enum type.
This could have been a workaround for previous dependency bugs, and is
not relevant anymore for any of the test cases in the tree, so this
patch also removes that.
The other kinds of dependencies are still relevant. If the prefix
contains an unexpanded pack, then this NNS is still unexpanded, and
likewise for errors.
This fixes a regression reported here:
https://github.com/llvm/llvm-project/pull/133610#issuecomment-2787909829
which was introduced by https://github.com/llvm/llvm-project/pull/133610
There are no release notes since the regression was never released.
Researching in prep of doing the implementation for lowering, I found
that the source of the valid identifiers list from flang is in the
frontend. This patch adds the same list to the frontend, but does it as
a sema diagnostic, so we still parse it as an identifier/identifier-like
thing, but then diagnose it as invalid later.
The single file parse mode is supposed to enter both branches of an
`#if` directive whenever the condition contains undefined identifiers.
This patch adds support for undefined function-like macros, where we
would previously emit an error that doesn't make sense from end-user
perspective.
(I discovered this while working on a very similar feature that parses
single module only and doesn't enter either `#if` branch when the
condition contains undefined identifiers.)
Closes#99135.
Tasks completed:
- Wrote implementation in `hlsl_intrinsics.h`/`hlsl_intrinsic_helpers.h`
- Added codegen tests to `clang/test/CodeGenHLSL/builtins/lit.hlsl`
This PR adds the support for treating capturing of "self" as safe if the
lambda simultaneously captures "protectedSelf", which is a RetainPtr of
"self".
This PR also fixes a bug that the checker wasn't generating a warning
when "self" is implicitly captured. Note when "self" is implicitly
captured, we use the lambda's getBeginLoc as a fallback source location.