ByOperand must be false; this is implied by the iterator type.
The instr_iterator cases are a separate implementation from the
single-operand defusechain_iterator.
Additionally, ByInstr and ByBundle are mutually exclusive.
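As an illustration only, the constraints above can be expressed as compile-time checks with `static_assert`. `InstrIteratorSketch` is a hypothetical class that models just these three flags. It is not LLVM's actual iterator template.

```cpp
// Hypothetical, simplified stand-in for an instruction-level def/use chain
// iterator; only the three policy flags discussed above are modelled.
template <bool ByOperand, bool ByInstr, bool ByBundle>
class InstrIteratorSketch {
  // Operand-level iteration is a separate iterator type, so this flag
  // must always be false here.
  static_assert(!ByOperand, "ByOperand is implied false for instr iterators");
  // Stepping by instruction and by bundle cannot both be requested.
  static_assert(!(ByInstr && ByBundle),
                "ByInstr and ByBundle are mutually exclusive");

public:
  static constexpr bool stepsByInstr = ByInstr;
  static constexpr bool stepsByBundle = ByBundle;
};

// Aliases for valid flag combinations. Instantiating
// InstrIteratorSketch<false, true, true> would fail the second static_assert.
using ByInstrIter = InstrIteratorSketch<false, true, false>;
using ByBundleIter = InstrIteratorSketch<false, false, true>;
```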
As part of reassociating add instructions, we may factorize some of the
adds and produce a mul instruction; this patch propagates the source
location of the reassociated tree of instructions to the new mul.
Found using https://github.com/llvm/llvm-project/pull/107279.
During inlining, we may opportunistically simplify conditional branches
(incl. switches) to unconditional branches if, after inlining, their
destination is fixed. While we do this, we should propagate any
DILocation attached to the original branch to the simplified branch,
which this patch enables.
Found using https://github.com/llvm/llvm-project/pull/107279.
Improve error messages when parsing an incorrect type.
Before:
```
invalid kind of type specified
```
After:
```
invalid kind of type specified: expected builtin.tensor, but found 'tensor<*xi32>'
```
This error message is produced when a certain operand/result type is
expected according to an op's TableGen definition, but a different type
is parsed. Type constraints (which may have nice error messages) are
checked after parsing a type. If an incorrect type is parsed, we never
get to the point of printing type constraint error messages. This may
discourage users from specifying C++ classes with type constraints.
(Explicitly specifying C++ classes is beneficial because the
auto-generated C++ code will have richer type information; explicit
casts are unnecessary, etc.) See #134981 for an example where specifying
additional type information with type constraints (e.g.,
`LLVM_AnyVector`) led to worse error messages.
Note: In order to generate a better error message, the parser must
retrieve a type's name from the C++ class. TableGen-generated type
classes always have a `name` field, but hand-written C++ type classes
may not. The `HasStaticName` template was copied from
`DialectImplementation.h` (`HasStaticDialectName`).
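Here is a minimal sketch of the detection idiom such a trait typically relies on, assuming C++17. `HasStaticName`, `typeNameOr`, `TensorTypeLike` and `HandWrittenType` are hypothetical illustrations, not the MLIR definitions. The trait is true only when `T::name` exists, so a parser can fall back to a generic message for hand-written classes.

```cpp
#include <string_view>
#include <type_traits>

// Primary template: assume no static `name` member.
template <typename T, typename = void>
struct HasStaticName : std::false_type {};

// Chosen only when `T::name` is a valid expression.
template <typename T>
struct HasStaticName<T, std::void_t<decltype(T::name)>> : std::true_type {};

// Returns T::name when it exists, otherwise the caller-provided fallback.
template <typename T>
std::string_view typeNameOr(std::string_view fallback) {
  if constexpr (HasStaticName<T>::value)
    return T::name;
  else
    return fallback;
}

// Hypothetical TableGen-style class exposing a static name.
struct TensorTypeLike {
  static constexpr std::string_view name = "builtin.tensor";
};

// Hypothetical hand-written class without one.
struct HandWrittenType {};
```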
As part of inlining an invoke instruction, we may replace an inlined
resume instruction with a simple branch to the landing pad block. When
this happens, we should also propagate the resume's DILocation to this
branch, which this patch enables.
Found using https://github.com/llvm/llvm-project/pull/107279.
SPIR-V has strict address space rules: constant globals cannot be in the
default address space.
The OMPIRBuilder change was required for lit tests to pass; we were
missing an addrspacecast.
---------
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
This is mostly just a simplification. getCalledFunction is best-effort,
so the verifier should not rely on it in most cases. The exception is
intrinsic calls, where the called function is guaranteed to be known,
but most of those cases can be handled with CallBase::getIntrinsicID
instead.
---------
Co-authored-by: Tim Gymnich <tim@gymni.ch>
This commit moves the shuffle and shuffle2 builtins to the CLC library.
In doing so, it makes the headers simpler and reusable, so other builtin
layers can hook into the CLC functions if they wish.
An additional gentype utility has been made available, which provides a
consistent vector-size-or-1 macro for use.
The existing __CLC_VECSIZE is defined but empty, which is useful in
certain applications, such as concatenation with a type to make a
correctly sized scalar or vector type. However, it cannot be used in
preprocessor conditionals that check for specific vector sizes, as
e.g., '__CLC_VECSIZE == 2' expands to '== 2', which is invalid. In
local testing, the new macro is also useful for the geometric builtins,
which are only available for scalar types and vector types of 2, 3, or 4
elements.
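The difference between the two macro styles can be sketched in plain C++ preprocessor code. `SKETCH_VECSIZE` and `SKETCH_VECSIZE_OR_1` are hypothetical stand-ins that model a single scalar instantiation. The real libclc macro names and definitions may differ.

```cpp
// Hypothetical stand-ins for one scalar gentype instantiation.
// Like __CLC_VECSIZE for scalars: defined, but expands to nothing.
#define SKETCH_VECSIZE
// Like the new vector-size-or-1 utility: always expands to a number.
#define SKETCH_VECSIZE_OR_1 1

#define SKETCH_CAT_IMPL(a, b) a##b
#define SKETCH_CAT(a, b) SKETCH_CAT_IMPL(a, b)

// Concatenation works with the empty macro: this spells `float` here, and
// would spell `float2` if SKETCH_VECSIZE were defined as 2.
using SketchGenType = SKETCH_CAT(float, SKETCH_VECSIZE);

// `#if SKETCH_VECSIZE == 2` would expand to `#if == 2` and fail to compile;
// the vector-size-or-1 macro can be compared directly.
#if SKETCH_VECSIZE_OR_1 == 1
constexpr bool kIsScalar = true;
#else
constexpr bool kIsScalar = false;
#endif

// Guard in the style of the geometric builtins: scalars and 2/3/4 vectors.
#if SKETCH_VECSIZE_OR_1 == 1 || SKETCH_VECSIZE_OR_1 == 2 || \
    SKETCH_VECSIZE_OR_1 == 3 || SKETCH_VECSIZE_OR_1 == 4
constexpr bool kHasGeometric = true;
#else
constexpr bool kHasGeometric = false;
#endif

inline SketchGenType halfOf(SketchGenType x) { return x * 0.5f; }
```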
No codegen changes are observed, except the internal shuffle/shuffle2
utility functions are no longer made publicly available.
We can always fold the input of an extract_strided_metadata operator to
the input of a reinterpret_cast operator, because they point to the same
memory. Note that the reinterpret_cast does not use the layout of its
input memref, only its base memory pointer, which is the same as the base
pointer returned by the extract_strided_metadata operator and the base
pointer of the extract_strided_metadata memref input.
Operations like expand_shape, collapse_shape, and subview are lowered to
a pair of extract_strided_metadata and reinterpret_cast like this:
```mlir
%base_buffer, %offset, %sizes:2, %strides:2 =
    memref.extract_strided_metadata %input_memref :
    memref<ID1x...xIDNxBaseType> -> memref<f32>, index, index, index, index, index
%reinterpret_cast = memref.reinterpret_cast %base_buffer to
    offset: [%o1], sizes: [%d1,...,%dN], strides: [%s1,...,%sN] :
    memref<f32> to memref<OD1x...xODNxBaseType>
```
In many cases, the input of the extract_strided_metadata operation can be
passed directly as the source of the reinterpret_cast operation, like
this (compared with the previous snippet, %base_buffer is replaced by
%input_memref in the reinterpret_cast and the source type is updated):
```mlir
%base_buffer, %offset, %sizes:2, %strides:2 =
    memref.extract_strided_metadata %input_memref :
    memref<ID1x...xIDNxBaseType> -> memref<f32>, index, index, index, index, index
%reinterpret_cast = memref.reinterpret_cast %input_memref to
    offset: [%o1], sizes: [%d1,...,%dN], strides: [%s1,...,%sN] :
    memref<ID1x...xIDNxBaseType> to memref<OD1x...xODNxBaseType>
```
When dealing with static dimensions, the extract_strided_metadata becomes
dead code and we end up with only a reinterpret_cast:
```mlir
%reinterpret_cast = memref.reinterpret_cast %input_memref to
    offset: [%o1], sizes: [%d1,...,%dN], strides: [%s1,...,%sN] :
    memref<ID1x...xIDNxBaseType> to memref<OD1x...xODNxBaseType>
```
Note that reinterpret_cast only reads the base memory pointer from the
input memref (%input_memref above), which is equivalent to the
%base_buffer returned by the extract_strided_metadata operation. Hence
it is always legal to use the extract_strided_metadata input memref
directly in the reinterpret_cast. Because only the pointer is used, this
fold is legal even when the values behind the base pointer are modified
between the two operations.
@matthias-springer
@joker-eph
@sahas3
@Hanumanth04
@dixinzhou
@rafaelubalmw
---------
Co-authored-by: Ivan Garcia <igarcia@vdi-ah2ddp-178.dhcp.mathworks.com>
The current hashing quality for `ValueInfo` is poor because it uses
pointers as the hash value, which can negatively impact performance in
various places that use a `DenseSet`/`Map` of `ValueInfo`. In one
observed case, `ModuleSummaryIndex::propagateAttributes()` was taking
about 25 minutes to complete on a ThinLTO application. Profiling
revealed that the majority of this time was spent operating on the
`MarkedNonReadWriteOnly` set.
With the improved hashing, the execution time for `propagateAttributes`
is dramatically reduced to less than 10 seconds.
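This small model shows why raw pointer values can hash poorly. It assumes a power-of-two bucket table indexed by the low bits of the hash, as `DenseMap` does. Objects with a common alignment share their low address bits, so an identity hash piles them into few buckets. A bit mixer spreads them out; the splitmix64 finalizer is used here purely for illustration. This is not the hash function the patch uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Illustrative model only: a table with a power-of-two bucket count picks a
// bucket from the low bits of the hash.
constexpr std::size_t kBuckets = 64;

// Pointer value used directly as the hash.
std::size_t identityBucket(std::uintptr_t addr) {
  return addr & (kBuckets - 1);
}

// splitmix64 finalizer: folds high bits into the low bits.
std::uint64_t mix64(std::uint64_t x) {
  x ^= x >> 30;
  x *= 0xbf58476d1ce4e5b9ULL;
  x ^= x >> 27;
  x *= 0x94d049bb133111ebULL;
  x ^= x >> 31;
  return x;
}

std::size_t mixedBucket(std::uintptr_t addr) {
  return static_cast<std::size_t>(mix64(addr) & (kBuckets - 1));
}

// Number of distinct buckets used by `n` addresses spaced `stride` bytes apart.
template <typename BucketFn>
std::size_t distinctBuckets(std::uintptr_t base, std::size_t stride,
                            std::size_t n, BucketFn bucketOf) {
  std::set<std::size_t> used;
  for (std::size_t i = 0; i < n; ++i)
    used.insert(bucketOf(base + i * stride));
  return used.size();
}
```

With 64 addresses aligned to 64 bytes, every identity hash lands in bucket 0. The mixed hash spreads the same addresses across many buckets.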
The .h removals were done by the sync script. I manually cleaned up
the remaining removals based on the output of:
```
git show 750da48b4aa52f libcxx/include/CMakeLists.txt | rg '^- ' | rg -v '\.'
```
Currently, with the new offload driver, HIP still uses the offload
bundler for non-RDC mode.
This patch switches to using the offload wrapper for non-device-only
non-RDC mode when the new offload driver is enabled.
This makes RDC and non-RDC compilation more consistent and speeds up
compilation, since the offload wrapper supports parallel compilation for
different GPU archs.
It is implemented by adding a linker wrapper action for each assemble
action of an input file. The linker wrapper action distinguishes this
special type of work from normal linker wrapper work by its file type.
This type of work results in an object instead of an image. The linker
wrapper adds "-r" for it and includes only the object file as input, not
the host libraries.
For device-only non-RDC mode, the new driver keeps the original
behavior.
This patch adds some lowering code for Compute Constructs, plus the
infrastructure to someday do clauses. Doing this requires adding the
dialect to the CIRGenerator.
However, this patch does not implement or correctly initialize lowering
from the OpenACC dialect to anything lower.
Part two of merging #132486. Support volatility in fir ops.
* Introduce a new operation fir.volatile_cast, whose only purpose is to
add or take away the volatility of an SSA value's type. The types must
be otherwise identical, and any other type conversions must be handled
by fir.convert. fir.convert will give an error if the volatility of the
inputs does not match, such that all changes to volatility must be
handled explicitly through fir.volatile_cast.
* Add memory effects to ops that read from or write to memory. The
precedent for this comes from the LLVM dialect (feb7beaf70), where
llvm.load/store ops with the volatile attribute report read/write
effects to a generic memory resource. This change is similar in spirit
but different in two ways: the volatility of an operation is determined
by the type of its memref, not an attribute on the op, and the memory
effects of a load- or store-like operation on a volatile reference type
are reported against a particular memory resource,
`VolatileMemoryResource`. This is so MLIR optimizations are able to
reorder operations that are not volatile around operations that are,
which we believe more precisely models LLVM's volatile memory semantics.
@vzakhari suggested this in #132486 citing LangRef. See
https://llvm.org/docs/LangRef.html#volatile-memory-accesses
Changes needed to generate IR with volatile types are not included in
this change, so it should be non-functional, containing only the changes
to FIR ops and op utilities that will be needed once we enable lowering
to generate volatile types.
Reverts llvm/llvm-project#132274
Broke a test on LLDB Windows on Arm:
https://lab.llvm.org/buildbot/#/builders/141/builds/7726
```
FAIL: test_dwarf (lldbsuite.test.lldbtest.TestExternCSymbols.test_dwarf)
<...>
self.assertTrue(self.res.Succeeded(), msg + output)
AssertionError: False is not true : Command 'expression -- foo()' did not return successfully
Error output:
error: Couldn't look up symbols:
int foo(void)
Hint: The expression tried to call a function that is not present in the target, perhaps because it was optimized out by the compiler.
```
Noticed while investigating #133947 regressions: if we peek through
bitcasts, we can lose track of one-use/combined nodes in shuffle combining.
Codegen is currently unchanged, as combineX86ShufflesRecursively still
peeks through the bitcasts itself, but we will soon handle this
consistently as another part of #133947.
In TailRecursionElimination we may insert a select before the return to
choose the return value if necessary; this select is effectively part of
the return statement, and so should use its DILocation.
Found using https://github.com/llvm/llvm-project/pull/107279.
This patch revisits op verifiers for `LoopWrapperInterface` operations
to improve consistency across operations and to properly cover some
previously misreported cases.
Checks that should be done for these kinds of operations are documented
in the interface description.
When compiling with -msve-vector-bits=128 or vscale_range(1, 1) and when
the offsets allow it, we can pair SVE LDR/STR instructions into Neon
LDP/STP.
For example, given:
```cpp
#include <arm_sve.h>
void foo(double const *ldp, double *stp) {
  svbool_t pg = svptrue_b64();
  svfloat64_t ld1 = svld1_f64(pg, ldp);
  svfloat64_t ld2 = svld1_f64(pg, ldp+svcntd());
  svst1_f64(pg, stp, ld1);
  svst1_f64(pg, stp+svcntd(), ld2);
}
```
When compiled with `-msve-vector-bits=128`, we currently generate:
```gas
foo:
ldr z0, [x0]
ldr z1, [x0, #1, mul vl]
str z0, [x1]
str z1, [x1, #1, mul vl]
ret
```
With this patch, we instead generate:
```gas
foo:
ldp q0, q1, [x0]
stp q0, q1, [x1]
ret
```
This is an alternative, more targeted approach to #127500.
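This is a hypothetical sketch of the offset check such a pairing needs. It assumes vscale = 1, so that `#k, mul vl` addresses base + 16 * k bytes. It uses the A64 LDP/STP (128-bit SIMD&FP) immediate range of -1024 to 1008 bytes in steps of 16. `pairedQOffset` is an illustrative helper, not the AArch64 backend's code, and it ignores register ordering and other legality conditions.

```cpp
#include <cstdint>
#include <optional>

// Assumption: vscale == 1, so one SVE vector length (VL) is 16 bytes.
constexpr std::int64_t kVLBytes = 16;

// Given the VL-scaled immediates of two same-base SVE LDR/STR accesses,
// return the byte offset of an equivalent Q-register LDP/STP, if any.
std::optional<std::int64_t> pairedQOffset(std::int64_t a, std::int64_t b) {
  std::int64_t lo = a < b ? a : b;
  std::int64_t hi = a < b ? b : a;
  // The two 16-byte accesses must be adjacent in memory.
  if (hi - lo != 1)
    return std::nullopt;
  std::int64_t bytes = lo * kVLBytes;
  // LDP/STP of Q registers encode a signed 7-bit immediate scaled by 16.
  if (bytes < -1024 || bytes > 1008)
    return std::nullopt;
  return bytes;
}
```

For the `foo` example, the offsets 0 and 1 (in VL units) map to a single `ldp q0, q1, [x0]`.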
Some files were accidentally given two copyright headers; another was
missing one. This commit also converts that file's DOS line endings to
Unix ones and reformats a comment.
It can be highly beneficial to unroll small, two-block search loops
that look for a value in an array. An example of this would be
something that uses std::find to find a value in libc++. Older
versions of std::find in the libstdc++ headers are manually unrolled
in the source code, but this might change in newer releases where
the compiler is expected to either vectorise or unroll itself.
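The first function below is the kind of small, two-block search loop described above. The second shows the same search manually unrolled by four in the source. Its structure approximates older libstdc++ `std::find` implementations, but it is a sketch, not the libstdc++ code.

```cpp
#include <cstddef>

// Two-block loop: one block compares, the other advances.
const int *findSimple(const int *first, const int *last, int value) {
  for (; first != last; ++first)
    if (*first == value)
      return first;
  return last;
}

// Manually unrolled by four, with a switch for the leftover elements.
const int *findUnrolled4(const int *first, const int *last, int value) {
  for (std::ptrdiff_t trips = (last - first) >> 2; trips > 0; --trips) {
    if (*first == value) return first;
    ++first;
    if (*first == value) return first;
    ++first;
    if (*first == value) return first;
    ++first;
    if (*first == value) return first;
    ++first;
  }
  // Handle the 0-3 remaining elements.
  switch (last - first) {
  case 3:
    if (*first == value) return first;
    ++first;
    [[fallthrough]];
  case 2:
    if (*first == value) return first;
    ++first;
    [[fallthrough]];
  case 1:
    if (*first == value) return first;
    [[fallthrough]];
  default:
    return last;
  }
}
```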
This fixes a regression I traced back to
8b43c1be23 / https://github.com/llvm/llvm-project/pull/79000
The regression caused an SSE2 instruction, `movsd`, to be emitted as a
replacement for an SSE instruction, `movaps`, even though the target may
not support SSE2, for example when building with clang using
`-march=pentium3`.
Fixes #134607
`GetLSDAAddress` and `GetPersonalityRoutinePtrAddress` are unused and
they create a bit of a problem for discontinuous functions, because the
unwind plan for these consists of multiple eh_frame descriptors and (at
least in theory) each of them could have a different value for these
entities.
We could say we only support functions for which these are always the
same, or create some sort of Address2LSDA lookup map, but I think it's
better to leave this question to someone who actually needs this.
Updates the description to align with the specification. Also includes
some small cleanup to `sigmoid`, to avoid confusion.
Signed-off-by: Luke Hutton <luke.hutton@arm.com>
Because we do not support MIPS debugging, compiling LLVM for a MIPS
target reports the error `static assertion failed:Value mismatch
for signal number SIGBUS`, so add this condition to avoid the error.
Adds support for the SPV_INTEL_ternary_bitwise_function extension,
adding:
* the OpBitwiseFunctionINTEL SPIR-V instruction, a ternary bitwise
function where the operation performed is determined by a look-up table
index,
* and the corresponding TernaryBitwiseFunctionINTEL capability.
See
https://github.khronos.org/SPIRV-Registry/extensions/INTEL/SPV_INTEL_ternary_bitwise_function.html.
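This is a scalar model of a ternary bitwise function driven by a look-up table. It assumes the same convention as x86 `vpternlog`, where each result bit is the LUT bit at index (a << 2) | (b << 1) | c. The exact operand ordering and operand types for OpBitwiseFunctionINTEL should be taken from the extension specification linked above. `ternaryBitwise` is a hypothetical helper.

```cpp
#include <cstdint>

// Assumed convention (as in x86 vpternlog): for each bit position, the
// bits of a, b and c form a 3-bit index into the 8-bit look-up table.
std::uint32_t ternaryBitwise(std::uint32_t a, std::uint32_t b, std::uint32_t c,
                             std::uint8_t lut) {
  std::uint32_t result = 0;
  for (unsigned bit = 0; bit < 32; ++bit) {
    unsigned index = (((a >> bit) & 1u) << 2) | (((b >> bit) & 1u) << 1) |
                     ((c >> bit) & 1u);
    result |= static_cast<std::uint32_t>((lut >> index) & 1u) << bit;
  }
  return result;
}
```

Under this convention, the table 0x80 computes a & b & c, 0xFE computes a | b | c, 0x96 computes a ^ b ^ c, and 0xF0 simply returns a.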
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Background:
The "amdgpu-sw-lower-lds" pass lowers LDS accesses based on the
"sanitize_address" attribute being tagged to kernels or non-kernels.
Ideally, the "amdgpu-sw-lower-lds" pass should either lower all LDS
accesses or lower none, depending on whether ASan is enabled.
Issue:
However, there have been cases where instrumented and non-instrumented
bitcode is linked together, which leads to some LDS accesses being
lowered while others are not. This typically leads to the following
error in the subsequent pass:
"Module cannot mix absolute and non-absolute LDS GVs"
Fix:
This patch fixes the issue by checking whether any kernel in the module
is tagged with the "sanitize_address" attribute and, if so, lowering all
the LDS accesses in all other kernels and non-kernels, even those that
do not have the "sanitize_address" attribute.
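The module-level decision described in the fix can be sketched as follows. `FuncInfo`, `shouldLowerPerFunction` and `functionsToLower` are hypothetical. They model only the attribute check, not the actual AMDGPU pass or LLVM's `Function` API.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical, simplified view of a function in the module.
struct FuncInfo {
  std::string name;
  bool isKernel;
  bool sanitizeAddress;  // models the "sanitize_address" attribute
};

// Per-function rule that can mix lowered and unlowered LDS after linking.
bool shouldLowerPerFunction(const FuncInfo &f) { return f.sanitizeAddress; }

// Module-level rule: if any kernel is tagged, lower LDS in every kernel and
// non-kernel; otherwise lower nothing.
std::vector<std::string> functionsToLower(const std::vector<FuncInfo> &module) {
  bool anyTaggedKernel =
      std::any_of(module.begin(), module.end(), [](const FuncInfo &f) {
        return f.isKernel && f.sanitizeAddress;
      });
  std::vector<std::string> names;
  if (!anyTaggedKernel)
    return names;
  for (const FuncInfo &f : module)
    names.push_back(f.name);
  return names;
}
```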
This reverts commit
48864a52ef,
reapplying d7cea2b187.
It also fixes the dangling
pointers caused by the previous version by creating copies of the Rows
in x86AssemblyInspectionEngine.