In 50e5411e4, we preserved the pack substitution index within
SubstTemplateTypeParmType nodes and performed in-place expansions of
packs such that type constraints on a lambda that serve as a pattern of
a fold expression could be evaluated if the type constraints contain any
packs that are expanded by the fold expression.
However, we made an incorrect assumption of the condition under which
in-place expansion should occur. For example, a SizeOfPackExpr case
relies on SubstTemplateTypeParmType nodes being transformed to
SubstTemplateTypeParmPackTypes rather than expanding them immediately in
place.
This fixes that by adding a flag to SubstTemplateTypeParmType to
discriminate such in-place expansion situations.
Fixes https://github.com/llvm/llvm-project/issues/113518
This is also split off from the zvfhmin/zvfbfmin
isLegalElementTypeForRVV work.
Enabling this will cause SLP and RISCVGatherScatterLowering to emit
@llvm.experimental.vp.strided.{load,store} intrinsics, and codegen
support for this was added in #109387 and #114750.
This patch adds support for re-scheduling already scheduled
instructions. For now this will clear and rebuild the DAG, and will
reschedule the code using the new DAG.
Fixes a copy-paste typo.
The typo resulted in producing bad v_perm based operands for the v_dot4
combine. When adding a corresponding byte pair to the v_dot byte pair
chains, we must take note of the byte position in the corresponding
source nodes. These byte positions are used to ensure we extract the
correct DWord from the ultimate source, and formulate a correct
perm_mask from the extracted DWord.
With the typo, we the S0 byte would used the DWord offset for the
corresponding S1 byte. If this offset was not the same as the true DWord
offset for the S0 byte, we would extract and use the wrong byte for S0
in the v_dot.
Fixes https://github.com/llvm/llvm-project/issues/112941
This is another piece split off from the work to add zvfhmin/zvfbfmin to
isLegalElementTypeForRVV.
This is needed to get InterleavedAccessPass to lower [de]interleaves to
segment load/stores.
This PR adds a folder for extracting an element from a vector shuffle.
It turns something like:
```
%shuffle = vector.shuffle %a, %b [0, 8, 7, 15]
: vector<8xf32>, vector<8xf32>
%extract = vector.extract %shuffle[3] : f32 from vector<4xf32>
```
into:
```
%extract = vector.extract %b[7] : f32 from vector<8xf32>
```
This patch fixes:
third-party/unittest/googletest/include/gtest/gtest.h:1379:11:
error: comparison of integers of different signs: 'const unsigned
int' and 'const int' [-Werror,-Wsign-compare]
An extra inhabitant is a bit pattern that does not represent a valid
value for instances of a given type. The number of extra inhabitants is
the number of those bit configurations.
This is used by Swift to save space when composing types. For example,
because Bool only needs 2 bit patterns to represent all of its values
(true and false), an Optional<Bool> only occupies 1 byte in memory by
using a bit configuration that is unused by Bool. Which bit patterns are
unused are part of the ABI of the language.
Since Swift generics are not monomorphized, by using dynamic libraries
you can have generic types whose size, alignment, etc, are known only
at runtime (which is why this feature is needed).
This patch adds num_extra_inhabitants to LLVM-IR debug info and in DWARF
as an Apple extension.
The `FileManager` sharing between module-building `CompilerInstance`s
was disabled a while ago due to `FileEntry::getName()` being unreliable.
Now that we use `FileEntryRef::getNameAsRequested()` in places where it
matters, re-enabling `FileManager` is sound and improves performance of
`clang-scan-deps` by ~6.2%.
Since b4130bee6bfd34d8045f02fc9f951bcb5db9d85c, we check for
_LIBCPP_HAS_THREAD_API_PTHREAD to decide between using
SetThreadDescription or pthread_setname_np for setting the thread name.
c6f3b7bcd0596d30f8dabecdfb9e44f9a07b6e4c changed how libcxx defines
their configuration macros - now they are always defined, but defined to
0 or 1, while they previously were either defined or undefined.
As these libcxx defines used to be defined to an empty string (rather
than expanding to 1) if enabled, we can't easily produce an expression
that works both with older and newer libcxx. Additionally, these defines
are libcxx internal config macros that aren't a detail that isn't
supported and isn't meant to be relied upon.
Simply skip trying to set thread name on MinGW as we can't easily know
which kind of thread native handle we have. Setting the thread name is
only a nice to have, quality of life improvement - things should work
the same even without it.
Additionally, libfuzzer isn't generally usable on MinGW targets yet
(Clang doesn't include it in the getSupportedSanitizers() method for the
MinGW target), so this shouldn't make any difference in practice anyway.
This patch registers the "createInstr" callback that notifies the
scheduler about newly created instructions. This guarantees that all
newly created instructions have a corresponding DAG node associated with
them. Without this the pass crashes when the scheduler encounters the
newly created vector instructions.
This patch also changes the lifetime of the sandboxir Ctx variable in
the SandboxVectorizer pass. It needs to be destroyed after the passes
get destroyed. Without this change when components like the Scheduler
get destroyed Ctx will have already been freed, which is not legal.
https://github.com/llvm/llvm-project/pull/114550 caused a buildbot breakage (https://lab.llvm.org/buildbot/#/builders/66/builds/5853) because of an unused variable. This patch attempts to fix forward:
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp:106:24: error: variable 'TTy' set but not used [-Werror,-Wunused-but-set-variable]
106 | if (TargetExtType *TTy = AMDGPU::isNamedBarrier(GV)) {
| ^
This PR *restricts* `GeneralizeOuterUnitDimsPackOpPattern` to follow its
intended purpose (as per the documentation), which is to:
> require all outer dimensions of tensor.pack to be 1.
There was one in-tree test that violated this assumption (and happened
to work) – see `@simple_KCRS_to_KRSCsr` in
"generalize-tensor-pack.mlir". That test has been updated to satisfy the
new requirements of the pattern.
By enforcing the pattern to follow its intended design (i.e., making it
stricter), the calculation of shapes and sizes for various Ops that the
pattern generates (PadOp, ExtractSliceOp, EmptyOp, TensorOp, and
InsertSliceOp) becomes much simpler and easier to document. This also
helped *generalize* the pattern to support cases like the one below:
```mlir
func.func @simple_pad_and_pack_dynamic_tile_cst(
%src: tensor<5x1xf32>,
%dest: tensor<1x1x?x2xf32>,
%pad: f32) -> tensor<1x1x?x2xf32> {
%tile_dim_0 = arith.constant 8 : index
%0 = tensor.pack %src
padding_value(%pad : f32)
inner_dims_pos = [0, 1]
inner_tiles = [%tile_dim_0, 2]
into %dest : tensor<5x1xf32> -> tensor<1x1x?x2xf32>
return %0 : tensor<1x1x?x2xf32>
}
```
Note that the inner tile slice is dynamic but compile-time constant.
`getPackOpSourceOrPaddedSource`, which is used to generate PadOp,
detects this and generates a PadOp with static shapes. This is a good
optimization, but it means that all shapes/sizes for Ops generated by
`GeneralizeOuterUnitDimsPackOpPattern` also need to be updated to be
constant/static. By restricting the pattern and simplifying the
size/shape calculation, supporting the case above becomes much easier.
Notable implementation changes:
* PadOp processes the original source (no change in dimensions/rank).
ExtractSliceOp extracts the tile to pack and may reduce the rank. All
following ops work on the tile extracted by ExtractSliceOp (possibly
rank-reduced).
* All shape/size calculations assume that trailing dimensions match
inner_tiles from tensor.pack. All leading dimensions (i.e., outer
dimensions) are assumed to be 1.
* Dynamic sizes for ops like ExtractSliceOp are taken from inner_tiles
rather than computed as, for example, tensor.dim %dest, 2. It’s the
responsibility of the "producers" of tensor.pack to ensure that
dimensions in %dest match the specified tile sizes.
Replace clampScalar of the integer type with minScalar. We can't
narrow the integer type, we can only make it larger. If the type
is larger than xLen we need to use a 2*xlen libcall. If it's larger
than 2*xlen we can't handle it at all.