Preparatory cleanup up patch to makes it easier for combineX86ShufflesRecursively/combineX86ShuffleChain to handle length changing shuffles up the shuffle chain than what combineX86ShuffleChainWithExtract can manage.
Instead of passing the original Root node, pass the root opcode and the current effective value type (which may have widened as we recurse through EXTRACT_SUBVECTOR/TRUNCATE nodes etc.).
This caused ZT0 to be preserved around `__arm_tpidr2_save` in functions
with "aarch64_new_zt0". The block in which `__arm_tpidr2_save` is called
is added by the SMEABIPass and may be reachable in cases where ZA has
not been enabled* (so using `str zt0` is invalid).
* (when za_save_buffer is null and num_za_save_slices is zero)
This commit checks if the operands/results of an operator can be found
in the profile compliance mapping, if it isn't the operator is
considered invalid. As a result, operator datatype combinations that are
not listed under the "Supported Data Types" of the TOSA specification
are disallowed and the validation pass results in failure.
Signed-off-by: Luke Hutton <luke.hutton@arm.com>
This commit makes ThreadMemory a real "forwarder" class by implementing
the missing queue methods: they will just call the corresponding backing
thread method.
To make this patch NFC(*) and not change the behavior of the Python OS
plugin, NamedThreadMemoryWithQueue also overrides these methods to
simply call the `Thread` method, just as it was doing before. This also
makes it obvious that there are missing pieces of this class if it were
to provide full queue support.
(*) This patch is NFC in the sense that all llvm.org plugins will not
have any behavior change, but downstream consumers of ThreadMemory will
benefit from the newly implemented forwarding methods.
ThreadMemory attempts to be a class abstracting the notion of a "fake"
Thread that is backed by a "real" thread. According to its
documentation, it is meant to be a class forwarding most methods to the
backing thread, but it does so only for a handful of methods.
Along the way, it also tries to represent a Thread that may or may not
have a different name, and may or may not have a different queue from
the backing thread. This can be problematic for a couple of reasons:
1. It forces all users into this optional behavior.
2. The forwarding behavior is incomplete: not all methods are currently
being forwarded properly. Some of them involve queues and seem to have
been intentionally left unimplemented.
This commit creates the following separation:
ThreadMemory <- ThreadMemoryProvidingName <-
ThreadMemoryProvidingNameAndQueue
ThreadMemory captures the notion of a backed thread that _really_
forwards all methods to the backing thread. (Missing methods should be
implemented in a later commit, and allowing them to be implemented
without changing behavior of other derived classes is the main purpose
of this refactor).
ThreadMemoryProvidingNameAndQueue is a ThreadMemory that allows users to
override the thread name. If a name is present, it is used; otherwise
the forwarding behavior is used.
ThreadMemoryProvidingNameAndQueue is a ThreadMemoryProvidingName that
allows users to override queue information. If queue information is
present, it is used; otherwise, the forwarding behavior is used.
With this separation, we can more explicitly implement missing methods
of the base class and override them, if needed, in
ThreadMemoryProvidingNameAndQueue. But this commit really is NFC, no new
methods are implemented and no method implementation is changed.
For 256-bit selections, we could be using sub-i8/vXi8 selection condition masks - extend these to i8 and then extract the lowest mask subvector
Fixes#132844
This reverts commit cb4ae35de0b4c19149379f16c7b279d80a669f9d.
That commit broke compilation with GCC:
../unittests/Support/YAMLIOTest.cpp:1280:20: error: explicit specialization of
template<class T> struct llvm::yaml::MappingTraits’ outside its namespace must u
se a nested-name-specifier [-fpermissive]
1280 | template <> struct MappingTraits<V> {
| ^~~~~~~~~~~~~~~~
This reverts commit 71a0cfd93263552ddc0bfd2ea7b0abe9a578f87e.
This commit triggers failed asserts when compiling ffmpeg. The
issue is reproducible with a small standalone reproducer like this:
void make_filters_from_proto(int *filter[][2], int bands) {
int c, q, n;
for (;; q++) {
n = 0;
for (; n < 7; n++) {
int theta = (q * (n - 6) + (n >> 1) - 3) % bands;
if (theta)
c = theta;
filter[q][n][0] = c;
}
}
}
$ clang -target x86_64-linux-gnu -c repro.c -O3
clang: ../lib/Transforms/Vectorize/SLPVectorizer.cpp:989: llvm::SmallVector<llvm
::Value*> {anonymous}::BinOpSameOpcodeHelper::InterchangeableInfo::getOperand(ll
vm::Instruction*) const: Assertion `FromCIValue.isZero() && "Cannot convert the
instruction."' failed.
The same issue also reproduces for a large number of other target
triples, aarch64-linux-gnu and others.
This patch fixes the LBR check in the local lit config. The test would
segfault as the loop body would be basically empty, causing a divide by
zero error. More investigation is needed there so we do not actually hit
that assertion and report a cleaner error somewhere. Specifying an
actual opcode to benchmark fixes the problem. The test would also fail
as -mcpu was set to the default x86 CPU rather than the one currently
being run on, so it would always fail to find a perf counter. This patch
fixes that by simply removing the -mcpu flag.
Given these issues, I'm not sure these tests have ever run in the ~5
years they have been in tree. There were some issues reported in
\#132861, so I guess we'll see if there are further issues when the
testing becomes more broad.
We can use *Set::insert_range to collapse:
for (auto Elem : Range)
Set.insert(E);
down to:
Set.insert_range(Range);
In some cases, we can further fold that into the set declaration.
The File ID is incorrectly calculated, resulting in an out-of-bounds
access. The test code is more complex because the File fetching only
happens in specific scenarios.
---------
Co-authored-by: ShaderKeeper <no-reply@shaderkeeper.com>
Co-authored-by: Chuanqi Xu <yedeng.yd@linux.alibaba.com>
When placed on a function, the ``clang::noconvergent`` attribute ensures
that the function is not assumed to be convergent. But the same
attribute has no effect on function calls. A call is convergent if the
callee is convergent. This is based on the fact that in LLVM, a call
always inherits all the attributes of the callee. Only ``convergent`` is
an attribute in LLVM IR, and there is no equivalent of
``clang::noconvergent``.
This can be used to construct an ExecutorAddrRange from a pair of pointers, or
a pointer and a size.
This will be used to reduce boilerplate in an upcoming patch.
So far I've been unsuccessful in finding an example where the used constant
value is directly observed in the output. This avoids an assert in an intermediate
step of value replacement.
Support the trivial "header"/source switch for module interfaces.
I initially thought the naming are bad and we should rename it. But
later I feel it is better to split patches as much as possible.
From the codes it looks like there are problems. e.g., `isHeaderFile`.
But let's try to fix them in different patches.
Many of the test files had an inconsistent formatting. This patch ran
clang-format over them using the project's .clang-format file, with
column limit = 0, to prevent test directives from being split over
multiple lines.
We use the term "interchangeable instructions" to refer to different
operators that have the same meaning (e.g., `add x, 0` is equivalent to
`mul x, 1`).
Non-constant values are not supported, as they may incur high costs with
little benefit.
---------
Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
The last version of this patch had memory leaks due to using the
BumpPtrAllocator for data types that required destructors to run to
release heap memory (e.g. via std::vector and std::string). This version
avoids that by using smart pointers, and dropping support for
BumpPtrAllocator.
We should refactor this code to use the BumpPtrAllocator again, but that
can be addressed in future patches, since those are more invasive
changes that need to refactor many of the core data types to avoid
owning allocations.
Adds Support for the Mustache Templating Language. See specs here:
https://mustache.github.io/mustache.5.html This patch implements
support+tests for majority of the features of the language including:
- Variables
- Comments
- Lambdas
- Sections
This meant as a library to support places where we have to generate
HTML, such as in clang-doc.
Co-authored-by: Peter Chou <peter.chou@mail.utoronto.ca>
guard_acquire is a helper function used to implement TSan's
__cxa_guard_acquire and pthread_once interceptors.
https://reviews.llvm.org/D54664 introduced optional hooks to support
cooperative multi-threading. It worked by marking the entire
guard_acquire call as a potentially blocking region.
In principle, only the contended case needs to be a potentially blocking
region. This didn't matter for __cxa_guard_acquire because the compiler
emits an inline fast path before calling __cxa_guard_acquire. That is,
once we call __cxa_guard_acquire at all, we know we're in the contended
case.
https://reviews.llvm.org/D107359 then unified the __cxa_guard_acquire
and pthread_once interceptors, adding the hooks to pthread_once.
However, unlike __cxa_guard_acquire, pthread_once callers are not
expected to have an inline fast path. The fast path is inside the
function.
As a result, TSan unnecessarily calls into the cooperative
multi-threading engine on every pthread_once call, despite applications
generally expecting pthread_once to be fast after initialization. Fix
this by deferring the hooks to the contended case inside guard_acquire.
Summary:
Currently we use `TestBigEndian` in CMake to determine endianness. This
doesn't work on all platforms and is deprecated since CMake 3.20.
Instead of using CMake, we can just use the GNU/Clang preprocessor
definitions.
The only difficulty is MSVC, mostly because they don't support the same
macros. But, as far as I'm aware, MSVC / Windows targets are always
little endian, and if not we can just override it for that specific
target in the future.
This generalizes https://github.com/llvm/llvm-project/pull/131975 to non-32-bit Linux (i.e., 64-bit Linux).
This works around an edge case in 64-bit Linux, whereby the memory layout is incompatible if the stack size is unlimited AND ASLR entropy is 31+ bits (see https://github.com/google/sanitizers/issues/856#issuecomment-2747076811).
More generally, this "re-exec without ASLR if layout is incompatible" is a hammer that can work around most shadow mapping issues, without incurring the overhead of using a dynamic shadow.
We previously made an implementation error when adding half overloads
for HLSL library functionality. The half type is always defined in HLSL
and half intrinsics should not be conditionally included.
When native 16-bit types are disabled half is a unique 32-bit float type
with lesser promotion rank than float.
Apply pattern #81782 to intrinsics added in #95999.
Closes#132793
Moving builder classes into separate files
`HLSLBuiltinTypeDeclBuilder.cpp`/`.h`, changing a some
`HLSLExternalSemaSource` methods to private and removing unused methods.
This is a prep work before we start adding more builtin types and
methods, like textures, resource constructors or matrices. For example
constructors could make use of the `BuiltinTypeMethodBuilder`, but this
helper class was defined in `HLSLExternalSemaSource.cpp` after the
method that creates a constructor. Rather than reshuffling the code one
big source file I am moving the builders into a separate cpp & header
file and placing the helper classes declarations up top.
Currently the new header only exposes `BuiltinTypeDeclBuilder` to
`HLSLExternalSemaSource`. In the future but we might decide to expose
more helper classes as needed.