Enable ops with only read side effects in scf.for to be hoisted with a
scf.if guard that checks against the trip count
This patch takes a step towards a less conservative LICM in MLIR as
discussed in the following discourse thread:
[Speculative LICM?](https://discourse.llvm.org/t/speculative-licm/80977)
This patch in particular does the following:
1. Relaxes the original constraint for hoisting that only hoists ops
without any side effects. This patch also allows the ops with only read
side effects to be hoisted into an scf.if guard only if every op in the
loop or its nested regions is side-effect free or has only read side
effects. This scf.if guard wraps the original scf.for and checks for
**trip_count > 0**.
2. To support this, two new interface methods are added to
**LoopLikeInterface**: _wrapInTripCountCheck_ and
_unwrapTripCountCheck_. The implementation starts by wrapping the scf.for
loop in an scf.if guard using _wrapInTripCountCheck_. If no op has been
hoisted into this guard once the worklist has been processed, it removes
the guard by calling _unwrapTripCountCheck_.
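
The effect of the guard can be sketched in plain C++ rather than MLIR. This is only an illustration, assuming a loop whose invariant operation is a read through a pointer that may be invalid when the loop runs zero times; the function names are hypothetical and are not part of the patch.

```cpp
#include <cassert>
#include <cstddef>

// Original form: the load of *scale is loop-invariant but sits inside the
// loop, so it is never executed when n == 0 (scale may then be null).
int sumScaledOriginal(const int *data, std::size_t n, const int *scale) {
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    sum += data[i] * *scale;
  return sum;
}

// Hoisted form: the read is moved out of the loop, but only under a
// trip-count guard, mirroring the scf.if that wraps the scf.for.
int sumScaledHoisted(const int *data, std::size_t n, const int *scale) {
  int sum = 0;
  if (n > 0) {            // trip_count > 0
    const int s = *scale; // hoisted read, executed once
    for (std::size_t i = 0; i < n; ++i)
      sum += data[i] * s;
  }
  return sum;
}
```

Without the guard, the hoisted read would execute even for an empty loop, which is why ops with read side effects cannot be hoisted unconditionally.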
- **Precommit tests for synchronous uwtable CFI fixup**
- **[CFIFixup] Fixup CFI for split functions with synchronous uwtables**
Commit 6e54fccede disables CFI fixup for functions with synchronous
tables, breaking CFI for split functions.
Instead, we can disable *block-level* CFI fixup for functions with
synchronous tables.
Unwind tables can be:
- N/A (not present)
- Asynchronous
- Synchronous
Functions without unwind tables don't need CFI fixup (since they don't
care about CFI).
Functions with asynchronous unwind tables must be accurate for each
basic block, so full CFI fixup is necessary.
Functions with synchronous unwind tables only need to be accurate for
each function (specifically, the portion of a function in a given
section). Disabling CFI fixup entirely for functions with synchronous
uwtables may break CFI for a function split between two sections. The
portion in the first section may have valid CFI, while the portion in
the second section is missing a call frame.
Ex:
```
(.text.hot)
Foo (BB1):
<Call frame information>
...
BB2:
...
(.text.split)
BB3:
...
BB4:
<epilogue>
```
Even if `Foo` has a synchronous unwind table, we still need to insert
call frame information into `BB3` so that unwinding the call stack from
`BB3` or `BB4` works properly.
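
A toy model of the three cases, not the CFIFixup pass itself. The types and the function `blocksToCheck` are hypothetical; the sketch assumes the entry block already carries the prologue CFI and only answers the question "at which later blocks might call frame information need to be re-established?".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class UnwindTable { None, Async, Sync };

// Hypothetical block description: only the section it is placed in matters.
struct Block {
  int sectionId;
};

// Returns the indices of non-entry blocks the toy model would examine.
//  - None:  nothing, CFI is irrelevant.
//  - Async: every block, CFI must be accurate per basic block.
//  - Sync:  only blocks that start a new section of a split function.
std::vector<std::size_t> blocksToCheck(UnwindTable kind,
                                       const std::vector<Block> &blocks) {
  std::vector<std::size_t> out;
  if (kind == UnwindTable::None)
    return out;
  for (std::size_t i = 1; i < blocks.size(); ++i) {
    const bool startsSection = blocks[i].sectionId != blocks[i - 1].sectionId;
    if (kind == UnwindTable::Async || startsSection)
      out.push_back(i);
  }
  return out;
}
```

For the `Foo` layout above (BB1 and BB2 in one section, BB3 and BB4 in another), the synchronous case selects only BB3.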
The find-dynamic-unwind-info callback registration APIs in libunwind
limit the number of callbacks that can be registered. If we use multiple
UnwindInfoManager instances, each with their own callback function
(as was the case prior to this patch) we can quickly exceed this limit
(see https://github.com/llvm/llvm-project/issues/126611).
This patch updates the UnwindInfoManager class to use a singleton
pattern, with the single instance shared between all LLVM JITs in the
process.
This change does _not_ apply to compact unwind info registered through
the ORC runtime (which currently installs its own callbacks).
As a bonus, this change eliminates the need to load an IR "bouncer"
module to supply the unique callback for each instance, so support for
compact-unwind can be extended to the llvm-jitlink tool (which does not
support adding IR).
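
The singleton idea can be sketched as follows. This is a minimal illustration, not the UnwindInfoManager API: the class `UnwindRegistry` and its methods are hypothetical, and the actual libunwind callback registration is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>

// Hypothetical stand-in for a process-wide unwind-info registry.
class UnwindRegistry {
public:
  // Function-local static: constructed once, on first use (thread-safe
  // since C++11), so every client shares the same instance.
  static UnwindRegistry &instance() {
    static UnwindRegistry registry;
    return registry;
  }

  // Record that [start, end) is described by the unwind info at `info`.
  void addRange(std::uintptr_t start, std::uintptr_t end, std::uintptr_t info) {
    std::lock_guard<std::mutex> lock(mutex_);
    ranges_[start] = {end, info};
  }

  // The single lookup shared by all clients; returns 0 for unknown addresses.
  std::uintptr_t find(std::uintptr_t addr) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = ranges_.upper_bound(addr);
    if (it == ranges_.begin())
      return 0;
    --it;
    return addr < it->second.end ? it->second.info : 0;
  }

private:
  UnwindRegistry() = default;
  UnwindRegistry(const UnwindRegistry &) = delete;
  UnwindRegistry &operator=(const UnwindRegistry &) = delete;

  struct Range {
    std::uintptr_t end, info;
  };
  std::mutex mutex_;
  std::map<std::uintptr_t, Range> ranges_;
};
```

Because there is only one instance, only one lookup callback would ever need to be registered, regardless of how many JITs exist in the process.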
This is a continuation of the work started in #125735 to lower selected
VLA shuffles in linear m1 components instead of generating O(LMUL^2) or
O(LMUL*Log2(LMUL)) high LMUL shuffles.
This pattern focuses on shuffles where all the elements being used
across the entire destination register group come from a single register
in the source register group. Such cases come up fairly frequently via
e.g. spread(N), and repeat(N) idioms.
One subtlety to this patch is the handling of the index vector for
vrgatherei16.vv. Because the index and source registers can have
different EEW, the index vector for the Nth chunk of the destination is
not guaranteed to be register aligned. In fact, it is common for e.g. an
EEW=64 shuffle to have EEW=16 indices which are four chunks per source
register. Given this, we have to pay a cost for extracting these chunks
into the low position before performing each shuffle.
I'd initially expressed this as a naive extract sub-vector for each data
parallel piece. However, at high LMUL, this quickly caused register
pressure problems since we could at worst need 4x the temporary
registers for the index. Instead, this patch uses a repeating slidedown
chained from previous iterations. This increases critical path by at
worst 3 slides (SEW=64 is the worst case), but reduces register pressure
to at worst 2x - and only if the original index vector is reused
elsewhere. I view this as arguably a bit of a workaround (since our
scheduling should have done better with the plain extract variant), but
a probably necessary one.
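
A scalar model of the chained slidedown, using `std::vector` in place of vector registers. It is a sketch under simplifying assumptions: the gather indexes one flat source, vacated elements are zero-filled, and the names `slideDown` and `gatherByChunks` are hypothetical. It only shows that one working copy of the index vector, slid down once per chunk, yields the same result as extracting every chunk separately.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Models a slidedown: result[i] = v[i + amount]. Vacated tail elements are
// set to 0 here only to keep the model deterministic.
std::vector<std::uint16_t> slideDown(const std::vector<std::uint16_t> &v,
                                     std::size_t amount) {
  std::vector<std::uint16_t> out(v.size(), 0);
  for (std::size_t i = 0; i + amount < v.size(); ++i)
    out[i] = v[i + amount];
  return out;
}

// Gathers src[index[i]] one chunk at a time. Only the low `chunk` elements
// of the working index vector are read, so it is slid down by `chunk` before
// every chunk after the first, reusing the previous iteration's result.
std::vector<std::uint64_t> gatherByChunks(const std::vector<std::uint64_t> &src,
                                          const std::vector<std::uint16_t> &index,
                                          std::size_t chunk) {
  assert(chunk > 0 && index.size() % chunk == 0);
  std::vector<std::uint64_t> dst;
  std::vector<std::uint16_t> work = index; // single temporary, reused
  for (std::size_t base = 0; base < index.size(); base += chunk) {
    if (base != 0)
      work = slideDown(work, chunk); // chained from the previous iteration
    for (std::size_t i = 0; i < chunk; ++i)
      dst.push_back(src[work[i]]);
  }
  return dst;
}
```

The chain makes each slide depend on the previous one, which is the critical-path cost mentioned above, in exchange for holding one temporary instead of one per chunk.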
In the process of adding strftime (#122556) I wrote this utility class
to simplify reading from a struct tm. It provides helper functions that
return basically everything needed by strftime. It's not tested
directly, but it is thoroughly exercised by the strftime tests.
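
For illustration, a reader of this kind might look like the sketch below. The class name `TmReader` and its methods are hypothetical and do not reflect the names used in the patch; only the `struct tm` field conventions are standard.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical helper that converts raw struct tm fields into the values a
// formatter typically needs.
class TmReader {
public:
  explicit TmReader(const std::tm &t) : t_(t) {}

  int year() const { return t_.tm_year + 1900; }   // tm_year counts from 1900
  int month() const { return t_.tm_mon + 1; }      // tm_mon is 0-based
  int dayOfYear() const { return t_.tm_yday + 1; } // tm_yday is 0-based
  bool isPM() const { return t_.tm_hour >= 12; }

  // 12-hour clock: hours 0 and 12 are both shown as 12.
  int hour12() const {
    const int h = t_.tm_hour % 12;
    return h == 0 ? 12 : h;
  }

private:
  std::tm t_; // stored by value so the reader cannot dangle
};
```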
For targets that have single precision FPU but not double precision FPU
such as Cortex M4, only using float-float in the intermediate
computations might reduce the code size compared to using double. In
this case, when the exact pass is skipped, the float-only option for
atan2f implemented in this PR reduces the code size of this function by
~1 KB compared to the double precision version.
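
As background, a float-float value represents a number as an unevaluated sum of two floats. The sketch below shows the basic building block (Knuth's TwoSum) and is not the atan2f code from this PR; it assumes float expressions are evaluated in float precision and that fast-math style optimizations are disabled.

```cpp
#include <cassert>

// A value represented as the unevaluated sum hi + lo of two floats.
struct FloatFloat {
  float hi;
  float lo;
};

// Knuth's TwoSum: hi is the rounded float sum and lo is its rounding error,
// so hi + lo equals a + b exactly (barring overflow).
FloatFloat twoSum(float a, float b) {
  const float s = a + b;
  const float bv = s - a;  // the part of b that made it into s
  const float av = s - bv; // the part of a that made it into s
  const float err = (a - av) + (b - bv);
  return {s, err};
}
```

Only single-precision additions and subtractions are used, which is why this style of arithmetic suits targets without a double precision FPU.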
One of these days, we'll be able to specify time to a computer...
Also, POSIX can remove stuff all they want. Folks probably will continue to
depend on broken interfaces forever.
Link: #124654
Link: https://austingroupbugs.net/view.php?id=1330
This warning is causing lots of build spam when I use a recent Clang as
my host compiler. It's a potential false positive, so silence it until
https://github.com/llvm/llvm-project/issues/126600 is resolved.
Fix variable casing while I'm here.
When building the reorder for a non-single-use reuse mask, we need to
check whether the size of the mask is a multiple of the number of unique
scalars. Otherwise, the compiler may crash when trying to reorder nodes.
Fixes #126304
Adds a small note to VectorOps.td on what "dim-1" broadcast is. Also
updates comments to consistently use quotes, i.e.
* "dim-1" broadcasting instead of dim-1 broadcasting.
This way it is clear that we are referring to "stretching" one of the
trailing dims rather than e.g. broadcasting a dim at idx 1.
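
To make the "stretching" reading concrete, here is a small sketch on nested `std::vector`s instead of MLIR vectors. The helper `stretchUnitDim` is hypothetical; it models a shape change from [N, 1] to [N, cols], where the dimension of size 1 is replicated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// Stretches a trailing dimension of size 1 to `cols`: every row holding a
// single element becomes a row of `cols` copies of that element.
Matrix stretchUnitDim(const Matrix &src, std::size_t cols) {
  Matrix out;
  out.reserve(src.size());
  for (const auto &row : src) {
    assert(row.size() == 1 && "only a dimension of size 1 can be stretched");
    out.emplace_back(cols, row[0]); // vector(count, value)
  }
  return out;
}
```

The "1" in "dim-1" is therefore the size of the dimension being stretched, not its index.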
Parameter PossiblyLoopIndependent has lost its intended purpose. This
flag is always set to true in all cases when depends() is called, hence
we want to reconsider the utility of this variable and remove it from
the function signature entirely. This is an NFC patch.
Summary:
If the user deallocates an RPC device, this can sometimes fail if the RPC
server is still running. This happens when the device is modified while
the server is still checking it. This patch adds a mutex to guard such
modifications.
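
The locking pattern can be sketched as below. `DeviceTable` is a hypothetical stand-in for the server's device bookkeeping, not the actual RPC server code; the point is only that modification and polling take the same mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>

// Hypothetical table of active devices shared between the user-facing API
// and the server loop.
class DeviceTable {
public:
  void add(int id) {
    std::lock_guard<std::mutex> lock(mutex_);
    ids_.insert(id);
  }

  // Returns true if the device was present and has been removed.
  bool remove(int id) {
    std::lock_guard<std::mutex> lock(mutex_);
    return ids_.erase(id) == 1;
  }

  // Called from the server loop: visits every device under the same lock,
  // so a concurrent remove() cannot invalidate the iteration.
  std::size_t poll() {
    std::lock_guard<std::mutex> lock(mutex_);
    std::size_t visited = 0;
    for (int id : ids_) {
      (void)id; // a real server would check this device for work
      ++visited;
    }
    return visited;
  }

private:
  std::mutex mutex_;
  std::set<int> ids_;
};
```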
A few changes to doc generation:
- All summaries are in italics.
- In general each optional block starts and ends with a newline.
- All table elements are enclosed in `|`'s
- Overall, reduce the number of places with more than two newlines in a row
Rationale for this change is that our markdown to docs generator
requires a newline before all headers, otherwise it gets inlined into
the line before it, see `### sdy-op-priority-propagate` in the image
below.
<img width="883" alt="image"
src="https://github.com/user-attachments/assets/b795c424-cecb-48df-abbe-aee2030f4491"
/>
That said, overall I feel this formatting is more consistent now. Here's
a before and after:
- Dialect documentation diff: https://www.diffchecker.com/OVMHoXeL/
- Pass documentation diff: https://www.diffchecker.com/XEJRmW3k/
My last change made the test not run when the `spirv-tools` feature is
not available, which is always the case in CI for clang tests, but it
fails if `spirv-tools` is available for the following reasons:
1) We didn't build `spirv-link` as part of the internal `SPIRV-Tools`
build, which is required by the `clang` call in `clang-linker-wrapper`.
I already fixed that
[here](https://github.com/llvm/llvm-project/pull/126319).
2) We didn't depend on the `SPIRV-Tools` CMake targets in clang tests,
so depending on what CMake targets were built before running
`check-clang`, `SPIRV-Tools` might not have been built.
3) We didn't check for `llvm-spirv` being available, which is not part
of `SPIRV-Tools` but is currently required for SPIR-V compilation.
Manually confirmed this works. This test is the bane of my existence.
---------
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Allow operator T&() in a member function which returns a const member
variable.
In particular, this will allow UniqueRef::operator T&() and
Ref::operator T&() to be treated as a safe pointer origin when they're
called on a const member.
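
The code shape in question looks roughly like the sketch below. `Ref` here is a heavily simplified, hypothetical stand-in (no reference counting), and `Holder` and `Widget` are illustrative names only.

```cpp
#include <cassert>

// Simplified stand-in for a non-null smart reference type.
template <typename T> class Ref {
public:
  explicit Ref(T &object) : ptr_(&object) {}
  operator T &() const { return *ptr_; }

private:
  T *ptr_;
};

struct Widget {
  int value;
};

class Holder {
public:
  explicit Holder(Widget &w) : widget_(w) {}

  // Returns the const member through the implicit operator Widget &().
  // Because widget_ is const, it cannot be reassigned during the call.
  Widget &widget() const { return widget_; }

private:
  const Ref<Widget> widget_;
};
```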
Change the shift operand for the mul operator to be a required operand.
Also define shift to be Tosa_ScalarInt8Tensor, which requires that it is
a rank-1 tensor whose shape is [1] (i.e., a tensor containing a single
element).
Signed-off-by: Tai Ly <tai.ly@arm.com>
Typically, we do not track memory sources after a load because of the
dynamic nature of the load and the fact that the alias analysis is a
simple static analysis.
However, the code is written in a way that makes it seem like we are
continuing to track memory but in reality we are only doing so when we
know that the tracked memory is a leaf and therefore when there will
only be one more iteration through the switch statement. In other words,
we are iterating one more time, to gather data about a box, anticipating
that this will be the last time. This is a hack that helped avoid
cut-and-paste from other case statements but gives the wrong impression
about the intention of the code and makes it confusing.
To make it clear that there is no more tracking, we gather all the
necessary data from the memref of the load, in the case statement for
the load, and exit the loop. I am also limiting this data gathering to
the case where we load a box reference while we were actually following
data, which, as tests have shown, is the only case where we need it. Other
cases will be handled conservatively, but this can change in the future,
on a case-by-case basis.
---------
Co-authored-by: Joel E. Denny <jdenny.ornl@gmail.com>
This is an upstream proposal from e60884cb98.
We observed StripNonLineTableDebugInfo malfunctioning during debugging.
It is caused by out-of-order evaluation, a C++-level semantic ambiguity;
see https://en.cppreference.com/w/cpp/language/eval_order
The solution is simply to split one line into two.
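
A generic illustration of this class of bug, not the actual code touched by the patch. The helper names are hypothetical. When two argument expressions both modify the same state, C++ leaves their relative order unspecified, so naming each result on its own line makes the order explicit.

```cpp
#include <cassert>
#include <utility>

// Returns the current counter value and advances it.
int nextId(int &counter) { return counter++; }

// Problematic form (order of the two calls is unspecified):
//   std::make_pair(nextId(counter), nextId(counter));
//
// Fixed form: one statement per call, so the first call is guaranteed to
// run before the second.
std::pair<int, int> twoIds(int &counter) {
  const int first = nextId(counter);
  const int second = nextId(counter);
  return {first, second};
}
```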
Nobody is overriding GetValueProperties, so in practice we're always
using `m_collection_sp`, which means we don't need to check the pointer.
The templated helpers were already operating on `m_collection_sp`
directly so this makes the rest of the class consistent.
When cross compiling the libc-stdbit-tests, the existing tests trigger numerous
instances of -Wimplicit-int-conversion. The truncation of these implicit
promotions is intentional.
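
One common way to mark such a truncation as intentional is an explicit cast, sketched below. Whether the patch uses casts or another mechanism is not stated here, and `bitAsUChar` is a hypothetical helper that assumes an 8-bit `unsigned char` and a shift smaller than the width of `unsigned`.

```cpp
#include <cassert>

// 1u << shift is computed as unsigned int. The static_cast documents that
// only the low 8 bits are wanted, so the narrowing is no longer implicit.
unsigned char bitAsUChar(unsigned shift) {
  return static_cast<unsigned char>(1u << shift);
}
```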