If an attribute is not defined earlier in the same file, but is only
referenced directly from its dialect, the wrong check is currently
emitted.
For example, for #toy.shape<[1, 2, 3]> the emitted check is:
Earlier:
// CHECK: #[['?']]<[1, 2, 3]>
Now:
// CHECK: #toy.shape<[1, 2, 3]>
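The difference can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not the actual implementation: the function name `substitute_attr_refs`, the regular expression, and the `defined` mapping are all assumptions made for the example.

```python
import re

# Hypothetical pattern for an attribute reference such as "#map0" or "#toy.shape".
ATTR_REF = re.compile(r"#([A-Za-z_][\w.$]*)")


def substitute_attr_refs(line, defined):
    """Rewrite attribute references in `line` for use in a CHECK line.

    `defined` maps attribute names defined earlier in the file to the
    FileCheck variable that captured them (hypothetical data structure).
    """
    def repl(match):
        name = match.group(1)
        if name in defined:
            # Defined earlier in the file: refer to the captured variable.
            return "#[[" + defined[name] + "]]"
        # Not defined in this file (e.g. a dialect attribute): keep verbatim.
        return match.group(0)

    return ATTR_REF.sub(repl, line)
```

With an empty `defined` mapping, `#toy.shape<[1, 2, 3]>` is left untouched instead of being replaced by a placeholder variable.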
The component diagnostic headers (i.e. `DiagnosticAST.h` and friends)
all follow the same format, and there are enough of them (and enough in
them) that updating all of them has become rather tedious (at least it
was for me while working on #132348), so this patch instead generates
all of them (or rather their contents) via Tablegen.
Also, it seems that `%enum_select` currently wouldn’t work in
`DiagnosticCommonKinds.td` because the infrastructure for that was
missing from `DiagnosticIDs.h`; this patch should fix that as well.
.swiftinterface files into the dSYM bundle. These typically come only
from the SDK (since textual interfaces require library evolution) and
thus are a waste of space to copy into the bundle.
The information about this is being parsed out of the control block,
which means duplicating 5 constants from the Swift frontend. If a file
cannot be parsed, dsymutil errs on the side of copying the file anyway.
rdar://138186524
Currently if a developer uses the flag `--mlir-enable-debugger-hook` the
debugger hook is not actually enabled. It seems the DebugConfig and the
MainMLIROptConfig are not connected.
To fix this we can move the `enableDebuggerHook` CL Option to the
DebugConfigCLOptions struct so that it can get registered and enabled
along with the other debugger flags. AFAICS there are no other uses of
the flag so this should be safe.
This also adds a small LIT test to check that the hook is enabled by
checking the std::cerr output for the log message.
We cannot determine ScalarTy from VL because some ScalarTy is determined
from VL[0]->getType(), while others are determined from
getValueType(VL[0]).
Fix "Mask and VecTy are incompatible".
Fix https://github.com/llvm/llvm-project/issues/134126.
The matching code was previously written as if we were mutating the
indices to replace undef elements with preferred values, but the actual
lowering code just took a prefix of the index vector. This resulted in
us using undef indices for lanes which should have been defined,
resulting in incorrect codegen.
Longer term, we probably should rewrite the mask, but this seemed like
an easier tactical fix.
llvm-mt requires libxml2 to work, so do not even build it without
libxml2.
CMake 3.31 and later prefer llvm-mt.exe over Microsoft's mt.exe if
available and using clang-cl.exe as CMAKE_CXX_COMPILER. When CMake picks
up llvm-mt.exe without libxml2, any build will fail with the message
```
llvm-mt: error: no libxml2
```
Any test except `--help` already uses `REQUIRES: libxml2`. There is no
point in having a non-functional executable. Not building llvm-mt.exe
will force CMake to use Microsoft's `mt.exe` instead.
Fixes: #134237
This is an optional mechanism that automatically detects roots. It's a best-effort mechanism, and its main goal is to *avoid* pointing at the message pump function as a root. This is the function that polls message queue(s) in an infinite loop, and is thus a bad root (it never exits).
High-level, when collection is requested - which should happen when a server has already been set up and is handling requests - we spend a bit of time sampling all the server's threads. Each sample is a stack which we insert in a `PerThreadCallsiteTrie`. After a while, we run the root detection logic for each `PerThreadCallsiteTrie`. We then traverse all the `FunctionData`, find the ones matching the detected roots, and allocate a `ContextRoot` for them. From here, in `__llvm_ctx_profile_get_context`, we special-case `FunctionData` objects that have a `CtxRoot` and route them to `__llvm_ctx_profile_start_context`.
For this to work, on the LLVM side, we need to have all functions call `__llvm_ctx_profile_release_context` because they _might_ be roots. This comes at a slight (percentages) penalty during collection - which we can afford since the overall technique is ~5x faster than normal instrumentation. We can later explore conditionally enabling autoroot detection and avoiding this penalty, if desired.
Note that functions that `musttail call` can't have their return instrumented this way, and a subsequent patch will harden the mechanism against this case.
The mechanism could be used in combination with explicit root specification, too.
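As a rough illustration of the sampling idea, the sketch below inserts sampled stacks into a trie and then picks roots. The detection heuristic here is an assumption made for the example (skip the chain of frames shared by every sample, such as the thread entry point and the message pump, and report the frames where the stacks first diverge); the class and method names are hypothetical and do not mirror the runtime's actual code.

```python
class TrieNode:
    def __init__(self):
        self.count = 0       # number of samples passing through this frame
        self.children = {}   # callee name -> TrieNode


class CallsiteTrie:
    """Hypothetical per-thread trie of sampled call stacks."""

    def __init__(self):
        self.root = TrieNode()

    def insert_stack(self, frames):
        """Insert one sample; `frames` lists functions outermost first."""
        node = self.root
        node.count += 1
        for name in frames:
            node = node.children.setdefault(name, TrieNode())
            node.count += 1

    def detect_roots(self):
        """Assumed heuristic: walk down while every sample shares one frame."""
        node = self.root
        while len(node.children) == 1:
            child = next(iter(node.children.values()))
            if child.count != node.count:
                break  # some samples stopped here, so stop skipping frames
            node = child
        return set(node.children)


samples = [
    ["thread_start", "message_pump", "handle_a", "x"],
    ["thread_start", "message_pump", "handle_b"],
    ["thread_start", "message_pump", "handle_a", "y"],
]
trie = CallsiteTrie()
for stack in samples:
    trie.insert_stack(stack)
roots = trie.detect_roots()
```

In this toy input, `message_pump` appears in every sample and is skipped, so the request handlers end up as the detected roots.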
#130963 switches the default to COV6, which requires ROCm 6.3.
Currently, if the device libraries for COV6 are not found, the error
message is not very helpful.
This PR provides a more informative error message in such cases.
Devices that do not support denormals may treat them as equal to zero
in comparisons. As a result, the output matches neither CTS
expectation: not the one for devices that support denormals, nor the
one for devices that do not.
For example for 0x1.008p-140 we get {0x1.008p-140, 0} while the CTS
expects {0x1.008p-1, -139} when supporting denormals, or {0, 0} when not
supporting denormals (flushed to zero).
Ref #129871
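The two expected results quoted above can be reproduced with a small host-side sketch. The helper name `expected_frexp` and the flush-to-zero model are assumptions for illustration; Python's `math.frexp` works on doubles, which represent this single-precision denormal exactly.

```python
import math

FLT_MIN_NORMAL = 2.0 ** -126  # smallest normal single-precision value


def expected_frexp(x, denormals_supported):
    """Return the (mantissa, exponent) pair expected for a float32 input.

    If denormals are not supported, a denormal input is modeled as being
    flushed to zero before decomposition (assumption for this sketch).
    """
    if not denormals_supported and 0.0 < abs(x) < FLT_MIN_NORMAL:
        x = math.copysign(0.0, x)
    return math.frexp(x)


x = float.fromhex("0x1.008p-140")
with_denormals = expected_frexp(x, True)      # (0x1.008p-1, -139)
without_denormals = expected_frexp(x, False)  # (0.0, 0)
```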
Tail calls were disabled from callers with inreg parameters in 5dc8aeb
with a fixme to check if the callee also takes an inreg parameter.
The issue is that inreg parameters (which are passed in x0 or x1 for
free and member functions respectively) are supposed to be returned (in
x0) at the end of the function. In case of a tail call, that means the
callee needs to return the same value as the caller would.
We can check for that case, and it's not as niche as it sounds, as
that's how Clang will lower one function with an sret return value
calling another, such as:
```
struct T { int x; };
struct S {
T foo();
T bar();
};
T S::foo() { return bar(); } // foo's sret argument will get passed directly to bar
```
Fixes #133098
Define a new `amdgpu.global_load` op, which is a thin wrapper around
the ROCDL `global_load_lds` intrinsic, along with its lowering logic to
`rocdl.global.load.lds`.
Noticed on Windows when running LLVM as part of a graphics driver, with
total stack usage limited to about 128 KB. In some cases this function
would overflow the stack.
On Linux this reduces stack usage in this function from about 32 KB to
about 0.5 KB.
Summary:
These two tools do the same thing, so we should unify them into a single
tool. We create symlinks for backward compatibility and provide a way to
get the old vendor-specific behavior with `--amdgpu-only` and
`--nvptx-only`.
It's not scientific but I think the PDB we produce on the Windows on Arm
bot simply doesn't have the information needed. Could also be that clang
is producing some DWARF, but link.exe is dropping it from the final
executable; either way, the effect is the same.
Add a dedicated function to check if a plan is for a loop with an early
exit. This can easily be determined by checking the exit blocks.
This allows removing a use of Legal->hasUncountableEarlyExit() from
InnerLoopVectorizer.
PR: https://github.com/llvm/llvm-project/pull/134720
This patch fixes UnicodeDecodeError on Windows in test_errors.py. This
issue was observed on the flang-arm64-windows-msvc buildbot.
Semantics/OpenMP/interop-construct.f90 was crashing due to Python
defaulting to the cp1252 codec on Windows.
I have fixed this by explicitly setting encoding="utf-8" when reading
source files and invoking subprocess.run() in test_errors.py.
flang-arm64-windows-msvc was running on staging master, which resulted
in this issue not being fixed earlier.
https://lab.llvm.org/staging/#/builders/206
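A minimal sketch of the encoding fix described above, assuming helper names (`read_source`, `run_tool`) that are illustrative and not taken from test_errors.py. Both `open()` and `subprocess.run()` accept an `encoding` argument, so neither falls back to the platform default codec.

```python
import subprocess


def read_source(path):
    """Read a test source file as UTF-8 regardless of the platform default."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


def run_tool(cmd):
    """Run a command and decode its captured output as UTF-8."""
    return subprocess.run(cmd, capture_output=True, check=False, encoding="utf-8")
```

Without the explicit encoding, a UTF-8 byte sequence containing a byte that cp1252 leaves undefined (such as 0x81) raises `UnicodeDecodeError` when decoded with cp1252.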
Using precompiled headers with ccache requires special accommodations.
Add the required ccache options, clang and gcc compiler flags to CMake.
Refactor ccache configuration to pass options directly on the command line for versions of ccache that support it.
---------
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
Converting fixed length masks, as used by MLOAD, to scalable vectors is
done by comparing the mask to zero. When the mask is the result of a
compare we can instead promote the operands and regenerate the original
compare. At worst this reduces the dependency chain and in most cases
removes the need for multiple compares.
Adds support for the spv.gep intrinsic to the spv ptrcast legalization
step. Those intrinsics are generated by the backend and are thus not
directly visible in the tests.
This is a pre-requisite to implement addrspacecast legalization for
logical SPIR-V.
This is a conditional revert of cca40aa8d8aa732, which made LLVM's
branch-target-enforcement mode generate BTI at the start of _every_
function, even in the case where the function has internal linkage and
its address is never taken for use in an indirect call.
The rationale was that it might turn out at link time that a direct call
to the function spanned a larger distance than the range of a BL
instruction (say, if the translation unit generated multiple code
sections and the linker put them a very long way apart). Then the linker
might insert a long-branch thunk using an indirect call instruction.
SYSVABI64 has now clarified that in this situation the static linker may
not assume that the target function is safe to call directly. If it
needs to use this strategy, it's responsible for also generating a
'landing pad' near the target function, with a BTI followed by a direct
branch, and using that as the target of the long-distance indirect call.
606ce44fe4
LLD complies with this spec as of commit 098b0d18add97de.
So if we're compiling in a mode that respects SYSVABI64, such as
targeting Linux, it's safe to leave out the BTI at the start of a
function with internal linkage if we can prove that its address isn't
either used in an indirect call in _this_ translation unit or passed out
of the object.
Therefore, this patch goes back to the behavior before cca40aa8d8aa732,
leaving out BTIs in functions that can't be called indirectly, but only
if the target triple is Linux. (I wasn't able to find a more precise
query for "is this a SYSVABI64-compliant platform?", but Linux certainly
is, and this check at least fails in the safe direction - if in doubt,
we put in all the BTIs that might be necessary.)
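The resulting decision can be summarized as a small predicate. This is a sketch of the rule as stated in this description, with hypothetical parameter names; it is not the code in the backend.

```python
def can_omit_bti(is_linux_target, has_internal_linkage, may_be_called_indirectly):
    """Return True if the BTI at function entry may be left out.

    Parameter names are illustrative. `may_be_called_indirectly` stands for
    "the address is used in an indirect call in this translation unit or is
    passed out of the object".
    """
    if not is_linux_target:
        # Not known to follow the clarified ABI rule: keep the BTI to be safe.
        return False
    return has_internal_linkage and not may_be_called_indirectly
```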
After https://github.com/llvm/llvm-project/issues/126928 it's now
possible to rewrite the existing combines, which mostly only handle
cases where an operand is an identity value, to use existing simplify
code to unlock general constant folding.
Generalizes the existing code to repeatedly peek through mixed bitcast/insert_subvector/extract_subvector chains to find the source of the shuffle operand.
The statement context is used for lowering clauses for OpenMP operations
using generalised helpers from flang lowering. The statement context
stores closures which generate code for cleaning up temporary values
generated by the lowering helper. These closures are run when the
statement context is destroyed. Keeping the statement context local to
the clause or operation being lowered without any special handling was
not correct because any cleanup code would be generated at the insertion
point when that statement context went out of scope (which would in
general be inside of the newly created container operation). It would be
better to generate the cleanup code after the newly created operation
(clause processing is synchronous even for deferred tasks).
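The placement problem can be modeled with a toy builder. Everything in this sketch (the `Builder` and `StatementContext` classes, the op names) is hypothetical and only mimics the idea that cleanup closures emit code at whatever the insertion point is when they run.

```python
class Builder:
    """Toy IR builder: ops are appended at the current insertion point."""

    def __init__(self):
        self.ops = []
        self.insertion = self.ops

    def emit(self, name):
        self.insertion.append(name)

    def create_container(self, name):
        region = []
        self.insertion.append((name, region))
        return region


class StatementContext:
    """Toy statement context: stores cleanup closures until finalized."""

    def __init__(self):
        self.cleanups = []

    def attach_cleanup(self, fn):
        self.cleanups.append(fn)

    def finalize(self):
        for fn in reversed(self.cleanups):
            fn()
        self.cleanups.clear()


def lower(builder, finalize_inside):
    ctx = StatementContext()
    builder.emit("temp = evaluate clause operand")
    ctx.attach_cleanup(lambda: builder.emit("destroy temp"))
    outer = builder.insertion
    builder.insertion = builder.create_container("omp.op")
    builder.emit("body")
    if finalize_inside:
        ctx.finalize()   # cleanup lands inside the container operation
    builder.insertion = outer
    if not finalize_inside:
        ctx.finalize()   # cleanup lands after the container operation
    return builder.ops
```

Calling `lower(Builder(), True)` puts `destroy temp` inside the region of `omp.op`, while `lower(Builder(), False)` places it after the operation, which is the behavior this description argues for.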
Currently supported clauses are mostly populated with simple scalar
values that require no cleanup. Even the simple array sections added by
#132994 needed no cleanup because indexing the right values of the array
did not create any temporaries. Supporting array sections with vector
indexing will generate hlfir.destroy operations for cleanup. This patch
fixes where those will be created. Those hlfir.destroy operations don't
generate any FIR (or LLVM) code, but the issue still exists
theoretically.
I wasn't able to find any clauses which have any cleanup to use to test
this PR. It is probably NFC for the current lowering. This will be
tested in [the PR adding vector subscripting of array
sections](https://github.com/llvm/llvm-project/pull/133892).
Before this commit, we only pushed a queue/running count when the value
was not zero. This makes building Grafana alerting a bit harder.
Change this to always upload a value for watched workflows.
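A minimal sketch of the "always report a value" behavior, assuming a hypothetical `workflow_counts` helper and a simplified job representation of `(workflow_name, status)` pairs; the real metrics script may structure this differently.

```python
from collections import Counter


def workflow_counts(watched, jobs):
    """Return (workflow, metric, value) tuples for every watched workflow.

    `jobs` is an iterable of (workflow_name, status) pairs, where status is
    "queued" or "in_progress" (assumed representation). A value is produced
    for each watched workflow even when the count is zero.
    """
    counts = Counter(
        (name, status) for name, status in jobs if name in watched
    )
    metrics = []
    for name in watched:
        metrics.append((name, "queued", counts[(name, "queued")]))
        metrics.append((name, "running", counts[(name, "in_progress")]))
    return metrics
```

Emitting explicit zeros means an alert rule can compare against a value at every sample instead of having to treat missing data as a special case.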
This is a much smaller, technically orthogonal patch similar to #134505. It
states that an extractvalue(Argument) can be treated like an Argument for alias
analysis, where the extractvalue acts like a phi / copy. No inttoptr here.