Several implementations have zero-latency instructions to zero
registers. To-date no implementation has a dedicated SVE instruction but
we can use the NEON equivalent because it is defined to zero bits
128..VL regardless of the immediate used.
NOTE: The relevant instruction is not available in streaming mode, where
the original SVE DUP instruction remains in use.
Before this change the code used to add extra qualifiers, e.g.
`std::unique_ptr<int> _Nonnull` became `::std::std::unique_ptr<int>
_Nonnull`
when adding a global namespace qualifier was requested.
Given:
__auto_type x = 12;
decltype(auto) y = 12;
-Wc++98-compat would diagnose both x and y with:
'auto' type specifier is incompatible with C++98
This patch silences the diagnostic in those cases. decltype(auto) is
still diagnosed with:
'decltype(auto)' type specifier is incompatible with C++ standards
before C++14
as expected but no longer produces the extraneous diagnostic about use
of 'auto'.
Fixes#47900
Previously, alignment option was passed to multilib selection logic only
when -mno-unaligned-access was explicitly specified on the command line.
Now this change ensure both -mno-unaligned-access and -munaligned-access
are passed to the multilib selection logic, which now also considers the
target architecture when determining alignment access policy.
This adds a call to SimplifyDemandedBits from bitcasts with scalar input
types in SimplifyDemandedVectorElts, which can help simplify the input
scalar.
When using `SBFrame::EvaluateExpression` on a frame that's not the
currently selected frame, we would sometimes run into errors such as:
```
error: error: The context has changed before we could JIT the expression!
error: errored out in DoExecute, couldn't PrepareToExecuteJITExpression
```
During expression parsing, we call `RunStaticInitializers`. On our
internal fork this happens quite frequently because any usage of, e.g.,
function pointers, will inject ptrauth fixup code into the expression.
The static initializers are run using `RunThreadPlan`. The
`ExecutionContext::m_frame_sp` going into the `RunThreadPlan` is the
`SBFrame` that we called `EvaluateExpression` on. LLDB then tries to
save this frame to restore it after the thread-plan ran (the restore
occurs by unconditionally overwriting whatever is in
`ExecutionContext::m_frame_sp`). However, if the `selected_frame_sp` is
not the same as the `SBFrame`, then `RunThreadPlan` would set the
`ExecutionContext`'s frame to a different frame than what we started
with. When we `PrepareToExecuteJITExpression`, LLDB checks whether the
`ExecutionContext` frame changed from when we initially
`EvaluateExpression`, and if did, bails out with the error above.
One such test-case is attached. This currently passes regardless of the
fix because our ptrauth static initializers code isn't upstream yet. But
the plan is to upstream it soon.
This patch addresses the issue by saving/restoring the frame of the
incoming `ExecutionContext`, if such frame exists. Otherwise, fall back
to using the selected frame.
rdar://147456589
If `CreateConstInBoundsGEP2_32` returns a constant null/gep, the cast to
GetElementPtrInst will fail.
This patch uses two static helpers
`GEPOperator::accumulateConstantOffset/GetElementPtrInst::getIndexedType`
to infer offset and result type instead of depending on the GEP result.
This patch is extracted from
https://github.com/llvm/llvm-project/pull/130734.
This PR fixes incorrect LCSSA PHI node generation when splitting
critical edges with both
`PreserveLCSSA` and `MergeIdenticalEdges` enabled. The bug caused PHI
nodes in the split block
to miss predecessors when multiple identical edges were merged.
SetExitStatus can be called the second time when we reap the debug
server process. This shouldn't be interesting as at that point, we've
already told everyone that the process has exited.
I believe/hope this will also help with sporadic shutdown crashes that
have cropped up recently. They happen because the debug server is
monitored from a detached thread, so this code can be called after main
returns (and starts destroying everything). This isn't a real fix for
that though, as the situation can still happen (it's just that it
usually happens after the exit status has already been set). I think the
real fix for that is to make sure these threads terminate before we
start shutting everything down.
Remove the test in the convolution verifier that checks the input and
output element types of convolution operations conform to the
constraints imposed by the TOSA 1.0 specification.
These checks are too strict for users of the TOSA dialect who wish to
allow more types than those allowed by the spec and provide
compatibility issues with earlier TOSA implementation which allowed more
type combinations.
Users who do wish to constrain the convolution types combination to only
those allowed by the TOSA 1.0 spec should run the TOSA validation pass
which already performs these checks.
Signed-off-by: Jack Frankland <jack.frankland@arm.com>
Pulled out of #133923 - this prevents regressions with SimplifyDemandedVectorEltsForTargetNode exposing VPERMV3(X,M,X) repeated operand patterns which were getting concatenated to wider VPERMV nodes before simpler canonicalizations could clean them up.
clspv uses a better implementation that is not using a bigger side when
not available.
Add a dummy implementation for mul_hi to avoid to override the
implementation of clspv with the one in libclc.
Cleanup work for #133947 - we need to handle VTRUNC nodes with large
source vectors directly to allow us to widen the size of the shuffle
combine
We currently discard these results in combineX86ShufflesRecursively
anyhow as we don't allow inputs from getTargetShuffleInputs to be larger
than the shuffle value type
PR https://github.com/llvm/llvm-project/pull/91400 broke the usage of
HeaderFilterRegex via config file, because it is now created at a
different point in the execution and leads to a different value.
The result of that is that using HeaderFilterRegex only in the config
file does NOT work, in other words clang-tidy stops triggering warnings
on header files, thereby losing a lot of coverage.
This patch reverts the logic so that the header filter is created upon
calling the getHeaderFilter() function.
Additionally, this patch adds 2 unit tests to prevent regressions in the
future:
- One of them, "simple", tests the most basic use case with a single
top-level .clang-tidy file.
- The second one, "inheritance", demonstrates that the subfolder only
gets warnings from headers within it, and not from parent headers.
Fixes#118009Fixes#121969Fixes#133453
Co-authored-by: Carlos Gálvez <carlos.galvez@zenseact.com>
Fix for regression #130917, changes in #111992 were too broad. This change reduces scope of previous fix. Added `ExternalASTSource::wasThisDeclarationADefinition` to detect cases when FunctionDecl lost body due to declaration merges.
CK is using either inline assembly or inline LLVM-IR builtins to
generate buffer_load_dword lds instructions.
This patch exposes this instruction as a Clang builtin available on gfx9 and gfx10.
Related to SWDEV-519702 and SWDEV-518861
In the profitability check for vectorization, the dependency matrix was
not handled correctly. This can result to make a wrong decision: It may
say "this loop can be vectorized" when in fact it cannot. The root cause
of this is that the check process early returns when it finds '=' or 'I'
in the dependency matrix. To make sure that we can actually vectorize
the loop, we need to check all the rows of the matrix. This patch fixes
the process of checking whether we can vectorize the loop or not. Now it
won't make a wrong decision for a loop that cannot be vectorized.
Related: #131130
Some new registers are reused when replacing some old ones in
certain use case of ModuloScheduleExpander. It is necessary to
avoid repeated interval calculations for these registers.
This is an expensive header, only include it where needed. Move some
functions out of line to achieve that.
This reduces time to build clang by ~0.5% in terms of instructions
retired.
This reverts commit 4e40c7c4bd66d98f529a807dbf410dc46444f4ca.
arm64 CI is getting a failure in
lldb-api.tools/lldb-server.TestGdbRemoteRegisterState.py
with this commit, need to investigate and re-land.
Move the parsing of '-' under the check that we parsed a comma.
Unfortunately, this leads to a poor error, but I still have more known
issues in this code and may end up with an overall restructuring and
want to think about wording.
This is more of a semantic check. The diagnostic location to has been
changed to point at the register list start instead of the
closing brace or whatever character might be there instead of a brace
if its malformed.
This adds the following instruction formats from the Xqci Spec:
- QC.EAI
- QC.EI
- QC.EB
- QC.EJ
- QC.ES
The update to the THead test is because the largest number of operands
for a valid instruction has been bumped by this change.
This reverts commit 68fb7a5a1d203dde7badf67031bdd9eb650eef5d. This
relands commit 0cfabd37df9940346f3bf8a4d74c19e1f48a00e9.
The motivation for this patch is that in Statistics.cpp we [check to see
if the module symfile is
loaded](990a086d9d/lldb/source/Target/Statistics.cpp (L353C60-L353C75))
to calculate how much debug info has been loaded. I have an external
utility that only wants to look at the loaded debug info, which isn't
exposed by the SBAPI.
This reapplies 5ffd9bdb50b57 (#133545) with fixes.
The BUILD_SHARED_LIBS=ON build was fixed by adding missing LLVM
dependencies to the InterpTests binary in
unittests/AST/ByteCode/CMakeLists.txt .
This was disabled due to flakiness but I'm currently unable to
reproduce.
I'm nervous the original issue still exists. However, I downgraded the
tripped
assert in 8c18c25b1b22ea710edb40a4f167a6a8bfe6ff9d to a warning since
the same
assert can trigger for illegitimate reasons.
Fixes#64157
debugserver isn't saving and restoring the SVE/SME register state around
inferior function calls.
Making arbitrary function calls while in Streaming SVE mode is generally
a poor idea because a NEON instruction can be hit and crash the
expression execution, which is how I missed this, but they should be
handled correctly if the user knows it is safe to do.
rdar://146886210
KnownBits passed to computeKnownBitsFromRangeMetadata must have the same
bit width as the range metadata bit width. Otherwise the calculated
results will be incorrect.
---------
Signed-off-by: John Lu <John.Lu@amd.com>
In ecc7e6ce4, we tried to inspect the `LambdaScopeInfo` on stack to
recover the instantiating lambda captures. However, there was a mismatch
in how we compared the pattern declarations of lambdas: the constraint
instantiation used a tailored `getPatternFunctionDecl()` which is
localized in SemaLambda that finds the very primal template declaration
of a lambda, while `FunctionDecl::getTemplateInstantiationPattern` finds
the latest template pattern of a lambda. This difference causes issues
when lambdas are nested, as we always want the primary template
declaration.
This corrects that by moving `Sema::addInstantiatedCapturesToScope` from
SemaConcept to SemaLambda, allowing it to use the localized version of
`getPatternFunctionDecl`.
It is also worth exploring to coalesce the implementation of
`getPatternFunctionDecl` with
`FunctionDecl::getTemplateInstantiationPattern`. But I’m leaving that
for the future, as I’d like to backport this fix (ecc7e6ce4 made the
issue more visible in clang 20, sorry!), and changing Sema’s ABI would
not be suitable in that regards. Hence, no release note.
Fixes https://github.com/llvm/llvm-project/issues/133719
When the list is large, using `llvm::is_contained` is of higher
performance than a sequence of comparisons. When the list is small,
the `llvm::is_contained` can be inlined and unrolled, which has the
same effect as using a sequence of comparisons.
And the generated code is more readable.
Add a -v/--version command line argument to print the version of both
the lldb-dap binary and the liblldb it's linked against.
This is motivated by me trying to figure out which lldb-dap I had in my
PATH.
Modules may contain a mix of functions that participate or don't participate in callgraphs covered by a contextual profile. We currently have been importing all the functions under a context root in the module defining that root, but if the other functions there are covered by flat profiles, the result is difficult to reason about.
This patch allows moving everything under a context root (and that root) in its own module. For now, we expect a module with a filename matching the GUID of the function be present in the set of modules known by the linker. This mechanism can be improved in a later patch.
Subsequent patches will handle implementing "move" instead of "import" semantics for the root function (because we want to make sure only one version of the root exists - so the optimizations we perform are actually the ones being observed at runtime).
Summary:
There were some discussions about this being included by default. I need
to fix this up and codify the use of LLVM libc inside of LLVM. For now,
just turn it off unless the user requested the `libc` GPU stuff. This
matches the old behavior.
This is an attempt to fix a test failure from #133794 when running on
windows builds. I suspect we are running into a case where the
[ICF](https://learn.microsoft.com/en-us/cpp/build/reference/opt-optimizations?view=msvc-170)
optimization kicks in and combines the CreateSystemRuntimePlugin*
functions into a single address. This means that we cannot uniquely
unregister the plugin based on its create function address.
The fix is have each create function return a different (bogus) value.