In the profitability check for vectorization, the dependency matrix was
not handled correctly. This can result to make a wrong decision: It may
say "this loop can be vectorized" when in fact it cannot. The root cause
of this is that the check process early returns when it finds '=' or 'I'
in the dependency matrix. To make sure that we can actually vectorize
the loop, we need to check all the rows of the matrix. This patch fixes
the process of checking whether we can vectorize the loop or not. Now it
won't make a wrong decision for a loop that cannot be vectorized.
Related: #131130
Some new registers are reused when replacing some old ones in
certain use case of ModuloScheduleExpander. It is necessary to
avoid repeated interval calculations for these registers.
This is an expensive header, only include it where needed. Move some
functions out of line to achieve that.
This reduces time to build clang by ~0.5% in terms of instructions
retired.
This reverts commit 4e40c7c4bd66d98f529a807dbf410dc46444f4ca.
arm64 CI is getting a failure in
lldb-api.tools/lldb-server.TestGdbRemoteRegisterState.py
with this commit, need to investigate and re-land.
Move the parsing of '-' under the check that we parsed a comma.
Unfortunately, this leads to a poor error, but I still have more known
issues in this code and may end up with an overall restructuring and
want to think about wording.
This is more of a semantic check. The diagnostic location to has been
changed to point at the register list start instead of the
closing brace or whatever character might be there instead of a brace
if its malformed.
This adds the following instruction formats from the Xqci Spec:
- QC.EAI
- QC.EI
- QC.EB
- QC.EJ
- QC.ES
The update to the THead test is because the largest number of operands
for a valid instruction has been bumped by this change.
This reverts commit 68fb7a5a1d203dde7badf67031bdd9eb650eef5d. This
relands commit 0cfabd37df9940346f3bf8a4d74c19e1f48a00e9.
The motivation for this patch is that in Statistics.cpp we [check to see
if the module symfile is
loaded](990a086d9d/lldb/source/Target/Statistics.cpp (L353C60-L353C75))
to calculate how much debug info has been loaded. I have an external
utility that only wants to look at the loaded debug info, which isn't
exposed by the SBAPI.
This reapplies 5ffd9bdb50b57 (#133545) with fixes.
The BUILD_SHARED_LIBS=ON build was fixed by adding missing LLVM
dependencies to the InterpTests binary in
unittests/AST/ByteCode/CMakeLists.txt .
This was disabled due to flakiness but I'm currently unable to
reproduce.
I'm nervous the original issue still exists. However, I downgraded the
tripped
assert in 8c18c25b1b22ea710edb40a4f167a6a8bfe6ff9d to a warning since
the same
assert can trigger for illegitimate reasons.
Fixes#64157
debugserver isn't saving and restoring the SVE/SME register state around
inferior function calls.
Making arbitrary function calls while in Streaming SVE mode is generally
a poor idea because a NEON instruction can be hit and crash the
expression execution, which is how I missed this, but they should be
handled correctly if the user knows it is safe to do.
rdar://146886210
KnownBits passed to computeKnownBitsFromRangeMetadata must have the same
bit width as the range metadata bit width. Otherwise the calculated
results will be incorrect.
---------
Signed-off-by: John Lu <John.Lu@amd.com>
In ecc7e6ce4, we tried to inspect the `LambdaScopeInfo` on stack to
recover the instantiating lambda captures. However, there was a mismatch
in how we compared the pattern declarations of lambdas: the constraint
instantiation used a tailored `getPatternFunctionDecl()` which is
localized in SemaLambda that finds the very primal template declaration
of a lambda, while `FunctionDecl::getTemplateInstantiationPattern` finds
the latest template pattern of a lambda. This difference causes issues
when lambdas are nested, as we always want the primary template
declaration.
This corrects that by moving `Sema::addInstantiatedCapturesToScope` from
SemaConcept to SemaLambda, allowing it to use the localized version of
`getPatternFunctionDecl`.
It is also worth exploring to coalesce the implementation of
`getPatternFunctionDecl` with
`FunctionDecl::getTemplateInstantiationPattern`. But I’m leaving that
for the future, as I’d like to backport this fix (ecc7e6ce4 made the
issue more visible in clang 20, sorry!), and changing Sema’s ABI would
not be suitable in that regards. Hence, no release note.
Fixes https://github.com/llvm/llvm-project/issues/133719
When the list is large, using `llvm::is_contained` is of higher
performance than a sequence of comparisons. When the list is small,
the `llvm::is_contained` can be inlined and unrolled, which has the
same effect as using a sequence of comparisons.
And the generated code is more readable.
Add a -v/--version command line argument to print the version of both
the lldb-dap binary and the liblldb it's linked against.
This is motivated by me trying to figure out which lldb-dap I had in my
PATH.
Modules may contain a mix of functions that participate or don't participate in callgraphs covered by a contextual profile. We currently have been importing all the functions under a context root in the module defining that root, but if the other functions there are covered by flat profiles, the result is difficult to reason about.
This patch allows moving everything under a context root (and that root) in its own module. For now, we expect a module with a filename matching the GUID of the function be present in the set of modules known by the linker. This mechanism can be improved in a later patch.
Subsequent patches will handle implementing "move" instead of "import" semantics for the root function (because we want to make sure only one version of the root exists - so the optimizations we perform are actually the ones being observed at runtime).
Summary:
There were some discussions about this being included by default. I need
to fix this up and codify the use of LLVM libc inside of LLVM. For now,
just turn it off unless the user requested the `libc` GPU stuff. This
matches the old behavior.
This is an attempt to fix a test failure from #133794 when running on
windows builds. I suspect we are running into a case where the
[ICF](https://learn.microsoft.com/en-us/cpp/build/reference/opt-optimizations?view=msvc-170)
optimization kicks in and combines the CreateSystemRuntimePlugin*
functions into a single address. This means that we cannot uniquely
unregister the plugin based on its create function address.
The fix is have each create function return a different (bogus) value.
llc attempts to create an empty file in the current directory, but it
can't do that on a read-only file system. Send that empty-output to
stdout, which prevents this failure.
We should point to the start of the reglist not the closing parenthesis.
I also moved the check after we finishing parsing the closing brace.
The diagnostic mentions '{ra, s0-s10} or {x1, x8-x9, x18-x26}' so we
should be sure that's what we parsed.
`llvm::convertUTF16ToUTF8String` opens with an assertion that the output
container is empty:
3bdf9a0880/llvm/lib/Support/ConvertUTFWrapper.cpp (L83-L84)
It's not clear to me why this function requires the output container to
be empty instead of just overwriting it, but the callsite in
`TargetThreadWindows::GetName` may reuse the container without clearing
it out first, resulting in an assertion failure:
```
# Child-SP RetAddr Call Site
00 000000d2`44b8ea48 00007ff8`beefc12e ntdll!NtTerminateProcess+0x14
01 000000d2`44b8ea50 00007ff8`bcf518ab ntdll!RtlExitUserProcess+0x11e
02 000000d2`44b8ea80 00007ff8`bc0e0143 KERNEL32!ExitProcessImplementation+0xb
03 000000d2`44b8eab0 00007ff8`bc0e4c49 ucrtbase!common_exit+0xc7
04 000000d2`44b8eb10 00007ff8`bc102ae6 ucrtbase!abort+0x69
05 000000d2`44b8eb40 00007ff8`bc102cc1 ucrtbase!common_assert_to_stderr<wchar_t>+0x6e
06 000000d2`44b8eb80 00007fff`b8e27a80 ucrtbase!wassert+0x71
07 000000d2`44b8ebb0 00007fff`b8b821e1 liblldb!llvm::convertUTF16ToUTF8String+0x30 [D:\r\_work\swift-build\swift-build\SourceCache\llvm-project\llvm\lib\Support\ConvertUTFWrapper.cpp @ 88]
08 000000d2`44b8ec30 00007fff`b83e9aa2 liblldb!lldb_private::TargetThreadWindows::GetName+0x1b1 [D:\r\_work\swift-build\swift-build\SourceCache\llvm-project\lldb\source\Plugins\Process\Windows\Common\TargetThreadWindows.cpp @ 198]
09 000000d2`44b8eca0 00007ff7`2a3c3c14 liblldb!lldb::SBThread::GetName+0x102 [D:\r\_work\swift-build\swift-build\SourceCache\llvm-project\lldb\source\API\SBThread.cpp @ 432]
0a 000000d2`44b8ed70 00007ff7`2a3a5ac6 lldb_dap!lldb_dap::CreateThread+0x1f4 [S:\SourceCache\llvm-project\lldb\tools\lldb-dap\JSONUtils.cpp @ 877]
0b 000000d2`44b8ef10 00007ff7`2a3b0ab5 lldb_dap!`anonymous namespace'::request_threads+0xa6 [S:\SourceCache\llvm-project\lldb\tools\lldb-dap\lldb-dap.cpp @ 3906]
0c 000000d2`44b8f010 00007ff7`2a3b0fe8 lldb_dap!lldb_dap::DAP::HandleObject+0x1c5 [S:\SourceCache\llvm-project\lldb\tools\lldb-dap\DAP.cpp @ 796]
0d 000000d2`44b8f130 00007ff7`2a3a8b96 lldb_dap!lldb_dap::DAP::Loop+0x78 [S:\SourceCache\llvm-project\lldb\tools\lldb-dap\DAP.cpp @ 812]
0e 000000d2`44b8f1d0 00007ff7`2a4b5fbc lldb_dap!main+0x1096 [S:\SourceCache\llvm-project\lldb\tools\lldb-dap\lldb-dap.cpp @ 5319]
0f (Inline Function) --------`-------- lldb_dap!invoke_main+0x22 [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 78]
10 000000d2`44b8fb80 00007ff8`bcf3e8d7 lldb_dap!__scrt_common_main_seh+0x10c [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 288]
11 000000d2`44b8fbc0 00007ff8`beefbf6c KERNEL32!BaseThreadInitThunk+0x17
12 000000d2`44b8fbf0 00000000`00000000 ntdll!RtlUserThreadStart+0x2c
```
This stack trace was captured from the lldb distributed in the Swift
toolchain. The issue is easy to reproduce by resuming from a breakpoint
twice in VS Code.
I've verified that clearing out the container here fixes the assertion
failure.
A previous checkin used a workaround to generate the nsw flag where
needed for unary ops. This change upstreams a subsequent change that was
made in the incubator to generate the flag correctly.
Before #134048, TestDAP_Progress relied on wait_for_event to block until
the progressEnd came in. However, progress events were not added to the
packet list, so this call would always time out. This PR makes it so
that packets are added to the packet list, and you can block on them.
Cmake automatically generates dependency files with all compilation
options provided by users. When users use `--offload-compress` for HIP
compilation, it causes warnings when cmake generates dependency files.
Claim this option to suppress warnings.
Change `objc tagged-pointer info` to call
`OptionArgParser::ToRawAddress`.
Previously `ToAddress` was used, but it calls `FixCodeAddress`, which
can erroneously mutate the bits of a tagged pointer.
TR11 introduced changes to support target memory management in a unified
way by defining a series of API routines and additional traits. Host
runtime is oblivious to how actual memory resources are mapped when
using the new API routines, so it can only support how the composed
memory space is maintained, and the offload backend must handle which
memory resources are actually used to allocate memory from the memory
space.
Here is summary of the implementation.
* Implemented 12 API routines to get/mainpulate memory space/allocator.
* Memory space composed with a list of devices has a state with resource
description, and runtime is responsible for maintaining the allocated
memory space objects.
* Defined interface with offload runtime to access memory resource list,
and to redirect calls to omp_alloc/omp_free since it requires
backend-specific information.
* Value of omp_default_mem_space changed from 0 to 99, and
omp_null_mem_space took the value 0 as defined in the language.
* New allocator traits were introduced, but how to use them is up to the
offload backend.
* Added basic tests for the new API routines.
With -fpatchable-function-entry (or the patchable_function_entry
function attribute), we emit records of patchable entry locations to the
__patchable_function_entries section. Add an additional parameter to the
command line option that allows one to specify a different default
section name for the records, and an identical parameter to the function
attribute that allows one to override the section used.
The main use case for this change is the Linux kernel using prefix NOPs
for ftrace, and thus depending on__patchable_function_entries to locate
traceable functions. Functions that are not traceable currently disable
entry NOPs using the function attribute, but this creates a
compatibility issue with -fsanitize=kcfi, which expects all indirectly
callable functions to have a type hash prefix at the same offset from
the function entry.
Adding a section parameter would allow the kernel to distinguish between
traceable and non-traceable functions by adding entry records to
separate sections while maintaining a stable function prefix layout for
all functions. LKML discussion:
https://lore.kernel.org/lkml/Y1QEzk%2FA41PKLEPe@hirez.programming.kicks-ass.net/
The `reassoc` fast-math flag allows a much wider array of algebraic
transformations than just strictly reassociations. In some cases it does
commutations, distributions, and folds away redundant inverse
operations...
While it might make sense to fix the flag naming at some point, in the
meantime we should at least have the docs be accurate to avoid
confusion.
We currently do not have masked vectorization support for tenor.pad with
low padding. However, we can allow this in the special case where the
result dimension after padding is a unit dim. The reason is when we
actually have a low pad on a unit dim, the input size of that dimension
will be (or should be for correct IR) dynamically zero and hence we will
create a zero mask which is correct. If the low pad is dynamically zero
then the lowering is correct as well.
---------
Signed-off-by: Nirvedh <nirvedh@gmail.com>
Update both VPInterleaveRecipe and VPReplicateRecipe codegen to use
debug location directly from the recipe, not the underlying instruction.
This removes another dependency on underlying instructions.