This is replacing #125361
- communicator is mandatory
- new mpi.comm_world
- new mp.comm_split
- lowering and test
---------
Co-authored-by: Sergio Sánchez Ramírez <sergio.sanchez.ramirez+git@bsc.es>
Simplify and fix the logic to clear the old statusline when the terminal
window dimensions have changed. I accidentally broke the terminal
resizing behavior when addressing code review feedback.
I'd really like to figure out a way to test this. PExpect isn't a good
fit for this, because I really need to check the result, rather than the
control characters, as the latter doesn't tell me whether any part of
the old statusline is still visible.
We can safely switch to insert_range here because
SyntheticStmtSourceMap starts out empty in the constructor. Also
TheCFG->synthetic_stmts() comes from DenseMap, so we know that the
keys are unique. That is, operator[] and insert are equivalent in
this particular case.
This correctly rejects imm==0 and prints 1048575 instead of -1.
I've modified the test to only have each hex pattern once with different
check lines before it. This ensures we don't have more invalid messages
printed than we're checking for.
https://reviews.llvm.org/D17938 introduced lowerRelativeReference to
give ConstantExpr sub (A-B) special semantics in ELF: when `A` is an
`unnamed_addr` function, create a PLT-generating relocation. This was
intended for C++ relative vtables, but C++ relative vtable ended up
using DSOLocalEquivalent (lowerDSOLocalEquivalent).
This special treatment of `unnamed_addr` seems unusual.
Let's remove it. Only COFF needs an overload to generate a @IMGREL32
relocation specifier (llvm/test/MC/COFF/cross-section-relative.ll).
Pull Request: https://github.com/llvm/llvm-project/pull/132684
These tests are currently filtered on macOS if your on an M1 (or newer)
device. These tests do work on macOS, for me at least on M1 Max with
macOS 15.3.2 and Xcode 16.2.
Enabling them again, but if we have CI problems with them we can keep
them disabled.
The `locked` variable can be accessed from the asynchronous thread until
the call to f.wait() completes. However, the variable is scoped in a
lexical block that ends before that, leading to a use-after-free.
In 96e5ee2, I inadvertently broke the way non-trivial symbol references
got updated from non-optimized code. The breakage was a consequence of
`getTargetSymbol(MCExpr *)` not returning a symbol when the parameter
was a binary expression. Fix `getTargetSymbol()` to cover such cases.
decodeUImmLog2XLenNonZeroOperand already contains the uimm5 check for
RV32 so we can reuse it. This makes C_SLLI_HINT code more similar to the
tblgen code for C_SLLI.
CUDA defines min/max functions for host in global namespace. HIP header
needs to define them too to be compatible. Currently only min/max(int,
int) is defined. This causes wrong result for arguments that are out of
range for int. This patch defines host min/max functions to be
compatible with CUDA.
Since some HIP apps defined min/max functions by themselves, newly added
min/max function are under the control of macro
`__HIP_DEFINE_EXTENDED_HOST_MIN_MAX__`, which is 0 by default. In the
future, this will change to 1 by
default after most existing HIP apps adopt this change.
Also allows users to define
`__HIP_NO_HOST_MIN_MAX_IN_GLOBAL_NAMESPACE__` to disable host max/min in
global namespace.
min/max functions with mixed signed/unsigned integer parameters are not
defined unless
`__HIP_DEFINE_MIXED_HOST_MIN_MAX__` is defined.
Fixes: SWDEV-446564
The malloc_zone.cpp test currently fails on Darwin hosts, in
SanitizerCommon
tests with lsan enabled.
Need to XFAIL this test to buy time to investigate this failure. Also
we're trying to bring the number of test failing on Darwin bots to 0, to
get clearer signal of any new failures.
rdar://145873843
Co-authored-by: Mariusz Borsa <m_borsa@apple.com>
Vectorizing of fminimumnum and fminimumnum have not support yet. Let's
add the testcase for it now, and we will update the testcase when we
support it.
Drop small size to make vector types match the generic helper
`getMixedValues` in `StaticValueUtils.h`.
This saves some needles vector copies. I didn't find any local variables
that need updating.
getSameOpcode in some cases may consider 2 compares as having same
opcode, even though previously they were considered as alternate. It may
happen, because getSameOpcode looses info about previous instructions
and their states. Need to use isAlternateInstruction function instead
for the correct analysis.
Reviewers: RKSimon, hiraditya
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/133769
This land commits 23aca2f88dd5d2447e69496c89c3ed42a56f9c31 and
1b15a89a23c631a8e2d096dad4afe456970572c0.
https://github.com/llvm/llvm-project/pull/128619 makes symbolizer to
always use debug info when available so we can reland this chagnge.
The pattern
```LLVM
%shl = shl i32 %x, 31
%ashr = ashr i32 %shl, 31
```
would be combined to `G_EXT_INREG %x, 1` by GlobalISel. However
InstCombine normalizes this pattern to:
```LLVM
%and = and i32 %x, 1
%neg = sub i32 0, %and
```
This adds a combiner for this variant as well.
Convert Breakpoint & Watchpoints structs to classes to provide proper
access control. This is in preparation for adopting SBMutex to protect
the underlying SBBreakpoint and SBWatchpoint.
Prior to this PR, we were emitting warnings for Objective-C ivars and
properties if the forward declaration of the type appeared first in a
non-system header. This PR fixes the checker so tha we'd ignore ivars
and properties defined for a forward declared type.
This reverts the revert commit 616f447fc84bdc7655117f1b303d895dc3b93e4d.
It includes updates to remaining users in Polly and Clang, to avoid
failures when building those projects.
Although combining -fveclib=ArmPL with -nostdlib is a rare situation, it
should still be supported correctly and should effect in avoidance of
linking against libm.
Reverts llvm/llvm-project#133302
Reverting to inspect build failures that were introduced from use of the
`clang::Preprocessor` in unit testing, as well as, the warning about an
unused declaration. See linked issue for failures.
This patch improves LLDB launch time on Linux machines for **preload
scenarios**, particularly for executables with a lot of shared library
dependencies (or modules). Specifically:
* Launching a binary with `target.preload-symbols = true`
* Attaching to a process with `target.preload-symbols = true`.
It's completely controlled by a new flag added in the first commit
`plugin.dynamic-loader.posix-dyld.parallel-module-load`, which *defaults
to false*. This was inspired by similar work on Darwin #110646.
Some rough numbers to showcase perf improvement, run on a very beefy
machine:
* Executable with ~5600 modules: baseline 45s, improvement 15s
* Executable with ~3800 modules: baseline 25s, improvement 10s
* Executable with ~6650 modules: baseline 67s, improvement 20s
* Executable with ~12500 modules: baseline 185s, improvement 85s
* Executable with ~14700 modules: baseline 235s, improvement 120s
A lot of targets we deal with have a *ton* of modules, and unfortunately
we're unable to convince other folks to reduce the number of modules, so
performance improvements like this can be very impactful for user
experience.
This patch achieves the performance improvement by parallelizing
`DynamicLoaderPOSIXDYLD::RefreshModules` for the launch scenario, and
`DynamicLoaderPOSIXDYLD::LoadAllCurrentModules` for the attach scenario.
The commits have some context on their specific changes as well --
hopefully this helps the review.
# More context on implementation
We discovered the bottlenecks by via `perf record -g -p <lldb's pid>` on
a Linux machine. With an executable known to have 1000s of shared
library dependencies, I ran
```
(lldb) b main
(lldb) r
# taking a while
```
and showed the resulting perf trace (snippet shown)
```
Samples: 85K of event 'cycles:P', Event count (approx.): 54615855812
Children Self Command Shared Object Symbol
- 93.54% 0.00% intern-state libc.so.6 [.] clone3
clone3
start_thread
lldb_private::HostNativeThreadBase::ThreadCreateTrampoline(void*) r
std::_Function_handler<void* (), lldb_private::Process::StartPrivateStateThread(bool)::$_0>::_M_invoke(std::_Any_data const&)
lldb_private::Process::RunPrivateStateThread(bool) n
- lldb_private::Process::HandlePrivateEvent(std::shared_ptr<lldb_private::Event>&)
- 93.54% lldb_private::Process::ShouldBroadcastEvent(lldb_private::Event*)
- 93.54% lldb_private::ThreadList::ShouldStop(lldb_private::Event*)
- lldb_private::Thread::ShouldStop(lldb_private::Event*) *
- 93.53% lldb_private::StopInfoBreakpoint::ShouldStopSynchronous(lldb_private::Event*) t
- 93.52% lldb_private::BreakpointSite::ShouldStop(lldb_private::StoppointCallbackContext*) i
lldb_private::BreakpointLocationCollection::ShouldStop(lldb_private::StoppointCallbackContext*) k
lldb_private::BreakpointLocation::ShouldStop(lldb_private::StoppointCallbackContext*) b
lldb_private::BreakpointOptions::InvokeCallback(lldb_private::StoppointCallbackContext*, unsigned long, unsigned long) i
DynamicLoaderPOSIXDYLD::RendezvousBreakpointHit(void*, lldb_private::StoppointCallbackContext*, unsigned long, unsigned lo
- DynamicLoaderPOSIXDYLD::RefreshModules() O
- 93.42% DynamicLoaderPOSIXDYLD::RefreshModules()::$_0::operator()(DYLDRendezvous::SOEntry const&) const u
- 93.40% DynamicLoaderPOSIXDYLD::LoadModuleAtAddress(lldb_private::FileSpec const&, unsigned long, unsigned long, bools
- lldb_private::DynamicLoader::LoadModuleAtAddress(lldb_private::FileSpec const&, unsigned long, unsigned long, boos
- 83.90% lldb_private::DynamicLoader::FindModuleViaTarget(lldb_private::FileSpec const&) o
- 83.01% lldb_private::Target::GetOrCreateModule(lldb_private::ModuleSpec const&, bool, lldb_private::Status*
- 77.89% lldb_private::Module::PreloadSymbols()
- 44.06% lldb_private::Symtab::PreloadSymbols()
- 43.66% lldb_private::Symtab::InitNameIndexes()
...
```
We saw that majority of time was spent in `RefreshModules`, with the
main culprit within it `LoadModuleAtAddress` which eventually calls
`PreloadSymbols`.
At first, `DynamicLoaderPOSIXDYLD::LoadModuleAtAddress` appears fairly
independent -- most of it deals with different files and then getting or
creating Modules from these files. The portions that aren't independent
seem to deal with ModuleLists, which appear concurrency safe. There were
members of `DynamicLoaderPOSIXDYLD` I had to synchronize though: namely
`m_loaded_modules` which `DynamicLoaderPOSIXDYLD` maintains to map its
loaded modules to their link addresses. Without synchronizing this, I
ran into SEGFAULTS and other issues when running `check-lldb`. I also
locked the assignment and comparison of `m_interpreter_module`, which
may be unnecessary.
# Alternate implementations
When creating this patch, another implementation I considered was
directly background-ing the call to `Module::PreloadSymbol` in
`Target::GetOrCreateModule`. It would have the added benefit of working
across platforms generically, and appeared to be concurrency safe. It
was done via `Debugger::GetThreadPool().async` directly. However, there
were a ton of concurrency issues, so I abandoned that approach for now.
# Testing
With the feature active, I tested via `ninja check-lldb` on both Debug
and Release builds several times (~5 or 6 altogether?), and didn't spot
additional failing or flaky tests.
I also tested manually on several different binaries, some with around
14000 modules, but just basic operations: launching, reaching main,
setting breakpoint, stepping, showing some backtraces.
I've also tested with the flag off just to make sure things behave
properly synchronously.
If a recipe was predicated and tail folded at the same time, it will
have a mask like
EMIT vp<%header-mask> = icmp ule canonical-iv, backedge-tc
EMIT vp<%mask> = logical-and vp<%header-mask>, vp<%pred-mask>
When converting to an EVL recipe, if the mask isn't exactly just the
header-mask we copy the whole logical-and.
We can remove this redundant logical-and (because it's now covered by
EVL) and just use vp<%pred-mask> instead.
This lets us remove the widened canonical IV in more places.