Bolt currently links and initializes all LLVM targets. This
substantially increases the binary size, and link time if LTO is used.
Instead, only link the targets specified by BOLT_TARGETS_TO_BUILD. We
also have to only initialize those targets, so generate a
TargetConfig.def file with the necessary information. The way the
initialization is done mirrors what llvm-exegesis does.
This reduces llvm-bolt size from 137MB to 78MB for me.
Silence compiler warning introduced in #125007 - assign the address delta to int64_t, assert it is negative and negate it only as part of the mergeSPAdd call
Fixes#125825
Some loops cannot be unrolled by affine-loop-unroll pass. After running
lower-affine pass, they can be unrolled in scf.To enable conversion of
vector Ops in scf to llvm dialect, unroll-full option was added.
---------
Co-authored-by: Oleksandr "Alex" Zinenko <ftynse@gmail.com>
#121386 Introduced cttz intrinsics which caused a regression where
vscale/vscale divisions could no longer be constant folded.
This fold was suggested as a fix in #126411.
https://alive2.llvm.org/ce/z/gWbtPw
Shift amounts in SelectionDAG don't have to match the result type
of the shift. SelectionDAGBuilder will aggressively truncate shift
amounts to the target's preferred type. This may result in a zero extend
that existed in IR being removed.
If we look through a truncate here, we can't guarantee the upper bits
of the truncate input are zero. There may have been a zext that was
removed. Unfortunately, this regresses tests where no truncate was
involved. The only way I can think to fix this is to add an assertzext
when SelectionDAGBuilder truncates a shift amount or remove the
early truncation of shift amounts from SelectionDAGBuilder all together.
Fixes#126889.
This is NFC because it currently only matters for cases that are not
isMoveImmediate, and we do not yet implement any of those. This just
moves the implementation of foldImmediate to use the common interface,
similar to how x86 does it.
The llvm.readcyclecounter intrinsic can be implemented via the `rdhwr
$2, $hwr_cc` instruction.
$hwr_cc: High-resolution cycle counter. This register provides read
access to the coprocessor 0 Count Register.
Fix#106318.
Previously this would give up on folding subregister copies through
a reg_sequence if the input operand already had a subregister index.
d246cc618adc52fdbd69d44a2a375c8af97b6106 stopped introducing these
subregister uses, and this is the first step to lifting that restriction.
I was expecting to be able to implement this only purely with compose /
reverse compose, but I wasn't able to make it work so relies on testing
the lanemasks for whether the copy reads a subset of the input.
On Windows we end up in assembly. Not sure if the thread plans behave
differently or this is a debug info issue. I have no environment to
reproduce and investigate this in, so I'm disabling the test for now.
This PR fixes LLDB stepping out, rather than stepping through a C++
thunk. The implementation is based on, and upstreams, the support for
runtime thunks in the Swift fork.
Fixes#43413
This PR improves the set of rules for type inference by ensuring that a
correct pointer type is deduced from the Value argument of OpAtomic*
instructions, also when a pointer argument is coming from an `inttoptr
.. to` instruction that caused problems earlier. Existing test cases are
updated accordingly. This fixes
https://github.com/llvm/llvm-project/issues/127491
This patch adds support for template arguments of
`clang::TemplateArgument::ArgKind::StructuralValue` kind (added in
https://github.com/llvm/llvm-project/pull/78041). These are used for
non-type template parameters such as floating point constants. When LLDB
created `clang::NonTypeTemplateParmDecl`s, it previously assumed
integral values, this patch accounts for structural values too.
Anywhere LLDB assumed a `DW_TAG_template_value_parameter` was
`Integral`, it will now also check for `StructuralValue`, and will
unpack the `TemplateArgument` value and type accordingly.
We can rely on the fact that any `TemplateArgument` of `StructuralValue`
kind that the `DWARFASTParserClang` creates will have a valid value,
because it gets those from `DW_AT_const_value`.
Summary:
These helpers are very useful but currently absent. They allow the user
to get a bitmask representing the matches within the warp. I have made
an executive decision to drop the `predicate` return from `match_all`
because it's easily testable with `match_all() == __activemask()`.
There has been discussion around this a few times already, and there
seemed to be consensus that we would never pursue an implementation of
the Networking TS. This patch solidifies that discussion by documenting
it and closing issues related to the Networking TS.
Closes#103799Closes#100223Closes#100228Closes#100231Closes#100232
The layout and the size of an ObjC interface can change after its
corresponding implementation is parsed when synthesized ivars or ivars
declared in categories are added to the interface's list of ivars. This
can cause clang to mis-compile if the optimization that emits fixed
offsets for ivars (see 923ddf65f4e21ec67018cf56e823895de18d83bc) uses an
ObjC class layout that is outdated and no longer reflects the current
state of the class.
For example, when compiling `constant-non-fragile-ivar-offset.m`, clang
emits 20 instead of 24 as the offset for `IntermediateClass2Property` as
the class layout for `SuperClass2`, which is created when the
implementation of IntermediateClass3 is parsed, is outdated when the
implementation of `IntermediateClass2` is parsed.
This commit invalidates the stale layout information of the class and
its subclasses if new ivars are added to the interface.
With this change, we can also stop using ObjC implementation decls as
the key to retrieve ObjC class layouts information as the layout
retrieved using the ObjC interface as the key will always be up to date.
rdar://139531391
This pull request fixes an issue that was introduced in #93365.
`__llvm_write_custom_profile` visibility was causing issues on Darwin.
This function needs to be publicly accessible in order to be accessed by
libomptarget, so this pull request makes `__llvm_write_custom_profile`
an explicitly exported symbol on Darwin. Tested on M3 and X86 macs.
The vast majority of `SyntheticChildrenFrontEnd` subclasses provide
children, and as such implement `MightHaveChildren` with a constant
value of `true`. This change makes `true` the default value. With this
change, `MightHaveChildren` only needs to be implemented by synthetic
providers that can return `false`, which is only 3 subclasses.
apple-a18 is an alias of apple-m4.
apple-s6/s7/s8 are aliases of apple-a13.
apple-s9/s10 are aliases of apple-a16.
As with some other aliases today, this reflects identical ISA feature
support, but not necessarily identical microarchitectures and
performance characteristics.
This is effectively a workaround for a bug in livedebugvalues, but seems
to potentially be a general improvement, as BB sections seems like it
could ruin the special 256-byte prelude scheme that
amdgpu-preload-kern-arg-prolog requires anyway. Moving it even later
doesn't seem to have any material impact, and just adds livedebugvalues
to the list of things which no longer have to deal with pseudo
multiple-entry functions.
AMDGPU debug-info isn't supported upstream yet, so the bug being avoided
isn't testable here. I am posting the patch upstream to avoid an
unnecessary diff with AMD's fork.
The verifier ensures function !dbg metadata is unique across the module,
so ensure the old nameless function we leave behind doesn't violate
this invariant.
Removing the function via e.g. eraseFromParent seems like a better
option, but doesn't seem to be legal from a FunctionPass.