Summary:
This patch removes much of the `llvmlibc_rpc_server` interface. This
pretty much deletes all of this code and just replaces it with including
`rpc.h` directly. We still maintain the file to let `libc` handle the
opcodes, since those depend on the `printf` impelmentation.
This will need to be cleaned up more, but I don't want to put too much
into a single patch.
When a binary has multiple text segments, the Size is computed as the
difference of the last address of these segments from the BaseAddress.
The base addresses of all text segments must be the same.
Introduces flag 'perf-script-events' for testing. It allows passing perf events
without BOLT having to parse them using 'perf script'. The flag is used to
pass a mock perf profile that has two memory mappings for a mock binary
that has two text segments. The size of the mapping is updated as this
change `parseMMapEvents` processes all text segments.
* Enables conversion of several select-like instructions within one
group
* Any number of auxiliary instructions depending on the same condition
can be in between select-like instructions
* After splitting the basic block, move select-like instructions into
the relevant basic blocks and optimise them
* Make it easier to add support shift-base select-like instructions and
also any mixture of zext/sext/not instructions
This generated comments like:
// 'BoolReg' class
case MCK_BoolReg: {
which seem redundant because the name is always repeated on the next
line as part of the MCK_ enumerator.
The attached test case from
https://github.com/llvm/llvm-project/issues/117294 used to cause an
assertion because we called classifPrim() on an array type.
The new result doesn't crash but isn't exactly perfect either. Since the
problem arises when evaluating an ImplicitValueInitExpr, we have no
proper source location to point to. Point to the caller instead.
This PR adds support seq_cst (sequential consistency) clause for the
flush directive in OpenMP. The seq_cst clause enforces a stricter memory
ordering, ensuring that all threads observe the memory effects of the
flush in the same order, improving consistency in memory operations
across threads.
---------
Co-authored-by: Shashwathi N <nshashwa@pe28vega.hpc.amslabs.hpecorp.net>
Co-authored-by: CHANDRA GHALE <chandra.nitdgp@gmail.com>
This is split off from #115274. There doesn't seem to be an easy way to
share this with getShuffleCost since that requires passing in a real
insert_element operand to get it to recognise it's a scalar splat.
For i1 vectors we can't currently lower them so it returns an invalid
cost.
---------
Co-authored-by: Shih-Po Hung <shihpo.hung@sifive.com>
For IR like this:
%icmp = icmp ult <4 x i32> %a, splat (i32 5)
%res = extractelement <4 x i1> %icmp, i32 1
where there is only one use of %icmp we can take a similar approach
to what we already do for binary ops such add, sub, etc. and convert
this into
%ext = extractelement <4 x i32> %a, i32 1
%res = icmp ult i32 %ext, 5
For AArch64 targets at least the scalar boolean result will almost
certainly need to be in a GPR anyway, since it will probably be
used by branches for control flow. I've tried to reuse existing code
in scalarizeExtractedBinop to also work for setcc.
NOTE: The optimisations don't apply for tests such as
extract_icmp_v4i32_splat_rhs in the file
CodeGen/AArch64/extract-vector-cmp.ll
because scalarizeExtractedBinOp only works if one of the input
operands is a constant.
We currently support simple reassociation for foldAndOrOfICmps().
Support the same for foldLogicOfFCmps() by going through the common
foldBooleanAndOr() helper.
This will also resolve the regression on #112704, which is also due to
missing reassoc support.
I had to adjust one fold to add support for FMF flag preservation,
otherwise there would be test regressions. There is a separate fold
(reassociateFCmps) handling reassociation for *just* that specific case
and it preserves FMF. Unfortunately it's not rendered entirely redundant
by this patch, because it handles one more level of reassociation as
well.
Remain InheritableAttr to avoid the warning `TypePrinter.cpp:1953:10:
warning: enumeration value ‘TargetVersion’ not handled in switch`
origin messenge
[TargetVersion] Only enable on RISC-V and AArch64 (#115991) Address
#115000.
This patch constrains the target_version feature to work only on RISC-V
and AArch64 to prevent crashes in Clang.
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
This PR extracts NFC changes out of
https://github.com/llvm/llvm-project/pull/116035 to reap as many of the
same benefits without any of the semantic changes.
More concretely, moving `LLVMStructType` to ODS has the benefits of
being able to generate much of the required boilerplate, such as
interface definitions, documentation and more, automatically.
Furthermore, `LLVMStructType` is then treated less special and its
definition can be found at the same place where all other complex type
definitions are found in the LLVM dialect.
Future changes could leverage more automatically generated code from
TableGen such as `assemblyFormat`. As these are not as trivial, they
have been left for future PRs.
---------
Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
We currently list Evan Cheng as the fallback maintainer for the LLVM
backend. However, their last contribution dates back to 2014.
I'd like to nominate arsenm instead, who is our most active backend
reviewer.
In #108907, the index classes started filtering the DIEs according to
the full type query (instead of just the base name). This means that the
checks in SymbolFileDWARF are now redundant.
I've also moved the non-redundant checks so that now all checking is
done in the DWARFIndex class and the caller can expect to get the final
filtered list of types.
The R_ARM_SBREL32 relocation is used in debug info for ARM RWPI
(read-write position independent) code. Compiler-generated DWARF info
will use an expression to add the relocated value to the actual value of
the static base (held in r9) at run-time, so it should be relocated as
if the static base is at address 0.
Create signed constant using getSignedConstant(), to avoid future
assertion failures when we disable implicit truncation in getConstant().
This also touches some generic legalization code, which apparently only
AMDGPU tests.
This will avoid assertion failures once we disable implicit truncation
in getConstant().
Inside adjustSubwordCmp() I ended up suppressing the issue with an
explicit cast, because this code deals with a mix of unsigned and signed
immediates.
Substituting into pack indexing types/expressions can still result in
unexpanded types/expressions, such as `PackIndexingType` or
`PackIndexingExpr`. To handle these cases correctly, we should defer the
pack size checks to the next round of transformation, when the patterns
can be fully expanded.
To that end, the `FullySubstituted` flag is now necessary for computing
the dependencies of `PackIndexingExprs`. Conveniently, this flag can
also represent the prior `ExpandsToEmpty` status with an additional
emptiness check. Therefore, I converted all stored flags to use
`FullySubstituted`.
Fixes https://github.com/llvm/llvm-project/issues/116105
This is a follow up of #96878 to support hoisting load/store from BBs
have the same predecessor, if load/store are the only instructions and
the branch is unpredictable, e.g.:
```
void test (int a, int *c, int *d) {
if (a)
*c = a;
else
*d = a;
}
```
CallStackRadixTreeBuilder::build takes the parameter
MemProfFrameIndexes by value, involving copies:
std::optional<const llvm::DenseMap<FrameIdTy, LinearFrameId>>
MemProfFrameIndexes
Then "build" makes another copy of MemProfFrameIndexe and passes it to
encodeCallStack for every call stack, which is painfully slow.
This patch changes the type to a pointer so that we don't have to make
a copy every time we pass the argument.
Without this patch, it takes 553 seconds to run "llvm-profdata merge"
on a large MemProf raw profile. This patch shortenes that down to 67
seconds.
This patch adds a quick validity check to
InstrProfWriter::addMemProfData. Specifically, we check to see if we
have all (or none) of the MemProf profile components (frames, call
stacks, records).
The credit goes to Teresa Johnson for suggesting this assert.
For the RISC-V target, V14_V15 are not subregisters of v14m4, even
though they share some registers. Currently, the MachineVerifier reports
an error when checking register liveness for segment load/store
operations.
This patch adds additional register liveness checking, using RegUnit
instead of subregisters, to prevent this error.
This patch moves the `areInlineCompatible` implementation from multiple
subclasses (`AArch64TTIImpl`, `RISCVTTIImpl`, `WebAssemblyTTIImpl`) to
the base class `BasicTTIImpl`. The new implementation checks whether the
callee's target features are a subset of the caller's, enabling
consistent behavior across targets. Subclasses now simply delegate to
the base implementation, reducing code duplication and improving
maintainability.
This commit adds an extra assertion to `applySignatureConversion` to
prevent incorrect API usage: The same block cannot be converted multiple
times. That would mess with the underlying conversion value mapping.
(Mappings would be overwritten.) This is similar to op replacements: The
same op cannot be replaced multiple times.
To simplify the check, `BlockTypeConversionRewrite::block` now stores
the original block. The new block is stored in an extra field. (It used
to be the other way around.)
This commit is in preparation of adding 1:N support to the conversion
value mapping. Before making any further changes to the mapping
infrastructure, I'd like to make sure that the code base around it (that
uses the mapping) is robust.
Building vp intrinsic functions using a unified interface for
expandPredicationToIntCall/expandPredicationToFPCall/expandPredicationToCastIntrinsic
functions.
We should check the zstd decompress result before doing the msan
unpoison. If the res is abnormal, then it would be a huge number, which
will cause undesired msan unpoison behavior and will run for a long
time.