In current implementation, the instruction to be sunk will be inserted before the target instruction without considering the def-use tree,
which may case Instruction does not dominate all uses error. We need to choose a suitable location to insert according to the use chain
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D107262
This reapplies 54a61c94f93, its follow up in 547b712500e, which were
reverted 95fe61e63954. Original commit message:
VarLoc based LiveDebugValues will abandon variable location propagation if
there are too many blocks and variable assignments in the function. If it
didn't, and we had (say) 1000 blocks and 1000 variables in scope, we'd end
up with 1 million DBG_VALUEs just at the start of blocks.
Instruction-referencing LiveDebugValues should honour this limitation too
(because the same limitation applies to it). Hoist the relevant command
line options into LiveDebugValues.cpp and pass it down into the
implementation classes as an argument to ExtendRanges. I've duplicated all
the run-lines in live-debug-values-cutoffs.mir to have an
instruction-referencing flavour.
Differential Revision: https://reviews.llvm.org/D107823
Basic block pointer is dereferenced unconditionally for MBBs with
hasAddressTaken property.
MBBs might have hasAddressTaken property without reference to BB.
Backend developers must assign fake BB to MBB to workaround this issue
and it should be fixed.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D108092
Currently isReallyTriviallyReMaterializableGeneric() implementation
prevents rematerialization on any virtual register use on the grounds
that is not a trivial rematerialization and that we do not want to
extend liveranges.
It appears that LRE logic does not attempt to extend a liverange of
a source register for rematerialization so that is not an issue.
That is checked in the LiveRangeEdit::allUsesAvailableAt().
The only non-trivial aspect of it is accounting for tied-defs which
normally represent a read-modify-write operation and not rematerializable.
The test for a tied-def situation already exists in the
/CodeGen/AMDGPU/remat-vop.mir,
test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve.
The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets
where I more or less understand the asm it seems to reduce spilling
(as expected) or be neutral. However, it needs a review by all targets'
specialists.
Differential Revision: https://reviews.llvm.org/D106408
Check if a remateralizable nstruction does not have any virtual
register uses. Even though rematerializable RA might not actually
rematerialize it in this scenario. In that case we do not want to
hoist such instruction out of the loop in a believe RA will sink
it back if needed.
This already has impact on AMDGPU target which does not check for
this condition in its isTriviallyReMaterializable implementation
and have instructions with virtual register uses enabled. The
other targets are not impacted at this point although will be when
D106408 lands.
Differential Revision: https://reviews.llvm.org/D107677
SwitchInst should have a void result type.
Add a check to the verifier to catch this error.
Reviewed By: samparker
Differential Revision: https://reviews.llvm.org/D108084
Follow-up to D107068, attempt to fold nested concat_vectors/undefs, as long as both the vector and inner subvector types are legal.
This exposed the same issue in ARM's MVE LowerCONCAT_VECTORS_i1 (raised as PR51365) and AArch64's performConcatVectorsCombine which both assumed concat_vectors only took 2 subvector operands.
Differential Revision: https://reviews.llvm.org/D107597
VarLoc based LiveDebugValues will abandon variable location propagation if
there are too many blocks and variable assignments in the function. If it
didn't, and we had (say) 1000 blocks and 1000 variables in scope, we'd end
up with 1 million DBG_VALUEs just at the start of blocks.
Instruction-referencing LiveDebugValues should honour this limitation too
(because the same limitation applies to it). Hoist the relevant command
line options into LiveDebugValues.cpp and pass it down into the
implementation classes as an argument to ExtendRanges. I've duplicated all
the run-lines in live-debug-values-cutoffs.mir to have an
instruction-referencing flavour.
Differential Revision: https://reviews.llvm.org/D107823
visitEXTRACT_SUBVECTOR can sometimes create illegal BITCASTs when
removing "redundant" INSERT_SUBVECTOR operations. This patch adds
an extra check to ensure such combines only occur after operation
legalisation if any resulting BITBAST is itself legal.
Differential Revision: https://reviews.llvm.org/D108086
This is a fairly common pattern:
```
%mask = G_CONSTANT iN <mask val>
%add = G_ADD %lhs, %rhs
%and = G_AND %add, %mask
```
We have combines to eliminate G_AND with a mask that does nothing.
If we combined the above to this:
```
%mask = G_CONSTANT iN <mask val>
%narrow_lhs = G_TRUNC %lhs
%narrow_rhs = G_TRUNC %rhs
%narrow_add = G_ADD %narrow_lhs, %narrow_rhs
%ext = G_ZEXT %narrow_add
%and = G_AND %ext, %mask
```
We'd be able to take advantage of those combines using the trunc + zext.
For this to work (or be beneficial in the best case)
- The operation we want to narrow then widen must only be used by the G_AND
- The G_TRUNC + G_ZEXT must be free
- Performing the operation at a narrower width must not produce a different
value than performing it at the original width *after masking.*
Example comparison between SDAG + GISel: https://godbolt.org/z/63jzb1Yvj
At -Os for AArch64, this is a 0.2% code size improvement on CTMark/pairlocalign.
Differential Revision: https://reviews.llvm.org/D107929
AttributeList::hasAttribute() is confusing, use clearer methods like
hasParamAttr()/hasRetAttr().
Add hasRetAttr() since it was missing from AttributeList.
We may use several COPY instructions to copy the needed sub-registers
during split. But the way we split the lanes during the COPYs may be
different from the subranges of the old register. This would fail when we
extend the subranges of the new register because the LaneMasks do not
match exactly between subranges of new register and old register.
Since we are bundling the COPYs, I think there is no need to further refine the
subranges of the new register based on the set of LaneMasks of the inserted COPYs.
I am not sure if there will be further breaking cases. But as the subranges of
new register are created based on the LaneMasks of the subranges of old register,
it will be highly possible we will always find an exact LaneMask match.
We can think about how to make the extendPHIKillRanges() work for
subrange mask mismatch case if we meet more such cases in the future.
The test case was from D105065 by @arsenm.
Differential Revision: https://reviews.llvm.org/D107829
This patch adds Pass1 of MIRADDFSDiscriminatorsPass before register
allocation, and Pass2 of MIRAddFSDiscriminatorsPass before
Block-Placement. This is still under --enable-fs-discrmininator
option (default false).
This would reduce the turn-around time for FSAFDO transition.
Differential Revision: https://reviews.llvm.org/D104579
The introduction of `SHF_GNU_RETAIN` has caused massive problems on Solaris.
Initially, as reported in Bug 49437, it caused dozens of testsuite failures
on both sparc and x86. The objects were marked as `ELFOSABI_NONE`, but
`SHF_GNU_RETAIN` is a GNU extension. In the native Solaris ABI, that flag
(in the range for OS-specific values) is `SHF_SUNW_ABSENT` with a
completely different semantics, which confuses Solaris `ld` very much.
Later, the objects became (correctly) marked `ELFOSABI_GNU`, which Solaris
`ld` doesn't support, causing it to SEGV and break the build. The linker
is currently being hardened to not accept non-native OS ABIs to avoid this.
The need for linker support is already documented in
`clang/include/clang/Basic/AttrDocs.td`, but not currently checked.
This patch avoids all this by not emitting `SHF_GNU_RETAIN` on Solaris at all.
Tested on `amd64-pc-solaris2.11`, `sparcv9-sun-solaris2.11`, and
`x86_64-pc-linux-gnu`.
Differential Revision: https://reviews.llvm.org/D107747
We were calling find and then using operator[]. Instead keep the
iterator from find and use it to get the value.
Just happened to notice while investigating how we decide what extends
to use between basic blocks.
Some files still contained the old University of Illinois Open Source
Licence header. This patch replaces that with the Apache 2 with LLVM
Exception licence.
Differential Revision: https://reviews.llvm.org/D107528
This patch refactors / simplifies salvageDebugInfoImpl(). The goal
here is to simplify the implementation of coro::salvageDebugInfo() in
a followup patch.
1. Change the return value to I.getOperand(0). Currently users of
salvageDebugInfoImpl() assume that the first operand is
I.getOperand(0). This patch makes this information explicit. A
nice side-effect of this change is that it allows us to salvage
expressions such as add i8 1, %a in the future.
2. Factor out the creation of a DIExpression and return an array of
DIExpression operations instead. This change allows users that
call salvageDebugInfoImpl() in a loop to avoid the costly
creation of temporary DIExpressions and to defer the creation of
a DIExpression until the end.
This patch does not change any functionality.
rdar://80227769
Differential Revision: https://reviews.llvm.org/D107383
We may call lowerRelativeReference in MC to determine whether target
supports this lowering. We should return nullptr instead of crashing
when we haven't implemented the real lowering.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D107830
If a G_SHL is fed by a G_CONSTANT, the lower and upper bits of the source can be
shifted individually by the constant shift amount.
However in case the shift amount came from a G_TRUNC(G_CONSTANT), the generic shift legalization
code was used, producing intermediate shifts that are potentially illegal on some targets.
This change teaches narrowScalarShift to look through G_TRUNCs and G_*EXTs.
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D89100
This patch is a revert of e08f205f5c2c. In that patch, DW_TAG_subprograms
were permitted to be referenced across CU boundaries, to improve stack
trace construction using call site information. Unfortunately, as
documented in PR48790, the way that subprograms are "owned" by dwarf units
is sufficiently complicated that subprograms end up in unexpected units,
invalidating cross-unit references.
There's no obvious way to easily fix this, and several attempts have
failed. Revert this to ensure correct DWARF is always emitted.
Three tests change in addition to the reversion, but they're all very
light alterations.
Differential Revision: https://reviews.llvm.org/D107076
We should use MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval()
instead of eraseFromParent().
We should probably use that in other places too but fix this issue which
affects clang bootstrap builds for now.
This commit adds the isnan intrinsic and provides a default expansion
for it in the SDAG. However, it makes the assumption that types
it operates on are IEEE-compliant types. This is not always the case.
An example of that is PPC "double double" which has a representation
that
- Does not need to conform to IEEE requirements for isnan as it is
not an IEEE-compliant type
- Does not have a representation that allows for straightforward
reinterpreting as an integer and use of integer operations
The result was that this commit broke __builtin_isnan for ppc_fp128
making many valid numeric values report a NaN.
This patch simply changes the expansion to always expand to unordered
comparison (regardless of whether FP exceptions are tracked). This
is inline with previous semantics.
This isn't optimal, but prevents crashing when the libcall isn't
available. It just calculates the full product and makes sure the high bits
match the sign of the low half. Each of the pieces should go through their own
type legalization.
This can make D107420 unnecessary.
Needs tests, but I wanted to start discussion about D107420.
Reviewed By: FreddyYe
Differential Revision: https://reviews.llvm.org/D107581
This is recommit of the patch 16ff91ebccda1128c43ff3cee104e2c603569fb2,
reverted in 0c28a7c990c5218d6aec47c5052a51cba686ec5e because it had
an error in call of getFastMathFlags (base type should be FPMathOperator
but not Instruction). The original commit message is duplicated below:
Clang has builtin function '__builtin_isnan', which implements C
library function 'isnan'. This function now is implemented entirely in
clang codegen, which expands the function into set of IR operations.
There are three mechanisms by which the expansion can be made.
* The most common mechanism is using an unordered comparison made by
instruction 'fcmp uno'. This simple solution is target-independent
and works well in most cases. It however is not suitable if floating
point exceptions are tracked. Corresponding IEEE 754 operation and C
function must never raise FP exception, even if the argument is a
signaling NaN. Compare instructions usually does not have such
property, they raise 'invalid' exception in such case. So this
mechanism is unsuitable when exception behavior is strict. In
particular it could result in unexpected trapping if argument is SNaN.
* Another solution was implemented in https://reviews.llvm.org/D95948.
It is used in the cases when raising FP exceptions by 'isnan' is not
allowed. This solution implements 'isnan' using integer operations.
It solves the problem of exceptions, but offers one solution for all
targets, however some can do the check in more efficient way.
* Solution implemented by https://reviews.llvm.org/D96568 introduced a
hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
specific code into IR. Now only SystemZ implements this hook and it
generates a call to target specific intrinsic function.
Although these mechanisms allow to implement 'isnan' with enough
efficiency, expanding 'isnan' in clang has drawbacks:
* The operation 'isnan' is hidden behind generic integer operations or
target-specific intrinsics. It complicates analysis and can prevent
some optimizations.
* IR can be created by tools other than clang, in this case treatment
of 'isnan' has to be duplicated in that tool.
Another issue with the current implementation of 'isnan' comes from the
use of options '-ffast-math' or '-fno-honor-nans'. If such option is
specified, 'fcmp uno' may be optimized to 'false'. It is valid
optimization in general, but it results in 'isnan' always returning
'false'. For example, in some libc++ implementations the following code
returns 'false':
std::isnan(std::numeric_limits<float>::quiet_NaN())
The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption however should not be applied
to the functions that check FP number properties, including 'isnan'. If
such function returns expected result instead of actually making
checks, it becomes useless in many cases. The option '-ffast-math' is
often used for performance critical code, as it can speed up execution
by the expense of manual treatment of corner cases. If 'isnan' returns
assumed result, a user cannot use it in the manual treatment of NaNs
and has to invent replacements, like making the check using integer
operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
which also expresses the opinion, that limitations imposed by
'-ffast-math' should be applied only to 'math' functions but not to
'tests'.
To overcome these drawbacks, this change introduces a new IR intrinsic
function 'llvm.isnan', which realizes the check as specified by IEEE-754
and C standards in target-agnostic way. During IR transformations it
does not undergo undesirable optimizations. It reaches instruction
selection, where is lowered in target-dependent way. The lowering can
vary depending on options like '-ffast-math' or '-ffp-model' so the
resulting code satisfies requested semantics.
Differential Revision: https://reviews.llvm.org/D104854
Fixes issue where late materialized constants can be more strictly
aligned then their containing csect.
Differential Revision: https://reviews.llvm.org/D103103