* Replace getUserCost with getInstructionCost, covering all cost kinds.
* Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks.
Original Patch by @samparker (Sam Parker)
Differential Revision: https://reviews.llvm.org/D79483
A const reference is preferred over a non-null const pointer.
`Type *` is kept as is to match the other overload.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D131197
1) Overloaded (instruction-based) method is a wrapper around the current (opcode-based) method.
2) This patch also changes a few callsites (VectorCombine.cpp,
SLPVectorizer.cpp, CodeGenPrepare.cpp) to call the overloaded method.
3) This is a split of D128302.
Differential Revision: https://reviews.llvm.org/D131114
In this patch we replace common code patterns with the use of utility
functions for dealing with profiling metadata. There should be no change
in functionality, as the existing checks should be preserved in all
cases.
Reviewed By: bogner, davidxl
Differential Revision: https://reviews.llvm.org/D128860
In this patch we replace common code patterns with the use of utility
functions for dealing with profiling metadata. There should be no change
in functionality, as the existing checks should be preserved in all
cases.
Reviewed By: bogner, davidxl
Differential Revision: https://reviews.llvm.org/D128860
This patch fixes the following error with MSVC 16.9.2 in case of NDEBUG and LLVM_ENABLE_DUMP:
llvm/lib/CodeGen/CodeGenPrepare.cpp(2581): error C2872: 'ExtAddrMode': ambiguous symbol
llvm/include/llvm/CodeGen/TargetInstrInfo.h(86): note: could be 'llvm::ExtAddrMode'
llvm/lib/CodeGen/CodeGenPrepare.cpp(2447): note: or '`anonymous-namespace'::ExtAddrMode'
llvm/lib/CodeGen/CodeGenPrepare.cpp(2581): error C2039: 'print': is not a member of 'llvm::ExtAddrMode'
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D130426
In https://reviews.llvm.org/D30114, support for mismatching address
spaces was introduced to CodeGenPrepare's optimizeMemoryInst, using
addrspacecast as it was argued that only no-op addrspacecasts would be
considered when constructing the address mode. However, by doing
inttoptr/ptrtoint, it's possible to get CGP to emit an addrspace
that's not actually no-op, introducing a miscompilation:
define void @kernel(i8* %julia_ptr) {
%intptr = ptrtoint i8* %julia_ptr to i64
%ptr = inttoptr i64 %intptr to i32 addrspace(3)*
br label %end
end:
store atomic i32 1, i32 addrspace(3)* %ptr unordered, align 4
ret void
}
Gets compiled to:
define void @kernel(i8* %julia_ptr) {
end:
%0 = addrspacecast i8* %julia_ptr to i32 addrspace(3)*
store atomic i32 1, i32 addrspace(3)* %0 unordered, align 4
ret void
}
In the case of NVPTX, this introduces a cvta.to.shared, whereas
leaving out the %end block and branch doesn't trigger this
optimization. This results in illegal memory accesses as seen in
https://github.com/JuliaGPU/CUDA.jl/issues/558
In this change, I introduced a check before doing the pointer cast
that verifies address spaces are the same. If not, it emits a
ptrtoint/inttoptr combination to get a no-op cast between address
spaces. I decided against disallowing ptrtoint/inttoptr with
non-default AS in matchOperationAddr, because now its still possible
to look through multiple sequences of them that ultimately do not
result in a address space mismatch (i.e. the second lit test).
D125887 changed the ctlz/cttz despeculation transform to insert
a freeze for the introduced branch on zero. While this does fix
the "branch on poison" issue, we may still get in trouble if we
pick a different value for the branch and for the ctz argument
(i.e. non-zero for the branch, but zero for the ctz). To avoid
this, we should use the same frozen value in both positions.
This does cause a regression in RISCV codegen by introducing an
additional sext. The DAG looks like this:
t0: ch = EntryToken
t2: i64,ch = CopyFromReg t0, Register:i64 %3
t4: i64 = AssertSext t2, ValueType:ch:i32
t23: i64 = freeze t4
t9: ch = CopyToReg t0, Register:i64 %0, t23
t16: ch = CopyToReg t0, Register:i64 %4, Constant:i64<32>
t18: ch = TokenFactor t9, t16
t25: i64 = sign_extend_inreg t23, ValueType:ch:i32
t24: i64 = setcc t25, Constant:i64<0>, seteq:ch
t28: i64 = and t24, Constant:i64<1>
t19: ch = brcond t18, t28, BasicBlock:ch<cond.end 0x8311f68>
t21: ch = br t19, BasicBlock:ch<cond.false 0x8311e80>
I don't see a really obvious way to improve this, as we can't push
the freeze past the AssertSext (which may produce poison).
Differential Revision: https://reviews.llvm.org/D126638
Clang-format InstructionSimplify and convert all "FunctionName"s to
"functionName". This patch does touch a lot of files but gets done with
the cleanup of InstructionSimplify in one commit.
This is the alternative to the less invasive clang-format only patch: D126783
Reviewed By: spatel, rengolin
Differential Revision: https://reviews.llvm.org/D126889
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!`
error. More were added due to cargo cult. Since the error has been removed,
cl::ZeroOrMore is unneeded.
Also remove cl::init(false) while touching the lines.
This relands commit 4d8d2580c53e130c3c3dd3877384301e3c495554.
The major change here is using 'addUsedIfAvailable<BasicBlockSectionsProfileReader>()` to make sure we don't change the pipeline tests.
Differential Revision: https://reviews.llvm.org/D126518
Today, text section prefixes (none, .unlikely, .hot, and .unkown) are determined based on PGO profile. However, Propeller may deem a function hot when PGO doesn't. Besides, when `-Wl,-keep-text-section-prefix=true` Propeller cannot enforce a global section ordering as the linker can only reorder sections within each output section (.text, .text.hot, .text.unlikely).
This patch promotes all functions with Propeller profiles (functions listed in the basic-block-sections profile) to .text.hot. The feature is hidden behind the flag `--bbsections-guided-section-prefix` which defaults to `true`.
The new implementation refactors the parsing of basic block sections profile into a new `BasicBlockSectionsProfileReader` analysis pass. This allows us to use the information earlier in `CodeGenPrepare` in order to set the functions text prefix. `BasicBlockSectionsProfileReader` will be used both by `BasicBlockSections` pass and `CodeGenPrepare`.
Differential Revision: https://reviews.llvm.org/D122930
Freeze the condition of the newly introduced conditional branch,
to avoid immediate undefined behavior if the input to ctlz/cttz
was originally poison.
Differential Revision: https://reviews.llvm.org/D125887
We commonly want to create either an inbounds or non-inbounds GEP
based on a boolean value, e.g. when preserving inbounds from
existing GEPs. Directly accept such a boolean in the API, rather
than requiring a ternary between CreateGEP and CreateInBoundsGEP.
This change is not entirely NFC, because we now preserve an
inbounds flag in a constant expression edge-case in InstCombine.
We often see code like the following after running SCCP:
switch (x) { case 42: phi(42, ...); }
This tends to produce bad code as we currently materialize the constant
phi-argument in the switch-block. This increases register pressure and
if the pattern repeats for `n` case statements, we end up generating `n`
constant values.
This changes CodeGenPrepare to catch this pattern and revert it back to:
switch (x) { case 42: phi(x, ...); }
Differential Revision: https://reviews.llvm.org/D124552
This adds a `TargetLoweringBase::getSwitchConditionType` callback to
give targets a chance to control the type used in
`CodeGenPrepare::optimizeSwitchInst`.
Implement callback for X86 to avoid i8 and i16 types where possible as
they often incur extra zero-extensions.
This is NFC for non-X86 targets.
Differential Revision: https://reviews.llvm.org/D124894
This patch adds support for inline assembly address operands using the "p"
constraint on X86 and SystemZ.
This was in fact broken on X86 (see example at
https://reviews.llvm.org/D110267, Nov 23).
These operands should probably be treated the same as memory operands by
CodeGenPrepare, which have been commented with "TODO" there.
Review: Xiang Zhang and Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D122220
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169
The `SplitIndirectBrCriticalEdges` function was originally designed for
`CodeGenPrepare` and skipped splitting of edges when the destination
block didn't contain any `PHI` instructions. This only makes sense when
reducing COPYs like `CodeGenPrepare`. In the case of
`PGOInstrumentation` or `GCOVProfiling` it would result in missed
counters and wrong result in functions with computed goto.
Differential Revision: https://reviews.llvm.org/D120096
AddressingModeMatcher::matchOperationAddr may attempt to shift a
variable by the same amount of steps as found in the IR in a SHL
instruction. This was done without considering that there could be
undefined behavior in the IR, so the shift performed when compiling
could end up having undefined behavior as well.
This patch avoid UB in the codegenprepare by making sure that we
limit the shift amount used, in a similar way as already being done
in CodeGenPrepare::optimizeLoadExt.
Differential Revision: https://reviews.llvm.org/D118602
When we know the value we're extending is a negative constant then it
makes sense to use SIGN_EXTEND because this may improve code quality in
some cases, particularly when doing a constant splat of an unpacked vector
type. For example, for SVE when splatting the value -1 into all elements
of a vector of type <vscale x 2 x i32> the element type will get promoted
from i32 -> i64. In this case we want the splat value to sign-extend from
(i32 -1) -> (i64 -1), whereas currently it zero-extends from
(i32 -1) -> (i64 0xFFFFFFFF). Sign-extending the constant means we can use
a single mov immediate instruction.
New tests added here:
CodeGen/AArch64/sve-vector-splat.ll
I believe we see some code quality improvements in these existing
tests too:
CodeGen/AArch64/reduce-and.ll
CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll
The apparent regressions in CodeGen/AArch64/fast-isel-cmp-vec.ll only
occur because the test disables codegen prepare and branch folding.
Differential Revision: https://reviews.llvm.org/D114357
This reverts commit 31009f0b5afb504fc1f30769c038e1b7be6ea45b.
It seems to be causing SVE VLA buildbot failures and has introduced a
genuine regression. Reverting for now.
When we know the value we're extending is a negative constant then it
makes sense to use SIGN_EXTEND because this may improve code quality in
some cases, particularly when doing a constant splat of an unpacked vector
type. For example, for SVE when splatting the value -1 into all elements
of a vector of type <vscale x 2 x i32> the element type will get promoted
from i32 -> i64. In this case we want the splat value to sign-extend from
(i32 -1) -> (i64 -1), whereas currently it zero-extends from
(i32 -1) -> (i64 0xFFFFFFFF). Sign-extending the constant means we can use
a single mov immediate instruction.
New tests added here:
CodeGen/AArch64/sve-vector-splat.ll
I believe we see some code quality improvements in these existing
tests too:
CodeGen/AArch64/dag-numsignbits.ll
CodeGen/AArch64/reduce-and.ll
CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll
The apparent regressions in CodeGen/AArch64/fast-isel-cmp-vec.ll only
occur because the test disables codegen prepare and branch folding.
Differential Revision: https://reviews.llvm.org/D114357
This patch fixes a crash when despeculating ctlz/cttz intrinsics with
scalable-vector types. It is not safe to speculatively get the size of
the vector type in bits in case the vector type is not a fixed-length type. As
it happens this isn't required as vector types are skipped anyway.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D112141
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in llvm, except for the APInt
unit tests which should still test the deprecated methods.
Differential Revision: https://reviews.llvm.org/D110807
This requires a minor change to CodeGenPrepare to ensure that
shouldSinkOperands will be called for And.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D110106