Add the missing include to fix an error when `cmake_push_check_state()`
is called and incidentally the CMakePushCheckState module is not loaded
by any other check running prior to `FindLibEdit.cmake`:
CMake Error at /var/no-tmpfs/portage/dev-util/lldb-15.0.4/work/cmake/Modules/FindLibEdit.cmake:24 (cmake_push_check_state):
Unknown CMake command "cmake_push_check_state".
Call Stack (most recent call first):
cmake/modules/LLDBConfig.cmake:52 (find_package)
cmake/modules/LLDBConfig.cmake:59 (add_optional_dependency)
CMakeLists.txt:28 (include)
Gentoo Bug: https://bugs.gentoo.org/880065
Differential Revision: https://reviews.llvm.org/D137555
Provide FirOpBuilder::setFastMathFlags() to configure FastMathFlags
for the builder. Set FastMathAttr for operations based on FirOpBuilder
configuration via mlir::OpBuilder::Listener.
This is a little bit hacky solution, because we lose the ability
to hook other listeners to FirOpBuilder. There are also potential
issues with OpBuilder::clone() - the hook will be invoked for cloned
operations and will effectively overwrite FastMathAttr with the ones
configured in FirOpBuilder, which should not be happening.
We should teach mlir::OpBuilder about FastMathAttr setup in future.
Reviewed By: jeanPerier, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D137390
GlobalVariable and Function can be available_externally. GlobalAlias is used
similarly. Allowing available_externally is a natural extension and helps
ThinLTO discard GlobalAlias in a non-prevailing COMDAT (see D135427).
For now, available_externally GlobalAlias must point to an
available_externally GlobalValue (not ConstantExpr).
Differential Revision: https://reviews.llvm.org/D137441
This reverts commit 70508b614e6478ba2c3fc79e935e2c68e2d79b71.
This change depends on a reverted change that broke the windows mlir buildbot; reverting to bring remaining mlir bots to green
This change adds a new NVGPU operation that targets the PTX `mma.sp.sync`
instruction variants. A lowering to NVVM is provided using inline
assembly.
Reviewed By: ThomasRaoux, manishucsd
Differential Revision: https://reviews.llvm.org/D137202
This switches wasm intrinsics to use default attributes,
i.e. nofree, nosync, nocallback and willreturn. Especially
willreturn will be required to avoid optimization regressions
in the future.
The attributes are omitted from the trapping fptoi intrinsics
(where I assume trapping is considered well-defined, and as such
these aren't willreturn), the throw/rethrow intrinsics (which
will unwind) and the atomic intrinsics (which aren't nosync).
Differential Revision: https://reviews.llvm.org/D137551
Use case:
- When block layout is visualized after MBP pass, the basic blocks are labeled in layout order; meanwhile blocks could be numbered in a different order.
- As a result, it's hard to map between the graph and pass output. With this option on, the basic blocks are renumbered in function layout order.
This option is only useful when a function is to be visualized (i.e., when view options are on) to make it debugging only.
Use https://godbolt.org/z/5WTW36bMr as an example:
- As MBP pass output (shown in godbolt output window), `func2` is in a basic block numbered `2` (`bb.2`), and `func1` is in a basic block numbered `3` (`bb.3`);
`bb.3` is a block with higher block frequency than `bb.2`, and `bb.3` is placed before `bb.2` in the functin layout.
- Use [1] to get the dot graph (graph uploaded in [2]), the blocks are re-numbered.
- `func1` is in 'if.end' block, and labeled `1` in visualized dot; `func2` is in 'if.then' blocks, and labeled `3` --> the labeled number and bb number won't map.
- [[ b5626ae975/llvm/lib/CodeGen/MachineBlockFrequencyInfo.cpp (L127) | DOTGraphTraits<MachineBlockFrequencyInfo *>::getNodeLabel ]] is where labeled numbers are based on function layout number, and [[ a8d93783f3/llvm/include/llvm/Support/GraphWriter.h (L209)
| called by graph writer ]].
So call 'MachineFunction::RenumberBlocks' would make labeled number (in dot graph) and block number (in pass output) consistent with each other.
[1] `./bin/clang++ -O3 -S -mllvm -view-block-layout-with-bfi=count -mllvm -view-bfi-func-name=_Z9func_loopv -mllvm -print-after=block-placement -mllvm -filter-print-funcs=_Z9func_loopv test.c`
[2] {F25201785}
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D137467
This patch adds the assembly/disassembly for the following instructions:
st1w: Contiguous store words from vector (128-bit vector elements)
st1d: Contiguous store doublewords from vector (128-bit vector elements)
ld1w: Contiguous load unsigned words to vector (128-bit vector elements)
ld1d: Contiguous load unsigned doublewords to vector (128-bit vector elements)
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09
Differential Revision: https://reviews.llvm.org/D137245
Splatting the first vector element of the result of a BinOp, where any of the
BinOp's operands are the result of a first vector element splat can be simplified to
splatting the first vector element of the result of the BinOp
Differential Revision: https://reviews.llvm.org/D135876
Replace the quick sort partition method with one that is more similar to the
method used by C++ std quick sort. This improves the runtime for sorting
sk_2005.mtx by more than 10x.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D137290
This patch adds the new FEAT_B16B16 feature as well as the
assembly/disassembly for all of the B16B16 instructions:
bfadd: BFloat16 floating-point add vectors
bfsub: BFloat16 floating-point subtract vectors
bfmul: BFloat16 floating-point multiply vectors
bfclamp: BFloat16 floating-point clamp to minimum/maximum number
bfmax: BFloat16 floating-point maximum
bfmaxnm: BFloat16 floating-point maximum number
bfmin: BFloat16 floating-point minimum
bfminnm: BFloat16 floating-point minimum number
bfmla: BFloat16 floating-point fused multiply-add vectors
bfmls: BFloat16 floating-point fused multiply-subtract vectors
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09
Differential Revision: https://reviews.llvm.org/D137321
There can be a difference for MOVDDUPrr but not the load folded broadcast that is purely on Port23
Fixes an old TODO (inherited from SkylakeServer which was fixed at c7662dc3e52801ec824d8473278fb976107d3e57)
Confirmed on Agner + uops.info
Try to simplify comparisons with the smallest normalized value. If
denormals will be treated as 0, we can simplify by using an equality
comparison with 0.
fcmp olt fabs(x), smallest_normalized_number -> fcmp oeq x, 0.0
fcmp ult fabs(x), smallest_normalized_number -> fcmp ueq x, 0.0
fcmp oge fabs(x), smallest_normalized_number -> fcmp one x, 0.0
fcmp ult fabs(x), smallest_normalized_number -> fcmp ueq x, 0.0
The device libraries have a few range checks that look like
this for denormal handling paths.
Move getDebugValueLoc so that it can be accessed from DebugInfo.h for the
Assignment Tracking patch stack and remove redundant parameter Src.
Reviewed By: jryans
Differential Revision: https://reviews.llvm.org/D132357
Gather nodes are vectorized as simply vector of the scalars instead of
relying on the actual node. It leads to the fact that in some cases
we may miss incorrect transformation (non-matching set of scalars is
just ended as a gather node instead of possible vector/gather node).
Better to rely on the actual nodes, it allows to improve stability and
better detect missed cases.
Differential Revision: https://reviews.llvm.org/D135174
deleteAssignmentMarkers(const Instruction *Inst) does exactly as you'd expect -
it deletes any dbg.assign intrinsics linked to Inst.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D133576
Currently call slot optimization may be prevented because the
lifetime markers for the destination only start after the call.
In this case, rather than aborting the transform, we should move
the lifetime.start before the call to enable the transform.
Differential Revision: https://reviews.llvm.org/D135886
Those methods were added long time ago. Now we get the same methods generated by tablegen, so there is no need for duplicates.
Differential Revision: https://reviews.llvm.org/D137544
Adds a 'targetFallback' field to relationships in symbol graph that
contains the plain name of the relationship target. This is useful for
clients when the relationship target symbol is not available.
Differential Revision: https://reviews.llvm.org/D136455
Some "inner" defs were being overriding "outer" SchedRW defs, making it very tricky to track what schedule was being used.
Noticed as I'm trying to remove a lot of unnecessary shift/rotate RMW overrides from the scheduler models
The Assignment Tracking debug-info feature is outlined in this RFC:
https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir
Overview
It's possible to find intrinsics linked to an instruction by looking at the
MetadataAsValue uses of the attached DIAssignID. That covers instruction ->
intrinsic(s) lookup. Add a global DIAssignID -> instruction(s) map which gives
us the ability to perform intrinsic -> instruction(s) lookup. Add plumbing to
keep the map up to date through optimisations and add utility functions
including two that perform those lookups. Finally, add a unittest.
Details
In llvm/lib/IR/LLVMContextImpl.h add AssignmentIDToInstrs which maps DIAssignID
* attachments to Instruction *s. Because the DIAssignID * is the key we can't
use a TrackingMDNodeRef for it, and therefore cannot easily update the mapping
when a temporary DIAssignID is replaced.
Temporary DIAssignID's are only used in IR parsing to deal with metadata
forward references. Update llvm/lib/AsmParser/LLParser.cpp to avoid using
temporary DIAssignID's for attachments.
In llvm/lib/IR/Metadata.cpp add Instruction::updateDIAssignIDMapping which is
called to remove or add an entry (or both) to AssignmentIDToInstrs. Call this
from Instruction::setMetadata and add a call to setMetadata in Intruction's
dtor that explicitly unsets the DIAssignID so that the mappging gets updated.
In llvm/lib/IR/DebugInfo.cpp and DebugInfo.h add utility functions:
getAssignmentInsts(const DbgAssignIntrinsic *DAI)
getAssignmentMarkers(const Instruction *Inst)
RAUW(DIAssignID *Old, DIAssignID *New)
deleteAll(Function *F)
These core utils are tested in llvm/unittests/IR/DebugInfoTest.cpp.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D132224
Counterpart to "usedAsMutableReference". Just as for references, there
are const and non-const pointer parameters, and it's valuable to be able
to have different highlighting for the two cases at the call site.
We could have re-used the existing modifier, but having a dedicated one
maximizes client flexibility.
Reviewed By: nridge
Differential Revision: https://reviews.llvm.org/D130015
The Assignment Tracking debug-info feature is outlined in this RFC:
https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir
Add the llvm.dbg.assign intrinsic boilerplate. This updates the textual-bitcode
roundtrip test to also check that round-tripping with the intrinsic works.
The intrinsic marks the position of a source level assignment.
The llvm.dbg.assign interface looks like this (each parameter is wrapped in
MetadataAsValue, and Value * type parameters are first wrapped in
ValueAsMetadata):
void @llvm.dbg.assign(Value *Value,
DIExpression *ValueExpression,
DILocalVariable *Variable,
DIAssignID *ID,
Value *Address,
DIExpression *AddressExpression)
The first three parameters look and behave like an llvm.dbg.value. ID is a
reference to a store. The intrinsic is "linked to" instructions in the same
function that use the same ID as an attachment. That is mostly conceptual at
this point; the two-way link infrastructure will come in another patch. Address
is the destination address of the store and it is modified by
AddressExpression. LLVM currently encodes variable fragment information in
DIExpressions, so as an implementation quirk the FragmentInfo for Variable is
contained within ValueExpression only.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D132223
The loop-carried dependency detection logic in isLoopCarriedDep relies
on the load and store using the same definition for the base register.
This misses the case of post-increment loads and stores whose base
register are different PHI initialized from the same initial value.
This commit extends the logic to accept the load and store having
different PHI base address provided that they had the same initial value
when entering the loop and are incremented by the same amount in each
loop.
Reviewed By: bcahoon
Differential Revision: https://reviews.llvm.org/D136463
This is to fix the wrong checking introdued in D64195.
`std {{[0-9]+}}, 16(1)` is the store for the lr register. It breaks
previous testing point before D64195.
The original change compares `APInt` to check the constant is the same or not. But shift amount may have different constant types.
So, this patch change to use `getZExtValue` to compare constant value.
Original comment:
The special case for bit extraction pattern is `((x >> C) & mask) << C`.
It can be combined to `x & (mask << C)` by return true in isDesirableToCommuteWithShift.
Fix: #56427
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D136014
The Assignment Tracking debug-info feature is outlined in this RFC:
https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir
Add the DIAssignID metadata attachment boilerplate. Includes a textual-bitcode
roundtrip test and tests that the verifier and parser catch badly formed IR.
This piece of metadata links together stores (used as an attachment) and the
yet-to-be-added llvm.dbg.assign debug intrinsic (used as an operand).
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D132222