Patch #102460 already implements separate DT/LI/SE for parallel
subfunctions. Crashes have been reported when the region generator uses
the original function's DT while creating a new parallel subfunction,
due to the checks in #101198. This patch fixes those cases by switching
the DT/LI while generating a parallel function with the region
generator.
Fixes #117877
After patch 5ce47a5, some assertion failures occur in Polly. The issue
arises because an instruction from one function queries the Dominator
Tree (DT) of another function. To fix this, the `isHoistableLoad`
function now skips instructions that belong to a different function
while iterating.
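The skip described above can be sketched in isolation. This is a minimal illustration, not the Polly code: `Function`, `Instruction` and `countUsersInFunction` are hypothetical stand-ins, and the real `isHoistableLoad` performs additional checks.

```cpp
#include <vector>

// Hypothetical stand-ins for llvm::Function and llvm::Instruction.
struct Function {
  int id;
};
struct Instruction {
  const Function *parent;
};

// Counts the users that may be checked against analyses built for F.
// Users that live in another function are skipped, because a dominator
// tree built for F must not be queried with them.
int countUsersInFunction(const Function &F,
                         const std::vector<Instruction> &users) {
  int checked = 0;
  for (const Instruction &user : users) {
    if (user.parent != &F)
      continue; // belongs to a different function: skip
    ++checked;  // a DT query for F would be safe here
  }
  return checked;
}
```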
The patch sets the vectorization metadata to false for Polly's fallback
loops. These are the loops executed when RTCs fail. This minimizes the
multiple loop versioning carried out by Polly and subsequently by the
Loop Vectorizer.
---------
Co-authored-by: Michael Kruse <github@meinersbur.de>
The patch adds a nullptr check before accessing the loop blocks in the
'hasPossiblyDistributableLoop' function. The existing check for the
loop's containment in the region does not capture nullptr cases when
the region covers the entire function. Therefore, it is better to exit
if the basic block isn't part of any loop.
Fixes #113772.
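A rough sketch of the guard pattern, assuming a lookup that returns nullptr for blocks outside of any loop (as `LoopInfo::getLoopFor` does). The types, the map-based lookup and the size criterion are hypothetical and only illustrate the early exit:

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical loop record: just the IDs of the blocks it contains.
struct Loop {
  std::vector<int> blocks;
};

// Stand-in for a loop lookup: nullptr when the block is in no loop.
const Loop *getLoopFor(const std::map<int, const Loop *> &loopOf, int block) {
  auto it = loopOf.find(block);
  return it == loopOf.end() ? nullptr : it->second;
}

// Illustrative predicate: exits early instead of dereferencing nullptr.
bool blockIsInLoopWithAtLeast(const std::map<int, const Loop *> &loopOf,
                              int block, std::size_t minBlocks) {
  const Loop *L = getLoopFor(loopOf, block);
  if (!L)
    return false; // block is not part of any loop
  return L->blocks.size() >= minBlocks;
}
```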
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e., just lookup, no creation).
(this is the part related to bolt, lld and mlir)
Without these explicit includes, removing other headers, which
implicitly include llvm-config.h, may have non-trivial side effects. For
example, `clangd` may report even `llvm-config.h` as "not used" in case
it defines a macro that is explicitly used with #ifdef. The problem is
amplified by different build configs, which use different sets of
macros.
As specified in the docs,
1) raw_string_ostream is always unbuffered and
2) the underlying buffer may be used directly
(see 65b13610a5226b84889b923bae884ba395ad084d for further reference)
* Don't call raw_string_ostream::flush(), which is essentially a no-op.
* Avoid unneeded calls to raw_string_ostream::str(), to avoid excess indirection.
CodeGenIntrinsic changes:
- Use `const` Record pointers, and `StringRef` when possible.
- Default initialize several fields with their definition instead of in
the constructor.
- Simplify various string checks in the constructor using StringRef
starts_with()/ends_with() functions.
- Eliminate first argument to `setDefaultProperties` and use `TheDef`
class member instead.
IntrinsicEmitter changes:
- Emit `namespace llvm::Intrinsic` instead of nested namespaces.
- End generated comments with a period (`.`).
- Use range based for loops, and early continue within loops.
- Emit `static constexpr` instead of `static const` for arrays.
- Change `compareFnAttributes` to use std::tie() to compare intrinsic
attributes and return a default value when all attributes are equal.
STLExtras:
- Add std::replace wrapper which takes a range.
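Two of the items above lend themselves to a small standalone sketch: comparing records with std::tie and falling back to a default value on equality, and a std::replace wrapper that takes a range. The `Attrs` fields and the names `compareAttrs` and `replaceInRange` are assumptions made for illustration, not the actual TableGen or STLExtras code.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <tuple>

// Hypothetical attribute record; the real intrinsic attributes differ.
struct Attrs {
  bool noUnwind;
  bool noReturn;
  int memoryEffects;
};

// Lexicographic comparison through std::tie. When every field is equal,
// the caller-provided default value is returned instead.
bool compareAttrs(const Attrs &L, const Attrs &R, bool defaultValue) {
  auto lhs = std::tie(L.noUnwind, L.noReturn, L.memoryEffects);
  auto rhs = std::tie(R.noUnwind, R.noReturn, R.memoryEffects);
  if (lhs != rhs)
    return lhs < rhs;
  return defaultValue;
}

// Range-taking wrapper around std::replace.
template <typename Range, typename T>
void replaceInRange(Range &&range, const T &oldValue, const T &newValue) {
  std::replace(std::begin(range), std::end(range), oldValue, newValue);
}
```

With std::tie, adding another field to the comparison only means adding it to both tuples, which avoids a chain of nested if statements.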
DominatorTree, LoopInfo, and ScalarEvolution are function-level analyses
that expect to be called only on instructions and basic blocks of the
function they were originally created for. When Polly outlined a
parallel loop body into a separate function, it reused the same
analyses, which seemed to work until new checks were added in #101198.
This patch creates new analyses for the subfunctions. GenDT, GenLI, and
GenSE now refer to the analyses of the current region of code. Outside
of an outlined function, they refer to the same analysis as used for the
SCoP, but are substituted within an outlined function.
In addition to avoiding cross-function queries of DT/LI/SE, we must not
create SCEVs that refer to a mix of expressions for old and generated
values. Currently, SCEVs themselves do not "remember" which
ScalarEvolution analysis they were created for, but mixing them is just
as unexpected as using DT/LI across function boundaries. Hence
`SCEVLoopAddRecRewriter` was combined into `ScopExpander`.
`SCEVLoopAddRecRewriter` only replaced induction variables but left
SCEVUnknowns to reference the old function. `SCEVParameterRewriter`
would have done so but its job was effectively superseded by
`ScopExpander`, and now also `SCEVLoopAddRecRewriter`. Some issues
persist but are marked with a FIXME in the code. Changing them could
make this patch no longer NFC.
The base concept is the same as in the existing reduction algorithm,
where we get the list of candidate pairs <store,load>. But the existing
algorithm works only if there is a single binary operation between the
load and the store.
Example: sum += a[i];
This algorithm extends it to work with more than a single binary
operation as well. It is implemented using data flow reduction detection
at the basic block level. We propagate the loads, the number of times
each load is used (flows into an instruction), and the binary operations
performed until we reach a store.
Example: sum += a[i] + b[i];
```
sum(Ld) a[i](Ld)
\ + /
tmp b[i](Ld)
\ + /
sum(St)
```
In the above case, the candidate pairs are formed by associating sum
with all of its load inputs, which are sum, a[i] and b[i]. Check
functions are then used to filter out the valid reduction pair, i.e.
{sum, sum}.
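The propagation can be sketched on a toy representation of one basic block. Everything here is an assumption made for illustration: values and memory locations are plain strings, `Flow`, `BinOp`, `propagate` and `validPairs` are hypothetical names, and the filter (same location, load used exactly once, a single operator kind) only approximates what the real check functions do.

```cpp
#include <initializer_list>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Per-value summary: how often each load flows into the value and which
// binary operators were applied on the way.
struct Flow {
  std::map<std::string, int> loadUses;
  std::set<char> ops;
};

// One binary operation of the basic block: result = lhs op rhs.
struct BinOp {
  std::string result, lhs, rhs;
  char op;
};

// Walks the operations in order and returns the summary of storedValue.
Flow propagate(const std::vector<std::string> &loads,
               const std::vector<BinOp> &insts,
               const std::string &storedValue) {
  std::map<std::string, Flow> flows;
  for (const std::string &ld : loads)
    flows[ld].loadUses[ld] = 1; // a load flows into itself once
  for (const BinOp &inst : insts) {
    Flow out;
    out.ops.insert(inst.op);
    for (const std::string *operand : {&inst.lhs, &inst.rhs}) {
      const Flow &in = flows[*operand];
      for (const auto &entry : in.loadUses)
        out.loadUses[entry.first] += entry.second;
      out.ops.insert(in.ops.begin(), in.ops.end());
    }
    flows[inst.result] = out;
  }
  return flows[storedValue];
}

// Assumed filter: keep <store,load> pairs on the same location where the
// load is used exactly once and only one operator kind was seen.
std::vector<std::pair<std::string, std::string>>
validPairs(const Flow &flow, const std::string &storeLocation) {
  std::vector<std::pair<std::string, std::string>> pairs;
  if (flow.ops.size() != 1)
    return pairs;
  for (const auto &entry : flow.loadUses)
    if (entry.first == storeLocation && entry.second == 1)
      pairs.emplace_back(storeLocation, entry.first);
  return pairs;
}
```

For `sum += a[i] + b[i]`, the stored value collects the three loads sum, a[i] and b[i], and only the pair on sum itself passes the filter.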
---------
Co-authored-by: Michael Kruse <github@meinersbur.de>
Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock
alongside a BasicBlock::iterator, using the fact that we can now get the
parent basic block from the iterator even if it points to the sentinel.
This patch removes the BasicBlock argument from each constructor or call
to setInsertPoint.
This has no functional effect, but later on as we look to remove the
`Instruction *InsertBefore` argument from instruction-creation
(discussed
[here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)),
this will simplify the process by allowing us to deprecate the
InsertPosition constructor directly and catch all the cases where we use
instructions rather than iterators.
Move PassInstrumentationAnalysis into PassInstrumentation.h and stop
including it in PassManager.h (effectively inverting the direction of
the dependency).
Most places using PassManager are not interested in PassInstrumentation,
and we no longer have any uses of it in PassManager.h itself (only in
PassManagerImpl.h).
Update the folder titles for targets in the monorepository that have not
been taken care of for some time. These are the folders that targets are
organized into in Visual Studio and Xcode
(`set_property(TARGET <target> PROPERTY FOLDER "<title>")`)
when using the respective CMake IDE generator.
* Ensure that every target is in a folder
* Use a folder hierarchy with each LLVM subproject as a top-level folder
* Use consistent folder names between subprojects
* When using target-creating functions from AddLLVM.cmake, automatically
deduce the folder. This reduces the number of
`set_property`/`set_target_property` calls, but they are still necessary
when `add_custom_target`, `add_executable`, `add_library`, etc. are
used. An LLVM_SUBPROJECT_TITLE definition is used for that in each
subproject's root CMakeLists.txt.
Reverts commit d68826dfbd98, which changes the previous default behavior
of always breaking before a stream insertion operator `<<` if both
operands are string literals.
Also reverts the related commits 27f547968cce and bf05be5b87fc.
See the discussion in #88483.
This patch addresses the performance suggestions made by the cppcheck
static analyzer for a couple of files. Here we use const references for
the suggested function arguments.
Fixes #82263.
This flag enables the user to print debug info from all the passes and
helpers inside Polly at once. This also helps a novice user work with
Polly without having to know which parts of Polly have actually kicked
in and pass them explicitly via -debug-only.
These are the last remaining "trivial" changes to passes that use
Instruction pointers for insertion. All of this should be NFC, it's just
changing the spelling of how we identify a position.
In one or two locations, I'm also switching uses of getNextNode etc to
using std::next with iterators. This too should be NFC.
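The difference between a pointer to the next node and an iterator position can be illustrated with a standard container. This is only an analogy using std::list, not the LLVM instruction list; `insertAfter` is a hypothetical helper.

```cpp
#include <iterator>
#include <list>
#include <string>

// Inserts `name` directly after the element `after` refers to. The
// position is the iterator std::next(after), which stays valid even when
// `after` is the last element, because end() is a legal insert position.
std::list<std::string>::iterator
insertAfter(std::list<std::string> &block,
            std::list<std::string>::iterator after,
            const std::string &name) {
  return block.insert(std::next(after), name);
}
```

A pointer-based "next node" lookup has no such end position and yields a null pointer after the last element, so callers need a separate code path for appending.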
---------
Merged by: Stephen Tozer <stephen.tozer@sony.com>
It's becoming potentially unsafe to insert a PHI instruction using a plain
Instruction pointer. Switch all the remaining sites that create and insert
PHIs to use iterators instead. For example, the code in
ComplexDeinterleavingPass.cpp is definitely at-risk of mixing PHIs and
debug-info.
To fix the long compile time issue of the Schedule Optimizer, patch
#77280 set an upper cap on the maximum number of ISL operations. When
bailing out because the ISL quota was hit, the error handling behavior
was restored manually. This commit replaces the restoration code with
the IslMaxOperationsGuard helper and also removes a redundant early
return.
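The benefit of a guard object over manual restoration can be shown with a generic RAII sketch. `Context`, `MaxOperationsGuard` and `computeSchedule` are hypothetical and do not reproduce the interface of IslMaxOperationsGuard or of ISL; they only show that the previous settings come back on every exit path.

```cpp
#include <cstddef>

// Hypothetical context with an operation budget (0 means unlimited).
struct Context {
  std::size_t maxOperations = 0;
  std::size_t usedOperations = 0;
  bool abortOnError = true;
  bool quotaExceeded() const {
    return maxOperations != 0 && usedOperations > maxOperations;
  }
};

// RAII guard: installs a budget and non-aborting error handling on
// construction and restores the previous settings on destruction.
class MaxOperationsGuard {
  Context &ctx;
  std::size_t savedMax;
  bool savedAbort;

public:
  MaxOperationsGuard(Context &c, std::size_t limit)
      : ctx(c), savedMax(c.maxOperations), savedAbort(c.abortOnError) {
    ctx.maxOperations = limit;
    ctx.usedOperations = 0;
    ctx.abortOnError = false;
  }
  ~MaxOperationsGuard() {
    ctx.maxOperations = savedMax;
    ctx.abortOnError = savedAbort;
  }
  MaxOperationsGuard(const MaxOperationsGuard &) = delete;
  MaxOperationsGuard &operator=(const MaxOperationsGuard &) = delete;
};

// Returns false (bails out) once the budget is exhausted. No manual
// restoration is needed on either return path.
bool computeSchedule(Context &ctx, std::size_t work, std::size_t limit) {
  MaxOperationsGuard guard(ctx, limit);
  for (std::size_t i = 0; i < work; ++i) {
    ++ctx.usedOperations;
    if (ctx.quotaExceeded())
      return false;
  }
  return true;
}
```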
The existing reduction detection algorithm performs two types of memory
checks before marking a load-store pair as a reduction.
The second check verifies that no other memory access in the ScopStmt
overlaps with the memory of the load and store that form the reduction.
The existing check misses cases where there could be a probable overlap,
such as
A[V] += A[P];
In the above case, there is a chance of overlap between A[V] and A[P],
which is missed.
This commit addresses this by removing the parameter from the space
before checking for a compatible space.
Part 1 of this patch:
[75297](https://github.com/llvm/llvm-project/pull/75297)
Polly currently uses `getDebugLoc` in a few places to produce diagnostic
output; this is correct when interacting with specific instructions, but
may be incorrect when dealing with instruction ranges if debug
intrinsics are included. As a general rule, the debug locations attached
to debug intrinsics may be misleading compared to the surrounding
instructions, and are not generally used for anything other than
determining variable scope info; the recommended approach is therefore
to use `getStableDebugLoc` instead, which skips over debug intrinsics.
This is necessary to fix test failures that occur when enabling
non-instruction debug info, which removes debug intrinsics from basic
blocks and thus alters the diagnostic output of Polly (despite causing
no functional change).
The existing reduction detection algorithm performs two types of memory
checks before marking a load-store pair as a reduction.
The first checks whether the load and store point to the same memory.
Right now, this check detects the following case as a reduction:
sum[0] = sum[1] + A[i]
This is because the check compares only the base of the memory
addresses involved and not their indices. This patch addresses this
issue and introduces some debug prints. It also adds a couple of test
cases to verify the functionality of the patch.
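The gap between the two checks can be shown with a simplified model. Assume an access is just a base name plus constant indices; the `Access` type and both helpers are hypothetical, and Polly itself compares access relations rather than integer vectors.

```cpp
#include <string>
#include <vector>

// Simplified memory access: base array plus constant subscripts.
struct Access {
  std::string base;
  std::vector<int> indices;
};

// Base-only check: treats sum[0] and sum[1] as the same memory.
bool sameBase(const Access &a, const Access &b) { return a.base == b.base; }

// Stricter check: base and indices must both match.
bool sameLocation(const Access &a, const Access &b) {
  return a.base == b.base && a.indices == b.indices;
}
```

For `sum[0] = sum[1] + A[i]`, the base-only check accepts the pair of accesses to sum, while the stricter check rejects it.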
This changes the AliasSetTracker to track memory locations instead of
pointers in its alias sets. The motivation for this is outlined in an RFC
posted on LLVM discourse:
https://discourse.llvm.org/t/rfc-dont-merge-memory-locations-in-aliassettracker/73336
In the data structures of the AST implementation, I made the choice to
replace the linked list of `PointerRec` entries (that had to go anyway)
with a simple flat vector of `MemoryLocation` objects, but for the
`AliasSet` objects referenced from a lookup table, I retained the
mechanism of a linked list, reference counting, forwarding, etc. The
data structures could be revised in a follow-up change.
There is no upper cap on how long the current Schedule Optimizer may
take to compute a schedule. In some cases, computing the schedule takes
a very long compile time, resulting in hang-like behavior. This patch
introduces a flag 'polly-schedule-computeout' to pass the cap, which is
initialized to 300000. The patch handles compute-out cases by bailing
out and exiting gracefully.
Fixed the test that failed in the previous commit.
Fixes #69090
This reverts commit d6c4d4c9b910e8ad5ed7cd4825a143742041c1f4.
Broke buildbots with asserts disabled; -debug-only is only available in
asserts builds.
There is no upper cap on how long the current Schedule Optimizer may
take to compute a schedule. In some cases, computing the schedule takes
a very long compile time, resulting in hang-like behavior. This patch
introduces a flag 'polly-schedule-computeout' to pass the cap, which is
initialized to 300000. The patch handles compute-out cases by bailing
out and exiting gracefully.
Fixes #69090
Otherwise, linking may fail if the user provides an additional library
to link with via CMAKE_EXE_LINKER_FLAGS. A concrete example is using a
custom allocator; LLVMSupport provides the needed -lpthread in that
case.
Closes: https://github.com/llvm/llvm-project/pull/65424
This patch pulls out the memory checks from the base reduction detection
algorithm. This is the first one in the reduction patch series, to
reduce the difference in future patches.
The header file has been deprecated since:
commit f09cf34d00625e57dea5317a3ac0412c07292148
Author: Archibald Elliott <archibald.elliott@arm.com>
Date: Tue Dec 20 10:24:02 2022 +0000
C++20 comes with std::erase to erase a value from std::vector. This
patch renames llvm::erase_value to llvm::erase for consistency with
C++20.
We could make llvm::erase more similar to std::erase by having it
return the number of elements removed, but I'm not doing that for now
because nobody seems to care about that in our code base.
Since there are only 50 occurrences of erase_value in our code base,
this patch replaces all of them with llvm::erase and deprecates
llvm::erase_value.
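For readers unfamiliar with the helper, a range-based erase can be sketched with the erase-remove idiom. This is a minimal illustration under the name `eraseValue` (hypothetical), returning void as described above, not the actual STLExtras implementation.

```cpp
#include <algorithm>
#include <vector>

// Removes every element equal to `value` from the container. Returns
// nothing, unlike C++20 std::erase, which reports the number removed.
template <typename Container, typename T>
void eraseValue(Container &c, const T &value) {
  c.erase(std::remove(c.begin(), c.end(), value), c.end());
}
```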
This removes `CreateMalloc` from `CallInst` and adds it to the `IRBuilderBase`
class.
We no longer need the `Instruction *InsertBefore` and
`BasicBlock *InsertAtEnd` arguments of the `createMalloc` helper
function because we're using `IRBuilder` now. That's why we also don't
need four `CreateMalloc` functions, but only two.
Differential Revision: https://reviews.llvm.org/D158861