This is a fix for issue #87758, where fast-math flags are not
propagated to all builtins.
It seems that pragmas with fast-math flags were only propagated to calls
of unary floating-point builtins. This patch propagates them to binary
and ternary floating-point builtins as well.
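A hedged illustration (not taken from the patch's tests): with the fix, a pragma-scoped fast-math flag should reach a ternary builtin call as well.
```
// Hypothetical example: the 'reassoc' fast-math flag from the pragma is
// now also attached to the call emitted for this ternary builtin.
float f(float x, float y, float z) {
#pragma clang fp reassociate(on)
  return __builtin_fmaf(x, y, z);
}
```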
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator== outnumbers StringRef::equals by a factor of 25
under llvm/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for
!Long.Expression.equals("str") vs Long.Expression != "str".
Adjust some of the [rand.dist] critical values that are too strict
- Most critical values are determined empirically by running each test
51 times with a different PRNG seed and finding the smallest symmetric
interval around the median that contains 90% of the sample means,
variances, etc.
- For the Kolmogorov-Smirnov tests, the alpha=0.1 critical value for
large N is 1.224/sqrt(N).
- For normally distributed variates, the sample kurtosis is distributed
as Normal(0, 24/N). For N=1e5, this gives a 90% confidence interval of
0+/-0.0255 (see the worked check below). For Binomial(40, 0.25), which
is approximately normal, the kurtosis is -0.0167, so the relative 90% CI
is large, on the order of 0.0255/0.0167 = 153%. In most cases the
distribution of the sample kurtosis isn't known analytically, but
similarly large relative tolerances can be expected if the kurtosis is
near zero.
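For reference, a worked check of the kurtosis interval quoted above, using the two-sided 90% normal quantile 1.645:
```
\sigma = \sqrt{24/N} = \sqrt{24/10^{5}} \approx 0.0155, \qquad
0 \pm 1.645\,\sigma \approx 0 \pm 0.0255
```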
If the shuffle mask contains no undef elements, then we can move the freeze through a shuffle node.
This requires special case handling to create a new ShuffleVectorSDNode.
Includes VECTOR_SHUFFLE support for isGuaranteedNotToBeUndefOrPoison / canCreateUndefOrPoison.
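A minimal sketch of the fold, assuming a DAGCombiner-style context where `N` is the FREEZE node and `N0` its shuffle operand (the actual patch may be structured differently):
```
// freeze(shuffle(x, y, mask)) -> shuffle(freeze(x), freeze(y), mask),
// legal only when the mask contains no undef (-1) lanes.
if (N0.getOpcode() == ISD::VECTOR_SHUFFLE) {
  auto *SVN = cast<ShuffleVectorSDNode>(N0.getNode());
  if (none_of(SVN->getMask(), [](int M) { return M < 0; }))
    return DAG.getVectorShuffle(N0.getValueType(), SDLoc(N),
                                DAG.getFreeze(SVN->getOperand(0)),
                                DAG.getFreeze(SVN->getOperand(1)),
                                SVN->getMask());
}
```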
Parsing support for floating point types was missing a few features:
1. Parsing floating point attributes from integer literals was supported
only for types with a bitwidth smaller than or equal to 64.
2. Downstream users could not use `AsmParser::parseFloat` to parse float
types which are printed as integer literals.
This commit addresses both these points. It extends
`Parser::parseFloatFromIntegerLiteral` to support arbitrary bitwidths,
and exposes a new API to parse arbitrary floating-point values given an
`fltSemantics` as input. The usage of this new API is demonstrated in
the Test Dialect.
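A hedged usage sketch (the signature is assumed from the description above, not verified against the patch):
```
// Hypothetical downstream parser code: parse a bfloat16 value, accepting
// the integer-literal form that such types are printed as.
APFloat value(APFloat::BFloat());
if (parser.parseFloat(APFloat::BFloat(), value))
  return failure();
```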
When runOnEachFunctionWithUniqueAllocId is invoked with
ForceSequential=true, the current implementation runs the work function
with AllocId==0, which is the Id of the shared, non-unique, default
AnnotationAllocator.
However, the documentation for runOnEachFunctionWithUniqueAllocId
states:
```
/// Perform the work on each BinaryFunction except those that are rejected
/// by SkipPredicate, and create a unique annotation allocator for each
/// task. This should be used whenever the work function creates annotations to
/// allow thread-safe annotation creation.
```
Therefore, even when ForceSequential==true, a unique AllocId should be
used, i.e. different from 0.
In the current upstream BOLT this is presumably not depended on, but it
is needed to reduce memory usage for analyses that use a lot of
memory/annotations. Examples are the pac-ret and stack-clash analyses
that currently have prototype implementations as described in
https://discourse.llvm.org/t/rfc-bolt-based-binary-analysis-tool-to-verify-correctness-of-security-hardening/78148
These analyses use the DataFlowAnalysis framework to sometimes store
quite a lot of information on each MCInst. They run in parallel on each
function. When the dataflow analysis is finished, the annotations on
each MCInst can be removed, hugely saving on memory consumption. The
only annotations that need to remain are those that indicate some
unexpected properties somewhere in the binary.
Fixing this bug enables deleting the memory used by that huge number of
DataFlowAnalysis annotations (by invoking
BC.MIB->freeValuesAllocator(AllocatorId)), even when running with
--no-threads. Without this fix, invoking
BC.MIB->freeValuesAllocator(AllocatorId) also deletes the memory for all
other annotations, as AllocatorId is 0.
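A sketch of the intended behavior (helper names are my reading of the BOLT API; the actual change may differ): the sequential path should also allocate a fresh annotation allocator rather than reusing Id 0.
```
// Even when running sequentially, give the work function its own
// allocator so its annotations can later be freed in isolation.
MCPlusBuilder::AllocatorIdTy AllocId =
    BC.MIB->initializeNewAnnotationAllocator();
for (auto &BFI : BC.getBinaryFunctions()) {
  BinaryFunction &BF = BFI.second;
  if (!SkipPredicate(BF))
    WorkFunction(BF, AllocId);
}
```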
---------
Co-authored-by: Maksim Panchenko <maks@meta.com>
The InstCombine contributor guide already says:
> Handle non-splat vector constants if doing so is free, but do
> not add handling for them if it adds any additional complexity
> to the code.
This change strengthens this guideline to explicitly discourage
asking (new) contributors to implement non-splat support during code
reviews. Doing so will almost certainly increase the number of
necessary review iterations, or result in outright contradictory review
feedback, as different people are willing to accept a different degree
of complexity for non-splat vector support.
Targets with no `-fsplit-stack` support now emit `ld.lld: error: target
doesn't support split stacks` instead of `UNREACHABLE executed` with a
backtrace asking the user to report a bug.
Resolves #88061
This change corrects an invalid behavior in pass
`--buffer-loop-hoisting`. The pass is in charge of extracting buffer
allocations (e.g., `memref.alloca`) from loop regions (e.g., `scf.for`)
when possible. This works fine for loops with sequential execution
semantics. However, a buffer allocated in the body of a parallel loop
may be concurrently accessed by multiple threads to store their local
data. Extracting such a buffer from the loop causes all threads to
wrongly share the same memory region.
In the following example, dimension 1 of the input tensor is reversed.
Dimension 0 is traversed with a parallel loop.
```
func.func @f(%input: memref<2x3xf32>) -> memref<2x3xf32> {
  %c0 = index.constant 0
  %c1 = index.constant 1
  %c2 = index.constant 2
  %c3 = index.constant 3
  %output = memref.alloc() : memref<2x3xf32>
  scf.parallel (%index) = (%c0) to (%c2) step (%c1) {
    // Create subviews for working input and output slices
    %input_slice = memref.subview %input[%index, 2][1, 3][1, -1] : memref<2x3xf32> to memref<1x3xf32, strided<[3, -1], offset: ?>>
    %output_slice = memref.subview %output[%index, 0][1, 3][1, 1] : memref<2x3xf32> to memref<1x3xf32, strided<[3, 1], offset: ?>>
    // Copy the input slice into this temporary buffer. This intermediate
    // copy is unnecessary, but is used for illustration purposes.
    %temp = memref.alloc() : memref<1x3xf32>
    memref.copy %input_slice, %temp : memref<1x3xf32, strided<[3, -1], offset: ?>> to memref<1x3xf32>
    // Copy temporary buffer into output slice
    memref.copy %temp, %output_slice : memref<1x3xf32> to memref<1x3xf32, strided<[3, 1], offset: ?>>
    scf.reduce
  }
  return %output : memref<2x3xf32>
}
```
The patch submitted here prevents `%temp = memref.alloc() :
memref<1x3xf32>` from being hoisted when the containing op is
`scf.parallel` or `scf.forall`. A new op trait called
`HasParallelRegion` is introduced and assigned to these two ops to
indicate that their regions have parallel execution semantics.
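A hedged sketch of how the pass can consume the trait (the helper name is hypothetical):
```
// Hoisting must stop at any ancestor op whose regions execute in
// parallel, e.g. scf.parallel or scf.forall.
static bool isHoistingBarrier(mlir::Operation *op) {
  return op->hasTrait<mlir::OpTrait::HasParallelRegion>();
}
```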
We need it to test isel-related passes. Currently
`verifyMachineFunction` is incomplete (no LiveIntervals support), but it
is enough for testing isel passes; we will migrate to the complete
`MachineVerifierPass` in the future.
While we don't currently rewrite the hints on manually hot/cold-hinted
allocations, enable optionally matching profiles onto those allocations
as a first step toward being able to do so.
Explicitly checking whether the library function is in the list of
operator new functions also fixes one limitation of the prior call to
isNewLikeFn.
Some operator new calls (those that specify nothrow) are considered
Malloc-like because they may return null. We want to be able to match
and rewrite these. Therefore the new test uses a nothrow variant to test
the fix for this as well.
This is a new take on #89111. Now that #90040 is merged, this has become
trivial to implement. The added test shows the kind of benefit that we
get from this: now dim-of-expand-shape naturally folds without us
needing to implement an ad-hoc folding rewrite.
I'm not entirely sure what the criteria for 'bleeding-edge' used to be,
but at this point it seems to be the set of all added features in LLVM.
This adds remaining features to bleeding-edge config.
This fixes a bug in #90152 where `operator=` was never looked up in the
current instantiation, resulting in `<` never being interpreted as the
start of a template argument list.
Since function templates are not copy/move assignment operators, the fix
is accomplished by allowing lookup in the current instantiation for
`operator=` when looking up a template name.
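A hedged illustration (not necessarily the patch's test case) of code that relies on this lookup:
```
// Inside the class template, 'operator=<int>' must be recognized as a
// template-id; previously the '<' was parsed as a less-than operator.
template <typename T>
struct S {
  template <typename U> void operator=(U);
  void f() {
    operator=<int>(0);
  }
};
```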
Pass in CallLoweringInfo (CLI) instead of passing in the various fields
directly. Also pass in CCState (CCInfo), which is computed in both the
caller and the callee for a minor efficiency saving. There may also be a
small correctness improvement for sibcalls with vectorcall, which has an
odd way of recomputing argument locations.
This is a step towards improving the handling of musttail on armv7,
which we have numerous issues filed about in our tracker.
I took inspiration for this from the RISCV tail call eligibility check,
which uses a similar prototype.
This behavior is true for all attributes, but it can be surprising for
attributes which have function-wide effects, such as `optnone` and
`target`. Most other function attributes affect the prototype or
semantics, but do not affect code generation in the function body. I
believe it is worth calling this out in the documentation of these
function-wide attributes. There may be more; these were the two that
came to mind.
While adding a UI feature in VSCode to toggle hex/dec in the variables
view window, I noticed that it does not work after the second toggle. I
then found a bug: we only explicitly set the hex format and never reset
it back to default on subsequent toggles. The new test demonstrates the
bug.
This PR resets the format back to default when not using hex. One
complexity is that we explicitly set the register value format to
AddressInfo, which shouldn't be overridden by the default or hex
settings.
---------
Co-authored-by: jeffreytan81 <jeffreytan@fb.com>
This reverts commit 2e3e0868748635b779ba89a772eae3664bd822e4. It caused
quadratic slowdown at compilation time in some cases. See the comments
in the original PR: https://github.com/llvm/llvm-project/pull/89069
These check lines break as of 91446e2aa687e due to changes in how LLVM
handles debug information. Since debug information isn't important to
what this test is verifying, we can remove the check lines.
There is no real motivation for this change other than to highlight a
case where the new `Checked` matcher API can handle non-splat-vecs
without increasing code complexity.
Closes #85676
The new API is:
`m_CheckedInt(Lambda)`/`m_CheckedFp(Lambda)`
- Matches non-undef constants s.t. `Lambda(ele)` is true for all
elements.
`m_CheckedIntAllowUndef(Lambda)`/`m_CheckedFpAllowUndef(Lambda)`
- Matches constants/undef s.t. `Lambda(ele)` is true for all
elements.
The goal with these is to be able to replace the common usage of:
```
match(X, m_APInt(C)) && CustomCheck(C)
```
with
```
match(X, m_CheckedInt(C, CustomChecks));
```
The rationale is that we often ignore non-splat vectors because there
are no good APIs to handle them with, and it's not worth increasing code
complexity for such cases.
The hope is that the API creates a common method for handling
scalars/splat-vecs/non-splat-vecs, essentially making this a
non-issue.
To avoid losing information, we can propagate some access attributes
from the to-be-inlined callee to its callsites.
We can propagate argument memory access attributes to callsite
parameters if they are from the same underlying object.
Closes #89024
Summary:
Previous patches added support for the LLVM rounding intrinsic
functions. This patch allows them to be emitted using the clang builtins
when targeting AMDGPU.
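A hedged illustration (standard builtin and intrinsic names, not taken from the patch):
```
// When targeting AMDGPU, these builtins can now be emitted as the LLVM
// rounding intrinsics (e.g. llvm.round.f32) instead of library calls.
float  f(float  x) { return __builtin_roundf(x); }
double g(double x) { return __builtin_round(x); }
```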
This patch adds a basic version of a combine that attempts to remove
shuffles that when combined simplify away to an identity shuffle. For
example:
%ab = shufflevector <8 x half> %a, <8 x half> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
%at = shufflevector <8 x half> %a, <8 x half> poison, <4 x i32> <i32 7, i32 6, i32 5, i32 4>
%abt = fneg <4 x half> %at
%abb = fneg <4 x half> %ab
%r = shufflevector <4 x half> %abt, <4 x half> %abb, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
By looking through the shuffles and fneg, it can be simplified to:
%r = fneg <8 x half> %a
The code tracks each lane starting from the original shuffle, keeping
track of a vector of {src, idx} pairs. As we propagate up through the
instructions, we will either look through intermediate instructions
(binops and unops) or see a collection of lanes that all have the same
src and incrementing idx (an identity). We can also see a single value
with identical lanes, which we can treat like a splat.
Only the basic version is added here, handling identities, splats,
binops and unops. In follow-up patches other instructions can be added
such as constants, intrinsics, cmp/sel and zext/sext/trunc.
Add a T-style log handler that multiplexes messages to two log handlers.
The goal is to use this in combination with the SystemLogHandler to log
messages both to the user requested file as well as the system log. The
latter is part of a sysdiagnose on Darwin which is commonly attached to
bug reports.
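A minimal sketch, assuming LLDB's LogHandler interface; the class and member names here are hypothetical:
```
// Forward every emitted message to both child handlers.
class TeeLogHandler : public LogHandler {
public:
  TeeLogHandler(std::shared_ptr<LogHandler> first,
                std::shared_ptr<LogHandler> second)
      : m_first(std::move(first)), m_second(std::move(second)) {}

  void Emit(llvm::StringRef message) override {
    m_first->Emit(message);
    m_second->Emit(message);
  }

private:
  std::shared_ptr<LogHandler> m_first;
  std::shared_ptr<LogHandler> m_second;
};
```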
The high level goal is to have 1 way of converting a DW_TAG value into a
human-readable string.
There are 3 ways this change accomplishes that:
1.) Changing DW_TAG_value_to_name to not create custom error strings.
The way it was doing this is error-prone: specifically, it was using a
function-local static char buffer and handing out a pointer to it.
Initialization of this is thread-safe, but mutating it is definitely
not, so multiple threads calling this function could step on each
other's toes (see the sketch after this list). The implementation in
this patch sidesteps the issue by just returning a StringRef with no
mention of the tag value in it.
2.) Changing all uses of DW_TAG_value_to_name to log the value of the
tag since the function doesn't create a string with the value in it
anymore.
3.) Removing `DWARFBaseDIE::GetTagAsCString()`. Callers should call
DW_TAG_value_to_name on the tag directly.
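For reference, a sketch of the racy pattern described in point 1 (not the exact original code):
```
#include <cstdint>
#include <cstdio>

// The shared static buffer makes concurrent callers race on its contents.
const char *DW_TAG_value_to_name(uint32_t val) {
  static char invalid[100]; // initialization is thread-safe; mutation is not
  snprintf(invalid, sizeof(invalid), "Unknown DW_TAG constant: 0x%x", val);
  return invalid; // another thread may overwrite this before it is read
}
```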
Add a new test showing how a loop gets vectorized incorrectly with an
invariant store reduction where the same location is also read, when
vectorizing with runtime checks.
This PR fixes the si-mode-register pass, which was inserting an extra
setreg instruction in the case of constrained FP operations. The pass
now skips strictfp functions.
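A hedged sketch of the described behavior (not the exact patch; `processFunction` is a hypothetical stand-in for the pass's existing logic):
```
// Constrained FP (strictfp) functions manage the mode register
// explicitly, so inserting extra setreg instructions for them is wrong.
bool SIModeRegister::runOnMachineFunction(MachineFunction &MF) {
  if (MF.getFunction().hasFnAttribute(Attribute::StrictFP))
    return false;
  return processFunction(MF); // hypothetical: unchanged pass logic
}
```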