OpenACC specification describes the following type categories: scalar,
array, composite, and aggregate (which includes arrays, composites, and
others such as Fortran pointer/allocatable).
Decision for how to do implicit mapping is dependent on a variable's
category. Since acc dialect's only means of distinguishing between types
is through the interfaces attached, add API to be able to get the type
category.
In addition to defining the new API, attempt to provide a base
implementation for memref which matches what OpenACC spec describes.
* Drop ".Z" in milestone name since we've been doing X.Y releases
instead of X.Y.Z releases since LLVM 18
* Add "LLVM" prefix since that's what release milestones are named
* Use a numbered list to make it clearer that there are two steps
needed, and add some more details to the first step
This patch adds the remaining support for fixed-point arithmetic
instructions (we previously had support for averaging adds and
subtracts).
For saturating adds/subs/multiplies/clips, we can't change `vl` if
`vxsat` is used, since changing `vl` may change its value. So this patch
checks to see if it's dead before considering it a candidate.
Fixes#103016
This is the last missing piece for the C++20 CTAD alias feature. No
release note being added in this PR yet, I will send out a follow-up
patch to mark this feature done.
(Since the release 20 branch is cut, I think we should target on
clang21).
Also move the -fno-strict-overflow option definition next to the
-fstrict-overflow one while here.
Also add test coverage for f(no-)wrapv-pointer being a clang-cl option.
After #124066 we started allowing users that are passthrus. However for
widening/narrowing instructions we were returning the wrong operand info
for passthru operands since it originally assumed the operand would
never be a passthru. This fixes it by handling it in IsMODef.
For `new A[N]{1,2,3}`, we need to allocate N elements of type A, and
initialize the first three with the given InitListExpr elements.
However, if N is larger than 3, we need to initialize the remaining
elements with the InitListExpr array filler.
Similarly, for `new A[N];`, we need to initilize all fields with the
constructor of A. The initializer type is a CXXConstructExpr of
IncompleteArrayType in this case, which we can't generally handle.
This change builds on PR #125683, which added a cost model for fmuladd.
To ensure completeness, this patch extends the cost model to also cover fma, using the same costing approach as fmuladd.
I plan to send a follow-up patch that includes the cost model vp_fma and vp_fmuladd, and their tests.
Previously commit 6e17ed9b04e5523cc910bf171c3122dcc64b86db deleted the
obsolete checker `alpha.security.ArrayBound` which was implemented in
`ArrayBoundChecker.cpp` and renamed the checker
`alpha.security.ArrayBoundV2` to `security.ArrayBound`.
This commit concludes that consolidation by renaming the source file
`ArrayBoundCheckerV2.cpp` to `ArrayBoundChecker.cpp` (which was "freed
up" by the previous commit).
Now that linalg.matmul is in tablegen, "hand write" the Python wrapper
that OpDSL used to derive. Similarly, add a Python wrapper for the new
linalg.contract op.
Required following misc. fixes:
1) make linalg.matmul's parsing and printing consistent w.r.t. whether
indexing_maps occurs before or after operands, i.e. per the tests cases
it comes _before_.
2) tablegen for linalg.contract did not state it accepted an optional
cast attr.
3) In ODS's C++-generating code, expand partial support for `$_builder`
access in `Attr::defaultValue` to full support. This enables access to
the current `MlirContext` when constructing the default value (as is
required when the default value consists of affine maps).
We already had getOperandInfo support, so this marks the instructions as
supported in isCandidate. It also adds support for vfwmaccbf16.v{v,f}
from zvfbfwma
The comment about inbounds protecting only against unsigned wrapping is
incorrect: it also protects against signed wrapping, but the issue is
that it could cross the sign boundary.
When we first began (2015) to lower f32/f64 selects to
X86ISD::BLENDV(scalar_to_vector(),scalar_to_vector(),scalar_to_vector()),
we limited it to AVX targets to avoid issues with SSE41's xmm0
constraint for the condition mask.
Since then we've seen general improvements in TwoAddressInstruction and
better handling of condition commutation for X86ISD::BLENDV nodes, which
should address many of the original concerns of using SSE41 BLENDVPD/S.
In most cases we will replace 3 logic instruction with the BLENDV node
and (up to 3) additional moves. Although the BLENDV is often more
expensive on original SSE41 targets, this should still be an improvement
in a majority of cases.
We also have no equivalent restrictions for SSE41 for v2f64/v4f32 vector
selection.
Fixes#105807
Based on some feedback in Discord about a PR where a reviewer asked the
author to move the formatting changes to a new PR, which appears to
contradict the current form of this document.
I've added an explanation here, before the point where the author would
be committing any of the formatting changes.
There are other ways this can go, for example some projects don't want
the churn of formatting, or you can pre-emptively send a formatting PR,
but I don't think enumerating them all here will help the audience for
this text.
So I've recomended one path that will start them off well, and can
branch off if the reviewers make requests.
This removes all remaining SPIR-V workarounds for CLC functions, in an
effort to streamline the CLC implementation and prevent further issues
that #124614 had to fix. This commit fixes the same issue for the SPIR-V
targets.
Target-specific CLC implementations can and will exist, but for now
they're all identical and so the target-specific SOURCES files have been
removed. Target implementations now always include the 'generic' CLC
directory, meaning we can avoid unnecessary duplication of SOURCES
listings.
There are two ways we can fix this problem, depending on how the
semantics of byval and initializes should interact:
* Don't infer initializes on byval arguments. initializes on byval
refers to the original caller memory (or having both attributes is made
a verifier error).
* Infer initializes on byval, but don't use it in DSE. initializes on
byval refers to the callee copy. This matches the semantics of readonly
on byval. This is slightly more powerful, for example, we could do a
backend optimization where byval + initializes will allocate the full
size of byval on the stack but not copy over the parts covered by
initializes.
I went with the second variant here, skipping byval + initializes in DSE
(FunctionAttrs already doesn't propagate initializes past byval). I'm
open to going in the other direction though.
Fixes https://github.com/llvm/llvm-project/issues/126181.
The code already guards against values coming from a previous iteration
using properlyDominates(). However, addrecs are considered to properly
dominate the loop they are defined in.
Handle this special case separately, by checking for expressions that
have computable loop evolution (this should cover cases like a zext of
an addrec as well).
I considered changing the definition of properlyDominates() instead, but
decided against it. The current definition is useful in other context,
e.g. when deciding whether an expression is safe to expand in a given
block.
Fixes https://github.com/llvm/llvm-project/issues/126012.
When we launch hipcc with multiple offload architectures along with -MF
dep_file flag, the clang compilation invocations for host and device
offloads write to the same dep_file, and can lead to collision during
file IO operations. This can typically happen during large workloads.
This commit provides a fix to generate dep_file only in host
compilation.
---------
Co-authored-by: anikelal <anikelal@amd.com>
This patch adds a new loop to LoopIdiomVectorizePass, enabling it to
recognise and vectorise loops such as:
```cpp
template<class InputIt, class ForwardIt>
InputIt find_first_of(InputIt first, InputIt last,
ForwardIt s_first, ForwardIt s_last)
{
for (; first != last; ++first)
for (ForwardIt it = s_first; it != s_last; ++it)
if (*first == *it)
return first;
return last;
}
```
These loops match the C++ standard library function `std::find_first_of`.
4ce1f9079d4d3 [AMDGPU] Allow rematerialization of instructions with
virtual register uses (#124327)
made changes that require an ordered traversal of a DenseMap. Changing
it to MapVector which
respects insertion order.
LLVM has functionality for producing a register-class-specific error
message in the assembly parser, rather than just emitting the generic
"invalid operand for instruction" error.
This starts the gradual adoption of this functionality for RISC-V, with
some lesser-used shadow-stack register classes:
- GPRX1 (only contains `ra`)
- GPRX5 (only contains `t0`)
- GPRX1X5 (only contains `ra` and `t0`)
LLVM is reasonably conservative about when these errors are used, in
particular you have to have all the features for the relevant mnemonic
enabled before it will do, hence the test updates.
This also merges a pair of almost identical rv32/rv64 test files into a
single file with one run line.
Try to use the _SC_NPROCESSORS_ONLN sysconf elsewhere
(cherry picked from commit edb1e76d8cb080a396c7c992e5d4023e1a777bd1)
Replace usage of deprecated sysctl on macOS
(cherry picked from commit faaa266d33ff203e28b31dd31be9f90c29f28d04)
Retrieve the number of online CPUs on OpenBSD and NetBSD
(cherry picked from commit 41e81b1ca4bbb41d234f2d0f2c56591db78ebb83)
Update error message now that /proc/cpuinfo is no longer in use
(cherry picked from commit c35af58b61daa111c93924e0e7b65022871fadac)
Fix runtime crash when parsing /proc/cpuinfo fails
(cherry picked from commit 39be87d3004ff9ff4cdf736651af80c3d15e2497)
another reversal of something that breaks on wasm
(cherry picked from commit 44507bc91ff9a23ad8ad4120cfc6b0d9bd27e2ca)
In my previous PR (#123656) to update the names of AVX10.2 intrinsics
and mnemonics, I have erroneously deleted `_ph` from few intrinsics.
This PR corrects this.