Split from https://github.com/llvm/llvm-project/pull/133161
Refactor the code to extract the file helpers used in the HTML generators
so they can be reused by other clang-doc generators.
This patch fixes the error where compiling with
-DLLVM_LINK_LLVM_DYLIB=ON broke the buildbot.
Optimization of alloca instructions may lead to invalid alias tags.
Incorrect alias tags can result in incorrect optimization outcomes for
Fortran source code compiled by Flang with flags: `-O3 -mmlir
-local-alloc-tbaa -flto`.
This commit removes alias tags when memcpy optimization replaces two
arrays with one array, thus ensuring correct compilation of Fortran
source code using flags: `-O3 -mmlir -local-alloc-tbaa -flto`.
This commit also proposes a fix for the reported issue:
https://github.com/llvm/llvm-project/issues/133984
---------
Co-authored-by: Shilei Tian <i@tianshilei.me>
This reverts commit 785e7f06ddb1ba36aa679d23436726dcf61f8afb.
It did not solve the problem with flang-aarch64-libcxx and caused
another failure with openmp-offload-amdgpu-runtime-2.
This is a follow-up to f8ee58a3c, and improves code generation for the
XRivosVizip extension.
If we have a slide pair which could be a zipeven or zipodd if the
shuffle were widened, widen the shuffle and then mask the zipeven or
zipodd.
This is basically working around an order-of-matching issue; we match
the slide pair variants before trying widening. I considered whether we
should just widen slide pairs without any consideration of the zip
idioms, but the resulting codegen changes look mostly like churn, with
no clear evidence of profitability.
The buildbot
[flang-aarch64-libcxx](https://lab.llvm.org/buildbot/#/builders/89) is
currently failing with an ABI issue. The suspected reason is that
LLVMSupport.a is built using libc++, but the unittests are using the
default C++ standard library, libstdc++ in this case. The predefined
`llvm_gtest` target uses the LLVMSupport from `find_package(LLVM)`,
which finds the libc++-built LLVMSupport.
To fix this, store the `LLVM_ENABLE_LIBCXX` setting in LLVMConfig.cmake
so that everything that links against LLVM libraries uses the same
standard library. In the case discussed in
https://github.com/llvm/llvm-zorg/pull/387 it was the flang-rt
unittests, but other runtimes with GTest unittests should have the same
issue (e.g. offload), as would any external project that uses
`find_package(LLVM)`.
clspv already handles generation of fp16. This implementation prevents
clspv from making the best choice between an emulation on top of
fp32-fma and the native fp16-fma, depending on the command-line
arguments.
This is an alternative to
https://github.com/llvm/llvm-project/pull/122103
In SPIR-V, private global variables have the Private storage class. This
PR adds a new address space which allows the frontend to emit variables
with this storage class when targeting this backend.
This is covered in this proposal: llvm/wg-hlsl@4c9e11a
This PR will cause addrspacecast to show up in several cases, like class
member functions or assignment. Those will have to be handled in the
backend later on, particularly to fix up pointer storage classes in some
functions.
Before this change, global variables were emitted with the 'Function'
storage class, which was wrong.
FP_ROUND and FP_EXTEND the input value before FABSing it. This avoids
some bit twiddling to copy the sign bit from the input to the result. It
does introduce one extra FABS, but that is folded into another
instruction for free on AMDGPU, which is the only target currently
affected by this change.
I've always found this hard to read. Some upcoming changes make similar
computations, so this seems like a good time to factor this logic out
into reusable helpers and clean it up using LLVM's preferred
early-return style.
The LLVM dialect no longer has its own vector types. It uses
`mlir::VectorType` everywhere. Remove
`LLVM::getFixedVectorType/getScalableVectorType` and use
`VectorType::get` instead. This commit addresses a
[comment](https://github.com/llvm/llvm-project/pull/133286#discussion_r2022192500)
on the PR that deleted the LLVM vector types.
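A minimal sketch of the replacement (the function names and element type here are illustrative, not code from this commit):

```
#include "mlir/IR/BuiltinTypes.h"

// Build the same types directly with mlir::VectorType::get instead of the
// removed LLVM::getFixedVectorType / LLVM::getScalableVectorType helpers.
mlir::VectorType makeVec4(mlir::Type elemTy) {
  // Was: LLVM::getFixedVectorType(elemTy, 4)
  return mlir::VectorType::get({4}, elemTy);
}

mlir::VectorType makeScalableVec4(mlir::Type elemTy) {
  // Was: LLVM::getScalableVectorType(elemTy, 4)
  return mlir::VectorType::get({4}, elemTy, /*scalableDims=*/{true});
}
```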
Moving code to another function can lead to missed optimization
opportunities, because function passes operate on smaller chunks of code
and can no longer see all the relevant details.
One example of an optimization opportunity missed after code extraction
is information about pointer alignment. The instruction combine pass
adds information about pointer alignment to LLVM intrinsic memcpy calls
if it can deduce it from the code or if align metadata is added. If this
information is not present, then further optimization passes can
generate inefficient code.
If we add align metadata to extracted pointers, then the instruction
combine pass can add the align attribute to the LLVM intrinsic memcpy
call and unblock further optimization.
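For illustration only (this is not code from the patch): once the optimizer knows a pointer's alignment, a small `memcpy` can be lowered to wide aligned loads and stores instead of a conservative byte-wise copy, which is the kind of follow-up optimization the align attribute unblocks.

```
#include <cstring>

// Hypothetical example: the alignment assumptions let the compiler lower this
// memcpy to a couple of aligned 8-byte moves on targets that require aligned
// accesses; without them it must assume 1-byte alignment.
void copy16(void *dst, const void *src) {
  void *d = __builtin_assume_aligned(dst, 8);
  const void *s = __builtin_assume_aligned(src, 8);
  std::memcpy(d, s, 16);
}
```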
Scope of changes:
1. Analyze MLIR map operations. Add information about the alignment of
objects that are passed by reference to OpenMP GPU kernels.
2. Propagate alignment information to the helper functions outlined by
`CodeExtractor`.
ModuloScheduleExpanderMVE and ModuloScheduleExpander are used sequentially in
certain use cases. It is necessary to update live intervals for ModuloScheduleExpanderMVE;
otherwise, crashes may occur.
Folds patterns such as:

```
unsigned foo(unsigned x, unsigned y) {
  return x >= y ? x - y : x;
}
```

Before, on RISC-V:

```
sltu a2, a0, a1
addi a2, a2, -1
and a1, a1, a2
subw a0, a0, a1
```

Or, with Zicond:

```
sltu a2, a0, a1
czero.nez a1, a1, a2
subw a0, a0, a1
```

After, with Zbb:

```
subw a1, a0, a1
minu a0, a0, a1
```
Only applies to unsigned comparisons.
If `x >= y`, then `x - y` is less than or equal to `x`.
Otherwise, `x - y` wraps and is greater than `x`.
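In other words, the select collapses to an unsigned minimum. A sketch of the equivalent form (illustrative only, not code from this patch):

```
// Equivalent form after the fold: umin(x, x - y).
// If x >= y, then x - y <= x, so the minimum picks x - y.
// If x <  y, then x - y wraps to a value greater than x, so the minimum picks x.
unsigned foo(unsigned x, unsigned y) {
  unsigned d = x - y;   // wraps when x < y
  return d < x ? d : x; // umin(x, x - y)
}
```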
https://github.com/llvm/llvm-project/pull/134202 removed support for
`sym@page-offset` in instruction operands. This change is generally
reasonable since subtracting an offset from a symbol typically doesn't
make sense for Mach-O due to its .subsections_via_symbols mechanism,
which treats symbols as separate atoms.
However, BoringSSL relies on a temporary symbol with a negative offset,
which can be meaningful when the symbol and the referenced location are
within the same atom.
```
../../third_party/boringssl/src/gen/bcm/p256-armv8-asm-apple.S:1160:25: error: unexpected token in argument list
adrp x23,Lone_mont@PAGE-64
```
It's worth noting that expressions involving @ can be complex and
brittle in MCParser, and much of the Mach-O @-offset handling remains
under-tested.
* Allow a default argument for parsePrimaryExpr. The argument, used by
the niche llvm-ml, should not require other targets to adapt.
Introduce an `erase(const ElemTy &V)` member function to allow the
deletion of a given value from EquivalenceClasses. This is essential for
scenarios that require modifying the contents of EquivalenceClasses.
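A minimal usage sketch, assuming `erase` removes a single value from whichever class currently contains it (the integer values are illustrative only):

```
#include "llvm/ADT/EquivalenceClasses.h"

void example() {
  llvm::EquivalenceClasses<int> EC;
  EC.unionSets(1, 2); // class {1, 2}
  EC.unionSets(2, 3); // class {1, 2, 3}
  EC.erase(2);        // new API: drop 2 without rebuilding the whole structure
}
```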
---------
Co-authored-by: Florian Hahn <flo@fhahn.com>
The Pointer class already has the capability to be a function pointer,
but we still classified function pointers as PT_FnPtr/FunctionPointer.
This means that when converting from a Pointer to a FunctionPointer, we
lost the information about what the original Pointer pointed to.
When copying unions, we need to only copy the active field of the source
union, which we were already doing. However, we also need to zero out
the (now) inactive fields, so we don't end up with dangling pointers in
those inactive fields.
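A source-level illustration of the situation (hypothetical example, not taken from the patch):

```
// Copying u1 into u2 must copy only the active member 'x' and leave the other
// member inactive, rather than carrying over stale data for it.
union U {
  int x;
  const int *p;
};
constexpr U u1{42};  // 'x' is the active member
constexpr U u2 = u1; // copy: only 'x' is copied; 'p' stays inactive
```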
This patch adds `memory(argmem: read, inaccessiblemem: readwrite)
mustprogress` to **recoverable** ubsan handlers in order to unblock some
memory/loop optimizations. It provides an average 3% performance
improvement on llvm-test-suite (excluding 49 tests that fail due to
ubsan diagnostics).
Closes https://github.com/llvm/llvm-project/issues/130093.
https://github.com/llvm/llvm-project/pull/120464 (and earlier CLs) added -fsanitize-merge functionality, which is intended to work for all "sanitizers". It is nearly correct for CFI.
This patch precommits some tests for CFI, to track the progress of future -fsanitize-merge fixes for CFI.
Fixes #112267
Implement the shader flag analysis to set the UseNativeLowPrecision DXIL
module flag.
The flag can only be set when the command-line flag
`-enable-16bit-types` is passed to clang-dxc, or equivalently when
`-fnative-half-type` is passed to clang.
When the command-line flag is passed, a module metadata flag called
"dx.nativelowprec" is set to 1.
The DXILShaderFlags shader flags analysis checks that the module
metadata flag "dx.nativelowprec" is set to 1 and the DXIL Version is 1.2
or greater before setting the UseNativeLowPrecision DXIL module flag.
In isMaskedStoreOverwrite we process two stores that fully overwrite one
another. Here we add support for predicated vector-length stores so that
DSE will eliminate this variant of masked stores.
This is the follow up installment mentioned in:
https://reviews.llvm.org/D132700
This reverts commit 348374028970c956f2e49ab7553b495d7408ccd9 because
#128619 doesn't handle the case where we get an empty frame from
`getInliningInfoForAddress`: the line number is 0, which makes it
indistinguishable from missing debug info. So we end up using the base
filename from symtab again. Reverting for now until that issue is solved.
This patch dismantles G_SHUFFLE_VECTOR before lowering. The original
lowering would emit extract vector element ops. We found that by using
unmerged values the build vector op combine could find ways to fold.
Only enabled on AMDGPU.
This resolves #123631
This allows simplification of code that checks if a token is an
Objective-C keyword.
Also, delete the following in
UnwrappedLineParser::parseStructuralElement():
- an else-after-break in the tok::at case
- the copy-pasted code in the tok::objc_autoreleasepool case
Use `cudaMallocAsync` in the `CUFAllocDevice` allocator when asyncId is
provided.
More work is needed to be able to call `cudaFreeAsync`, since the
allocated address and stream need to be tracked.
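A minimal sketch of the underlying CUDA runtime calls (not the CUF allocator itself; the helper name is illustrative):

```
#include <cuda_runtime.h>
#include <cstddef>

// Stream-ordered allocation: the allocation is ordered with respect to work
// queued on 'stream'. Freeing it with cudaFreeAsync later requires remembering
// which stream the pointer belongs to, which is the tracking mentioned above.
void *allocOnStream(std::size_t bytes, cudaStream_t stream) {
  void *ptr = nullptr;
  cudaMallocAsync(&ptr, bytes, stream);
  return ptr;
}
```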
This PR is mainly about exposing the Python bindings for
`linalg::isaContractionOpInterface` and `linalg::inferContractionDims`.
---------
Signed-off-by: Bangtian Liu <liubangtian@gmail.com>