Addition of `inlinehint` attributes for CallOps in MLIR in order to be
able to say to a function call that the inlining is desirable without
having the attribute on the FuncOp.
.. which prepares us for handling of discontinuous functions. The main
change there is that we can have multiple FDEs contributing towards an
unwind plan of a single function. This patch separates the logic for
parsing of a single FDE from the construction of an UnwindPlan (so that
another patch can change the latter to combine several unwind plans).
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
This crash will caused if run this testcase:
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ds.gws.barrier-fastregalloc.ll
When build the SDNode, precisely build the SDNode for this ir:
```ir
call void @llvm.amdgcn.ds.gws.barrier(i32 %val, i32 0)
```
If want call the dump function of the new SDNode in the gdb environment
like this:
```gdb
p N->dump()
```
The llvm will crash.
All of these is because calling ```dump()``` will cause the
calling```MachineMemOperand::print()```
with the argument value for the```TII``` is nullptr.
And the llvm/lib/CodeGen/MachineOperand.cpp#L1235 is a derefrence of
TII.
Signed-off-by: fanfuqiang <fuqiang.fan@mthreads.com>
This is a follow-up to https://github.com/llvm/llvm-project/pull/131074.
After moving the default argument heuristic to `simplifyType` in that
patch, the heuristic no longer applied to the
`DependentScopeDeclRefExpr` case, because that wasn't using
`simplifyType`.
This patch fixes that, with an added testcase.
This patch adds support for parsing symbols in the Xqcilb long branch
instructions. The instructions use the `R_RISCV_QC_E_JUMP_PLT`
relocation and the `InstFormatQC_EJ` instruction format.
Vendor relocation support will be added in a later patch.
This avoids needing to hardcode the mapping between architectures and
their sizes of long doubles.
This fixes a case in test_demangle.pass.cpp, that previously failed like
this (XFAILed):
.---command stdout------------
| Testing 29859 symbols.
| _ZN5test01hIfEEvRAcvjplstT_Le4001a000000000000000E_c should be invalid
but is not
| Got: 0, void test0::h<float>(char (&) [(unsigned int)(sizeof (float) +
0x0.07ff98f7ep-1022L)])
`-----------------------------
.---command stderr------------
| Assertion failed: !passed && "demangle did not fail", file
libcxxabi/test/test_demangle.pass.cpp, line 30338
`-----------------------------
This testcase is defined within
// Is long double fp80? (Only x87 extended double has 64-bit mantissa)
#define LDBL_FP80 (__LDBL_MANT_DIG__ == 64)
...
#if !LDBL_FP80
...
#endif
The case failed on x86 architectures with an unusual size for long
doubles, as the test expected the demangler to not be able to demangle
an 80 bit long double (based on the `__LDBL_MANT_DIG__ == 64` condition
in the test). However as the libcxxabi implementation was hardcoded to
demangle 80 bit long doubles on x86_64 regardless of the actual size,
this test failed (by unexpectedly being able to demangle it).
By configuring libcxxabi's demangling of long doubles to match what the
compiler specifies, we no longer hit the expected failures in the
test_demangle.pass.cpp test on Android on x86.
This makes libcxxabi require a GCC-compatible compiler that defines
nonstandard defines like `__LDBL_MANT_DIG__`, but I presume that's
already essentially required anyway.
... to clarify ownership, aligning with other parameters. Using
`std::unique_ptr` encourages users to manage `createMCInstPrinter` with
a unique_ptr instead of a raw pointer, reducing the risk of memory
leaks.
* llvm-mc: fix a leak and update llvm/test/tools/llvm-mc/disassembler-options.test
* #121078 copied the llvm-mc code to CodeGenTargetMachineImpl and made
the same mistake. Fixed by 2b8cc651dca0c000ee18ec79bd5de4826156c9d6
Using unique_ptr requires #include MCInstPrinter.h in a few translation
units.
* Delete a createAsmStreamer overload I deprecated in 2024
* SystemZMCTargetDesc.cpp: rename to `createSystemZAsmStreamer` to fix
an overload conflict.
Pull Request: https://github.com/llvm/llvm-project/pull/135128
This can be done with a vrgather.vi/vx, and (possibly) a register move.
The alternative is to do a vrgather.vv with a full width index vector.
We'd already caught the two operands forms of this shuffle; this patch
specifically handles the single operand form.
Unfortunately only in abstract, it would be nice if we canonicalized
shuffles in some way wouldn't it?
Fixes error: ISO C++20 considers use of overloaded operator '==' (with
operand types 'MacosVersion' and 'MacosVersion') to be ambiguous despite
there being a unique best viable function
[-Werror,-Wambiguous-reversed-operator].
This converts the comparison operator from a non-symmetric operator
(const VersionBase<VersionType>& (as "this") and const VersionType &).
into a symmetric operator
Relands #135068
Co-authored-by: Ivan Tadeu Ferreira Antunes Filho <antunesi@google.com>
This patch defines `fir::SafeTempArrayCopyAttrInterface` and the
corresponding
OpenACC/OpenMP related attributes in FIR dialect. The actual
implementations
are just placeholders right now, and array repacking becomes a no-op
if `-fopenacc/-fopenmp` is used for the compilation.
The NVVMReflect pass simply replaces calls to nvvm-reflect functions
with the appropriate constant, either the architecture number, or
nvvm-reflect-ftz, found in the module's metadata.
The implementation is inefficient and does this by traversing through
all instructions to find calls. The common case is that you never call
nvvm-reflect, so this traversal is costly.
This PR:
- Updates the pass so that it finds the reflect functions by name, and
then traverses through their uses to find the calls directly.
- Adds a line (245) to make sure the dead nvvm-reflect definitions are
erased.
- Adds the ability to set reflect values via command line
In the LLVM middle-end we want to fold `gep inbounds null, idx -> null`:
https://alive2.llvm.org/ce/z/5ZkPx-
This pattern is common in real-world programs
(https://github.com/dtcxzyw/llvm-opt-benchmark/pull/55#issuecomment-1870963906).
Generally, it exists in some (actually) unreachable blocks, which is
introduced by JumpThreading.
However, some old-style offsetof macros are still widely used in
real-world C/C++ code (e.g., hwloc/slurm/luajit). To avoid breaking
existing code and inconvenience to downstream users, this patch removes
the inbounds flag from the struct gep if the base pointer is null.
`WidenIV::widenWithVariantUse` assumes that exactly one of the binop
operands is the IV to be widened. This miscompilation happens when it
tries to sign-extend the "NonIV" operand while the IV is zero-extended.
Closes https://github.com/llvm/llvm-project/issues/135182.
The 'set' lowering is pretty trivial. 'device_type' is a little more
restricted since both the MLIR-Dialect and language limit it to only 1
value (as confirmed by standards-discussion).
This patch implements 'set', with 'device_type', since 'set' requires at
least 1 clause, and this is the least difficult to implement at the
moment.
This PR is mainly about exposing the python bindings for
`linalg::isaConvolutionOpInterface` and `linalg::inferConvolutionDims`.
---------
Signed-off-by: Bangtian Liu <liubangtian@gmail.com>
This is a basic implementation of P2719: "Type-aware allocation and
deallocation functions" described at http://wg21.link/P2719
The proposal includes some more details but the basic change in
functionality is the addition of support for an additional implicit
parameter in operators `new` and `delete` to act as a type tag. Tag is
of type `std::type_identity<T>` where T is the concrete type being
allocated. So for example, a custom type specific allocator for `int`
say can be provided by the declaration of
void *operator new(std::type_identity<int>, size_t, std::align_val_t);
void operator delete(std::type_identity<int>, void*, size_t, std::align_val_t);
However this becomes more powerful by specifying templated declarations,
for example
template <typename T> void *operator new(std::type_identity<T>, size_t, std::align_val_t);
template <typename T> void operator delete(std::type_identity<T>, void*, size_t, std::align_val_t););
Where the operators being resolved will be the concrete type being
operated over (NB. A completely unconstrained global definition as above
is not recommended as it triggers many problems similar to a general
override of the global operators).
These type aware operators can be declared as either free functions or
in class, and can be specified with or without the other implicit
parameters, with overload resolution performed according to the existing
standard parameter prioritisation, only with type parameterised
operators having higher precedence than non-type aware operators. The
only exception is destroying_delete which for reasons discussed in the
paper we do not support type-aware variants by default.
I am making a CG pass to depend on `FIROpenACCSupport` in #134346.
This introduces a cyclic dependency between `FIROpenACCSupport`
and `FIRCodeGen`. This patch splits `FIRCodeGen` into
`FIRCodeGenDialect` (for FIR CG dialect definition) and `FIRCodeGen`
(for the CG passes).
Now, `FIROpenACCSupport` depends on `FIRCodeGenDialect`,
and `FIRCodeGen` depends on `FIROpenACCSupport`.
Fixes#112270
Completed ACs:
- `-res-may-alias` clang-dxc command-line option added
- It inserts and sets a module metadata flag `dx.resmayalias` to 1
- Shader flag set appropriately:
- The flag IS NOT set if DXIL Version <= 1.6 OR the command-line option
`-res-may-alias` is specified
- Otherwise the flag IS set when:
- DXIL Version > 1.7 AND function uses UAVs, OR
- DXIL Version <= 1.7 AND UAVs present globally
- Add tests
- Tests for Shader Models 6.6, 6.7, and 6.8 corresponding to DXIL
Versions 1.6, 1.7, and 1.8
- Tests (`res-may-alias-0.ll`/`res-may-alias-1.ll`) for when the module
metadata flag `dx.resmayalias` is set to 0 or 1 respectively
- A frontend test (`res-may-alias.hlsl`) for testing that that the
command-line option `-res-may-alias` inserts `dx.resmayalias` module
metadata correctly
This PR implements `verify-diagnostics=only-expected` which is a "best
effort" verification - i.e., `unexpected`s and `near-misses` will not be
considered failures. The purpose is to enable narrowly scoped checking
of verification remarks (just as we have for lit where only a subset of
lines get `CHECK`ed).
This PR fixes a bug that when a template specialization is declared with
a forward declaration of a template, the checker fails to find its
definition in the same translation unit and erroneously emit an unsafe
forward declaration warning.
This PR adds the support for recognizing calling adoptCF/adoptNS on the
result of a cast operation on the return value of a function which
creates NS or CF types. It also fixes a bug that we weren't reporting
memory leaks when CF types are created without ever calling RetainPtr's
constructor, adoptCF, or adoptNS.
To do this, this PR adds a new mechanism to report a memory leak
whenever create or copy CF functions are invoked unless this CallExpr
has already been visited while validating a call to adoptCF. Also added
an early exit when isOwned returns IsOwnedResult::Skip due to an
unresolved template argument.
The recently announced IBM z17 processor implements the architecture
already supported as "arch15" in LLVM. This patch adds support for "z17"
as an alternate architecture name for arch15.
This patch also add the scheduler description for the z17 processor,
provided by Jonas Paulsson.
Discussions with the OpenACC Standard folks and the dialect folks showed
that the ability to have 'set' have a 'device_type' with more than one
architecture was a mistake, and one that will be fixed in future
revisions of the standard. Since the dialect requires this anyway,
we'll implement this in advance of standardization.
Currently, the only way for users to try these schedulers is via
`-misched=` . However, this overrides the default scheduler for all
targets. This causes problems for various toolchains / drivers which
spawn jobs for both x86 and AMDGPU -- e.g. hipcc. On the other hand,
`amdgpu-sched-strategy` only changes the scheduler for AMDGPU target.
If we have a near identity shuffle with a single element repeated, we
manage to match this as a masked vrgather.vi for the two operand
forms, but not the single operand form. If the scalar being repeated
was a scalar just inserted into the vector, we're also missing a
chance to recognize a vmerge.vxm or vmerge.vim in both cases.
There are some opcodes that currently require specialized recipes, due
to their result type not being implied by their operands, including
casts.
This leads to duplication from defining multiple full recipes.
This patch introduces a new VPInstructionWithType subclass that also
stores the result type. The general idea is to have opcodes needing to
specify a result type to use this general recipe. The current patch
replaces VPScalarCastRecipe with VInstructionWithType, a similar patch
for VPWidenCastRecipe will follow soon.
There are a few proposed opcodes that should also benefit, without the
need of workarounds:
* https://github.com/llvm/llvm-project/pull/129508
* https://github.com/llvm/llvm-project/pull/119284
PR: https://github.com/llvm/llvm-project/pull/129706
In powerpc64-unknown-linux-musl, signal.h does not include asm/ptrace.h,
which causes "member access into incomplete type 'struct pt_regs'"
errors. Include the header explicitly to fix this.
Also in sanitizer_linux_libcdep.cpp, there is a usage of TlsPreTcbSize
which is not defined in such a platform. Guard the branch with macro.
On Cygwin at least, GCC is not passing any -m flag to the linker, so
fall back to the default target triple to determine if we need to apply
i386-specific behaviors.
Fixes#134558