This PR has the following changes:
Replace llvm-link with calls to linkInModule to link device files Add
-print-linked-module option to dump linked module for testing Added a
test to verify that linking is working as expected. We will eventually
move to using thin LTO for linking device inputs.
Thanks
---------
Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
Try to reduce individual subtarget features in the "target-features"
attribute. This attempts a textual removal of the fields in the string,
not a semantic removal. Typically there's a lot of redundant feature spam
in the feature list implied by the target-cpu (which I really wish clang
would stop emitting). If we could parse these out, we could easily drop
the fields without testing anything.
This commit extends the lowering of amdgpu.mfma to handle the new
double-rate MFMAs in gfx950 and adds tests for these operations.
It also adds support for MFMAs on small floats (f6 and f4), which are
implented using the "scaled" MFMA intrinsic with a scale value of 0 in
order to have an unscaled MFMA.
This commit does not add a `amdgpu.scaled_mfma` operation, as that is
future work.
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
The current API doesn't have a way to unset it. The query returns
an optional, but the set doesn't. Alternatively I could switch the
set to also use optional.
So the dSYM can be told what target it has been loaded into.
When lldb is loading modules, while creating a target, it will run
"command script import" on any Python modules in Resources/Python in the
dSYM. However, this happens WHILE the target is being created, so it is
not yet in the target list. That means that these scripts can't act on
the target that they a part of when they get loaded.
This patch adds a new python API that lldb will call:
__lldb_module_added_to_target
if it is defined in the module, passing in the Target the module was
being added to, so that code in these dSYM's don't have to guess.
Not sure this is the right place to put it. This is a property
of GlobalVariable, not GlobalValue. But the ReduceGlobalVars
reduction tries to delete the value entirely.
There are V2S copies between vpgr16 and spgr32 in true16 mode. This is
caused by vgpr16 and sgpr32 both selectable by 16bit src in ISel.
When a V2S copy and its useMI are lowered to VALU, this patch check
1. If the generated new VALU is used by a true16 inst. Add subreg access
if necessary.
2. Legalize the V2S copy by replacing it to subreg_to_reg
an example MIR looks like:
```
%2:sgpr_32 = COPY %1:vgpr_16
%3:sgpr_32 = S_OR_B32 %2:sgpr_32, ...
%4:vgpr_16 = V_ADD_F16_t16 %3:sgpr_32, ...
```
currently lowered to
```
%2:vgpr_32 = COPY %1:vgpr_16
%3:vgpr_32 = V_OR_B32 %2:vgpr_32, ...
%4:vgpr_16 = V_ADD_F16_t16 %3:vgpr_32, ...
```
after this patch
```
%2:vgpr_32 = SUBREG_TO_REG 0, %1:vgpr_16, lo16
%3:vgpr_32 = V_OR_B32 %2:vgpr_32, ...
%4:vgpr_16 = V_ADD_F16_t16 %3.lo16:vgpr_32, ...
```
Not sure why the "fold-all" option naming didn't match the
variable "FoldPreOutputs", but I've preserved the difference.
More annoyingly, the pass name "normalize" does not match the pass
name IRNormalizer and should probably be fixed one way or the other.
Also the existing test coverage for the flags is lacking. I've added
a test that shows they parse, but we should have tests that they
do something.
MacOS platforms using mlir-runner in lit tests consistently hit the
following error:
```
# .---command stderr------------
# | JIT session error: Symbols not found: [ __mlir_ciface_printMemrefI32 ]
# | Error: Failed to materialize symbols: { (main, { __mlir_printMemrefI32, ... }) }
# `-----------------------------
```
https://github.com/google/heir/issues/1521#issuecomment-2751303404
confirms the issue is fixed by using `alwayslink` on these two targets,
and I confirmed on a separate Apple M1 (OSX version Sequoia 15.3.2.).
I'm not an expert on the mlir runner internals, but given the
mlir-runner is purely for testing, and alwayslink at worst adds some
overhead by not removing symbols, it seems low risk.
- Use static instead of anonymous namespace for file local functions.
- Enclose file-local classes in anonymous namespace.
- Eliminate `llvm::` qualifier when file has `using namespace llvm`.
- Eliminate namespace surrounding entire code in
SPIRVConvergenceRegionAnalysis.cpp file.
- Eliminate call to `initializeSPIRVStructurizerPass` from the pass
constructor (https://github.com/llvm/llvm-project/issues/111767)
The combination of `-fcomplex-arithmetic=promoted` and `mno-x87` for
`double` complex division is leading to a crash.
See https://godbolt.org/z/189G957oY
This patch fixes that.
It happened in https://lab.llvm.org/buildbot/#/builders/152/builds/1131
when the buildbot was switched from CTK12.3 to CTK12.8.
The logs are gone by now, so the above link is useless.
The error was:
error: ‘auto’ not permitted in template argument
This workaround helps, but I also reported the issue to NVCC devs.
We only need to keep track of the last register seen. We never need the
first register once we've parsed. Currently if s0/x8 is used RegStart
will point to that and not ra/s1 so it already isn't the start.
We were only checking that the last register was a register, not that it
was a legal register for a register list. This caused the encoder
function to hit an llvm_unreachable.
The error messages are not good, but this only one of multiple things
that need to be fixed in this function. I'll focus on error messages
later once I have the other issues fixed.
Implement all single-multi {BF/F/S/U/SU/US}MOP4{A/S} instructions in
clang and llvm following the acle in
https://github.com/ARM-software/acle/pull/381/files.
This PR depends on https://github.com/llvm/llvm-project/pull/127797
This patch updates the semantics of template arguments in intrinsic
names for clarity and ease of use. Previously, template argument numbers
indicated which character in the prototype string determined the final
type suffix, which was confusing—especially for intrinsics using
multiple prototype modifiers per operand (e.g., intrinsics operating on
arrays of vectors). The number had to reference the correct character in
the prototype (e.g., the ‘u’ in “2.u”), making the system cumbersome and
error-prone.
With this patch, template argument numbers now refer to the operand
number that determines the final type suffix, providing a more intuitive
and consistent approach.
Addition of `no_inline` and `always_inline` attributes for CallOps in
MLIR in order to be able to inline or not directly the call of a
function without having the attribute on the `FuncOp`.
The addition of these attributes will be used in a future PR in Flang
(`[NO]INLINE` directive).
Add the implementation of the `PERROR(STRING) ` intrinsic from the GNU
Extension to prints on the stderr a newline-terminated error message
corresponding to the last system error prefixed by `STRING`.
(https://gcc.gnu.org/onlinedocs/gfortran/PERROR.html)
During the transition from debug intrinsics to debug records, we used
several different command line options to customise handling: the
printing of debug records to bitcode and textual could be independent of
how the debug-info was represented inside a module, whether the
autoupgrader ran could be customised. This was all valuable during
development, but now that totally removing debug intrinsics is coming
up, this patch removes those options in favour of a single flag
(experimental-debuginfo-iterators), which enables autoupgrade, in-memory
debug records, and debug record printing to bitcode and textual IR.
We need to do this ahead of removing the
experimental-debuginfo-iterators flag, to reduce the amount of
test-juggling that happens at that time.
There are quite a number of weird test behaviours related to this --
some of which I simply delete in this commit. Things like
print-non-instruction-debug-info.ll , the test suite now checks for
debug records in all tests, and we don't want to check we can print as
intrinsics. Or the update_test_checks tests -- these are duplicated with
write-experimental-debuginfo=false to ensure file writing for intrinsics
is correct, but that's something we're imminently going to delete.
A short survey of curious test changes:
* free-intrinsics.ll: we don't need to test that debug-info is a zero
cost intrinsic, because we won't be using intrinsics in the future.
* undef-dbg-val.ll: apparently we pinned this to non-RemoveDIs in-memory
mode while we sorted something out; it works now either way.
* salvage-cast-debug-info.ll: was testing intrinsics-in-memory get
salvaged, isn't necessary now
* localize-constexpr-debuginfo.ll: was producing "dead metadata"
intrinsics for optimised-out variable values, dbg-records takes the
(correct) representation of poison/undef as an operand. Looks like we
didn't update this in the past to avoid spurious test differences.
* Transforms/Scalarizer/dbginfo.ll: this test was explicitly testing
that debug-info affected codegen, and we deferred updating the tests
until now. This is just one of those silent gnochange issues that get
fixed by RemoveDIs.
Finally: I've added a bitcode test, dbg-intrinsics-autoupgrade.ll.bc,
that checks we can autoupgrade debug intrinsics that are in bitcode into
the new debug records.
If the 512-bit unary shuffle is a concatenation of 128/256-bit subvectors then we're better off using a X86ISD::SHUF128 node so we can fold the concatenation into the shuffle as well.
Targeted rewrite of a linalg.copy on memrefs to a memref.copy.
This is useful when bufferizing copies to a linalg.copy, applying some
transformations, and then rewriting the copy into a memref.copy.
If the element types of the source and destination differ, or if the
source is a scalar, the transform produces a silenceable failure.
**Currently we don't make use of the JIT for the wasm use cases so the
approach using the execution engine won't work in these cases.**
Rather if we use dlopen. We should be able to do the following
(demonstrating through a toy project)
1) Make use of LoadDynamicLibrary through the given implementation
```
extern "C" EMSCRIPTEN_KEEPALIVE int load_library(const char *name) {
auto Err = Interp->LoadDynamicLibrary(name);
if (Err) {
llvm::logAllUnhandledErrors(std::move(Err), llvm::errs(), "load_library error: ");
return -1;
}
return 0;
}
```
2) Add a button to call load_library once the library has been added in
our MEMFS (currently we have symengine built as a SIDE MODULE and we are
loading it)
Introduce SVEIntrinsicInfo to store properties common across SVE
intrinsics. This allows a seperation between intrinsic IDs and the
transformations that can be applied to them, which reduces the layering
problems we hit when adding new combines.
This PR is mostly refactoring to bring in the concept and port the most
common combines (e.g. dead code when all false). This will be followed
up with new combines where I plan to reuse much of the existing
instruction simplifcation logic to significantly improve our ability to
constant fold SVE intrinsics.