Apply signature conversion for `func.func` in the gpu.module. More work
will need to be done for the gpu.func op and to implement the NVVM ABI
for the conversion in the gpu module.
The spec is available here:
https://github.com/intel/llvm/pull/12497
The PR doesn't add the OpCooperativeMatrixApplyFunctionINTEL instruction,
as it's still experimental and not properly tested end-to-end.
The PR also fixes a few bugs in the related code:
1. The CooperativeMatrixMulAddKHR optional operand must be a literal, not a
constant;
2. Fixed creation of the available-capabilities table for the case when a
single extension adds several capabilities that occupy non-contiguous
opcodes.
---------
Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
The existing mechanism is to manually add a reference in the
documentation. These references link to the beginning of the text rather
than to the heading for the attribute, which is what this PR enables.
---------
Co-authored-by: Sirraide <aeternalmail@gmail.com>
Since we moved to a Docker-in-Docker setup, CI jobs sometimes fail due
to the Docker VM dying with 'context cancelled' errors. This is currently
not recognized as a spurious failure, which leads to the job not being
automatically restarted. This patch fixes that.
This commit only adds a test job with the new logic since this workflow
triggers on workflow_run, which means that the changes need to be on
`main` before they can be tested. Once this is tested to work properly,
I'll make it the default definition for the workflow.
Support finding the lldb-dap binary with `xcrun` on Darwin or in PATH on
all other platforms.
Unfortunately, this PR is larger than I would like because it removes
the `lldbDapOptions`. I believe these options are not necessary, and as
previously implemented, they caused a spurious warning with this change.
The problem was that the options were created before the custom factory.
By moving the creation logic into the factory, we make sure it's only
called after the factory has been registered. The upside is that this
simplifies the code and removes a level of indirection.
Global constants have no symbols in library files. They are replaced
with literal constants during lowering before kernels are moved into a
GPU module. Do not register them because they will result in unresolved
symbols.
The current implementation of lowering to llvm for vector.extract
incorrectly assumes that if the number of indices is zero, the operation
can be folded away. This PR removes this condition and relies on the
folder to do it instead.
This PR also unifies the logic for scalar extracts and slice extracts,
which as a side effect also enables vector.extract lowering for n-D
vector.extract with a dynamic innermost dimension. (This was previously
prevented only by a conservative check in the old implementation.)
This commit ensures that noundef (which is frequently a prerequisite for
other annotations) and range() annotations on kernel arguments are
copied onto their corresponding load from the kernel argument structure.
On systems supporting both IPv4 and IPv6, the second socket to initialize
will not update the listening address correctly after the call to `bind`.
This results in the second address listed in
`Socket::GetListeningConnectionURI` having port `:0`, which is
incorrect.
To fix this, correct which address is used to detect the port and update
the unit tests to cover this use case.
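As a rough illustration of the approach (plain POSIX sockets with a
hypothetical helper name, not the actual LLDB `Socket` code): after binding
with port `0`, the assigned port has to be read back with `getsockname`
from the socket that was just bound, not from the first socket.

```cpp
// Sketch only: bind two listening sockets (IPv4 and IPv6) to port 0 and
// recover the ephemeral port per socket via getsockname(). Querying the
// wrong socket is what produced the ":0" port in the URI above.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <cstring>

static int listen_any(int family, uint16_t &port_out) {
  int fd = socket(family, SOCK_STREAM, 0);
  if (fd < 0)
    return -1;

  sockaddr_storage addr;
  std::memset(&addr, 0, sizeof(addr));
  addr.ss_family = family; // address and port stay 0 -> "any" address, ephemeral port
  socklen_t len =
      family == AF_INET6 ? sizeof(sockaddr_in6) : sizeof(sockaddr_in);

  if (bind(fd, reinterpret_cast<sockaddr *>(&addr), len) < 0 ||
      listen(fd, 5) < 0 ||
      // Ask the kernel which port was assigned to *this* socket.
      getsockname(fd, reinterpret_cast<sockaddr *>(&addr), &len) < 0) {
    close(fd);
    return -1;
  }

  port_out = ntohs(family == AF_INET6
                       ? reinterpret_cast<sockaddr_in6 &>(addr).sin6_port
                       : reinterpret_cast<sockaddr_in &>(addr).sin_port);
  return fd;
}

int main() {
  uint16_t p4 = 0, p6 = 0;
  int fd4 = listen_any(AF_INET, p4);
  int fd6 = listen_any(AF_INET6, p6);
  std::printf("IPv4 port %u, IPv6 port %u\n", unsigned(p4), unsigned(p6));
  if (fd4 >= 0) close(fd4);
  if (fd6 >= 0) close(fd6);
}
```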
Additionally, I updated the SocketTest suite to only parameterize tests
that can work on IPv4 or IPv6. This means tests like
`SocketTest::DecodeHostAndPort` are run only once, instead of twice,
since they do not change behavior based on the parameters.
I also included a new unit test to cover listening on `localhost:0`,
validating both sockets correctly list the updated port.
Similar to what we do for broadcast shuffles, when legalising load costs, if the value is known to be uniform, then we will only load a single vector and reuse this across the split legalised registers.
Fixes #111126
Update the error returns in ValueObject::CastToBasicType and
ValueObject::CastToEnumType to create new errors and return a
ValueObjectConstResult with the error, rather than updating the error in
(and returning) the input ValueObject.
Specify ENABLE_THREADS := YES within the test's Makefile instead of
passing -lpthread explicitly via the compiler's CFLAGS options.
Refactoring fix.
Co-authored-by: Vladimir Vereschaka <vvereschaka@accesssoftek.com>
* Signal frame unwinding on x86_64 from X512
* Header search for commpage_defs.h on non-standard paths
Unwind supported tests pass on Haiku x86_64
---------
Co-authored-by: Trung Nguyen <trungnt282910@gmail.com>
ConstraintElimination currently only supports decomposing gep nusw with
non-negative indices (with "non-negative" possibly being enforced via
pre-condition).
Add support for gep nuw, which directly gives us the necessary
guarantees for the decomposition.
ValueMap automatically updates entries with the new value if they have
been RAUW'd. This can lead to instructions that are expected not to have
shape info being added to the map (e.g. a shufflevector, as in the added
test case).
This leads to incorrect results. Originally it was used for transpose
optimizations, but they now all use updateShapeAndReplaceAllUsesWith,
which takes care of updating the shape info as needed.
This fixes a crash in the newly added test cases.
PR: https://github.com/llvm/llvm-project/pull/118282
This DAG combine was incorrect for big-endian targets, because it
assumes that when a bitcast changes the lane width, the
least-significant bits of the wider lanes are in the lower-numbered
lanes of the smaller type, which is only true for little-endian.
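As a host-side illustration of that assumption (plain C++ memory
reinterpretation, not the DAG combine itself):

```cpp
// Illustration only: reinterpret one 32-bit "lane" as two 16-bit lanes and
// see where its least-significant half ends up.
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
  uint32_t wide = 0x11112222; // least-significant half is 0x2222
  uint16_t narrow[2];
  std::memcpy(narrow, &wide, sizeof(wide));
  std::printf("lane0 = 0x%04x, lane1 = 0x%04x\n", narrow[0], narrow[1]);
  // little-endian: lane0 = 0x2222 (low bits in the lower-numbered lane)
  // big-endian:    lane0 = 0x1111 (low bits in the higher-numbered lane)
}
```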
This patch allows using `getSpecificAttr` to get `const` attributes.
Previously, users of this API who wanted a const Attribute pointer had to
write `getSpecificAttr<const XYZ>()` to get it to compile. That feels like
an arbitrary limitation, as the constness is already encoded in the
Attribute container's value type.
The DAG has the same instructions: the signed and unsigned absolute
difference of their inputs. For AArch64, they map to uabd and sabd for
Neon and SVE. The Neon and SVE instructions will require custom
patterns.
They are pseudo opcodes and are not imported by the IRTranslator. We
need combines to create them.
PowerPC, ARM, and AArch64 have native instructions.
/// i.e. trunc(abs(sext(Op0) - sext(Op1))) becomes abds(Op0, Op1)
/// or trunc(abs(zext(Op0) - zext(Op1))) becomes abdu(Op0, Op1)
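A rough scalar reference for these formulas (plain C++ mirroring the
expressions above, not in-tree code):

```cpp
// abds widens with sign extension, abdu with zero extension; both then take
// the absolute difference and truncate back to the original element width.
#include <cstdint>
#include <cstdlib>

uint8_t abds8(int8_t a, int8_t b) {   // trunc(abs(sext(a) - sext(b)))
  int32_t d = int32_t(a) - int32_t(b);
  return uint8_t(std::abs(d));
}

uint8_t abdu8(uint8_t a, uint8_t b) { // trunc(abs(zext(a) - zext(b)))
  int32_t d = int32_t(a) - int32_t(b);
  return uint8_t(std::abs(d));
}
```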
For GlobalISel, we are going to write the combines in MIR patterns.
See: llvm/test/CodeGen/AArch64/abd-combine.ll
- [ ] combine into abd
- [ ] legalize and add td patterns
Nothing really uses these yet, but we shouldn't be losing the info.
We can also pass on the OpInfo arg to the getMemoryOpCost constant load call to indicate if it's constant/uniform/pow2, etc.
Prep cleanup for #111126
The meaning of `__ARM_NEON_SVE_BRIDGE` was changed here:
https://github.com/ARM-software/acle/pull/362
The macro should now be defined to `1` if the `arm_neon_sve_bridge.h`
header file is available, which is the case for Clang.
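A minimal usage sketch under the revised wording (the `HAVE_NEON_SVE_BRIDGE`
name here is made up for illustration):

```cpp
// With the revised ACLE wording, __ARM_NEON_SVE_BRIDGE == 1 simply means
// arm_neon_sve_bridge.h is available, so it can gate the include directly.
#if defined(__ARM_NEON_SVE_BRIDGE) && __ARM_NEON_SVE_BRIDGE
#include <arm_neon_sve_bridge.h>
#define HAVE_NEON_SVE_BRIDGE 1
#else
#define HAVE_NEON_SVE_BRIDGE 0
#endif
```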