Debug value/declare operations imported before landing pad operations at
the start of a basic block break invoke op verification:
```
error: first operation in unwind destination should be a llvm.landingpad operation
```
Fix this issue by making the placement slightly smarter.
Avoid operations that can overflow in the constant folders for
`tosa.reduce_max` and `tosa.reduce_min`.
Includes tests to avoid regressions.
Signed-off-by: Ian Tayler Lessa <ian.taylerlessa@arm.com>
Reverts llvm/llvm-project#131876
GPU integration tests are broken by that PR, e.g.
`mlir/test/Integration/GPU/CUDA/sm90/gemm_f32_f16_f16_128x128x128.mlir`
Propagate `Extract(Elementwise(...))` -> `Elementwise(Extract...)`.
Currently limited to the case where the extract is the single use of the
elementwise op, to avoid introducing additional elementwise ops.
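As a rough before/after sketch (assuming the vector dialect's
`vector.extract` and an `arith.addf` as the elementwise op; names are
illustrative):
```mlir
// Before: the extract is the single use of the elementwise op.
%0 = arith.addf %a, %b : vector<4xf32>
%1 = vector.extract %0[1] : f32 from vector<4xf32>

// After: the extract is propagated to the operands.
%a1 = vector.extract %a[1] : f32 from vector<4xf32>
%b1 = vector.extract %b[1] : f32 from vector<4xf32>
%1  = arith.addf %a1, %b1 : f32
```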
* Improve the verifier of `memref.subview` to detect out-of-bounds
extractions.
* Improve the documentation of `memref.subview` to make clear that
out-of-bounds extractions are not allowed. Rewrite examples to use the
new `strided<>` notation instead of `affine_map` layout maps. Also
remove all unrelated operations (`memref.alloc`) from the examples.
* Fix various test cases where `memref.subview` ops ran out-of-bounds.
* Update canonicalization patterns to ensure that they do not fold IR
into IR that no longer verifies.
Related discussion on Discourse:
https://discourse.llvm.org/t/out-of-bounds-semantics-of-memref-subview/85293
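For illustration, a hypothetical subview (shapes chosen for this sketch)
that the improved verifier now rejects:
```mlir
// Rejected: offset 6 + size 4 runs past the source dimension of size 8.
%0 = memref.subview %src[6] [4] [1]
    : memref<8xf32> to memref<4xf32, strided<[1], offset: 6>>
```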
This commit checks whether the operands/results of an operator can be found
in the profile compliance mapping; if they cannot, the operator is
considered invalid. As a result, operator data type combinations that are
not listed under "Supported Data Types" in the TOSA specification
are disallowed and the validation pass fails.
Signed-off-by: Luke Hutton <luke.hutton@arm.com>
Handle dense resource attributes in the transpose TOSA folder.
Currently their interface does not align with the rest of
`ElementsAttr` when it comes to data access, hence the special
handling.
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
- Add missing int16 extension for concat operator
- Remove int16 extension for cast operator
- Add pro_int and pro_fp profiles for const_shape operator
Signed-off-by: Jerry Ge <jerry.ge@arm.com>
This gets the consumer fusion method in sync with the corresponding
producer fusion method `tileAndFuseProducerOfSlice`. Not taking the
surrounding loops as input required complicated, and very fragile,
analysis to retrieve them. Just like the producer fusion method, the
loops need to be taken in as an argument, with the loops typically being
created by the tiling methods.
Some utilities are added to check that the loops passed in are perfectly
nested (in the case of an `scf.for` loop nest).
This is change 1 of N to simplify the implementation of tile and fuse
consumers.
---------
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
Create a new operation `DSOLocalEquivalentOp`, following the steps of
other constants.
This is similar in a way to `AddressOfOp` but with specific semantics:
only support functions and function aliases (no globals) and extern_weak
linkage is not allowed.
An alternative approach is to use a new `UnitAttr` in `AddressOfOp` and
check that attribute to enforce specific semantics in the verifiers. The
drawback is going against what other constants do and having to add more
attributes in the future when we introduce `no_cfi`, `blockaddress`,
etc.
While here, improve the error message for other missing constants.
A small clean-up following up on #131795. It seems we had two quite
similar implementations of the same thing: emitting the task dependencies
struct and filling it. This PR unifies the two versions into one. This is
better since we had to fix a bug in one of them in #131795; the fix now
applies to both.
Since #130487, `tensor.extract_slice` and `tensor.insert_slice` ops that
are statically detected to go out of bounds are rejected by the
verifier.
This commit fixes canonicalization patterns that currently fold
dynamically out-of-bounds ops (valid IR) to statically out-of-bounds ops
(invalid IR).
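For example (a hypothetical sketch), folding a constant dynamic offset must
not produce a statically out-of-bounds op:
```mlir
%c10 = arith.constant 10 : index
// Dynamically out of bounds, but valid IR:
%0 = tensor.extract_slice %t[%c10] [4] [1] : tensor<8xf32> to tensor<4xf32>
// Folding %c10 into a static offset would produce
//   tensor.extract_slice %t[10] [4] [1] : tensor<8xf32> to tensor<4xf32>
// which is statically out of bounds and rejected by the verifier, so the
// pattern must not perform this fold.
```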
For vector.extract, the folder always canonicalizes to a vector.extract
operation, while the rewrite pattern canonicalizes to a vector.broadcast
except in the case of 0-rank vectors.
Remove this special casing, and instead handle the 0-rank vector case in
the folder.
A number of places in our codebase special-case the use of
extractelement/insertelement for 0-D vectors, because extract/insert did
not previously support 0-D vectors. Since insert/extract now support 0-D
vectors, use them instead of special casing.
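Roughly (a sketch, assuming the current 0-D syntax):
```mlir
// Previously special-cased:
%e = vector.extractelement %v[] : vector<f32>
// Now expressed directly:
%e2 = vector.extract %v[] : f32 from vector<f32>
```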
This patch makes the definition of vector.scatter match that of
vector.gather, as its counterpart.
All of the changes done in this patch bring vector.scatter in line with
vector.gather's multi-dimensional definition.
Unrolling for vector.scatter will be implemented in subsequent patches.
Discourse Discussion:
https://discourse.llvm.org/t/rfc-improving-gather-codegen-for-vector-dialect/85011/13
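A rough sketch of what this enables (types chosen for illustration):
```mlir
// 1-D scatter (already supported):
vector.scatter %base[%c0][%idx], %mask, %val
    : memref<?xf32>, vector<16xi32>, vector<16xi1>, vector<16xf32>

// Multi-dimensional scatter, mirroring vector.gather's definition:
vector.scatter %base[%c0][%idx2d], %mask2d, %val2d
    : memref<?xf32>, vector<2x16xi32>, vector<2x16xi1>, vector<2x16xf32>
```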
This patch decouples unrolling vector.gather and lowering vector.gather
to llvm.masked.gather.
This is consistent with how vector.load, vector.store,
vector.maskedload, vector.maskedstore lower to LLVM.
Some interesting test changes from this patch:
- Tests for lowering 2D vector.gather to LLVM are deleted. This is
consistent with other memory load/store ops.
- There are still tests for 2D vector.gather, but the constant mask for
these tests is modified. This is because, with the updated lowering, one
of the unrolled vector.gather ops disappears because it is masked off (also
demonstrating why this is a better lowering path).
Overall, this makes vector.gather take the same consistent path for
lowering to LLVM as other load/store ops.
Discourse Discussion:
https://discourse.llvm.org/t/rfc-improving-gather-codegen-for-vector-dialect/85011/13
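Roughly, the lowering path now looks like this (a sketch; types chosen for
illustration):
```mlir
// An n-D gather ...
%g = vector.gather %base[%c0][%idx], %mask, %pass
    : memref<?xf32>, vector<2x8xi32>, vector<2x8xi1>, vector<2x8xf32>
      into vector<2x8xf32>
// ... is first unrolled into rank-1 vector.gather ops (one per outer index),
// and only those 1-D gathers are lowered to llvm.intr.masked.gather.
```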
In the example below it is not clear that `(f32)` relates to `%arg2` and
not to `vector<2xf32>`:
```mlir
%0 = spirv.ImageSampleImplicitLod %arg0, %arg1 ["Lod"](%arg2) :
!spirv.sampled_image<...>, vector<2xf32>(f32) -> vector<4xf32>
```
This change applies a new format to image operations and image operands
that does not use parentheses and is less ambiguous:
```mlir
%0 = spirv.ImageSampleImplicitLod %arg0, %arg1 ["Lod"], %arg2 :
!spirv.sampled_image<...>, vector<2xf32>, f32 -> vector<4xf32>
```
This is an implementation for [RFC: Supporting Sub-Channel Quantization
in
MLIR](https://discourse.llvm.org/t/rfc-supporting-sub-channel-quantization-in-mlir/82694).
In order to make the review process easier, the PR has been divided into
the following commit labels:
1. **Add implementation for sub-channel type:** Includes the class
design for `UniformQuantizedSubChannelType`, printer/parser and bytecode
read/write support. The existing types (per-tensor and per-axis) are
unaltered.
2. **Add lowering for sub-channel type:** Lowering of
`quant.qcast` and `quant.dcast` operations to Linalg operations.
3. **Adding C/Python APIs:** We first define the C APIs and build the
Python APIs on top of those.
4. **Add pass to normalize generic ....:** This pass normalizes
sub-channel quantized types to per-tensor per-axis types, if possible.
A design note:
- **Explicitly storing the `quantized_dimensions`, even when they can be
derived for ranked tensors.**
While it's possible to infer quantized dimensions from the static shape
of the scales (or zero-points) tensor for ranked
data tensors
([ref](https://discourse.llvm.org/t/rfc-supporting-sub-channel-quantization-in-mlir/82694/3)
for background), there are cases where this can lead to ambiguity and
issues with round-tripping.
Consider the example:
```
tensor<2x4x!quant.uniform<i8:f32:{0:2, 0:2}, {{s00:z00, s01:z01}}>>
```
The shape of the scales tensor is [1, 2], which might suggest that only
axis 1 is quantized. While this inference is technically correct, as the
block size for axis 0 is a degenerate case (equal to the dimension
size), it can cause problems with round-tripping. Therefore, even for
ranked tensors, we are explicitly storing the quantized dimensions.
Suggestions welcome!
PS: I understand that the upcoming holidays may impact your schedule, so
please take your time with the review. There's no rush.
Previously, the encodings were unconditionally dropped during shape
inference. The revision adds support for preserving the encodings in
the linalg ops.
---------
Signed-off-by: hanhanW <hanhan0912@gmail.com>
Some of the error_if checks were missed in this PR:
https://github.com/llvm/llvm-project/pull/124956
Add back those tests to check suitable sizes for Resize
Signed-off-by: Luke Hutton <luke.hutton@arm.com>
Co-authored-by: Luke Hutton <luke.hutton@arm.com>
- Add packed conversions fp8/bf8->bf16 for gfx950 and fp8/bf8->fp32 for
gfx942 in ROCDL dialect
- Update amdgpu.ext_packed_fp8 lowering to use ROCDL packed fp8/bf8->f32
conversions for vector target types and ROCDL scalar fp8/bf8->fp32
conversions for scalar target types.
---------
Co-authored-by: Jungwook Park <jungwook.park@amd.com>
When inlining a `callee` with a call site debug location, the inlining
infrastructure was trivially combining the `callee` and the `caller`
locations, forming a "tree" of call stacks. Because of this, the remarks
were printing an incomplete inlining stack.
This commit handles this case and appends the `caller` location at the
end of the `callee`'s stack, extending the chain.
This PR adds the vector transfer_read to load rewrite pattern. The
pattern creates a transfer read op lowering: a vector transfer read op
will be lowered to a combination of `vector.load`, `arith.select` and
`vector.broadcast` if:
- The transfer op is masked.
- The memref is in buffer address space.
- Other conditions introduced from `TransferReadToVectorLoadLowering`
The motivation for this PR is the lack of support for masked loads in
the amdgpu backend. `llvm.intr.masked.load` lowers to a series of
conditional scalar loads (see the `scalarize-masked-mem-intrin` pass).
This PR makes it possible for a masked transfer_read to be lowered
towards a buffer load with bounds check, allowing a more optimized global
load access pattern compared with the existing lowering of
`llvm.intr.masked.load` on vectors.
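A simplified sketch of the rewrite (shown on a plain 1-D memref; the actual
pattern additionally requires the buffer address space):
```mlir
// A masked transfer_read ...
%v = vector.transfer_read %mem[%i], %pad, %mask {in_bounds = [true]}
    : memref<64xf32>, vector<4xf32>

// ... becomes an unconditional load selected against the padding value:
%load = vector.load %mem[%i] : memref<64xf32>, vector<4xf32>
%padv = vector.broadcast %pad : f32 to vector<4xf32>
%v2   = arith.select %mask, %load, %padv : vector<4xi1>, vector<4xf32>
```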
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range. This patch replaces:
```
Dest.insert(Src.begin(), Src.end());
```
with:
```
Dest.insert_range(Src);
```
This patch does not touch cases that use custom begin/end functions
(like succ_begin) for now.
* `PyRegionList` is now sliceable. The dialect bindings generator seems
to assume it is sliceable already (!), yet accessing e.g. `cases` on
`scf.IndexedSwitchOp` raises a `TypeError` at runtime.
* `PyBlockList` and `PyOperationList` support negative indexing. It is
common for containers to do that in Python, and most containers in the
MLIR Python bindings already allow the index to be negative.
Fixes a bug introduced by
https://github.com/llvm/llvm-project/pull/130078.
For non-BlockArgOpenMPOpInterface ops, we also want to map their entry
block arguments to their operands, if any. For the current support in
the OpenMP dialect, the table below lists all ops that have arguments
(SSA operands and/or attributes) and are not target-related. Of these
ops, we only need to process `omp.atomic.update`, since it is the only op
that has both SSA operands and an attached region. Therefore, the region's
entry block arguments must be mapped to the op's operands in case they
are referenced inside the region (see the sketch after the table).
| op | operands? | region(s)? | parent is func? | processed? |
|--------------|-------------|------------|------------------|-------------|
| atomic.read | yes | no | yes | no |
| atomic.write | yes | no | yes | no |
| atomic.update | yes | yes | yes | yes |
| critical | no | no | yes | no |
| declare_mapper | no | yes | no | no |
| declare_reduction | no | yes | no | no |
| flush | yes | no | yes | no |
| private | no | yes | yes | no |
| threadprivate | yes | no | yes | no |
| yield | yes | no | yes | no |
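For reference, a rough sketch of the only processed case, where the region's
entry block argument (`%xval` below, a hypothetical name) corresponds to the
`%x` operand and is referenced inside the region:
```mlir
omp.atomic.update %x : !llvm.ptr {
^bb0(%xval: i32):
  %one = llvm.mlir.constant(1 : i32) : i32
  %newval = llvm.add %xval, %one : i32
  omp.yield(%newval : i32)
}
```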
This patch makes the `map_type` and `map_capture_type` arguments of the
`omp.map.info` operation required, which was already an invariant being
verified by its users via `verifyMapClause()`. This makes it clearer, as
getters no longer return misleading `std::optional` values.
Checks for the `mapper_id` argument are moved to a verifier for the
operation, rather than being checked by users.
Functionally NFC, but not marked as such due to a reordering of
arguments in the assembly format of `omp.map.info`.