This adds support for handling the address-of and dereference unary
operations in ClangIR code generation. This also adds handling for
nullptr and proper initialization via the NullToPointer cast.
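
As an illustration only, the kind of source this enables could look like the
sketch below. The function names are hypothetical and are not taken from the
patch or its tests; the snippet just exercises unary `&`, unary `*`, and
initialization of a pointer from `nullptr`.

```cpp
// Hypothetical input, not taken from the patch's tests.

// Initializing a pointer from nullptr goes through a NullToPointer cast.
int *null_ptr() {
  int *q = nullptr;
  return q;
}

// Takes the address of a local and reads the value back through the pointer.
int roundtrip() {
  int x = 42;
  int *p = &x; // unary address-of
  return *p;   // unary dereference
}
```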
While we already have some detailed debug messages on the candidate
selection process -- which selects a SUnit from the Available queue -- we
didn't say much about why a SUnit was _not_ moved from Pending queue to
Available queue in the first place, which is just as important as why we
scheduled a node IMHO. Therefore, I added some debug prints for this
very purpose.
I decided to print these extra messages by default (instead of guarding
them with a command line flag like `-misched-detail-resource-booking`)
because we already print some of the hazard remarks, so I thought we
might as well print these new messages -- which are mostly about
hazards -- by default.
This commit extends the MLIR vector type to support pointer-like types
such as `!llvm.ptr` and `!ptr.ptr`, as indicated by the newly added
`VectorTypeElementInterface`. This makes the LLVM dialect closer to LLVM
IR, which already supports pointers as a vector element type.
Only integers, floats, pointers and index are valid vector element types
for now. Additional vector element types may be added in the future
after further discussions. The interface is still evolving and may
eventually turn into one of the alternatives that were discussed on the
RFC.
This commit also disallows `!llvm.ptr` as an element type of
`!llvm.vec`. This type exists due to limitations of the MLIR vector
type.
RFC:
https://discourse.llvm.org/t/rfc-allow-pointers-as-element-type-of-vector/85360
https://reviews.llvm.org/D17938 introduced lowerRelativeReference to
give ConstantExpr sub (A-B) special semantics in ELF: when `A` is an
`unnamed_addr` function, create a PLT-generating relocation. This was
intended for C++ relative vtables, but C++ relative vtables ended up
using DSOLocalEquivalent (lowerDSOLocalEquivalent).
This special treatment of `unnamed_addr` seems unusual.
Let's remove it. Only COFF needs an overload to generate an @IMGREL32
relocation specifier (llvm/test/MC/COFF/cross-section-relative.ll).
Pull Request: https://github.com/llvm/llvm-project/pull/134781
This is the first of a few patches that will do infrastructure work to
enable the OpenACC lowering via the OpenACC dialect.
At the moment this just gets the various function calls that will end up
generating OpenACC, plus some tests to validate that we're doing the
diagnostics in OpenACC specific locations.
Additionally, this adds Stmt and Decl files for CIRGen.
This patch adds support for comparison operators with ClangIR, both
integral and floating point.
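
A hypothetical input covering both operand kinds is sketched below. The
function names are illustrative only and do not come from the patch's tests.

```cpp
// Hypothetical input, not taken from the patch's tests.

// Integral comparisons.
bool int_less(int a, int b) { return a < b; }
bool int_equal(int a, int b) { return a == b; }

// Floating-point comparisons.
bool float_greater_equal(float a, float b) { return a >= b; }
bool double_not_equal(double a, double b) { return a != b; }
```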
---------
Co-authored-by: Morris Hafner <mhafner@nvidia.com>
Co-authored-by: Henrich Lauko <xlauko@mail.muni.cz>
Co-authored-by: Andy Kaylor <akaylor@nvidia.com>
I recently received an internal error report that LLDB was OOM'ing when
creating a Minidump. In my 64b refactor we made a decision to acquire
buffers the size of the largest memory region so we could read all of
the contents in one call. This made error handling very simple (and
simpler coding for me!) but had the trade off of large allocations if
huge pages were enabled.
This patch is one I've had on the back burner for a while: we can read
and write the Minidump memory sections in discrete chunks, which we
already do for writing to disk.
I had to refactor the error handling a bit, but it remains the same. We
make a best effort attempt to read as much of the memory region as
possible, but fail immediately if we receive an error writing to disk. I
did not add new tests for this because our existing test suite is quite
good, but I did manually verify a few Minidumps couldn't read beyond the
red_zone.
```
(lldb) reg read $sp
rsp = 0x00007fffffffc3b0
(lldb) p/x 0x00007fffffffc3b0 - 128
(long) 0x00007fffffffc330
(lldb) memory read 0x00007fffffffc330
0x7fffffffc330: 60 c3 ff ff ff 7f 00 00 60 cd ff ff ff 7f 00 00 `.......`.......
0x7fffffffc340: 60 c3 ff ff ff 7f 00 00 65 e6 26 00 00 00 00 00 `.......e.&.....
(lldb) memory read 0x00007fffffffc329
error: could not parse memory info (Success!)
```
I'm not sure how to quantify the memory improvement, other than that we
used to allocate a buffer the size of the largest region regardless of
how much we actually read. So a 2gb unreadable region would cause a 2gb
allocation even if we were reading 4096 kb. Now we will take the range
size or the max chunk size of 128 mb, whichever is smaller.
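
The buffer sizing can be sketched roughly as follows. This is a minimal,
self-contained illustration and not the LLDB code: `copy_in_chunks`,
`kMaxChunkSize` and the in-memory source and destination are assumptions made
for the example, and the error handling described above (best-effort reads,
immediate failure on write errors) is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the 128 mb cap mentioned above.
constexpr std::size_t kMaxChunkSize = 128 * 1024 * 1024;

// Copies `total` bytes from `src` into `dst` through a scratch buffer that is
// never larger than min(total, max_chunk). Returns the number of bytes copied
// and, if requested, reports the scratch buffer size that was allocated.
std::size_t copy_in_chunks(const std::uint8_t *src, std::size_t total,
                           std::vector<std::uint8_t> &dst,
                           std::size_t max_chunk = kMaxChunkSize,
                           std::size_t *buffer_size_out = nullptr) {
  const std::size_t buffer_size = std::min(total, max_chunk);
  if (buffer_size_out)
    *buffer_size_out = buffer_size;
  if (buffer_size == 0)
    return 0; // Nothing to copy, or a zero chunk size was requested.

  std::vector<std::uint8_t> buffer(buffer_size);
  std::size_t done = 0;
  while (done < total) {
    const std::size_t n = std::min(buffer_size, total - done);
    // "Read" one chunk into the scratch buffer.
    std::copy(src + done, src + done + n, buffer.begin());
    // "Write" the chunk out before reading the next one.
    dst.insert(dst.end(), buffer.begin(),
               buffer.begin() + static_cast<std::ptrdiff_t>(n));
    done += n;
  }
  return done;
}
```

With this shape the allocation is bounded by the chunk size rather than by the
largest region, at the cost of looping over the range.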
As part of RemoveFactorFromExpression, we attempt to remove a factor
from a mul/fmul expression; this may involve generating new
instructions, e.g. to negate the result if the factor was negative in
the original expression. When this happens, the new instructions should
have a DebugLoc set from the instruction that the factored expression is
being used to compute.
Found using https://github.com/llvm/llvm-project/pull/107279.
When combining 2 x 128-bit subvectors, don't assume that if the node is
already a X86ISD::VPERM2X128 node then there's nothing to do.
Fix issue where if we'd somehow combined to X86ISD::VPERM2X128
(typically if the 2 operands had then simplified to a common operand),
we can't canonicalise back to X86ISD::VPERMI on AVX2+ targets.
This matches the v4f64/v4i64 shuffle lowering preference for VPERMQ/PD
over VPERM2F128/I128.
TestCases/Linux/asan_rt_confict_test-2.cpp started failing in https://lab.llvm.org/buildbot/#/builders/66/builds/12265/steps/9/logs/stdio
The only change is "[LLD][ELF] Allow merging XO and RX sections, and add --[no-]xosegment flag (#132412)" (2c1bdd4a08). Based on the test case (which deliberately tries to mix static and dynamically linked ASan), I suspect it's actually the test case that needs to be fixed (probably with a different error message check).
This patch disables TestCases/Linux/asan_rt_confict_test-2.cpp to make the buildbots green while I investigate.
This macro isn't required if we define all the functions inline. In
fact, quite a few of the marked functions have already been inlined.
This patch basically only moves code around and adds
`_LIBCPP_HIDE_FROM_ABI` to the places where it's been missing so far.
This also removes inlining hints, since it drops `inline` in some
places, but that shouldn't make much of a difference. The functions tend
to be either really small, so should be inlined anyway, or are big
enough that they shouldn't be inlined even with an inlinehint.
Following PR #132569 (RISC-V), which added `parseDataExpr` for parsing
expressions in data directives (e.g., `.word`), this PR migrates AArch64
`@plt`, `@gotpcrel`, and `@AUTH` from the `parsePrimaryExpr` workaround
to `parseDataExpr`. The goal is to align with the GNU assembler model,
where relocation specifiers apply to the entire operand rather than
individual terms, reducing complexity, which is especially evident in
`@AUTH` parsing.
Note: AArch64 ELF lacks an official syntax for data directives
(#132570). A prefix notation might be a preferable future direction.
I recommend `%specifier(expr)`.
AsmParser's `@specifier` parsing is suboptimal, necessitating lexer
workarounds. `@` might appear multiple times in an operand.
We should not use `@` beyond the existing AArch64 Mach-O instruction
operands.
In the test elf-reloc-ptrauth.s, many errors are now reported at parse
time.
Pull Request: https://github.com/llvm/llvm-project/pull/134202
Currently in Reassociate we may create a set of new instructions when
optimizing an `add`, but we do not set DebugLocs on the new
instructions; this patch propagates the add's DebugLoc to the new
instructions.
Found using #107279.
This reverts commit d1a05721172272f7aab685b56d99e86814a15bff.
There was further discussion on the PR about whether the intrinsics
should exist in this form.
If an attribute is not defined earlier in the same file, but just
referenced from its dialect directly, then the correct check is
currently not emitted.
What would it emit for #toy.shape<[1, 2, 3]>:
Earlier:
// CHECK: #[['?']]<[1, 2, 3]>
Now:
// CHECK: #toy.shape<[1, 2, 3]>
The component diagnostic headers (i.e. `DiagnosticAST.h` and friends)
all follow the same format, and there’s enough of them (and in them) to
where updating all of them has become rather tedious (at least it was
for me while working on #132348), so this patch instead generates all of
them (or rather their contents) via Tablegen.
Also, it seems that `%enum_select` currently wouldn’t work in
`DiagnosticCommonKinds.td` because the infrastructure for that was
missing from `DiagnosticIDs.h`; this patch should fix that as well.
.swiftinterface files into the dSYM bundle. These typically come only
from the SDK (since textual interfaces require library evolution) and
thus are a waste of space to copy into the bundle.
The information about this is being parsed out of the control block,
which means duplicating 5 constants from the Swift frontend. If a file
cannot be parsed, dsymutil errs on the side of copying the file anyway.
rdar://138186524
Currently if a developer uses the flag `--mlir-enable-debugger-hook` the
debugger hook is not actually enabled. It seems the DebugConfig and the
MainMLIROptConfig are not connected.
To fix this we can move the `enableDebuggerHook` CL Option to the
DebugConfigCLOptions struct so that it can get registered and enabled
along with the other debugger flags. AFAICS there are no other uses of
the flag so this should be safe.
This also adds a small LIT test to check that the hook is enabled by
checking the std::cerr output for the log message.
We cannot determine ScalarTy from VL because some ScalarTy is determined
from VL[0]->getType(), while others are determined from
getValueType(VL[0]).
Fix "Mask and VecTy are incompatible".
Fix https://github.com/llvm/llvm-project/issues/134126.
The matching code was previously written as if we were mutating the
indices to replace undef elements with preferred values, but the actual
lowering code just took a prefix of the index vector. This resulted in
us using undef indices for lanes which should have been defined,
resulting in incorrect codegen.
Longer term, we probably should rewrite the mask, but this seemed like
an easier tactical fix.
llvm-mt requires libxml2 to work, so do not even build it without
libxml2.
CMake 3.31 and later prefer llvm-mt.exe over Microsoft's mt.exe if
available and using clang-cl.exe as CMAKE_CXX_COMPILER. When CMake picks
up llvm-mt.exe without libxml2, any build will fail with the message
```
llvm-mt: error: no libxml2
```
Any test except `--help` already uses `REQUIRES: libxml2`. There is no
point in having a non-functional executable. Not building llvm-mt.exe
will force CMake to use Microsoft's `mt.exe` instead.
Fixes: #134237
This is an optional mechanism that automatically detects roots. It's a best-effort mechanism, and its main goal is to *avoid* pointing at the message pump function as a root. This is the function that polls message queue(s) in an infinite loop, and is thus a bad root (it never exits).
At a high level, when collection is requested - which should happen when a server has already been set up and is handling requests - we spend a bit of time sampling all the server's threads. Each sample is a stack which we insert in a `PerThreadCallsiteTrie`. After a while, we run the root detection logic for each `PerThreadCallsiteTrie`. We then traverse all the `FunctionData`, find the ones matching the detected roots, and allocate a `ContextRoot` for them. From here, we special-case `FunctionData` objects, in `__llvm_ctx_profile_get_context`, that have a `CtxRoot` and route them to `__llvm_ctx_profile_start_context`.
For this to work, on the llvm side, we need to have all functions call `__llvm_ctx_profile_release_context` because they _might_ be roots. This comes at a slight (percentages) penalty during collection - which we can afford since the overall technique is ~5x faster than normal instrumentation. We can later explore conditionally enabling autoroot detection and avoiding this penalty, if desired.
Note that functions that `musttail call` can't have their return instrumented this way, and a subsequent patch will harden the mechanism against this case.
The mechanism could be used in combination with explicit root specification, too.
#130963 switches the default to COV6, which requires ROCm 6.3.
Currently, if the device libraries for COV6 are not found, the error
message is not very helpful.
This PR provides a more informative error message in such cases.
Devices that do not support denormals may compare them as equal to zero.
This leads to results that do not match the CTS expectation, whether or
not denormals are supported.
For example for 0x1.008p-140 we get {0x1.008p-140, 0} while the CTS
expects {0x1.008p-1, -139} when supporting denormals, or {0, 0} when not
supporting denormals (flushed to zero).
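
The expected pairs have the shape of a mantissa/exponent split. Assuming a
`frexp`-style decomposition is what is being tested (the text above does not
name the function), the denormal-supporting expectation can be reproduced on a
host with `std::frexp`. The `Decomposed` struct and `decompose` helper are
hypothetical names used only for this sketch.

```cpp
#include <cmath>

// Hypothetical helper type holding the two results of the decomposition.
struct Decomposed {
  float mantissa;
  int exponent;
};

// Splits x into a mantissa in [0.5, 1) and a power-of-two exponent such that
// x == mantissa * 2^exponent (for finite, non-zero x).
Decomposed decompose(float x) {
  int e = 0;
  const float m = std::frexp(x, &e);
  return {m, e};
}
```

On a host that keeps denormals, `decompose(0x1.008p-140f)` yields
`{0x1.008p-1, -139}`, the first of the two expectations quoted above.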
Ref #129871
Tail calls were disabled from callers with inreg parameters in 5dc8aeb
with a fixme to check if the callee also takes an inreg parameter.
The issue is that inreg parameters (which are passed in x0 or x1 for
free and member functions respectively) are supposed to be returned (in
x0) at the end of the function. In case of a tail call, that means the
callee needs to return the same value as the caller would.
We can check for that case, and it's not as niche as it sounds, as
that's how Clang will lower one function with an sret return value
calling another, such as:
```
struct T { int x; };
struct S {
T foo();
T bar();
};
T S::foo() { return bar(); } // foo's sret argument will get passed directly to bar
```
Fixes #133098
This defines a new `amdgpu.global_load` op, which is a thin wrapper
around the ROCDL `global_load_lds` intrinsic, along with its lowering
logic to `rocdl.global.load.lds`.