Similar to #135415.
The spec has not strict restriction to allow a single `if_present`
clause on the host_data and update directives. Allowing this clause
multiple times does not change the semantic of it. This patch relax the
rules in ACC.td since there is no restriction in the standard.
The OpenACC dialect represents the `if_present` clause with a `UnitAttr`
so the attribute will be set if the is one or more `if_present` clause.
The spec has not strict restriction to allow a single `finalize` clause
on the `exit data` directive. Allowing this clause multiple times does
not change the semantic of it. This patch relax the rules in `ACC.td`
since there is no restriction in the standard.
The OpenACC dialect represent the finalize clause with a UnitAttr so the
attribute will be set if the is one or more `finalize` clause.
Fix unicode build fail issue:
```
C4819 The file contains a character that cannot be represented in the current code page (936). Save the file in Unicode format to prevent data loss ...\llvm-project\llvm\include\llvm\Support\Compiler.h
```
... to clarify ownership, aligning with other parameters. Using
`std::unique_ptr` encourages users to manage `createMCInstPrinter` with
a unique_ptr instead of a raw pointer, reducing the risk of memory
leaks.
* llvm-mc: fix a leak and update llvm/test/tools/llvm-mc/disassembler-options.test
* #121078 copied the llvm-mc code to CodeGenTargetMachineImpl and made
the same mistake. Fixed by 2b8cc651dca0c000ee18ec79bd5de4826156c9d6
Using unique_ptr requires #include MCInstPrinter.h in a few translation
units.
* Delete a createAsmStreamer overload I deprecated in 2024
* SystemZMCTargetDesc.cpp: rename to `createSystemZAsmStreamer` to fix
an overload conflict.
Pull Request: https://github.com/llvm/llvm-project/pull/135128
This commit is for #107569 and #126817.
Stop changing musttail's caller and callee's function signature when
calling convention is not swifttailcc nor tailcc. Verifier makes sure
musttail's caller and callee shares exactly the same signature, see
commit 9ff2eb1 and #54964.
Otherwise just make sure the return type is the same and then process
musttail like usual calls.
close#107569, #126817
https://github.com/llvm/llvm-project/pull/134202 removed support for
`sym@page-offset` in instruction operands. This change is generally
reasonable since subtracting an offset from a symbol typically doesn’t
make sense for Mach-O due to its .subsections_via_symbols mechanism, which treats
them as separate atoms.
However, BoringSSL relies on a temporary symbol with a negative offset,
which can be meaningful when the symbol and the referenced location are
within the same atom.
```
../../third_party/boringssl/src/gen/bcm/p256-armv8-asm-apple.S:1160:25: error: unexpected token in argument list
adrp x23,Lone_mont@PAGE-64
```
It's worth noting that expressions involving @ can be complex and
brittle in MCParser, and much of the Mach-O @ offsets remains
under-tested.
* Allow default argument for parsePrimaryExpr. The argument, used by the niche llvm-ml,
should not require other targets to adapt.
Introduce 'erase(const ElemTy &V)' member function to allow the deletion
of a certain value from EquivClasses. This is essential for certain
scenarios that require modifying the contents of EquivClasses.
---------
Co-authored-by: Florian Hahn <flo@fhahn.com>
This patch dismantles G_SHUFFLE_VECTOR before lowering. The original
lowering would emit extract vector element ops. We found that by using
unmerged values the build vector op combine could find ways to fold.
Only enabled on AMDGPU.
This resolves#123631
GEPs and truncates should not have any metadata that can be propgated at
the moment, so addMetadata is a no-op. Remove the calls.
This patch also adds assertions to the recipes' constructors, to ensure
no metadata is accidentially dropped.
When `llvm-extract`-ing a function, and the `CG Profile` flag is present in the original module, we end up with lots of `!{null, null, i64 1234}` entries for call edges that have disappeared as result of the removed functions.
This patch fixes that by adding a pass to `llvm-extract` that finds `CG Profile` edges with one or both operands `null` and removes them. This results in a cleaner output.
The pass may run more than once during ThinLTO for example (see bug).
Maybe that means those pass pipelines aren't optimal, but the pass
should be resilient against that.
Fixes#134054
ByOperand must be false, this is implied by the iterator type.
The instr_iterator cases are a separate implementation from the single
operand defusechain_iterator.
Additionally ByInstr and ByBundle are mutually exclusive.
The current hashing quality for `ValueInfo` is poor because it uses
pointers as the hash value, which can negatively impact performance in
various places that use a `DenseSet`/`Map` of `ValueInfo`. In one
observed case, `ModuleSummaryIndex::propagateAttributes()` was taking
about 25 minutes to complete on a ThinLTO application. Profiling
revealed that the majority of this time was spent operating on the
`MarkedNonReadWriteOnly` set.
With the improved hashing, the execution time for `propagateAttributes`
is dramatically reduced to less than 10 seconds.
An Inlined library is a dylib that is reexported from an umbrella or
top-level library. When this is encoded in tbd files, ensure we are
reading the symbol table from the inlined library when
`--add-inlinedinfo` is used as opposed to the top-level lib.
resolves: rdar://147767733
Move wasm-specific members outside of MCSymbolRefExpr::VariantKind (a
legacy interface I am eliminating). Most changes are mechanic and
similar to what I've done for many ELF targets (e.g. X86 #132149)
Notes:
* `fixSymbolsInTLSFixups` is replaced with `setTLS` in
`WebAssemblyWasmObjectWriter::getRelocType`, similar to what I've done
for many ELF targets.
* `SymA->setUsedInGOT()` in `recordRelocation` is moved to
`getRelocType`.
While here, rename "Modifier' to "Specifier":
> "Relocation modifier", though concise, suggests adjustments happen during the linker's relocation step rather than the assembler's expression evaluation. I landed on "relocation specifier" as the winner. It's clear, aligns with Arm and IBM’s usage, and fits the assembler's role seamlessly.
Pull Request: https://github.com/llvm/llvm-project/pull/133116
For many targets, llvm-objdump and llvm-mc
(https://reviews.llvm.org/D103004) support -M no-aliases (e.g.
`RISCVInstPrinter::applyTargetSpecificCLOption`).
This patch implements -M for llc.
While here, rename "DisassemblerOptions" in llvm-mc to the more
appropriate "InstPrinterOptions". For llvm-mc --assemble, there is no
disassembler involved.
Pull Request: https://github.com/llvm/llvm-project/pull/121078
Same idea as in #134723 - flatten indirect call info in `"VP"` `MD_prof` metadata for the thinlinker, for cases that aren't covered by a contextual profile. If we don't ICP an indirect call target in the specialized module, the call will fall to the copy of that target outside the specialized module. If the graph under that target also has some indirect calls, in the absence of this pass, we'd have a steeper performance regression - because none of those would have a chance to be ICPed.
Flatten the profile pre-thinlink so that ThinLTO has something to work with for the parts of the binary that aren't covered by contextual profiles. Post-thinlink, the flattener is re-run and will actually change profile info, but just for the modules containing contextual trees ("specialized modules"). For the rest, the flattener just yanks out the instrumentation.
Extract the logic for choosing FP_TO_UINT vs FP_TO_SINT opcodes into a
TLI hook. This hook can be overridden by targets that prefer not to use
the default behavior of replacing FP_TO_UINT with FP_TO_SINT when both
are custom.
Implement an override for NVPTX to only change opcode when
FP_TO_UINT is not legal and FP_TO_SINT is legal.
This fixes:
```
[2295/3381] Building CXX object lib\Passes\CMakeFiles\LLVMPasses.dir\PassBuilder.cpp.obj
In file included from C:\git\llvm-project\llvm\lib\Passes\PassBuilder.cpp:368:
In file included from C:\git\llvm-project\llvm\include\llvm/Transforms/Vectorize/SandboxVectorizer/SandboxVectorizer.h:16:
C:\git\llvm-project\llvm\include\llvm/SandboxIR/Context.h(73,16): warning: unqualified friend declaration referring to type outside of the nearest enclosing namespace is a Microsoft extension; add a nested name specifier [-Wmicrosoft-unqualified-friend]
73 | friend class Region; // For LLVMCtx.
| ^
| ::llvm::
1 warning generated.
```
This fixes:
```
[1230/3381] Building CXX object lib\Transforms\Vectorize\CMakeFiles\LLVMVectorize.dir\SandboxVectorizer\VecUtils.cpp.obj
In file included from C:\git\llvm-project\llvm\lib\Transforms\Vectorize\SandboxVectorizer\VecUtils.cpp:9:
In file included from C:\git\llvm-project\llvm\include\llvm/Transforms/Vectorize/SandboxVectorizer/VecUtils.h:17:
C:\git\llvm-project\llvm\include\llvm/SandboxIR/Type.h(55,16): warning: unqualified friend declaration referring to type outside of the nearest enclosing namespace is a Microsoft extension; add a nested name specifier [-Wmicrosoft-unqualified-friend]
55 | friend class CallBase; // For LLVMTy.
| ^
| ::llvm::
C:\git\llvm-project\llvm\include\llvm/SandboxIR/Type.h(60,16): warning: unqualified friend declaration referring to type outside of the nearest enclosing namespace is a Microsoft extension; add a nested name specifier [-Wmicrosoft-unqualified-friend]
60 | friend class CmpInst; // For LLVMTy. TODO: Cleanup after
| ^
| ::llvm::
2 warnings generated.
```
https://reviews.llvm.org/D17938 introduced lowerRelativeReference to
give ConstantExpr sub (A-B) special semantics in ELF: when `A` is an
`unnamed_addr` function, create a PLT-generating relocation. This was
intended for C++ relative vtables, but C++ relative vtable ended up
using DSOLocalEquivalent (lowerDSOLocalEquivalent).
This special treatment of `unnamed_addr` seems unusual.
Let's remove it. Only COFF needs an overload to generate a @IMGREL32
relocation specifier (llvm/test/MC/COFF/cross-section-relative.ll).
Pull Request: https://github.com/llvm/llvm-project/pull/134781
As part of RemoveFactorFromExpression, we attempt to remove a factor
from a mul/fmul expression; this may involve generating new
instructions, e.g. to negate the result if the factor was negative in
the original expression. When this happens, the new instructions should
have a DebugLoc set from the instruction that the factored expression is
being used to compute.
Found using https://github.com/llvm/llvm-project/pull/107279.
Following PR #132569 (RISC-V), which added `parseDataExpr` for parsing
expressions in data directives (e.g., `.word`), this PR migrates AArch64
`@plt`, `@gotpcrel`, and `@AUTH` from the `parsePrimaryExpr` workaround
to `parseDataExpr`. The goal is to align with the GNU assembler model,
where relocation specifiers apply to the entire operand rather than
individual terms, reducing complexity-especially evident in `@AUTH`
parsing.
Note: AArch64 ELF lacks an official syntax for data directives
(#132570). A prefix notation might be a preferable future direction.
I recommend `%specifier(expr)`.
AsmParser's `@specifier` parsing is suboptimal, necessitating lexer
workarounds. `@` might appear multiple times in an operand.
We should not use `@` beyond the existing AArch64 Mach-O instruction
operands.
In the test elf-reloc-ptrauth.s, many errors are now reported at parse
time.
Pull Request: https://github.com/llvm/llvm-project/pull/134202
This reverts commit d1a05721172272f7aab685b56d99e86814a15bff.
There was further discussion on the PR about whether the intinsics
should exist in this form.
This is an optional mechanism that automatically detects roots. It's a best-effort mechanism, and its main goal is to *avoid* pointing at the message pump function as a root. This is the function that polls message queue(s) in an infinite loop, and is thus a bad root (it never exits).
High-level, when collection is requested - which should happen when a server has already been set up and handing requests - we spend a bit of time sampling all the server's threads. Each sample is a stack which we insert in a `PerThreadCallsiteTrie`. After a while, we run for each `PerThreadCallsiteTrie` the root detection logic. We then traverse all the `FunctionData`, find the ones matching the detected roots, and allocate a `ContextRoot` for them. From here, we special case `FunctionData` objects, in `__llvm_ctx_profile_get_context, that have a `CtxRoot` and route them to `__llvm_ctx_profile_start_context`.
For this to work, on the llvm side, we need to have all functions call `__llvm_ctx_profile_release_context` because they _might_ be roots. This comes at a slight (percentages) penalty during collection - which we can afford since the overall technique is ~5x faster than normal instrumentation. We can later explore conditionally enabling autoroot detection and avoiding this penalty, if desired.
Note that functions that `musttail call` can't have their return instrumented this way, and a subsequent patch will harden the mechanism against this case.
The mechanism could be used in combination with explicit root specification, too.
We will subsequently treat the whole profile as "flat" in the frontend, (i.e flatten and combine with the flat profile section), so we can have a profile for ThinLTO for parts of the application that don't come under the contextual profile. After ThinLTO, we will treat the module(s) containing contextual trees differently: they'll have only the contextual profile pertinent to them. The rest of the modules (non-contextual) will proceed "as usual", off the flattened profile.
This patch implements pruning of the contextual profile to enable the above.