Currently, we use `csrr` with `vlenb` to obtain the `VLEN`, but this is
not the only option. We can also use `vsetvli` with `e8`/`m1` to get
`VLENMAX`, which is equal to the VLEN. This method is preferable on some
microarchitectures and makes it easier to obtain values like `VLEN * 2`,
`VLEN * 4`, or `VLEN * 8`, reducing the number of instructions needed to
calculate VLEN multiples.
However, this approach is *NOT* always interchangeable, as it changes
the state of `VTYPE` and `VL`, which can alter the behavior of vector
instructions, potentially causing incorrect code generation if applied
after a vsetvli insertion. Therefore, we limit its use to the
prologue/epilogue for now, as there are no vector operations within the
prologue/epilogue sequence.
With further analysis, we may extend this approach beyond the
prologue/epilogue in the future, but starting here should be a good
first step.
This feature is gurded by the `+prefer-vsetvli-over-read-vlenb` feature,
which is disabled by default for now.
On Mac x86-64, the debugserver reports a register ('ds' at least) but
returns an error when we try to read it. Just skip storing such
registers in snapshots so we won't try to restore them.
Building clang under UBsan, it reported an integer overflow in this
function when the input value was -2^63, because it's UB to pass the
maximum negative value of an integer type to `std::abs`.
Fixed by adding a new absolute-value function in `MathExtras.h` whose
return type is the unsigned version of the argument type, and using that
instead.
(This seems like the kind of thing C++ should have already had, but
apparently it doesn't, and I couldn't find an existing one in LLVM
Support either.)
Initial Parse/Sema support for reduction over private variable with
reduction clause.
Section 7.6.10 in in OpenMP 6.0 spec.
- list item in a reduction clause can now be private in the enclosing
context.
- Added support for _original-sharing-modifier_ with reduction clause.
---------
Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>
In preparation for adding more gadget kinds to detect, streamline
issue reporting.
Rename classes representing issue reports. In particular, rename
`Annotation` base class to `Report`, as it has nothing to do with
"annotations" in `MCPlus` terms anymore. Remove references to "return
instructions" from variable names and report messages, use generic
terms instead. Rename NonPacProtectedRetAnalysis to PAuthGadgetScanner.
Remove `GeneralDiagnostic` as a separate class, make `GenericReport`
(former `GenDiag`) store `std::string Text` directly. Remove unused
`operator=` and `operator==` methods, as `Report`s are created on the
heap and referenced via `shared_ptr`s.
Introduce `GadgetKind` class - currently, it only wraps a `const char *`
description to display to the user. This description is intended to be
a per-gadget-kind constant (or a few hard-coded constants), so no need
to store it to `std::string` field in each report instance. To handle
both free-form `GenericReport`s and statically-allocated messages
without unnecessary overhead, move printing of the report header to the
base class (and take the message argument as a `StringRef`).
The LoopInterchange cost-model consists of several decision rules. They
are called one by one, and if some rule can determine the profitability,
then the subsequent rules aren't called. In the current implementation,
the rule for `CacheCostAnalysis` is called first, and if it fails to
determine the profitability, then the rule for vectorization is called.
However, there are cases where interchanging loops for vectorization
makes the code faster even if such exchanges are detrimental to the
cache. For example, exchanging the inner two loops in the following
example looks about x3 faster in my local (compiled with `-O3
-mcpu=neoverse-v2 -mllvm -cache-line-size=64`), even though it's
rejected by the rule based on cache cost. (NOTE: LoopInterchange cannot
exchange these loops due to legality checks. This should also be
improved.)
```c
__attribute__((aligned(64))) float aa[256][256],bb[256][256],cc[256][256],
dd[256][256],ee[256][256],ff[256][256];
// Alternative of TSVC s231 with more array accesses than the original.
void s231_alternative() {
for (int nl = 0; nl < 100*(100000/256); nl++) {
for (int i = 0; i < 256; ++i) {
for (int j = 1; j < 256; j++) {
aa[j][i] = aa[j-1][i] + bb[j][i] + cc[i][j]
+ dd[i][j] + ee[i][j] + ff[i][j];
}
}
}
}
```
This patch introduces a new option to prioritize the vectorization rule
over the cache cost rule.
Related issue: #131130
---------
Co-authored-by: Florian Hahn <flo@fhahn.com>
The previous implementation incorrectly calculated incoming values from
loop backedges, as demonstrated by the tests. The issue was that it did
not distinguish between live-in and live-out values for blocks. This
patch addresses the problem and fixes
https://github.com/llvm/llvm-project/pull/131761.
To avoid bloating storage in `R.Defines`, processing data has been moved
to a temporary map `BBInfos`. This change helps manage heap allocation
more efficiently and likely improves caching.
Follow the X86, Mips, and RISCV renaming.
> "Relocation modifier" suggests adjustments happen during the linker's relocation step rather than the assembler's expression evaluation.
> "Relocation specifier" is clear, aligns with Arm and IBM AIX's documentation, and fits the assembler's role seamlessly.
In addition, rename *MCExpr::getKind, which confusingly shadows the base class getKind.
The parseSpecifier name follows Sparc.
Follow the X86, Mips, and RISCV renaming.
> "Relocation modifier" suggests adjustments happen during the linker's relocation step rather than the assembler's expression evaluation.
> "Relocation specifier" is clear, aligns with Arm and IBM AIX's documentation, and fits the assembler's role seamlessly.
In addition, rename *MCExpr::getKind, which confusingly shadows the base class getKind.
Follow the X86, Mips, and RISCV renaming.
> "Relocation modifier" suggests adjustments happen during the linker's relocation step rather than the assembler's expression evaluation.
> "Relocation specifier" is clear, aligns with Arm and IBM AIX's documentation, and fits the assembler's role seamlessly.
In addition, rename *MCExpr::getKind, which confusingly shadows the base class getKind.
The aarch64-cond-br-tuning pass transforms a CBZX instruction into a
conditional branch (B.cond). One of the by products of the
transformation is that the source instruction of the CBZX, which is an
ANDXri instruction, gets transformed into a ANDSXri instruction, however
this transformation doesn't preserve it's debug instruction number.
This patch fixes that issue.
Follow the X86, Mips, and RISCV renaming.
> "Relocation modifier" suggests adjustments happen during the linker's relocation step rather than the assembler's expression evaluation.
> "Relocation specifier" is clear, aligns with Arm and IBM AIX's documentation, and fits the assembler's role seamlessly.
In addition, rename *MCExpr::getKind, which confusingly shadows the base class getKind.
Follow the X86 and Mips renaming.
> "Relocation modifier" suggests adjustments happen during the linker's relocation step rather than the assembler's expression evaluation.
> "Relocation specifier" is clear, aligns with Arm and IBM AIX's documentation, and fits the assembler's role seamlessly.
In addition, rename *MCExpr::getKind, which confusingly shadows the base class getKind.
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range. This patch replaces:
Dest.insert(Src.begin(), Src.end());
with:
Dest.insert_range(Src);
This patch does not touch custom begin like succ_begin for now.
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range. This patch replaces:
Dest.insert(Src.begin(), Src.end());
with:
Dest.insert_range(Src);
This patch does not touch custom begin like succ_begin for now.
Follow the X86 and Mips renaming.
> "Relocation modifier" suggests adjustments happen during the linker's relocation step rather than the assembler's expression evaluation.
> "Relocation specifier" is clear, aligns with Arm and IBM’s usage, and fits the assembler's role seamlessly.
In addition, rename *MCExpr::getKind, which confusingly shadows the base class getKind.
Follow the X86 renaming.
> "Relocation modifier" suggests adjustments happen during the linker's relocation step rather than the assembler's expression evaluation.
> "Relocation specifier" is clear, aligns with Arm and IBM’s usage, and fits the assembler's role seamlessly.
In addition, rename MipsMCExpr::getKind, which confusingly shadows the base class getKind.
* `PyRegionList` is now sliceable. The dialect bindings generator seems
to assume it is sliceable already (!), yet accessing e.g. `cases` on
`scf.IndexedSwitchOp` raises a `TypeError` at runtime.
* `PyBlockList` and `PyOperationList` support negative indexing. It is
common for containers to do that in Python, and most container in the
MLIR Python bindings already allow the index to be negative.
Move target-specific members outside of MCSymbolRefExpr::VariantKind
(a legacy interface I am eliminating). Most changes are mechanic,
except:
* ELFObjectWriter::shouldRelocateWithSymbol
* The legacy generic code uses `ELFObjectWriter::fixSymbolsInTLSFixups`
to set `STT_TLS` (and use an unnecessary expression walk). The better
way is to do this in `getRelocType`, which I have done for
AArch64, PowerPC, and RISC-V.
In the future, we should encode expressions with a relocation specifier
as X86MCExpr and use MCValue::RefKind to hold the specifier of the
relocatable expression.
https://maskray.me/blog/2025-03-16-relocation-generation-in-assemblers
While here, rename "Modifier' to "Specifier":
> "Relocation modifier", though concise, suggests adjustments happen during the linker's relocation step rather than the assembler's expression evaluation. I landed on "relocation specifier" as the winner. It's clear, aligns with Arm and IBM’s usage, and fits the assembler's role seamlessly.
Pull Request: https://github.com/llvm/llvm-project/pull/132149
The i6400 and i6500 are high performance multi-core microprocessors from
MIPS that provide best in class power efficiency for use in
system-on-chip (SoC) applications. i6400 and i6500 implements Release 6
of the MIPS64 Instruction Set Architecture with full hardware
multithreading and hardware virtualization support.
- Add '_' after cvt[t]s intrinsics when 's' is for saturation;
- Add 's_' for all ipcvt[t] intrinsics since they are all saturation
ones;
- Move 's' after 'cvt' and add '_' after it for prior `biass`
intrinsics;
This is to solve potential confusion since 's' before a type usually
represents for scalar.
Synced with GCC folks and they will change in the same way.
This introduces some tablegen helpers for defining compatibility
warnings. The main aim of this is to both simplify adding new
compatibility warnings as well as to unify the naming of compatibility
warnings.
I’ve refactored ~half of the compatiblity warnings (that follow the
usual scheme) in `DiagnosticSemaKinds.td` for illustration purposes and
also to simplify/unify the wording of some of them (I also corrected a
typo in one of them as a drive-by fix).
I haven’t (yet) migrated *all* warnings even in that one file, and there
are some more specialised ones for which the scheme I’ve established
here doesn’t work (e.g. because they’re warning+error instead of
warning+extwarn; however, warning+extension *is* supported), but the
point of this isn’t to implement *all* compatibility-related warnings
this way, only to make the common case a bit easier to handle.
This currently also only handles C++ compatibility warnings, but it
should be fairly straight-forward to extend the tablegen code so it can
also be used for C compatibility warnings (if this gets merged, I’m
planning to do that in a follow-up pr).
The vast majority of compatibility warnings are emitted by writing
```c++
Diag(Loc, getLangOpts().CPlusPlusYZ ? diag::ext_... : diag::warn_...)
```
in accordance with which I’ve chosen the following naming scheme:
```c++
Diag(Loc, getLangOpts().CPlusPlusYZ ? diag::compat_cxxyz_foo : diag::compat_pre_cxxyz_foo)
```
That is, for a warning about a C++20 feature—i.e. C++≤17
compatibility—we get:
```c++
Diag(Loc, getLangOpts().CPlusPlus20 ? diag::compat_cxx20_foo : diag::compat_pre_cxx20_foo)
```
While there is an argument to be made against writing ‘`compat_cxx20`’
here since is technically a case of ‘C++17 compatibility’ and not ‘C++20
compatibility’, I at least find this easier to reason about, because I
can just write the same number 3 times instead of having to use
`ext_cxx20_foo` but `warn_cxx17_foo`. Instead, I like to read this as a
warning about the ‘compatibility *of* a C++20 feature’ rather than
‘*with* C++17’.
I also experimented with moving all compatibility warnings to a separate
file, but 1. I don’t think it’s worth the effort, and 2. I think it
hurts compile times a bit because at least in my testing I felt that I
had to recompile more code than if we just keep e.g. Sema-specific
compat warnings in the Sema diagnostics file.
Instead, I’ve opted to put them all in the same place within any one
file; currently this is a the very top but I don’t really have strong
opinions about this.
After 3c8c2914e067e132af951f70d2b3577fe049e19a the lowering of 64-bit
funnel shifts has been improved to the point where this intrinsic is no
longer needed.
If we have an absolute address whose high bits are known to be a sign
extend of the low 12 bits, we can avoid emitting the LUI entirely. This
is implemented in an analogous manner to the gp relative relocations -
defining an internal usage relocation type.
Since 12 bits (really 11 since the high bit must be zero in user code)
is less than one page, all of these offsets fit in the null page. As
such, the only application of these is likely to be undefined weak
symbols except for embedded use cases. I'm mostly posting this for
completeness sake.
This PR adds the missing opening parenthesis for sollya command comment
in `libc/src/math/generic/cbrtf.cpp#L28`.
Signed-off-by: krishna2803 <kpandey81930@gmail.com>
In [1], Nikita Popov suggested that during lowering 'unreachable' insn
should not generate extra code for naked functions, and this applies to
all architectures. Note that for naked functions, 'unreachable' insn is
necessary in IR since the basic block needs a terminator to end.
This patch checked whether a function is naked function or not. If it is
a naked function, 'unreachable' insn will not generate ISD::TRAP.
[1] https://github.com/llvm/llvm-project/pull/131731
Co-authored-by: Yonghong Song <yonghong.song@linux.dev>
Unary ops had previously been omitted from the list of ops handled in
the CIRCanonicalizePass. Although the incubator code doesn't use them
directly, the mlir folding code does.
This change enables folding of unary ops by adding them to the list.