If we have a near identity shuffle with a single element repeated, we
manage to match this as a masked vrgather.vi for the two operand
forms, but not the single operand form. If the scalar being repeated
was a scalar just inserted into the vector, we're also missing a
chance to recognize a vmerge.vxm or vmerge.vim in both cases.
There are some opcodes that currently require specialized recipes, due
to their result type not being implied by their operands, including
casts.
This leads to duplication from defining multiple full recipes.
This patch introduces a new VPInstructionWithType subclass that also
stores the result type. The general idea is to have opcodes needing to
specify a result type to use this general recipe. The current patch
replaces VPScalarCastRecipe with VInstructionWithType, a similar patch
for VPWidenCastRecipe will follow soon.
There are a few proposed opcodes that should also benefit, without the
need of workarounds:
* https://github.com/llvm/llvm-project/pull/129508
* https://github.com/llvm/llvm-project/pull/119284
PR: https://github.com/llvm/llvm-project/pull/129706
In powerpc64-unknown-linux-musl, signal.h does not include asm/ptrace.h,
which causes "member access into incomplete type 'struct pt_regs'"
errors. Include the header explicitly to fix this.
Also in sanitizer_linux_libcdep.cpp, there is a usage of TlsPreTcbSize
which is not defined in such a platform. Guard the branch with macro.
On Cygwin at least, GCC is not passing any -m flag to the linker, so
fall back to the default target triple to determine if we need to apply
i386-specific behaviors.
Fixes#134558
This adds an experimental flag that will keep track of where the manual memory poisoning (`__asan_poison_memory_region`) is called from, and print the stack trace if the poisoned region is accessed. (Absent this flag, ASan will tell you what code accessed a poisoned region, but not which code set the poison.)
This implementation performs best-effort record keeping using ring buffers, as suggested by Vitaly. The size of each ring buffer is set by the `poison_history_size` flag.
[NFC] add a pre-commit test case for patch [Eliminating li of 0 into arg
registers of unused
arguments](https://github.com/llvm/llvm-project/pull/122741)
The test case tests that extend poison are lower to undef and also test
there are redendunt instrution load 0 into argument registers for unused
arguments.
Stop relying on the uselist of constants, which also required filtering
the uses that are in the correct function. I'm not sure why this pass is
doing its own cloning instead of building the value map while doing its
cloning.
Due to some optimization changes, INIT_UNDEF is making its way to
`getInstSizeInBytes` in `llvm/lib/Target/SystemZ/SystemZLongBranch.cpp`
but we do not have an exception there in the assert. Since INIT_UNDEF is
described as being similar to IMPLICIT_DEF and there is a check for
IMPLICIT_DEF, it seems logical to also add a check for INIT_UNDEF.
---------
Co-authored-by: Tony Tao <tonytao@ca.ibm.com>
This patch bumps workflows depending upon the Linux CI container to
ubuntu 24.04. The 22.04 container is no longer being built as it was
recently bumped to 24.04, so this patch moves all of these workflows
over to the new container to keep them updated and ensure they are using
an actually maintained version of the container image.
Moved check from buildTree_rec function to a separate
isLegalToVectorizeScalars function.
Reviewers: RKSimon, hiraditya
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/134132
The v1i64 patterns were next to the vector variants, not the SIMDScalar
instructions tht define them. In moving them closer they cal also be
incorporated into the definitions themselves. SIMDScalarRShiftDTied is
altered to remove the redundant i64 variants.
We execute tests in read only environment which leads to test failure
when tests try to write to the current directory. Either they should
write to a temporary directory or not write if output is not needed.
Fallback from #134717
…utdown'
This patch emits the lowering for 'device_type' on an 'init' or
'shutdown'. This one is fairly unique, as these directives have it as an
attribute, rather than as a component of the individual operands, like
the rest of the constructs.
So this patch implements the lowering as an attribute.
In order to do tis, a few refactorings had to happen: First, the
'emitOpenACCOp' functions needed to pick up th edirective kind/location
so that the NYI diagnostic could be reasonable.
Second, and most impactful, the `applyAttributes` function ends up
needing to encode some of the appertainment rules, thanks to the way the
OpenACC-MLIR operands get their attributes attached. Since they each use
a special function (rather than something that can be legalized at
runtime), the forms of 'setDefaultAttr' is only valid for some ops. SO
this patch uses some `if constexpr` and a small type-trait to help
legalize these.
In the 'single-file-parse' mode, seeing `#include UNDEFINED_IDENTIFIER`
should not be treated as an error. The identifier might be defined in a
header that we decided to skip, resulting in a nonsensical diagnostic
from the user point of view.
This header is needed to provide the declaration for the sqrt template.
You can build without these in the CMake build, but not having this
include in the architecture specific headers makes them not self
contained.
Inferring super-register classes naively requires checking every
register class against every other register class and sub-register
index.
Each of those checks is itself a non-trivial operation on register sets.
Culling as many (RC, RC, SubIdx) triples as possible is important for
the running time of TableGen for architectures with complex sub-register
relations.
Use transitivity to cull many (RC, RC, SubIdx) triples. This
unfortunately requires us to complete the transitive closure of
super-register classes explicitly, but it still cuts down the running
time on AMDGPU substantially -- in some upcoming work in the
backend by more than half (in very rough measurements).
This changes the names of some of the inferred register classes, since
the order in which they are inferred changes. The names of the inferred
register classes become shorter, which reduces the size of the generated
files.
Replacing some uses of SmallPtrSet by DenseSet shaves off a few more
percent; there are hundreds of register classes in AMDGPU.
Tweaking the topological signature check to skip reigsters without
super-registers further helps skip register classes that have "pseudo"
registers in them whose sub- and super-register structure is trivial.
This is a ~port of https://reviews.llvm.org/D117284. Like in that
change, archives without indices are treated as a collection of lazy
object files (as in `--start-lib/--end-lib`)
Porting the ELF follow-up to convert *all* archives to the lazy object
code path (https://reviews.llvm.org/D119074) is a natural next step, but
we would need to ensure the assertions about memory use hold for Mach-O.
NB: without an index, we can't do the part of the `-ObjC` scan where we
check for Objective-C symbols directly. We *can* still check for
`__obcj` sections so I wonder how much of a problem this actually is,
since I'm not sure how the "symbols but no sections" case can appear in
the wild.
fixes#135122
SemaExpr.cpp - Make all doubles fail. Add sema support for float scalars
and vectors when language mode is HLSL.
CGExprScalar.cpp - Allow emit frem when language mode is HLSL.
* DescribeIEEESignaledExceptions() is unused on the device - warning.
* StopStatementText() could return while marked noreturn - warning.
* Including cuda/std/complex only in the device compilation
may cause nvcc to try to register variables in `cuda` namespace,
while they are not defined in the host compilation - error.
I decided to include cuda/std/complex always under RT_USE_LIBCUDACXX.
PTX 7.1 introduces the concept of a "sink" register, `_`, which is a
register to which writes are ignored.
This patch makes us use sink registers where possible, instead of using
explicit temp registers.
This results in cleaner assembly, and also works around a problem we
encountered in some private workloads.
(Unfortunately the tablegen is not particularly clean. But then again,
it's tablegen...)
Handle signals in a separate thread in the driver so that we can stop
worrying about signal safety of functions in libLLDB that may get called
from a signal handler.
Update the documentation for the unsafe_buffer_usage attribute to
capture the new behavior introduced by
https://github.com/llvm/llvm-project/pull/125671
Co-authored-by: MalavikaSamak <malavika2@apple.com>
Scanning functions without CFG information as well as the detection of
authentication oracles requires introducing more classes related to
register state analysis. To make the future code easier to understand,
rename several classes beforehand.
To detect authentication oracles, one has to query the properties of
*output* operands of authentication instructions *after* the instruction
is executed - this requires adding another analysis that iterates over
the instructions in reverse order, and a corresponding state class.
As the main difference of the existing `State` class is that it stores
the properties of source register operands of the instructions before
the instruction's execution, rename it to `SrcState` and
`PacRetAnalysis` to `SrcSafetyAnalysis`.
Apply minor adjustments to the debug output along the way.
This lock is unnecessary because we can add the relocations to
shards and let them be sorted later.
Reviewers: smithp35, fmayer, MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/135123
The existing version check can lead to test failures on some distribution
packages of gdb where not all components of the version number are
integers, such as Fedora where gdb.VERSION can be something like
"15.2-4.fc41". Fix it by replacing the version check with a feature check.
Reviewers: philnik777
Reviewed By: philnik777
Pull Request: https://github.com/llvm/llvm-project/pull/132291
This changes the TemplateArgument representation to hold a flag
indicating whether a template argument of expression type is supposed to
be canonical or not.
This gets one step closer to solving
https://github.com/llvm/llvm-project/issues/92292
This still doesn't try to unique as-written TSTs. While this would
increase the amount of memory savings and make code dealing with the AST
more well-behaved, profiling template argument lists is still too
expensive for this to be worthwhile, at least for now. Without this
uniquing, this patch stands neutral in terms of performance impact.
This also fixes the context creation of TSTs, so that they don't in some
cases get incorrectly flagged as sugar over their own canonical form.
This is captured in the test expectation change of some AST dumps.
This fixes some places which were unnecessarily canonicalizing these
TSTs.
See test case. When Fortran line continuation has been used, don't
insert spaces in -E formatted output to put things into the right
column, as this can break up a token.
Fixes https://github.com/llvm/llvm-project/issues/134986.
Make some minor tweaks (inlining, caching) to the formatting input path
to improve integer input in a SPEC code. (None of the I/O library has
been tuned yet for performance, and there are some easy optimizations
for common cases.) Input integer values are now calculated with native
C/C++ 128-bit integers.
A benchmark that only reads about 5M lines of three integer values each
speeds up from over 8 seconds to under 3 in my environment with these
changeds.
If this works out, the code here can be used to optimize the formatted
input paths for real and character data, too.
Fixes https://github.com/llvm/llvm-project/issues/134026.
These nodes are effectively free, so we should only concatenate if the
inner nodes will concatenate together.
This also exposed a regression in canonicalizeShuffleWithOp that failed
to realize it could potentially merge shuffles with a CONCAT_VECTORS
node.
Summary:
Previously, we removed the special handling for the code object version
global. I erroneously thought that this meant we cold get rid of this
weird `-Xclang` option. However, this also emits an LLVM IR module flag,
which will then cause linking issues.