Improve hasNonDefaultLowerBounds to look through fir.convert of boxes. This
helps HLFIR helpers generate less code when it can easily be deduced that
the fir.box lower bounds were set to ones.
It will help SELECT RANK lowering avoid generating hlfir.declare with
lower bounds inside the RANK CASE (the current situation is not
incorrect, since the lower bounds would be SSA values that end up being
one; I just want simpler IR).
Renamed to mayHaveNonDefaultLowerBounds since it may still answer yes when
the lower bounds are ones.
This patch adds processing of min/max intrinsics in LoopPeel, in a
similar way to what was done for conditional statements: for
min/max(IterVal, BoundVal) we peel iterations where IterVal < BoundVal
for monotonically increasing IterVal; for monotonically decreasing
IterVal we peel iterations where IterVal > BoundVal (strict comparison
predicates are used to minimize the number of peeled iterations).
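A small standalone sketch of the peel-count arithmetic, assuming a canonical loop `for (i = start; i < end; i += step)` with a positive constant step (and the mirrored decreasing form). It only illustrates why the strict predicates peel the fewest iterations. It does not reproduce LoopPeel's SCEV-based analysis, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Peel count for an increasing IV:
//   for (int64_t i = start; i < end; i += step)   // step > 0
// Peels exactly the leading iterations with i < bound (strict predicate), so
// every remaining iteration has i >= bound, where min(i, bound) == bound and
// max(i, bound) == i. Assumes the arithmetic does not overflow.
int64_t peelCountIncreasing(int64_t start, int64_t end, int64_t step,
                            int64_t bound) {
  assert(step > 0 && "expected a positive step");
  if (start >= end || start >= bound)
    return 0;  // loop never runs, or i >= bound already holds
  int64_t tripCount = (end - start + step - 1) / step;
  // Smallest k such that start + k * step >= bound.
  int64_t k = (bound - start + step - 1) / step;
  return k < tripCount ? k : tripCount;
}

// Mirrored case for a decreasing IV:
//   for (int64_t i = start; i > end; i -= step)   // step > 0
// Peels the leading iterations with i > bound, so afterwards i <= bound.
int64_t peelCountDecreasing(int64_t start, int64_t end, int64_t step,
                            int64_t bound) {
  assert(step > 0 && "expected a positive step");
  if (start <= end || start <= bound)
    return 0;  // loop never runs, or i <= bound already holds
  int64_t tripCount = (start - end + step - 1) / step;
  // Smallest k such that start - k * step <= bound.
  int64_t k = (start - bound + step - 1) / step;
  return k < tripCount ? k : tripCount;
}
```

For example, with start = 0, end = 10, step = 1 and bound = 3, three iterations (i = 0, 1, 2) are peeled. After that, min(i, 3) is always 3 and max(i, 3) is always i. Using i <= bound instead would peel one more iteration for no benefit.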
Updated the documentation in `checkers.rst` to include an example of how
the `trylock` function is handled.
Added a new test for a scenario where `pthread_mutex_trylock` is used,
demonstrating the current limitation.
Prefer using the `llvm-spirv-<LLVM_VERSION_MAJOR>` tool (e.g.
`llvm-spirv-18`) over plain `llvm-spirv`. If the versioned tool is not
found in PATH, fall back to the plain `llvm-spirv`.
An issue with using `llvm-spirv` is that the one found in PATH might
be compiled against an older LLVM version, which could lead to crashes or
obscure bugs. For example, the `llvm-spirv` distributed by Ubuntu links
against a different LLVM version depending on the Ubuntu release (LLVM-10
in 20.04 LTS, LLVM-13 in 22.04 LTS).
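Here is a minimal sketch of the lookup order, written as standalone C++ rather than the actual driver code. `findInPath`, `pickLlvmSpirv` and `pickLlvmSpirvFromEnv` are hypothetical helpers. The PATH separator is assumed to be ':' (POSIX), and the check only looks for a regular file, not for execute permission.

```cpp
#include <cstdlib>
#include <filesystem>
#include <functional>
#include <optional>
#include <sstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Returns the first regular file named `name` found in a ':'-separated
// directory list (POSIX-style PATH). Execute permission is not checked.
std::optional<fs::path> findInPath(const std::string &name,
                                   const std::string &pathList) {
  std::istringstream dirs(pathList);
  std::string dir;
  while (std::getline(dirs, dir, ':')) {
    if (dir.empty())
      continue;
    fs::path candidate = fs::path(dir) / name;
    std::error_code ec;  // non-throwing overload; errors count as "not found"
    if (fs::is_regular_file(candidate, ec))
      return candidate;
  }
  return std::nullopt;
}

// Prefers llvm-spirv-<major> and falls back to plain llvm-spirv.
// Returns an empty string if neither is found.
std::string pickLlvmSpirv(
    int llvmVersionMajor,
    const std::function<bool(const std::string &)> &foundInPath) {
  const std::string versioned =
      "llvm-spirv-" + std::to_string(llvmVersionMajor);
  if (foundInPath(versioned))
    return versioned;
  if (foundInPath("llvm-spirv"))
    return "llvm-spirv";
  return "";
}

// Convenience wrapper that consults the real PATH environment variable.
std::string pickLlvmSpirvFromEnv(int llvmVersionMajor) {
  const char *path = std::getenv("PATH");
  const std::string pathList = path ? path : "";
  return pickLlvmSpirv(llvmVersionMajor, [&](const std::string &name) {
    return findInPath(name, pathList).has_value();
  });
}
```

Passing the lookup as a callback keeps the preference order separate from the filesystem search, so it can be exercised without touching PATH.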
The pass constructor can be generated automatically by tablegen.
This pass does not need adapting to work with non-function top level
operations because it operates specifically on call operations inside of
an OpenMP declare target function.
This reverts commit e1cc9e4eaddcc295b4e775512e33b947b1514c17.
This causes some non-trivial text size increases in unoptimized
builds for Bullet. Revert while I investigate.
Because symbols cannot refer to operations outside of their symbol
tables, it was impossible to refer to operations outside of the dialect
currently being defined. This PR modifies the lookup logic to happen
relative to the symbol table containing the dialect-defining operations.
This is a bit of a hack, but it should unblock the situation here.
I'd like to nominate myself to join the LLVM Security group as a
representative of ST. I work in ST's compiler team contributing to
upstream (LLVM and GNU) and several downstream toolchains. We believe
that it is important for us to be part of this group to address or
report any potential security issues the LLVM project or our toolchains
may encounter.
This fold is subtly incorrect, because DL-unaware constant folding does
not know the correct index type to use, and just performs the addition
in the type that happens to already be there. This is incorrect, since
sext(X)+sext(Y) is generally not the same as sext(X+Y). See the
`@constexpr_gep_of_gep_with_narrow_type()` test for a miscompile with the
current implementation.
One could try to restrict the fold to cases where no overflow occurs,
but I'm not bothering with that here, because the DL-aware constant
folding will take care of this anyway. I've only kept the
straightforward zero-index case, where we just concatenate two GEPs.
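You can see that sext(X)+sext(Y) and sext(X+Y) differ using plain integer arithmetic. In this sketch, `int8_t` stands in for a narrow index type and `int64_t` for the DataLayout index type. Both choices, and the helper names, are illustrative only.

```cpp
#include <cstdint>

// What the removed fold effectively computed: add in the narrow type
// (wrapping modulo 2^8), then sign-extend the result.
int64_t sextOfSum(int8_t x, int8_t y) {
  // x + y is computed in int, then truncated back to 8 bits (two's complement
  // wrap-around, guaranteed since C++20 and universal in practice before it).
  int8_t narrowSum = static_cast<int8_t>(x + y);
  return static_cast<int64_t>(narrowSum);
}

// What GEP semantics require: sign-extend each index to the index type first,
// then add in the wide type.
int64_t sumOfSexts(int8_t x, int8_t y) {
  return static_cast<int64_t>(x) + static_cast<int64_t>(y);
}
```

With X = Y = 100 the two results are 200 and -56. This divergence turns into a miscompile when the narrow sum is used as the combined GEP index. When no overflow occurs in the narrow type, the results agree, which is why restricting the fold to non-overflowing cases would also have been an option.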
Currently, the tablegen files that generate the instruction definitions
in lib/Target/AMDGPU/AMDGPUGenInstrInfo.inc often only include implicit
operands for the architecture-independent pseudo instructions, but not
for the corresponding real instructions. The missing implicit operands
(most prominently: the EXEC mask) do not affect code generation, since
that operates on pseudo instructions, but they are problematic when
working with real instructions, e.g., as a decoding result from the MC
layer.
This patch copies the implicit Defs and Uses from pseudo instructions to
the corresponding real instructions, so that implicit operands are also
defined for real instructions.
Addresses issue #89830.
An assumed-rank fir.box/class may describe an assumed-size array. This case
needs special handling in SELECT RANK. It is not possible to generate
FIR code to detect that a fir.box is assumed-size (the way to detect
that is to check that the extent of the upper (last) dimension is -1 in the descriptor).
Instead of emitting a runtime call directly in lowering, add an
operation that can later be lowered to a runtime call or inline code
when the descriptor layout is known.
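The following sketch shows the check described above, using a hypothetical and heavily simplified descriptor layout. `Dim` and `Descriptor` below are not Flang's runtime types. As the text states, it assumes that an assumed-size array is marked by an extent of -1 in its last dimension. This is the kind of inline code the new operation could expand to once the descriptor layout is known.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified descriptor: only the per-dimension fields needed
// for this check. The rank is dims.size().
struct Dim {
  int64_t lowerBound;
  int64_t extent;
};
struct Descriptor {
  std::vector<Dim> dims;
};

// An assumed-size array is recognized by an extent of -1 in its last
// dimension. A scalar (rank 0) is never assumed-size.
bool isAssumedSize(const Descriptor &d) {
  return !d.dims.empty() && d.dims.back().extent == -1;
}
```

In SELECT RANK, a result of true would steer the selector to a RANK(*) case rather than the case matching the descriptor's rank.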
GEPNoWrapFlags.h calls `assert`, creating an undeclared identifier error
when running an Apple-stage2 build with LLVM_ENABLE_MODULES enabled.
resolves: rdar://129031201
If a weak function is missing, still return its address (zero) rather
than failing interpretation. Otherwise we have a mismatch between
Interpret() and CanInterpret() resulting in failures that would not
occur with JIT execution.
Alternatively, we could try to look for weak symbols in CanInterpret()
and generally reject them there.
This is the root cause for the issue exposed by
https://github.com/llvm/llvm-project/pull/92885. Previously, the case
affected by that always fell back to JIT because an icmp constant
expression was used, which is not supported by the interpreter. Now a
normal icmp instruction is used, which is supported. However, we fail to
interpret due to incorrect handling of weak function addresses.
`MachORebaseEntry::moveNext()` and `MachOBindEntry::moveNext()` assume
that the rebase/bind table ends with `{REBASE|BIND}_OPCODE_DONE` or an
actual rebase/bind. However, a valid rebase/bind table might also end
with other effectively no-op opcodes, which caused the parser to move
past the end and go into the next table, resulting in corrupted entries
or infinite loops.
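Below is a simplified sketch of the bounded-loop idea. It handles only three rebase opcodes, whose values match `REBASE_OPCODE_DONE`, `REBASE_OPCODE_SET_TYPE_IMM` and `REBASE_OPCODE_DO_REBASE_IMM_TIMES` in `<mach-o/loader.h>`. It is not the code of `MachORebaseEntry::moveNext()`, and `countRebases` is a hypothetical helper. The point is that the parser checks the table end on every step, so a trailing no-op opcode cannot push it into the next table.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Values as defined in <mach-o/loader.h> (subset used by this sketch).
constexpr uint8_t kRebaseOpcodeMask = 0xF0;             // REBASE_OPCODE_MASK
constexpr uint8_t kRebaseImmediateMask = 0x0F;          // REBASE_IMMEDIATE_MASK
constexpr uint8_t kRebaseOpcodeDone = 0x00;             // REBASE_OPCODE_DONE
constexpr uint8_t kRebaseOpcodeSetTypeImm = 0x10;       // REBASE_OPCODE_SET_TYPE_IMM
constexpr uint8_t kRebaseOpcodeDoRebaseImmTimes = 0x50; // REBASE_OPCODE_DO_REBASE_IMM_TIMES

// Counts the rebases described by `table`. The loop condition bounds parsing
// to this table, so a table ending in a no-op (e.g. SET_TYPE_IMM) without a
// trailing DONE still terminates cleanly.
size_t countRebases(const std::vector<uint8_t> &table) {
  size_t count = 0;
  for (size_t pos = 0; pos < table.size(); ++pos) {
    const uint8_t opcode = table[pos] & kRebaseOpcodeMask;
    const uint8_t imm = table[pos] & kRebaseImmediateMask;
    switch (opcode) {
    case kRebaseOpcodeDone:
      return count;  // explicit end of the table
    case kRebaseOpcodeSetTypeImm:
      break;         // updates parser state only; produces no rebase
    case kRebaseOpcodeDoRebaseImmTimes:
      count += imm;  // imm consecutive rebases
      break;
    default:
      return count;  // opcode not modeled by this sketch: stop here
    }
  }
  return count;      // reached the end of the table without DONE
}
```

A table such as {SET_TYPE_IMM, DO_REBASE_IMM_TIMES(2), SET_TYPE_IMM} is valid but has no DONE. Here it yields two rebases and stops at the last byte instead of running on.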
CDSplit splits functions up to three ways: a main fragment with no suffix,
and fragments with .cold and .warm suffixes.
Add .warm suffix to the regex used to recognize split fragments.
Test Plan: updated register-fragments-bolt-symbols.s
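Here is an illustrative matcher for the three-way naming scheme, using `std::regex`. The pattern is an assumption written for this sketch (plain `.cold` or `.warm` suffix at the end of the name). It is not the regex BOLT actually uses, and `isSplitFragmentName` is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// True if `name` ends in a split-fragment suffix (".cold" or ".warm").
// The main fragment has no suffix and therefore does not match.
bool isSplitFragmentName(const std::string &name) {
  static const std::regex fragmentSuffix(R"(\.(cold|warm)$)");
  return std::regex_search(name, fragmentSuffix);
}
```

Before this change, a pattern that only knew `.cold` would classify `foo.warm` as an ordinary function rather than a fragment of `foo`.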
`Eval->Value.get` returns a null pointer when the variable doesn't have
an initializer. Use `cast_if_present` instead of `cast`.
This fixes https://github.com/llvm/llvm-project/issues/93625.
rdar://128482541
This reverts commit fe82a3da36196157c0caa1ef2505186782f750d1.
This broke LLDB on MacOS due to a missing symbol during linking.
The fix has been applied in c6c08eee37bada190bd1aa4593c88a5e2c8cdaac.
Original commit message:
The terminfo dependency introduces a significant nonhermeticity into the
build. It doesn't respect `--no-undefined-version`, meaning that it's not
a dependency that can be built with Clang 17+. This forces maintainers
of source-based distributions to implement patches or ignore linker
errors.
Remove it to reduce the closure size and improve portability of
LLVM-based tools. Users can still use command line arguments to toggle
color support explicitly.
Fixes #75490. Closes #53294 and #23355.
DeclBase.h only contains a forward declaration of ObjCMethodDecl, and
when building clang/Sema/Attr.h with header modules this causes a build
failure because `llvm::isa<ObjCMethodDecl>` requires the full type.
This test is meant to check the behavior when -fno-standalone-debug is
active - it doesn't care whether it's explicit or implicit, so let's
make it explicit so it applies equally to macOS and other platforms.
Explicitly mark the unused implicit arguments in the test, since this
should be sensitive to the number of free user SGPRs.
This is in preparation for #83131.
With this patch, we stop using on-disk hash tables for Frames and call
stacks. Instead, we'll write out all the Frames as a flat array while
maintaining mappings from FrameIds to the indexes into the array.
Then we serialize call stacks in terms of those indexes.
Likewise, we'll write out all the call stacks as another flat array
while maintaining mappings from CallStackIds to the indexes into the
call stack array. One minor difference from Frames is that the
indexes into the call stack array are not contiguous because call
stacks are variable-length objects.
Then we serialize IndexedMemProfRecords in terms of the indexes
into the call stack array.
Now, we describe each call stack with 32-bit indexes into the Frame
array (as opposed to the 64-bit FrameIds in Version 2). The use of
the smaller type cuts down the profile file size by about 40% relative
to Version 2. The departure from the on-disk hash tables contributes
a little bit to the savings, too.
For now, IndexedMemProfRecords refer to call stacks with 64-bit
indexes into the call stack array. As a follow-up, I'll change that
to uint32_t, including necessary updates to RecordWriterTrait.
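The following toy model illustrates the layout described above. The `Frame` fields, the length-prefixed encoding of each call stack, and the helper names are assumptions made for this sketch, not the on-disk format. It shows the two mappings (FrameId to Frame index, CallStackId to call stack offset) and why call stack offsets are not contiguous.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using FrameId = uint64_t;      // 64-bit ids, as in Version 2
using CallStackId = uint64_t;

// Hypothetical frame contents for this sketch.
struct Frame {
  uint64_t function;
  uint32_t line;
  uint32_t column;
};

struct FlatProfile {
  std::vector<Frame> frames;        // flat Frame array
  std::vector<uint32_t> callStacks; // per stack: length, then frame indexes
  std::unordered_map<FrameId, uint32_t> frameIndex;         // -> index into frames
  std::unordered_map<CallStackId, uint32_t> callStackIndex; // -> offset into callStacks
};

// Appends a frame once and returns its 32-bit index into the Frame array.
uint32_t addFrame(FlatProfile &p, FrameId id, const Frame &f) {
  auto [it, inserted] =
      p.frameIndex.try_emplace(id, static_cast<uint32_t>(p.frames.size()));
  if (inserted)
    p.frames.push_back(f);
  return it->second;
}

// Serializes a call stack (given as FrameIds) in terms of 32-bit frame
// indexes. Stacks are variable-length, so consecutive stacks start at
// non-contiguous offsets. All frames must have been added first.
uint32_t addCallStack(FlatProfile &p, CallStackId id,
                      const std::vector<FrameId> &stack) {
  auto found = p.callStackIndex.find(id);
  if (found != p.callStackIndex.end())
    return found->second;  // already serialized
  const uint32_t start = static_cast<uint32_t>(p.callStacks.size());
  p.callStacks.push_back(static_cast<uint32_t>(stack.size()));  // length prefix
  for (FrameId fid : stack)
    p.callStacks.push_back(p.frameIndex.at(fid));
  p.callStackIndex.emplace(id, start);
  return start;
}
```

Each frame reference inside a stack now costs 4 bytes instead of the 8 bytes of a FrameId, which is where most of the size reduction comes from.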
Test llvm-project/lldb/test/API/python_api/address_range/TestAddressRange.py is failing on Windows because a carriage return character is added at the end of the line. Original PR is #93836.
If the smax removed all negative numbers, then we can treat the smin
like a umin.
If the smin and smax are in the other order we can swap them and use a
vnclipu as long as the smax constant is smaller than the smin constant.
This is based on similar code from X86's detectUSatPattern.
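The two equivalences can be checked exhaustively on small inputs. The helpers below are hypothetical scalar stand-ins for the vector operations. After smax(x, 0) the value is non-negative, so a following smin against a non-negative constant is the same as umin. Also, smax(smin(x, hi), lo) equals smin(smax(x, lo), hi) whenever lo <= hi.

```cpp
#include <algorithm>
#include <cstdint>

// Signed clamp in the order smin(smax(x, lo), hi).
int32_t sminOfSmax(int32_t x, int32_t lo, int32_t hi) {
  return std::min(std::max(x, lo), hi);
}

// The other order: smax(smin(x, hi), lo). Equal to sminOfSmax when lo <= hi.
int32_t smaxOfSmin(int32_t x, int32_t lo, int32_t hi) {
  return std::max(std::min(x, hi), lo);
}

// Unsigned view: smax(x, 0) removes negatives, so the result can be
// reinterpreted as unsigned and clamped with umin.
uint32_t uminAfterSmaxZero(int32_t x, uint32_t hi) {
  const uint32_t nonNegative = static_cast<uint32_t>(std::max(x, 0));
  return std::min(nonNegative, hi);
}
```

When lo > hi the two orders disagree. For example, x = 0, lo = 10, hi = 5 gives 5 for one order and 10 for the other, which is why the swap needs the smax constant to be the smaller one.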