Combine the StartBits, EndBits, and FieldVals vectors into a single
vector of a struct that contains all 3 pieces of information.
Instead of storing EndBits, we store NumBits since that's what the users
want.
I've removed the BitNo variable as it is easy to calculate
from StartBit. I've also removed Num in favor of Islands.size().
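A minimal sketch of the consolidated layout, assuming the fields
described above (the real struct and surrounding code live in the
decoder emitter):
```cpp
#include <cstdint>
#include <vector>

// One record per island replaces three parallel vectors.
struct Island {
  unsigned StartBit; // first bit of the island; the old BitNo is
                     // derived from this
  unsigned NumBits;  // stored directly instead of an EndBit
  uint64_t FieldVal; // the value the field's bits must match
};

std::vector<Island> Islands; // the old Num is now just Islands.size()
```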
This reduces the amount of space needed for vectors of bit_value_t
and allows the use of memset.
Also reorder the enum values so BIT_FALSE is 0 and BIT_TRUE is 1.
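A sketch of the resulting enum; BIT_FALSE and BIT_TRUE are named above,
while the remaining enumerators and the byte-sized underlying type are
assumptions for illustration:
```cpp
#include <cstdint>
#include <cstring>

enum bit_value_t : uint8_t {
  BIT_FALSE = 0, // matches the numeric value of a '0' bit
  BIT_TRUE = 1,  // matches the numeric value of a '1' bit
  BIT_UNSET,     // illustrative extra states
  BIT_UNFILLED
};

int main() {
  bit_value_t Bits[64];
  // One byte per element, so the whole buffer can be filled at once.
  std::memset(Bits, BIT_FALSE, sizeof(Bits));
}
```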
Instead of returning the number of bytes emitted, just take the iterator
by reference so the increments in emitULEB128 are visible to the
caller.
Also pass the iterator by reference to emitNumToSkip so we don't need a
separate I += 3 in the caller.
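A generic sketch of the by-reference pattern with simplified signatures
(this is not the emitter's actual code; the hypothetical helper below
just copies one ULEB128 value):
```cpp
#include <cstdint>
#include <vector>

using TableIt = std::vector<uint8_t>::iterator;

// Because I is a reference, every increment inside the helper is
// visible to the caller; no byte count needs to be returned.
static void copyULEB128(TableIt &I, std::vector<uint8_t> &Out) {
  do {
    Out.push_back(*I);
  } while (*I++ & 0x80); // 0x80 marks "more bytes follow"
}
```
emitNumToSkip takes its iterator the same way, which is what lets the
caller's manual `I += 3` disappear.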
This is necessary to enable composing subregisters in peephole-opt.
For now, use a brute-force table to find the return value. The
worst-case target is AMDGPU, with a 399 x 399 entry table.
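A hypothetical sketch of such a lookup (names and table size are
illustrative; the generated AMDGPU table is 399 x 399):
```cpp
#include <cstdint>

// Entry [A][B] is the subregister index obtained by composing A with
// B, or 0 when the composition does not exist.
constexpr unsigned NumSubRegIndices = 4; // illustrative; generated per target
static const uint16_t CompositionTable[NumSubRegIndices][NumSubRegIndices] = {
    // rows generated by TableGen
};

uint16_t composeIndices(unsigned A, unsigned B) {
  return CompositionTable[A][B]; // brute force, but O(1)
}
```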
This patch adds support for describing per-write resource cycle counts
for ReadAdvance records via a new optional field called `tunables`.
This makes it possible to declare ReadAdvance records such as:
```
def : ReadAdvance<Read_C, 1, [Write_A, Write_B], [2]>;
```
The above will effectively declare two entries in the ReadAdvance
table for Read_C, one for Write_A with a cycle count of 1+2, and one for
Write_B with a cycle count of 1+0 (omitted values are assumed 0).
The field `tunables` provides a list of deltas relative to the base
`cycle` count of the ReadAdvance. Since the field is optional and
defaults to a list of 0's, this change doesn't affect current targets.
- Change InstrInfoEmitter to emit OpName as an enum class
instead of an anonymous enum in the OpName namespace.
- This will help clearly distinguish between values that are
OpNames vs. just operand indices and should help avoid
bugs due to confusion between the two (see the sketch after this list).
- Rename OpName::OPERAND_LAST to NUM_OPERAND_NAMES.
- Emit declaration of getOperandIdx() along with the OpName
enum so it doesn't have to be repeated in various headers.
- Also updated AMDGPU, RISCV, and WebAssembly backends
to conform to the new definition of OpName (mostly
mechanical changes).
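A simplified sketch of the before/after shape (target and operand names
are illustrative):
```cpp
#include <cstdint>

namespace llvm::MyTarget {
// Before: an anonymous enum inside an OpName namespace. Its values
// convert implicitly to int, so an operand *name* could silently be
// used where an operand *index* was expected.
// namespace OpName { enum { vdst, src0, OPERAND_LAST }; }

// After: a scoped enum; mixing it up with an index no longer compiles.
enum class OpName { vdst, src0, NUM_OPERAND_NAMES };

// Declared alongside the enum so headers need not repeat it; maps a
// named operand of an instruction to its operand index (or -1).
int16_t getOperandIdx(uint16_t Opcode, OpName Name);
} // namespace llvm::MyTarget
```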
- Use range for loops and `enumerate` in a few places.
- Use `StringRef` for `TargetName` in `InstrInfoEmitter::run`.
- Use the `\n` character for newlines instead of a string.
- Use StringRef in `InstrNames` (instead of std::string) and
avoid string copies.
- Detect, in an initial pass over the instructions, whether logical
operand mappings or named operand mappings are enabled, and execute the
relevant emission code only if they are.
- For these mappings, skip the fixed set of predefined instructions as
they won't have these mappings enabled.
- Emit operand type mappings only for the X86 target, as they are used
only by X86, which looks for the X86-specific `X86MemOperand`.
- Cleanup `emitOperandTypeMappings` code: remove code to handle empty
instruction list and use range for loops.
After f9250401ef120a4605ad67bb43d3b25500900498, this function is
tail recursive, so it was straightforward to convert it to iteratively
walk the linked list.
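The conversion itself is mechanical; a generic sketch with a
hypothetical node type:
```cpp
struct Node { Node *Next = nullptr; };

// Tail-recursive form: the recursive call is the last action...
void walkRecursive(Node *N) {
  if (!N)
    return;
  // ... process N ...
  walkRecursive(N->Next);
}

// ...so it rewrites directly into a loop, avoiding stack growth on
// long match lists.
void walkIterative(Node *N) {
  for (; N; N = N->Next) {
    // ... process N ...
  }
}
```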
ContractNodes recursively walks forward through a linked list. During
this recursion, Matchers are combined into other Matchers.
Previously, the formation of MoveSiblingMatcher happened after the
recursive call, so it occurred as we were unwinding. If a
MoveSiblingMatcher was formed, we would recursively walk forward
to the end of the linked list again which isn't efficient.
To make this more efficient, move the formation of MoveSiblingMatcher
to the forward pass. Add additional rules to unfold MoveSiblingMatcher
if it would be more efficient to use CheckChildType, CheckChildInteger,
CheckChildSame, etc.
As an added benefit, this makes the function tail recursive, which the
compiler can optimize better.
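In outline, the restructuring looks like this (hypothetical Node and
tryCombine stand in for the Matcher list and the combining rules):
```cpp
struct Node { Node *Next = nullptr; };

// Stand-in for the real rules, which form MoveSiblingMatcher and
// unfold it into CheckChild* checks where those are cheaper.
static bool tryCombine(Node *N) { (void)N; return false; }

void contract(Node *Head) {
  for (Node *N = Head; N; N = N->Next) {
    // Combine on the forward pass: a successful combine may enable
    // another at the same node, but never forces a second walk to the
    // end of the list the way unwind-time formation did.
    while (tryCombine(N)) {
    }
  }
}
```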
- Emit C++17 nested namespaces.
- Shorten the binary search table name to just `Table` since it's
declared in the scope of each search function (see the sketch after
this list).
- Use `using namespace XXX` in the search function to avoid emitting the
Target Inst Namespace prefix in the table entries.
- Add short-cut handling of `TableSize` == 0 case (verified in Hexagon
target).
- Use `SetVector` in `ColFieldValueMap` to get automatic deduplication
and eliminate manual deduplication code.
- Use range for loops.
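Putting those together, the emitted search functions now have roughly
this shape (function name and rows are illustrative):
```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>

namespace llvm::MyTarget { // emitted with C++17 nested namespace syntax

int getMappedOpcode(uint16_t Opcode) {
  // With `using namespace MyTarget;` the real tables can list bare
  // instruction enum names; `Table` is local, so the short name is
  // safe. When TableSize would be 0, the emitter short-cuts to
  // `return -1;` instead of emitting a table.
  static const uint16_t Table[][2] = {
      {10, 110}, {20, 120}, // rows sorted by the key column
  };
  auto It = std::lower_bound(std::begin(Table), std::end(Table), Opcode,
                             [](const uint16_t (&Row)[2], uint16_t Key) {
                               return Row[0] < Key;
                             });
  if (It == std::end(Table) || (*It)[0] != Opcode)
    return -1;
  return (*It)[1];
}

} // namespace llvm::MyTarget
```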
I tried to keep it readable by making the index column always wide
enough to hold the largest index.
That way things don't shift to the right every time a new digit
appears; the layout stays consistent.
Tests don't break because this only affects the beginning of the line,
and FileCheck mostly doesn't care about what comes before the text it
matches.
Example of the new output:
```
/* 758359 */ // Label 9988: @758359
/* 758359 */ GIM_Try, /*On fail goto*//*Label 9989*/ GIMT_Encode4(758435), // Rule ID 6715 //
/* 758364 */ GIM_CheckConstantInt8, /*MI*/0, /*Op*/2, 0,
/* 758368 */ // MIs[0] offset
```
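For reference, a minimal sketch of the padding rule (hypothetical
helper):
```cpp
#include <cstdio>

// Pad the index column to the width of the largest index, so lines
// never shift right when the index gains a digit.
void printIndexed(unsigned Index, unsigned MaxIndex, const char *Rest) {
  int Width = std::snprintf(nullptr, 0, "%u", MaxIndex); // digit count
  std::printf("/* %*u */ %s\n", Width, Index, Rest);
}
```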
Fixes #119177
Add declarations of SDTypeConstraint's operator== and operator< to the
llvm namespace. These are declared as friends inside the class, which
makes them members of the enclosing namespace, but GCC wants the
declarations to be explicit.
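A reduced illustration of the lookup rule involved (simplified
declarations, not the actual class):
```cpp
namespace llvm {
struct SDTypeConstraint {
  // Declared as a friend: the function belongs to namespace llvm, but
  // this declaration alone is found only via argument-dependent lookup.
  friend bool operator==(const SDTypeConstraint &, const SDTypeConstraint &);
};

// The explicit namespace-scope declaration; ordinary lookup (and GCC)
// is satisfied, and the definition can live in a .cpp as before.
bool operator==(const SDTypeConstraint &, const SDTypeConstraint &);
} // namespace llvm
```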
Fixes #125537.
- Use StringRef::str() instead of std::string(StringRef).
- Use const pointers for `Candidates` in getSuperRegForSubReg().
- Make `AsmParserCat` and `AsmWriterCat` static.
- Use enumerate() in `ComputeInstrsByEnum` to assign inst enums.
- Use range-based for loops.
This patch adds a new option, `-aarch64-enable-zpr-predicate-spills`
(disabled by default), which replaces predicate spills with vector
spills in streaming[-compatible] functions.
For example:
```
str p8, [sp, #7, mul vl] // 2-byte Folded Spill
// ...
ldr p8, [sp, #7, mul vl] // 2-byte Folded Reload
```
Becomes:
```
mov z0.b, p8/z, #1
str z0, [sp] // 16-byte Folded Spill
// ...
ldr z0, [sp] // 16-byte Folded Reload
ptrue p4.b
cmpne p8.b, p4/z, z0.b, #0
```
This is done to avoid streaming memory hazards between FPR/vector and
predicate spills, which currently occupy the same stack area even when
the `-aarch64-stack-hazard-size` flag is set.
This is implemented with two new pseudos SPILL_PPR_TO_ZPR_SLOT_PSEUDO
and FILL_PPR_FROM_ZPR_SLOT_PSEUDO. The expansion of these pseudos
handles scavenging the required registers (z0 in the above example)
and, in the worst case, spilling a register to an emergency stack slot.
The condition flags are also preserved around the `cmpne` in case they
are live at the expansion point.
The loop at the top of FactorNodes creates additional variables to deal
with needing to use a pointer to a unique_ptr instead of a reference.
Encapsulate this in its own function for better scoping.
This also allows us to directly skip this loop when we already know we
have a ScopeMatcher.
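A shape sketch of the extracted helper (types and names hypothetical):
```cpp
#include <memory>

struct Matcher {
  std::unique_ptr<Matcher> Next;
  virtual ~Matcher() = default;
  virtual bool isScope() const { return false; }
};

// Walking unique_ptr links requires a pointer to a unique_ptr, since a
// reference cannot be reseated. Keeping the walk in a helper confines
// the extra variables, and a caller that already holds a ScopeMatcher
// skips the loop on the first test.
static std::unique_ptr<Matcher> *
findScopeSlot(std::unique_ptr<Matcher> *Slot) {
  while (*Slot && !(*Slot)->isScope())
    Slot = &(*Slot)->Next;
  return Slot;
}
```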