28106 Commits

Matt Arsenault
b71203a751 GlobalISel: Move some legalizer functions to utils 2020-03-04 16:40:00 -05:00
Matt Arsenault
fb0c35fa34 GlobalISel: Set alignment on function argument stack load/store 2020-03-04 16:38:46 -05:00
Wei Mi
3c96d01d2e Generate Callee Saved Register (CSR) related cfi directives like .cfi_restore.
https://reviews.llvm.org/D42848 only handled CFA-related cfi directives but
didn't handle CSR-related cfi. The patch adds the CSR part. Basically it reuses
the framework created in D42848. For each basic block, the patch tracks which
CSRs have been saved at its CFG predecessors' exits, and compares that set with
the set at the previous basic block's exit (the previous block being the block
laid out before the current one). If the saved CSR set at the previous basic
block's exit is larger, .cfi_restore directives are inserted.

The patch also generates proper .cfi_restore in epilogue to make sure the
saved CSR set is consistent for the incoming edges of each block.
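
A minimal standalone sketch of the set comparison described above (the set
representation and emit callback are illustrative, not the patch's actual
code):

```
#include <cstdio>
#include <set>
#include <string>

// If the CSR set saved at the previous block's exit is larger than the set
// saved on the current block's incoming edges, emit .cfi_restore for the
// difference so the unwinder's view stays consistent.
void emitCFIRestores(const std::set<std::string> &SavedAtPrevExit,
                     const std::set<std::string> &SavedAtEntry) {
  for (const std::string &Reg : SavedAtPrevExit)
    if (!SavedAtEntry.count(Reg))
      std::printf(".cfi_restore %s\n", Reg.c_str());
}

int main() {
  // The previous block saved rbx and r12, but the current block's
  // predecessors only guarantee rbx, so r12 must be restored.
  emitCFIRestores({"rbx", "r12"}, {"rbx"});
}
```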

Differential Revision: https://reviews.llvm.org/D74303
2020-03-04 11:18:37 -08:00
Guozhi Wei
ee9a3eba76 [CodeGenPrepare] Handle ExtractValueInst in dupRetToEnableTailCallOpts
As the test case shows, if there is an ExtractValueInst in the Ret block, the function dupRetToEnableTailCallOpts can't duplicate it into the block containing the call, so no tail call is generated later in CodeGen.

This patch adds ExtractValueInst handling code to dupRetToEnableTailCallOpts and FoldReturnIntoUncondBranch, so a tail call can later be generated for this case.

Differential Revision: https://reviews.llvm.org/D74242
2020-03-04 11:10:32 -08:00
Nikita Popov
0e890cd4d4 [ConstantFolding] Always return something from ConstantFoldConstant
Spin-off from D75407. As described there, ConstantFoldConstant()
currently returns null for non-ConstantExpr/ConstantVector inputs,
but otherwise always returns non-null, independently of whether
any folding has happened or not.

This is confusing and makes consumer code more complicated.
I would expect ConstantFoldConstant() either to return non-null only if
it actually folded something, or to always return non-null. I'm going
with the latter here, which appears to be more useful considering
existing usage.
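
A toy illustration of the contract chosen here (the fold function is a
stand-in for ConstantFoldConstant(), not LLVM's implementation):

```
#include <cassert>

struct Constant { int Value; };

// Under the new contract the fold always returns a non-null Constant,
// returning the input unchanged when nothing folds.
Constant *foldConstant(Constant *C) {
  // ... folding would happen here ...
  return C; // never null
}

int main() {
  Constant X{42};
  // Consumers no longer need a null check before using the result.
  Constant *Folded = foldConstant(&X);
  assert(Folded != nullptr);
}
```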

Differential Revision: https://reviews.llvm.org/D75543
2020-03-04 18:24:47 +01:00
Sanjay Patel
29a2b20ab3 [SDAG] simplify FP binops to undef
As discussed in the commit thread for rGa253a2a and D73978, we can do more undef folding for FP ops.
The nnan and ninf fast-math-flags specify that if an operand is the disallowed value, the result is
poison, so we can produce an undef result.
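
A conceptual standalone model of that fold (names are illustrative; this is
not the SDAG code):

```
#include <cmath>
#include <optional>
#include <string>

struct FastMathFlags { bool NoNaNs = false; bool NoInfs = false; };

// Under nnan (ninf), a NaN (infinity) operand makes the result poison, so
// the FP binop can be folded to undef.
std::optional<std::string> simplifyFPBinop(double LHS, double RHS,
                                           FastMathFlags FMF) {
  if (FMF.NoNaNs && (std::isnan(LHS) || std::isnan(RHS)))
    return "undef";
  if (FMF.NoInfs && (std::isinf(LHS) || std::isinf(RHS)))
    return "undef";
  return std::nullopt; // no simplification applies
}
```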

But this doesn't work as expected (the undef operand cases remain) because of a Flags propagation
problem in SelectionDAGBuilder.

I've added DAGCombiner calls to enable these for the other cases because we've shown in other
patches that (because of the limited way that SDAG iterates), it is possible to miss simplifications
like this if they are done only at node creation time.

Several potential follow-ups to expand on this patch are possible.

Differential Revision: https://reviews.llvm.org/D75576
2020-03-04 10:42:16 -05:00
Amara Emerson
e91e1df6ab [GlobalISel][Localizer] Enable intra-block localization of already-local uses.
This changes the localizer to attempt intra-block localization of instructions
that have local uses. This is useful because sometimes the entry block itself
has many uses of constant-like instructions, which would benefit from shortening
live ranges. Previously, if an instruction had no non-local uses, we wouldn't
add it to the list of instructions to attempt further intra-block localization.

This gives a 0.7% geomean code size improvement on CTMark.

Differential Revision: https://reviews.llvm.org/D75555
2020-03-03 18:14:57 -08:00
Fangrui Song
90acc505ed [MCDwarf] Change emitListsTableHeaderStart to use a reference and fold Start/End symbols generation into it
Apply @dblaikie's suggestions in a post-commit review for D75375

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D75568
2020-03-03 16:20:40 -08:00
Amy Huang
5b3b21f025 [DebugInfo] Fix for adding "returns cxx udt" option to functions in CodeView.
Summary:
This change checks for the return type in the frontend and adds a flag
to the DISubroutineType to indicate that the option should be added in
CodeViewDebug.

Previously function types sometimes appeared twice in the PDB: once with
"returns cxx udt" and once without.
See https://bugs.llvm.org/show_bug.cgi?id=44785.

Reviewers: rnk, asmith

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D75215
2020-03-03 14:00:08 -08:00
Vedant Kumar
f002ee55c7 [MachineVerifier] Remove placement rule exception for debug entry values
There should not be an exception allowing debug entry values to be
placed after a terminator.

Differential Revision: https://reviews.llvm.org/D75559
2020-03-03 13:02:18 -08:00
Vedant Kumar
2bf496620c [LiveDebugValues] Do not insert DBG_VALUEs after a MBB terminator
This fixes a miscompile that happened because a DBG_VALUE interfered
with the MachineOutliner's liveness analysis.

Inserting a DBG_VALUE after a terminator breaks predicates on MBB such
as isReturnBlock(). And the resulting DBG_VALUE cannot be "live".

I plan to introduce a MachineVerifier check for this situation in a
follow up.

rdar://59859175

Testing: check-llvm, LNT build with a stage2 compiler & entry values
enabled

Differential Revision: https://reviews.llvm.org/D75548
2020-03-03 13:00:52 -08:00
Fangrui Song
55a56041d1 [MCDwarf] Generate DWARF v5 .debug_rnglists for assembly files
```
// clang -c -gdwarf-5 a.s -o a.o
.section .init; ret
.text; ret
```

.debug_info contains DW_AT_ranges and llvm-dwarfdump will report
a verification error because .debug_rnglists does not exist (not
implemented).

This patch generates .debug_rnglists for assembly files.
emitListsTableHeaderStart() in DwarfDebug.cpp can be shared with
MCDwarf.cpp. Because CodeGen depends on MC, I moved the function to
MCDwarf.cpp.

Reviewed By: probinson

Differential Revision: https://reviews.llvm.org/D75375
2020-03-03 09:03:34 -08:00
Craig Topper
d8ad7cc088 [DAGCombiner][X86] Improve narrowExtractedVectorLoad to handle cases where the element size isn't byte sized but the subvector is.
Summary:
Follow up from D75377. If the subvector is byte sized and the
index is aligned to the subvector size, we can shrink the load.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: dbabokin, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75434
2020-03-03 08:41:31 -08:00
Sam Parker
5618e9be37 [RDA][ARM] collectKilledOperands across multiple blocks
Use MIOperand in collectLocalKilledOperands to make the search
global, as we already have to search for global uses too. This
allows us to delete more dead code when tail predicating.

Differential Revision: https://reviews.llvm.org/D75167
2020-03-03 15:23:05 +00:00
Sam Parker
dfe8f5da4c [ARM][RDA] Allow multiple killed users
In RDA, check against the already decided dead instructions when
looking at users. This allows an instruction to be removed if it
has multiple users, but they're all dead.

This means that IT instructions can be considered killed once all of the
itstate-using instructions are dead.
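
A small standalone sketch of the check (types simplified; not the RDA code):

```
#include <set>
#include <vector>

struct MachineInstr;

// An instruction with multiple users can still be removed if every user
// has already been decided dead.
bool allUsersDead(const std::vector<const MachineInstr *> &Users,
                  const std::set<const MachineInstr *> &Dead) {
  for (const MachineInstr *U : Users)
    if (!Dead.count(U))
      return false; // a live user keeps the instruction alive
  return true;
}
```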

Differential Revision: https://reviews.llvm.org/D75245
2020-03-03 15:12:29 +00:00
Clement Courbet
b0ae20d92e [ExpandMemCmp][NFC] Fix typo in comment. 2020-03-03 11:07:13 +01:00
Awanish Pandey
1cb0e01e42 [DebugInfo][DWARF5]: Added support for debuginfo generation for defaulted parameters
This patch adds support for the DWARF emission/dumping part of debug info
generation for defaulted parameters.

Reviewers: probinson, aprantl, dblaikie

Reviewed By: aprantl, dblaikie

Differential Revision: https://reviews.llvm.org/D73462
2020-03-03 13:09:53 +05:30
Vedant Kumar
d64a22a2ad [LiveDebugValues] Prevent some misuse of LocIndex::fromRawInteger, NFC
Make it a compile-time error to pass an int/unsigned/etc to
fromRawInteger.

Hopefully this prevents errors of the form:

```
for (unsigned ID : getVarLocs()) {
  auto VL = LocMap[LocIndex::fromRawInteger(ID)];
  ...
```
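
One way to get that compile-time error is a deleted template overload that
outcompetes the uint64_t overload for any other integer type; a sketch of
the idea (the real LocIndex differs in detail):

```
#include <cstdint>

struct LocIndex {
  uint64_t Raw;

  // Only a value that is already a uint64_t raw ID is accepted; any other
  // integer type selects the deleted template and fails to compile.
  static LocIndex fromRawInteger(uint64_t ID) { return {ID}; }
  template <typename T> static LocIndex fromRawInteger(T) = delete;
};

// LocIndex::fromRawInteger(uint64_t{42}); // OK
// LocIndex::fromRawInteger(42);           // error: call to deleted function
// LocIndex::fromRawInteger(42u);          // error: call to deleted function
```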
2020-03-02 16:59:09 -08:00
Jordan Rupprecht
d7803c3832 Add default case to fix -Wswitch errors 2020-03-02 14:23:46 -08:00
Craig Topper
adc69729ec [TargetLowering] Fix what look like copy/paste mistakes in compare with infinity handling SimplifySetCC.
I expect that the isCondCodeLegal checks should match the CC of the node
that we're going to create.

Rewriting to a switch to minimize repeated mentions of the same
constants.
2020-03-02 14:12:16 -08:00
Stanislav Mekhanoshin
1bacdcf48d Extend LaneBitmask to 64 bit
This is needed for D74873: AMDGPU is going to have 16-bit subregs, and its
largest register tuple is 32 VGPRs, which results in 64 lanes.

Differential Revision: https://reviews.llvm.org/D75378
2020-03-02 12:10:52 -08:00
Volkan Keles
4167645d1e GlobalISel: Move Localizer::shouldLocalize(..) to TargetLowering
Add a new target hook for shouldLocalize so that
targets can customize the logic.

https://reviews.llvm.org/D75207
2020-03-02 09:15:40 -08:00
Simon Pilgrim
d20fb7ea13 Fix shadow variable warning. NFC. 2020-03-02 11:41:20 +00:00
Simon Pilgrim
e4380b07cc Fix operator precedence warning. NFCI. 2020-03-02 10:56:58 +00:00
Serguei Katkov
496e0a99c7 [InlineSpiller] Relax re-materialization restriction for statepoint
We should be careful to keep the number of re-materialized operands below the
number of physical registers.

The STATEPOINT instruction has a variable, and potentially very large, number
of operands, so re-materialization of all operands is currently disabled when
restrict-statepoint-remat is true.

This patch relaxes the re-materialization restriction for STATEPOINT,
allowing it for fixed operands, specifically the call target.

Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits, qcolombet, hiraditya
Differential Revision: https://reviews.llvm.org/D75335
2020-03-02 11:25:44 +07:00
Craig Topper
0cd6712a7a [DAGCombiner][X86] Disable narrowExtractedVectorLoad if the element type size isn't byte sized
The address calculation for the offset assumes that you can calculate the offset by multiplying the index by the store size of the element. But that only works if the element's store size is exactly its real size, since we store vectors tightly packed in memory. There are improvements we could make to this, like special-casing the extraction of element 0. I think we could also handle cases where the extracted VT is byte sized and the index is aligned with the extracted element count.
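
A sketch of the guard this implies (names illustrative, not the DAGCombiner
code):

```
#include <cstdint>

// Narrowing extract_subvector(load) computes the new load offset as
// Index * StoreSizeInBytes. Vectors are stored tightly packed, so this is
// only valid when the element's store size equals its real size, i.e. the
// element type is byte sized.
bool canUseByteOffset(unsigned EltSizeInBits) {
  return EltSizeInBits % 8 == 0; // e.g. fails for the i1 elements of v32i1
}

uint64_t narrowedLoadOffset(unsigned EltSizeInBits, uint64_t Index) {
  return Index * (EltSizeInBits / 8); // meaningful only when the guard holds
}
```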

Differential Revision: https://reviews.llvm.org/D75377
2020-03-01 18:13:25 -08:00
Craig Topper
b6e2796114 [X86][TwoAddressInstructionPass] Teach tryInstructionCommute to continue checking for commutable FMA operands in more cases.
Previously we would only check for another commutable operand if the first commute was an aggressive commute.

But if we have two kill operands and neither is tied to the def at the start, we should consider both operands as the one to use as the new def.

This improves the loop in the fma-commute-loop.ll test. This test is derived from a post from discourse here https://llvm.discourse.group/t/unnecessary-vmovapd-instructions-generated-can-you-hint-in-favor-of-vfmadd231pd/582

Differential Revision: https://reviews.llvm.org/D75016
2020-03-01 16:38:08 -08:00
Craig Topper
211fb91f10 [DAGCombiner] Don't emit select_cc from visitSINT_TO_FP/visitUINT_TO_FP. Use plain select instead.
Select_cc isn't used by all targets. X86 doesn't have optimizations
for it.

Since we already know the input to the sint_to_fp/uint_to_fp is
a setcc we can just emit a plain select using that setcc as the
condition. Other DAG combines can turn that into a select_cc on
targets that support it.

Differential Revision: https://reviews.llvm.org/D75415
2020-03-01 10:52:17 -08:00
Sanjay Patel
619d7dc39a [DAGCombiner] recognize shuffle (shuffle X, Mask0), Mask --> splat X
We get the simple cases of this via demanded elements and other folds,
but that doesn't work if the values have >1 use, so add a dedicated
match for the pattern.

We already have this transform in IR, but it doesn't help the
motivating x86 tests (based on PR42024) because the shuffles don't
exist until after legalization and other combines have happened.
The AArch64 test shows a minimal IR example of the problem.
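
A standalone sketch of the mask composition check (simplified to
single-operand shuffles; the real combine also handles two-operand masks):

```
#include <vector>

// shuffle (shuffle X, Mask0), Mask --> splat X when every lane of the
// composed mask reads the same element of X. -1 marks an undef lane.
bool composedMaskIsSplat(const std::vector<int> &Mask0,
                         const std::vector<int> &Mask) {
  int SplatIdx = -1;
  for (int M : Mask) {
    if (M < 0)
      continue;              // undef lane, matches anything
    int Src = Mask0[M];      // compose the two masks: an index into X
    if (Src < 0)
      continue;
    if (SplatIdx < 0)
      SplatIdx = Src;
    else if (SplatIdx != Src)
      return false;
  }
  return SplatIdx >= 0;      // all defined lanes read one element of X
}
```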

Differential Revision: https://reviews.llvm.org/D75348
2020-03-01 09:10:25 -05:00
Simon Pilgrim
d955b221cb [MachineInst] Remove dead code. NFCI.
The MachineFunction MF value is not used any more and is always null.
2020-02-29 19:25:02 +00:00
Simon Pilgrim
6e7a768354 Make argument const to silence cppcheck warning. NFCI. 2020-02-29 19:25:01 +00:00
Fangrui Song
692e0c9648 [MC] Add MCStreamer::emitInt{8,16,32,64}
Similar to AsmPrinter::emitInt{8,16,32,64}.
2020-02-29 09:40:21 -08:00
Vedant Kumar
dd1ea9de2e Reland: [Coverage] Revise format to reduce binary size
Try again with an up-to-date version of D69471 (99317124 was a stale
revision).

---

Revise the coverage mapping format to reduce binary size by:

1. Naming function records and marking them `linkonce_odr`, and
2. Compressing filenames.

This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB)
and speeds up end-to-end single-threaded report generation by 10%. For
reference the compressed name data in llc is 81MB (__llvm_prf_names).

Rationale for changes to the format:

- With the current format, most coverage function records are discarded.
  E.g., more than 97% of the records in llc are *duplicate* placeholders
  for functions visible-but-not-used in TUs. Placeholders *are* used to
  show under-covered functions, but duplicate placeholders waste space.

- We reached general consensus about giving (1) a try at the 2017 code
  coverage BoF [1]. The thinking was that using `linkonce_odr` to merge
  duplicates is simpler than alternatives like teaching build systems
  about a coverage-aware database/module/etc on the side.

- Revising the format is expensive due to the backwards compatibility
  requirement, so we might as well compress filenames while we're at it.
  This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB).

See CoverageMappingFormat.rst for the details on what exactly has
changed.

Fixes PR34533 [2], hopefully.

[1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html
[2] https://bugs.llvm.org/show_bug.cgi?id=34533

Differential Revision: https://reviews.llvm.org/D69471
2020-02-28 18:12:04 -08:00
Vedant Kumar
3388871714 Revert "[Coverage] Revise format to reduce binary size"
This reverts commit 99317124e1c772e9a9de41a0cd56e1db049b4ea4. This is
still busted on Windows:

http://lab.llvm.org:8011/builders/lld-x86_64-win7/builds/40873

The llvm-cov tests report 'error: Could not load coverage information'.
2020-02-28 18:03:15 -08:00
Vedant Kumar
99317124e1 [Coverage] Revise format to reduce binary size
Revise the coverage mapping format to reduce binary size by:

1. Naming function records and marking them `linkonce_odr`, and
2. Compressing filenames.

This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB)
and speeds up end-to-end single-threaded report generation by 10%. For
reference the compressed name data in llc is 81MB (__llvm_prf_names).

Rationale for changes to the format:

- With the current format, most coverage function records are discarded.
  E.g., more than 97% of the records in llc are *duplicate* placeholders
  for functions visible-but-not-used in TUs. Placeholders *are* used to
  show under-covered functions, but duplicate placeholders waste space.

- We reached general consensus about giving (1) a try at the 2017 code
  coverage BoF [1]. The thinking was that using `linkonce_odr` to merge
  duplicates is simpler than alternatives like teaching build systems
  about a coverage-aware database/module/etc on the side.

- Revising the format is expensive due to the backwards compatibility
  requirement, so we might as well compress filenames while we're at it.
  This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB).

See CoverageMappingFormat.rst for the details on what exactly has
changed.

Fixes PR34533 [2], hopefully.

[1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html
[2] https://bugs.llvm.org/show_bug.cgi?id=34533

Differential Revision: https://reviews.llvm.org/D69471
2020-02-28 17:33:25 -08:00
Vedant Kumar
0368b42295 [entry values] ARM: Add a describeLoadedValue override (PR45025)
As a narrow stopgap for the assertion failure described in PR45025, add
a describeLoadedValue override to ARMBaseInstrInfo and use it to detect
copies in which the forwarding reg is a super/sub reg of the copy
destination. For the moment this is unsupported.

Several follow ups are possible:

1) Handle VORRq. At the moment, we do not, because isCopyInstrImpl
   returns early when !MI.isMoveReg().

2) In the case where forwarding reg is a super-reg of the copy
   destination, we should be able to describe the forwarding reg as a
   subreg within the copy destination. I'm not 100% sure about this, but
   it looks like that's what's done in AArch64InstrInfo.

3) In the case where the forwarding reg is a sub-reg of the copy
   destination, maybe we could describe the forwarding reg using the
   copy destination and a DW_OP_LLVM_fragment (I guess this should be
   possible after D75036).

https://bugs.llvm.org/show_bug.cgi?id=45025
rdar://59772698

Differential Revision: https://reviews.llvm.org/D75273
2020-02-28 14:30:40 -08:00
David Green
1de1070559 [DAGCombine] Fix alias analysis for unaligned accesses
The alias analysis in DAG Combine looks at the BaseAlign, the Offset and
the Size of two accesses, and determines if they are known to access
different parts of memory by the fact that they are different offsets
from inside that "alignment window". It does not seem to account for
accesses whose offset is not a multiple of their size, which may overflow
from one alignment window into another.

For example, in the test case we have a 19-byte memset that is split into a
16-byte neon store and an unaligned 4-byte store with a 15-byte offset. This
15-byte offset (with a base align of 8) wraps around into the next alignment
window. When compared to an access at a 16-byte offset (of the same 4-byte
size and 8-byte base align), the two accesses are said not to alias.

I've fixed this here by just ensuring that the offsets are a multiple of the
size, so that the accesses cannot overlap by wrapping. Fixes PR45035, which
was exposed by the UseAA changes in the arm backend.
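
A simplified model of the wrap problem and the guard (standalone; not the
DAGCombiner code):

```
#include <cstdint>

// With a common base alignment of 8, an access is assumed to stay inside
// its 8-byte "alignment window". A 4-byte store at offset 15 breaks that
// assumption: it starts in window [8,16) but spills into [16,24), so it can
// overlap an access at offset 16. The fix only trusts window-based
// disjointness when each offset is a multiple of its access size.
bool offsetsSafeForWindowCheck(int64_t Off0, int64_t Size0,
                               int64_t Off1, int64_t Size1) {
  return Off0 % Size0 == 0 && Off1 % Size1 == 0;
}
```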

Differential Revision: https://reviews.llvm.org/D75238
2020-02-28 18:44:36 +00:00
Simon Pilgrim
4bc6f63320 [TargetLowering] SimplifyDemandedBits - fix SCALAR_TO_VECTOR knownbits bug
We can only report the knownbits for a SCALAR_TO_VECTOR node if we only demand the 0'th element - the upper elements are undefined and shouldn't be trusted.

This causes a number of regressions that need addressing, but we need to get the bugfix in first.
2020-02-28 15:23:37 +00:00
Jeremy Morse
6af859dcca [DebugInfo] Re-implement LexicalScopes dominance method, add unit tests
Way back in D24994, the combination of LexicalScopes::dominates and
LiveDebugValues was identified as having worst-case quadratic complexity,
but it wasn't triggered by any code path at the time. I've since run into a
scenario where this occurs, in a very large basic block where large numbers
of inlined DBG_VALUEs are present.

The quadratic-ness comes from LiveDebugValues::join calling "dominates" on
every variable location, and LexicalScopes::dominates potentially touching
every instruction in a block to test for the presence of a scope. We have,
however, already computed the presence of scopes in blocks, in the
"InsnRanges" of each scope. This patch switches the dominates method to
examine whether a block is present in a scope's InsnRanges, avoiding
walking through the whole block.
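
A sketch of the shape of the fix (types illustrative; not the LexicalScopes
API):

```
#include <set>

struct MachineBasicBlock;

// Instead of scanning every instruction of MBB looking for the scope, test
// membership in the scope's precomputed block set derived from its
// InsnRanges.
struct LexicalScopeSketch {
  std::set<const MachineBasicBlock *> BlocksWithScope;

  bool dominates(const MachineBasicBlock *MBB) const {
    return BlocksWithScope.count(MBB) != 0; // no per-instruction walk
  }
};
```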

At the same time, fix getMachineBasicBlocks to account for the fact that
InsnRanges can cover multiple blocks, and add some unit tests, as Lexical
Scopes didn't have any.

Differential Revision: https://reviews.llvm.org/D73725
2020-02-28 11:41:28 +00:00
Sam Parker
bf61421a02 [RDA] Track implicit-defs
Ensure that we're recording implicit defs, as well as visiting implicit
uses and implicit defs when we're walking through operands.

Differential Revision: https://reviews.llvm.org/D75185
2020-02-28 11:14:42 +00:00
serge-sans-paille
6d15c4deab No longer generate calls to *_finite
According to Joseph Myers, a libm maintainer:

> They were only ever an ABI (selected by use of -ffinite-math-only or
> options implying it, which resulted in the headers using "asm" to redirect
> calls to some libm functions), not an API. The change means that ABI has
> turned into compat symbols (only available for existing binaries, not for
> anything newly linked, not included in static libm at all, not included in
> shared libm for future glibc ports such as RV32), so, yes, in any case
> where tools generate direct calls to those functions (rather than just
> following the "asm" annotations on function declarations in the headers),
> they need to stop doing so.

As a consequence, we should no longer assume these symbols are available on the
target system.

Still keep the TargetLibraryInfo for constant folding.

Differential Revision: https://reviews.llvm.org/D74712
2020-02-28 10:07:37 +01:00
Vedant Kumar
a993720397 [LiveDebugValues] Encode register location within VarLoc IDs [3/3]
This is part 3 of a 3-part series to address a compile-time explosion
issue in LiveDebugValues.

---

Start encoding register locations within VarLoc IDs, and take advantage
of this encoding to speed up transferRegisterDef.

There is no fundamental algorithmic change: this patch simply swaps out
SparseBitVector in favor of CoalescingBitVector. That changes iteration
order (hence the test updates), but otherwise this patch is NFCI.

The only interesting change is in transferRegisterDef. Instead of doing:

```
KillSet = {}
for (ID : OpenRanges.getVarLocs())
  if (DeadRegs.count(ID))
    KillSet.add(ID)
```

We now do:

```
KillSet = {}
for (Reg : DeadRegs)
  for (ID : intervalsReservedForReg(Reg, OpenRanges.getVarLocs()))
    KillSet.add(ID)
```

By not visiting each open location every time we visit an instruction,
this eliminates some potentially quadratic behavior. The new
implementation basically does a constant amount of work per instruction
because the interval map lookups are very fast.
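
A standalone sketch of the range-query idea (constants and containers are
illustrative; the patch uses CoalescingBitVector):

```
#include <cstdint>
#include <set>
#include <vector>

// IDs reserve the high bits for the register, so all VarLoc IDs for
// register R fall in [R << 32, (R + 1) << 32). Collecting the kill set then
// touches only IDs that are actually open for a dead register.
std::vector<uint64_t> collectKillSet(const std::set<uint64_t> &OpenIDs,
                                     const std::vector<uint32_t> &DeadRegs) {
  std::vector<uint64_t> KillSet;
  for (uint32_t Reg : DeadRegs) {
    uint64_t Lo = uint64_t(Reg) << 32, Hi = uint64_t(Reg + 1) << 32;
    for (auto It = OpenIDs.lower_bound(Lo); It != OpenIDs.end() && *It < Hi;
         ++It)
      KillSet.push_back(*It); // an open location owned by a dead register
  }
  return KillSet;
}
```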

For a file in WebKit, this brings the time spent in LiveDebugValues down
from ~2.5 minutes to 4 seconds, reducing compile time spent in that pass
from 28% of the total to just over 1%.

Before:

```
2.49 min   27.8%   0 s        LiveDebugValues::process
2.41 min   27.0%   5.40 s     LiveDebugValues::transferRegisterDef
1.51 min   16.9%   1.51 min   LiveDebugValues::VarLoc::isDescribedByReg() const
32.73 s     6.1%   8.70 s     llvm::SparseBitVector<128u>::SparseBitVectorIterator::operator++()
```

After:

```
4.53 s      1.1%   0 s        LiveDebugValues::process
3.00 s      0.7%   107.00 ms  LiveDebugValues::transferRegisterCopy
892.00 ms   0.2%   406.00 ms  LiveDebugValues::transferSpillOrRestoreInst
404.00 ms   0.1%   32.00 ms   LiveDebugValues::transferRegisterDef
110.00 ms   0.0%   2.00 ms      LiveDebugValues::getUsedRegs
57.00 ms    0.0%   1.00 ms      std::__1::vector<>::push_back
40.00 ms    0.0%   1.00 ms      llvm::CoalescingBitVector<>::find(unsigned long long)
```

FWIW, I tried the same approach using SparseBitVector, but got bad
results. To do that, I had to extend SparseBitVector to support 64-bit
indices and expose its lower bound operation. The problem with this is
that the performance is very hard to predict: SparseBitVector's lower
bound operation falls back to O(n) linear scans in a std::list if you're
not /very/ careful about managing iteration order. When I profiled this
the performance looked worse than the baseline.

You can see the full CoalescingBitVector-based implementation here:

  https://github.com/vedantk/llvm-project/commits/try-coalescing

You can see the full SparseBitVector-based implementation here:

  https://github.com/vedantk/llvm-project/commits/try-sparsebitvec-find

Depends on D74984 and D74985.

Differential Revision: https://reviews.llvm.org/D74986
2020-02-27 12:39:47 -08:00
Vedant Kumar
210c4853de [LiveDebugValues] Encode a location in VarLoc IDs, NFC [2/3]
This is part 2 of a 3-part series to address a compile-time explosion
issue in LiveDebugValues.

---

Each VarLoc has a unique ID: this ID is used to look up a VarLoc in the
VarLocMap, and to virtually insert a VarLoc into a VarLocSet. Instead of
inserting the VarLoc /itself/ into the VarLocSet, we insert just the ID,
because this can be represented efficiently with a SparseBitVector.

This change introduces LocIndex, a layer of abstraction on top of VarLoc
IDs. Prior to this change, an ID was just an index into a vector. With
this change, an ID encodes both an index /and/ a register location. The
type-checker ensures that conversions to and from LocIndex are correct.

For the moment the register location is always 0 (undef). We have plenty
of bits left over to encode physregs, stack slots, and other locations
in the future.
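
A standalone sketch of the encoding (bit widths illustrative):

```
#include <cassert>
#include <cstdint>

// The raw 64-bit ID packs a register location in the high bits and a
// VarLoc index in the low bits. Location 0 means "undef" for now, leaving
// room for physregs, stack slots, and other locations later.
uint64_t makeID(uint32_t Location, uint32_t Index) {
  return (uint64_t(Location) << 32) | Index;
}
uint32_t locationOf(uint64_t ID) { return uint32_t(ID >> 32); }
uint32_t indexOf(uint64_t ID) { return uint32_t(ID); }

int main() {
  uint64_t ID = makeID(/*Location=*/0, /*Index=*/7);
  assert(locationOf(ID) == 0 && indexOf(ID) == 7);
}
```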

Differential Revision: https://reviews.llvm.org/D74985
2020-02-27 12:39:47 -08:00
Sanjay Patel
90fd859f51 [x86] use instruction-level fast-math-flags to drive MachineCombiner
The code changes here are hopefully straightforward:

1. Use MachineInstruction flags to decide if FP ops can be reassociated
   (use both "reassoc" and "nsz" to be consistent with IR transforms;
   we probably don't need "nsz", but that's a safer interpretation of
   the FMF).
2. Check that both nodes allow reassociation to change instructions.
   This is a stronger requirement than we've usually implemented in
   IR/DAG, but this is needed to solve the motivating bug (see below),
   and it seems unlikely to impede optimization at this late stage.
3. Intersect/propagate MachineIR flags to enable further reassociation
   in MachineCombiner.

We managed to make MachineCombiner flexible enough that no changes are
needed to that pass itself. So this patch should only affect x86
(assuming no other targets have implemented the hooks using MachineIR
flags yet).
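
A tiny model of the gating in (1) and (2) above (flag and function names are
abridged; this is not the actual hook code):

```
// Both the root instruction and the other candidate must carry reassoc and
// nsz machine-IR flags before reassociation is tried, making the check
// per-instruction rather than function-wide.
struct MIFlags {
  bool Reassoc = false;
  bool NoSignedZeros = false;
};

bool hasReassocFlags(MIFlags F) { return F.Reassoc && F.NoSignedZeros; }

bool canReassociate(MIFlags Root, MIFlags Prev) {
  return hasReassocFlags(Root) && hasReassocFlags(Prev);
}
```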

The motivating example in PR43609 is another case of fast-math transforms
interacting badly with special FP ops created during lowering:
https://bugs.llvm.org/show_bug.cgi?id=43609
The special fadd ops used for converting int to FP assume that they will
not be altered, so those are created without FMF.

However, the MachineCombiner pass was being enabled for FP ops using the
global/function-level TargetOption for "UnsafeFPMath". We managed to run
instruction/node-level FMF all the way down to MachineIR sometime in the
last 1-2 years though, so we can do better now.

The test diffs require some explanation:

1. llvm/test/CodeGen/X86/fmf-flags.ll - no target option for unsafe math was
   specified here, so MachineCombiner kicks in where it did not previously;
   to make it behave consistently, we need to specify a CPU schedule model,
   so use the default model, and there are no code diffs.
2. llvm/test/CodeGen/X86/machine-combiner.ll - replace the target option for
   unsafe math with the equivalent IR-level flags, and there are no code diffs;
   we can't remove the NaN/nsz options because those are still used to drive
   x86 fmin/fmax codegen (special SDAG opcodes).
3. llvm/test/CodeGen/X86/pow.ll - similar to #1
4. llvm/test/CodeGen/X86/sqrt-fastmath.ll - similar to #1, but MachineCombiner
   does some reassociation of the estimate sequence ops; presumably these are
   perf wins based on latency/throughput (and we get some reduction of move
   instructions too); I'm not sure how it affects numerical accuracy, but the
   test reflects reality better now because we would expect MachineCombiner to
   be enabled if the IR was generated via something like "-ffast-math" with clang.
5. llvm/test/CodeGen/X86/vec_int_to_fp.ll - this is the test added to model PR43609;
   the fadds are not reassociated now, so we should get the expected results.
6. llvm/test/CodeGen/X86/vector-reduce-fadd-fast.ll - similar to #1
7. llvm/test/CodeGen/X86/vector-reduce-fmul-fast.ll - similar to #1

Differential Revision: https://reviews.llvm.org/D74851
2020-02-27 15:19:37 -05:00
Djordje Todorovic
016d91ccbd [CallSiteInfo] Handle bundles when updating call site info
This addresses the issues P8198 and P8199 (from D73534).

The methods were not handling bundles properly.

Differential Revision: https://reviews.llvm.org/D74904
2020-02-27 13:57:06 +01:00
David Stenberg
6d857166d2 [DebugInfo] Describe call site values for chains of expression producing instrs
Summary:
If the describeLoadedValue() hook produced a DIExpression when
describing an instruction, and it was not possible to emit a call site
entry directly (the value operand was not an immediate nor a preserved
register), then that described value could not be inserted into the
worklist, and would instead be dropped, meaning that the parameter's
call site value couldn't be described.

This patch extends the worklist so that each entry has a DIExpression
that is built up when iterating through the instructions.

This allows us to describe instruction chains like this:

  $reg0 = mv $fp
  $reg0 = add $reg0, offset
  call @call_with_offseted_fp

Since DW_OP_LLVM_entry_value operations can't be combined with any other
expression, such call site entries will not be emitted. I have added a
test, dbgcall-site-expr-entry-value.mir, which verifies that we don't
assert or emit broken DWARF in such cases.

Reviewers: djtodoro, aprantl, vsk

Reviewed By: djtodoro, vsk

Subscribers: hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D75036
2020-02-27 11:18:51 +01:00
David Stenberg
ff574ff291 [DebugInfo][NFC] Move out lambdas from collectCallSiteParameters()
Summary:
This is a preparatory patch for D75036, in which a debug expression is
associated with each parameter register in the worklist. In that patch
the two lambda functions addToWorklist() and finishCallSiteParams() grow
a bit, so move them out to separate functions. This patch also prepares
for each parameter register having its own expression by moving the
creation of the DbgValueLoc into finishCallSiteParams().

Reviewers: djtodoro, vsk

Reviewed By: djtodoro, vsk

Subscribers: hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D75050
2020-02-27 11:18:51 +01:00
Matt Arsenault
6fc0d00823 GlobalISel: Fix lowering for G_UADDE/G_USUBE
The type parameter passed into lower is invalid and should be removed
from the function.
2020-02-26 19:10:52 -08:00
Matt Arsenault
c7e8d8b13e GlobalISel: Cleanup code with MachineIRBuilder features 2020-02-26 19:10:34 -08:00
Krzysztof Parzyszek
fd7c2e24c1 [SDAG] Add SDNode::values() = make_range(values_begin(), values_end())
Also use it in a few places to simplify code a little bit.  NFC
2020-02-26 12:07:38 -06:00