1309 Commits

Author SHA1 Message Date
Matt Arsenault
bf0db918b2 R600: Implement getRecipEstimate
This requires a new hook to prevent expanding sqrt in terms
of rsqrt and reciprocal. v_rcp_f32, v_rsq_f32, and v_sqrt_f32 are
all the same rate, so this expansion would just double the number
of instructions and cycles.

llvm-svn: 225828
2015-01-13 20:53:23 +00:00
Olivier Sallenave
325096980b Added TLI hook for isFPExtFree. Some of the FMA combine heuristics are now guarded with that hook.
llvm-svn: 225795
2015-01-13 15:06:36 +00:00
Matt Arsenault
a982e4f82b Combine fcmp + select to fminnum / fmaxnum if no nans and legal
Also require unsafe FP math for no since there isn't a way to
test for signed zeros.

llvm-svn: 225744
2015-01-13 00:43:00 +00:00
Hal Finkel
0ce7f372e5 [DAGCombine] Remainder of fix to r225380 (More FMA folding opportunities)
As pointed out by Aditya (and Owen), when we elide an FP extend to form an FMA,
we need to extend the incoming operands so that the resulting node will really
be legal. This is currently enabled only for PowerPC, and it happens to work
there regardless, but this should fix the functionality for everyone else
should anyone else wish to use it.

llvm-svn: 225492
2015-01-09 01:29:29 +00:00
Hal Finkel
33ead6f901 Partial fix to r225380 (More FMA folding opportunities)
As pointed out by Aditya (and Owen), there are two things wrong with this code.
First, it adds patterns which elide FP extends when forming FMAs, and that might
not be profitable on all targets (it belongs behind the pre-existing
aggressive-FMA-formation flag). This is fixed by this change.

Second, the resulting nodes might have operands of different types (the
extensions need to be re-added). That will be fixed in the follow-up commit.

llvm-svn: 225485
2015-01-09 00:45:54 +00:00
Ahmed Bougacha
2b6917b020 [SelectionDAG] Allow targets to specify legality of extloads' result
type (in addition to the memory type).

The *LoadExt* legalization handling used to only have one type, the
memory type.  This forced users to assume that as long as the extload
for the memory type was declared legal, and the result type was legal,
the whole extload was legal.

However, this isn't always the case.  For instance, on X86, with AVX,
this is legal:
    v4i32 load, zext from v4i8
but this isn't:
    v4i64 load, zext from v4i8
Whereas v4i64 is (arguably) legal, even without AVX2.

Note that the same thing was done a while ago for truncstores (r46140),
but I assume no one needed it yet for extloads, so here we go.

Calls to getLoadExtAction were changed to add the value type, found
manually in the surrounding code.

Calls to setLoadExtAction were mechanically changed, by wrapping the
call in a loop, to match previous behavior.  The loop iterates over
the MVT subrange corresponding to the memory type (FP vectors, etc...).
I also pulled neighboring setTruncStoreActions into some of the loops;
those shouldn't make a difference, as the additional types are illegal.
(e.g., i128->i1 truncstores on PPC.)

No functional change intended.

Differential Revision: http://reviews.llvm.org/D6532

llvm-svn: 225421
2015-01-08 00:51:32 +00:00
Olivier Sallenave
0451532996 More FMA folding opportunities.
llvm-svn: 225380
2015-01-07 20:54:17 +00:00
Olivier Sallenave
e64ad7cedd Test commit
llvm-svn: 225368
2015-01-07 19:45:17 +00:00
Craig Topper
d3c02f177a Replace several 'assert(false' with 'llvm_unreachable' or fold a condition into the assert.
llvm-svn: 225160
2015-01-05 10:15:49 +00:00
Alexey Samsonov
553185ee4b Revert "merge consecutive stores of extracted vector elements"
This reverts commit r224611. This change causes crashes
in X86 DAG->DAG Instruction Selection.

llvm-svn: 225031
2014-12-31 00:40:28 +00:00
Mehdi Amini
d38920891e Always assert in DAGCombine and not only when -debug is enabled
Right now in DAG Combine check the validity of the returned type 
only when -debug is given on the command line. However usually 
the test cases in the validation does not use -debug. 
An Assert build should always check this.

llvm-svn: 224779
2014-12-23 18:59:02 +00:00
Michael Kuperstein
f4536ea6e8 [DagCombine] Improve DAGCombiner BUILD_VECTOR when it has two sources of elements
This partially fixes PR21943.

For AVX, we go from:

vmovq   (%rsi), %xmm0
vmovq   (%rdi), %xmm1
vpermilps       $-27, %xmm1, %xmm2 ## xmm2 = xmm1[1,1,2,3]
vinsertps       $16, %xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
vinsertps       $32, %xmm0, %xmm1, %xmm1 ## xmm1 = xmm1[0,1],xmm0[0],xmm1[3]
vpermilps       $-27, %xmm0, %xmm0 ## xmm0 = xmm0[1,1,2,3]
vinsertps       $48, %xmm0, %xmm1, %xmm0 ## xmm0 = xmm1[0,1,2],xmm0[0]

To the expected:

vmovq   (%rdi), %xmm0
vmovhpd (%rsi), %xmm0, %xmm0
retq

Fixing this for AVX2 is still open.

Differential Revision: http://reviews.llvm.org/D6749

llvm-svn: 224759
2014-12-23 08:59:45 +00:00
Sanjay Patel
0428a5786e merge consecutive stores of extracted vector elements
Add a path to DAGCombiner::MergeConsecutiveStores() 
to combine multiple scalar stores when the store operands
are extracted vector elements. This is a partial fix for
PR21711 ( http://llvm.org/bugs/show_bug.cgi?id=21711 ).

For the new test case, codegen improves from:

   vmovss  %xmm0, (%rdi)
   vextractps      $1, %xmm0, 4(%rdi)
   vextractps      $2, %xmm0, 8(%rdi)
   vextractps      $3, %xmm0, 12(%rdi)
   vextractf128    $1, %ymm0, %xmm0
   vmovss  %xmm0, 16(%rdi)
   vextractps      $1, %xmm0, 20(%rdi)
   vextractps      $2, %xmm0, 24(%rdi)
   vextractps      $3, %xmm0, 28(%rdi)
   vzeroupper
   retq

To:

   vmovups	%ymm0, (%rdi)
   vzeroupper
   retq

Patch reviewed by Nadav Rotem.

Differential Revision: http://reviews.llvm.org/D6698

llvm-svn: 224611
2014-12-19 20:23:41 +00:00
Michael Kuperstein
047b1a0400 [DAGCombine] Slightly improve lowering of BUILD_VECTOR into a shuffle.
This handles the case of a BUILD_VECTOR being constructed out of elements extracted from a vector twice the size of the result vector. Previously this was always scalarized. Now, we try to construct a shuffle node that feeds on extract_subvectors.

This fixes PR15872 and provides a partial fix for PR21711.

Differential Revision: http://reviews.llvm.org/D6678

llvm-svn: 224429
2014-12-17 12:32:17 +00:00
Matt Arsenault
810cb62962 Add target hook for whether it is profitable to reduce load widths
Add an option to disable optimization to shrink truncated larger type
loads to smaller type loads. On SI this prevents using scalar load
instructions in some cases, since there are no scalar extloads.

llvm-svn: 224084
2014-12-12 00:00:24 +00:00
Owen Anderson
558012a3fc Fix a few instances found in SelectionDAG where we were not handling F16 at parity with F32 and F64.
llvm-svn: 223760
2014-12-09 06:50:39 +00:00
Simon Pilgrim
be24ab367b [InstCombine] Minor optimization for bswap with binary ops
Added instcombine optimizations for BSWAP with AND/OR/XOR ops:

OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) )
OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) )

Since its just a one liner, I've also added BSWAP to the DAGCombiner equivalent as well:

fold (OP (bswap x), (bswap y)) -> (bswap (OP x, y))

Refactored bswap-fold tests to use FileCheck instead of just checking that the bswaps had gone.

Differential Revision: http://reviews.llvm.org/D6407

llvm-svn: 223349
2014-12-04 09:44:01 +00:00
Elena Demikhovsky
f1de34b84d Masked Load / Store Intrinsics - the CodeGen part.
I'm recommiting the codegen part of the patch.
The vectorizer part will be send to review again.

Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)

Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

http://reviews.llvm.org/D6191

llvm-svn: 223348
2014-12-04 09:40:44 +00:00
Duncan P. N. Exon Smith
9bc81fbe92 Revert "Masked Vector Load and Store Intrinsics."
This reverts commit r222632 (and follow-up r222636), which caused a host
of LNT failures on an internal bot.  I'll respond to the commit on the
list with a reproduction of one of the failures.

Conflicts:
	lib/Target/X86/X86TargetTransformInfo.cpp

llvm-svn: 222936
2014-11-28 21:29:14 +00:00
Elena Demikhovsky
9e5089a938 Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)

Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

http://reviews.llvm.org/D6191

llvm-svn: 222632
2014-11-23 08:07:43 +00:00
Andrea Di Biagio
0225b5bf6f [DAG] Teach how to turn a build_vector into a shuffle if some of the operands are zero.
Before this patch, the DAGCombiner only tried to convert build_vector dag nodes
into shuffles if all operands were either extract_vector_elt or undef.

This patch improves that logic and teaches the DAGCombiner how to deal with
build_vector dag nodes where one or more operands are zero. A build_vector
dag node with some zero operands is turned into a shuffle only if the resulting
shuffle mask is legal for the target.

llvm-svn: 222536
2014-11-21 14:32:06 +00:00
Andrea Di Biagio
26e8f4d166 [DAG] Refactor the shuffle combining logic in DAGCombiner. NFC.
This patch simplifies the logic that combines a pair of shuffle nodes into
a single shuffle if there is a legal mask. Also added comments to better
describe the algorithm. No functional change intended.

llvm-svn: 222522
2014-11-21 11:33:07 +00:00
Hao Liu
44e5d7a131 DAGCombiner: Allow the DAGCombiner to combine multiple FDIVs with the same divisor info FMULs by the reciprocal.
E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip)

A hook is added to allow the target to control whether it needs to do such combine.

Reviewed in http://reviews.llvm.org/D6334

llvm-svn: 222510
2014-11-21 06:39:58 +00:00
David Blaikie
70573dcd9f Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool>
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.

This lead to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...

llvm-svn: 222334
2014-11-19 07:49:26 +00:00
Oliver Stannard
d29db9b949 Fix optimisations of SELECT_CC which assumed result is boolean
Some optimisations in DAGCombiner cause miscompilations for targets that use
TargetLowering::UndefinedBooleanContent, because they assume that the results
of a SELECT_CC node are boolean values, and can be safely ANDed, ORed and
XORed. These optimisations are only valid for targets that use
ZeroOrOneBooleanContent or ZeroOrNegativeOneBooleanContent.

This is a follow-up to D6210/r221693.

llvm-svn: 222123
2014-11-17 10:49:31 +00:00
Andrea Di Biagio
e13a0b81f4 [DAG] Improved target independent vector shuffle folding logic.
This patch teaches the DAGCombiner how to combine shuffles according to rules:
   shuffle(shuffle(A, Undef, M0), B, M1) -> shuffle(B, A, M2)
   shuffle(shuffle(A, B, M0), B, M1) -> shuffle(B, A, M2)
   shuffle(shuffle(A, B, M0), A, M1) -> shuffle(B, A, M2)

llvm-svn: 222090
2014-11-15 22:56:25 +00:00
Oliver Stannard
8c2c67e63c LLVM incorrectly folds xor into select
LLVM replaces the SelectionDAG pattern (xor (set_cc cc x y) 1) with
(set_cc !cc x y), which is only correct when the xor has type i1.
Instead, we should check that the constant operand to the xor is all
ones.

llvm-svn: 221693
2014-11-11 17:36:01 +00:00
Andrea Di Biagio
ce46b97b48 [X86] Teach method 'isVectorClearMaskLegal' how to check for legal blend masks.
This patch improves the folding of vector AND nodes into blend operations for
targets that feature SSE4.1. A vector AND node where one of the operands is
a constant build_vector with elements that are either zero or all-ones can be
converted into a blend.

This allows for example to simplify the following code:

define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) {
  %1 = and <4 x i32> %A, <i32 0, i32 0, i32 0, i32 -1>
  %2 = and <4 x i32> %B, <i32 -1, i32 -1, i32 -1, i32 0>
  %3 = or <4 x i32> %1, %2
  ret <4 x i32> %3
}

Before this patch llc (-mcpu=corei7) generated:
        andps  LCPI1_0(%rip), %xmm0, %xmm0
        andps  LCPI1_1(%rip), %xmm1, %xmm1
        orps   %xmm1, %xmm0, %xmm0
        retq

With this patch we generate a single 'vpblendw'.

llvm-svn: 221343
2014-11-05 13:04:14 +00:00
Paul Robinson
ad06e430ce Normally an 'optnone' function goes through fast-isel, which does not
call DAGCombiner. But we ran into a case (on Windows) where the
calling convention causes argument lowering to bail out of fast-isel,
and we end up in CodeGenAndEmitDAG() which does run DAGCombiner.
So, we need to make DAGCombiner check for 'optnone' after all.

Commit includes the test that found this, plus another one that got
missed in the original optnone work.

llvm-svn: 221168
2014-11-03 18:19:26 +00:00
Louis Gerbarg
e8f9c78247 Fix incorrect invariant check in DAG Combine
Earlier this summer I fixed an issue where we were incorrectly combining
multiple loads that had different constraints such alignment, invariance,
temporality, etc. Apparently in one case I made copt paste error and swapped
alignment and invariance.

Tests included.

rdar://18816719

llvm-svn: 220933
2014-10-30 22:21:03 +00:00
NAKAMURA Takumi
f51a34ec1f Whitespace.
llvm-svn: 220857
2014-10-29 15:23:11 +00:00
Sanjay Patel
957efc23bb Use rsqrt (X86) to speed up reciprocal square root calcs
This is a first step for generating SSE rsqrt instructions for
reciprocal square root calcs when fast-math is allowed.

For now, be conservative and only enable this for AMD btver2
where performance improves significantly - for example, 29%
on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c
(if we convert the data type to single-precision float).

This patch adds a two constant version of the Newton-Raphson
refinement algorithm to DAGCombiner that can be selected by any target
via a parameter returned by getRsqrtEstimate()..

See PR20900 for more details:
http://llvm.org/bugs/show_bug.cgi?id=20900

Differential Revision: http://reviews.llvm.org/D5658

llvm-svn: 220570
2014-10-24 17:02:16 +00:00
Benjamin Kramer
7ad22403fb Strength reduce constant-sized vectors into arrays. No functionality change.
llvm-svn: 220412
2014-10-22 19:55:26 +00:00
Matt Arsenault
7c93690be0 Add minnum / maxnum codegen
llvm-svn: 220342
2014-10-21 23:01:01 +00:00
Jan Vesely
af62cf4db0 SelectionDAG: Add sext_inreg optimizations
v2: use dyn_cast
    fixup comments
v3: use cast

Reviewed-by: Matt Arsenault <arsenm2@gmail.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 220044
2014-10-17 14:45:25 +00:00
Sanjay Patel
3d497cd778 Improve sqrt estimate algorithm (fast-math)
This patch changes the fast-math implementation for calculating sqrt(x) from:
y = 1 / (1 / sqrt(x))
to:
y = x * (1 / sqrt(x))

This has 2 benefits: less code / faster code and one less estimate instruction 
that may lose precision.

The only target that will be affected (until http://reviews.llvm.org/D5658 is approved)
is PPC. The difference in codegen for PPC is 2 less flops for a single-precision sqrtf
or vector sqrtf and 4 less flops for a double-precision sqrt. 
We also eliminate a constant load and extra register usage.

Differential Revision: http://reviews.llvm.org/D5682

llvm-svn: 219445
2014-10-09 21:26:35 +00:00
Eric Christopher
40cba91ad1 Remove unnecessary include.
llvm-svn: 219368
2014-10-08 23:38:40 +00:00
Eric Christopher
f55d4714d2 Use both the cached TLI and the subtarget off of the DAG in
the DAG combiner.

llvm-svn: 219367
2014-10-08 23:38:39 +00:00
Hal Finkel
9808595319 [DAGCombine] Remove SIGN_EXTEND-related inf-loop
The patch's author points out that, despite the function's documentation,
getSetCCResultType is only used to get the SETCC result type (with one
here-removed problematic exception). In one case, getSetCCResultType was being
used to get the predicate type to use for a SELECT node, and then
SIGN_EXTENDing (or truncating) to get the input predicate to match that type.
Unfortunately, this was happening inside visitSIGN_EXTEND, and creating new
SIGN_EXTEND nodes was causing an infinite loop. In addition, this behavior was
wrong if a target was not using ZeroOrNegativeOneBooleanContent. Lastly, the
extension/truncation seems unnecessary here: SELECT is defined as:

  Select(COND, TRUEVAL, FALSEVAL). If the type of the boolean COND is not i1
  then the high bits must conform to getBooleanContents.

So here we remove this use of getSetCCResultType and update
getSetCCResultType's documentation to reflect its actual uses.

Patch by deadal nix!

llvm-svn: 219141
2014-10-06 20:19:47 +00:00
Sanjay Patel
7bc9185ab5 Fast-math fold: x / (y * sqrt(z)) -> x * (rsqrt(z) / y)
The motivation is to recognize code such as this from /llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c:

float distance = sqrt(dx * dx + dy * dy + dz * dz);
float mag = dt / (distance * distance * distance);

Without this patch, we don't match the sqrt as a reciprocal sqrt, so for PPC the new testcase in this patch produces:

   addis 3, 2, .LCPI4_2@toc@ha
   lfs 4, .LCPI4_2@toc@l(3)
   addis 3, 2, .LCPI4_1@toc@ha
   lfs 0, .LCPI4_1@toc@l(3)
   fcmpu 0, 1, 4
   beq 0, .LBB4_2
# BB#1:
   frsqrtes 4, 1
   addis 3, 2, .LCPI4_0@toc@ha
   lfs 5, .LCPI4_0@toc@l(3)
   fnmsubs 13, 1, 5, 1
   fmuls 6, 4, 4
   fmadds 1, 13, 6, 5
   fmuls 1, 4, 1
   fres 4, 1                <--- reciprocal of reciprocal square root
   fnmsubs 1, 1, 4, 0
   fmadds 4, 4, 1, 4
.LBB4_2:
   fmuls 1, 4, 2
   fres 2, 1
   fnmsubs 0, 1, 2, 0
   fmadds 0, 2, 0, 2
   fmuls 1, 3, 0
   blr

After the patch, this simplifies to:

frsqrtes 0, 1
addis 3, 2, .LCPI4_1@toc@ha
fres 5, 2
lfs 4, .LCPI4_1@toc@l(3)
addis 3, 2, .LCPI4_0@toc@ha
lfs 7, .LCPI4_0@toc@l(3)
fnmsubs 13, 1, 4, 1
fmuls 6, 0, 0
fnmsubs 2, 2, 5, 7
fmadds 1, 13, 6, 4
fmadds 2, 5, 2, 5
fmuls 0, 0, 1
fmuls 0, 0, 2
fmuls 1, 3, 0
blr

Differential Revision: http://reviews.llvm.org/D5628

llvm-svn: 219139
2014-10-06 19:31:18 +00:00
Chandler Carruth
daa1ff985c [x86, dag] Teach the DAG combiner to prune inputs toa vector_shuffle
that are unused.

This allows the combiner to delete math feeding shuffles where the math
isn't actually necessary. This improves some of the vperm2x128 tests
that regressed when the vector shuffle lowering started actually
generating vperm instructions rather than forcibly decomposing them.

Sadly, this isn't enough to get this *really* right because we still
form a completely unnecessary permutation. To fix that, we also need to
fold shuffles which just rearrange concatenated or inserted subvectors.

llvm-svn: 219086
2014-10-05 19:14:34 +00:00
Sanjay Patel
ab7f460bca Use the target-specified iteration count to opt out of any further refinement of an estimate. NFC.
llvm-svn: 218700
2014-09-30 20:44:23 +00:00
Sanjay Patel
8fde95cb2b Split the estimate() interface into separate functions for each type. NFC.
It was hacky to use an opcode as a switch because it won't always match
(rsqrte != sqrte), and it looks like we'll need to add more special casing
per arch than I had hoped for. Eg, x86 will prefer a different NR estimate
implementation. ARM will want to use it's 'step' instructions. There also
don't appear to be any new estimate instructions in any arch in a long,
long time. Altivec vloge and vexpte may have been the first and last in
that field...

llvm-svn: 218698
2014-09-30 20:28:48 +00:00
Andrea Di Biagio
c7c524129b [DAG] Check in advance if a build_vector has a legal type before attempting to convert it into a shuffle.
Currently, the DAG Combiner only tries to convert type-legal build_vector nodes
into shuffles. This patch simply moves the logic that checks if a
build_vector has a legal value type up before we even start analyzing the
operands. This allows to early exit immediately from method
'visitBUILD_VECTOR' if the node type is known to be illegal.

No functional change intended.

llvm-svn: 218677
2014-09-30 15:30:22 +00:00
James Molloy
463db9a77c [AArch64] Redundant store instructions should be removed as dead code
If there is a store followed by a store with the same value to the same location, then the store is dead/noop. It can be removed.

This problem is found in spec2006-197.parser.

For example,
  stur    w10, [x11, #-4]
  stur    w10, [x11, #-4]
Then one of the two stur instructions can be removed.

Patch by David Xu!

llvm-svn: 218569
2014-09-27 17:02:54 +00:00
Sanjay Patel
bdf1e38856 Refactor reciprocal and reciprocal square root estimate into target-independent functions (part 2).
This is purely refactoring. No functional changes intended. PowerPC is the only target
that is currently using this interface.

The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this:

z = y / sqrt(x)

into:

z = y * rsqrte(x)

And:

z = y / x

into:

z = y * rcpe(x)

using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .

There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction
along with the number of refinement steps needed to make the estimate usable.

Differential Revision: http://reviews.llvm.org/D5484

llvm-svn: 218553
2014-09-26 23:01:47 +00:00
David Xu
418da223dd Revert patch ofr218493
llvm-svn: 218494
2014-09-26 02:28:03 +00:00
David Xu
64f661ee0b Redundant store instructions should be removed as dead code
llvm-svn: 218493
2014-09-26 02:02:09 +00:00
Sanjay Patel
6a42292795 Use SDValue bool operator to reduce code. No functional change.
llvm-svn: 218314
2014-09-23 16:24:20 +00:00
Sanjay Patel
b67bd262ea Refactor reciprocal square root estimate into target-independent function; NFC.
This is purely a plumbing patch. No functional changes intended.

The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this:

z = y / sqrt(x)

into:

z = y * rsqrte(x)

using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .

The first step is to add a target hook for RSQRTE, take the already target-independent code selfishly hoarded by PPC, and put it into DAGCombiner.

Next steps:

    The code in DAGCombiner::BuildRSQRTE() should be refactored further; tests that exercise that logic need to be added.
    Logic in PPCTargetLowering::BuildRSQRTE() should be hoisted into DAGCombiner.
    X86 and AArch64 overrides for TargetLowering.BuildRSQRTE() should be added.

Differential Revision: http://reviews.llvm.org/D5425

llvm-svn: 218219
2014-09-21 15:19:15 +00:00