156 Commits

Author SHA1 Message Date
Chris Lattner
2a099c04c1 Pull some code out into a helper function.
Effeciently codegen even splats in the range [-32,30].

This allows us to codegen <30,30,30,30> as:

        vspltisw v0, 15
        vadduwm v2, v0, v0

instead of as a cp load.

llvm-svn: 27750
2006-04-17 06:00:21 +00:00
Chris Lattner
071ad01ceb Implement a TODO: for any shuffle that can be viewed as a v4[if]32 shuffle,
if it can be implemented in 3 or fewer discrete altivec instructions, codegen
it as such.  This implements Regression/CodeGen/PowerPC/vec_perf_shuffle.ll

llvm-svn: 27748
2006-04-17 05:28:54 +00:00
Chris Lattner
06a21ba96b Implement a TODO: have the legalizer canonicalize a bunch of operations to
one type (v4i32) so that we don't have to write patterns for each type, and
so that more CSE opportunities are exposed.

llvm-svn: 27731
2006-04-16 01:37:57 +00:00
Chris Lattner
fa5aa396c2 Make the BUILD_VECTOR lowering code much more aggressive w.r.t constant vectors.
Remove some done items from the todo list.

llvm-svn: 27729
2006-04-16 01:01:29 +00:00
Chris Lattner
24acbe46c0 Fix a crash when faced with a shuffle vector that has an undef in its mask.
llvm-svn: 27726
2006-04-15 23:48:05 +00:00
Chris Lattner
559c8ba466 Allow undef in a shuffle mask
llvm-svn: 27714
2006-04-14 23:19:08 +00:00
Chris Lattner
4211ca9108 Move the rest of the PPCTargetLowering::LowerOperation cases out into
separate functions, for simplicity and code clarity.

llvm-svn: 27693
2006-04-14 06:01:58 +00:00
Chris Lattner
19e9055eb5 Pull the VECTOR_SHUFFLE and BUILD_VECTOR lowering code out into separate
functions, which makes the code much cleaner :)

llvm-svn: 27692
2006-04-14 05:19:18 +00:00
Chris Lattner
883fb053bd Force non-darwin targets to use a static relo model. This fixes PR734,
tested by CodeGen/Generic/vector.ll

llvm-svn: 27657
2006-04-13 17:10:48 +00:00
Chris Lattner
147e50e1c5 Add a new way to match vector constants, which make it easier to bang bits of
different types.

Codegen spltw(0x7FFFFFFF) and spltw(0x80000000) without a constant pool load,
implementing PowerPC/vec_constants.ll:test1.  This compiles:

typedef float vf __attribute__ ((vector_size (16)));
typedef int vi __attribute__ ((vector_size (16)));
void test(vi *P1, vi *P2, vf *P3) {
  *P1 &= (vi){0x80000000,0x80000000,0x80000000,0x80000000};
  *P2 &= (vi){0x7FFFFFFF,0x7FFFFFFF,0x7FFFFFFF,0x7FFFFFFF};
  *P3 = vec_abs((vector float)*P3);
}

to:

_test:
        mfspr r2, 256
        oris r6, r2, 49152
        mtspr 256, r6
        vspltisw v0, -1
        vslw v0, v0, v0
        lvx v1, 0, r3
        vand v1, v1, v0
        stvx v1, 0, r3
        lvx v1, 0, r4
        vandc v1, v1, v0
        stvx v1, 0, r4
        lvx v1, 0, r5
        vandc v0, v1, v0
        stvx v0, 0, r5
        mtspr 256, r2
        blr

instead of (with two constant pool entries):

_test:
        mfspr r2, 256
        oris r6, r2, 49152
        mtspr 256, r6
        li r6, lo16(LCPI1_0)
        lis r7, ha16(LCPI1_0)
        li r8, lo16(LCPI1_1)
        lis r9, ha16(LCPI1_1)
        lvx v0, r7, r6
        lvx v1, 0, r3
        vand v0, v1, v0
        stvx v0, 0, r3
        lvx v0, r9, r8
        lvx v1, 0, r4
        vand v1, v1, v0
        stvx v1, 0, r4
        lvx v1, 0, r5
        vand v0, v1, v0
        stvx v0, 0, r5
        mtspr 256, r2
        blr

GCC produces (with 2 cp entries):

_test:
        mfspr r0,256
        stw r0,-4(r1)
        oris r0,r0,0xc00c
        mtspr 256,r0
        lis r2,ha16(LC0)
        lis r9,ha16(LC1)
        la r2,lo16(LC0)(r2)
        lvx v0,0,r3
        lvx v1,0,r5
        la r9,lo16(LC1)(r9)
        lwz r12,-4(r1)
        lvx v12,0,r2
        lvx v13,0,r9
        vand v0,v0,v12
        stvx v0,0,r3
        vspltisw v0,-1
        vslw v12,v0,v0
        vandc v1,v1,v12
        stvx v1,0,r5
        lvx v0,0,r4
        vand v0,v0,v13
        stvx v0,0,r4
        mtspr 256,r12
        blr

llvm-svn: 27624
2006-04-12 19:07:14 +00:00
Chris Lattner
74cf9ff761 Rename get_VSPLI_elt -> get_VSPLTI_elt
Canonicalize BUILD_VECTOR's that match VSPLTI's into a single type for each
form, eliminating a bunch of Pat patterns in the .td file and allowing us to
CSE stuff more aggressively.  This implements
PowerPC/buildvec_canonicalize.ll:VSPLTI

llvm-svn: 27614
2006-04-12 17:37:20 +00:00
Chris Lattner
e318a7574e Ensure that zero vectors are always v4i32, which forces them to CSE with
each other.  This implements CodeGen/PowerPC/vxor-canonicalize.ll

llvm-svn: 27609
2006-04-12 16:53:28 +00:00
Chris Lattner
e4db08a2f1 Vector function results go into V2 according to GCC. The darwin ABI doc
doesn't say where they go :-/

llvm-svn: 27579
2006-04-11 01:38:39 +00:00
Chris Lattner
92533cfb4a Move some return-handling code from lowerarguments to the ISD::RET handling stuff.
No functionality change.

llvm-svn: 27577
2006-04-11 01:21:43 +00:00
Chris Lattner
3a68f3c3ca properly mark vector selects as expanded to select_cc
llvm-svn: 27544
2006-04-08 22:59:15 +00:00
Chris Lattner
0a3d1bbca4 Add VRRC select support
llvm-svn: 27543
2006-04-08 22:45:08 +00:00
Chris Lattner
d9e80f4516 Implement PowerPC/CodeGen/vec_splat.ll:spltish to use vsplish instead of a
constant pool load.

llvm-svn: 27538
2006-04-08 07:14:26 +00:00
Chris Lattner
d71a1f946d Change the interface to the predicate that determines if vsplti* can be used.
No functionality changes.

llvm-svn: 27536
2006-04-08 06:46:53 +00:00
Chris Lattner
466841ddc7 Make sure to return the result in the right type.
llvm-svn: 27469
2006-04-06 23:12:19 +00:00
Chris Lattner
a4bbfaed5c Match vpku[hw]um(x,x).
Convert vsldoi(x,x) to work the same way other (x,x) cases work.

llvm-svn: 27467
2006-04-06 22:28:36 +00:00
Chris Lattner
f38e033270 Add support for matching vmrg(x,x) patterns
llvm-svn: 27463
2006-04-06 22:02:42 +00:00
Chris Lattner
d1dcb52093 Pattern match vmrg* instructions, which are now lowered by the CFE into shuffles.
llvm-svn: 27457
2006-04-06 21:11:54 +00:00
Chris Lattner
1d33819194 Support pattern matching vsldoi(x,y) and vsldoi(x,x), which allows the f.e. to
lower it and LLVM to have one fewer intrinsic.  This implements
CodeGen/PowerPC/vec_shuffle.ll

llvm-svn: 27450
2006-04-06 18:26:28 +00:00
Chris Lattner
e8b83b4206 Compile the vpkuhum/vpkuwum intrinsics into vpkuhum/vpkuwum instead of into
vperm with a perm mask lvx'd from the constant pool.

llvm-svn: 27448
2006-04-06 17:23:16 +00:00
Chris Lattner
39cc717c65 Fix CodeGen/PowerPC/2006-04-05-splat-ish.ll
llvm-svn: 27439
2006-04-05 17:39:25 +00:00
Evan Cheng
2cf4232ced Fallthrough to expand if a VECTOR_SHUFFLE cannot be custom lowered.
llvm-svn: 27433
2006-04-05 06:09:26 +00:00
Chris Lattner
4a744e5c9d Fix some broken logic that would cause us to codegen {2147483647,2147483647,2147483647,2147483647} as 'vspltisb v0, -1'.
llvm-svn: 27413
2006-04-04 22:28:35 +00:00
Chris Lattner
95c7adc7cb Ask legalize to promote all vector shuffles to be v16i8 instead of having to
handle all 4 PPC vector types.   This simplifies the matching code and allows
us to eliminate a bunch of patterns.  This also adds cases we were missing,
such as CodeGen/PowerPC/vec_splat.ll:splat_h.

llvm-svn: 27400
2006-04-04 17:25:31 +00:00
Chris Lattner
447a7968af Revert accidentally committed hunks.
llvm-svn: 27386
2006-04-03 23:58:04 +00:00
Chris Lattner
533aed9a35 Make sure to mark unsupported SCALAR_TO_VECTOR operations as expand.
llvm-svn: 27385
2006-04-03 23:55:43 +00:00
Chris Lattner
c5287c0ece Inform the dag combiner that the predicate compares only return a low bit.
llvm-svn: 27359
2006-04-02 06:26:07 +00:00
Chris Lattner
9b2d6e7886 Custom lower all BUILD_VECTOR's so that we can compile vec_splat_u8(8) into
"vspltisb v0, 8" instead of a constant pool load.

llvm-svn: 27335
2006-04-02 00:43:36 +00:00
Chris Lattner
baa73e0d91 Rearrange code a bit
llvm-svn: 27306
2006-03-31 19:52:36 +00:00
Chris Lattner
754b41c84b Add, sub and shuffle are legal for all vector types
llvm-svn: 27305
2006-03-31 19:48:58 +00:00
Chris Lattner
829a061abf note to self: *save* file, then check it in
llvm-svn: 27291
2006-03-31 06:04:53 +00:00
Chris Lattner
d4058a59d4 Implement an item from the readme, folding vcmp/vcmp. instructions with
identical instructions into a single instruction.  For example, for:

void test(vector float *x, vector float *y, int *P) {
  int v = vec_any_out(*x, *y);
  *x = (vector float)vec_cmpb(*x, *y);
  *P = v;
}

we now generate:

_test:
        mfspr r2, 256
        oris r6, r2, 49152
        mtspr 256, r6
        lvx v0, 0, r4
        lvx v1, 0, r3
        vcmpbfp. v0, v1, v0
        mfcr r4, 2
        stvx v0, 0, r3
        rlwinm r3, r4, 27, 31, 31
        xori r3, r3, 1
        stw r3, 0(r5)
        mtspr 256, r2
        blr

instead of:

_test:
        mfspr r2, 256
        oris r6, r2, 57344
        mtspr 256, r6
        lvx v0, 0, r4
        lvx v1, 0, r3
        vcmpbfp. v2, v1, v0
        mfcr r4, 2
***     vcmpbfp v0, v1, v0
        rlwinm r4, r4, 27, 31, 31
        stvx v0, 0, r3
        xori r3, r4, 1
        stw r3, 0(r5)
        mtspr 256, r2
        blr

Testcase here: CodeGen/PowerPC/vcmp-fold.ll

llvm-svn: 27290
2006-03-31 06:02:07 +00:00
Chris Lattner
d7495ae7e9 Lower vector compares to VCMP nodes, just like we lower vector comparison
predicates to VCMPo nodes.

llvm-svn: 27285
2006-03-31 05:13:27 +00:00
Chris Lattner
bca5fbe914 Mark INSERT_VECTOR_ELT as expand
llvm-svn: 27276
2006-03-31 01:48:55 +00:00
Nate Begeman
1b3928765d Add a few more altivec intrinsics
llvm-svn: 27215
2006-03-28 04:15:58 +00:00
Chris Lattner
cb5ec07cc3 Use normal lvx for scalar_to_vector instead of lve*x. They do the exact
same thing and we have a dag node for the former.

llvm-svn: 27205
2006-03-28 01:43:22 +00:00
Chris Lattner
e55d171ccd Tblgen doesn't like multiple SDNode<> definitions that map to the sameenum value. Split them into separate enums.
llvm-svn: 27201
2006-03-28 00:40:33 +00:00
Nate Begeman
ed728c1291 SelectionDAGISel can now natively handle Switch instructions, in the same
manner that the LowerSwitch LLVM to LLVM pass does: emitting a binary
search tree of basic blocks.  The new approach has several advantages:
it is faster, it generates significantly smaller code in many cases, and
it paves the way for implementing dense switch tables as a jump table by
handling switches directly in the instruction selector.

This functionality is currently only enabled on x86, but should be safe for
every target.  In anticipation of making it the default, the cfg is now
properly updated in the x86, ppc, and sparc select lowering code.

llvm-svn: 27156
2006-03-27 01:32:24 +00:00
Chris Lattner
6961fc76bb Codegen vector predicate compares.
llvm-svn: 27151
2006-03-26 10:06:40 +00:00
Evan Cheng
b1ddc988af Remove PPC:isZeroVector, use ISD::isBuildVectorAllZeros instead
llvm-svn: 27149
2006-03-26 09:52:32 +00:00
Chris Lattner
1cb91b3cd9 Add some basic patterns for other datatypes
llvm-svn: 27116
2006-03-25 07:39:07 +00:00
Chris Lattner
2771e2c960 Codegen things like:
<int -1, int -1, int -1, int -1>
and
 <int 65537, int 65537, int 65537, int 65537>

Using things like:
  vspltisb v0, -1
and:
  vspltish v0, 1

instead of using constant pool loads.

This implements CodeGen/PowerPC/vec_splat.ll:splat_imm_i{32|16}.

llvm-svn: 27106
2006-03-25 06:12:06 +00:00
Chris Lattner
a90b7141ed Disable the i32->float G5 optimization. It is unsafe, as documented in the
comment.

This fixes 177.mesa, and McCat/09-vor with the td scheduler.

llvm-svn: 27060
2006-03-24 07:53:47 +00:00
Chris Lattner
ab882abce8 add support for using vxor to build zero vectors. This implements
Regression/CodeGen/PowerPC/vec_zero.ll

llvm-svn: 27059
2006-03-24 07:48:08 +00:00
Chris Lattner
4a66d69433 When possible, custom lower 32-bit SINT_TO_FP to this:
_foo2:
        extsw r2, r3
        std r2, -8(r1)
        lfd f0, -8(r1)
        fcfid f0, f0
        frsp f1, f0
        blr

instead of this:

_foo2:
        lis r2, ha16(LCPI2_0)
        lis r4, 17200
        xoris r3, r3, 32768
        stw r3, -4(r1)
        stw r4, -8(r1)
        lfs f0, lo16(LCPI2_0)(r2)
        lfd f1, -8(r1)
        fsub f0, f1, f0
        frsp f1, f0
        blr

This speeds up Misc/pi from 2.44s->2.09s with LLC and from 3.01->2.18s
with llcbeta (16.7% and 38.1% respectively).

llvm-svn: 26943
2006-03-22 05:30:33 +00:00
Chris Lattner
00f4683bf6 These targets don't support EXTRACT_VECTOR_ELT, though, in time, X86 will.
llvm-svn: 26930
2006-03-21 20:51:05 +00:00