Evan Cheng
f9adce90bf
Get rid of some memory leaks identified by Valgrind
...
llvm-svn: 25960
2006-02-04 06:49:00 +00:00
Chris Lattner
3b48431333
Add initial support for immediates. This allows us to compile this:
...
int %rlwnm(int %A, int %B) {
%C = call int asm "rlwnm $0, $1, $2, $3, $4", "=r,r,r,n,n"(int %A, int %B, int 4, int 17)
ret int %C
}
into:
_rlwnm:
or r2, r3, r3
or r3, r4, r4
rlwnm r2, r2, r3, 4, 17 ;; note the immediates :)
or r3, r2, r2
blr
llvm-svn: 25955
2006-02-04 02:26:14 +00:00
Chris Lattner
65ad53feb3
Initial early support for non-register operands, like immediates
...
llvm-svn: 25952
2006-02-04 02:16:44 +00:00
Chris Lattner
f68fd20286
remove some #ifdef'd out code, which should properly be in the dag combiner anyway.
...
llvm-svn: 25941
2006-02-03 20:13:59 +00:00
Chris Lattner
7f5880b1c7
Implement matching constraints. We can now say things like this:
...
%C = call int asm "xyz $0, $1, $2, $3", "=r,r,r,0"(int %A, int %B, int 4)
and get:
xyz r2, r3, r4, r2
note that the r2's are pinned together. Yaay for 2-address instructions.
2342 ----------------------------------------------------------------------
llvm-svn: 25893
2006-02-02 00:25:23 +00:00
Chris Lattner
1558fc64f9
Implement simple register assignment for inline asms. This allows us to compile:
...
int %test(int %A, int %B) {
%C = call int asm "xyz $0, $1, $2", "=r,r,r"(int %A, int %B)
ret int %C
}
into:
(0x8906130, LLVM BB @0x8902220):
%r2 = OR4 %r3, %r3
%r3 = OR4 %r4, %r4
INLINEASM <es:xyz $0, $1, $2>, %r2<def>, %r2, %r3
%r3 = OR4 %r2, %r2
BLR
which asmprints as:
_test:
or r2, r3, r3
or r3, r4, r4
xyz $0, $1, $2 ;; need to print the operands now :)
or r3, r2, r2
blr
llvm-svn: 25878
2006-02-01 18:59:47 +00:00
Chris Lattner
3a5ed55187
adjust to changes in InlineAsm interface. Fix a few minor bugs.
...
llvm-svn: 25865
2006-02-01 01:28:23 +00:00
Chris Lattner
2e56e89452
Handle physreg input/outputs. We now compile this:
...
int %test_cpuid(int %op) {
%B = alloca int
%C = alloca int
%D = alloca int
%A = call int asm "cpuid", "=eax,==ebx,==ecx,==edx,eax"(int* %B, int* %C, int* %D, int %op)
%Bv = load int* %B
%Cv = load int* %C
%Dv = load int* %D
%x = add int %A, %Bv
%y = add int %x, %Cv
%z = add int %y, %Dv
ret int %z
}
to this:
_test_cpuid:
sub %ESP, 16
mov DWORD PTR [%ESP], %EBX
mov %EAX, DWORD PTR [%ESP + 20]
cpuid
mov DWORD PTR [%ESP + 8], %ECX
mov DWORD PTR [%ESP + 12], %EBX
mov DWORD PTR [%ESP + 4], %EDX
mov %ECX, DWORD PTR [%ESP + 12]
add %EAX, %ECX
mov %ECX, DWORD PTR [%ESP + 8]
add %EAX, %ECX
mov %ECX, DWORD PTR [%ESP + 4]
add %EAX, %ECX
mov %EBX, DWORD PTR [%ESP]
add %ESP, 16
ret
... note the proper register allocation. :)
it is unclear to me why the loads aren't folded into the adds.
llvm-svn: 25827
2006-01-31 02:03:41 +00:00
Chris Lattner
98ed05c81d
remove method I just added
...
llvm-svn: 25728
2006-01-28 03:43:09 +00:00
Chris Lattner
43b867dd3b
add a new callback
...
llvm-svn: 25727
2006-01-28 03:37:03 +00:00
Nate Begeman
595ec734fc
Implement Promote for VAARG, and allow it to be custom promoted for people
...
who don't want the default behavior (Alpha).
llvm-svn: 25726
2006-01-28 03:14:31 +00:00
Nate Begeman
8c47c3a3b1
Remove TLI.LowerReturnTo, and just let targets custom lower ISD::RET for
...
the same functionality. This addresses another piece of bug 680. Next,
on to fixing Alpha VAARG, which I broke last time.
llvm-svn: 25696
2006-01-27 21:09:22 +00:00
Chris Lattner
476e67be14
initial selectiondag support for new INLINEASM node. Note that inline asms
...
with outputs or inputs are not supported yet. :)
llvm-svn: 25664
2006-01-26 22:24:51 +00:00
Nate Begeman
e74795cd70
First part of bug 680:
...
Remove TLI.LowerVA* and replace it with SDNodes that are lowered the same
way as everything else.
llvm-svn: 25606
2006-01-25 18:21:52 +00:00
Evan Cheng
a6eff8a432
If scheduler choice is the default (-sched=default), use target scheduling
...
preference to determine which scheduler to use. SchedulingForLatency ==
Breadth first; SchedulingForRegPressure == bottom up register reduction list
scheduler.
llvm-svn: 25599
2006-01-25 09:12:57 +00:00
Jim Laskey
b8566fa10a
Typo.
...
llvm-svn: 25545
2006-01-23 13:34:04 +00:00
Evan Cheng
31272347d4
Skeleton of the list schedule.
...
llvm-svn: 25544
2006-01-23 08:26:10 +00:00
Evan Cheng
c1e1d9724d
Factor out more instruction scheduler code to the base class.
...
llvm-svn: 25532
2006-01-23 07:01:07 +00:00
Chris Lattner
deda32a786
Fix bugs lowering stackrestore, fixing 2004-08-12-InlinerAndAllocas.c on
...
PPC.
llvm-svn: 25522
2006-01-23 05:22:07 +00:00
Chris Lattner
e23928c67f
Fix a bug in a recent refactor that caused a bunch of programs to miscompile
...
or the compiler to crash.
llvm-svn: 25503
2006-01-21 19:12:11 +00:00
Evan Cheng
739a6a456e
Do some code refactoring on Jim's scheduler in preparation of the new list
...
scheduler.
llvm-svn: 25493
2006-01-21 02:32:06 +00:00
Chris Lattner
222ceabbee
If the target doesn't support f32 natively, insert the FP_EXTEND in target-indep
...
code, so that the LowerReturn code doesn't have to handle it.
llvm-svn: 25482
2006-01-20 18:38:32 +00:00
Chris Lattner
e2ee190821
Temporary work around for a libcall insertion bug: If a target doesn't
...
support FSIN/FCOS nodes, do not lower sin/cos to them.
llvm-svn: 25425
2006-01-18 21:50:14 +00:00
Robert Bocchino
03e95af9f7
Support for the insertelement operation.
...
llvm-svn: 25405
2006-01-17 20:06:42 +00:00
Reid Spencer
b4f9a6f110
For PR411:
...
This patch is an incremental step towards supporting a flat symbol table.
It de-overloads the intrinsic functions by providing type-specific intrinsics
and arranging for automatically upgrading from the old overloaded name to
the new non-overloaded name. Specifically:
llvm.isunordered -> llvm.isunordered.f32, llvm.isunordered.f64
llvm.sqrt -> llvm.sqrt.f32, llvm.sqrt.f64
llvm.ctpop -> llvm.ctpop.i8, llvm.ctpop.i16, llvm.ctpop.i32, llvm.ctpop.i64
llvm.ctlz -> llvm.ctlz.i8, llvm.ctlz.i16, llvm.ctlz.i32, llvm.ctlz.i64
llvm.cttz -> llvm.cttz.i8, llvm.cttz.i16, llvm.cttz.i32, llvm.cttz.i64
New code should not use the overloaded intrinsic names. Warnings will be
emitted if they are used.
llvm-svn: 25366
2006-01-16 21:12:35 +00:00
Nate Begeman
542c3c17a9
Remove some duplicated code
...
llvm-svn: 25313
2006-01-14 03:18:27 +00:00
Nate Begeman
2fba8a3aaa
bswap implementation
...
llvm-svn: 25312
2006-01-14 03:14:10 +00:00
Chris Lattner
b32664583b
Compile llvm.stacksave/restore into STACKSAVE/STACKRESTORE nodes, and allow
...
targets to custom expand them as they desire.
llvm-svn: 25273
2006-01-13 02:50:02 +00:00
Chris Lattner
6c9c250dcd
Add "support" for stacksave/stackrestore to the dag isel
...
llvm-svn: 25268
2006-01-13 02:24:42 +00:00
Robert Bocchino
2c966e7617
Added selection DAG support for the extractelement operation.
...
llvm-svn: 25179
2006-01-10 19:04:57 +00:00
Jim Laskey
219d559824
Applied some recommend changes from sabre. The dominate one beginning "let the
...
pass manager do it's thing." Fixes crash when compiling -g files and suppresses
dwarf statements if no debug info is present.
llvm-svn: 25100
2006-01-04 22:28:25 +00:00
Chris Lattner
44c07ed61a
enable the gep isel opt
...
llvm-svn: 24910
2005-12-21 19:36:36 +00:00
Chris Lattner
803a575616
Lower ConstantAggregateZero into zeros
...
llvm-svn: 24890
2005-12-21 02:43:26 +00:00
Jim Laskey
7c462768ed
Added source file/line correspondence for dwarf (PowerPC only at this point.)
...
llvm-svn: 24748
2005-12-16 22:45:29 +00:00
Chris Lattner
5d4e61dd87
Don't lump the filename and working dir together
...
llvm-svn: 24697
2005-12-13 17:40:33 +00:00
Chris Lattner
9e8b633ec1
Accept and ignore prefetches for now
...
llvm-svn: 24678
2005-12-12 22:51:16 +00:00
Chris Lattner
f1a54c0d14
Minor tweak to get isel opt
...
llvm-svn: 24663
2005-12-11 09:05:13 +00:00
Chris Lattner
be73d6eece
improve code insertion in two ways:
...
1. Only forward subst offsets into loads and stores, not into arbitrary
things, where it will likely become a load.
2. If the source is a cast from pointer, forward subst the cast as well,
allowing us to fold the cast away (improving cases when the cast is
from an alloca or global).
This hasn't been fully tested, but does appear to further reduce register
pressure and improve code. Lets let the testers grind on it a bit. :)
llvm-svn: 24640
2005-12-08 08:00:12 +00:00
Nate Begeman
ae89d862f5
Fix a crash where ConstantVec nodes were being generated with the wrong
...
type when the target did not support them. Also teach Legalize how to
expand ConstantVecs.
This allows us to generate
_test:
lwz r2, 12(r3)
lwz r4, 8(r3)
lwz r5, 4(r3)
lwz r6, 0(r3)
addi r2, r2, 4
addi r4, r4, 3
addi r5, r5, 2
addi r6, r6, 1
stw r2, 12(r3)
stw r4, 8(r3)
stw r5, 4(r3)
stw r6, 0(r3)
blr
For:
void %test(%v4i *%P) {
%T = load %v4i* %P
%S = add %v4i %T, <int 1, int 2, int 3, int 4>
store %v4i %S, %v4i * %P
ret void
}
On PowerPC.
llvm-svn: 24633
2005-12-07 19:48:11 +00:00
Nate Begeman
41b1cdc771
Teach the SelectionDAG ISel how to turn ConstantPacked values into
...
constant nodes with vector types. Also teach the asm printer how to print
ConstantPacked constant pool entries. This allows us to generate altivec
code such as the following, which adds a vector constantto a packed float.
LCPI1_0: <4 x float> < float 0.0e+0, float 0.0e+0, float 0.0e+0, float 1.0e+0 >
.space 4
.space 4
.space 4
.long 1065353216 ; float 1
.text
.align 4
.globl _foo
_foo:
lis r2, ha16(LCPI1_0)
la r2, lo16(LCPI1_0)(r2)
li r4, 0
lvx v0, r4, r2
lvx v1, r4, r3
vaddfp v0, v1, v0
stvx v0, r4, r3
blr
For the llvm code:
void %foo(<4 x float> * %a) {
entry:
%tmp1 = load <4 x float> * %a;
%tmp2 = add <4 x float> %tmp1, < float 0.0, float 0.0, float 0.0, float 1.0 >
store <4 x float> %tmp2, <4 x float> *%a
ret void
}
llvm-svn: 24616
2005-12-06 06:18:55 +00:00
Chris Lattner
3539778883
Fix the #1 code quality problem that I have seen on X86 (and it also affects
...
PPC and other targets). In a particular, consider code like this:
struct Vector3 { double x, y, z; };
struct Matrix3 { Vector3 a, b, c; };
double dot(Vector3 &a, Vector3 &b) {
return a.x * b.x + a.y * b.y + a.z * b.z;
}
Vector3 mul(Vector3 &a, Matrix3 &b) {
Vector3 r;
r.x = dot( a, b.a );
r.y = dot( a, b.b );
r.z = dot( a, b.c );
return r;
}
void transform(Matrix3 &m, Vector3 *x, int n) {
for (int i = 0; i < n; i++)
x[i] = mul( x[i], m );
}
we compile transform to a loop with all of the GEP instructions for indexing
into 'm' pulled out of the loop (9 of them). Because isel occurs a bb at a time
we are unable to fold the constant index into the loads in the loop, leading to
PPC code that looks like this:
LBB3_1: ; no_exit.preheader
li r2, 0
addi r6, r3, 64 ;; 9 values live across the loop body!
addi r7, r3, 56
addi r8, r3, 48
addi r9, r3, 40
addi r10, r3, 32
addi r11, r3, 24
addi r12, r3, 16
addi r30, r3, 8
LBB3_2: ; no_exit
lfd f0, 0(r30)
lfd f1, 8(r4)
fmul f0, f1, f0
lfd f2, 0(r3) ;; no constant indices folded into the loads!
lfd f3, 0(r4)
lfd f4, 0(r10)
lfd f5, 0(r6)
lfd f6, 0(r7)
lfd f7, 0(r8)
lfd f8, 0(r9)
lfd f9, 0(r11)
lfd f10, 0(r12)
lfd f11, 16(r4)
fmadd f0, f3, f2, f0
fmul f2, f1, f4
fmadd f0, f11, f10, f0
fmadd f2, f3, f9, f2
fmul f1, f1, f6
stfd f0, 0(r4)
fmadd f0, f11, f8, f2
fmadd f1, f3, f7, f1
stfd f0, 8(r4)
fmadd f0, f11, f5, f1
addi r29, r4, 24
stfd f0, 16(r4)
addi r2, r2, 1
cmpw cr0, r2, r5
or r4, r29, r29
bne cr0, LBB3_2 ; no_exit
uh, yuck. With this patch, we now sink the constant offsets into the loop, producing
this code:
LBB3_1: ; no_exit.preheader
li r2, 0
LBB3_2: ; no_exit
lfd f0, 8(r3)
lfd f1, 8(r4)
fmul f0, f1, f0
lfd f2, 0(r3)
lfd f3, 0(r4)
lfd f4, 32(r3) ;; much nicer.
lfd f5, 64(r3)
lfd f6, 56(r3)
lfd f7, 48(r3)
lfd f8, 40(r3)
lfd f9, 24(r3)
lfd f10, 16(r3)
lfd f11, 16(r4)
fmadd f0, f3, f2, f0
fmul f2, f1, f4
fmadd f0, f11, f10, f0
fmadd f2, f3, f9, f2
fmul f1, f1, f6
stfd f0, 0(r4)
fmadd f0, f11, f8, f2
fmadd f1, f3, f7, f1
stfd f0, 8(r4)
fmadd f0, f11, f5, f1
addi r6, r4, 24
stfd f0, 16(r4)
addi r2, r2, 1
cmpw cr0, r2, r5
or r4, r6, r6
bne cr0, LBB3_2 ; no_exit
This is much nicer as it reduces register pressure in the loop a lot. On X86,
this takes the function from having 9 spilled registers to 2. This should help
some spec programs on X86 (gzip?)
This is currently only enabled with -enable-gep-isel-opt to allow perf testing
tonight.
llvm-svn: 24606
2005-12-05 07:10:48 +00:00
Chris Lattner
8782b782cd
dbg.stoppoint returns a value, don't forget to init it
...
llvm-svn: 24583
2005-12-03 18:50:48 +00:00
Nate Begeman
1064d6ec43
First chunk of actually generating vector code for packed types. These
...
changes allow us to generate the following code:
_foo:
li r2, 0
lvx v0, r2, r3
vaddfp v0, v0, v0
stvx v0, r2, r3
blr
for this llvm:
void %foo(<4 x float>* %a) {
entry:
%tmp1 = load <4 x float>* %a
%tmp2 = add <4 x float> %tmp1, %tmp1
store <4 x float> %tmp2, <4 x float>* %a
ret void
}
llvm-svn: 24534
2005-11-30 08:22:07 +00:00
Reid Spencer
3fd1b4c9bf
Fix a problem with llvm-ranlib that (on some platforms) caused the archive
...
file to become corrupted due to interactions between mmap'd memory segments
and file descriptors closing. The problem is completely avoiding by using
a third temporary file.
Patch provided by Evan Jones
llvm-svn: 24527
2005-11-30 05:21:10 +00:00
Chris Lattner
435b402e1f
Add support for a new STRING and LOCATION node for line number support, patch
...
contributed by Daniel Berlin, with a few cleanups here and there by me.
llvm-svn: 24515
2005-11-29 06:21:05 +00:00
Nate Begeman
d37c13154a
Check in code to scalarize arbitrarily wide packed types for some simple
...
vector operations (load, add, sub, mul).
This allows us to codegen:
void %foo(<4 x float> * %a) {
entry:
%tmp1 = load <4 x float> * %a;
%tmp2 = add <4 x float> %tmp1, %tmp1
store <4 x float> %tmp2, <4 x float> *%a
ret void
}
on ppc as:
_foo:
lfs f0, 12(r3)
lfs f1, 8(r3)
lfs f2, 4(r3)
lfs f3, 0(r3)
fadds f0, f0, f0
fadds f1, f1, f1
fadds f2, f2, f2
fadds f3, f3, f3
stfs f0, 12(r3)
stfs f1, 8(r3)
stfs f2, 4(r3)
stfs f3, 0(r3)
blr
llvm-svn: 24484
2005-11-22 18:16:00 +00:00
Nate Begeman
07890bbec4
Rather than attempting to legalize 1 x float, make sure the SD ISel never
...
generates it. Make MVT::Vector expand-only, and remove the code in
Legalize that attempts to legalize it.
The plan for supporting N x Type is to continually epxand it in ExpandOp
until it gets down to 2 x Type, where it will be scalarized into a pair of
scalars.
llvm-svn: 24482
2005-11-22 01:29:36 +00:00
Chris Lattner
19baba67b5
Unbreak codegen of bools. This should fix the llc/jit/llc-beta failures
...
from last night.
llvm-svn: 24427
2005-11-19 18:40:42 +00:00
Nate Begeman
b2e089c31b
Teach LLVM how to scalarize packed types. Currently, this only works on
...
packed types with an element count of 1, although more generic support is
coming. This allows LLVM to turn the following code:
void %foo(<1 x float> * %a) {
entry:
%tmp1 = load <1 x float> * %a;
%tmp2 = add <1 x float> %tmp1, %tmp1
store <1 x float> %tmp2, <1 x float> *%a
ret void
}
Into:
_foo:
lfs f0, 0(r3)
fadds f0, f0, f0
stfs f0, 0(r3)
blr
llvm-svn: 24416
2005-11-19 00:36:38 +00:00
Nate Begeman
127321b14c
Split out the shift code from visitBinary.
...
llvm-svn: 24412
2005-11-18 07:42:56 +00:00