324 Commits

Author SHA1 Message Date
Alexey Bataev
3b1b8951b9 [OPENMP] Codegen for 'task_reduction' clause.
Added codegen for taskgroup directive with task_reduction clause.
```
<body>
```
The next code is emitted:
```
%struct.kmp_task_red_input_t red_init[n];
void *td;
call void @__kmpc_taskgroup(%ident_t id, i32 gtid)
...
red_init[i].shar = &<item>;
red_init[i].size = sizeof(<item>);
red_init[i].init = (void*)initializer_function;
red_init[i].fini = (void*)destructor_function;
red_init[i].comb = (void*)combiner_function;
red_init[i].flags = flags;
...
td = call i8* @__kmpc_task_reduction_init(i32 gtid, i32 n, i8*
(void*)red_init);
call void @__kmpc_end_taskgroup(%ident_t id, i32 gtid)

void initializer_function(i8* priv) {
  *(<type>*)priv = <red_init>;
  ret void;
}

void destructor_function(i8* priv) {
  (<type>*)priv->~();
  ret void;
}

void combiner_function(i8* inout, i8* in) {
  *(<type>*)inout = *(<type>*)inout <red_id> *(<type>*)in;
  ret void;
}
```

llvm-svn: 308979
2017-07-25 15:53:26 +00:00
Alexey Bataev
fa312f33f8 [OPENMP] Initial support for 'in_reduction' clause.
Parsing/sema analysis for 'in_reduction' clause for task-based
directives.

llvm-svn: 308768
2017-07-21 18:48:21 +00:00
Alexey Bataev
169d96a203 [OPENMP] Initial support for 'task_reduction' clause.
Parsing/sema analysis of the 'task_reduction' clause.

llvm-svn: 308352
2017-07-18 20:17:46 +00:00
Alexey Bataev
be5a8b42cd [OPENMP] Codegen for reduction clauses in 'taskloop' directives.
Adds codegen for taskloop-based directives.

llvm-svn: 308174
2017-07-17 13:30:36 +00:00
Eric Christopher
7aba9784c3 Change dyn_casts with unused variables to isa statements to avoid unused variables.
llvm-svn: 307988
2017-07-14 01:42:57 +00:00
Alexey Bataev
5c40bec5eb [OPENMP] Generalization of codegen for reduction clauses.
Reworked codegen for reduction clauses for future support of reductions
in task-based directives.

llvm-svn: 307910
2017-07-13 13:36:14 +00:00
Alexey Bataev
3344603f7b [OPENMP] Emit implicit taskgroup block around taskloop directives.
If taskloop directive has no associated nogroup clause, it must emitted
inside implicit taskgroup block. Runtime supports it, but we need to
generate implicit taskgroup block explicitly to support future
reductions codegen.

llvm-svn: 307822
2017-07-12 18:09:32 +00:00
Alexey Bataev
1fdfdf7155 [OPENMP][DEBUG] Generate second function with correct arg types.
Currently, if the some of the parameters are captured by value, this
argument is converted to uintptr_t type and thus we loosing the debug
info about real type of the argument (captured variable):
```
void @.outlined_function.(uintptr %par);

...
%a = alloca i32
%a.casted = alloca uintptr
%cast = bitcast uintptr* %a.casted to i32*
%a.val = load i32, i32 *%a
store i32 %a.val, i32 *%cast
%a.casted.val = load uintptr, uintptr* %a.casted
call void @.outlined_function.(uintptr %a.casted.val)
...
```

To resolve this problem, in debug mode a speciall external wrapper
function is generated, that calls the outlined function with the correct
parameters types:
```
void @.wrapper.(uintptr %par) {
  %a = alloca i32
  %cast = bitcast i32* %a to uintptr*
  store uintptr %par, uintptr *%cast
  %a.val = load i32, i32* %a
  call void @.outlined_function.(i32 %a)
  ret void
}
void @.outlined_function.(i32 %par);

...
%a = alloca i32
%a.casted = alloca uintptr
%cast = bitcast uintptr* %a.casted to i32*
%a.val = load i32, i32 *%a
store i32 %a.val, i32 *%cast
%a.casted.val = load uintptr, uintptr* %a.casted
call void @.wrapper.(uintptr %a.casted.val)
...
```

llvm-svn: 306697
2017-06-29 16:43:05 +00:00
Alexey Bataev
56223237b0 [DebugInfo] Add kind of ImplicitParamDecl for emission of FlagObjectPointer.
Summary:
If the first parameter of the function is the ImplicitParamDecl, codegen
automatically marks it as an implicit argument with `this` or `self`
pointer. Added internal kind of the ImplicitParamDecl to separate
'this', 'self', 'vtt' and other implicit parameters from other kind of
parameters.

Reviewers: rjmccall, aaron.ballman

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D33735

llvm-svn: 305075
2017-06-09 13:40:18 +00:00
Krzysztof Parzyszek
8f248234fa [CodeGen] Propagate LValueBaseInfo instead of AlignmentSource
The functions creating LValues propagated information about alignment
source. Extend the propagated data to also include information about
possible unrestricted aliasing. A new class LValueBaseInfo will
contain both AlignmentSource and MayAlias info.

This patch should not introduce any functional changes.

Differential Revision: https://reviews.llvm.org/D33284

llvm-svn: 303358
2017-05-18 17:07:11 +00:00
Hans Wennborg
ed129aebbb Fix -Wpedantic about extra semicolons in CGStmtOpenMP.cpp
llvm-svn: 301564
2017-04-27 17:02:25 +00:00
Carlo Bertolli
b0ff0a69c3 Recommit of
[OpenMP] Initial implementation of code generation for pragma 'distribute parallel for' on host

https://reviews.llvm.org/D29508

This patch makes the following additions:

It abstracts away loop bound generation code from procedures associated with pragma 'for' and loops in general, in such a way that the same procedures can be used for 'distribute parallel for' without the need for a full re-implementation.
It implements code generation for 'distribute parallel for' and adds regression tests. It includes tests for clauses.
It is important to notice that most of the clauses are implemented as part of existing procedures. For instance, firstprivate is already implemented for 'distribute' and 'for' as separate pragmas. As the implementation of 'distribute parallel for' is based on the same procedures, then we automatically obtain implementation for such clauses without the need to add new code. However, this requires regression tests that verify correctness of produced code.

llvm-svn: 301340
2017-04-25 17:52:12 +00:00
Carlo Bertolli
f09daae75d Revert r301223
llvm-svn: 301233
2017-04-24 19:50:35 +00:00
Carlo Bertolli
4287d65c10 [OpenMP] Initial implementation of code generation for pragma 'distribute parallel for' on host
https://reviews.llvm.org/D29508

This patch makes the following additions:

1. It abstracts away loop bound generation code from procedures associated with pragma 'for' and loops in general, in such a way that the same procedures can be used for 'distribute parallel for' without the need for a full re-implementation.
2. It implements code generation for 'distribute parallel for' and adds regression tests. It includes tests for clauses.

It is important to notice that most of the clauses are implemented as part of existing procedures. For instance, firstprivate is already implemented for 'distribute' and 'for' as separate pragmas. As the implementation of 'distribute parallel for' is based on the same procedures, then we automatically obtain implementation for such clauses without the need to add new code. However, this requires regression tests that verify correctness of produced code.

Looking forward to comments.

llvm-svn: 301223
2017-04-24 19:26:11 +00:00
Alexey Bataev
f7ce166220 [OPENMP] Fix for PR32333: Crash in call of outlined Function.
If the type of the captured variable is a pointer(s) to variably
modified type, this type was not processed correctly. Need to drill into
the type, find the innermost variably modified array type and convert it
to canonical parameter type.

llvm-svn: 299868
2017-04-10 19:16:45 +00:00
Adam Nemet
484aa45153 Encapsulate FPOptions and use it consistently
Sema holds the current FPOptions which is adjusted by 'pragma STDC
FP_CONTRACT'.  This then gets propagated into expression nodes as they are
built.

This encapsulates FPOptions so that this propagation happens opaquely rather
than directly with the fp_contractable on/off bit.  This allows controlled
transitioning of fp_contractable to a ternary value (off, on, fast).  It will
also allow adding more fast-math flags later.

This is toward moving fp-contraction=fast from an LLVM TargetOption to a
FastMathFlag in order to fix PR25721.

Differential Revision: https://reviews.llvm.org/D31166

llvm-svn: 298877
2017-03-27 19:17:25 +00:00
Arpith Chacko Jacob
fc711b1f47 [OpenMP] Teams reduction on the NVPTX device.
This patch implements codegen for the reduction clause on
any teams construct for elementary data types.  It builds
on parallel reductions on the GPU.  Subsequently,
the team master writes to a unique location in a global
memory scratchpad.  The last team to do so loads and
reduces this array to calculate the final result.

This patch emits two helper functions that are used by
the OpenMP runtime on the GPU to perform reductions across
teams.

Patch by Tian Jin in collaboration with Arpith Jacob

Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29879

llvm-svn: 295335
2017-02-16 16:48:49 +00:00
Arpith Chacko Jacob
101e8fb1f3 [OpenMP] Parallel reduction on the NVPTX device.
This patch implements codegen for the reduction clause on
any parallel construct for elementary data types.  An efficient
implementation requires hierarchical reduction within a
warp and a threadblock.  It is complicated by the fact that
variables declared in the stack of a CUDA thread cannot be
shared with other threads.

The patch creates a struct to hold reduction variables and
a number of helper functions.  The OpenMP runtime on the GPU
implements reduction algorithms that uses these helper
functions to perform reductions within a team.  Variables are
shared between CUDA threads using shuffle intrinsics.

An implementation of reductions on the NVPTX device is
substantially different to that of CPUs.  However, this patch
is written so that there are minimal changes to the rest of
OpenMP codegen.

The implemented design allows the compiler and runtime to be
decoupled, i.e., the runtime does not need to know of the
reduction operation(s), the type of the reduction variable(s),
or the number of reductions.  The design also allows reuse of
host codegen, with appropriate specialization for the NVPTX
device.

While the patch does introduce a number of abstractions, the
expected use case calls for inlining of the GPU OpenMP runtime.
After inlining and optimizations in LLVM, these abstractions
are unwound and performance of OpenMP reductions is comparable
to CUDA-canonical code.

Patch by Tian Jin in collaboration with Arpith Jacob

Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29758

llvm-svn: 295333
2017-02-16 16:20:16 +00:00
Arpith Chacko Jacob
bd6344c0be Revert r295319 while investigating buildbot failure.
llvm-svn: 295323
2017-02-16 14:25:35 +00:00
Arpith Chacko Jacob
8e170fc857 [OpenMP] Parallel reduction on the NVPTX device.
This patch implements codegen for the reduction clause on
any parallel construct for elementary data types.  An efficient
implementation requires hierarchical reduction within a
warp and a threadblock.  It is complicated by the fact that
variables declared in the stack of a CUDA thread cannot be
shared with other threads.

The patch creates a struct to hold reduction variables and
a number of helper functions.  The OpenMP runtime on the GPU
implements reduction algorithms that uses these helper
functions to perform reductions within a team.  Variables are
shared between CUDA threads using shuffle intrinsics.

An implementation of reductions on the NVPTX device is
substantially different to that of CPUs.  However, this patch
is written so that there are minimal changes to the rest of
OpenMP codegen.

The implemented design allows the compiler and runtime to be
decoupled, i.e., the runtime does not need to know of the
reduction operation(s), the type of the reduction variable(s),
or the number of reductions.  The design also allows reuse of
host codegen, with appropriate specialization for the NVPTX
device.

While the patch does introduce a number of abstractions, the
expected use case calls for inlining of the GPU OpenMP runtime.
After inlining and optimizations in LLVM, these abstractions
are unwound and performance of OpenMP reductions is comparable
to CUDA-canonical code.

Patch by Tian Jin in collaboration with Arpith Jacob

Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29758

llvm-svn: 295319
2017-02-16 14:03:36 +00:00
Arpith Chacko Jacob
99a1e0eba5 [OpenMP] Codegen support for 'target teams' on the host.
This patch adds support for codegen of 'target teams' on the host.
This combined directive has two captured statements, one for the
'teams' region, and the other for the 'parallel'.

This target teams region is offloaded using the __tgt_target_teams()
call. The patch sets the number of teams as an argument to
this call.

Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29084

llvm-svn: 293005
2017-01-25 02:18:43 +00:00
Arpith Chacko Jacob
86f9e46365 Reverting commit because an NVPTX patch sneaked in. Break up into two
patches.

llvm-svn: 293003
2017-01-25 01:45:59 +00:00
Arpith Chacko Jacob
4dbf368e14 [OpenMP] Codegen support for 'target teams' on the host.
This patch adds support for codegen of 'target teams' on the host.
This combined directive has two captured statements, one for the
'teams' region, and the other for the 'parallel'.

This target teams region is offloaded using the __tgt_target_teams()
call. The patch sets the number of teams as an argument to
this call.

Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29084

llvm-svn: 293001
2017-01-25 01:38:33 +00:00
Arpith Chacko Jacob
fe4890a68b [OpenMP] Support for the if-clause on the combined directive 'target parallel'.
The if-clause on the combined directive potentially applies to both the
'target' and the 'parallel' regions.  Codegen'ing the if-clause on the
combined directive requires additional support because the expression in
the clause must be captured by the 'target' capture statement but not
the 'parallel' capture statement.  Note that this situation arises for
other clauses such as num_threads.

The OMPIfClause class inherits OMPClauseWithPreInit to support capturing
of expressions in the clause.  A member CaptureRegion is added to
OMPClauseWithPreInit to indicate which captured statement (in this case
'target' but not 'parallel') captures these expressions.

To ensure correct codegen of captured expressions in the presence of
combined 'target' directives, OMPParallelScope was added to 'parallel'
codegen.

Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D28781

llvm-svn: 292437
2017-01-18 20:40:48 +00:00
Arpith Chacko Jacob
19b911cb75 [OpenMP] Codegen support for 'target parallel' on the host.
This patch adds support for codegen of 'target parallel' on the host.
It is also the first combined directive that requires two or more
captured statements.  Support for this functionality is included in
the patch.

A combined directive such as 'target parallel' has two captured
statements, one for the 'target' and the other for the 'parallel'
region.  Two captured statements are required because each has
different implicit parameters (see SemaOpenMP.cpp).  For example,
the 'parallel' has 'global_tid' and 'bound_tid' while the 'target'
does not.  The patch adds support for handling multiple captured
statements based on the combined directive.

When codegen'ing the 'target parallel' directive, the 'target'
outlined function is created using the outer captured statement
and the 'parallel' outlined function is created using the inner
captured statement.

Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D28753

llvm-svn: 292419
2017-01-18 18:18:53 +00:00
Arpith Chacko Jacob
42793e000a Revert r292374 to debug Windows buildbot failure.
llvm-svn: 292400
2017-01-18 15:36:05 +00:00
Arpith Chacko Jacob
68019578a3 [OpenMP] Codegen support for 'target parallel' on the host.
This patch adds support for codegen of 'target parallel' on the host.
It is also the first combined directive that requires two or more
captured statements.  Support for this functionality is included in
the patch.

A combined directive such as 'target parallel' has two captured
statements, one for the 'target' and the other for the 'parallel'
region.  Two captured statements are required because each has
different implicit parameters (see SemaOpenMP.cpp).  For example,
the 'parallel' has 'global_tid' and 'bound_tid' while the 'target'
does not.  The patch adds support for handling multiple captured
statements based on the combined directive.

When codegen'ing the 'target parallel' directive, the 'target'
outlined function is created using the outer captured statement
and the 'parallel' outlined function is created using the inner
captured statement.

Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D28753

llvm-svn: 292374
2017-01-18 15:14:52 +00:00
Arpith Chacko Jacob
43a8b7bc8c [OpenMP] Refactor code that calls codegen for target regions on the device.
This patch refactors code that calls codegen for target regions.  Currently
the codebase only supports the 'target' directive.  The patch pulls out
common target processing code into a static function that can be called
by codegen for any target directive.

Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D28752

llvm-svn: 292134
2017-01-16 15:26:02 +00:00
Malcolm Parsons
c6e4583dbb Remove unused lambda captures. NFC
llvm-svn: 291939
2017-01-13 18:55:32 +00:00
Kelvin Li
da68118729 [OpenMP] Sema and parsing for 'target teams distribute simd’ pragma
This patch is to implement sema and parsing for 'target teams distribute simd’ pragma.
    
Differential Revision: https://reviews.llvm.org/D28252

llvm-svn: 291579
2017-01-10 18:08:18 +00:00
Carlo Bertolli
962bb807ec [OPENMP] Private, firstprivate, and lastprivate clauses for distribute, host code generation
https://reviews.llvm.org/D17840

This patch enables private, firstprivate, and lastprivate clauses for the OpenMP distribute directive.
Regression tests differ from the similar case of the same clauses on the for directive, by removing a reference to two global variables g and g1. This is necessary because: 1. a distribute pragma is only allowed inside a target region; 2. referring a global variable (e.g. g and g1) in a target region requires the program to enclose the variable in a "declare target" region; 3. declare target pragmas, which are used to define a declare target region, are currently unavailable in clang (patch being prepared).
For this reason, I moved the global declarations into local variables.

llvm-svn: 290898
2017-01-03 18:24:42 +00:00
Kelvin Li
1851df563d [OpenMP] Sema and parsing for 'target teams distribute parallel for simd’ pragma
This patch is to implement sema and parsing for 'target teams distribute parallel for simd’ pragma.

Differential Revision: https://reviews.llvm.org/D28202

llvm-svn: 290862
2017-01-03 05:23:48 +00:00
Kelvin Li
80e8f56284 [OpenMP] Sema and parsing for 'target teams distribute parallel for’ pragma
This patch is to implement sema and parsing for 'target teams distribute parallel for’ pragma.

Differential Revision: https://reviews.llvm.org/D28160

llvm-svn: 290725
2016-12-29 22:16:30 +00:00
Kelvin Li
26fd21ab80 Fix format. NFC
llvm-svn: 290673
2016-12-28 17:57:07 +00:00
Kelvin Li
83c451e998 [OpenMP] Sema and parsing for 'target teams distribute' pragma
This patch is to implement sema and parsing for 'target teams distribute' pragma.

Differential Revision: https://reviews.llvm.org/D28015

llvm-svn: 290508
2016-12-25 04:52:54 +00:00
Kelvin Li
bf594a5600 [OpenMP] Sema and parsing for 'target teams' pragma
This patch is to implement sema and parsing for 'target teams' pragma.

Differential Revision: https://reviews.llvm.org/D27818

llvm-svn: 290038
2016-12-17 05:48:59 +00:00
Kelvin Li
51336dd0b4 Fix typo in comment. NFC.
llvm-svn: 289836
2016-12-15 17:55:32 +00:00
Kelvin Li
7ade93f5e2 [OpenMP] Sema and parsing for 'teams distribute parallel for' pragma
This patch is to implement sema and parsing for 'teams distribute parallel for' pragma.
    
Differential Revision: https://reviews.llvm.org/D27345

llvm-svn: 289179
2016-12-09 03:24:30 +00:00
Kelvin Li
579e41ced2 [OpenMP] Sema and parsing for 'teams distribute parallel for simd' pragma
This patch is to implement sema and parsing for 'teams distribute parallel for simd' pragma.

Differential Revision: https://reviews.llvm.org/D27084

llvm-svn: 288294
2016-11-30 23:51:03 +00:00
Alexey Bataev
957d856e7e [OPENMP] Fixed codegen for 'omp cancel' construct.
If 'omp cancel' construct is used in a worksharing construct it may
cause hanging of the software in case if reduction clause is used. Patch fixes this problem by avoiding extra reduction processing for branches that were canceled.

llvm-svn: 287227
2016-11-17 15:12:05 +00:00
Vitaly Buka
2d15858e40 Revert "[OPENMP] Fixed codegen for 'omp cancel' construct."
Summary:
r286944 introduced bugs detected by ASAN as use-after-return.
r287025 have not fixed them completely.

This reverts commit r286944 and r287025.

Reviewers: ABataev

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D26720

llvm-svn: 287069
2016-11-16 01:01:22 +00:00
Alexey Bataev
ba002163c9 [OPENMP] Fix stack use after delete, NFC.
Fixed possible use of stack variable after deletion.

llvm-svn: 287025
2016-11-15 20:57:18 +00:00
Alexey Bataev
473a3e7fed [OPENMP] Fixed codegen for 'omp cancel' construct.
If 'omp cancel' construct is used in a worksharing construct it may cause
hanging of the software in case if reduction clause is used. Patch fixes
this problem by avoiding extra reduction processing for branches that
were canceled.

llvm-svn: 286944
2016-11-15 09:11:50 +00:00
Amara Emerson
652795db16 Add the loop end location to the loop metadata. This additional information
can be used to improve the locations when generating remarks for loops.

Depends on the companion LLVM change r286227.

Patch by Florian Hahn.

Differential Revision: https://reviews.llvm.org/D25764

llvm-svn: 286456
2016-11-10 14:44:30 +00:00
Alexey Bataev
ac5eabb0b9 [OPENMP] Fixed capturing of VLA variables.
After some changes in codegen capturing of VLA variables in OpenMP regions was broken, causing compiler crash. Patch fixes this issue.

llvm-svn: 286103
2016-11-07 11:16:04 +00:00
Diana Picus
1e2b7e6672 Revert "[OPENMP] Fixed capturing of VLA variables."
This reverts commit r286098 because the modified test breaks on many of the
buildbots.

llvm-svn: 286102
2016-11-07 10:01:43 +00:00
Alexey Bataev
420537fad8 [OPENMP] Fixed capturing of VLA variables.
After some changes in codegen capturing of VLA variables in OpenMP
regions was broken, causing compiler crash. Patch fixes this issue.

llvm-svn: 286098
2016-11-07 08:07:25 +00:00
Kelvin Li
4e325f77a9 Re-apply patch r279045.
llvm-svn: 285066
2016-10-25 12:50:55 +00:00
Akira Hatanaka
642f799b0d [CodeGen][ObjC] Do not call objc_storeStrong when initializing a
constexpr variable.

When compiling a constexpr NSString initialized with an objective-c
string literal, CodeGen emits objc_storeStrong on an uninitialized
alloca, which causes a crash.

This patch folds the code in EmitScalarInit into EmitStoreThroughLValue
and fixes the crash by calling objc_retain on the string instead of
using objc_storeStrong.

rdar://problem/28562009

Differential Revision: https://reviews.llvm.org/D25547

llvm-svn: 284516
2016-10-18 19:05:41 +00:00
Alexey Bataev
2f5ed34279 Fix for PR30639: CGDebugInfo Null dereference with OpenMP array
access, by Erich Keane

OpenMP creates a variable array type with a a null size-expr. The Debug
generation failed to due to this. This patch corrects the openmp
implementation, updates the tests, and adds a new one for this
condition.

Differential Revision: https://reviews.llvm.org/D25373

llvm-svn: 284110
2016-10-13 09:52:46 +00:00