171 Commits

Author SHA1 Message Date
Alexey Bataev
fd006c44ee [OPENMP] Fix emission of the __kmpc_global_thread_num.
Fixed emission of the __kmpc_global_thread_num() so that it is not
messed up with alloca instructions anymore. Plus, fixes emission of the
__kmpc_global_thread_num() functions in the target outlined regions so
that they are not called before runtime is initialized.

llvm-svn: 343856
2018-10-05 15:08:53 +00:00
Gheorghe-Teodor Bercea
8233af90e1 [OpenMP] Make default parallel for schedule in NVPTX target regions in SPMD mode achieve coalescing
Summary: Set default schedule for parallel for loops to schedule(static, 1) when using SPMD mode on the NVPTX device offloading toolchain to ensure coalescing.

Reviewers: ABataev, Hahnfeld, caomhin

Reviewed By: ABataev

Subscribers: jholewinski, guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D52629

llvm-svn: 343260
2018-09-27 20:29:00 +00:00
Gheorghe-Teodor Bercea
02650d4c2c [OpenMP] Make default distribute schedule for NVPTX target regions in SPMD mode achieve coalescing
Summary: For the OpenMP NVPTX toolchain choose a default distribute schedule that ensures coalescing on the GPU when in SPMD mode. This significantly increases the performance of offloaded target code and reduces the number of registers used on the GPU side.

Reviewers: ABataev, caomhin, Hahnfeld

Reviewed By: ABataev, Hahnfeld

Subscribers: Hahnfeld, jholewinski, guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D52434

llvm-svn: 343253
2018-09-27 19:22:56 +00:00
Alexey Bataev
e6aa4694de [OPENMP] Fix PR38903: Crash on instantiation of the non-dependent
declare reduction.

If the declare reduction construct with the non-dependent type is
defined in the template construct, the compiler might crash on the
template instantition. Reworked the whole instantiation scheme for the
declare reduction constructs to fix this problem correctly.

llvm-svn: 342151
2018-09-13 16:54:05 +00:00
Alexey Bataev
f138fda5ed [OPENMP] Fix emission of the loop doacross constructs.
The number of loops associated with the OpenMP loop constructs should
not be considered as the number loops to collapse.

llvm-svn: 339603
2018-08-13 19:04:24 +00:00
Alexey Bataev
23647171ea Revert "[OPENMP] Fix emission of the loop doacross constructs."
This reverts commit r339568 because of the problems with the buildbots.

llvm-svn: 339574
2018-08-13 14:42:18 +00:00
Alexey Bataev
0ce6360e0e [OPENMP] Fix emission of the loop doacross constructs.
The number of loops associated with the OpenMP loop constructs should
not be considered as the number loops to collapse.

llvm-svn: 339568
2018-08-13 14:05:43 +00:00
Alexey Bataev
bf8fe71b91 [OPENMP] Mark variables captured in declare target region as implicitly
declare target.

According to OpenMP 5.0, variables captured in lambdas in declare target
regions must be considered as implicitly declare target.

llvm-svn: 339152
2018-08-07 16:14:36 +00:00
Alexey Bataev
c52f01d1d7 [OPENMP] Fix checks for declare target link entries.
If the declare target link entries are created but not used, the
compiler will produce an error message. Patch improves handling of such
situations + improves checks for possibly lost declare target variables.

llvm-svn: 337207
2018-07-16 20:05:25 +00:00
Adrian Prantl
9fc8faf9e6 Remove \brief commands from doxygen comments.
This is similar to the LLVM change https://reviews.llvm.org/D46290.

We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

for i in $(git grep -l '\@brief'); do perl -pi -e 's/\@brief //g' $i & done
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46320

llvm-svn: 331834
2018-05-09 01:00:01 +00:00
Alexey Bataev
6d94410914 [OPENMP] Support C++ member functions in the device constructs.
Added correct emission of the C++ member functions for the device
function when they are used in the device constructs.

llvm-svn: 331365
2018-05-02 15:45:28 +00:00
Alexey Bataev
18fa2323b6 [OPENMP] Emit names of the globals depending on target.
Some symbols are not allowed to be used as names on some targets. Patch
ries to unify the emission of the names of LLVM globals so they could be
used on different targets.

llvm-svn: 331358
2018-05-02 14:20:50 +00:00
Alexey Bataev
a4fa0b880a [OPENMP] General code improvements.
llvm-svn: 330140
2018-04-16 17:59:34 +00:00
Alexander Kornienko
2a8c18d991 Fix typos in clang
Found via codespell -q 3 -I ../clang-whitelist.txt
Where whitelist consists of:

  archtype
  cas
  classs
  checkk
  compres
  definit
  frome
  iff
  inteval
  ith
  lod
  methode
  nd
  optin
  ot
  pres
  statics
  te
  thru

Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few
files that have dubious fixes reverted.)

Differential revision: https://reviews.llvm.org/D44188

llvm-svn: 329399
2018-04-06 15:14:32 +00:00
Alexey Bataev
03f270c900 [OPENMP] Added emission of offloading data sections for declare target
variables.

Added emission of the offloading data sections for the variables within
declare target regions + fixes emission of the declare target variables
marked as declare target not within the declare target region.

llvm-svn: 328888
2018-03-30 18:31:07 +00:00
Alexey Bataev
34f8a7043b [OPENMP] Codegen for ctor|dtor of declare target variables.
When the declare target variables are emitted for the device,
constructors|destructors for these variables must emitted and registered
by the runtime in the offloading sections.

llvm-svn: 328705
2018-03-28 14:28:54 +00:00
Alexey Bataev
92327c50d3 [OPENMP] Codegen for declare target with link clause.
If the link clause is used on the declare target directive, the object
should be linked on target or target data directives, not during the
codegen. Patch adds support for this clause.

llvm-svn: 328544
2018-03-26 16:40:55 +00:00
Alexey Bataev
b7f3cba84c [OPENMP, NVPTX] Emit correct thread id.
We emitted fake thread id for the outined function in NVPTX codegen.
Patch adds emission of the real thread id.

llvm-svn: 327867
2018-03-19 17:04:07 +00:00
Alexey Bataev
4f4bf7c348 [OPENMP] Codegen for omp declare target construct.
Added initial codegen for device side of declarations inside `omp
declare target` construct + codegen for implicit `declare target`
functions, which are used in the target regions.

llvm-svn: 327636
2018-03-15 15:47:20 +00:00
Gheorghe-Teodor Bercea
d3dcf2f05d [OpenMP] Add OpenMP data sharing infrastructure using global memory
Summary:
This patch handles the Clang code generation phase for the OpenMP data sharing infrastructure.

TODO: add a more detailed description.

Reviewers: ABataev, carlo.bertolli, caomhin, hfinkel, Hahnfeld

Reviewed By: ABataev

Subscribers: jholewinski, guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D43660

llvm-svn: 327513
2018-03-14 14:17:45 +00:00
Alexey Bataev
1c44e15f6d [OPENMP] Fix generation of the unique names for task reduction
variables.

If the task has reduction construct and this construct for some variable
requires unique threadprivate storage, we may generate different names
for variables used in taskgroup task_reduction clause and in task
  in_reduction clause. Patch fixes this problem.

llvm-svn: 326827
2018-03-06 18:59:43 +00:00
Alexey Bataev
7ef47a67a5 [OPENMP] Require valid SourceLocation in function call, NFC.
Removed default empty SourceLocation argument from `emitCall` function
and require valid location.

llvm-svn: 325812
2018-02-22 18:33:31 +00:00
Alexey Bataev
8451efad89 [OPENMP] Add codegen for depend clauses on target directive.
Added basic support for codegen of `depend` clauses on `target`
directive.

llvm-svn: 322501
2018-01-15 19:06:12 +00:00
Alexey Bataev
7cae94e74c [OPENMP] Add debug info for generated functions.
Most of the generated functions for the OpenMP were generated with
disabled debug info. Patch fixes this for better user experience.

llvm-svn: 321816
2018-01-04 19:45:16 +00:00
Alexey Bataev
a8a9153a37 [OPENMP] Support for -fopenmp-simd option with compilation of simd loops
only.

Added support for -fopenmp-simd option that allows compilation of
simd-based constructs without emission of OpenMP runtime calls.

llvm-svn: 321560
2017-12-29 18:07:07 +00:00
Alexey Bataev
e213f3e61a [OPENMP] Fix PR34916: Crash on mixing taskloop|tasks directives.
If both taskloop and task directives are used at the same time in one
program, we may ran into the situation when the particular type for task
directive is reused for taskloop directives. Patch fixes this problem.

llvm-svn: 315464
2017-10-11 15:29:40 +00:00
Alexey Bataev
f43f714213 [OPENMP] Fix for PR33922: New ident_t flags for
__kmpc_for_static_fini().

Added special flags for calls of __kmpc_for_static_fini(), like previous
ly for __kmpc_for_static_init(). Added flag OMP_IDENT_WORK_DISTRIBUTE
for distribute cnstruct, OMP_IDENT_WORK_SECTIONS for sections-based
  constructs and OMP_IDENT_WORK_LOOP for loop-based constructs in
  location flags.

llvm-svn: 312642
2017-09-06 16:17:35 +00:00
Alexey Bataev
0f87dbee4e [OPENMP] Fix for PR33922: New ident_t flags for
__kmpc_for_static_init().

OpenMP 5.0 will include OpenMP Tools interface that requires distinguishing different worksharing constructs.

Since the same entry point (__kmp_for_static_init(ident_t *loc,
kmp_int32 global_tid,........)) is called in case static
loop/sections/distribute it is suggested using 'flags' field of the
ident_t structure to pass the type of the construct.

llvm-svn: 310865
2017-08-14 17:56:13 +00:00
Alexey Bataev
3c595a6b2c [OPENMP] Generalization of calls of the outlined functions.
General improvement of the outlined functions calls.

llvm-svn: 310840
2017-08-14 15:01:03 +00:00
Alexey Bataev
3b8d5586ec [OPENMP][DEBUG] Set proper address space info if required by target.
Arguments, passed to the outlined function, must have correct address
space info for proper Debug info support. Patch sets global address
space for arguments that are mapped and passed by reference.

Also, cuda-gdb does not handle reference types correctly, so reference
arguments are represented as pointers.

llvm-svn: 310387
2017-08-08 18:04:06 +00:00
Alexey Bataev
4aa19052f3 Revert "[OPENMP][DEBUG] Set proper address space info if required by target."
This reverts commit r310377.

llvm-svn: 310379
2017-08-08 16:45:36 +00:00
Alexey Bataev
5a497136be [OPENMP][DEBUG] Set proper address space info if required by target.
Arguments, passed to the outlined function, must have correct address
space info for proper Debug info support. Patch sets global address
space for arguments that are mapped and passed by reference.

Also, cuda-gdb does not handle reference types correctly, so reference
arguments are represented as pointers.

llvm-svn: 310377
2017-08-08 16:29:11 +00:00
Alexey Bataev
6a824b9a45 Revert "[OPENMP][DEBUG] Set proper address space info if required by target."
This reverts commit r310360.

llvm-svn: 310364
2017-08-08 14:44:43 +00:00
Alexey Bataev
59b81e51d3 [OPENMP][DEBUG] Set proper address space info if required by target.
Arguments, passed to the outlined function, must have correct address
space info for proper Debug info support. Patch sets global address
space for arguments that are mapped and passed by reference.

Also, cuda-gdb does not handle reference types correctly, so reference
arguments are represented as pointers.

llvm-svn: 310360
2017-08-08 14:25:14 +00:00
Alexey Bataev
d90ec748a8 Revert "[OPENMP][DEBUG] Set proper address space info if required by target."
This reverts commit r310104.

llvm-svn: 310135
2017-08-04 21:27:11 +00:00
Alexey Bataev
be83fad57e [OPENMP][DEBUG] Set proper address space info if required by target.
Arguments, passed to the outlined function, must have correct address
space info for proper Debug info support. Patch sets global address
space for arguments that are mapped and passed by reference.

Also, cuda-gdb does not handle reference types correctly, so reference
arguments are represented as pointers.

llvm-svn: 310104
2017-08-04 19:46:10 +00:00
Alexey Bataev
2c7eee5b84 [OPENMP] Unify generation of outlined function calls.
llvm-svn: 310098
2017-08-04 19:10:54 +00:00
Alexey Bataev
be5a8b42cd [OPENMP] Codegen for reduction clauses in 'taskloop' directives.
Adds codegen for taskloop-based directives.

llvm-svn: 308174
2017-07-17 13:30:36 +00:00
Simon Pilgrim
6c0eeffe71 Fix spelling mistakes in comments. NFCI.
llvm-svn: 307932
2017-07-13 17:34:44 +00:00
Simon Pilgrim
f87eb880dd Fix -Wdocumentation warning. NFCI
llvm-svn: 307931
2017-07-13 17:29:48 +00:00
Alexey Bataev
5c40bec5eb [OPENMP] Generalization of codegen for reduction clauses.
Reworked codegen for reduction clauses for future support of reductions
in task-based directives.

llvm-svn: 307910
2017-07-13 13:36:14 +00:00
Carlo Bertolli
b0ff0a69c3 Recommit of
[OpenMP] Initial implementation of code generation for pragma 'distribute parallel for' on host

https://reviews.llvm.org/D29508

This patch makes the following additions:

It abstracts away loop bound generation code from procedures associated with pragma 'for' and loops in general, in such a way that the same procedures can be used for 'distribute parallel for' without the need for a full re-implementation.
It implements code generation for 'distribute parallel for' and adds regression tests. It includes tests for clauses.
It is important to notice that most of the clauses are implemented as part of existing procedures. For instance, firstprivate is already implemented for 'distribute' and 'for' as separate pragmas. As the implementation of 'distribute parallel for' is based on the same procedures, then we automatically obtain implementation for such clauses without the need to add new code. However, this requires regression tests that verify correctness of produced code.

llvm-svn: 301340
2017-04-25 17:52:12 +00:00
Carlo Bertolli
f09daae75d Revert r301223
llvm-svn: 301233
2017-04-24 19:50:35 +00:00
Carlo Bertolli
4287d65c10 [OpenMP] Initial implementation of code generation for pragma 'distribute parallel for' on host
https://reviews.llvm.org/D29508

This patch makes the following additions:

1. It abstracts away loop bound generation code from procedures associated with pragma 'for' and loops in general, in such a way that the same procedures can be used for 'distribute parallel for' without the need for a full re-implementation.
2. It implements code generation for 'distribute parallel for' and adds regression tests. It includes tests for clauses.

It is important to notice that most of the clauses are implemented as part of existing procedures. For instance, firstprivate is already implemented for 'distribute' and 'for' as separate pragmas. As the implementation of 'distribute parallel for' is based on the same procedures, then we automatically obtain implementation for such clauses without the need to add new code. However, this requires regression tests that verify correctness of produced code.

Looking forward to comments.

llvm-svn: 301223
2017-04-24 19:26:11 +00:00
Simon Pilgrim
2c51880a82 Spelling mistakes in comments. NFCI. (PR27635)
llvm-svn: 299083
2017-03-30 14:13:19 +00:00
Arpith Chacko Jacob
101e8fb1f3 [OpenMP] Parallel reduction on the NVPTX device.
This patch implements codegen for the reduction clause on
any parallel construct for elementary data types.  An efficient
implementation requires hierarchical reduction within a
warp and a threadblock.  It is complicated by the fact that
variables declared in the stack of a CUDA thread cannot be
shared with other threads.

The patch creates a struct to hold reduction variables and
a number of helper functions.  The OpenMP runtime on the GPU
implements reduction algorithms that uses these helper
functions to perform reductions within a team.  Variables are
shared between CUDA threads using shuffle intrinsics.

An implementation of reductions on the NVPTX device is
substantially different to that of CPUs.  However, this patch
is written so that there are minimal changes to the rest of
OpenMP codegen.

The implemented design allows the compiler and runtime to be
decoupled, i.e., the runtime does not need to know of the
reduction operation(s), the type of the reduction variable(s),
or the number of reductions.  The design also allows reuse of
host codegen, with appropriate specialization for the NVPTX
device.

While the patch does introduce a number of abstractions, the
expected use case calls for inlining of the GPU OpenMP runtime.
After inlining and optimizations in LLVM, these abstractions
are unwound and performance of OpenMP reductions is comparable
to CUDA-canonical code.

Patch by Tian Jin in collaboration with Arpith Jacob

Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29758

llvm-svn: 295333
2017-02-16 16:20:16 +00:00
Arpith Chacko Jacob
bd6344c0be Revert r295319 while investigating buildbot failure.
llvm-svn: 295323
2017-02-16 14:25:35 +00:00
Arpith Chacko Jacob
8e170fc857 [OpenMP] Parallel reduction on the NVPTX device.
This patch implements codegen for the reduction clause on
any parallel construct for elementary data types.  An efficient
implementation requires hierarchical reduction within a
warp and a threadblock.  It is complicated by the fact that
variables declared in the stack of a CUDA thread cannot be
shared with other threads.

The patch creates a struct to hold reduction variables and
a number of helper functions.  The OpenMP runtime on the GPU
implements reduction algorithms that uses these helper
functions to perform reductions within a team.  Variables are
shared between CUDA threads using shuffle intrinsics.

An implementation of reductions on the NVPTX device is
substantially different to that of CPUs.  However, this patch
is written so that there are minimal changes to the rest of
OpenMP codegen.

The implemented design allows the compiler and runtime to be
decoupled, i.e., the runtime does not need to know of the
reduction operation(s), the type of the reduction variable(s),
or the number of reductions.  The design also allows reuse of
host codegen, with appropriate specialization for the NVPTX
device.

While the patch does introduce a number of abstractions, the
expected use case calls for inlining of the GPU OpenMP runtime.
After inlining and optimizations in LLVM, these abstractions
are unwound and performance of OpenMP reductions is comparable
to CUDA-canonical code.

Patch by Tian Jin in collaboration with Arpith Jacob

Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29758

llvm-svn: 295319
2017-02-16 14:03:36 +00:00
Arpith Chacko Jacob
19b911cb75 [OpenMP] Codegen support for 'target parallel' on the host.
This patch adds support for codegen of 'target parallel' on the host.
It is also the first combined directive that requires two or more
captured statements.  Support for this functionality is included in
the patch.

A combined directive such as 'target parallel' has two captured
statements, one for the 'target' and the other for the 'parallel'
region.  Two captured statements are required because each has
different implicit parameters (see SemaOpenMP.cpp).  For example,
the 'parallel' has 'global_tid' and 'bound_tid' while the 'target'
does not.  The patch adds support for handling multiple captured
statements based on the combined directive.

When codegen'ing the 'target parallel' directive, the 'target'
outlined function is created using the outer captured statement
and the 'parallel' outlined function is created using the inner
captured statement.

Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D28753

llvm-svn: 292419
2017-01-18 18:18:53 +00:00
Arpith Chacko Jacob
42793e000a Revert r292374 to debug Windows buildbot failure.
llvm-svn: 292400
2017-01-18 15:36:05 +00:00