* Set `__cpp_auto_cast`, as per
https://github.com/cplusplus/CWG/issues/281
* Support `__has_extension(cxx_generalized_nttp)` in C++20 as the
feature isn't stable enough for a feature test macro
* Support `__has_extension(cxx_explicit_this_parameter)` in c++23 as the
feature isn't stable enough for a feature test macro
This PR exposes four PGO functions
- `__llvm_profile_set_filename`
- `__llvm_profile_reset_counters`,
- `__llvm_profile_dump`
- `__llvm_orderfile_dump`
to user programs through the new header `instr_prof_interface.h` under
`compiler-rt/include/profile`. This way, the user can include the header
`profile/instr_prof_interface.h` to introduce these four names to their
programs.
Additionally, this PR defines macro `__LLVM_INSTR_PROFILE_GENERATE` when
the program is compiled with profile generation, and defines macro
`__LLVM_INSTR_PROFILE_USE` when the program is compiled with profile
use. `__LLVM_INSTR_PROFILE_GENERATE` together with
`instr_prof_interface.h` define the PGO functions only when the program
is compiled with profile generation. When profile generation is off,
these PGO functions are defined away and leave no trace in the user's
program.
Background:
https://discourse.llvm.org/t/pgo-are-the-llvm-profile-functions-stable-c-apis-across-llvm-releases/75832
Add an extension feature `define-target-os-macros` that enables clang to
provide definitions of common TARGET_OS_* conditional macros. The
extension is enabled in the Darwin toolchain driver.
Summary:
The standard GNU atomic operations are a very common way to target
hardware atomics on the device. With more heterogenous devices being
introduced, the concept of memory scopes has been in the LLVM language
for awhile via the `syncscope` modifier. For targets, such as the GPU,
this can change code generation depending on whether or not we only need
to be consistent with the memory ordering with the entire system, the
single GPU device, or lower.
Previously these scopes were only exported via the `opencl` and `hip`
variants of these functions. However, this made it difficult to use
outside of those languages and the semantics were different from the
standard GNU versions. This patch introduces a `__scoped_atomic` variant
for the common functions. There was some discussion over whether or not
these should be overloads of the existing ones, or simply new variants.
I leant towards new variants to be less disruptive.
The scope here can be one of the following
```
__MEMORY_SCOPE_SYSTEM // All devices and systems
__MEMORY_SCOPE_DEVICE // Just this device
__MEMORY_SCOPE_WRKGRP // A 'work-group' AKA CUDA block
__MEMORY_SCOPE_WVFRNT // A 'wavefront' AKA CUDA warp
__MEMORY_SCOPE_SINGLE // A single thread.
```
Naming consistency was attempted, but it is difficult to capture to full
spectrum with no many names. Suggestions appreciated.
Initial commits to support OpenACC. This patchset:
adds a clang-command line argument '-fopenacc', and starts
to define _OPENACC, albeit to '1' instead of the standardized
value (since we don't properly implement OpenACC yet).
The OpenACC spec defines `_OPENACC` to be equal to the latest standard
implemented. However, since we're not done implementing any standard,
we've defined this by default to be `1`. As it is useful to run our
compiler against existing OpenACC workloads, we're providing a
temporary override flag to change the `_OPENACC` value to be any
entirely digit value, permitting testing against any existing OpenACC
project.
Exactly like the OpenMP parser, the OpenACC pragma parser needs to
consume and reprocess the tokens. This patch sets up the infrastructure
to do so by refactoring the OpenMP version of this into a more general
version that works for OpenACC as well.
Additionally, this adds a few diagnostics and token kinds to get us
started.
This patch adds the Driver changes needed for enabling HIP parallel algorithm offload on AMDGPU targets. This change merely adds two macros to inform user space if we are compiling in `hipstdpar` mode and, respectively, if the optional allocation interposition mode has been requested, as well as associated minimal tests. The macros can be used by the runtime implementation of offload to drive conditional compilation, and are only defined if the HIP language has been enabled.
Reviewed by: yaxunl
Differential Revision: https://reviews.llvm.org/D155826
This does the rename for most internal uses of C2x, but does not rename
or reword diagnostics (those will be done in a follow-up).
I also updated standards references and citations to the final wording
in the standard.
C2x was finalized at the June 2023 WG14 meeting. The DIS is out for
balloting and the comment period for that closes later this year/early
next year. While that does leave an opportunity for more changes to the
standard during the DIS ballot resolution process, only editorial
changes are anticipated and as a result, the C committee considers C2x
to be final. The committee took a straw poll on what we'd prefer the
informal name of the standard be, and we decided it should be called
C23 regardless of what year ISO publishes it.
However, because the final publication is not out, this patch does not
add the language standard alias for the -std=iso9899:<year> spelling of
the standard mode; that will be added once ISO finally publishes the
document and the year chosen will match the publication date.
This also changes the value of __STDC_VERSION__ from the placeholder
value 202000L to the final value 202311L.
Subsequent patches will start renaming things from c2x to c23, cleaning
up documentation, etc.
Differential Revision: https://reviews.llvm.org/D157606
This is a C++ feature that allows the use of `_` to
declare multiple variable of that name in the same scope;
these variables can then not be referred to.
In addition, while P2169 does not extend to parameter
declarations, we stop warning on unused parameters of that name,
for consistency.
The feature is backported to all C++ language modes.
Reviewed By: #clang-language-wg, aaron.ballman
Differential Revision: https://reviews.llvm.org/D153536
Rename -fcuda-approx-transcendentals as
-fgpu-approx-transcendentals and pass it
to both device and host clang -cc1.
Fix its interaction with -ffast-math to allow
-fno-gpu-approx-transcendentals to override
the implicit -fcuda-approx-transcendentals
due to -ffast-math.
Rename the predefined macro to be
__CLANG_GPU_APPROX_TRANSCENDENTALS__.
Emit the macro for both device and host compilation.
Reviewed by: Artem Belevich, Fangrui Song
Differential Revision: https://reviews.llvm.org/D154797
Rename HIP_API_PER_THREAD_DEFAULT_STREAM and __HIP_NO_IMAGE_SUPPORT
so that they follow the convention with prefix and postfix __.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D155480
I'm using clang to compile CUDA code. And just found that clang doesn't support the per-thread stream option for NV CUDA. I don't know if there is another solution.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D154822
This patch renames the `OpenMPIRBuilderConfig` flags to reduce confusion over
their meaning. `IsTargetCodegen` becomes `IsGPU`, whereas `IsEmbedded` becomes
`IsTargetDevice`. The `-fopenmp-is-device` compiler option is also renamed to
`-fopenmp-is-target-device` and the `omp.is_device` MLIR attribute is renamed
to `omp.is_target_device`. Getters and setters of all these renamed properties
are also updated accordingly. Many unit tests have been updated to use the new
names, but an alias for the `-fopenmp-is-device` option is created so that
external programs do not stop working after the name change.
`IsGPU` is set when the target triple is AMDGCN or NVIDIA PTX, and it is only
valid if `IsTargetDevice` is specified as well. `IsTargetDevice` is set by the
`-fopenmp-is-target-device` compiler frontend option, which is only added to
the OpenMP device invocation for offloading-enabled programs.
Differential Revision: https://reviews.llvm.org/D154591
A new builtin function __builtin_isfpclass is added. It is called as:
__builtin_isfpclass(<floating point value>, <test>)
and returns an integer value, which is non-zero if the floating point
argument falls into one of the classes specified by the second argument,
and zero otherwise. The set of classes is an integer value, where each
value class is represented by a bit. There are ten data classes, as
defined by the IEEE-754 standard, they are represented by bits:
0x0001 (__FPCLASS_SNAN) - Signaling NaN
0x0002 (__FPCLASS_QNAN) - Quiet NaN
0x0004 (__FPCLASS_NEGINF) - Negative infinity
0x0008 (__FPCLASS_NEGNORMAL) - Negative normal
0x0010 (__FPCLASS_NEGSUBNORMAL) - Negative subnormal
0x0020 (__FPCLASS_NEGZERO) - Negative zero
0x0040 (__FPCLASS_POSZERO) - Positive zero
0x0080 (__FPCLASS_POSSUBNORMAL) - Positive subnormal
0x0100 (__FPCLASS_POSNORMAL) - Positive normal
0x0200 (__FPCLASS_POSINF) - Positive infinity
They have corresponding builtin macros to facilitate using the builtin
function:
if (__builtin_isfpclass(x, __FPCLASS_NEGZERO | __FPCLASS_POSZERO) {
// x is any zero.
}
The data class encoding is identical to that used in llvm.is.fpclass
function.
Differential Revision: https://reviews.llvm.org/D152351
The default version of OpenMP is updated from 5.0 to 5.1 which means if -fopenmp is specified but -fopenmp-version is not specified with clang, the default version of OpenMP is taken to be 5.1. After modifying the Frontend for that, various LIT tests were updated. This patch contains all such changes. At a high level, these are the patterns of changes observed in LIT tests -
# RUN lines which mentioned `-fopenmp-version=50` need to kept only if the IR for version 5.0 and 5.1 are different. Otherwise only one RUN line with no version info(i.e. default version) needs to be there.
# Test cases of this sort already had the RUN lines with respect to the older default version 5.0 and the version 5.1. Only swapping the version specification flag `-fopenmp-version` from newer version RUN line to older version RUN line is required.
# Diagnostics: Remove the 5.0 version specific RUN lines if there was no difference in the Diagnostics messages with respect to the default 5.1.
# Diagnostics: In case there was any difference in diagnostics messages between 5.0 and 5.1, mention version specific messages in tests.
# If the test contained version specific ifdef's e.g. "#ifdef OMP5" but there were no RUN lines for any other version than 5.X, then bring the code guarded by ifdef's outside and remove the ifdef's.
# Some tests had RUN lines for both 5.0 and 5.1 versions, but it is found that the IR for 5.0 is not different from the 5.1, therefore such RUN lines are redundant. So, such duplicated lines are removed.
# To generate CHECK lines automatically, use the script llvm/utils/update_cc_test_checks.py
Reviewed By: saiislam, ABataev
Differential Revision: https://reviews.llvm.org/D129635
(cherry picked from commit 9dd2999907dc791136a75238a6000f69bf67cf4e)
HIP texture/image support is optional as some devices
do not have image instructions. A macro __HIP_NO_IMAGE_SUPPORT
is defined for device not supporting images (d0448aa4c4/docs/reference/kernel_language.md (L426) )
Currently the macro is defined by HIP header based on predefined macros
for GPU, e.g __gfx*__ , which is error prone. This patch let clang
emit the predefined macro.
Reviewed by: Matt Arsenault, Artem Belevich
Differential Revision: https://reviews.llvm.org/D151349
Fix several instances of macros being defined multiple times
in several targets. Most of these are just simple duplication in a
TargetInfo or OSTargetInfo of things already defined in
InitializePredefinedMacros or InitializeStandardPredefinedMacros,
but there are a few that aren't:
* AArch64 defines a couple of feature macros for armv8.1a that are
handled generically by getTargetDefines.
* CSKY needs to take care when CPUName and ArchName are the same.
* Many os/target combinations result in __ELF__ being defined twice.
Instead define __ELF__ just once in InitPreprocessor based on
the Triple, which already knows what the object format is based
on os and target.
These changes shouldn't change the final result of which macros are
defined, with the exception of the changes to __ELF__ where if you
explicitly specify the object type in the triple then this affects
if __ELF__ is defined, e.g. --target=i686-windows-elf results in it
being defined where it wasn't before, but this is more accurate as an
ELF file is in fact generated.
Differential Revision: https://reviews.llvm.org/D150966
During the ISO C++ Committee meeting plenary session the C++23 Standard
has been voted as technical complete.
This updates the reference to c++2b to c++23 and updates the __cplusplus
macro.
Drive-by fixes c++1z -> c++17 and c++2a -> c++20 when seen.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D149553
The feature macro '__cpp_coroutines' is for coroutines TS. And the
coroutines TS is deprecated. So we should remove the feature macro too.
BTW, the corresponding feature macro for standard c++ coroutines is
'__cpp_impl_coroutine'.
GNU line marker directives are not recognised when preprocessing
assembly files, meaning they can't be used in the predefines file
meaning macros defined on the command line are reported as being
built-in.
Change this to permit line markers but only in the predefines file,
so we can correctly report command line macros as coming from the
command line.
Differential Revision: https://reviews.llvm.org/D145397
This commit relands the patches for implementing P0960R3 and P1975R0,
which describe initializing aggregates via a parenthesized list.
The relanded commits are:
* 40c52159d3ee - P0960R3 and P1975R0: Allow initializing aggregates from
a parenthesized list of values
* c77a91bb7ba7 - Remove overly restrictive aggregate paren init logic
* 32d7aae04fdb - Fix a clang crash on invalid code in C++20 mode
This patch also fixes a crash in the original implementation.
Previously, if the input tried to call an implicitly deleted copy or
move constructor of a union, we would then try to initialize the union
by initializing it's first element with a reference to a union. This
behavior is incorrect (we should fail to initialize) and if the type of
the first element has a constructor with a single template typename
parameter, then Clang will explode. This patch fixes that issue by
checking that constructor overload resolution did not result in a
deleted function before attempting parenthesized aggregate
initialization.
Additionally, this patch also includes D140159, which contains some
minor fixes made in response to code review comments in the original
implementation that were made after that patch was submitted.
Co-authored-by: Sheng <ox59616e@gmail.com>
Fixes#54040, Fixes#59675
Reviewed By: ilya-biryukov
Differential Revision: https://reviews.llvm.org/D141546
This feature causes clang to crash when compiling Chrome - see
https://crbug.com/1405031 and
https://github.com/llvm/llvm-project/issues/59675
Revert "[clang] Fix a clang crash on invalid code in C++20 mode."
This reverts commit 32d7aae04fdb58e65a952f281ff2f2c3f396d98f.
Revert "[clang] Remove overly restrictive aggregate paren init logic"
This reverts commit c77a91bb7ba793ec3a6a5da3743ed55056291658.
Revert "[clang][C++20] P0960R3 and P1975R0: Allow initializing aggregates from a parenthesized list of values"
This reverts commit 40c52159d3ee337dbed14e4c73b5616ea354c337.
This patch implements P0960R3, which allows initialization of aggregates
via parentheses.
As an example:
```
struct S { int i, j; };
S s1(1, 1);
int arr1[2](1, 2);
```
This patch also implements P1975R0, which fixes the wording of P0960R3
for single-argument parenthesized lists so that statements like the
following are allowed:
```
S s2(1);
S s3 = static_cast<S>(1);
S s4 = (S)1;
int (&&arr2)[] = static_cast<int[]>(1);
int (&&arr3)[2] = static_cast<int[2]>(1);
```
This patch was originally authored by @0x59616e and completed by
@ayzhao.
Fixes#54040, Fixes#54041
Co-authored-by: Sheng <ox59616e@gmail.com>
Full write up : https://discourse.llvm.org/t/c-20-rfc-suggestion-desired-regarding-the-implementation-of-p0960r3/63744
Reviewed By: ilya-biryukov
Differential Revision: https://reviews.llvm.org/D129531
Mixing LLVM and Clang address spaces can result in subtle bugs, and there
is no need for this hook to use the LLVM IR level address spaces.
Most of this change is just replacing zero with LangAS::Default,
but it also allows us to remove a few calls to getTargetAddressSpace().
This also removes a stale comment+workaround in
CGDebugInfo::CreatePointerLikeType(): ASTContext::getTypeSize() does
return the expected size for ReferenceType (and handles address spaces).
Differential Revision: https://reviews.llvm.org/D138295
After accepted in Kona, update the code to accept static operator[] as well.
No big code changes: accept this operator as static in SemaDeclCXX, update AST call generation in SemaOverload and update feature macros + tests accordingly.
Reviewed By: cor3ntin, erichkeane, #clang-language-wg
Differential Revision: https://reviews.llvm.org/D138387
This implement the C++23 paper P2647R1 (adopted in Kona)
Reviewed By: #clang-language-wg, erichkeane
Differential Revision: https://reviews.llvm.org/D138851
I noticed that the values for __{CLANG,GCC}_ATOMIC_POINTER_LOCK_FREE were
incorrectly set to 1 instead of two in downstream CHERI targets because
pointers are handled specially there. While fixing this downstream, I
noticed that the existing code could be refactored to use
TargetInfo::hasBuiltinAtomic instead of repeating the almost identical
logic. In theory there could be a difference here since hasBuiltinAtomic() also
returns true for types less than 1 char in size, but since
InitializePredefinedMacros() never passes such a value this change should
not introduce any functional changes.
Reviewed By: rprichard, efriedma
Differential Revision: https://reviews.llvm.org/D135142
Implement P2513
This change allows initializing an array of unsigned char,
or char from u8 string literals.
This was done both to support legacy code and for compatibility
with C where char8_t will be typedef to unsigned char.
This is backported to C++20 as per WG21 guidance.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D136449
We had a bunch of places in the code where we were translating triple
environment enum cases to shader stage enum cases. The order of these
enums needs to be kept in sync for the translation to be simple, but we
were not properly handling out-of-bounds cases.
In normal compilation out-of-bounds cases shouldn't be possible because
the driver errors if you don't have a valid shader environment set, but
in clang tooling that error doesn't get treated as fatal and parsing
continues. This can result in crashes in clang tooling for out-of-range
shader stages.
To address this, this patch adds a constexpr method to handle the
conversion which handles out-of-range values by converting them to
`Invalid`.
Since this new method is a constexpr, the tests for this are a group of
static_asserts in the implementation file that verifies the correct
conversion for each valid enum case and validates that other cases are
converted to `Invalid`.
Reviewed By: bogner
Differential Revision: https://reviews.llvm.org/D135595
Define the feature test macro for named character escapes.
I assume this was not done because it was implemented before formally accepted, right? cxx_status says the paper is implemented.
Reviewed By: cor3ntin
Differential Revision: https://reviews.llvm.org/D134898
This patch implements P0848 in Clang.
During the instantiation of a C++ class, in `Sema::ActOnFields`, we evaluate constraints for all the SMFs and compare the constraints to compute the eligibility. We defer the computation of the type's [copy-]trivial bits from addedMember to the eligibility computation, like we did for destructors in D126194. `canPassInRegisters` is modified as well to better respect the ineligibility of functions.
Note: Because of the non-implementation of DR1734 and DR1496, I treat deleted member functions as 'eligible' for the purpose of [copy-]triviallity. This is unfortunate, but I couldn't think of a way to make this make sense otherwise.
Reviewed By: #clang-language-wg, cor3ntin, aaron.ballman
Differential Revision: https://reviews.llvm.org/D128619
This patch implements P0848 in Clang.
During the instantiation of a C++ class, in `Sema::ActOnFields`, we evaluate constraints for all the SMFs and compare the constraints to compute the eligibility. We defer the computation of the type's [copy-]trivial bits from addedMember to the eligibility computation, like we did for destructors in D126194. `canPassInRegisters` is modified as well to better respect the ineligibility of functions.
Note: Because of the non-implementation of DR1734 and DR1496, I treat deleted member functions as 'eligible' for the purpose of [copy-]triviallity. This is unfortunate, but I couldn't think of a way to make this make sense otherwise.
Reviewed By: #clang-language-wg, cor3ntin, aaron.ballman
Differential Revision: https://reviews.llvm.org/D128619
Correct the logic used to set `ATOMIC_*_LOCK_FREE` preprocessor macros not
to rely on the ABI alignment of types. Instead, just assume all those
types are aligned correctly by default since clang uses safe alignment
for `_Atomic` types even if the underlying types are aligned to a lower
boundary by default.
For example, the `long long` and `double` types on x86 are aligned to
32-bit boundary by default. However, `_Atomic long long` and `_Atomic
double` are aligned to 64-bit boundary, therefore satisfying
the requirements of lock-free atomic operations.
This fixes PR #19355 by correcting the value of
`__GCC_ATOMIC_LLONG_LOCK_FREE` on x86, and therefore also fixing
the assumption made in libc++ tests. This also fixes PR #30581 by
applying a consistent logic between the functions used to implement
both interfaces.
Reviewed By: hfinkel, efriedma
Differential Revision: https://reviews.llvm.org/D28213