Summary:
I neglected the fact that `activemask` is a 6.2 or 6.3 feature, so
building this on older machines is incorrect. Bump this up to 6.3 for
now so it works. In the future we will try to get rid of the N
architecture business.
Usage of uninitialized memory is a top memory safety issue in C++ codebases.
Help mitigate this somewhat by default initialize stack allocations to a
pattern (0xAA repeating).
Clang has received optimizations to sink these into control flow paths that
access such values to minimize the overhead of these added initializations.
If there's a measurable slowdown, we can add
-ftrivial-auto-var-init-max-size=<N> for some value N bytes if we have any
large stack allocations, or add attribute uninitialized to any variable
declarations.
Unsupported until GCC 12.1 / Clang 8.
Increases file size of libc.a from a full build by +8.79Ki (+0.2%).
This reverts commit 6886a52d6dbefff77f33de12ff85d654e2557f81.
Most of the errors observed in postsubmit have been addressed. We can
fix-forward the remaining ones.
Link: https://lab.llvm.org/buildbot/#/changes/117129
This allows individual object files to override the common compile
commands in
their local CMakeLists' add_object_library call.
For example, the common compile commands contain -Wall and -Wextra.
Before
this patch, the per object COMPILE_OPTIONS were prepended to these, so
that
builds of individual object files could not individually disable
specific
diagnostics from those groups explicitly.
After this patch, the per-object file compile objects are appended to
the list
of compiler flags, enabling this use case.
ARGN is a bit of cmake magic; let's be explicit in the APPEND that we're
appending the compile options.
Link: #77007
This reverts commit 606653091d1a66d1a83a1bfdea2883cc8d46687e.
Post submit buildbots are now red. We can use these explicit errors to better
clean up existing warnings, then reland this.
Link: #73966
A recent commit introduced warnings observable when building unit tests.
If the
unit tests don't fail when warnings are introduced into the build, then
we
might fail to notice them in the stream of output from check-libc.
Link: https://github.com/llvm/llvm-project/pull/72763/files#r1410932348
There are some basic vectorization features in standard architecture
specifications. Such as SSE/SSE2 for x86-64, or NEON for aarch64. Even
though such features are almost always available, we still need some
methods to test fallback routines without any vectorization.
Previous attempt in hsearch adds a DISABLE_SSE2_OPT flag that tries to
compile the code with -mno-sse2 in order to test specific table scanning
routines. However, it turns out that such flag may have some unwanted
side effects hindering portability.
This PR introduces PREFER_GENERIC as an alternative. When a target is
built with PREFER_GENERIC, cmake will define a macro
__LIBC_PREFER_GENERIC such that developers can selectively choose the
fallback routine based on the macro.
This patch implements `hcreate(_r)/hsearch(_r)/hdestroy(_r)` as
specified in https://man7.org/linux/man-pages/man3/hsearch.3.html.
Notice that `neon/asimd` extension is not yet added in this patch.
- The implementation is largely simplified from rust's
[`hashbrown`](https://github.com/rust-lang/hashbrown/blob/master/src/raw/mod.rs)
as we only consider fix-sized insertion-only hashtables. Technical
details are provided in code comments.
- This patch also contains a portable string hash function, which is
derived from [`aHash`](https://github.com/tkaitchuck/aHash)'s fallback
routine. Not using any SIMD acceleration, it has a good enough quality
(passing all SMHasher tests) and is not too bad in speed.
- Some general functionalities are added, such as `memory_size`,
`offset_to`(alignment), `next_power_of_two`, `is_power_of_two`.
`ctz/clz` are extended to support shorter integers.
Summray:
A recent patch upgrades the NVPTX ctor / dtor lowering pass to emit
kernels so other languages can call them. We do this manually in `libc`
so we do not need this. Use the provided flag to disable this step to
keep the created kernels cleaner.
Summary:
This patch simply adds the `-fconvergent-functions` flag to the GPU
compilation. This is in relation to the behaviour of SIMT
architectures under divergence. With the flag, we assume every function
is convergent by default and rely on the compiler's divergence analysis
to transform it if possible.
Fixes: https://github.com/llvm/llvm-project/issues/63853
printf_core.parser is not yet updated to use the printf config options. It
does not use them currently anyway and the corresponding parser_test
should be updated to respect the config options.
We want to activate `llvm-header-guard` (#66477) but the current CMake
configuration includes paths that should be `isystem`. This PR restricts
the number of `-I` passed to the clang command line and correctly marks
the llvm libc include path as `isystem`.
Summary:
The GPU build has a lot of magic around how we package the output.
Generally, the GPU needs to exist as a secondary fatbinary image for
offloading languages. This is because offloading languages pretend like
offloading to an accelerator is a single file. This then needs to be put
into a single file to make it mesh with the existing build
infrastructure. To work with this, the `libc` makes an installed version
of the library that simply embeds the GPU code into an empty stub file.
This wasn't being updated correctly, which lead to the installed `libc`
static library not being updated correctly when the underlying file was
changed. The previous behaviour only updated when the entrypoint itself
was modified, but not any of its headers. By adding a dependcy on the
actual *object* file we should now capture the regular CMake semantics.
This is part of a libc wide CMake cleanup which aims to eliminate
certain explicitly duplicated logic which is available in CMake-3.20.
This change in particular makes the entrypoint aliases real library
targets so that they can be treated as normal library targets by other
libc build rules.
Summary:
There is currently effort to change over the default AMDGPU code object
version https://github.com/llvm/llvm-project/pull/65410. However, this
unfortunately causes problems in the LLVM LibC test suite that leads to
a hang while executing. This is most likely a bug to do with indirect
call optimization, as it can be avoided without optimizations or with
manually preventing inlining in the AMDGPU startup code.
This patch sets the AMDGPU code object version to be four explicitly on
the LibC test suite. This should unblock the efforts to move the default
to 5 without breaking the test suite. This isn't a great solution, but
there is currently some time pressure to get COV5 landed and this seems
to be the easiest solution.
Summary:
AMDGPU binaries use a "code object" as the ABI indicator. We are
currently trying to move over to a newer code object. We want these
library functions to use the "generic" or default ABI such that it is
specified when linked into the user application. Currently this will
default to v4 as the startup code will use whatever the current default
is.
It's very important that the GPU build does not include any system
directories. We currently use `-ffreestanding` to disable a lot of these
features, but we can still accidentally include them if they are not
provided by `libc` yet. This patch changes this to use `-nostdinc` to
disable all standard search paths. Then we use the compiler's resource
directory to pick up the provided headers like `stdint.h`.
Differential Revision: https://reviews.llvm.org/D159445
We currently remap vendor implementations of math functions to provide a
temporarily functional `libm.a` for the GPU. However, we should not run
tests on any files that depend on these vendor implementations as they
are not under our control and are not always present.
The goal in the future is to remove the need for this by replacing all
the vendor functionality, but for now this is a workaround.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D158213
This more closely matches the stricter warnings used for
this same code in the Fuchsia build.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D156630
Clang supports the `-Wglobal-constructors` flag which will indicate if a
global constructor is being used. The current goal in `libc` is to make
the constructors `constexpr` to prevent this from happening with
straight construction. However, there are many other cases where we can
emit a constructor that this won't catch. This should give warning if
someone accidentally introduces a global constructor.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D155721
CUDA requires a PTX feature to be compiled generally, because the
`libcgpu.a` archive contains LLVM-IR we need to have one present to
compile it. Currently, the wrapper fatbinary format we use to
incorporate these into single-source offloading languages has a special
option to provide this. Since this was not present in the builds, if the
user did not specify it via `-foffload-lto` it would not compile from
CUDA or OpenMP due to the missing PTX features. Fix this by passing it
to the packager invocation.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D154864
D152592 introduced LIBC_INCLUDE_DIR for the location of the include
directory, use it in relevant CMake rules.
Differential Revision: https://reviews.llvm.org/D154278
D152592 introduced LIBC_INCLUDE_DIR for the location of the include
directory, use it in relevant CMake rules.
Differential Revision: https://reviews.llvm.org/D154278
D152592 introduced LIBC_INCLUDE_DIR for the location of the include
directory, use it in relevant CMake rules.
Differential Revision: https://reviews.llvm.org/D154278
We need to perform the GPU build separately. The `CXX_STANDARD` option
was not being passed properly.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D149225
Summary:
We generally need a CUDA toolchain to build the tests for the GPU `libc`
targeting NVPTX. However, clang will commonly emit warnings on versions
that are too new. We can ignore these in `libc` since we are manually
specifying the `+ptx` version to use whenever we compile. So we do not
need to worry about unexpected changes and we do not depend on any newer
features. So this should not be problematic.
Summary:
The sm_60 GPU is the oldest model that's supported for using the RPC
features of the `libc` GPU runtime. This also requires at least `+ptx63`
to enable use of the active mask. So, this patch sets that as the
minimum.
Summary:
The `+ptx` features correspond to the related CUDA version. We require a
certain set of features from the `ptxas` assembler, which is tied to the
CUDA version. Some of the ones set here were insufficient, so I am
simply setting a cutoff to the CUDA 9.0 release as the minimum. This
roughly corresponds to what should be required for sm_60 to be compiled
with the source.
Summary:
This variable was not being reset, which caused the options to be
compounded when building multiple architectures. This was very
problematic as the architectures are not compatible.
The NVIDIA compilation path requires some special options. This is
mostly because compilation is dependent on having a valid CUDA
toolchain. We don't actually need the CUDA toolchain to create the
exported `libcgpu.a` library because it's pure LLVM-IR. However, for
some language features we need the PTX version to be set. This is
normally set by checking the CUDA version, but without one installed it
will fail to build. We instead choose a minimum set of features on the
desired target, inferred from
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes
and the PTX refernece for functions like `nanosleep`.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D148532
Let RISCV64 targets math implementations with and without FMA
automatically.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D146730
The packaged version of the `libc` library does not depend on the CUDA
installation because it only uses `clang` and emits LLVM-IR. However,
for testing we directly need the CUDA toolkit to emit and execute the
files. This patch explicitly passes `--cuda-path` to the relevant
compilations for NVPTX testing.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D147653
Summary:
The AMDGPU ABI isn't stable or well defined. For that reson we prefer to
rely on LTO to ensure that multiple files get linked correctly.
Currently the internal targets used for testing mix LLVM-IR and
assembly. We should be consistent here.
Summary:
We use `-Xclang` to pass the GPU binary to be embedded. In the case of
multi-source objects this will be passed more than once, but CMake
implicitly deduplicates arguments. Use the special generator to prevent
this from happening.
Summary:
Multi-source object libraries require some additional handling, this
logic wasn't correctly settending the dependency on each filename
individually and was instead using the last one. This meant that only
the last file was built for multi-object libraries.