This commit moves the implementation of the smoothstep function to the
CLC library, whilst optimizing the codegen.
This commit also adds support for 'half' versions of smoothstep, which
were previously missing.
The CLC smoothstep implementation now keeps everything in vectors,
rather than recursively splitting vectors by half down to the scalar
base form. This should result in more optimal codegen across the board.
This commit also removes some non-standard overloads of smoothstep with
mixed types, such as 'double smoothstep(float, float, float)'. There
aren't any mixed-(element )type versions of smoothstep as far as I can
see:
gentype smoothstep(gentype edge0, gentype edge1, gentype x)
gentypef smoothstep(float edge0, float edge1, gentypef x)
gentyped smoothstep(double edge0, double edge1, gentyped x)
gentypeh smoothstep(half edge0, half edge1, gentypeh x)
The CLC library only defines the first type, for simplicity; the OpenCL
layer is responsible for handling the scalar/scalar/vector forms. Note
that the scalar/scalar/vector forms now splat the scalars to the vector
type, rather than recursively split vectors as before. The macro that
used to 'vectorize' smoothstep in this way has been moved out of the
shared clcmacro.h header as it was only used for the smoothstep builtin.
Note that the CLC clamp function is now built for both SPIR-V targets.
This is to help build the CLC smoothstep function for the Mesa SPIR-V
target.
VTypeAnalysis contains some assertions which can be useful for reasoning
that the types of various operands match.
This patch teaches VPlanVerifier to invoke VTypeAnalysis to check them,
and catches some issues with VPInstruction types that are also fixed
here:
* Handles the missing cases for CalculateTripCountMinusVF,
CanonicalIVIncrementForPart and AnyOf
* Fixes ICmp and ActiveLaneMask to return i1 (to align with `icmp` and
`@llvm.get.active.lane.mask` in the LangRef)
The VPlanVerifier unit tests also need to be fleshed out a bit more to
satisfy the stricter assertions
This commits allows the container to report 3 additional metrics at
every sampling event:
- a heartbeat
- the size of the workflow queue (filtered)
- the number of running workflows (filtered)
The heartbeat is a simple metric allowing us to monitor the metrics
health. Before this commit, a new metrics was pushed only when a
workflow was completed. This meant we had to wait a few hours
before noticing if the metrics container was unable to push metrics.
In addition to this, this commits adds a sampling of the workflow
queue size and running count. This should allow us to better understand
the load, and improve the autoscale values we pick for the cluster.
---------
Signed-off-by: Nathan Gauër <brioche@google.com>
This patch is the fourth step to extend the current multilib system to
support the selection of library variants which do not correspond to
existing command-line options.
Proposal can be found in
https://discourse.llvm.org/t/rfc-multilib-custom-flags/81058
The multilib mechanism supports libraries that target code generation or
language options such as --target, -mcpu, -mfpu, -mbranch-protection.
However, some library variants are particular to features that do not
correspond to any command-line options. Examples include variants for
multithreading and semihosting.
This work introduces a way to instruct the multilib system to consider
these features in library selection. This particular patch updates the
documentation.
Anonymous namespaces are supposed to be optional when looking up types.
This was not working in combination with -gsimple-template-names,
because the way it was constructing the complete (with template args)
name scope (i.e., by generating thescope as a string and then reparsing
it) did not preserve the information about the scope kinds.
Essentially what the code wants here is to call `GetTypeLookupContext`
(that's the function used to get the context in the "regular" code
path), but to embelish each name with the template arguments (if they
don't have them already). This PR implements exactly that by adding an
argument to control which kind of names are we interested in. This
should also make the lookup faster as it avoids parsing of the long
string, but I haven't attempted to benchmark that.
I believe this function can also be used in some other places where
we're manually appending template names, but I'm leaving that for
another patch.
When passing an instruction with a register mask, the machine copy
propagation pass was dropping the information about some copy
instructions which define a register which is preserved by the mask,
because that register overlaps a register which is partially clobbered
by it. This resulted in a miscompilation for AArch64, because this
caused a live copy to be considered dead.
The fix is to clobber register masks by finding the set of reg units
which is preserved by the mask, and clobbering all units not in that
set.
This is based on #122472, and fixes the compile time performance
regressions which were caused by that.
This patch is the third step to extend the current multilib system to
support the selection of library variants which do not correspond to
existing command-line options.
Proposal can be found in
https://discourse.llvm.org/t/rfc-multilib-custom-flags/81058
The multilib mechanism supports libraries that target code generation or
language options such as --target, -mcpu, -mfpu, -mbranch-protection.
However, some library variants are particular to features that do not
correspond to any command-line options. Examples include variants for
multithreading and semihosting.
This work introduces a way to instruct the multilib system to consider
these features in library selection. This particular patch is comprised
of the core processing of these flags.
- Custom flags in the command-line are read and forwarded to the
multilib system. If multiple flag values are present for the same flag
declaration, the last one wins. Default flag values are inserted for
flag declarations for which no value was given.
- Feed `MacroDefines` back into the driver. Each item `<string>` in the
`MacroDefines` list is formatted as `-D<string>`.
Library variants should list their requirement on one or more custom
flags like they do for any other flag. The new command-line option is
passed as-is to the multilib system, therefore it should be listed in
the format `-fmultilib-flag=<str>`.
Moreover, a variant that does not specify a requirement on any
particular flag can be matched against any value of that flag.
If the user specifies `-fmultilib-flag=<name>` with a name that is
invalid, but close enough to any valid flag value name in terms of edit
distance, a suggesting error is shown:
```
error: unsupported option '-fmultilib-flag=invalidname'; did you mean '-fmultilib-flag=validname'?
```
The candidate with the smallest edit distance is chosen for the
suggestion, up to a certain maximum value (implementation detail), after
which a non-suggesting error is shown instead:
```
error: unsupported option '-fmultilib-flag=invalidname'
```
This patch removes 11 `check_include_file` invocations from
configuration phase of LLVM subproject on most of the platforms,
hardcoding the results. Fallback is left for platforms that we don't
document as supported or that are not detectable via
`CMAKE_SYSTEM_NAME`, e.g. z/OS.
This patch reduces configuration time on Linux by 10%, going from 44.7
seconds down to 40.6 seconds on my Debian machine (ramdisk, `cmake
-DLLVM_ENABLE_PROJECTS="clang;lldb;clang-tools-extra"
-DLLVM_ENABLE_RUNTIMES="libunwind;libcxx;libcxxabi"
-DCMAKE_BUILD_TYPE=RelWithDebInfo -DLLVM_OPTIMIZED_TABLEGEN=ON
-DLLVM_TARGETS_TO_BUILD="X86" -DLLVM_ENABLE_DOXYGEN=ON
-DLLVM_ENABLE_LIBCXX=ON -DBUILD_SHARED_LIBS=ON -DLLDB_ENABLE_PYTHON=ON
~/endill/llvm-project/llvm`).
In order to determine the values to hardcode, I prepared the following
header:
```cpp
#include <dlfcn.h>
#include <errno.h>
#include <fcntl.h>
#include <fenv.h>
#include <mach/mach.h>
#include <malloc/malloc.h>
#include <pthread.h>
#include <signal.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/param.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sysexits.h>
#include <termios.h>
#include <unistd.h>
int main() {}
```
and tried to compile it on the oldest versions of platforms that are
still supported (which was problematic to determine sometimes): macOS
12, Cygwin, DragonFly BSD 6.4.0, FreeBSD 13.3, Haiku R1 beta 4, RHEL
8.10 as a glibc-based Linux, Alpine 3.17 as musl-based Linux, NetBSD 9,
OpenBSD 7.4, Solaris 11.4, Windows SDK 10.0.17763.0, which corresponds
to Windows 10 1809 and is the oldest Windows 10 SDK in Visual Studio
Installer.
For platforms I don't have access to, which are AIX 7.2 TL5 and z/OS
2.4.0, I had to rely on the official documentation. I suspect that AIX
offers a better set of headers than what this PR claims, so I'm open to
input from people who have access to a live system to test it.
Similarly to AIX, I have values for z/OS compiled from the official
documentation that are not included in this patch, because apparently
upstream CMake doesn't even support z/OS, so I don't even know how to
make a place to hold those values. I see `if (ZOS)` in several places
across our CMake files, but it's a mystery to me where this variable
comes from. Input from people who have access to live z/OS instance is
welcome.
x86_64::GOTTableManager and x86_64::PLTTableManager will now look for existing
GOT and PLT sections and re-use existing entries if they're present.
This will be used for an upcoming MachO patch to enable compact unwind support.
This patch is the x86-64 counterpart 42595bdaefb, which added the same
functionality to the GOT and PLT managers for aarch64.
We missed a case of type constraints referencing deduced template
parameters when constructing a deduction guide for the type alias. This
patch fixes the issue by swapping the order of constructing 'template
arguments not appearing in the type alias parameters' and 'template
arguments that are not yet deduced'.
Fixes https://github.com/llvm/llvm-project/issues/122134
When generating a constant vector, if `UseSplat` is false, the indices
different from the index of the extract can be filled with `poison`
instead of `undef`.
Problems:
- Cray pointee cannot be used in the DSA list (If used results in
segmentation fault)
- Cray pointer has to be in the DSA list when Cray pointee is used in
the default (none) region
Fix: Added required semantic checks along the tests
Reference from the documentation (OpenMP 5.0: 2.19.1):
- Cray pointees have the same data-sharing attribute as the storage with
which their Cray pointers are associated.
This commit improves the memory efficiency of the lld-macho linker by
optimizing how thunks are printed in the map file. Previously, merging
vectors of input sections required creating a temporary vector, which
increased memory usage and in some cases caused the linker to run out of
memory as reported in comments on
https://github.com/llvm/llvm-project/pull/120496. The new approach
interleaves the printing of two arrays of ConcatInputSection in sorted
order without allocating additional memory for a merged array.
Implementation details:
The PRIORITY clause is recognized by setting the flags = 32 to the
`__kmpc_omp_task_alloc` runtime call. Also, store the priority-value
to the `kmp_task_t` struct member
We were previously checking a combination of the vector policy op and
the opcode to determine if we needed to skip copying the passthru from
a masked pseudo to an unmasked pseudo.
However we can just do this by checking
RISCVII::isFirstDefTiedToFirstUse, which is a proxy for whether or not
a pseudo has a passthru operand.
This should hopefully remove the need for the changes in #123106
This option is very important for RISC-V as it controls calling
convention and a field in the ELF header. It is used in a large number
of RISC-V lit tests.
Expose the option to -help.
Fixes one issue raised in #123077
Related to #121288
This PR fixes the miscopied `config.lldb_build_directory` variable in
`lit.cfg.py` inside MLIR's test suit. `config.mlir_obj_root` is used as
a replacement for the copied python executable's directory.
**PS**: Since this is a common work-around on macOS, should we promote
it as a utility across projects?
Co-authored-by: Luohao Wang <Luohaothu@users.noreply.github.com>
Co-authored-by: Kai Sasaki <lewuathe@gmail.com>
Block::edges_at is a convenience method for iterating over edges at a given
offset within a jitlink::Block.
This method will be used in an upcoming patch for compact unwind info support.
This is a follow-up from PR #122545, which enabled converting yaml to contextual profiles.
This change uses the lower level yaml APIs because:
- the mapping APIs `llvm::yaml` offers don't work with `const` values, because they (the APIs) want to enable both serialization and deserialization
- building a helper data structure would be an alternative, but it'd be either memory-consuming or overly-complex design, given the recursive nature of the contextual profiles.
There was a bug in both the GNU and libc++ library synthetic child
providers when a typedef was used in the type of the variable. Previous
code was looking at the top level typename to try and determine if
std::unordered_ was a map or set and this failed when typedefs were
being used. This patch fixes both C++ library synthetic child providers
with updated tests.
It will not be set if:
1. `(TypeStr.starts_with("SHT_") || isInteger(TypeStr)) == false`: here
we want go to switch default.
2. `IO.mapRequired("Type", Type);` fail parsing. It sets error
internally, so probably not important what happen next, so it's go to
the switch
These changes ensure that the `sys/wait` header is documented properly
with respect to the issue (#122006 ).
**Changes:**
1. **wait.yaml**: Created a new YAML file for `sys/wait` with functions
(`wait`, `waitid`, `waitpid`) and related macros.
2. **CMakeLists.txt**: Added `sys/wait` to the documentation
directories.
3. **index.rst**: Included `sys/wait` in the documentation index.