While running clang-repl in the browser, we would be interested in this
cc1 command
`
"" -cc1 -triple wasm32-unknown-emscripten -emit-obj -disable-free
-clear-ast-before-backend -disable-llvm-verifier -discard-value-names
-main-file-name "<<< inputs >>>" -mrelocation-model static
-mframe-pointer=none -ffp-contract=on -fno-rounding-math
-mconstructor-aliases -target-cpu generic -debugger-tuning=gdb
-fdebug-compilation-dir=/ -v -fcoverage-compilation-dir=/ -resource-dir
/lib/clang/19 -internal-isystem /include/wasm32-emscripten/c++/v1
-internal-isystem /include/c++/v1 -internal-isystem
/lib/clang/19/include -internal-isystem /include/wasm32-emscripten
-internal-isystem /include -std=c++17 -fdeprecated-macro -ferror-limit
19 -fvisibility=default -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf
-fcxx-exceptions -fexceptions -fincremental-extensions -o "<<< inputs
>>>.o" -x c++ "<<< inputs >>>"
`
As can be seen `shared` is anyway overwritten by `static` which is also
what would be provided by default. Hence we can get rid of the shared
flag here.
Universal Mach-O files can't have an archicture slice that starts beyond
the 4GB boundary. However, we support generating universal binaries with
a fat64 header, but older tools may not understand this format.
Currently, unless -fat64 is passed, dsymutil will error out when it
encounters a slice that would exceeds the 4GB limit. Now that more tools
(like LLDB and CoreSymbolication) understand the fat64 header format,
this patch changes the default behavior to use the fat64 header and
emits a warning instead. The warning can be silenced by passing the
-fat64 flag. The goal is to eventually remove the warning altogether.
rdar://140998416
The DAG maintains a chain of MemDGNodes that links together all the
nodes that may touch memroy.
Whenever a new instruction gets created we need to make sure that this
chain gets updated. If the new instruction touches memory then its
corresponding MemDGNode should be inserted into the chain.
To reduce repeated code, TreeEntry::setOperandsInOrder will be replaced
by VLOperands.
Arg_size will be provided to make sure other operands will not be
reorderd when VL[0] is IntrinsicInst (because APO is a boolean value).
In addition, BoUpSLP::reorderInputsAccordingToOpcode will also be
removed since it is simple.
Pass the FFI-related CMake options through to runtimes, since offload is
building against libffi. This is needed when the system requires custom
`LIBFFI_INCLUDE` to build (e.g. on Gentoo where the headers are
installed to `/usr/lib*/libffi/include`).
Currently all the specializations of a template (including
instantiation, specialization and partial specializations) will be
loaded at once if we want to instantiate another instance for the
template, or find instantiation for the template, or just want to
complete the redecl chain.
This means basically we need to load every specializations for the
template once the template declaration got loaded. This is bad since
when we load a specialization, we need to load all of its template
arguments. Then we have to deserialize a lot of unnecessary
declarations.
For example,
```
// M.cppm
export module M;
export template <class T>
class A {};
export class ShouldNotBeLoaded {};
export class Temp {
A<ShouldNotBeLoaded> AS;
};
// use.cpp
import M;
A<int> a;
```
We should a specialization ` A<ShouldNotBeLoaded>` in `M.cppm` and we
instantiate the template `A` in `use.cpp`. Then we will deserialize
`ShouldNotBeLoaded` surprisingly when compiling `use.cpp`. And this
patch tries to avoid that.
Given that the templates are heavily used in C++, this is a pain point
for the performance.
This patch adds MultiOnDiskHashTable for specializations in the
ASTReader. Then we will only deserialize the specializations with the
same template arguments. We made that by using ODRHash for the template
arguments as the key of the hash table.
To review this patch, I think `ASTReaderDecl::AddLazySpecializations`
may be a good entry point.
The patch was reviewed in
https://github.com/llvm/llvm-project/pull/83237 but that PR is a stacked
PR. But I feel the intention of the stacked PRs get lost during the
review process. So I feel it is better to merge the commits into a
single commit instead of merging them in the PR page. It is better for
us to cherry-pick and revert.
1. In `CufOpConversion` `isDeviceGlobal` was renamed
`isRegisteredGlobal` and moved to the common file. `isRegisteredGlobal`
excludes constant `fir.global` operation from registration. This is to
avoid calls to `_FortranACUFGetDeviceAddress` on globals which do not
have any symbols in the runtime. This was done for
`_FortranACUFRegisterVariable` in #118582, but also needs to be done
here after #118591
2. `CufDeviceGlobal` no longer adds the `#cuf.cuda<constant>` attribute
to the constant global. As discussed in #118582 a module variable with
the #cuf.cuda<constant> attribute is not a compile time constant. Yet,
the compile time constant also needs to be copied into the GPU module.
The candidates for copy to the GPU modules are
- the globals needing regsitrations regardless of their uses in device
code (they can be referred to in host code as well)
- the compile time constant when used in device code
3. The registration of "constant" module device variables (
#cuf.cuda<constant>) can be restored in `CufAddConstructor`
Make EPCGenericMemoryAccess the default implementation for the MemoryAccess
object in SimpleRemoteEPC, and add support for the WritePointers operation to
OrcTargetProcess (previously this operation was unimplemented and would have
triggered an error if accessed in a remote-JIT setup).
No testcase yet: This functionality requires cross-process JITing to test (or a
much more elaborate unit-test setup). It can be tested once the new top-level
ORC runtime project lands.
For a spread with an element type small enough, we can use a zext and
shift to perform the shuffle. For e8, this covers spread(2,4,8), and for
e16 covers spread(2,4). Note that spread(2) is already covered by the
existing interleave logic, and is simply listed for completeness in the
prior description.
When both `-order_file` and `--irpgo-profile-sort=` (soon to be
`-bp-startup-sort=function` in
https://github.com/llvm/llvm-project/pull/118594) are used, we want to
confirm that symbols in the orderfile take precedence.
This is a NFC.
Create and duplicate test file for true16/fake16 mc test and update with
+real-true16/-real-true16 flags properly.
This is for preparing more test changes for true16 flows
Use SymbolStringPtr for Symbol names in LinkGraph. This reduces string interning
on the boundary between JITLink and ORC, and allows pointer comparisons (rather
than string comparisons) between Symbol names. This should improve the
performance and readability of code that bridges between JITLink and ORC (e.g.
ObjectLinkingLayer and ObjectLinkingLayer::Plugins).
To enable use of SymbolStringPtr a std::shared_ptr<SymbolStringPool> is added to
LinkGraph and threaded through to its construction sites in LLVM and Bolt. All
LinkGraphs that are to have symbol names compared by pointer equality must point
to the same SymbolStringPool instance, which in ORC sessions should be the pool
attached to the ExecutionSession.
---------
Co-authored-by: Lang Hames <lhames@gmail.com>
[libc][docs] stub out assert, errno, and locale
These were the remaining c89 library headers (besides string.h and
stdlib.h; I
will split strings.rst in a follow up commit).
The macro support detection in docgen doesn't quite work for some of
these
headers. Add the stubs for these headers for now, and fix up docgen
later.
See the "NIST publication":
Link: https://www.open-std.org/jtc1/sc22/wg14/www/projects.html
This commit does a few things:
* creates libc/docs/headers/ and moves all user API related headers under it.
* updates paths and docgen
* updates the top level index to put these headers under a new "Implementation
Status" tab.
* rename some of the files to be foo.rst for foo.h (except strings, which is
currently a mix of string.h and stdlib.h)
* update the heading of some files to be in the form foo.h.
Writing a test for this transitively exposed a number of places in
BuildLibCalls where
we were failing to propagate address spaces properly, which are
additionally fixed.
For Frames, we prefer the inline notation for the brevity.
For PortableMemInfoBlock, we go through all member fields and print
out those that are populated.
This iteration works around the unavailability of
ScalarTraits<uintptr_t> on macOS.
Includes `CheckCXXCompilerFlag` so when building LLVM libc is built
standalone, it actually works and doesn't complain about
`check_cxx_compiler_flag` not being defined.
Because it was implemented in terms of getMaxIndexSize, it was always
rounding the values up to a multiple of 8. Additionally, it was using
the PointerSpec's BitWidth rather than its IndexBitWidth, which was
self-evidently incorrect.
Since getMaxIndexSize was only used by getMaxIndexSizeInBits, and its
name and function seem niche and somewhat confusing, go ahead and remove
it until a concrete need for it arises.
EVTs potentially contain a Type * that points into memory owned by an
LLVMContext. Storing them in a function scoped static means they may
outlive the LLVMContext they point to.
This std::set is used to unique single element VT lists containing a
single extended EVT. Single element VT list with a simple EVT are
uniqued by a separate cache indexed by the MVT::SimpleValueType enum. VT
lists with more than one element are uniqued by a FoldingSet owned by
the SelectionDAG object.
This patch moves the single element cache into SelectionDAG so that it
will be destroyed when SelectionDAG is destroyed.
Fixes#88233