Implements the default and relaxed modes of HLSL availability diagnostics.
HLSL availability diagnostics emit errors or warnings when unavailable
shader APIs are used. Unavailable shader APIs are APIs that are exposed
in HLSL code but are not available in the target shader stage or shader
model version.
In the default mode the compiler emits an error when an unavailable API
is found in code that is reachable from the shader entry point
function. In the future this check will also be extended to exported
library functions (#92073). The relaxed diagnostic mode is the same
except the compiler emits a warning. This mode is enabled by
``-Wno-error=hlsl-availability``.
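As a rough illustration of the "reachable from the entry point" rule, here is a minimal sketch that walks a call graph and reports calls to unavailable APIs. It is not Clang's implementation: `CallGraph`, `Diagnostic`, `checkAvailability` and the `relaxed` parameter (standing in for ``-Wno-error=hlsl-availability``) are all hypothetical names chosen for this example.

```cpp
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical model, not Clang's data structures: a call graph keyed by
// function name, plus the set of APIs unavailable for the current target.
enum class Severity { Error, Warning };

struct Diagnostic {
  Severity severity;
  std::string caller;
  std::string api;
};

using CallGraph = std::map<std::string, std::vector<std::string>>;

// Visit every function reachable from `entry` and report each call to an
// unavailable API. Functions that are never reached are not diagnosed.
std::vector<Diagnostic> checkAvailability(const CallGraph &calls,
                                          const std::set<std::string> &unavailable,
                                          const std::string &entry,
                                          bool relaxed) {
  std::vector<Diagnostic> diags;
  std::set<std::string> visited;
  std::queue<std::string> worklist;
  visited.insert(entry);
  worklist.push(entry);
  while (!worklist.empty()) {
    std::string fn = worklist.front();
    worklist.pop();
    auto it = calls.find(fn);
    if (it == calls.end())
      continue;
    for (const std::string &callee : it->second) {
      if (unavailable.count(callee)) {
        // Default mode reports an error, relaxed mode a warning.
        diags.push_back(
            {relaxed ? Severity::Warning : Severity::Error, fn, callee});
        continue;
      }
      if (visited.insert(callee).second)
        worklist.push(callee);
    }
  }
  return diags;
}
```

In this model, a helper that calls an unavailable API but is never called from the entry point produces no diagnostic, which mirrors the reachability condition described above.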
See the HLSL Availability Diagnostics design doc
[here](https://github.com/llvm/llvm-project/blob/main/clang/docs/HLSL/AvailabilityDiagnostics.rst)
for more details.
Fixes #90095
When a module contains globals and/or function declarations only, the
'__llvm_profile_raw_version' variable should not be generated because
the module was not instrumented at all.
NFC
In https://github.com/llvm/llvm-project/pull/88323, I changed the logic
within `add_compiler_rt_runtime` to only explicitly code sign the
resulting library if an older version of Apple's ld64 was in use. This
was based on the assumption that newer versions of ld64 and the new
Apple linker always ad-hoc sign their output binaries. This is true in
most cases, but not when using Apple's new linker with the
`-darwin-target-variant` flag to build Mac binaries that are compatible
with Catalyst.
Rather than adding increasingly complicated logic to detect the exact
scenarios that require explicit code signing, I've opted to always
explicitly code sign when using any Apple linker. We instead detect and
use the 'linker-signed' codesigning option when possible to match the
signatures that the linker would otherwise create. This avoids having
non-'linker-signed' ad-hoc signatures which was the underlying problem
that https://github.com/llvm/llvm-project/pull/88323 was intended to
address.
Co-authored-by: Mark Rowe <markrowe@chromium.org>
Follow-up to a previous simplification
2473b1af085ad54e89666cedf684fdf10a84f058.
The xor difference between a SHT_NOTE and a read-only SHT_PROGBITS
(previously >=NOT_SPECIAL) should be smaller than RF_EXEC. Otherwise,
for the following section layout, `findOrphanPos` would place .text
before note.
```
// simplified from linkerscript/custom-section-type.s
non orphans:
progbits 0x8060c00 NOT_SPECIAL
note 0x8040003
orphan:
.text 0x8061000 NOT_SPECIAL
```
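To make the xor comparison concrete, the sketch below measures how close two ranks are by counting the leading zero bits of their xor, using the three rank values from the layout above. Treating proximity as a leading-zero count over 32-bit ranks is an assumption of this example, and the helper and constant names are hypothetical.

```cpp
#include <cstdint>

// Number of leading zero bits in a 32-bit value (32 when the value is zero).
int countLeadingZeros32(uint32_t v) {
  int n = 0;
  for (uint32_t bit = 0x80000000u; bit != 0 && (v & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

// Assumed proximity measure: the more high-order bits two ranks share,
// the larger the result and the "closer" the sections are.
int rankProximity(uint32_t a, uint32_t b) { return countLeadingZeros32(a ^ b); }

// Rank values copied from the section layout above.
const uint32_t kProgbitsRank = 0x8060c00;
const uint32_t kNoteRank = 0x8040003;
const uint32_t kTextRank = 0x8061000;
```

Under these assumptions `rankProximity(kTextRank, kProgbitsRank)` is larger than `rankProximity(kTextRank, kNoteRank)`, which is consistent with the orphan `.text` landing next to progbits, ahead of note.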
---
Identical to 2e0cfe69d0d705e9c5d5f217625bf7e3a0e90871.
The revert 30c10fda2ba539e70bff4f05625ec6358c0f7502 is wrong.
The patch introduces the gmock-based unittest infrastructure for PGO
Instrumentation and adds some test cases to check whether the
instrumentation has taken place. The testing infrastructure for analysis
modules was borrowed from the LoopPassManagerTest unittest and
simplified a bit to handle module analysis passes only. Actually, we are
testing whether the result of a trivial analysis pass was invalidated by
the PGOInstrumentGen one: we exploit the fact that the pass invalidates all
the analysis results after a module was instrumented.
NFC.
This patch enhances the SCEVAAResult::alias() interface to handle two
pointers with different pointer bases.
Before calling getMinusSCEV(), we first try to explicitly convert
both pointers into ptrtoint expressions.
Either both pointers are converted with ptrtoint or neither is, so we can't
end up with a ptr + int mix.
Extend NVPTX DAG combining logic to distribute a mul instruction across
an add of 1 into a mad where possible. In addition, add support for
transposing a mul through a select with an option of 1, if that would
allow further mul folding.
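The two rewrites rest on simple arithmetic identities, sketched below on 32-bit unsigned integers. This only models the algebra, not the NVPTX DAG combine itself. The function names are hypothetical and `mad` is a plain stand-in for a multiply-add instruction.

```cpp
#include <cstdint>

// Stand-in for a multiply-add instruction: a * b + c (wrapping arithmetic).
uint32_t mad(uint32_t a, uint32_t b, uint32_t c) { return a * b + c; }

// mul x, (add y, 1)  ==>  mad x, y, x
uint32_t mulOfAddOne(uint32_t x, uint32_t y) { return x * (y + 1); }
uint32_t asMad(uint32_t x, uint32_t y) { return mad(x, y, x); }

// mul x, (select c, 1, y)  ==>  select c, x, (mul x, y)
// After the transpose, the remaining mul is exposed to further folding.
uint32_t mulOfSelect(bool c, uint32_t x, uint32_t y) {
  return x * (c ? 1u : y);
}
uint32_t selectOfMul(bool c, uint32_t x, uint32_t y) {
  return c ? x : x * y;
}
```

Because unsigned arithmetic wraps, both pairs of functions agree for every input, including values where `y + 1` overflows.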
This is one of the major changes we (Microsoft) have made in the version
of asan we ship with Visual Studio.
@amyw-msft wrote a blog post outlining this work at
https://devblogs.microsoft.com/cppblog/msvc-address-sanitizer-one-dll-for-all-runtime-configurations/
> With Visual Studio 2022 version 17.7 Preview 3, we have refactored the
MSVC Address Sanitizer (ASan) to depend on one runtime DLL regardless of
the runtime configuration. This simplifies project onboarding and
supports more scenarios, particularly for projects statically linked
(/MT, /MTd) to the C Runtimes. However, static configurations have a new
dependency on the ASan runtime DLL.
> Summary of the changes:
> * ASan now works with /MT or /MTd built DLLs when the host EXE was not
compiled with ASan. This includes Windows services, COM components, and
plugins.
> * Configuring your project with ASan is now simpler, since your project
doesn’t need to uniformly specify the same [runtime
configuration](https://learn.microsoft.com/en-us/cpp/build/reference/md-mt-ld-use-run-time-library?view=msvc-170)
(/MT, /MTd, /MD, /MDd).
> * ASan workflows and pipelines for /MT or /MTd built projects will need to
ensure the ASan DLL (clang_rt.asan_dynamic-<arch>.dll) is available on
PATH.
> * The names of the ASan .lib files needed by the linker have changed (the
linker normally takes care of this if not manually specifying lib names
via /INFERASANLIBS)
> * You cannot mix ASan-compiled binaries from previous versions of the MSVC
Address Sanitizer (this is always true, but especially true in this
case).
Here's the description of these changes from our internal PR:
1. Build one DLL that includes everything debug mode needs (not included
here, already contributed upstream).
* Remove #if _DEBUG checks everywhere.
* In some places, this needed to be replaced with a runtime check. In
asan_win.cpp, IsDebugRuntimePresent was added where we are searching for
allocations prior to ASAN initialization.
* In asan_win_runtime_functions.cpp and interception_win.cpp, we need to
be aware of debug runtime DLLs even when not built with _DEBUG.
2. Redirect statically linked functions to the ASAN DLL for /MT
* New exports for each of the C allocation APIs so that the statically
linked portion of the runtime can call them (see asan_malloc_win.cpp,
search MALLOC_DLL_EXPORT). Since we want our stack trace information to
be accurate and without noise, this means we need to capture stack frame
info from the original call and tell it to our DLL export. For this, I
have reused the __asan_win_new_delete_data used for op new/delete
support from asan_win_new_delete_thunk_common.h and moved it into
asan_win_thunk_common.h renamed as __asan_win_stack_data.
* For the C allocation APIs, a new file is included in the
statically-linked /WHOLEARCHIVE lib - asan_malloc_win_thunk.cpp. These
functions simply provide definitions for malloc/free/etc to be used
instead of the UCRT's definitions for /MT and instead call the ASAN DLL
export. /INFERASANLIBS ensures libucrt.lib will not take precedence via
/WHOLEARCHIVE.
* For other APIs, the interception code was called, so a new export is
provided: __sanitizer_override_function.
__sanitizer_override_function_by_addr is also provided to support
__except_handler4 on x86 (due to the security cookie being per-module).
3. Support weak symbols for /MD
* We have customers (CoreCLR) that rely on this behavior and would force
/MT to get it.
* There was sanitizer_win_weak_interception.cpp before, which did some
stuff for setting up the .WEAK section, but this only worked on /MT. Now
stuff registered in the .WEAK section is passed to the ASAN DLL via new
export __sanitizer_register_weak_function (impl in
sanitizer_win_interception.cpp). Unlike linux, multiple weak symbol
registrations are possible here. Current behavior is to give priority on
module load order such that whoever loads last (so priority is given to
the EXE) will have their weak symbol registered.
* Unfortunately, the registration can only occur during the user module
startup, which is after ASAN DLL startup, so any weak symbols used by
ASAN during initialization will not be picked up. This is most notable
for __asan_default_options and friends (see asan_flags.cpp). A mechanism
was made to add a callback for when a certain weak symbol was
registered, so now we process __asan_default_options during module
startup instead of ASAN startup. This is a change in behavior, but
there's no real way around this due to how DLLs are.
4. Build reorganization
* I noticed that our current build configuration is very MSVC-specific
and so did a bit of reworking. Removed a lot of
create_multiple_windows_obj_lib use since it's no longer needed and it
changed how we needed to refer to each object_lib by adding runtime
configuration to the name, conflicting with how it works for non-MSVC.
* No more Win32 static build, use /MD everywhere.
* Building with /Zl to avoid defaultlib warnings.
In addition:
* I've reapplied "[sanitizer][asan][win] Intercept _strdup on Windows
instead of strdup" which broke the previous static asan runtime. That
runtime is gone now and this change is required for the strdup tests to
work.
* I've modified the MSVC clang driver to support linking the correct
asan libraries, including via defining _DLL (which triggers different
defaultlibs and should result in the asan dll thunk being linked, along
with the dll CRT, via defaultlib directives).
* I've made passing -static-libsan an error on windows, and made
-shared-libsan the default. I'm not sure I did this correctly, or in the
best way.
* Modified the test harnesses to add substitutions for the dynamic and
static thunks and to make the library substitutions point to the dynamic
asan runtime for all test configurations on windows. Both the static and
dynamic windows test configurations remain, because they correspond to
the static and dynamic CRT, not the static and dynamic asan runtime
library.
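The weak symbol handling described in item 3 can be pictured as a small registry with "last registration wins" semantics and a callback that fires on registration. The sketch below is purely illustrative: `WeakRegistry`, `onRegister`, `registerWeak`, `lookup` and the two option providers are hypothetical names and do not match the sanitizer runtime's exports.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: these names do not match the sanitizer runtime's API.
using WeakFn = const char *(*)();
using Callback = std::function<void(WeakFn)>;

class WeakRegistry {
public:
  // Ask to be notified each time `name` is registered.
  void onRegister(const std::string &name, Callback cb) {
    callbacks_[name].push_back(cb);
  }

  // Called as each module starts up. A later registration replaces an
  // earlier one, so the module that loads last wins.
  void registerWeak(const std::string &name, WeakFn fn) {
    table_[name] = fn;
    auto it = callbacks_.find(name);
    if (it == callbacks_.end())
      return;
    for (const Callback &cb : it->second)
      cb(fn);
  }

  // Currently registered function, or nullptr if none was registered.
  WeakFn lookup(const std::string &name) const {
    auto it = table_.find(name);
    return it == table_.end() ? nullptr : it->second;
  }

private:
  std::map<std::string, WeakFn> table_;
  std::map<std::string, std::vector<Callback>> callbacks_;
};

// Hypothetical default-option providers supplied by two different modules.
const char *dllDefaultOptions() { return "from-dll"; }
const char *exeDefaultOptions() { return "from-exe"; }
```

In this model the runtime would install a callback for the options symbol up front and process the options whenever a module registers it, rather than reading the symbol during its own initialization.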
---------
Co-authored-by: Amy Wishnousky <amyw@microsoft.com>
Similar to #93596, this moves the signed vnclip patterns into DAG
combine.
This will allow us to support more than 1 level of truncate in a
future patch.
This patch uses `DIExpression::foldConstantMath()` on the result of a
salvaged expression; that is, it runs the folding optimizations after an
expression has been salvaged completely, to reduce how many times the
fold optimization function is called. This should help reduce the
size of DIExpressions that grow because of salvaging debug info.
After checking the size of the dSYM with and without this change, I saw
a decrease of about 300KB, where the debug_loc section is about 1.6 GB
in size.
The debug_loc section shrank by 212KB (it is 193MB in
size); the rest comes from the debug_info section.
This is part of a stack of patches and comes after:
https://github.com/llvm/llvm-project/pull/69768
https://github.com/llvm/llvm-project/pull/71717
https://github.com/llvm/llvm-project/pull/71718
https://github.com/llvm/llvm-project/pull/71719
DIExpressions can get very long and have a lot of redundant operations.
This function uses simple pattern matching to fold constant math that
can be evaluated at compile time.
The hope is that other people can contribute other patterns as well.
I also couldn't see a good way of combining this with
`DIExpression::constantFold` so it stands alone.
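As an illustration of the kind of pattern matching involved, the sketch below folds consecutive `DW_OP_plus_uconst` operations into one and drops an addition of zero. It is a simplified stand-in, not `DIExpression::foldConstantMath()`: it works on a flat vector, the function name `foldPlusUconst` is hypothetical, and it assumes `DW_OP_plus_uconst` is the only operation in the input that carries an operand.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Local stand-ins that use the DWARF opcode encodings.
const uint64_t OpDeref = 0x06;      // DW_OP_deref, no operand
const uint64_t OpPlusUconst = 0x23; // DW_OP_plus_uconst <constant>

// Fold each run of `plus_uconst C` into a single op and drop `plus_uconst 0`.
// Simplifying assumption: every other element is an operand-free operation.
std::vector<uint64_t> foldPlusUconst(const std::vector<uint64_t> &expr) {
  std::vector<uint64_t> out;
  std::size_t i = 0;
  while (i < expr.size()) {
    if (expr[i] == OpPlusUconst && i + 1 < expr.size()) {
      uint64_t sum = 0;
      // Accumulate the run, stopping early if the sum would overflow.
      while (i + 1 < expr.size() && expr[i] == OpPlusUconst &&
             expr[i + 1] <= UINT64_MAX - sum) {
        sum += expr[i + 1];
        i += 2;
      }
      if (sum != 0) {
        out.push_back(OpPlusUconst);
        out.push_back(sum);
      }
      continue;
    }
    out.push_back(expr[i++]);
  }
  return out;
}
```

For example, the sequence `plus_uconst 2, plus_uconst 3, deref, plus_uconst 0` becomes `plus_uconst 5, deref` under these assumptions.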
This is part of a stack of patches and comes after
https://github.com/llvm/llvm-project/pull/69768
https://github.com/llvm/llvm-project/pull/71717
For each pair of types, we had 3 identical tests using umin with the
unsigned max value.
This patch replaces two of them with smin+smax cases that can be
implemented with a signed vmax followed by a vnclipu.
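The equivalence behind the smin+smax case can be checked on scalars. The sketch below assumes a clamp of a signed 16-bit value to [0, 255] followed by a truncate to 8 bits, and models vnclipu with a zero shift as a plain unsigned saturating narrow. The element widths and function names are assumptions made for this example.

```cpp
#include <algorithm>
#include <cstdint>

// Reference semantics: clamp a signed 16-bit value to [0, 255] with
// smax + smin, then truncate to 8 bits.
uint8_t clampWithSminSmax(int16_t x) {
  int16_t lo = std::max<int16_t>(x, 0);
  int16_t clamped = std::min<int16_t>(lo, 255);
  return static_cast<uint8_t>(clamped);
}

// Model of an unsigned saturating narrow from 16 to 8 bits.
uint8_t unsignedClip(uint16_t x) {
  return x > 255 ? 255 : static_cast<uint8_t>(x);
}

// Signed max with zero, then reinterpret as unsigned and saturate.
uint8_t smaxThenClip(int16_t x) {
  int16_t nonNegative = std::max<int16_t>(x, 0);
  return unsignedClip(static_cast<uint16_t>(nonNegative));
}
```

Once the signed max has removed the negative values, the unsigned saturation supplies the upper bound, so the explicit smin is no longer needed.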
Currently, if an image_atomic instruction has the 'tfe' operand, the
llvm-mc assembler in general rejects it. The only exception is when
dmask is 0x1 and the instruction is not image_atomic_cmpswap (e.g.,
image_atomic_add v[5:6], v252, s[8:15] dmask:0x1 tfe). This patch fixes
this problem and allows tfe to be specified in image_atomic
instructions.
---------
Co-authored-by: Jun Wang <jun.wang7@amd.com>
* Add support for G_LOAD from G_CONSTANT_POOL on X86 and X64
* Add X86GlobalBaseRegPass to handle base register initialization for
X86.
* Fix vector type legalization for G_STORE and G_LOAD as well as enable
scalarization for them.
* Custom lower G_BUILD_VECTOR into G_LOAD from G_CONSTANT_POOL.
This commit adds two functions to the DIExpressionCursor class.
`peekNextN(unsigned)` works like peekNext, but lets you peek the next
Nth element.
`assignNewExpr(ArrayRef<uint64_t>)` lets you assign a new expression to
the same DIExpressionCursor object.
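The intent of the two functions can be sketched with a much simpler cursor over a flat vector. `ExprCursor` below is a hypothetical stand-in, not LLVM's DIExpressionCursor: it treats every element as one item and ignores operands, and it assumes that `peekNextN(1)` is equivalent to `peekNext()` and that assigning a new expression resets the position.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for a cursor over expression elements.
class ExprCursor {
public:
  explicit ExprCursor(const std::vector<uint64_t> &expr)
      : expr_(expr), pos_(0) {}

  // Current element, or nullptr at the end.
  const uint64_t *peek() const { return peekNextN(0); }

  // Element immediately after the current one.
  const uint64_t *peekNext() const { return peekNextN(1); }

  // Element n positions after the current one, or nullptr if out of range.
  const uint64_t *peekNextN(unsigned n) const {
    return n < expr_.size() - pos_ ? &expr_[pos_ + n] : nullptr;
  }

  // Advance by n elements, stopping at the end.
  void consume(unsigned n) {
    pos_ = std::min<std::size_t>(expr_.size(), pos_ + n);
  }

  // Point the same cursor object at a new expression, starting from its
  // first element.
  void assignNewExpr(const std::vector<uint64_t> &expr) {
    expr_ = expr;
    pos_ = 0;
  }

private:
  std::vector<uint64_t> expr_;
  std::size_t pos_;
};
```

A pattern matcher can use the N-element lookahead to test a window of operations before deciding to consume them, and `assignNewExpr` lets it restart on a rewritten expression without constructing a new cursor.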
This is part of a stack of patches, it comes after
https://github.com/llvm/llvm-project/pull/69768
This is an NFC patch to move DIExpressionCursor to DebugInfoMetadata.h, so
that it can be used by classes in that header file.
Specifically, I want to use DIExpressionCursor in a subsequent patch:
https://github.com/llvm/llvm-project/pull/71718
Flang triggers some OOMs on Windows CI right now. This is disruptive to
MLIR and LLVM changes that don't touch Flang, so we disable
building Flang on Windows only for PRs that don't touch Flang. The
testing on Linux is unchanged, and the post-merge Windows testing
still fully covers Flang.
Such an expression does not correspond to a variable in the source code
and thus does not have a debug location. When the user collects perf data on
the program, if the intermediate memory load instruction is sampled, it
cannot be attributed to any variable/class member, which causes the
sampling results to be under-counted.
This patch adds an option `-fdebug_info_for_pointer_type` to generate a
pseudo variable and its debug info for an intermediate expression with
pointer dereferencing, so that perf data collected on the instruction of
that expression can be attributed to the correct class member.
This is a prototype so comments are needed.
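For illustration, here is a small example of an intermediate expression with pointer dereferencing. The types and names are hypothetical and only show where the unnamed intermediate load comes from.

```cpp
// Hypothetical types, for illustration only.
struct Inner {
  int value;
};

struct Outer {
  Inner *inner;
};

int readValue(Outer *outer) {
  // `outer->inner` is loaded into an unnamed temporary before `->value` is
  // read. That intermediate load corresponds to no source-level variable,
  // so a sample taken on it has nothing to be attributed to.
  return outer->inner->value;
}
```

A sample that lands on the load of `outer->inner` is the case the pseudo variable is meant to cover, so that it can be attributed to the `inner` member.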
... as flags have changed. This allows us to revisit the
`osd->osec.hasInputSections` condition in `getRankProximity` (originally
introduced as `Sec->Live` in https://reviews.llvm.org/D61197).
Without a linker script, `--orphan-handling=error` or `=warn` reports
all input sections, including even well-known sections like `.text`,
`.bss`, `.dynamic`, or `.symtab`. However, in this case, no sections
should be considered orphans because they all are placed with the same
default rules. This patch suppresses errors/warnings for placing orphan
sections if no linker script with the `SECTIONS` command is provided.
The proposed behavior matches GNU gold. GNU ld in the same scenario only
reports sections that are not in its default linker script, thus, it
avoids complaining about `.text` and similar.
Remove the setupterm workaround on macOS, which caused an issue after the
removal of the terminfo dependency. There's a comment that explains why
the workaround is present, but neither Jim nor I were able to reproduce
the issue by setting TERM to vt100.
Complete C++ type information can be quite expensive - and there's
limited value in representing every member function, even those that
can't be called (we don't do similarly for every non-member function
anyway). So add a flag to opt out of this behavior for experimenting
with this more terse behavior.
I think Sony already does this by default, so perhaps with a change to
the defaults, Sony can migrate to this rather than a downstream patch.
This breaks current debuggers in some expected ways - but those
breakages are visible without this feature too. Consider member function
template instantiations - they can't be consistently enumerated in every
translation unit:
a.h:
```
struct t1 {
  template <int i>
  static int f1() {
    return i;
  }
};
namespace ns {
template <int i>
int f1() {
  return i;
}
} // namespace ns
```
a.cpp:
```
#include "a.h"
void f1() {
  t1::f1<0>();
  ns::f1<0>();
}
```
b.cpp:
```
#include "a.h"
void f1();
int main() {
  f1();
  t1::f1<1>();
  ns::f1<1>();
}
```
```
(gdb) p ns::f1<0>()
$1 = 0
(gdb) p ns::f1<1>()
$2 = 1
(gdb) p t1::f1<0>()
Couldn't find method t1::f1<0>
(gdb) p t1::f1<1>()
$3 = 1
(gdb) s
f1 () at a.cpp:3
3 t1::f1<0>();
(gdb) p t1::f1<0>()
$4 = 0
(gdb) p t1::f1<1>()
Couldn't find method t1::f1<1>
(gdb)
```
(Other similar non-canonical features are implicit special members
(copy/move ctor/assignment operator, default ctor) and nested types (e.g.
the pimpl idiom, where the nested type is declared-but-not-defined in one
TU, and defined in another TU).)
lldb can't parse the template expressions above, so I'm not sure how to
test it there, but I'd guess it has similar problems. (See
https://stackoverflow.com/questions/64602475/how-to-print-value-returned-by-template-member-function-in-gdb-lldb-debugging
so I guess that's just totally not supported in lldb, how
unfortunate. Implicit special members are instantiated implicitly by
lldb, so missing those doesn't tickle the same issue.)
Some very rudimentary numbers for a clang debug build:
.debug_info section size:
-g: 476MiB
-g -fdebug-types-section: 357MiB
-g -gomit-unreferenced-members: 340MiB
It also means a major reduction in .debug_str size, down to 175MiB.
-fdebug-types-section doesn't reduce string usage, so the first two
examples have the same .debug_str size, 247MiB.
So for total clang binary size (I don't have a quick "debug section size
reduction" on-hand): 1.45 GiB (no type units) -> 1.34 -> 1.22, so it
saves about 120MiB of binary size.
Also open to any riffing on the flag name for sure.
@probinson - would this be an accurate upstreaming of your internal
handling/would you use this functionality? If it wouldn't be useful to
you, it's maybe not worth adding upstream yet - not sure we'll use it at
Google, but if it was useful to you folks and meant other folks could
test with it it seemed maybe useful.
Original Differential Revision: https://reviews.llvm.org/D152017
This reverts commit 7832769d329ead264aff238c06dce086b3a74922.
This was previously reverted due to a test failure on the Windows builder. I
think this was because we didn't specify the triple and assumed Windows.
The other tests use the full triple specifying Linux, so we follow suit
here.
---
We are using PLTs for cortex-m33 which only supports thumb. More
specifically, this is for a very restricted use case. There's no MMU so
there's no sharing of virtual addresses between two processes, but this
is fine. The MCU is used for running [chre
nanoapps](https://android.googlesource.com/platform/system/chre/+/HEAD/doc/nanoapp_overview.md)
for android. Each nanoapp is a shared library (but effectively acts as
an executable containing a test suite) that is loaded and run on the MCU
one binary at a time and there's only one process running at a time, so
we ensure that the same text segment cannot be shared by two different
running executables. GNU LD supports thumb PLTs but we want to migrate
to a clang toolchain and use LLD, so thumb PLTs are needed.
Previously, the reltable we created was named "reltable." + FuncName, which can
result in multiple tables named "reltable." + FuncName + ".{number}" if
we substitute multiple tables in a function. Since we replace the
original global, it makes it easier to just take over the original
global's name. Functionally, this doesn't change the IR emitted, just
global names.
This is a subset of PR 93355 that I'm breaking into multiple patches.