1788 Commits

Author SHA1 Message Date
Erich Keane
dc152659b4 Have cpu-specific variants set 'tune-cpu' as an optimization hint
Due to various implementation constraints, despite the programmer
choosing a 'processor' cpu_dispatch/cpu_specific needs to use the
'feature' list of a processor to identify it. This results in the
identified processor in source-code not being propogated to the
optimizer, and thus, not able to be tuned for.

This patch changes to use the actual cpu as written for tune-cpu so that
opt can make decisions based on the cpu-as-spelled, which should better
match the behavior expected by the programmer.

Note that the 'valid' list of processors for x86 is in
llvm/include/llvm/Support/X86TargetParser.def. At the moment, this list
contains only Intel processors, but other vendors may wish to add their
own entries as 'alias'es (or with different feature lists!).

If this is not done, there is two potential performance issues with the
patch, but I believe them to be worth it in light of the improvements to
behavior and performance.

1- In the event that the user spelled "ProcessorB", but we only have the
features available to test for "ProcessorA" (where A is B minus
features),
AND there is an optimization opportunity for "B" that negatively affects
"A", the optimizer will likely choose to do so.

2- In the event that the user spelled VendorI's processor, and the
feature
list allows it to run on VendorA's processor of similar features, AND
there
is an optimization opportunity for VendorIs that negatively affects
"A"s,
the optimizer will likely choose to do so. This can be fixed by adding
an
alias to X86TargetParser.def.

Differential Revision: https://reviews.llvm.org/D121410
2022-03-14 06:14:30 -07:00
Itay Bookstein
f3480390be [clang][CodeGen] Avoid emitting ifuncs with undefined resolvers
The purpose of this change is to fix the following codegen bug:

```
// main.c
__attribute__((cpu_specific(generic)))
int *foo(void) { static int z; return &z;}
int main() { return *foo() = 5; }

// other.c
__attribute__((cpu_dispatch(generic))) int *foo(void);

// run:
clang main.c other.c -o main; ./main
```

This will segfault prior to the change, and return the correct
exit code 5 after the change.

The underlying cause is that when a translation unit contains
a cpu_specific function without the corresponding cpu_dispatch
the generated code binds the reference to foo() against a
GlobalIFunc whose resolver is undefined. This is invalid: the
resolver must be defined in the same translation unit as the
ifunc, but historically the LLVM bitcode verifier did not check
that. The generated code then binds against the resolver rather
than the ifunc, so it ends up calling the resolver rather than
the resolvee. In the example above it treats its return value as
an int *, therefore trying to write to program text.

The root issue at the representation level is that GlobalIFunc,
like GlobalAlias, does not support a "declaration" state. The
object which provides the correct semantics in these cases
is a Function declaration, but unlike Functions, changing a
declaration to a definition in the GlobalIFunc case constitutes
a change of the object type, as opposed to simply emitting code
into a Function.

I think this limitation is unlikely to change, so I implemented
the fix by returning a function declaration rather than an ifunc
when encountering cpu_specific, and upgrading it to an ifunc
when emitting cpu_dispatch.
This uses `takeName` + `replaceAllUsesWith` in similar vein to
other places where the correct IR object type cannot be known
locally/up-front, like in `CodeGenModule::EmitAliasDefinition`.

Previous discussion in: https://reviews.llvm.org/D112349

Signed-off-by: Itay Bookstein <ibookstein@gmail.com>

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D120266
2022-02-26 11:17:49 +02:00
Nikita Popov
5065076698 [CodeGen] Rename deprecated Address constructor
To make uses of the deprecated constructor easier to spot, and to
ensure that no new uses are introduced, rename it to
Address::deprecated().

While doing the rename, I've filled in element types in cases
where it was relatively obvious, but we're still left with 135
calls to the deprecated constructor.
2022-02-17 11:26:42 +01:00
Momchil Velikov
6398903ac8 Extend the uwtable attribute with unwind table kind
We have the `clang -cc1` command-line option `-funwind-tables=1|2` and
the codegen option `VALUE_CODEGENOPT(UnwindTables, 2, 0) ///< Unwind
tables (1) or asynchronous unwind tables (2)`. However, this is
encoded in LLVM IR by the presence or the absence of the `uwtable`
attribute, i.e.  we lose the information whether to generate want just
some unwind tables or asynchronous unwind tables.

Asynchronous unwind tables take more space in the runtime image, I'd
estimate something like 80-90% more, as the difference is adding
roughly the same number of CFI directives as for prologues, only a bit
simpler (e.g. `.cfi_offset reg, off` vs. `.cfi_restore reg`). Or even
more, if you consider tail duplication of epilogue blocks.
Asynchronous unwind tables could also restrict code generation to
having only a finite number of frame pointer adjustments (an example
of *not* having a finite number of `SP` adjustments is on AArch64 when
untagging the stack (MTE) in some cases the compiler can modify `SP`
in a loop).
Having the CFI precise up to an instruction generally also means one
cannot bundle together CFI instructions once the prologue is done,
they need to be interspersed with ordinary instructions, which means
extra `DW_CFA_advance_loc` commands, further increasing the unwind
tables size.

That is to say, async unwind tables impose a non-negligible overhead,
yet for the most common use cases (like C++ exceptions), they are not
even needed.

This patch extends the `uwtable` attribute with an optional
value:
      -  `uwtable` (default to `async`)
      -  `uwtable(sync)`, synchronous unwind tables
      -  `uwtable(async)`, asynchronous (instruction precise) unwind tables

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D114543
2022-02-14 14:35:02 +00:00
Arthur Eubanks
87dd3d350c [clang][OpaquePtr] Remove call to getPointerElementType() in CodeGenModule::GetAddrOfGlobalTemporary() 2022-02-11 10:39:49 -08:00
Sameer Sahasrabuddhe
d8f99bb6e0 [AMDGPU] replace hostcall module flag with function attribute
The module flag to indicate use of hostcall is insufficient to catch
all cases where hostcall might be in use by a kernel. This is now
replaced by a function attribute that gets propagated to top-level
kernel functions via their respective call-graph.

If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the
default behaviour is to emit kernel metadata indicating that the
kernel uses the hostcall buffer pointer passed as an implicit
argument.

The attribute may be placed explicitly by the user, or inferred by the
AMDGPU attributor by examining the call-graph. The attribute is
inferred only if the function is not being sanitized, and the
implictarg_ptr does not result in a load of any byte in the hostcall
pointer argument.

Reviewed By: jdoerfert, arsenm, kpyzhov

Differential Revision: https://reviews.llvm.org/D119216
2022-02-11 22:51:56 +05:30
Yaxun (Sam) Liu
1d97cb1f6e [HIP] Emit amdgpu_code_object_version module flag
code object version determines ABI, therefore should not be mixed.

This patch emits amdgpu_code_object_version module flag in LLVM IR
based on code object version (default 4).

The amdgpu_code_object_version value is code object version times 100.

LLVM IR with different amdgpu_code_object_version module flag cannot
be linked.

The -cc1 option -mcode-object-version=none is for ROCm device library use
only, which supports multiple ABI.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D119026
2022-02-08 21:58:40 -05:00
Yaxun (Sam) Liu
171da443d5 [HIPSPV] Fix literals are mapped to Generic address space
This issue is an oversight in D108621.

Literals in HIP are emitted as global constant variables with default
address space which maps to Generic address space for HIPSPV. In
SPIR-V such variables translate to OpVariable instructions with
Generic storage class which are not legal. Fix by mapping literals
to CrossWorkGroup address space.

The literals are not mapped to UniformConstant because the “flat”
pointers in HIP may reference them and “flat” pointers are modeled
as Generic pointers in SPIR-V. In SPIR-V/OpenCL UniformConstant
pointers may not be casted to Generic.

Patch by: Henry Linjamäki

Reviewed by: Yaxun Liu

Differential Revision: https://reviews.llvm.org/D118876
2022-02-05 17:26:52 -05:00
Hans Wennborg
853e0aa424 Don't dllexport reference temporaries
Even if the reference itself is dllexport, the temporary should not be.
In fact, we're already giving it internal linkage, so dllexporting it
is not just wasteful, but will fail to link, as in the example below:

  $ cat /tmp/a.cc
  void _DllMainCRTStartup() {}
  const int __declspec(dllexport) &foo = 42;

  $ clang-cl -fuse-ld=lld /tmp/a.cc /Zl /link /dll /out:a.dll
  lld-link: error: <root>: undefined symbol: int const &foo::$RT1

Differential revision: https://reviews.llvm.org/D118980
2022-02-04 16:31:51 +01:00
Amilendra Kodithuwakku
1f08b08674 [clang][ARM] Emit warnings when PACBTI-M is used with unsupported architectures
Branch protection in M-class is supported by
 - Armv8.1-M.Main
 - Armv8-M.Main
 - Armv7-M

Attempting to enable this for other architectures, either by
command-line (e.g -mbranch-protection=bti) or by target attribute
in source code (e.g.  __attribute__((target("branch-protection=..."))) )
will generate a warning.

In both cases function attributes related to branch protection will not
be emitted. Regardless of the warning, module level attributes related to
branch protection will be emitted when it is enabled via the command-line.

The following people also contributed to this patch:
- Victor Campos

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D115501
2022-01-28 09:59:58 +00:00
Joao Moreira
82af95029e [X86] Enable ibt-seal optimization when LTO is used in Kernel
Intel's CET/IBT requires every indirect branch target to be an ENDBR instruction. Because of that, the compiler needs to correctly emit these instruction on function's prologues. Because this is a security feature, it is desirable that only actual indirect-branch-targeted functions are emitted with ENDBRs. While it is possible to identify address-taken functions through LTO, minimizing these ENDBR instructions remains a hard task for user-space binaries because exported functions may end being reachable through PLT entries, that will use an indirect branch for such. Because this cannot be determined during compilation-time, the compiler currently emits ENDBRs to every non-local-linkage function.

Despite the challenge presented for user-space, the kernel landscape is different as no PLTs are used. With the intent of providing the most fit ENDBR emission for the kernel, kernel developers proposed an optimization named "ibt-seal" which replaces the ENDBRs for NOPs directly in the binary. The discussion of this feature can be seen in [1].

This diff brings the enablement of the flag -mibt-seal, which in combination with LTO enforces a different policy for ENDBR placement in when the code-model is set to "kernel". In this scenario, the compiler will only emit ENDBRs to address taken functions, ignoring non-address taken functions that are don't have local linkage.

A comparison between an LTO-compiled kernel binaries without and with the -mibt-seal feature enabled shows that when -mibt-seal was used, the number of ENDBRs in the vmlinux.o binary patched by objtool decreased from 44383 to 33192, and that the number of superfluous ENDBR instructions nopped-out decreased from 11730 to 540.

The 540 missed superfluous ENDBRs need to be investigated further, but hypotheses are: assembly code not being taken care of by the compiler, kernel exported symbols mechanisms creating bogus address taken situations or even these being removed due to other binary optimizations like kernel's static_calls. For now, I assume that the large drop in the number of ENDBR instructions already justifies the feature being merged.

[1] - https://lkml.org/lkml/2021/11/22/591

Reviewed By: xiangzhangllvm

Differential Revision: https://reviews.llvm.org/D116070
2022-01-21 10:55:34 +08:00
Yaxun (Sam) Liu
85c2bd2a0e Prevent adding module flag amdgpu_hostcall multiple times
HIP program with printf call fails to compile with -fsanitize=address
option, because of appending module flag - amdgpu_hostcall twice, one
for printf and one for sanitize option. This patch fixes that issue.

Patch by: Praveen Velliengiri

Reviewed by: Yaxun Liu, Roman Lebedev

Differential Revision: https://reviews.llvm.org/D116216
2022-01-19 12:52:33 -05:00
Nikita Popov
c63a3175c2 [AttrBuilder] Remove ctor accepting AttributeList and Index
Use the AttributeSet constructor instead. There's no good reason
why AttrBuilder itself should exact the AttributeSet from the
AttributeList. Moving this out of the AttrBuilder generally results
in cleaner code.
2022-01-15 22:39:31 +01:00
Erich Keane
2bcba21c8b [CPU-Dispatch] Make sure Dispatch names get updated if previously mangled
Cases where there is a mangling of a cpu-dispatch/cpu-specific function
before the function becomes 'multiversion' (such as a member function)
causes the wrong name to be emitted for one of the variants/resolver,
since the name is cached.  Make sure we invalidate the cache in
cpu-dispatch/cpu-specific modes, like we previously did for just target
multiversioning.
2022-01-14 10:45:55 -08:00
Erich Keane
b699e8b11a Add another assert to cpu-dispatch emission to help track down a tough
to repro error.

As mentioned yesterday, I've got a problem that I can only reproduce on
Godbolt (none of the build configs on my local machine!), so this is at
least somewhat usable until I figure out a cause.
2022-01-13 06:54:08 -08:00
Erich Keane
6e77ad11ff Add an assert in cpudispatch emit to try to track down an error.
I'm attempting to debug an issue that I can only get to happen on
godbolt, where the cpu-dispatch resolver for an out of line member
function is generated with the wrong name, causing a link failure.
2022-01-12 10:31:28 -08:00
Serge Guelton
d2cc6c2d0c Use a sorted array instead of a map to store AttrBuilder string attributes
Using and std::map<SmallString, SmallString> for target dependent attributes is
inefficient: it makes its constructor slightly heavier, and involves extra
allocation for each new string attribute. Storing the attribute key/value as
strings implies extra allocation/copy step.

Use a sorted vector instead. Given the low number of attributes generally
involved, this is cheaper, as showcased by

https://llvm-compile-time-tracker.com/compare.php?from=5de322295f4ade692dc4f1823ae4450ad3c48af2&to=05bc480bf641a9e3b466619af43a2d123ee3f71d&stat=instructions

Differential Revision: https://reviews.llvm.org/D116599
2022-01-10 14:49:53 +01:00
Kazu Hirata
40446663c7 [clang] Use true/false instead of 1/0 (NFC)
Identified with modernize-use-bool-literals.
2022-01-09 00:19:47 -08:00
serge-sans-paille
9290ccc3c1 Introduce the AttributeMask class
This class is solely used as a lightweight and clean way to build a set of
attributes to be removed from an AttrBuilder. Previously AttrBuilder was used
both for building and removing, which introduced odd situation like creation of
Attribute with dummy value because the only relevant part was the attribute
kind.

Differential Revision: https://reviews.llvm.org/D116110
2022-01-04 15:37:46 +01:00
Sami Tolvanen
ec2e26eaf6 [Clang] Add __builtin_function_start
Control-Flow Integrity (CFI) replaces references to address-taken
functions with pointers to the CFI jump table. This is a problem
for low-level code, such as operating system kernels, which may
need the address of an actual function body without the jump table
indirection.

This change adds the __builtin_function_start() builtin, which
accepts an argument that can be constant-evaluated to a function,
and returns the address of the function body.

Link: https://github.com/ClangBuiltLinux/linux/issues/1353

Depends on D108478

Reviewed By: pcc, rjmccall

Differential Revision: https://reviews.llvm.org/D108479
2021-12-20 12:55:33 -08:00
Nikita Popov
c3b624a191 [CodeGen] Avoid deprecated ConstantAddress constructor
Change all uses of the deprecated constructor to pass the
element type explicitly and drop it.

For cases where the correct element type was not immediately
obvious to me or would require a slightly larger change I'm
falling back to explicitly calling getPointerElementType() for now.
2021-12-15 10:42:41 +01:00
Peter Collingbourne
0a14674f27 CodeGen: Strip exception specifications from function types in CFI type names.
With C++17 the exception specification has been made part of the
function type, and therefore part of mangled type names.

However, it's valid to convert function pointers with an exception
specification to function pointers with the same argument and return
types but without an exception specification, which means that e.g. a
function of type "void () noexcept" can be called through a pointer
of type "void ()". We must therefore consider the two types to be
compatible for CFI purposes.

We can do this by stripping the exception specification before mangling
the type name, which is what this patch does.

Differential Revision: https://reviews.llvm.org/D115015
2021-12-03 14:50:52 -05:00
Ties Stuij
e3b2f0226b [clang][ARM] PACBTI-M frontend support
Handle branch protection option on the commandline as well as a function
attribute. One patch for both mechanisms, as they use the same underlying
parsing mechanism.

These are recorded in a set of LLVM IR module-level attributes like we do for
AArch64 PAC/BTI (see https://reviews.llvm.org/D85649):

- command-line options are "translated" to module-level LLVM IR
  attributes (metadata).

- functions have PAC/BTI specific attributes iff the
  __attribute__((target("branch-protection=...))) was used in the function
  declaration.

- command-line option -mbranch-protection to armclang targeting Arm,
following this grammar:

branch-protection ::= "-mbranch-protection=" <protection>
protection ::=  "none" | "standard" | "bti" [ "+" <pac-ret-clause> ]
                | <pac-ret-clause> [ "+" "bti"]
pac-ret-clause ::= "pac-ret" [ "+" <pac-ret-option> ]
pac-ret-option ::= "leaf" ["+" "b-key"] | "b-key" ["+" "leaf"]

b-key is simply a placeholder to make it consistent with AArch64's
version. In Arm, however, it triggers a warning informing that b-key is
unsupported and a-key will be selected instead.

- Handle _attribute_((target(("branch-protection=..."))) for AArch32 with the
same grammer as the commandline options.

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Momchil Velikov
- Victor Campos
- Ties Stuij

Reviewed By: vhscampos

Differential Revision: https://reviews.llvm.org/D112421
2021-12-01 10:37:16 +00:00
Erich Keane
fc53eb69c2 Reapply 'Implement target_clones multiversioning'
See discussion in D51650, this change was a little aggressive in an
error while doing a 'while we were here', so this removes that error
condition, as it is apparently useful.

This reverts commit bb4934601d731465e01e2e22c80ce2dbe687d73f.
2021-11-29 06:30:01 -08:00
Phoebe Wang
de34a940ae [X86] Add -mskip-rax-setup support to align with GCC
AMD64 ABI mandates caller to specify the number of used SSE registers
when passing variable arguments.
GCC also provides option -mskip-rax-setup to skip the setup of rax when
SSE is disabled. This helps to reduce the code size, see pr23258.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D112413
2021-11-18 11:20:32 +08:00
Kazu Hirata
d0ac215dd5 [clang] Use isa instead of dyn_cast (NFC) 2021-11-14 09:32:40 -08:00
Adrian Kuegel
bb4934601d Revert "Implement target_clones multiversioning"
This reverts commit 9deab60ae710f8c4cc810cd680edfb64c803f42d.
There is a possibly unintended semantic change.
2021-11-12 11:05:58 +01:00
Erich Keane
9deab60ae7 Implement target_clones multiversioning
As discussed here: https://lwn.net/Articles/691932/

GCC6.0 adds target_clones multiversioning. This functionality is
an odd cross between the cpu_dispatch and 'target' MV, but is compatible
with neither.

This attribute allows you to list all options, then emits a separately
optimized version of each function per-option (similar to the
cpu_specific attribute). It automatically generates a resolver, just
like the other two.

The mangling however, is... ODD to say the least. The mangling format
is:
<normal_mangling>.<option string>.<option ordinal>.

Differential Revision:https://reviews.llvm.org/D51650
2021-11-11 11:11:16 -08:00
Yaxun (Sam) Liu
4b3881e9f3 Emit hidden hostcall argument for sanitized kernels
this patch - https://reviews.llvm.org/D110337 changes the way how hostcall
hidden argument is emitted for printf, but the sanitized kernels also use
hostcall buffer to report a error for invalid memory access, which is not
handled by the above patch and it leads to vdi runtime error:

Device::callbackQueue aborting with error : HSA_STATUS_ERROR_MEMORY_FAULT:
Agent attempted to access an inaccessible address. code: 0x2b

Patch by: Praveen Velliengiri

Reviewed by: Yaxun Liu, Matt Arsenault

Differential Revision: https://reviews.llvm.org/D112820
2021-11-10 17:05:57 -05:00
Yaxun (Sam) Liu
80072fde61 [CUDA][HIP] Allow comdat for kernels
Two identical instantiations of a template function can be emitted by two TU's
with linkonce_odr linkage without causing duplicate symbols in linker. MSVC
also requires these symbols be in comdat sections. Linux does not require
the symbols in comdat sections to be merged by linker but by default
clang puts them in comdat sections.

If a template kernel is instantiated identically in two TU's. MSVC requires
that them to be in comdat sections, otherwise MSVC linker will diagnose them as
duplicate symbols. However, currently clang does not put instantiated template
kernels in comdat sections, which causes link error for MSVC.

This patch allows putting instantiated template kernels into comdat sections.

Reviewed by: Artem Belevich, Reid Kleckner

Differential Revision: https://reviews.llvm.org/D112492
2021-11-10 16:42:23 -05:00
Itay Bookstein
9efce0baee [clang] Run LLVM Verifier in modes without CodeGen too
Previously, the Backend_Emit{Nothing,BC,LL} modes did
not run the LLVM verifier since it is usually added via
the TargetMachine::addPassesToEmitFile method according
to the DisableVerify parameter. This is called from
EmitAssemblyHelper::AddEmitPasses, which is only relevant
for BackendAction-s that require CodeGen.

Note:
* In these particular situations the verifier is added
  to the optimization pipeline rather than the codegen
  pipeline so that it runs prior to the BC/LL emission
  pass.
* This change applies to both the old and the new PMs.
* Because the clang tests use -emit-llvm ubiquitously,
  this change will enable the verifier for them.
* A small bug is fixed in emitIFuncDefinition so that
  the clang/test/CodeGen/ifunc.c test would pass:
  the emitIFuncDefinition incorrectly passed the
  GlobalDecl of the IFunc itself to the call to
  GetOrCreateLLVMFunction for creating the resolver.

Signed-off-by: Itay Bookstein <ibookstein@gmail.com>

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D113352
2021-11-09 23:57:13 +02:00
Itay Bookstein
3b1fd19357 [CodeGen] Diagnose and reject non-function ifunc resolvers
Signed-off-by: Itay Bookstein <ibookstein@gmail.com>

Reviewed By: MaskRay, erichkeane

Differential Revision: https://reviews.llvm.org/D112868
2021-11-09 23:51:36 +02:00
Atmn Patel
737c4a2673 [clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files
The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are
just code bloat. By removing them, the codebase gets a bit cleaner.

Reviewed By: jdoerfert, JonChesterfield, tianshilei1992

Differential Revision: https://reviews.llvm.org/D113421
2021-11-09 15:11:05 -05:00
Atmn Patel
ef717f3852 Revert "[clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files"
This reverts commit 81a7cad2ffc18f15b732f69d991c8398c979c5ca.
2021-11-09 02:10:42 -05:00
Atmn Patel
81a7cad2ff [clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files
The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are
just code bloat. By removing them, the codebase gets a bit cleaner.

Reviewed By: jdoerfert, JonChesterfield, tianshilei1992

Differential Revision: https://reviews.llvm.org/D113421
2021-11-09 01:52:52 -05:00
Benjamin Kramer
8adb6d6de2 [clang] Use llvm::reverse. NFCI. 2021-11-07 14:24:33 +01:00
Itay Bookstein
848812a55e [Verifier] Add verification logic for GlobalIFuncs
Verify that the resolver exists, that it is a defined
Function, and that its return type matches the ifunc's
type. Add corresponding check to BitcodeReader, change
clang to emit the correct type, and fix tests to comply.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D112349
2021-10-31 20:00:57 -07:00
Aaron Ballman
aad244dfc5 Revert "AddGlobalAnnotations for function with or without function body."
This reverts commit 121b2252de0eed68f2ddf5f09e924a6c35423d47.

The following code causes a crash in some circumstances:

  struct k {
    ~k() __attribute__((annotate(""))) {}
  };
  void m() { k(); }
2021-10-21 07:08:18 -04:00
Itay Bookstein
08ed216000 [IR] Refactor GlobalIFunc to inherit from GlobalObject, Remove GlobalIndirectSymbol
As discussed in:
* https://reviews.llvm.org/D94166
* https://lists.llvm.org/pipermail/llvm-dev/2020-September/145031.html

The GlobalIndirectSymbol class lost most of its meaning in
https://reviews.llvm.org/D109792, which disambiguated getBaseObject
(now getAliaseeObject) between GlobalIFunc and everything else.
In addition, as long as GlobalIFunc is not a GlobalObject and
getAliaseeObject returns GlobalObjects, a GlobalAlias whose aliasee
is a GlobalIFunc cannot currently be modeled properly. Creating
aliases for GlobalIFuncs does happen in the wild (e.g. glibc). In addition,
calling getAliaseeObject on a GlobalIFunc will currently return nullptr,
which is undesirable because it should return the object itself for
non-aliases.

This patch refactors the GlobalIFunc class to inherit directly from
GlobalObject, and removes GlobalIndirectSymbol (while inlining the
relevant parts into GlobalAlias and GlobalIFunc). This allows for
calling getAliaseeObject() on a GlobalIFunc to return the GlobalIFunc
itself, making getAliaseeObject() more consistent and enabling
alias-to-ifunc to be properly modeled in the IR.

I exercised some judgement in the API clients of GlobalIndirectSymbol:
some were 'monomorphized' for GlobalAlias and GlobalIFunc, and
some remained shared (with the type adapted to become GlobalValue).

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D108872
2021-10-20 10:29:47 -07:00
Kazu Hirata
d245f2e859 [clang] Use llvm::erase_if (NFC) 2021-10-17 13:50:29 -07:00
Chris Bieneman
121b2252de AddGlobalAnnotations for function with or without function body.
When AnnotateAttr is on a function, AddGlobalAnnotations is only called
in CodeGenModule::EmitGlobalFunctionDefinition which means AnnotateAttr
on function declaration without function body will be ignored.
The patch will move AddGlobalAnnotations  to
CodeGenModule::SetFunctionAttributes, so with or without function body,
the AnnotateAttr will get code gen for a function.

It'll help case when AnnotateAttr is on external function, and the
AnnotateAttr will be consumed in IR level.

For example, a pass to collect num of uses for functions with
__attribute((annotate("count_use"))) after optimizations,
As long as there's __attribute((annotate("count_use"))), function with
or without function body should be counted.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D111109

Patch by:  python3kgae (Xiang Li)
2021-10-11 14:50:34 -05:00
Arthur Eubanks
05392466f0 Reland [IR] Increase max alignment to 4GB
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.

This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.

The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.

Updating clang's max allowed alignment will come in a future patch.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D110451
2021-10-06 13:29:23 -07:00
Arthur Eubanks
569346f274 Revert "Reland [IR] Increase max alignment to 4GB"
This reverts commit 8d64314ffea55f2ad94c1b489586daa8ce30f451.
2021-10-06 11:38:11 -07:00
Arthur Eubanks
8d64314ffe Reland [IR] Increase max alignment to 4GB
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.

This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.

The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.

Updating clang's max allowed alignment will come in a future patch.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D110451
2021-10-06 11:03:51 -07:00
Arthur Eubanks
72cf8b6044 Revert "[IR] Increase max alignment to 4GB"
This reverts commit df84c1fe78130a86445d57563dea742e1b85156a.

Breaks some bots
2021-10-06 10:21:35 -07:00
Arthur Eubanks
df84c1fe78 [IR] Increase max alignment to 4GB
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.

This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.

The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.

Updating clang's max allowed alignment will come in a future patch.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D110451
2021-10-06 09:54:14 -07:00
Arthur Eubanks
aa53785f23 Reland [clang] Rework dontcall attributes
To avoid using the AST when emitting diagnostics, split the "dontcall"
attribute into "dontcall-warn" and "dontcall-error", and also add the
frontend attribute value as the LLVM attribute value. This gives us all
the information to report diagnostics we need from within the IR (aside
from access to the original source).

One downside is we directly use LLVM's demangler rather than using the
existing Clang diagnostic pretty printing of symbols.

Previous revisions didn't properly declare the new dependencies.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D110364
2021-09-28 15:31:30 -07:00
Arthur Eubanks
7833d20f1f Revert "[clang] Rework dontcall attributes"
This reverts commit 2943071e2ee0c7f31f34062a44d12aeb0e3a66fd.

Breaks bots
2021-09-28 14:49:27 -07:00
Arthur Eubanks
2943071e2e [clang] Rework dontcall attributes
To avoid using the AST when emitting diagnostics, split the "dontcall"
attribute into "dontcall-warn" and "dontcall-error", and also add the
frontend attribute value as the LLVM attribute value. This gives us all
the information to report diagnostics we need from within the IR (aside
from access to the original source).

One downside is we directly use LLVM's demangler rather than using the
existing Clang diagnostic pretty printing of symbols.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D110364
2021-09-28 14:21:10 -07:00
serge-sans-paille
c3717b6858 Simplify handling of builtin with inline redefinition
(This is a recommit of 3d6f49a56995b845 that should no longer break validation since
bd379915de38a9af3d65e1).

It is a common practice in glibc header to provide an inline redefinition of an
existing function. It is especially the case for fortified function.

Clang currently has an imperfect approach to the problem, using a combination of
trivially recursive function detection and noinline attribute.

Simplify the logic by suffixing these functions by `.inline` during codegen, so
that they are not recognized as builtin by llvm.

After that patch, clang passes all tests from https://github.com/serge-sans-paille/fortify-test-suite

Differential Revision: https://reviews.llvm.org/D109967
2021-09-28 21:00:47 +02:00