llvm-project/clang/test/CodeGen/attr-cpuspecific.c

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

361 lines
15 KiB
C
Raw Normal View History

// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm -o - %s | FileCheck %s --check-prefixes=CHECK,LINUX
// RUN: %clang_cc1 -triple x86_64-apple-macos -emit-llvm -o - %s | FileCheck %s --check-prefixes=CHECK,LINUX
// RUN: %clang_cc1 -triple x86_64-windows-pc -fms-compatibility -emit-llvm -o - %s | FileCheck %s --check-prefixes=CHECK,WINDOWS
Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
2018-07-20 14:13:28 +00:00
#ifdef _WIN64
#define ATTR(X) __declspec(X)
#else
#define ATTR(X) __attribute__((X))
#endif // _WIN64
Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
2018-07-20 14:13:28 +00:00
// Each version should have an IFunc and an alias.
// LINUX: @SingleVersion = weak_odr alias void (), ptr @SingleVersion.ifunc
// LINUX: @TwoVersions = weak_odr alias void (), ptr @TwoVersions.ifunc
// LINUX: @OrderDispatchUsageSpecific = weak_odr alias void (), ptr @OrderDispatchUsageSpecific.ifunc
// LINUX: @TwoVersionsSameAttr = weak_odr alias void (), ptr @TwoVersionsSameAttr.ifunc
// LINUX: @ThreeVersionsSameAttr = weak_odr alias void (), ptr @ThreeVersionsSameAttr.ifunc
// LINUX: @OrderSpecificUsageDispatch = weak_odr alias void (), ptr @OrderSpecificUsageDispatch.ifunc
// LINUX: @NoSpecifics = weak_odr alias void (), ptr @NoSpecifics.ifunc
// LINUX: @HasGeneric = weak_odr alias void (), ptr @HasGeneric.ifunc
// LINUX: @HasParams = weak_odr alias void (i32, double), ptr @HasParams.ifunc
// LINUX: @HasParamsAndReturn = weak_odr alias i32 (i32, double), ptr @HasParamsAndReturn.ifunc
// LINUX: @GenericAndPentium = weak_odr alias i32 (i32, double), ptr @GenericAndPentium.ifunc
// LINUX: @DispatchFirst = weak_odr alias i32 (), ptr @DispatchFirst.ifunc
// LINUX: @SingleVersion.ifunc = weak_odr ifunc void (), ptr @SingleVersion.resolver
// LINUX: @TwoVersions.ifunc = weak_odr ifunc void (), ptr @TwoVersions.resolver
// LINUX: @OrderDispatchUsageSpecific.ifunc = weak_odr ifunc void (), ptr @OrderDispatchUsageSpecific.resolver
// LINUX: @TwoVersionsSameAttr.ifunc = weak_odr ifunc void (), ptr @TwoVersionsSameAttr.resolver
// LINUX: @ThreeVersionsSameAttr.ifunc = weak_odr ifunc void (), ptr @ThreeVersionsSameAttr.resolver
// LINUX: @OrderSpecificUsageDispatch.ifunc = weak_odr ifunc void (), ptr @OrderSpecificUsageDispatch.resolver
// LINUX: @NoSpecifics.ifunc = weak_odr ifunc void (), ptr @NoSpecifics.resolver
// LINUX: @HasGeneric.ifunc = weak_odr ifunc void (), ptr @HasGeneric.resolver
// LINUX: @HasParams.ifunc = weak_odr ifunc void (i32, double), ptr @HasParams.resolver
// LINUX: @HasParamsAndReturn.ifunc = weak_odr ifunc i32 (i32, double), ptr @HasParamsAndReturn.resolver
// LINUX: @GenericAndPentium.ifunc = weak_odr ifunc i32 (i32, double), ptr @GenericAndPentium.resolver
// LINUX: @DispatchFirst.ifunc = weak_odr ifunc i32 (), ptr @DispatchFirst.resolver
Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
2018-07-20 14:13:28 +00:00
ATTR(cpu_specific(ivybridge))
Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
2018-07-20 14:13:28 +00:00
void SingleVersion(void){}
// LINUX: define{{.*}} void @SingleVersion.S() #[[S:[0-9]+]]
// WINDOWS: define dso_local void @SingleVersion.S() #[[S:[0-9]+]]
Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
2018-07-20 14:13:28 +00:00
[clang][CodeGen] Avoid emitting ifuncs with undefined resolvers The purpose of this change is to fix the following codegen bug: ``` // main.c __attribute__((cpu_specific(generic))) int *foo(void) { static int z; return &z;} int main() { return *foo() = 5; } // other.c __attribute__((cpu_dispatch(generic))) int *foo(void); // run: clang main.c other.c -o main; ./main ``` This will segfault prior to the change, and return the correct exit code 5 after the change. The underlying cause is that when a translation unit contains a cpu_specific function without the corresponding cpu_dispatch the generated code binds the reference to foo() against a GlobalIFunc whose resolver is undefined. This is invalid: the resolver must be defined in the same translation unit as the ifunc, but historically the LLVM bitcode verifier did not check that. The generated code then binds against the resolver rather than the ifunc, so it ends up calling the resolver rather than the resolvee. In the example above it treats its return value as an int *, therefore trying to write to program text. The root issue at the representation level is that GlobalIFunc, like GlobalAlias, does not support a "declaration" state. The object which provides the correct semantics in these cases is a Function declaration, but unlike Functions, changing a declaration to a definition in the GlobalIFunc case constitutes a change of the object type, as opposed to simply emitting code into a Function. I think this limitation is unlikely to change, so I implemented the fix by returning a function declaration rather than an ifunc when encountering cpu_specific, and upgrading it to an ifunc when emitting cpu_dispatch. This uses `takeName` + `replaceAllUsesWith` in similar vein to other places where the correct IR object type cannot be known locally/up-front, like in `CodeGenModule::EmitAliasDefinition`. Previous discussion in: https://reviews.llvm.org/D112349 Signed-off-by: Itay Bookstein <ibookstein@gmail.com> Reviewed By: erichkeane Differential Revision: https://reviews.llvm.org/D120266
2022-01-29 14:32:54 +02:00
ATTR(cpu_dispatch(ivybridge))
void SingleVersion(void);
// LINUX: define weak_odr ptr @SingleVersion.resolver()
[clang][CodeGen] Avoid emitting ifuncs with undefined resolvers The purpose of this change is to fix the following codegen bug: ``` // main.c __attribute__((cpu_specific(generic))) int *foo(void) { static int z; return &z;} int main() { return *foo() = 5; } // other.c __attribute__((cpu_dispatch(generic))) int *foo(void); // run: clang main.c other.c -o main; ./main ``` This will segfault prior to the change, and return the correct exit code 5 after the change. The underlying cause is that when a translation unit contains a cpu_specific function without the corresponding cpu_dispatch the generated code binds the reference to foo() against a GlobalIFunc whose resolver is undefined. This is invalid: the resolver must be defined in the same translation unit as the ifunc, but historically the LLVM bitcode verifier did not check that. The generated code then binds against the resolver rather than the ifunc, so it ends up calling the resolver rather than the resolvee. In the example above it treats its return value as an int *, therefore trying to write to program text. The root issue at the representation level is that GlobalIFunc, like GlobalAlias, does not support a "declaration" state. The object which provides the correct semantics in these cases is a Function declaration, but unlike Functions, changing a declaration to a definition in the GlobalIFunc case constitutes a change of the object type, as opposed to simply emitting code into a Function. I think this limitation is unlikely to change, so I implemented the fix by returning a function declaration rather than an ifunc when encountering cpu_specific, and upgrading it to an ifunc when emitting cpu_dispatch. This uses `takeName` + `replaceAllUsesWith` in similar vein to other places where the correct IR object type cannot be known locally/up-front, like in `CodeGenModule::EmitAliasDefinition`. Previous discussion in: https://reviews.llvm.org/D112349 Signed-off-by: Itay Bookstein <ibookstein@gmail.com> Reviewed By: erichkeane Differential Revision: https://reviews.llvm.org/D120266
2022-01-29 14:32:54 +02:00
// LINUX: call void @__cpu_indicator_init
// LINUX: %[[FEAT_INIT:.+]] = load i32, ptr getelementptr inbounds ({ i32, i32, i32, [1 x i32] }, ptr @__cpu_model, i32 0, i32 3, i32 0), align 4
// LINUX: %[[FEAT_JOIN:.+]] = and i32 %[[FEAT_INIT]], 525311
// LINUX: %[[FEAT_CHECK:.+]] = icmp eq i32 %[[FEAT_JOIN]], 525311
// LINUX: ret ptr @SingleVersion.S
[clang][CodeGen] Avoid emitting ifuncs with undefined resolvers The purpose of this change is to fix the following codegen bug: ``` // main.c __attribute__((cpu_specific(generic))) int *foo(void) { static int z; return &z;} int main() { return *foo() = 5; } // other.c __attribute__((cpu_dispatch(generic))) int *foo(void); // run: clang main.c other.c -o main; ./main ``` This will segfault prior to the change, and return the correct exit code 5 after the change. The underlying cause is that when a translation unit contains a cpu_specific function without the corresponding cpu_dispatch the generated code binds the reference to foo() against a GlobalIFunc whose resolver is undefined. This is invalid: the resolver must be defined in the same translation unit as the ifunc, but historically the LLVM bitcode verifier did not check that. The generated code then binds against the resolver rather than the ifunc, so it ends up calling the resolver rather than the resolvee. In the example above it treats its return value as an int *, therefore trying to write to program text. The root issue at the representation level is that GlobalIFunc, like GlobalAlias, does not support a "declaration" state. The object which provides the correct semantics in these cases is a Function declaration, but unlike Functions, changing a declaration to a definition in the GlobalIFunc case constitutes a change of the object type, as opposed to simply emitting code into a Function. I think this limitation is unlikely to change, so I implemented the fix by returning a function declaration rather than an ifunc when encountering cpu_specific, and upgrading it to an ifunc when emitting cpu_dispatch. This uses `takeName` + `replaceAllUsesWith` in similar vein to other places where the correct IR object type cannot be known locally/up-front, like in `CodeGenModule::EmitAliasDefinition`. Previous discussion in: https://reviews.llvm.org/D112349 Signed-off-by: Itay Bookstein <ibookstein@gmail.com> Reviewed By: erichkeane Differential Revision: https://reviews.llvm.org/D120266
2022-01-29 14:32:54 +02:00
// LINUX: call void @llvm.trap
// LINUX: unreachable
// WINDOWS: define weak_odr dso_local void @SingleVersion() comdat
// WINDOWS: call void @__cpu_indicator_init()
// WINDOWS: %[[FEAT_INIT:.+]] = load i32, ptr getelementptr inbounds ({ i32, i32, i32, [1 x i32] }, ptr @__cpu_model, i32 0, i32 3, i32 0), align 4
// WINDOWS: %[[FEAT_JOIN:.+]] = and i32 %[[FEAT_INIT]], 525311
// WINDOWS: %[[FEAT_CHECK:.+]] = icmp eq i32 %[[FEAT_JOIN]], 525311
[clang][CodeGen] Avoid emitting ifuncs with undefined resolvers The purpose of this change is to fix the following codegen bug: ``` // main.c __attribute__((cpu_specific(generic))) int *foo(void) { static int z; return &z;} int main() { return *foo() = 5; } // other.c __attribute__((cpu_dispatch(generic))) int *foo(void); // run: clang main.c other.c -o main; ./main ``` This will segfault prior to the change, and return the correct exit code 5 after the change. The underlying cause is that when a translation unit contains a cpu_specific function without the corresponding cpu_dispatch the generated code binds the reference to foo() against a GlobalIFunc whose resolver is undefined. This is invalid: the resolver must be defined in the same translation unit as the ifunc, but historically the LLVM bitcode verifier did not check that. The generated code then binds against the resolver rather than the ifunc, so it ends up calling the resolver rather than the resolvee. In the example above it treats its return value as an int *, therefore trying to write to program text. The root issue at the representation level is that GlobalIFunc, like GlobalAlias, does not support a "declaration" state. The object which provides the correct semantics in these cases is a Function declaration, but unlike Functions, changing a declaration to a definition in the GlobalIFunc case constitutes a change of the object type, as opposed to simply emitting code into a Function. I think this limitation is unlikely to change, so I implemented the fix by returning a function declaration rather than an ifunc when encountering cpu_specific, and upgrading it to an ifunc when emitting cpu_dispatch. This uses `takeName` + `replaceAllUsesWith` in similar vein to other places where the correct IR object type cannot be known locally/up-front, like in `CodeGenModule::EmitAliasDefinition`. Previous discussion in: https://reviews.llvm.org/D112349 Signed-off-by: Itay Bookstein <ibookstein@gmail.com> Reviewed By: erichkeane Differential Revision: https://reviews.llvm.org/D120266
2022-01-29 14:32:54 +02:00
// WINDOWS: call void @SingleVersion.S()
// WINDOWS-NEXT: ret void
// WINDOWS: call void @llvm.trap
// WINDOWS: unreachable
ATTR(cpu_specific(ivybridge))
Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
2018-07-20 14:13:28 +00:00
void NotCalled(void){}
// LINUX: define{{.*}} void @NotCalled.S() #[[S]]
// WINDOWS: define dso_local void @NotCalled.S() #[[S:[0-9]+]]
Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
2018-07-20 14:13:28 +00:00
// Done before any of the implementations. Also has an undecorated forward
// declaration.
void TwoVersions(void);
ATTR(cpu_dispatch(ivybridge, knl))
Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
2018-07-20 14:13:28 +00:00
void TwoVersions(void);
// LINUX: define weak_odr ptr @TwoVersions.resolver()
// LINUX: call void @__cpu_indicator_init
// LINUX: %[[FEAT_INIT:.+]] = load i32, ptr getelementptr inbounds ({ i32, i32, i32, [1 x i32] }, ptr @__cpu_model, i32 0, i32 3, i32 0), align 4
// LINUX: %[[FEAT_JOIN:.+]] = and i32 %[[FEAT_INIT]], 9422847
// LINUX: %[[FEAT_CHECK:.+]] = icmp eq i32 %[[FEAT_JOIN]], 9422847
// LINUX: ret ptr @TwoVersions.Z
// LINUX: ret ptr @TwoVersions.S
// LINUX: call void @llvm.trap
// LINUX: unreachable
// WINDOWS: define weak_odr dso_local void @TwoVersions() comdat
// WINDOWS: call void @__cpu_indicator_init()
// WINDOWS: %[[FEAT_INIT:.+]] = load i32, ptr getelementptr inbounds ({ i32, i32, i32, [1 x i32] }, ptr @__cpu_model, i32 0, i32 3, i32 0), align 4
// WINDOWS: %[[FEAT_JOIN:.+]] = and i32 %[[FEAT_INIT]], 9422847
// WINDOWS: %[[FEAT_CHECK:.+]] = icmp eq i32 %[[FEAT_JOIN]], 9422847
// WINDOWS: call void @TwoVersions.Z()
// WINDOWS-NEXT: ret void
// WINDOWS: call void @TwoVersions.S()
// WINDOWS-NEXT: ret void
// WINDOWS: call void @llvm.trap
// WINDOWS: unreachable
ATTR(cpu_specific(ivybridge))
Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
2018-07-20 14:13:28 +00:00
void TwoVersions(void){}
// CHECK: define {{.*}}void @TwoVersions.S() #[[S]]
Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
2018-07-20 14:13:28 +00:00
ATTR(cpu_specific(knl))
Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
2018-07-20 14:13:28 +00:00
void TwoVersions(void){}
// CHECK: define {{.*}}void @TwoVersions.Z() #[[K:[0-9]+]]
Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
2018-07-20 14:13:28 +00:00
ATTR(cpu_specific(ivybridge, knl))
Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
2018-07-20 14:13:28 +00:00
void TwoVersionsSameAttr(void){}
// CHECK: define {{.*}}void @TwoVersionsSameAttr.S() #[[S]]
// CHECK: define {{.*}}void @TwoVersionsSameAttr.Z() #[[K]]
Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
2018-07-20 14:13:28 +00:00
ATTR(cpu_specific(atom, ivybridge, knl))
Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
2018-07-20 14:13:28 +00:00
void ThreeVersionsSameAttr(void){}
// CHECK: define {{.*}}void @ThreeVersionsSameAttr.O() #[[O:[0-9]+]]
// CHECK: define {{.*}}void @ThreeVersionsSameAttr.S() #[[S]]
// CHECK: define {{.*}}void @ThreeVersionsSameAttr.Z() #[[K]]
Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
2018-07-20 14:13:28 +00:00
[clang][CodeGen] Avoid emitting ifuncs with undefined resolvers The purpose of this change is to fix the following codegen bug: ``` // main.c __attribute__((cpu_specific(generic))) int *foo(void) { static int z; return &z;} int main() { return *foo() = 5; } // other.c __attribute__((cpu_dispatch(generic))) int *foo(void); // run: clang main.c other.c -o main; ./main ``` This will segfault prior to the change, and return the correct exit code 5 after the change. The underlying cause is that when a translation unit contains a cpu_specific function without the corresponding cpu_dispatch the generated code binds the reference to foo() against a GlobalIFunc whose resolver is undefined. This is invalid: the resolver must be defined in the same translation unit as the ifunc, but historically the LLVM bitcode verifier did not check that. The generated code then binds against the resolver rather than the ifunc, so it ends up calling the resolver rather than the resolvee. In the example above it treats its return value as an int *, therefore trying to write to program text. The root issue at the representation level is that GlobalIFunc, like GlobalAlias, does not support a "declaration" state. The object which provides the correct semantics in these cases is a Function declaration, but unlike Functions, changing a declaration to a definition in the GlobalIFunc case constitutes a change of the object type, as opposed to simply emitting code into a Function. I think this limitation is unlikely to change, so I implemented the fix by returning a function declaration rather than an ifunc when encountering cpu_specific, and upgrading it to an ifunc when emitting cpu_dispatch. This uses `takeName` + `replaceAllUsesWith` in similar vein to other places where the correct IR object type cannot be known locally/up-front, like in `CodeGenModule::EmitAliasDefinition`. Previous discussion in: https://reviews.llvm.org/D112349 Signed-off-by: Itay Bookstein <ibookstein@gmail.com> Reviewed By: erichkeane Differential Revision: https://reviews.llvm.org/D120266
2022-01-29 14:32:54 +02:00
ATTR(cpu_specific(knl))
void CpuSpecificNoDispatch(void) {}
// CHECK: define {{.*}}void @CpuSpecificNoDispatch.Z() #[[K:[0-9]+]]
ATTR(cpu_dispatch(knl))
void OrderDispatchUsageSpecific(void);
// LINUX: define weak_odr ptr @OrderDispatchUsageSpecific.resolver()
[clang][CodeGen] Avoid emitting ifuncs with undefined resolvers The purpose of this change is to fix the following codegen bug: ``` // main.c __attribute__((cpu_specific(generic))) int *foo(void) { static int z; return &z;} int main() { return *foo() = 5; } // other.c __attribute__((cpu_dispatch(generic))) int *foo(void); // run: clang main.c other.c -o main; ./main ``` This will segfault prior to the change, and return the correct exit code 5 after the change. The underlying cause is that when a translation unit contains a cpu_specific function without the corresponding cpu_dispatch the generated code binds the reference to foo() against a GlobalIFunc whose resolver is undefined. This is invalid: the resolver must be defined in the same translation unit as the ifunc, but historically the LLVM bitcode verifier did not check that. The generated code then binds against the resolver rather than the ifunc, so it ends up calling the resolver rather than the resolvee. In the example above it treats its return value as an int *, therefore trying to write to program text. The root issue at the representation level is that GlobalIFunc, like GlobalAlias, does not support a "declaration" state. The object which provides the correct semantics in these cases is a Function declaration, but unlike Functions, changing a declaration to a definition in the GlobalIFunc case constitutes a change of the object type, as opposed to simply emitting code into a Function. I think this limitation is unlikely to change, so I implemented the fix by returning a function declaration rather than an ifunc when encountering cpu_specific, and upgrading it to an ifunc when emitting cpu_dispatch. This uses `takeName` + `replaceAllUsesWith` in similar vein to other places where the correct IR object type cannot be known locally/up-front, like in `CodeGenModule::EmitAliasDefinition`. Previous discussion in: https://reviews.llvm.org/D112349 Signed-off-by: Itay Bookstein <ibookstein@gmail.com> Reviewed By: erichkeane Differential Revision: https://reviews.llvm.org/D120266
2022-01-29 14:32:54 +02:00
// LINUX: call void @__cpu_indicator_init
// LINUX: ret ptr @OrderDispatchUsageSpecific.Z
[clang][CodeGen] Avoid emitting ifuncs with undefined resolvers The purpose of this change is to fix the following codegen bug: ``` // main.c __attribute__((cpu_specific(generic))) int *foo(void) { static int z; return &z;} int main() { return *foo() = 5; } // other.c __attribute__((cpu_dispatch(generic))) int *foo(void); // run: clang main.c other.c -o main; ./main ``` This will segfault prior to the change, and return the correct exit code 5 after the change. The underlying cause is that when a translation unit contains a cpu_specific function without the corresponding cpu_dispatch the generated code binds the reference to foo() against a GlobalIFunc whose resolver is undefined. This is invalid: the resolver must be defined in the same translation unit as the ifunc, but historically the LLVM bitcode verifier did not check that. The generated code then binds against the resolver rather than the ifunc, so it ends up calling the resolver rather than the resolvee. In the example above it treats its return value as an int *, therefore trying to write to program text. The root issue at the representation level is that GlobalIFunc, like GlobalAlias, does not support a "declaration" state. The object which provides the correct semantics in these cases is a Function declaration, but unlike Functions, changing a declaration to a definition in the GlobalIFunc case constitutes a change of the object type, as opposed to simply emitting code into a Function. I think this limitation is unlikely to change, so I implemented the fix by returning a function declaration rather than an ifunc when encountering cpu_specific, and upgrading it to an ifunc when emitting cpu_dispatch. This uses `takeName` + `replaceAllUsesWith` in similar vein to other places where the correct IR object type cannot be known locally/up-front, like in `CodeGenModule::EmitAliasDefinition`. Previous discussion in: https://reviews.llvm.org/D112349 Signed-off-by: Itay Bookstein <ibookstein@gmail.com> Reviewed By: erichkeane Differential Revision: https://reviews.llvm.org/D120266
2022-01-29 14:32:54 +02:00
// LINUX: call void @llvm.trap
// LINUX: unreachable
// WINDOWS: define weak_odr dso_local void @OrderDispatchUsageSpecific() comdat
// WINDOWS: call void @__cpu_indicator_init()
// WINDOWS: call void @OrderDispatchUsageSpecific.Z()
// WINDOWS-NEXT: ret void
// WINDOWS: call void @llvm.trap
// WINDOWS: unreachable
// CHECK: define {{.*}}void @OrderDispatchUsageSpecific.Z()
ATTR(cpu_specific(knl))
void OrderSpecificUsageDispatch(void) {}
// CHECK: define {{.*}}void @OrderSpecificUsageDispatch.Z() #[[K:[0-9]+]]
void usages(void) {
Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
2018-07-20 14:13:28 +00:00
SingleVersion();
// LINUX: @SingleVersion.ifunc()
// WINDOWS: @SingleVersion()
Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
2018-07-20 14:13:28 +00:00
TwoVersions();
// LINUX: @TwoVersions.ifunc()
// WINDOWS: @TwoVersions()
Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
2018-07-20 14:13:28 +00:00
TwoVersionsSameAttr();
// LINUX: @TwoVersionsSameAttr.ifunc()
// WINDOWS: @TwoVersionsSameAttr()
Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
2018-07-20 14:13:28 +00:00
ThreeVersionsSameAttr();
// LINUX: @ThreeVersionsSameAttr.ifunc()
// WINDOWS: @ThreeVersionsSameAttr()
[clang][CodeGen] Avoid emitting ifuncs with undefined resolvers The purpose of this change is to fix the following codegen bug: ``` // main.c __attribute__((cpu_specific(generic))) int *foo(void) { static int z; return &z;} int main() { return *foo() = 5; } // other.c __attribute__((cpu_dispatch(generic))) int *foo(void); // run: clang main.c other.c -o main; ./main ``` This will segfault prior to the change, and return the correct exit code 5 after the change. The underlying cause is that when a translation unit contains a cpu_specific function without the corresponding cpu_dispatch the generated code binds the reference to foo() against a GlobalIFunc whose resolver is undefined. This is invalid: the resolver must be defined in the same translation unit as the ifunc, but historically the LLVM bitcode verifier did not check that. The generated code then binds against the resolver rather than the ifunc, so it ends up calling the resolver rather than the resolvee. In the example above it treats its return value as an int *, therefore trying to write to program text. The root issue at the representation level is that GlobalIFunc, like GlobalAlias, does not support a "declaration" state. The object which provides the correct semantics in these cases is a Function declaration, but unlike Functions, changing a declaration to a definition in the GlobalIFunc case constitutes a change of the object type, as opposed to simply emitting code into a Function. I think this limitation is unlikely to change, so I implemented the fix by returning a function declaration rather than an ifunc when encountering cpu_specific, and upgrading it to an ifunc when emitting cpu_dispatch. This uses `takeName` + `replaceAllUsesWith` in similar vein to other places where the correct IR object type cannot be known locally/up-front, like in `CodeGenModule::EmitAliasDefinition`. Previous discussion in: https://reviews.llvm.org/D112349 Signed-off-by: Itay Bookstein <ibookstein@gmail.com> Reviewed By: erichkeane Differential Revision: https://reviews.llvm.org/D120266
2022-01-29 14:32:54 +02:00
CpuSpecificNoDispatch();
// LINUX: @CpuSpecificNoDispatch.ifunc()
// WINDOWS: @CpuSpecificNoDispatch()
OrderDispatchUsageSpecific();
// LINUX: @OrderDispatchUsageSpecific.ifunc()
// WINDOWS: @OrderDispatchUsageSpecific()
OrderSpecificUsageDispatch();
// LINUX: @OrderSpecificUsageDispatch.ifunc()
// WINDOWS: @OrderSpecificUsageDispatch()
Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
2018-07-20 14:13:28 +00:00
}
[clang][CodeGen] Avoid emitting ifuncs with undefined resolvers The purpose of this change is to fix the following codegen bug: ``` // main.c __attribute__((cpu_specific(generic))) int *foo(void) { static int z; return &z;} int main() { return *foo() = 5; } // other.c __attribute__((cpu_dispatch(generic))) int *foo(void); // run: clang main.c other.c -o main; ./main ``` This will segfault prior to the change, and return the correct exit code 5 after the change. The underlying cause is that when a translation unit contains a cpu_specific function without the corresponding cpu_dispatch the generated code binds the reference to foo() against a GlobalIFunc whose resolver is undefined. This is invalid: the resolver must be defined in the same translation unit as the ifunc, but historically the LLVM bitcode verifier did not check that. The generated code then binds against the resolver rather than the ifunc, so it ends up calling the resolver rather than the resolvee. In the example above it treats its return value as an int *, therefore trying to write to program text. The root issue at the representation level is that GlobalIFunc, like GlobalAlias, does not support a "declaration" state. The object which provides the correct semantics in these cases is a Function declaration, but unlike Functions, changing a declaration to a definition in the GlobalIFunc case constitutes a change of the object type, as opposed to simply emitting code into a Function. I think this limitation is unlikely to change, so I implemented the fix by returning a function declaration rather than an ifunc when encountering cpu_specific, and upgrading it to an ifunc when emitting cpu_dispatch. This uses `takeName` + `replaceAllUsesWith` in similar vein to other places where the correct IR object type cannot be known locally/up-front, like in `CodeGenModule::EmitAliasDefinition`. Previous discussion in: https://reviews.llvm.org/D112349 Signed-off-by: Itay Bookstein <ibookstein@gmail.com> Reviewed By: erichkeane Differential Revision: https://reviews.llvm.org/D120266
2022-01-29 14:32:54 +02:00
// LINUX: declare void @CpuSpecificNoDispatch.ifunc()
Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
2018-07-20 14:13:28 +00:00
// has an extra config to emit!
ATTR(cpu_dispatch(ivybridge, knl, atom))
Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
2018-07-20 14:13:28 +00:00
void TwoVersionsSameAttr(void);
// LINUX: define weak_odr ptr @TwoVersionsSameAttr.resolver()
// LINUX: ret ptr @TwoVersionsSameAttr.Z
// LINUX: ret ptr @TwoVersionsSameAttr.S
// LINUX: ret ptr @TwoVersionsSameAttr.O
// LINUX: call void @llvm.trap
// LINUX: unreachable
// WINDOWS: define weak_odr dso_local void @TwoVersionsSameAttr() comdat
// WINDOWS: call void @TwoVersionsSameAttr.Z
// WINDOWS-NEXT: ret void
// WINDOWS: call void @TwoVersionsSameAttr.S
// WINDOWS-NEXT: ret void
// WINDOWS: call void @TwoVersionsSameAttr.O
// WINDOWS-NEXT: ret void
// WINDOWS: call void @llvm.trap
// WINDOWS: unreachable
ATTR(cpu_dispatch(atom, ivybridge, knl))
Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
2018-07-20 14:13:28 +00:00
void ThreeVersionsSameAttr(void){}
// LINUX: define weak_odr ptr @ThreeVersionsSameAttr.resolver()
// LINUX: call void @__cpu_indicator_init
// LINUX: ret ptr @ThreeVersionsSameAttr.Z
// LINUX: ret ptr @ThreeVersionsSameAttr.S
// LINUX: ret ptr @ThreeVersionsSameAttr.O
// LINUX: call void @llvm.trap
// LINUX: unreachable
// WINDOWS: define weak_odr dso_local void @ThreeVersionsSameAttr() comdat
// WINDOWS: call void @__cpu_indicator_init
// WINDOWS: call void @ThreeVersionsSameAttr.Z
// WINDOWS-NEXT: ret void
// WINDOWS: call void @ThreeVersionsSameAttr.S
// WINDOWS-NEXT: ret void
// WINDOWS: call void @ThreeVersionsSameAttr.O
// WINDOWS-NEXT: ret void
// WINDOWS: call void @llvm.trap
// WINDOWS: unreachable
Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
2018-07-20 14:13:28 +00:00
[clang][CodeGen] Avoid emitting ifuncs with undefined resolvers The purpose of this change is to fix the following codegen bug: ``` // main.c __attribute__((cpu_specific(generic))) int *foo(void) { static int z; return &z;} int main() { return *foo() = 5; } // other.c __attribute__((cpu_dispatch(generic))) int *foo(void); // run: clang main.c other.c -o main; ./main ``` This will segfault prior to the change, and return the correct exit code 5 after the change. The underlying cause is that when a translation unit contains a cpu_specific function without the corresponding cpu_dispatch the generated code binds the reference to foo() against a GlobalIFunc whose resolver is undefined. This is invalid: the resolver must be defined in the same translation unit as the ifunc, but historically the LLVM bitcode verifier did not check that. The generated code then binds against the resolver rather than the ifunc, so it ends up calling the resolver rather than the resolvee. In the example above it treats its return value as an int *, therefore trying to write to program text. The root issue at the representation level is that GlobalIFunc, like GlobalAlias, does not support a "declaration" state. The object which provides the correct semantics in these cases is a Function declaration, but unlike Functions, changing a declaration to a definition in the GlobalIFunc case constitutes a change of the object type, as opposed to simply emitting code into a Function. I think this limitation is unlikely to change, so I implemented the fix by returning a function declaration rather than an ifunc when encountering cpu_specific, and upgrading it to an ifunc when emitting cpu_dispatch. This uses `takeName` + `replaceAllUsesWith` in similar vein to other places where the correct IR object type cannot be known locally/up-front, like in `CodeGenModule::EmitAliasDefinition`. Previous discussion in: https://reviews.llvm.org/D112349 Signed-off-by: Itay Bookstein <ibookstein@gmail.com> Reviewed By: erichkeane Differential Revision: https://reviews.llvm.org/D120266
2022-01-29 14:32:54 +02:00
ATTR(cpu_dispatch(knl))
void OrderSpecificUsageDispatch(void);
// LINUX: define weak_odr ptr @OrderSpecificUsageDispatch.resolver()
// LINUX: ret ptr @OrderSpecificUsageDispatch.Z
[clang][CodeGen] Avoid emitting ifuncs with undefined resolvers The purpose of this change is to fix the following codegen bug: ``` // main.c __attribute__((cpu_specific(generic))) int *foo(void) { static int z; return &z;} int main() { return *foo() = 5; } // other.c __attribute__((cpu_dispatch(generic))) int *foo(void); // run: clang main.c other.c -o main; ./main ``` This will segfault prior to the change, and return the correct exit code 5 after the change. The underlying cause is that when a translation unit contains a cpu_specific function without the corresponding cpu_dispatch the generated code binds the reference to foo() against a GlobalIFunc whose resolver is undefined. This is invalid: the resolver must be defined in the same translation unit as the ifunc, but historically the LLVM bitcode verifier did not check that. The generated code then binds against the resolver rather than the ifunc, so it ends up calling the resolver rather than the resolvee. In the example above it treats its return value as an int *, therefore trying to write to program text. The root issue at the representation level is that GlobalIFunc, like GlobalAlias, does not support a "declaration" state. The object which provides the correct semantics in these cases is a Function declaration, but unlike Functions, changing a declaration to a definition in the GlobalIFunc case constitutes a change of the object type, as opposed to simply emitting code into a Function. I think this limitation is unlikely to change, so I implemented the fix by returning a function declaration rather than an ifunc when encountering cpu_specific, and upgrading it to an ifunc when emitting cpu_dispatch. This uses `takeName` + `replaceAllUsesWith` in similar vein to other places where the correct IR object type cannot be known locally/up-front, like in `CodeGenModule::EmitAliasDefinition`. Previous discussion in: https://reviews.llvm.org/D112349 Signed-off-by: Itay Bookstein <ibookstein@gmail.com> Reviewed By: erichkeane Differential Revision: https://reviews.llvm.org/D120266
2022-01-29 14:32:54 +02:00
// WINDOWS: define weak_odr dso_local void @OrderSpecificUsageDispatch() comdat
// WINDOWS: call void @__cpu_indicator_init
// WINDOWS: call void @OrderSpecificUsageDispatch.Z
// WINDOWS-NEXT: ret void
Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
2018-07-20 14:13:28 +00:00
// No Cpu Specific options.
ATTR(cpu_dispatch(atom, ivybridge, knl))
Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
2018-07-20 14:13:28 +00:00
void NoSpecifics(void);
// LINUX: define weak_odr ptr @NoSpecifics.resolver()
// LINUX: call void @__cpu_indicator_init
// LINUX: ret ptr @NoSpecifics.Z
// LINUX: ret ptr @NoSpecifics.S
// LINUX: ret ptr @NoSpecifics.O
// LINUX: call void @llvm.trap
// LINUX: unreachable
// WINDOWS: define weak_odr dso_local void @NoSpecifics() comdat
// WINDOWS: call void @__cpu_indicator_init
// WINDOWS: call void @NoSpecifics.Z
// WINDOWS-NEXT: ret void
// WINDOWS: call void @NoSpecifics.S
// WINDOWS-NEXT: ret void
// WINDOWS: call void @NoSpecifics.O
// WINDOWS-NEXT: ret void
// WINDOWS: call void @llvm.trap
// WINDOWS: unreachable
ATTR(cpu_dispatch(atom, generic, ivybridge, knl))
Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
2018-07-20 14:13:28 +00:00
void HasGeneric(void);
// LINUX: define weak_odr ptr @HasGeneric.resolver()
// LINUX: call void @__cpu_indicator_init
// LINUX: ret ptr @HasGeneric.Z
// LINUX: ret ptr @HasGeneric.S
// LINUX: ret ptr @HasGeneric.O
// LINUX: ret ptr @HasGeneric.A
// LINUX-NOT: call void @llvm.trap
// WINDOWS: define weak_odr dso_local void @HasGeneric() comdat
// WINDOWS: call void @__cpu_indicator_init
// WINDOWS: call void @HasGeneric.Z
// WINDOWS-NEXT: ret void
// WINDOWS: call void @HasGeneric.S
// WINDOWS-NEXT: ret void
// WINDOWS: call void @HasGeneric.O
// WINDOWS-NEXT: ret void
// WINDOWS: call void @HasGeneric.A
// WINDOWS-NEXT: ret void
// WINDOWS-NOT: call void @llvm.trap
ATTR(cpu_dispatch(atom, generic, ivybridge, knl))
void HasParams(int i, double d);
// LINUX: define weak_odr ptr @HasParams.resolver()
// LINUX: call void @__cpu_indicator_init
// LINUX: ret ptr @HasParams.Z
// LINUX: ret ptr @HasParams.S
// LINUX: ret ptr @HasParams.O
// LINUX: ret ptr @HasParams.A
// LINUX-NOT: call void @llvm.trap
// WINDOWS: define weak_odr dso_local void @HasParams(i32 %0, double %1) comdat
// WINDOWS: call void @__cpu_indicator_init
// WINDOWS: call void @HasParams.Z(i32 %0, double %1)
// WINDOWS-NEXT: ret void
// WINDOWS: call void @HasParams.S(i32 %0, double %1)
// WINDOWS-NEXT: ret void
// WINDOWS: call void @HasParams.O(i32 %0, double %1)
// WINDOWS-NEXT: ret void
// WINDOWS: call void @HasParams.A(i32 %0, double %1)
// WINDOWS-NEXT: ret void
// WINDOWS-NOT: call void @llvm.trap
ATTR(cpu_dispatch(atom, generic, ivybridge, knl))
int HasParamsAndReturn(int i, double d);
// LINUX: define weak_odr ptr @HasParamsAndReturn.resolver()
// LINUX: call void @__cpu_indicator_init
// LINUX: ret ptr @HasParamsAndReturn.Z
// LINUX: ret ptr @HasParamsAndReturn.S
// LINUX: ret ptr @HasParamsAndReturn.O
// LINUX: ret ptr @HasParamsAndReturn.A
// LINUX-NOT: call void @llvm.trap
// WINDOWS: define weak_odr dso_local i32 @HasParamsAndReturn(i32 %0, double %1) comdat
// WINDOWS: call void @__cpu_indicator_init
// WINDOWS: %[[RET:.+]] = musttail call i32 @HasParamsAndReturn.Z(i32 %0, double %1)
// WINDOWS-NEXT: ret i32 %[[RET]]
// WINDOWS: %[[RET:.+]] = musttail call i32 @HasParamsAndReturn.S(i32 %0, double %1)
// WINDOWS-NEXT: ret i32 %[[RET]]
// WINDOWS: %[[RET:.+]] = musttail call i32 @HasParamsAndReturn.O(i32 %0, double %1)
// WINDOWS-NEXT: ret i32 %[[RET]]
// WINDOWS: %[[RET:.+]] = musttail call i32 @HasParamsAndReturn.A(i32 %0, double %1)
// WINDOWS-NEXT: ret i32 %[[RET]]
// WINDOWS-NOT: call void @llvm.trap
Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
2018-07-20 14:13:28 +00:00
ATTR(cpu_dispatch(atom, generic, pentium))
int GenericAndPentium(int i, double d);
// LINUX: define weak_odr ptr @GenericAndPentium.resolver()
// LINUX: call void @__cpu_indicator_init
// LINUX: ret ptr @GenericAndPentium.O
// LINUX: ret ptr @GenericAndPentium.B
// LINUX-NOT: ret ptr @GenericAndPentium.A
// LINUX-NOT: call void @llvm.trap
// WINDOWS: define weak_odr dso_local i32 @GenericAndPentium(i32 %0, double %1) comdat
// WINDOWS: call void @__cpu_indicator_init
// WINDOWS: %[[RET:.+]] = musttail call i32 @GenericAndPentium.O(i32 %0, double %1)
// WINDOWS-NEXT: ret i32 %[[RET]]
// WINDOWS: %[[RET:.+]] = musttail call i32 @GenericAndPentium.B(i32 %0, double %1)
// WINDOWS-NEXT: ret i32 %[[RET]]
// WINDOWS-NOT: call i32 @GenericAndPentium.A
// WINDOWS-NOT: call void @llvm.trap
ATTR(cpu_dispatch(atom, pentium))
int DispatchFirst(void);
// LINUX: define weak_odr ptr @DispatchFirst.resolver
// LINUX: ret ptr @DispatchFirst.O
// LINUX: ret ptr @DispatchFirst.B
// WINDOWS: define weak_odr dso_local i32 @DispatchFirst() comdat
// WINDOWS: %[[RET:.+]] = musttail call i32 @DispatchFirst.O()
// WINDOWS-NEXT: ret i32 %[[RET]]
// WINDOWS: %[[RET:.+]] = musttail call i32 @DispatchFirst.B()
// WINDOWS-NEXT: ret i32 %[[RET]]
ATTR(cpu_specific(atom))
int DispatchFirst(void) {return 0;}
// LINUX: define{{.*}} i32 @DispatchFirst.O
// LINUX: ret i32 0
// WINDOWS: define dso_local i32 @DispatchFirst.O()
// WINDOWS: ret i32 0
ATTR(cpu_specific(pentium))
int DispatchFirst(void) {return 1;}
// LINUX: define{{.*}} i32 @DispatchFirst.B
// LINUX: ret i32 1
// WINDOWS: define dso_local i32 @DispatchFirst.B
// WINDOWS: ret i32 1
[clang][CodeGen] Avoid emitting ifuncs with undefined resolvers The purpose of this change is to fix the following codegen bug: ``` // main.c __attribute__((cpu_specific(generic))) int *foo(void) { static int z; return &z;} int main() { return *foo() = 5; } // other.c __attribute__((cpu_dispatch(generic))) int *foo(void); // run: clang main.c other.c -o main; ./main ``` This will segfault prior to the change, and return the correct exit code 5 after the change. The underlying cause is that when a translation unit contains a cpu_specific function without the corresponding cpu_dispatch the generated code binds the reference to foo() against a GlobalIFunc whose resolver is undefined. This is invalid: the resolver must be defined in the same translation unit as the ifunc, but historically the LLVM bitcode verifier did not check that. The generated code then binds against the resolver rather than the ifunc, so it ends up calling the resolver rather than the resolvee. In the example above it treats its return value as an int *, therefore trying to write to program text. The root issue at the representation level is that GlobalIFunc, like GlobalAlias, does not support a "declaration" state. The object which provides the correct semantics in these cases is a Function declaration, but unlike Functions, changing a declaration to a definition in the GlobalIFunc case constitutes a change of the object type, as opposed to simply emitting code into a Function. I think this limitation is unlikely to change, so I implemented the fix by returning a function declaration rather than an ifunc when encountering cpu_specific, and upgrading it to an ifunc when emitting cpu_dispatch. This uses `takeName` + `replaceAllUsesWith` in similar vein to other places where the correct IR object type cannot be known locally/up-front, like in `CodeGenModule::EmitAliasDefinition`. Previous discussion in: https://reviews.llvm.org/D112349 Signed-off-by: Itay Bookstein <ibookstein@gmail.com> Reviewed By: erichkeane Differential Revision: https://reviews.llvm.org/D120266
2022-01-29 14:32:54 +02:00
ATTR(cpu_specific(knl))
void OrderDispatchUsageSpecific(void) {}
[X86] Remove CPU_SPECIFIC* MACROs and add getCPUDispatchMangling This refactor patch means to remove CPU_SPECIFIC* MACROs in X86TargetParser.def and move those information into ProcInfo of X86TargetParser.cpp. Since these two files both maintain a table with redundant info such as cpuname and its features supported. CPU_SPECIFIC* MACROs define some different information. This patch dealt with them in these ways when moving: 1.mangling This is now moved to Mangling in ProcInfo and directly initialized at array of Processors. CPUs don't support cpu_dispatch/specific are assigned '\0' as mangling. 2.CPU alias The alias cpu will also be initialized in array of Processors, its attributes will be same as its alias target cpu. Same feature list, same mangling. 3.TUNE_NAME Before my change, some cpu names support cpu_dispatch/specific are not supported in X86.td, which means optimizer/backend doesn't recognize them. So they use a different TUNE_NAME to generate in IR. In this patch, I added these missing cpu support at X86.td by utilizing existing Features and XXXTunings, so that each cpu name can directly use its own name as TUNE_NAME to be supported by optimizer/backend. 4.Feature list The feature list of one CPU maintained in X86TargetParser.def is not same as the one in X86TargetParser.cpp. It only maintains part of features of one CPU (features defined by X86_FEATURE_COMPAT). While X86TargetParser.cpp maintains a complete one. This patch abandons the feature list maintained by CPU_SPECIFIC* MACROs because assigning a CPU with a complete one doesn't affect the functionality of cpu_dispatch/specific. Except these four info, since some of CPUs supported by cpu_dispatch/specific doesn's support clang options like -march, -mtune before, this patch also kept this behavior still by adding another member OnlyForCPUDispatchSpecific in ProcInfo. Reviewed By: pengfei, RKSimon Differential Revision: https://reviews.llvm.org/D151696
2023-07-05 17:31:11 +08:00
// CHECK: attributes #[[S]] = {{.*}}"target-features"="+avx,+cmov,+crc32,+cx16,+cx8,+f16c,+fsgsbase,+fxsr,+mmx,+pclmul,+popcnt,+rdrnd,+sahf,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt"
Have cpu-specific variants set 'tune-cpu' as an optimization hint Due to various implementation constraints, despite the programmer choosing a 'processor' cpu_dispatch/cpu_specific needs to use the 'feature' list of a processor to identify it. This results in the identified processor in source-code not being propogated to the optimizer, and thus, not able to be tuned for. This patch changes to use the actual cpu as written for tune-cpu so that opt can make decisions based on the cpu-as-spelled, which should better match the behavior expected by the programmer. Note that the 'valid' list of processors for x86 is in llvm/include/llvm/Support/X86TargetParser.def. At the moment, this list contains only Intel processors, but other vendors may wish to add their own entries as 'alias'es (or with different feature lists!). If this is not done, there is two potential performance issues with the patch, but I believe them to be worth it in light of the improvements to behavior and performance. 1- In the event that the user spelled "ProcessorB", but we only have the features available to test for "ProcessorA" (where A is B minus features), AND there is an optimization opportunity for "B" that negatively affects "A", the optimizer will likely choose to do so. 2- In the event that the user spelled VendorI's processor, and the feature list allows it to run on VendorA's processor of similar features, AND there is an optimization opportunity for VendorIs that negatively affects "A"s, the optimizer will likely choose to do so. This can be fixed by adding an alias to X86TargetParser.def. Differential Revision: https://reviews.llvm.org/D121410
2022-03-10 13:31:52 -08:00
// CHECK-SAME: "tune-cpu"="ivybridge"
// CHECK: attributes #[[K]] = {{.*}}"target-features"="+adx,+aes,+avx,+avx2,+avx512cd,+avx512f,+bmi,+bmi2,+cmov,+crc32,+cx16,+cx8,+evex512,+f16c,+fma,+fsgsbase,+fxsr,+invpcid,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+prfchw,+rdrnd,+rdseed,+sahf,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt"
Have cpu-specific variants set 'tune-cpu' as an optimization hint Due to various implementation constraints, despite the programmer choosing a 'processor' cpu_dispatch/cpu_specific needs to use the 'feature' list of a processor to identify it. This results in the identified processor in source-code not being propogated to the optimizer, and thus, not able to be tuned for. This patch changes to use the actual cpu as written for tune-cpu so that opt can make decisions based on the cpu-as-spelled, which should better match the behavior expected by the programmer. Note that the 'valid' list of processors for x86 is in llvm/include/llvm/Support/X86TargetParser.def. At the moment, this list contains only Intel processors, but other vendors may wish to add their own entries as 'alias'es (or with different feature lists!). If this is not done, there is two potential performance issues with the patch, but I believe them to be worth it in light of the improvements to behavior and performance. 1- In the event that the user spelled "ProcessorB", but we only have the features available to test for "ProcessorA" (where A is B minus features), AND there is an optimization opportunity for "B" that negatively affects "A", the optimizer will likely choose to do so. 2- In the event that the user spelled VendorI's processor, and the feature list allows it to run on VendorA's processor of similar features, AND there is an optimization opportunity for VendorIs that negatively affects "A"s, the optimizer will likely choose to do so. This can be fixed by adding an alias to X86TargetParser.def. Differential Revision: https://reviews.llvm.org/D121410
2022-03-10 13:31:52 -08:00
// CHECK-SAME: "tune-cpu"="knl"
[X86] Remove CPU_SPECIFIC* MACROs and add getCPUDispatchMangling This refactor patch means to remove CPU_SPECIFIC* MACROs in X86TargetParser.def and move those information into ProcInfo of X86TargetParser.cpp. Since these two files both maintain a table with redundant info such as cpuname and its features supported. CPU_SPECIFIC* MACROs define some different information. This patch dealt with them in these ways when moving: 1.mangling This is now moved to Mangling in ProcInfo and directly initialized at array of Processors. CPUs don't support cpu_dispatch/specific are assigned '\0' as mangling. 2.CPU alias The alias cpu will also be initialized in array of Processors, its attributes will be same as its alias target cpu. Same feature list, same mangling. 3.TUNE_NAME Before my change, some cpu names support cpu_dispatch/specific are not supported in X86.td, which means optimizer/backend doesn't recognize them. So they use a different TUNE_NAME to generate in IR. In this patch, I added these missing cpu support at X86.td by utilizing existing Features and XXXTunings, so that each cpu name can directly use its own name as TUNE_NAME to be supported by optimizer/backend. 4.Feature list The feature list of one CPU maintained in X86TargetParser.def is not same as the one in X86TargetParser.cpp. It only maintains part of features of one CPU (features defined by X86_FEATURE_COMPAT). While X86TargetParser.cpp maintains a complete one. This patch abandons the feature list maintained by CPU_SPECIFIC* MACROs because assigning a CPU with a complete one doesn't affect the functionality of cpu_dispatch/specific. Except these four info, since some of CPUs supported by cpu_dispatch/specific doesn's support clang options like -march, -mtune before, this patch also kept this behavior still by adding another member OnlyForCPUDispatchSpecific in ProcInfo. Reviewed By: pengfei, RKSimon Differential Revision: https://reviews.llvm.org/D151696
2023-07-05 17:31:11 +08:00
// CHECK: attributes #[[O]] = {{.*}}"target-features"="+cmov,+cx16,+cx8,+fxsr,+mmx,+movbe,+sahf,+sse,+sse2,+sse3,+ssse3,+x87"
Have cpu-specific variants set 'tune-cpu' as an optimization hint Due to various implementation constraints, despite the programmer choosing a 'processor' cpu_dispatch/cpu_specific needs to use the 'feature' list of a processor to identify it. This results in the identified processor in source-code not being propogated to the optimizer, and thus, not able to be tuned for. This patch changes to use the actual cpu as written for tune-cpu so that opt can make decisions based on the cpu-as-spelled, which should better match the behavior expected by the programmer. Note that the 'valid' list of processors for x86 is in llvm/include/llvm/Support/X86TargetParser.def. At the moment, this list contains only Intel processors, but other vendors may wish to add their own entries as 'alias'es (or with different feature lists!). If this is not done, there is two potential performance issues with the patch, but I believe them to be worth it in light of the improvements to behavior and performance. 1- In the event that the user spelled "ProcessorB", but we only have the features available to test for "ProcessorA" (where A is B minus features), AND there is an optimization opportunity for "B" that negatively affects "A", the optimizer will likely choose to do so. 2- In the event that the user spelled VendorI's processor, and the feature list allows it to run on VendorA's processor of similar features, AND there is an optimization opportunity for VendorIs that negatively affects "A"s, the optimizer will likely choose to do so. This can be fixed by adding an alias to X86TargetParser.def. Differential Revision: https://reviews.llvm.org/D121410
2022-03-10 13:31:52 -08:00
// CHECK-SAME: "tune-cpu"="atom"