Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

203 lines
6.7 KiB
C++
Raw Normal View History

//===--- SystemZ.cpp - Implement SystemZ target feature support -----------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file implements SystemZ TargetInfo objects.
//
//===----------------------------------------------------------------------===//
#include "SystemZ.h"
#include "clang/Basic/Builtins.h"
#include "clang/Basic/LangOptions.h"
#include "clang/Basic/MacroBuilder.h"
#include "clang/Basic/TargetBuiltins.h"
#include "llvm/ADT/StringExtras.h"
#include "llvm/ADT/StringSwitch.h"
using namespace clang;
using namespace clang::targets;
[StrTable] Switch Clang builtins to use string tables This both reapplies #118734, the initial attempt at this, and updates it significantly. First, it uses the newly added `StringTable` abstraction for string tables, and simplifies the construction to build the string table and info arrays separately. This should reduce any `constexpr` compile time memory or CPU cost of the original PR while significantly improving the APIs throughout. It also restructures the builtins to support sharding across several independent tables. This accomplishes two improvements from the original PR: 1) It improves the APIs used significantly. 2) When builtins are defined from different sources (like SVE vs MVE in AArch64), this allows each of them to build their own string table independently rather than having to merge the string tables and info structures. 3) It allows each shard to factor out a common prefix, often cutting the size of the strings needed for the builtins by a factor two. The second point is important both to allow different mechanisms of construction (for example a `.def` file and a tablegen'ed `.inc` file, or different tablegen'ed `.inc files), it also simply reduces the sizes of these tables which is valuable given how large they are in some cases. The third builds on that size reduction. Initially, we use this new sharding rather than merging tables in AArch64, LoongArch, RISCV, and X86. Mostly this helps ensure the system works, as without further changes these still push scaling limits. Subsequent commits will more deeply leverage the new structure, including using the prefix capabilities which cannot be easily factored out here and requires deep changes to the targets.
2024-12-14 09:09:47 +00:00
static constexpr int NumBuiltins =
clang::SystemZ::LastTSBuiltin - Builtin::FirstTSBuiltin;
static constexpr llvm::StringTable BuiltinStrings =
CLANG_BUILTIN_STR_TABLE_START
#define BUILTIN CLANG_BUILTIN_STR_TABLE
#define TARGET_BUILTIN CLANG_TARGET_BUILTIN_STR_TABLE
Switch builtin strings to use string tables (#118734) The Clang binary (and any binary linking Clang as a library), when built using PIE, ends up with a pretty shocking number of dynamic relocations to apply to the executable image: roughly 400k. Each of these takes up binary space in the executable, and perhaps most interestingly takes start-up time to apply the relocations. The largest pattern I identified were the strings used to describe target builtins. The addresses of these string literals were stored into huge arrays, each one requiring a dynamic relocation. The way to avoid this is to design the target builtins to use a single large table of strings and offsets within the table for the individual strings. This switches the builtin management to such a scheme. This saves over 100k dynamic relocations by my measurement, an over 25% reduction. Just looking at byte size improvements, using the `bloaty` tool to compare a newly built `clang` binary to an old one: ``` FILE SIZE VM SIZE -------------- -------------- +1.4% +653Ki +1.4% +653Ki .rodata +0.0% +960 +0.0% +960 .text +0.0% +197 +0.0% +197 .dynstr +0.0% +184 +0.0% +184 .eh_frame +0.0% +96 +0.0% +96 .dynsym +0.0% +40 +0.0% +40 .eh_frame_hdr +114% +32 [ = ] 0 [Unmapped] +0.0% +20 +0.0% +20 .gnu.hash +0.0% +8 +0.0% +8 .gnu.version +0.9% +7 +0.9% +7 [LOAD #2 [R]] [ = ] 0 -75.4% -3.00Ki .relro_padding -16.1% -802Ki -16.1% -802Ki .data.rel.ro -27.3% -2.52Mi -27.3% -2.52Mi .rela.dyn -1.6% -2.66Mi -1.6% -2.66Mi TOTAL ``` We get a 16% reduction in the `.data.rel.ro` section, and nearly 30% reduction in `.rela.dyn` where those reloctaions are stored. This is also visible in my benchmarking of binary start-up overhead at least: ``` Benchmark 1: ./old_clang --version Time (mean ± σ): 17.6 ms ± 1.5 ms [User: 4.1 ms, System: 13.3 ms] Range (min … max): 14.2 ms … 22.8 ms 162 runs Benchmark 2: ./new_clang --version Time (mean ± σ): 15.5 ms ± 1.4 ms [User: 3.6 ms, System: 11.8 ms] Range (min … max): 12.4 ms … 20.3 ms 216 runs Summary './new_clang --version' ran 1.13 ± 0.14 times faster than './old_clang --version' ``` We get about 2ms faster `--version` runs. While there is a lot of noise in binary execution time, this delta is pretty consistent, and represents over 10% improvement. This is particularly interesting to me because for very short source files, repeatedly starting the `clang` binary is actually the dominant cost. For example, `configure` scripts running against the `clang` compiler are slow in large part because of binary start up time, not the time to process the actual inputs to the compiler. ---- This PR implements the string tables using `constexpr` code and the existing macro system. I understand that the builtins are moving towards a TableGen model, and if complete that would provide more options for modeling this. Unfortunately, that migration isn't complete, and even the parts that are migrated still rely on the ability to break out of the TableGen model and directly expand an X-macro style `BUILTIN(...)` textually. I looked at trying to complete the move to TableGen, but it would both require the difficult migration of the remaining targets, and solving some tricky problems with how to move away from any macro-based expansion. I was also able to find a reasonably clean and effective way of doing this with the existing macros and some `constexpr` code that I think is clean enough to be a pretty good intermediate state, and maybe give a good target for the eventual TableGen solution. I was also able to factor the macros into set of consistent patterns that avoids a significant regression in overall boilerplate.
2024-12-08 19:00:14 -08:00
#include "clang/Basic/BuiltinsSystemZ.def"
[StrTable] Switch Clang builtins to use string tables This both reapplies #118734, the initial attempt at this, and updates it significantly. First, it uses the newly added `StringTable` abstraction for string tables, and simplifies the construction to build the string table and info arrays separately. This should reduce any `constexpr` compile time memory or CPU cost of the original PR while significantly improving the APIs throughout. It also restructures the builtins to support sharding across several independent tables. This accomplishes two improvements from the original PR: 1) It improves the APIs used significantly. 2) When builtins are defined from different sources (like SVE vs MVE in AArch64), this allows each of them to build their own string table independently rather than having to merge the string tables and info structures. 3) It allows each shard to factor out a common prefix, often cutting the size of the strings needed for the builtins by a factor two. The second point is important both to allow different mechanisms of construction (for example a `.def` file and a tablegen'ed `.inc` file, or different tablegen'ed `.inc files), it also simply reduces the sizes of these tables which is valuable given how large they are in some cases. The third builds on that size reduction. Initially, we use this new sharding rather than merging tables in AArch64, LoongArch, RISCV, and X86. Mostly this helps ensure the system works, as without further changes these still push scaling limits. Subsequent commits will more deeply leverage the new structure, including using the prefix capabilities which cannot be easily factored out here and requires deep changes to the targets.
2024-12-14 09:09:47 +00:00
;
static constexpr auto BuiltinInfos = Builtin::MakeInfos<NumBuiltins>({
#define BUILTIN CLANG_BUILTIN_ENTRY
#define TARGET_BUILTIN CLANG_TARGET_BUILTIN_ENTRY
#include "clang/Basic/BuiltinsSystemZ.def"
});
const char *const SystemZTargetInfo::GCCRegNames[] = {
"r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
"r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
"f0", "f2", "f4", "f6", "f1", "f3", "f5", "f7",
"f8", "f10", "f12", "f14", "f9", "f11", "f13", "f15",
/*ap*/"", "cc", /*fp*/"", /*rp*/"", "a0", "a1",
"v16", "v18", "v20", "v22", "v17", "v19", "v21", "v23",
"v24", "v26", "v28", "v30", "v25", "v27", "v29", "v31"
};
const TargetInfo::AddlRegName GCCAddlRegNames[] = {
{{"v0"}, 16}, {{"v2"}, 17}, {{"v4"}, 18}, {{"v6"}, 19},
{{"v1"}, 20}, {{"v3"}, 21}, {{"v5"}, 22}, {{"v7"}, 23},
{{"v8"}, 24}, {{"v10"}, 25}, {{"v12"}, 26}, {{"v14"}, 27},
{{"v9"}, 28}, {{"v11"}, 29}, {{"v13"}, 30}, {{"v15"}, 31}
};
ArrayRef<const char *> SystemZTargetInfo::getGCCRegNames() const {
return llvm::ArrayRef(GCCRegNames);
}
ArrayRef<TargetInfo::AddlRegName> SystemZTargetInfo::getGCCAddlRegNames() const {
return llvm::ArrayRef(GCCAddlRegNames);
}
bool SystemZTargetInfo::validateAsmConstraint(
const char *&Name, TargetInfo::ConstraintInfo &Info) const {
switch (*Name) {
default:
return false;
case 'Z':
switch (Name[1]) {
default:
return false;
case 'Q': // Address with base and unsigned 12-bit displacement
case 'R': // Likewise, plus an index
case 'S': // Address with base and signed 20-bit displacement
case 'T': // Likewise, plus an index
break;
}
[[fallthrough]];
case 'a': // Address register
case 'd': // Data register (equivalent to 'r')
case 'f': // Floating-point register
case 'v': // Vector register
Info.setAllowsRegister();
return true;
case 'I': // Unsigned 8-bit constant
case 'J': // Unsigned 12-bit constant
case 'K': // Signed 16-bit constant
case 'L': // Signed 20-bit displacement (on all targets we support)
case 'M': // 0x7fffffff
return true;
case 'Q': // Memory with base and unsigned 12-bit displacement
case 'R': // Likewise, plus an index
case 'S': // Memory with base and signed 20-bit displacement
case 'T': // Likewise, plus an index
Info.setAllowsMemory();
return true;
}
}
struct ISANameRevision {
llvm::StringLiteral Name;
int ISARevisionID;
};
static constexpr ISANameRevision ISARevisions[] = {
{{"arch8"}, 8}, {{"z10"}, 8},
{{"arch9"}, 9}, {{"z196"}, 9},
{{"arch10"}, 10}, {{"zEC12"}, 10},
{{"arch11"}, 11}, {{"z13"}, 11},
{{"arch12"}, 12}, {{"z14"}, 12},
{{"arch13"}, 13}, {{"z15"}, 13},
{{"arch14"}, 14}, {{"z16"}, 14},
{{"arch15"}, 15}, {{"z17"}, 15},
};
int SystemZTargetInfo::getISARevision(StringRef Name) const {
const auto Rev =
llvm::find_if(ISARevisions, [Name](const ISANameRevision &CR) {
return CR.Name == Name;
});
if (Rev == std::end(ISARevisions))
return -1;
return Rev->ISARevisionID;
}
void SystemZTargetInfo::fillValidCPUList(
SmallVectorImpl<StringRef> &Values) const {
for (const ISANameRevision &Rev : ISARevisions)
Values.push_back(Rev.Name);
}
bool SystemZTargetInfo::hasFeature(StringRef Feature) const {
return llvm::StringSwitch<bool>(Feature)
.Case("systemz", true)
.Case("arch8", ISARevision >= 8)
.Case("arch9", ISARevision >= 9)
.Case("arch10", ISARevision >= 10)
.Case("arch11", ISARevision >= 11)
.Case("arch12", ISARevision >= 12)
.Case("arch13", ISARevision >= 13)
.Case("arch14", ISARevision >= 14)
.Case("arch15", ISARevision >= 15)
.Case("htm", HasTransactionalExecution)
.Case("vx", HasVector)
.Default(false);
}
unsigned SystemZTargetInfo::getMinGlobalAlign(uint64_t Size,
bool HasNonWeakDef) const {
// Don't enforce the minimum alignment on an external or weak symbol if
// -munaligned-symbols is passed.
if (UnalignedSymbols && !HasNonWeakDef)
return 0;
return MinGlobalAlign;
}
void SystemZTargetInfo::getTargetDefines(const LangOptions &Opts,
MacroBuilder &Builder) const {
Builder.defineMacro("__s390__");
Builder.defineMacro("__s390x__");
Builder.defineMacro("__zarch__");
Builder.defineMacro("__LONG_DOUBLE_128__");
Builder.defineMacro("__ARCH__", Twine(ISARevision));
Builder.defineMacro("__GCC_HAVE_SYNC_COMPARE_AND_SWAP_1");
Builder.defineMacro("__GCC_HAVE_SYNC_COMPARE_AND_SWAP_2");
Builder.defineMacro("__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4");
Builder.defineMacro("__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8");
if (HasTransactionalExecution)
Builder.defineMacro("__HTM__");
if (HasVector)
Builder.defineMacro("__VX__");
if (Opts.ZVector)
Builder.defineMacro("__VEC__", "10305");
/* Set __TARGET_LIB__ only if a value was given. If no value was given */
/* we rely on the LE headers to define __TARGET_LIB__. */
if (!getTriple().getOSVersion().empty()) {
llvm::VersionTuple V = getTriple().getOSVersion();
// Create string with form: 0xPVRRMMMM, where P=4
std::string Str("0x");
unsigned int Librel = 0x40000000;
Librel |= V.getMajor() << 24;
Librel |= (V.getMinor() ? V.getMinor().value() : 1) << 16;
Librel |= V.getSubminor() ? V.getSubminor().value() : 0;
Str += llvm::utohexstr(Librel);
Builder.defineMacro("__TARGET_LIB__", Str.c_str());
}
}
[StrTable] Switch Clang builtins to use string tables This both reapplies #118734, the initial attempt at this, and updates it significantly. First, it uses the newly added `StringTable` abstraction for string tables, and simplifies the construction to build the string table and info arrays separately. This should reduce any `constexpr` compile time memory or CPU cost of the original PR while significantly improving the APIs throughout. It also restructures the builtins to support sharding across several independent tables. This accomplishes two improvements from the original PR: 1) It improves the APIs used significantly. 2) When builtins are defined from different sources (like SVE vs MVE in AArch64), this allows each of them to build their own string table independently rather than having to merge the string tables and info structures. 3) It allows each shard to factor out a common prefix, often cutting the size of the strings needed for the builtins by a factor two. The second point is important both to allow different mechanisms of construction (for example a `.def` file and a tablegen'ed `.inc` file, or different tablegen'ed `.inc files), it also simply reduces the sizes of these tables which is valuable given how large they are in some cases. The third builds on that size reduction. Initially, we use this new sharding rather than merging tables in AArch64, LoongArch, RISCV, and X86. Mostly this helps ensure the system works, as without further changes these still push scaling limits. Subsequent commits will more deeply leverage the new structure, including using the prefix capabilities which cannot be easily factored out here and requires deep changes to the targets.
2024-12-14 09:09:47 +00:00
llvm::SmallVector<Builtin::InfosShard>
SystemZTargetInfo::getTargetBuiltins() const {
return {{&BuiltinStrings, BuiltinInfos}};
}