23 Commits

Author SHA1 Message Date
Fangrui Song
e24457a330 [ELF] Migrate away from global ctx 2024-11-14 22:17:10 -08:00
Fangrui Song
cf57a670bb [ELF] ScriptParser: pass Ctx to ScriptParser and ScriptLexer. NFC 2024-09-21 11:06:06 -07:00
Fangrui Song
a7e8bddfc1 [ELF] Respect --sysroot for INCLUDE
If an included script is under the sysroot directory, when it opens an
absolute path file (`INPUT` or `GROUP`), add sysroot before the absolute
path. When the included script ends, the `isUnderSysroot` state is
restored.
2024-07-28 11:43:27 -07:00
Fangrui Song
8f72b0cb08 [ELF] Fix INCLUDE cycle detection
Fix #93947: the cycle detection mechanism added by
https://reviews.llvm.org/D37524 also disallowed including a file twice,
which is an unnecessary limitation.

Now that we have an include stack #100493, supporting multiple inclusion
is trivial. Note: a filename can be referenced with many different
paths, e.g. a.lds, ./a.lds, ././a.lds. We don't attempt to detect the
cycle in the earliest point.
2024-07-27 17:25:13 -07:00
Fangrui Song
9328c20cc8 [ELF] Track line number precisely
`getLineNumber` is both imprecise (when `INCLUDE` is used) and
inefficient (see https://reviews.llvm.org/D104137). Track line number
precisely now that we have `struct Buffer` abstraction from #100493.
2024-07-27 14:46:41 -07:00
Fangrui Song
2a89356d64 [ELF] Add till and rewrite while (... consume("}"))
After #100493, the idiom `while (!errorCount() && !consume("}"))` could
lead to inaccurate diagnostics or dead loops. Introduce till to change
the code pattern.
2024-07-26 17:13:37 -07:00
Fangrui Song
1978c21d96
[ELF] ScriptLexer: generate tokens lazily
The current tokenize-whole-file approach has a few limitations.

* Lack of state information: `maybeSplitExpr` is needed to parse
  expressions. It's infeasible to add new states to behave more like GNU
  ld.
* `readInclude` may insert tokens in the middle, leading to a time
  complexity issue with N-nested `INCLUDE`.
* line/column information for diagnostics are inaccurate, especially
  after an `INCLUDE`.
* `getLineNumber` cannot be made more efficient without significant code
  complexity and memory consumption. https://reviews.llvm.org/D104137

The patch switches to a traditional lexer that generates tokens lazily.

* `atEOF` behavior is modified: we need to call `peek` to determine EOF.
* `peek` and `next` cannot call `setError` upon `atEOF`.
* Since `consume` no longer reports an error upon `atEOF`, the idiom `while (!errorCount() && !consume(")"))`
  would cause a dead loop. Use `while (peek() != ")" && !atEOF()) { ... } expect(")")` instead.
* An include stack is introduced to handle `readInclude`. This can be
  utilized to address #93947 properly.
* `tokens` and `pos` are removed.
* `commandString` is reimplemented. Since it is used in -Map output,
  `\n` needs to be replaced with space.

Pull Request: https://github.com/llvm/llvm-project/pull/100493
2024-07-26 14:26:38 -07:00
Hongyu Chen
2ae862b74b
[ELF] Remove consumeLabel in ScriptLexer (#99567)
This commit removes `consumeLabel` since we can just use consume
function to have the same functionalities.
2024-07-23 22:03:46 -07:00
Hongyu Chen
b828c13f3c
[ELF] Delete peek2 in Lexer (#99790)
Thanks to Fangrui's change

28045ceab0
so peek2 can be removed.
2024-07-20 16:35:38 -07:00
Fangrui Song
43b13341fb
[ELF] Add internal InputFile (#78944)
Based on https://reviews.llvm.org/D45375 . Introduce a new InputFile
kind `InternalKind`, use it for

* `ctx.internalFile`: for linker-defined symbols and some synthesized
`Undefined`
* `createInternalFile`: for symbol assignments and --defsym

I picked "internal" instead of "synthetic" to avoid confusion with
SyntheticSection.

Currently a symbol's file is one of: nullptr, ObjKind, SharedKind,
BitcodeKind, BinaryKind. Now it's non-null (I plan to add an
`assert(file)` to Symbol::Symbol and change `toString(const InputFile
*)`
separately).

Debugging and error reporting gets improved. The immediate user-facing
difference is more descriptive "File" column in the --cref output. This
patch may unlock further simplification.

Currently each symbol assignment gets its own
`createInternalFile(cmd->location)`. Two symbol assignments in a linker
script do not share the same file. Making the file the same would be
nice, but would require non trivial code.
2024-01-22 09:09:46 -08:00
Nico Weber
87248ba5b1 [lld/elf] Use C++17 nested namespace syntax in most places
Like D131405, but for ELF.

No behavior change.

Differential Revision: https://reviews.llvm.org/D131612
2022-08-10 16:47:30 -04:00
Fangrui Song
27bb799095 [ELF] Clean up headers. NFC 2022-02-07 21:53:34 -08:00
Colin Cross
e387778722 [ELF] Optimize ScriptLexer::getLineNumber by caching the previous line number and offset
getLineNumber() was counting the number of line feeds from the start of
the buffer to the current token. For large linker scripts this became a
performance bottleneck. For one 4MB linker script over 4 minutes was
spent in getLineNumber's StringRef::count.

Store the line number from the last token, and only count the additional
line feeds since the last token.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D104137
2021-06-22 15:35:24 -07:00
Georgii Rymar
ae4279bd3e [LLD][ELF] - Linkerscript: report location for the "unclosed comment in a linker script" error.
Currently we print "error: unclosed comment in a linker script", which doesn't
provide information about the real error location.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46793.

Differential revision: https://reviews.llvm.org/D84300
2020-07-24 11:38:26 +03:00
Fangrui Song
c384ca3c6a [ELF] For relative paths in INPUT() and GROUP(), search the directory of the current linker script before searching other paths
For a relative path in INPUT() or GROUP(), this patch changes the search order by adding the directory of the current linker script.
The new search order (consistent with GNU ld >= 2.35 regarding the new test `test/ELF/input-relative.s`):

1. the directory of the current linker script (GNU ld from Binutils 2.35 onwards; https://sourceware.org/bugzilla/show_bug.cgi?id=25806)
2. the current working directory
3. library paths (-L)

This behavior makes it convenient to replace a .so or .a with a linker script with additional input. For example, glibc

```
% cat /usr/lib/x86_64-linux-gnu/libm.a
/* GNU ld script
*/
OUTPUT_FORMAT(elf64-x86-64)
GROUP ( /usr/lib/x86_64-linux-gnu/libm-2.29.a /usr/lib/x86_64-linux-gnu/libmvec.a )
```

could be simplified as `GROUP(libm-2.29.a libmvec.a)`.

Another example is to make libc++.a a linker script:
```
INPUT(libc++.a.1 libc++abi.a)
```

Note, -l is not affected.

Reviewed By: psmith

Differential Revision: https://reviews.llvm.org/D77779
2020-04-22 12:34:20 -07:00
Rui Ueyama
3837f4273f [Coding style change] Rename variables so that they start with a lowercase letter
This patch is mechanically generated by clang-llvm-rename tool that I wrote
using Clang Refactoring Engine just for creating this patch. You can see the
source code of the tool at https://reviews.llvm.org/D64123. There's no manual
post-processing; you can generate the same patch by re-running the tool against
lld's code base.

Here is the main discussion thread to change the LLVM coding style:
https://lists.llvm.org/pipermail/llvm-dev/2019-February/130083.html
In the discussion thread, I proposed we use lld as a testbed for variable
naming scheme change, and this patch does that.

I chose to rename variables so that they are in camelCase, just because that
is a minimal change to make variables to start with a lowercase letter.

Note to downstream patch maintainers: if you are maintaining a downstream lld
repo, just rebasing ahead of this commit would cause massive merge conflicts
because this patch essentially changes every line in the lld subdirectory. But
there's a remedy.

clang-llvm-rename tool is a batch tool, so you can rename variables in your
downstream repo with the tool. Given that, here is how to rebase your repo to
a commit after the mass renaming:

1. rebase to the commit just before the mass variable renaming,
2. apply the tool to your downstream repo to mass-rename variables locally, and
3. rebase again to the head.

Most changes made by the tool should be identical for a downstream repo and
for the head, so at the step 3, almost all changes should be merged and
disappear. I'd expect that there would be some lines that you need to merge by
hand, but that shouldn't be too many.

Differential Revision: https://reviews.llvm.org/D64121

llvm-svn: 365595
2019-07-10 05:00:37 +00:00
Chandler Carruth
2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
George Rimar
a46d08ebe6 [LLD][ELD] - Do not reject INFO output section type when used with a start address.
This is https://bugs.llvm.org/show_bug.cgi?id=38625

LLD accept this: 

".stack (INFO) : {", 

but not this:

".stack address_expression (INFO) :"

The patch fixes it.

Differential revision: https://reviews.llvm.org/D51027

llvm-svn: 340804
2018-08-28 08:39:21 +00:00
Rui Ueyama
3f851704c1 Move new lld's code to Common subdirectory.
New lld's files are spread under lib subdirectory, and it isn't easy
to find which files are actually maintained. This patch moves maintained
files to Common subdirectory.

Differential Revision: https://reviews.llvm.org/D37645

llvm-svn: 314719
2017-10-02 21:00:41 +00:00
George Rimar
ce6080819c [ELF] - Remove ScriptLexer::Error field and check ErrorCount instead.
D35945 introduces change when there is useless to check Error flag
in few places, but ErrorCount must be checked instead.

But then we probably can just check ErrorCount always. That should simplify
things. Patch do that.

Differential revision: https://reviews.llvm.org/D36266

llvm-svn: 310046
2017-08-04 10:34:14 +00:00
Rui Ueyama
f5fce48679 Handle ":" as a regular token character in linker scripts.
This is an alternative to https://reviews.llvm.org/D30500 to simplify the
version definition parser and allow ":" in symbol names.

Differential Revision: https://reviews.llvm.org/D30722

llvm-svn: 297402
2017-03-09 19:23:00 +00:00
Rui Ueyama
731a66ae98 Apply different tokenization rules to linker script expressions.
The linker script lexer is context-sensitive. In the regular context,
arithmetic operator characters are regular characters, but in the
expression context, they are independent tokens. This afects how the
lexer tokenizes "3*4", for example. (This kind of expression is real;
the Linux kernel uses it.)

This patch defines function `maybeSplitExpr`. This function splits the
current token into multiple expression tokens if the lexer is in the
expression context.

Differential Revision: https://reviews.llvm.org/D29963

llvm-svn: 295225
2017-02-15 19:58:17 +00:00
Rui Ueyama
794366a237 Rename ScriptParser.{cpp,h} -> ScriptLexer.{cpp,h}.
These files contain a lexer, so the new names are better.
The parser is in LinkerScript.{cpp,h}.

llvm-svn: 295022
2017-02-14 04:47:05 +00:00