mirror of
https://github.com/llvm/llvm-project.git
synced 2025-04-19 08:56:42 +00:00

Implements [[ https://wg21.link/p2071r1 | P2071 Named Universal Character Escapes ]] as an extension in all language modes. The change to not warn in C++23 mode will be done later, once this paper is plenary approved (in July).

We add

* A code generator that transforms `UnicodeData.txt` and `NameAliases.txt` into a space-efficient data structure that can be queried in `O(NameLength)`
* A set of functions in `Unicode.h` to query that data, including
  * A function to find an exact match of a given Unicode character name
  * A function to perform loose matching (ignoring case, space, underscore, medial hyphen)
  * A function returning the best matching codepoint for a given string per edit distance
* Support of `\N{}` escape sequences in string and character literals, with loose-matching and typo diagnostics/fix-its
* Support of `\N{}` as UCN, with loose-matching diagnostics/fix-its

Loose matching is considered an error, to closely match the semantics of P2071.

The generated data adds 280kB of data to the binaries.

`UnicodeData.txt` and `NameAliases.txt` are not committed to the repository in this patch, and regenerating the data is a manual process.

Reviewed By: tahonermann

Differential Revision: https://reviews.llvm.org/D123064
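The loose matching mentioned in the commit message (ignoring case, space, underscore, and medial hyphen) can be illustrated with a minimal sketch. This is not the code from `Unicode.h`: the names `loose_key` and `loose_equal` are hypothetical, the 128-byte buffers are an assumed upper bound on name length, and a hyphen is treated as medial only when it sits directly between two alphanumeric characters, which is a simplification.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of Unicode.h): writes a loose-matching key
 * for `name` into `out`. Spaces, underscores, and hyphens that sit directly
 * between two alphanumeric characters are dropped; everything else is
 * upper-cased. Output is truncated to fit `out_size`. */
static void loose_key(const char *name, char *out, size_t out_size) {
    size_t n = strlen(name);
    size_t j = 0;
    if (out_size == 0)
        return;
    for (size_t i = 0; i < n && j + 1 < out_size; ++i) {
        unsigned char c = (unsigned char)name[i];
        if (c == ' ' || c == '_')
            continue; /* ignore spaces and underscores */
        if (c == '-' && i > 0 && i + 1 < n &&
            isalnum((unsigned char)name[i - 1]) &&
            isalnum((unsigned char)name[i + 1]))
            continue; /* ignore a medial hyphen */
        out[j++] = (char)toupper(c); /* ignore case */
    }
    out[j] = '\0';
}

/* Hypothetical helper: true when two names have the same loose key.
 * Assumes names shorter than 128 bytes; longer input is truncated. */
static bool loose_equal(const char *a, const char *b) {
    char ka[128];
    char kb[128];
    loose_key(a, ka, sizeof ka);
    loose_key(b, kb, sizeof kb);
    return strcmp(ka, kb) == 0;
}
```

Under these assumptions, `loose_equal("zero width no break space", "ZERO WIDTH NO-BREAK SPACE")` is true, so a diagnostic could offer the canonical spelling as a fix-it. `AAA` does not loosely match a name such as `ANT`; the test file lists that case with `did you mean` suggestions instead.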
30 lines
1.8 KiB
C
// RUN: not %clang_cc1 -fsyntax-only -fdiagnostics-parseable-fixits %s 2>&1 | FileCheck -check-prefix=CHECK-MACHINE %s
const char*
\N{GREEK_SMALL_LETTER-OMICRON} = // expected-error {{'GREEK_SMALL_LETTER-OMICRON' is not a valid Unicode character name}} \
// expected-note {{sensitive to case and whitespaces}}
// CHECK-MACHINE: fix-it:"{{.*}}":{[[@LINE-2]]:4-[[@LINE-2]]:30}:"GREEK SMALL LETTER OMICRON"

"\N{zero width no break space}" // expected-error {{'zero width no break space' is not a valid Unicode character name}} \
// expected-note {{sensitive to case and whitespaces}}
// CHECK-MACHINE: fix-it:"{{.*}}":{[[@LINE-2]]:5-[[@LINE-2]]:30}:"ZERO WIDTH NO-BREAK SPACE"

"abc\N{MAN IN A BUSINESS SUIT LEVITATING}" // expected-error {{'MAN IN A BUSINESS SUIT LEVITATING' is not a valid Unicode character name}} \
// expected-note {{did you mean MAN IN BUSINESS SUIT LEVITATING ('🕴' U+1F574)?}}
// CHECK-MACHINE: fix-it:"{{.*}}":{[[@LINE-2]]:8-[[@LINE-2]]:41}:"MAN IN BUSINESS SUIT LEVITATING"

"\N{AAA}" // expected-error {{'AAA' is not a valid Unicode character name}} \
// expected-note 5{{did you mean}}
// CHECK-MACHINE: fix-it:"{{.*}}":{[[@LINE-2]]:5-[[@LINE-2]]:8}:"ANT"
// CHECK-MACHINE: fix-it:"{{.*}}":{[[@LINE-3]]:5-[[@LINE-3]]:8}:"ARC"
// CHECK-MACHINE: fix-it:"{{.*}}":{[[@LINE-4]]:5-[[@LINE-4]]:8}:"AXE"
// CHECK-MACHINE: fix-it:"{{.*}}":{[[@LINE-5]]:5-[[@LINE-5]]:8}:"BAT"
// CHECK-MACHINE: fix-it:"{{.*}}":{[[@LINE-6]]:5-[[@LINE-6]]:8}:"CAT"

"\N{BLACKCHESSBISHOP}" // expected-error {{'BLACKCHESSBISHOP' is not a valid Unicode character name}} \
// expected-note {{sensitive to case and whitespaces}}
// CHECK-MACHINE: fix-it:"{{.*}}":{[[@LINE-2]]:5-[[@LINE-2]]:21}:"BLACK CHESS BISHOP"

;