Richard Smith 77091b167f Warn if we find a Unicode homoglyph for a symbol in an identifier.
Specifically, warn if:
 * we find a character that the language standard says we must treat as an
   identifier character, and
 * that character is not reasonably an identifier character (it's a punctuation
   character or similar), and
 * it renders identically to a valid non-identifier character in common
   fixed-width fonts.

Some tools "helpfully" substitute the surprising characters for the expected
characters, and replacing semicolons with Greek question marks is a common
"prank".
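
As a rough illustration of the check described above, the sketch below maps a
code point to the ASCII symbol it resembles. It is not Clang's implementation:
the `homoglyph` struct, the table, and `homoglyph_for` are hypothetical names,
and the table holds only the two pairs exercised by the test for this commit
(U+037E for ';' and U+A789 for ':').

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pairing of a confusable code point with the ASCII symbol it
   resembles. The names here are illustrative and not taken from Clang. */
struct homoglyph {
    uint32_t code_point;
    char looks_like;
};

/* Only the two pairs named in the test's expected warnings are listed. */
static const struct homoglyph homoglyphs[] = {
    {0x037E, ';'}, /* GREEK QUESTION MARK */
    {0xA789, ':'}, /* MODIFIER LETTER COLON */
};

/* Returns the ASCII symbol that code_point resembles, or 0 if the table has
   no entry for it. */
static char homoglyph_for(uint32_t code_point) {
    for (size_t i = 0; i < sizeof homoglyphs / sizeof homoglyphs[0]; ++i) {
        if (homoglyphs[i].code_point == code_point)
            return homoglyphs[i].looks_like;
    }
    return 0;
}
```

A lexer could call `homoglyph_for` on each non-ASCII character it accepts into
an identifier and warn when the result is nonzero. A character such as U+00A9
(the copyright sign) has no entry, so it returns 0 and produces no warning.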

llvm-svn: 320697
2017-12-14 13:15:08 +00:00


// RUN: %clang_cc1 -fsyntax-only -verify -x c -std=c11 %s
// RUN: %clang_cc1 -fsyntax-only -verify -x c++ -std=c++11 %s
// RUN: %clang_cc1 -E -DPP_ONLY=1 %s -o %t
// RUN: FileCheck --strict-whitespace --input-file=%t %s
// This file contains Unicode characters; please do not "fix" them!
extern int x; // expected-warning {{treating Unicode character as whitespace}}
extern int x; // expected-warning {{treating Unicode character as whitespace}}
// CHECK: extern int {{x}}
// CHECK: extern int {{x}}
#pragma mark ¡Unicode!
#define COPYRIGHT Copyright © 2012
#define XSTR(X) #X
#define STR(X) XSTR(X)
static const char *copyright = STR(COPYRIGHT); // no-warning
// CHECK: static const char *copyright = "Copyright © {{2012}}";
#if PP_ONLY
COPYRIGHT
// CHECK: Copyright © {{2012}}
CHECK: The preprocessor should not complain about Unicode characters like ©.
#endif
// A 🌹 by any other name....
extern int 🌹;
int 🌵(int 🌻) { return 🌻+ 1; }
int main () {
  int 🌷 = 🌵(🌹);
  return 🌷;
}
int n; = 3; // expected-warning {{treating Unicode character <U+037E> as identifier character rather than as ';' symbol}}
int *n꞉꞉v = &n;; // expected-warning 2{{treating Unicode character <U+A789> as identifier character rather than as ':' symbol}}
// expected-warning@-1 {{treating Unicode character <U+037E> as identifier character rather than as ';' symbol}}
int v＝［＝］（auto）｛return～x；｝（）; // expected-warning 12{{treating Unicode character}}