diff --git a/flang/C++style.md b/flang/C++style.md
index acc995c6ccc1..e5889d7cc889 100755
--- a/flang/C++style.md
+++ b/flang/C++style.md
@@ -20,6 +20,8 @@ in foo.cc.)
 1. In the source file "foo.cc", put the #include of "foo.h" first.
 Then #include other project headers in alphabetic order; then C++ standard
 headers, also alphabetically; then C and system headers.
+1. Don't include the standard iostream header. If you need it for debugging,
+remove the inclusion before committing.
 ### Naming
 1. C++ names that correspond to STL names should look like those STL names
 (e.g., *clear()* and *size()* member functions in a class that implements
@@ -40,7 +42,7 @@ especially when you can declare them directly in a for()/while()/if()
 condition. Otherwise, prefer complete English words to abbreviations
 when creating names.
 ### Commentary
-1. Use // for all comments except for short notes within statements.
+1. Use // for all comments except for short notes within expressions.
 1. When // follows code on a line, precede it with two spaces.
 1. Comments should matter. Assume that the reader knows current C++ at least
 as well as you do and avoid distracting her by calling out usage of new
diff --git a/flang/ParserCombinators.md b/flang/ParserCombinators.md
new file mode 100644
index 000000000000..82032b692f68
--- /dev/null
+++ b/flang/ParserCombinators.md
@@ -0,0 +1,145 @@
+## Concept
+The Fortran language recognizer here can be classified as an LL recursive
+descent parser. It is composed from a *parser combinator* library that
+defines a few fundamental parsers and a few ways to compose them into more
+powerful parsers.
+
+For our purposes here, a *parser* is any object that can attempt to recognize
+an instance of some syntax from an input stream. It may succeed or fail.
+On success, it may return some semantic value to its caller.
+
+In C++ terms, a parser is any instance of a class that
+1. has a *constexpr* default constructor,
+1. defines a resultType type, and
+1. provides a member or static function that accepts a pointer to a
+ParseState as its argument and returns a std::optional<resultType> as a
+result, with the presence or absence of a value in the std::optional<>
+signifying success or failure, respectively.
+
+> std::optional<resultType> Parse(ParseState *) const;
+
+The resultType of a parser is typically the class type of some particular
+node type in the parse tree.
+
+*ParseState* is a class that encapsulates a position in the source stream,
+collects messages, and holds a few state flags that determine tokenization
+(e.g., are we in a character literal?). Instances of *ParseState* are
+independent and complete -- they are cheap to duplicate whenever necessary to
+implement backtracking.
+
+The constexpr default constructor of a parser is important. The functions
+(below) that operate on instances of parsers are themselves all constexpr.
+This use of compile-time expressions allows the entirety of a recursive
+descent parser for a language to be constructed at compilation time through
+the use of templates.
+
+### Fundamental Predefined Parsers
+These objects and functions are (or return) the fundamental parsers:
+
+* *ok* is a trivial parser that always succeeds without advancing.
+* "pure(x)" returns a trivial parser that always succeeds without advancing,
+  returning some value *x*.
+* "fail<T>(msg)" denotes a trivial parser that always fails, emitting the
+  given message. The template parameter is the type of the value that
+  the parser never returns.
+* *cut* is a trivial parser that always fails silently.
+* "guard(pred)" returns a parser that succeeds if and only if the predicate
+  expression evaluates to true.
+* *rawNextChar* returns the next raw character, and fails at EOF.
+* *cookedNextChar* returns the next character after preprocessing, skipping
+  Fortran line continuations and comments; it also fails at EOF.
+
+### Combinators
+These functions and operators combine parsers to generate new parsers.
+
+* "!p" succeeds if p fails, and fails if p succeeds.
+* "p >> q" fails if p does, otherwise running q and returning its value when
+  it succeeds.
+* "p / q" fails if p does, otherwise running q and returning *p's* value
+  if q succeeds.
+* "p || q" succeeds if p does, otherwise running q. The two parsers must
+  have the same type, and the value returned by the first succeeding parser
+  is the value of the combination.
+* "lookAhead(p)" succeeds if p does, but doesn't modify any state.
+* "attempt(p)" succeeds if p does, safely preserving state on failure.
+* "many(p)" recognizes a greedy sequence of zero or more nonempty successes
+  of *p*, and returns std::list<> of their values. It always succeeds.
+* "some(p)" recognizes a greedy sequence of one or more successes of *p*.
+  It fails if p immediately fails.
+* "skipMany(p)" is the same as "many(p)", but it discards the results.
+* "maybe(p)" tries to match *p*, returning an "std::optional" value.
+  It always succeeds.
+* "defaulted(p)" matches *p*, and when *p* fails it returns a
+  default-constructed instance of *p*'s resultType. It always succeeds.
+* "nonemptySeparated(p, q)" repeatedly matches "p q p q p q ... p",
+  returning a std::list<> of only the values of the p's. It fails if
+  *p* immediately fails.
+* "extension(p)" parses *p* if strict standard compliance is disabled,
+  with a warning if nonstandard usage warnings are enabled.
+* "deprecated(p)" parses *p* if strict standard compliance is disabled,
+  with a warning if deprecated usage warnings are enabled.
+* "inContext(..., p)" runs *p* within an error message context.
+
+Note that "a >> b >> c / d / e" matches a sequence of five parsers,
+but returns only the result that was obtained by matching c.
+
+### Applicatives
+The following *applicative* combinators combine parsers and modify or
+collect the values that they return.
+
+* "construct<T>{}(p1, p2, ...)" matches zero or more parsers in succession,
+  collecting their results and then passing them with move semantics to a
+  constructor for the type *T* if they all succeed.
+* "applyFunction(f, p1, p2, ...)" matches one or more parsers in succession,
+  collecting their results and passing them as rvalue reference arguments to
+  some function, returning its result.
+* "applyLambda([](&&x){}, p1, p2, ...)" is the same thing, but for lambdas
+  and other function objects.
+* "applyMem(mf, p1, p2, ...)" is the same thing, but invokes a member
+  function of the result of the first parser for updates in place.
+
+### Non-Advancing State Inquiries and Updates
+These are non-advancing state inquiry and update parsers:
+
+* *getColumn* returns the 1-based column position.
+* *inCharLiteral* succeeds under withinCharLiteral.
+* *inFortran* succeeds unless in a preprocessing directive.
+* *inFixedForm* succeeds in fixed-form source.
+* *setInFixedForm* sets the fixed-form flag, returning its prior value.
+* *columns* returns the 1-based column number after which source is clipped.
+* "setColumns(c)" sets the column limit and returns its prior value.
+
+### Monadic Combination
+When parsing depends on the result values of earlier parses, the
+"monadic bind" combinator is available.
+Please try to avoid using it, as it makes automatic analysis of the
+grammar difficult.
+It has the syntax "p >>= f", and it constructs a parser that matches p,
+yielding some value x on success, then matches the parser returned from
+the function call "f(x)".
+
+### Token Parsers
+Last, we have these basic parsers on which the actual grammar of Fortran
+is built. All of the following parsers consume characters acquired from
+*cookedNextChar*.
+
+* *spaces* always succeeds after consuming any spaces or tabs.
+* *digit* matches one cooked decimal digit (0-9).
+* *letter* matches one cooked letter (A-Z).
+* "CharMatch<'c'>{}" matches one specific cooked character.
+* "..."_tok matches the content of the string, skipping spaces before and
+  after, and with multiple spaces accepted for any internal space.
+  (Note that the _tok suffix is optional when the parser appears before
+  the combinator ">>" or after "/".)
+* "parenthesized(p)" is shorthand for "(" >> p / ")".
+* "bracketed(p)" is shorthand for "[" >> p / "]".
+* "withinCharLiteral(p)" applies the parser *p*, tokenizing for
+  CHARACTER/Hollerith literals.
+* "nonEmptyListOf(p)" matches a comma-separated list of one or more
+  instances of *p*.
+* "optionalListOf(p)" is the same thing, but can be empty, and always succeeds.
+
+### Debugging Parser
+Last, the parser "..."_debug emits the string to standard error and succeeds.
+It is useful for tracing while debugging a parser but should not be
+committed in production code.
diff --git a/flang/parser-combinators.txt b/flang/parser-combinators.txt
deleted file mode 100644
index 456d47e13508..000000000000
--- a/flang/parser-combinators.txt
+++ /dev/null
@@ -1,127 +0,0 @@
-The Fortran language recognizer here is an LL recursive descent parser
-composed from a "parser combinator" library that defines a few fundamental
-parsers and a few ways to compose them into more powerful parsers.
-
-For our purposes here, a *parser* is any object that can attempt to recognize
-an instance of some syntax from an input stream. It may succeed or fail.
-On success, it may return some semantic value to its caller.
-
-In C++ terms, a parser is any instance of a class that
-  (1) has a constexpr default constructor,
-  (2) defines a resultType typedef, and
-  (3) provides a member or static function
-
-      std::optional<resultType> Parse(ParseState *) const;
-      static std::optional<resultType> Parse(ParseState *);
-
-      that accepts a pointer to a ParseState as its argument and returns
-      a std::optional<resultType> as a result, with the presence or absence
-      of a value in the std::optional<> signifying success or failure
-      respectively.
-
-The resultType of a parser is typically the class type of some particular
-node type in the parse tree.
-
-ParseState is a class that encapsulates a position in the source stream,
-collects messages, and holds a few state flags that can affect tokenization
-(e.g., are we in a character literal?). Instances of ParseState are
-independent and complete -- they are cheap to duplicate when necessary to
-implement backtracking.
-
-The constexpr default constructor of a parser is important. The functions
-(below) that operate on instances of parsers are themselves all constexpr.
-This use of compile-time expressions allows the entirety of a recursive
-descent parser for a language to be constructed at compilation time through
-the use of templates.
-
-These objects and functions are (or return) the fundamental parsers:
-
-  ok              always succeeds without advancing
-  pure(x)         always succeeds without advancing, returning some value x
-  fail(msg)       always fails with the given message; optionally typed
-  cut             always fails, with no message
-  guard(pred)     succeeds if the predicate expression evaluates to true
-  rawNextChar     returns the next raw character; fails at EOF
-  cookedNextChar  returns the next character after preprocessing, skipping
-                  Fortran line continuations and comments; fails at EOF
-
-These functions and operators generate new parsers from combinations of
-other parsers:
-
-  !p              ok if p fails, cut if p succeeds
-  p >> q          match p, then q, returning q's value
-  p / q           match p, then q, returning p's value
-  p || q          match p if it succeeds, else match q; p and q must be same type
-  lookAhead(p)    succeeds iff p does, but doesn't modify state
-  attempt(p)      succeeds iff p does, safely preserving state on failure
-  many(p)         a greedy sequence of zero or more nonempty successes of p;
-                  returns std::list<> of values
-  some(p)         a greedy sequence of one or more successes of p
-  skipMany(p)     same as many(p), but discards result (performance optimizer)
-  maybe(p)        try to match p, returning optional
-  defaulted(p)    matches p, or else returns a default-constructed instance
-                  of p's resultType
-  nonemptySeparated(p, q)  repeatedly match p q p q p q ... p, returning
-                  the values of the p's
-  extension(p)    parses p if strict standard compliance is disabled,
-                  with a warning if nonstandard usage warnings are enabled
-  deprecated(p)   parses p if strict standard compliance is disabled,
-                  with a warning if deprecated usage warnings are enabled
-  inContext("...", p)  run p within an error message context
-
-Note that "a >> b >> c / d / e" matches a sequence of five parsers,
-but returns only the result that was obtained by matching c.
-
-The following "applicative" combinators modify or combine the values returned
-by parsers:
-
-  construct<T>{}(p1, p2, ...)
-    matches zero or more parsers in succession, collecting their
-    results and then passing them with move semantics to a
-    constructor for the type T if they all succeed
-  applyFunction(f, p1, p2, ...)
-    matches one or more parsers in succession, collecting their
-    results and passing them as rvalue reference arguments to
-    some function, returning its result
-  applyLambda([](&&x){}, p1, p2, ...)
-    is the same thing, but for lambdas and other function objects
-  applyMem(mf, p1, p2, ...)
-    is the same thing, but invokes a member function of the
-    result of the first parser
-
-These are non-advancing state inquiry and update parsers:
-
-  getColumn       returns 1-based column position
-  inCharLiteral   succeeds under withinCharLiteral
-  inFortran       succeeds unless in a preprocessing directive
-  inFixedForm     succeeds in fixed-form source
-  setInFixedForm  sets the fixed-form flag, returns prior value
-  columns         returns the 1-based column number after which source is clipped
-  setColumns(c)   sets "columns", returns prior value
-
-When parsing depends on the result values of earlier parses, the
-"monadic bind" combinator is available (but please try to avoid using it,
-as it makes automatic analysis of the grammar difficult):
-
-  p >>= f         match p, yielding some value x on success, then match the
-                  parser returned from the function call f(x)
-
-Last, we have these basic parsers on which the actual grammar of the Fortran
-is built. All of the following parsers consume characters acquired from
-"cookedNextChar".
-
-  spaces            always succeeds after consuming any spaces or tabs
-  digit             matches one cooked decimal digit (0-9)
-  letter            matches one cooked letter (A-Z)
-  CharMatch<'c'>{}  matches one specific cooked character
-  "..."_tok         match contents, skipping spaces before and after, and
-                    with multiple spaces accepted for any internal space
-  "..." >> p        the tok suffix is optional on a string before >> and after /
-  parenthesized(p)  shorthand for "(" >> p / ")"
-  bracketed(p)      shorthand for "[" >> p / "]"
-
-  withinCharLiteral(p)  apply p, tokenizing for CHARACTER/Hollerith literals
-  nonEmptyListOf(p)     matches a comma-separated list of one or more p's
-  optionalListOf(p)     ditto, but can be empty
-
-  "..."_debug       emit the string and succeed, for parser debugging
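
The parser contract described in ParserCombinators.md (a constexpr default constructor, a resultType, and a Parse function returning std::optional) and the ">>", "/", and "||" combinators can be sketched in a few dozen lines of C++17. This is a minimal illustration under stated assumptions, not the library's implementation: the ParseState below is a hypothetical two-pointer stand-in, the class names Sequence, Follow, and Alternative and the helper ParseText are invented for this example, and only CharMatch borrows its name from the document.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for ParseState: just a cursor, so copies are cheap.
struct ParseState {
  const char *p;
  const char *limit;
};

// Modeled on the documented CharMatch<'c'>{}: matches one specific character.
template <char C> struct CharMatch {
  using resultType = char;
  constexpr CharMatch() {}
  static std::optional<char> Parse(ParseState *state) {
    assert(state->p <= state->limit);
    if (state->p < state->limit && *state->p == C) {
      ++state->p;
      return C;
    }
    return std::nullopt;
  }
};

// "p >> q": fails if p does, otherwise runs q and returns q's value.
template <typename PA, typename PB> struct Sequence {
  using resultType = typename PB::resultType;
  constexpr Sequence() {}
  constexpr Sequence(PA pa, PB pb) : pa_{pa}, pb_{pb} {}
  std::optional<resultType> Parse(ParseState *state) const {
    if (pa_.Parse(state)) return pb_.Parse(state);
    return std::nullopt;
  }
  PA pa_{};
  PB pb_{};
};

// "p / q": fails if p does, otherwise runs q and returns p's value.
template <typename PA, typename PB> struct Follow {
  using resultType = typename PA::resultType;
  constexpr Follow() {}
  constexpr Follow(PA pa, PB pb) : pa_{pa}, pb_{pb} {}
  std::optional<resultType> Parse(ParseState *state) const {
    auto result = pa_.Parse(state);
    if (result && pb_.Parse(state)) return result;
    return std::nullopt;
  }
  PA pa_{};
  PB pb_{};
};

// "p || q": tries p; on failure restores the saved state and tries q.
template <typename PA, typename PB> struct Alternative {
  using resultType = typename PA::resultType;
  constexpr Alternative() {}
  constexpr Alternative(PA pa, PB pb) : pa_{pa}, pb_{pb} {}
  std::optional<resultType> Parse(ParseState *state) const {
    ParseState backup = *state;
    if (auto result = pa_.Parse(state)) return result;
    *state = backup;
    return pb_.Parse(state);
  }
  PA pa_{};
  PB pb_{};
};

// The operators only participate for types that define resultType.
template <typename PA, typename PB, typename = typename PA::resultType>
constexpr Sequence<PA, PB> operator>>(PA pa, PB pb) {
  return Sequence<PA, PB>{pa, pb};
}
template <typename PA, typename PB, typename = typename PA::resultType>
constexpr Follow<PA, PB> operator/(PA pa, PB pb) {
  return Follow<PA, PB>{pa, pb};
}
template <typename PA, typename PB, typename = typename PA::resultType>
constexpr Alternative<PA, PB> operator||(PA pa, PB pb) {
  return Alternative<PA, PB>{pa, pb};
}

// Parses "(a)" or "(b)" and returns the letter between the parentheses.
std::optional<char> ParseText(const std::string &text) {
  constexpr auto grammar = CharMatch<'('>{} >>
      (CharMatch<'a'>{} || CharMatch<'b'>{}) / CharMatch<')'>{};
  ParseState state{text.data(), text.data() + text.size()};
  return grammar.Parse(&state);
}
```

With these definitions, ParseText("(b)") yields 'b', while "(c)" and "(a" both fail. Because "/" binds more tightly than ">>", the grammar returns the value of the alternative rather than that of either parenthesis, mirroring the "a >> b >> c / d / e" note in the document. The grammar object is declared constexpr, which is only possible because every parser class has a constexpr constructor and every combinator function is itself constexpr.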