14 Commits

Author SHA1 Message Date
Dmitri Gribenko
6bab9113b0 Remove the useless CommentOptions class.
llvm-svn: 162986
2012-08-31 10:35:30 +00:00
Dmitri Gribenko
3ca956f715 Comment HTML tag name machers: move from StringSwitch to an efficient
TableGen-generated string matcher.

llvm-svn: 162969
2012-08-31 02:21:44 +00:00
Dmitri Gribenko
107618a6cb Comment parsing: parse "<blah" as an HTML tag only if "blah" is a known tag
name.  This should reduce the amount of warning false positives about bad HTML
in comments when the comment author intended to put a reference to a template.
This change will also enable us parse the comment as intended in these cases.

Fixes part 1 of PR13374.

llvm-svn: 162407
2012-08-22 22:56:08 +00:00
Dmitri Gribenko
ca7f80ada0 Comment parsing: extract TableGen'able pieces into new CommandTraits class.
llvm-svn: 161548
2012-08-09 00:03:17 +00:00
Dmitri Gribenko
4586df765e Implement resolving of HTML character references (named: &amp;, decimal: &#42;,
hex: &#x1a;) during comment parsing.

Now internal representation of plain text in comment AST does not contain
character references, but the characters themselves.

llvm-svn: 160891
2012-07-27 20:37:06 +00:00
Dmitri Gribenko
e4a3997d70 Comment parsing: don't parse whitespace before \endverbatim as a separate line of whitespace.
llvm-svn: 160464
2012-07-18 23:01:58 +00:00
Dmitri Gribenko
e00ffc7bb8 Comment parsing: repaint the bikesched: rename 'HTML open tags' to 'HTML start tags' and 'HTML close tags' to 'HTML end tags' according to HTML spec.
llvm-svn: 160153
2012-07-13 00:44:24 +00:00
Dmitri Gribenko
f26054f0fb Enable comment parsing and semantic analysis to emit diagnostics. A few
diagnostics implemented -- see testcases.

I created a new TableGen file for comment diagnostics,
DiagnosticCommentKinds.td, because comment diagnostics don't logically
fit into AST diagnostics file.  But I don't feel strongly about it.

This also implements support for self-closing HTML tags in comment
lexer and parser (for example, <br />).

In order to issue precise diagnostics CommentSema needs to know the
declaration the comment is attached to.  There is no easy way to find a decl by 
comment, so we match comments and decls in lockstep: after parsing one
declgroup we check if we have any new, not yet attached comments.  If we do --
then we do the usual comment-finding process.

It is interesting that this automatically handles trailing comments.
We pick up not only comments that precede the declaration, but also
comments that *follow* the declaration -- thanks to the lookahead in
the lexer: after parsing the declgroup we've consumed the semicolon
and looked ahead through comments.

Added -Wdocumentation-html flag for semantic HTML errors to allow the user to 
disable only HTML warnings (but not HTML parse errors, which we emit as
warnings in -Wdocumentation).

llvm-svn: 160078
2012-07-11 21:38:39 +00:00
Dmitri Gribenko
17709ae8d9 Comment lexing: fix lexing to actually work in non-error cases.
llvm-svn: 159963
2012-07-09 21:32:40 +00:00
Dmitri Gribenko
ec92531c29 Implement AST classes for comments, a real parser for Doxygen comments and a
very simple semantic analysis that just builds the AST; minor changes for lexer
to pick up source locations I didn't think about before.

Comments AST is modelled along the ideas of HTML AST: block and inline content.

* Block content is a paragraph or a command that has a paragraph as an argument
  or verbatim command.
* Inline content is placed within some block.  Inline content includes plain
  text, inline commands and HTML as tag soup.

llvm-svn: 159790
2012-07-06 00:28:32 +00:00
Dmitri Gribenko
632d58afab Fix an infinite loop in comment lexer: we were not advancing in the input character stream when we saw a '<' that is not a start of an HTML tag.
llvm-svn: 159303
2012-06-27 23:28:29 +00:00
Dmitri Gribenko
1669f70273 Remove unsigned and a pointer from a comment token (so that each token can have only one semantic string value attached to it), at a cost of adding an additional token.
llvm-svn: 159270
2012-06-27 16:53:58 +00:00
Dmitri Gribenko
60ddd8a1b1 Comment lexer: counting backwards from token end is thought to be confusing. We already have a pointer to the beginning of the token, so use it to extract the text instead.
llvm-svn: 159269
2012-06-27 16:30:35 +00:00
Dmitri Gribenko
5188c4b9cc Implement a lexer for structured comments.
llvm-svn: 159223
2012-06-26 20:39:18 +00:00