Parsing error on caret-escaped special characters #111
Comments
This one is a bit cursed...
    Comment = ^{;
Anyway, we'll definitely need to handle these caret-notation escapes as tokens in the lexer rather than as high-level syntax structures in the parser. Unlike most things in the lexer, they are annoyingly context-dependent. Here's how FreePascal handles them:
    '''','#','^' :
      begin
        len:=0;
        cstringpattern:='';
        iswidestring:=false;
        if c='^' then
          begin
            readchar;
            c:=upcase(c);
            if (block_type in [bt_type,bt_const_type,bt_var_type]) or
               (lasttoken=_ID) or (lasttoken=_NIL) or (lasttoken=_OPERATOR) or
               (lasttoken=_RKLAMMER) or (lasttoken=_RECKKLAMMER) or (lasttoken=_CARET) then
              begin
                token:=_CARET;
                goto exit_label;
              end
            else
              begin
                inc(len);
                setlength(cstringpattern,256);
                if c<#64 then
                  cstringpattern[len]:=chr(ord(c)+64)
                else
                  cstringpattern[len]:=chr(ord(c)-64);
                readchar;
              end;
          end;
Notes:
Lexing rules:
Expression rules:
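For anyone who doesn't read Pascal, here is a rough Python paraphrase of the FreePascal branch quoted above. It is only a sketch of my reading of that snippet, not SonarDelphi or FreePascal code: the function names, the tuple return shape, and the `_EQ` / `bt_const` / `bt_general` values in the usage lines are made up for illustration, and I'm assuming `_RKLAMMER` and `_RECKKLAMMER` are the `)` and `]` tokens.

```python
# Token kinds after which the quoted FreePascal code treats '^' as a plain
# caret. Names are copied from the snippet; _RKLAMMER / _RECKKLAMMER are
# assumed to be ')' and ']'.
CARET_AFTER = {"_ID", "_NIL", "_OPERATOR", "_RKLAMMER", "_RECKKLAMMER", "_CARET"}

# Block types in which '^' is always a plain caret (e.g. pointer types).
TYPE_BLOCKS = {"bt_type", "bt_const_type", "bt_var_type"}


def control_char(c: str) -> str:
    """Map the character after '^' the way the snippet does (ASCII only)."""
    code = ord(c.upper())
    # Below 64 the snippet adds 64, otherwise it subtracts 64.
    return chr(code + 64) if code < 64 else chr(code - 64)


def lex_caret(next_char: str, last_token: str, block_type: str):
    """Decide what '^' followed by next_char means in the current context."""
    if block_type in TYPE_BLOCKS or last_token in CARET_AFTER:
        return ("CARET", "^")
    return ("CHAR", control_char(next_char))


# Hypothetical usage: '_EQ', 'bt_const' and 'bt_general' are placeholder names.
print(lex_caret("C", "_EQ", "bt_const"))    # ('CHAR', '\x03')
print(lex_caret("{", "_EQ", "bt_const"))    # ('CHAR', ';')
print(lex_caret(";", "_ID", "bt_general"))  # ('CARET', '^')
```

The point is that the same two characters give different tokens depending on `last_token` and `block_type`, which is exactly the state a standalone lexer doesn't have by default.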
This was a naive thought; we definitely do need to know what block we're in while lexing this token. This is significantly complicated by ANTLR making the lexer and parser totally independent, which isn't generally how Pascal is parsed. It's simply not a context-free grammar.
Yeah, I worked that out the hard way when I tried implementing a lexer for this 😆.
I don't know what level of freedom you have in the ANTLR lexer, but my idea for how to implement this in a pure 'lexer' was to maintain a stack of contexts (e.g.
Another implementation option is to lex
Yet another option is to just say we won't support this...
Prerequisites
SonarDelphi version
1.0.0
SonarQube version
No response
Issue description
Escaped 'control characters' are an undocumented Delphi feature (carried forward from Turbo Pascal).
For example
const CtrlC = ^C;
While it may be intended for use with visible characters, it's valid to escape any single-byte ASCII character with this technique.
The relevant section of the grammar in SonarDelphi currently has a few problems:
^
; only one (single-byte ASCII) character may be escaped
{ are handled by the 'hidden' channel and never make it to this section
For more info about the caret-escaped characters, see:
Steps to reproduce
Run SonarDelphi on the following program
observe the error
Minimal Delphi code exhibiting the issue
No response
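To illustrate the 'hidden' channel problem mentioned in the issue description, here is a small Python sketch. It is not SonarDelphi's lexer: `strip_brace_comments` is a hypothetical, deliberately naive pre-pass that drops every `{ ... }` comment before anything else looks at the input, which is roughly the effect of sending brace comments to a hidden channel unconditionally. It also ignores string literals to keep the example short.

```python
def strip_brace_comments(source: str) -> str:
    """Naive pre-pass: remove every '{ ... }' comment, with no lexer context."""
    out = []
    i = 0
    while i < len(source):
        if source[i] == "{":
            end = source.find("}", i)
            if end == -1:
                # Unterminated comment: the rest of the input is swallowed.
                break
            i = end + 1
        else:
            out.append(source[i])
            i += 1
    return "".join(out)


# An ordinary caret escape survives the pre-pass untouched.
print(repr(strip_brace_comments("const CtrlC = ^C;")))

# The escaped '{' is mistaken for a comment opener and takes the rest with it.
print(repr(strip_brace_comments("const Comment = ^{; X = 1;")))
```

In the second case the rule that handles `^` never sees the `{` at all, so the decision about whether `{` opens a comment has to depend on whether the preceding `^` is an escape.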