NewCompiler weird ANSI

Sun May 27 21:33:03 UTC 2007

Lex Spoon wrote:
> nicolas cellier <ncellier at ifrance.com> writes:
>> As stated in NewCompiler's code, the ANSI syntax:
> 
> These are fun challenges, Nicolas!  They will help us get a compiler
> that does precisely what we want from it.
> 
> In deciding the behavior we want, I would propose two principles, the
> first being stronger than the last:
> 
>   1. Be compatible on the trivial stuff.  Save Squeak-isms for places
>   where there is a real advantage.
> 
>   2. Lean towards accepting more rather than less.  Especially we
>   should try to accept things that other implementations accept.
> 

Very reasonable.
But which dialect actually interprets '- 1' the ANSI way?
Not VW, gst, or stx... (I didn't check Dolphin or VA)

> 
> Here are my attempts to figure out what ANSI actually wants in these
> cases.  Be aware it is not always what the compiler is currently
> doing.
> 
> 
>> ClosureCompiler evaluate: '- 1'
>>
>> is WEIRD!
>> It answers -1 as a literal negative number, space not being
>> significant. BEWARE: a tab or CR is significant in the current
>> implementation (ANSI?)
>>
>> This is more confusing than useful.
>> It makes people think of a prefix operator, as in other languages.
> 
> The standard is clear.  A unary - is allowed to dangle way ahead of
> the number literal that it modifies.
> 
> 
> Section 3.4.6.1 includes the appropriate rule.  Note the last
> sentence.
> 
> 
>       <number literal> ::= ['-'] <number>
>       <number> ::= integer | float | scaledDecimal
> 
>     If the preceding '-' is not present the value of the numeric
>     object is a positive number. If the '-' is present the value of
>     the numeric object is the negative number that is the negation of
>     the positive number defined by the <number> clause. White space is
>     allowed between the '-' and the <number>.
> 
> 
> As best as I can tell, the standard is factored this way so that you
> can divide your parser in the standard way into a tokenizer and a
> parser.  When the tokenizer sees "-1", it should divide it into two
> tokens, "-" and "1".  Then, the parser is free to interpret this as
> either a literal, as in "x := -1", or a subtraction of 1, as in
> "y := x-1".
> 
> Since it's handled at the level of parsing, white space is allowed
> for consistency.  You can even put comments in there if you like.
> 
> That's my rationalization, anyway.  :) The rationale doesn't say, but
> the spec is clear that spaces are allowed there.
> 

This makes some sense, though it is not implemented that way in NewCompiler
(only the space character is accepted there, not separators in general such
as tab, CR or comments).

It is rather the grammar rule itself which is questionable. It is not based
on any dialect's customs, nor on historical roots. Maybe people coming from
other languages would appreciate it...
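To make the tokenizer/parser split described in the quoted reply concrete, here is a minimal Python sketch (Python only for illustration; this is not NewCompiler code). It assumes a toy subset: integers, identifiers, runs of binary characters, parentheses and double-quoted comments, with no assignment, unary or keyword messages. The tokenizer never produces negative numbers; the parser folds '-' <number> into a negative literal only in operand position. It follows the ANSI reading, so a tab or a comment between the '-' and the digits is accepted too.

```python
import re

# Toy subset (assumption): integers, identifiers, runs of binary characters,
# parentheses and "double-quoted comments". Not NewCompiler's actual scanner.
TOKEN = re.compile(r'(\s+|"[^"]*")|(\d+)|([A-Za-z]\w*)|([-+*/<>=~]+)|([()])')


def tokenize(src):
    """Split src into (kind, value) pairs; never produces negative numbers."""
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"bad character at {pos}: {src[pos]!r}")
        pos = m.end()
        separator, num, ident, binop, paren = m.groups()
        if separator:
            continue  # whitespace and comments only separate tokens
        if num:
            tokens.append(('num', int(num)))
        elif ident:
            tokens.append(('id', ident))
        elif binop:
            tokens.append(('bin', binop))
        else:
            tokens.append(('paren', paren))
    return tokens


def parse(src):
    """Parse operand (binop operand)* into nested (op, left, right) tuples."""
    tokens = tokenize(src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def operand():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("expected an operand")
        pos += 1
        kind, value = tok
        if kind in ('num', 'id'):
            return value
        if tok == ('bin', '-') and peek() is not None and peek()[0] == 'num':
            # Operand position: '-' <number> is one negative literal,
            # whatever separators appeared between the two tokens.
            value = -peek()[1]
            pos += 1
            return value
        if tok == ('paren', '('):
            inner = expression()
            if peek() != ('paren', ')'):
                raise SyntaxError("expected ')'")
            pos += 1
            return inner
        raise SyntaxError(f"unexpected {value!r}")

    def expression():
        nonlocal pos
        left = operand()
        while peek() is not None and peek()[0] == 'bin':
            op = peek()[1]
            pos += 1  # after an operand, any binary token is a selector
            left = (op, left, operand())
        return left

    result = expression()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected trailing tokens: {tokens[pos:]}")
    return result
```

With this split, parse('- 1'), parse('-\t1') and parse('- "comment" 1') all give -1, while parse('x-1') and parse('x - 1') give the subtraction ('-', 'x', 1), and parse('x - -1') gives ('-', 'x', -1).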

> 
> 
>> Also, as already said, inside literal array #(- 1) space is
>> significant.
> 
> In this case, it should probably be the same as #(-1). The standard is
> generally too quiet about array literals, but in this case it does
> define a parse, so I guess we should use the standard parse.
> 
> If you want to parse a two-element array out of the above, you can
> write it as: #(#- 1) .
> 
> 

I would not like that. I prefer to understand the rule as:
1) a space separates two tokens
2) -1 is a single token, not two, while - 1 is two tokens
3) a literal array is an array of literal tokens

My guess is that this is the rule the Smalltalk gods had in mind.
Only a guess...

Anyway, it's the actual behaviour of most Smalltalks.
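The two readings of #(- 1) can be compared with a small Python sketch. Both tokenizers are hypothetical and cover only integers, identifiers and runs of binary characters, each optionally prefixed by '#'. The 'ansi' reading folds a bare '-' and the following number into one negative literal, as the standard's <number literal> rule suggests; the 'glued' reading treats -1 as one token and - 1 as two, following rules 1) to 3) above. Lex's workaround #(#- 1) keeps two elements under both readings.

```python
import re

# Hypothetical tokenizers for the body of #( ... ). Toy subset (assumption):
# integers, identifiers and runs of binary characters, each optionally
# prefixed by '#'. Symbols are shown as strings such as '#-' or '#foo'.
WORD = r'(#?(?:[A-Za-z]\w*|[-+*/<>=~]+))'
ANSI_TOKEN = re.compile(r'\s+|(\d+)|' + WORD)     # never a negative number
GLUED_TOKEN = re.compile(r'\s+|(-?\d+)|' + WORD)  # "-1" is one token


def scan(body, pattern):
    """Return (raw_text, value) pairs, skipping whitespace."""
    pairs, pos = [], 0
    while pos < len(body):
        m = pattern.match(body, pos)
        if m is None:
            raise SyntaxError(f"bad character at {pos}: {body[pos]!r}")
        pos = m.end()
        num, word = m.groups()
        if num is not None:
            pairs.append((num, int(num)))
        elif word is not None:
            pairs.append((word, '#' + word.lstrip('#')))  # bare word -> symbol
    return pairs


def literal_array(body, reading='ansi'):
    """Elements of #( body ) under the 'ansi' or the 'glued' reading."""
    pairs = scan(body, ANSI_TOKEN if reading == 'ansi' else GLUED_TOKEN)
    if reading != 'ansi':
        # "- 1" stays two elements: the symbol #- and the number 1.
        return [value for _, value in pairs]
    result, i = [], 0
    while i < len(pairs):
        raw, value = pairs[i]
        if raw == '-' and i + 1 < len(pairs) and isinstance(pairs[i + 1][1], int):
            # <number literal> ::= ['-'] <number>, white space allowed.
            result.append(-pairs[i + 1][1])
            i += 2
        else:
            result.append(value)
            i += 1
    return result
```

Here literal_array('- 1') gives [-1], literal_array('- 1', 'glued') gives ['#-', 1], and both readings give [-1] for '-1'.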

> 
>  
>> And what about the sign of exponent?
>> ClosureCompiler evaluate: '-1.0e- 1' raises Message not understood: e,
>> because it parses as ((-1.0) e) - (1), so space is significant here.
> 
> According to the standard, the only place you can insert a space
> inside a number literal is if the whole literal starts with a "-".
> This (ugly) exception only applies at the beginning, and not in the
> example you give.
> 
> 

Not a brilliant example anyway, forget it.

> 
> 
>> Besides, as NewCompiler accepts minus as the last character of a
>> multi-character binary selector, this causes further ambiguity.
>>
>> ClosureCompiler evaluate: '0--1' answers 1, i.e. 0-(-1): the last minus
>> is attached to the digit because there is no space.
> 
> This code is incorrect by the standard, and I'd be happy with
> rejecting it.  ANSI uses normal old longest match.  Section 3.5 says:
> 
>     Unless otherwise specified, white space or another separator must
>     appear between any two tokens if the initial characters of the
>     second token would be a valid extension of the first token.
> 
> Thus, 0--1 should tokenize as "0", "--", "1", which then does not
> parse.
> 
> 
> 
> Lex Spoon
> 
> 
> 

Yes, just like #||, #-- is not standard; it is an extension. If I understood
it well, it was left out precisely to avoid the ambiguity that arises when it
is mixed with negative literal numbers.
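The longest-match rule of Section 3.5, as quoted in Lex's reply, can be sketched in a few lines of Python. The tokenizer is a toy assumption covering only integers and runs of binary characters. The extra check that rejects a '-' after the first character of a binary selector encodes the remark that #-- is a non-standard extension; it is an assumption, not a quotation of the standard.

```python
import re

# Toy subset (assumption): integers and runs of binary characters only.
# Each alternative is greedy and the character classes do not overlap,
# so every token is the longest possible match.
TOKEN = re.compile(r'\s+|(\d+)|([-+*/<>=~]+)')


def tokenize(src):
    """Longest-match tokenizer: '0--1' gives '0', '--', '1'."""
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"bad character at {pos}: {src[pos]!r}")
        pos = m.end()
        token = m.group(1) or m.group(2)
        if token:  # None for whitespace, which only separates tokens
            tokens.append(token)
    return tokens


def check_selectors(tokens):
    """Reject binary selectors with '-' after their first character.

    Assumption based on the thread (#-- is a non-standard extension),
    not a quotation of the ANSI text.
    """
    for tok in tokens:
        if not tok.isdigit() and '-' in tok[1:]:
            raise SyntaxError(f"{tok!r} is not a standard binary selector")
    return tokens
```

So '0--1' becomes '0', '--', '1' and is then rejected, while '0 - -1' tokenizes as '0', '-', '-', '1' and passes the check.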

It would be easy to have a pedantic interactive compiler force the user to
disambiguate (at least with a warning, or with a menu proposing the various
interpretations and an action that inserts the space automatically).
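Such a menu could be driven by something like the following Python sketch. It is hypothetical: the name proposals and the toy set of binary characters are assumptions, not an existing Squeak API. It finds a run of binary characters ending in '-' directly followed by a digit and returns the two explicit spellings, each with an auto-inserted space.

```python
import re

# Hypothetical helper for a "pedantic" interactive compiler (assumption,
# not an existing Squeak API): a run of binary characters ending in '-'
# that touches a digit, as in '0--1' or '3*-1'.
AMBIGUOUS = re.compile(r'([-+*/<>=~]+)-(?=\d)')


def proposals(src):
    """Return (label, rewritten_source) pairs for the first ambiguous spot."""
    m = AMBIGUOUS.search(src)
    if m is None:
        return []
    head, sel, tail = src[:m.start()].rstrip(), m.group(1), src[m.end():]
    num = re.match(r'\d+', tail).group()
    return [
        # Keep the whole run as one selector; the number stays positive.
        (f"binary selector #{sel}- with argument {num}", f"{head} {sel}- {tail}"),
        # Detach the last '-' and glue it to the number as a negative literal.
        (f"binary selector #{sel} with argument -{num}", f"{head} {sel} -{tail}"),
    ]
```

For '0--1' this offers '0 -- 1' (selector #--) and '0 - -1' (selector #- with the literal -1); input that already contains the space, like '0 - -1', produces no proposal.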

See:
http://lists.squeakfoundation.org/pipermail/squeak-dev/2006-May/thread.html#103895
http://lists.squeakfoundation.org/pipermail/squeak-dev/2006-May/104088.html
http://lists.squeakfoundation.org/pipermail/squeak-dev/2006-May/103907.html
etc... (binary selectors ambiguity and space)

http://bugs.squeak.org/view.php?id=3616