NewCompiler weird ANSI

Sat May 26 21:39:38 UTC 2007

nicolas cellier <ncellier at ifrance.com> writes:
> As stated in NewCompiler's code, the ANSI syntax:

These are fun challenges, Nicolas!  They will help us get a compiler
that does precisely what we want from it.

In deciding the behavior we want, I would propose two principles, the
first taking priority over the second:

  1. Be compatible on the trivial stuff.  Save Squeak-isms for places
  where there is a real advantage.

  2. Lean towards accepting more rather than less.  Especially we
  should try to accept things that other implementations accept.

Here are my attempts to figure out what ANSI actually wants in these
cases.  Be aware it is not always what the compiler is currently
doing.

> ClosureCompiler evaluate: '- 1'
> 
> is WEIRD!
> It answers -1 as a literal negative number, space not being
> significant. BEWARE a tab or cr are significant in current
> implementation (ANSI?)
> 
> This is more confusing than useful.
> It makes people think of a prefixed operator like other languages.

The standard is clear.  A unary - is allowed to dangle way ahead of
the number literal that it modifies.

Section 3.4.6.1 includes the appropriate rule.  Note the last
sentence.

      <number literal> ::= ['-'] <number>
      <number> ::= integer | float | scaledDecimal

    If the preceding '-' is not present the value of the numeric
    object is a positive number. If the '-' is present the value of
    the numeric object is the negative number that is the negation of
    the positive number defined by the <number> clause. White space is
    allowed between the '-' and the <number>.

As best as I can tell, the standard is factored this way so that you
can divide your parser in the standard way into a tokenizer and a
parser.  When the tokenizer sees "-1", it should divide it into two
tokens, "-" and "1".  Then, the parser is free to interpret this as
either a literal, as in "x := -1", or a subtraction of 1, as in
"y := x-1".

Since it's handled at the level of parsing, white space is allowed
for consistency.  You can even put comments in there if you like.
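
To make that factoring concrete, here is a minimal sketch in Python
(not NewCompiler code).  The names TOKEN, tokenize and
fold_negative_literals are hypothetical, and the token set is a
deliberate simplification: unsigned decimal numbers, identifiers, ":="
and single-character operators only.  The tokenizer never attaches "-"
to a number; a later parser-side step decides whether "-" is a sign or
a subtraction.

```python
import re

# Hypothetical, simplified token pattern (an assumption for illustration).
TOKEN = re.compile(r'''
    (?P<skip>\s+|"[^"]*")      # white space and "comments" are dropped
  | (?P<num>\d+(?:\.\d+)?)     # unsigned number; the sign is its own token
  | (?P<id>[A-Za-z_]\w*)
  | (?P<assign>:=)
  | (?P<op>[-+*/])
''', re.VERBOSE)

def tokenize(src):
    """Split src into (kind, text) pairs, never gluing '-' onto a number."""
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r}")
        if m.lastgroup != "skip":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

def fold_negative_literals(tokens):
    """Parser-side step: '-' followed by a number becomes a negative
    literal when an operand is expected (at the start, or right after an
    operator or assignment); otherwise '-' stays a binary operator."""
    out, i = [], 0
    while i < len(tokens):
        kind, text = tokens[i]
        operand_expected = not out or out[-1][0] in ("op", "assign")
        if (kind == "op" and text == "-" and operand_expected
                and i + 1 < len(tokens) and tokens[i + 1][0] == "num"):
            out.append(("num", "-" + tokens[i + 1][1]))
            i += 2
        else:
            out.append((kind, text))
            i += 1
    return out
```

Because white space and comments are discarded before the parser sees
anything, "x := -1", "x := - 1" and 'x := - "sign" 1' all fold to the
same literal, while "y := x-1" keeps "-" as an operator.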

That's my rationalization, anyway.  :) The rationale doesn't say, but
the spec is clear that spaces are allowed there.

> Also, as already said, inside literal array #(- 1) space is
> significant.

In this case, it should probably be the same as #(-1). The standard is
generally too quiet about array literals, but in this case it does
define a parse, so I guess we should use the standard parse.

If you want to parse a two-element array out of the above, you can
write it as: #(#- 1) .

> And what about the sign of exponent?
> ClosureCompiler evaluate: '-1.0e- 1'. Message not understood e
> ((-1.0) e) - (1), so space is significant here.

According to the standard, the only place you can insert a space
inside a number literal is if the whole literal starts with a "-".
This (ugly) exception only applies at the beginning, and not in the
example you give.
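
As a rough illustration, the rule can be flattened into one regular
expression.  This is a sketch in Python under simplifying assumptions:
decimal digits only, "e" as the only exponent letter, no radix or
scaled-decimal forms, and no comments after the sign.  The names
NUMBER_LITERAL and is_number_literal are made up for the example.

```python
import re

# White space is tolerated only right after a leading '-'.
# The exponent sign must touch both the 'e' and the digits.
NUMBER_LITERAL = re.compile(r'-?\s*\d+(?:\.\d+)?(?:e-?\d+)?')

def is_number_literal(text):
    """True if the whole of text is one number literal under this sketch."""
    return NUMBER_LITERAL.fullmatch(text) is not None
```

With this pattern "- 1" and "-1.0e-1" are accepted as single literals,
while "-1.0e- 1" is not.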

> Beside, as NewCompiler accepts minus as last character of a
> multi-character binary selector, this causes further ambiguity.
> 
> ClosureCompiler evaluate: '0--1'. is 1 (0-(-1)) last minus is attached
> to digit because there is no space.

This code is incorrect by the standard, and I'd be happy with
rejecting it.  ANSI uses normal old longest match.  Section 3.5 says:

    Unless otherwise specified, white space or another separator must
    appear between any two tokens if the initial characters of the
    second token would be a valid extension of the first token.

Thus, 0--1 should tokenize as "0", "--", "1", which then does not
parse.
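
Here is a small longest-match tokenizer sketch in Python that shows the
effect.  The set of binary-selector characters below is an assumption
chosen for the example, and tokenize_longest_match is a hypothetical
name; only integers and binary selectors are recognized.

```python
import re

# Each alternative is greedy, so a run of selector characters such as
# "--" is taken as one token rather than two.
LONGEST = re.compile(r'\s+|\d+|[-+*/<>=~@%&?,|]+')

def tokenize_longest_match(src):
    """Return the token texts of src, skipping white space."""
    tokens, pos = [], 0
    while pos < len(src):
        m = LONGEST.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r}")
        if not m.group().isspace():
            tokens.append(m.group())
        pos = m.end()
    return tokens
```

Under these assumptions "0--1" comes out as "0", "--", "1", whereas
"0 - -1" or "0- -1" comes out as "0", "-", "-", "1", which the parser
can then read as a subtraction of a negative literal.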

Lex Spoon