Syntax & Semantics [was: Re: [Enough already] Re: Proposal3:

Tue Jun 6 15:49:36 UTC 2000

I mostly just read this list, and I'm not very experienced with
Smalltalk, but I have some experience with this problem.

First, syntax is important.  With unchanging semantics, you
can win or lose depending on what you do to the syntax.  I
watched the Modula-3 crowd stick to their long and verbose
keywords and get ignored by a world full of people accustomed
to curly braces.  Alternate syntax was proposed, but it never
happened.  Perl's main appeal is that it makes it look easy
(easy syntax) to assemble and dismantle strings and manipulate
files and processes; there's nothing very exciting happening
semantically there that couldn't also be done in any other
recently popular language.  And, as was recently noted here,
BCPL is a lot prettier than its tasteless successors (I'm
a big fan -- I wrote an FP84 interpreter in BCPL under VM/CMS
once, and I've got an incredibly stale and slightly buggy
copy of TopExpress BCPL for the Mac stashed away somewhere).

Second, this problem is a lot harder than people seem to realize.
My most thorough experience with this is with Fortran, Fortran
ASTs (there's a concept) and a programming environment.  In most
language processing systems (compilers, interpreters, that sort of
thing) the first things to leave the parse tree are the comments.
Many lexers simply convert them to white space.  This does not
work for what has been proposed here; code needs comments.
In practice, not only must the comments be preserved, their
formatting must also be preserved, and they must in some way
"stick" to the code that they describe.  Just going from Fortran
to Fortran via the AST, we could get in trouble with the comments.
Beyond that, if the comments make reference to control flow
structures in the source languages (for instance, a "switch"
statement, as in "this switch statement blah-blah-blah") and
one language has "switch" and the other does not, well, that's
a problem.  People writing programs will need to be aware of
this if they are reading and writing code with other people
who use different dialects.
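
To make "comments stick to the code" concrete, here is a minimal sketch
in Python.  It is not how our Fortran environment did it; the toy
line-per-statement language and the names Node, parse and unparse are
all made up for illustration.  Each run of comment lines is stored
verbatim on the statement that follows it, so formatting survives a
round trip and the comments travel with the statement if it moves.
Blank lines are deliberately ignored here; a real tool would have to
keep those too.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One statement of a toy language (hypothetical, for illustration)."""
    text: str                                          # the statement, verbatim
    leading: List[str] = field(default_factory=list)   # comment lines, verbatim


def parse(source: str) -> List[Node]:
    """Lines starting with '#' are comments; anything else is a statement.

    Comment lines are not thrown away: each run of them is attached to
    the next statement with its spacing untouched.
    """
    nodes: List[Node] = []
    pending: List[str] = []
    for line in source.splitlines():
        if line.lstrip().startswith("#"):
            pending.append(line)
        elif line.strip():
            nodes.append(Node(text=line, leading=pending))
            pending = []
    if pending:
        # Trailing comments with no statement after them get an empty node.
        nodes.append(Node(text="", leading=pending))
    return nodes


def unparse(nodes: List[Node]) -> str:
    """Emit each statement preceded by the comments attached to it."""
    out: List[str] = []
    for node in nodes:
        out.extend(node.leading)
        if node.text:
            out.append(node.text)
    return "\n".join(out)


SOURCE = """\
# Sum the first N terms.
#   N must be positive.
total = 0
# accumulate
total = total + n"""

nodes = parse(SOURCE)
assert unparse(nodes) == SOURCE   # text -> tree -> text loses nothing
```

Reordering the nodes and unparsing again moves each comment block along
with its statement, which is the property a lexer that turns comments
into white space cannot give you.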

As far as unparsing goes, we had at least four written for
Fortran.  One was "card image" (aka "the ugly-printer"), another
was a conventional "pretty" printer, another was an AST dumper
that got augmented into a Lisp unparser, and the fourth was
unparsing for the screen (a different problem, what with
elided text, fonts, that sort of thing).
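
As a rough illustration of several unparsers sharing one tree, here is
a small Python sketch.  The tuple encoding of the tree and the three
function names are assumptions of mine, not anything from the Fortran
system; the third function only gestures at the "screen" case by
eliding subtrees below a given depth.

```python
# A tree is either a leaf (a str) or a tuple (operator, operand, ...).
# This encoding is hypothetical and chosen only to keep the sketch short.

def unparse_lisp(tree) -> str:
    """Fully parenthesized prefix form."""
    if isinstance(tree, str):
        return tree
    return "(" + " ".join(unparse_lisp(t) for t in tree) + ")"


def unparse_call(tree) -> str:
    """Function-call form, as in C-like languages."""
    if isinstance(tree, str):
        return tree
    head, *args = tree
    return unparse_call(head) + "(" + ", ".join(unparse_call(a) for a in args) + ")"


def unparse_elided(tree, depth: int = 1) -> str:
    """Call form for the screen: subtrees nested deeper than `depth` show as '...'."""
    if isinstance(tree, str):
        return tree
    if depth == 0:
        return "..."
    head, *args = tree
    shown = ", ".join(unparse_elided(a, depth - 1) for a in args)
    return unparse_elided(head, depth) + "(" + shown + ")"


TREE = ("max", ("add", "x", "1"), "y")

print(unparse_lisp(TREE))     # (max (add x 1) y)
print(unparse_call(TREE))     # max(add(x, 1), y)
print(unparse_elided(TREE))   # max(..., y)
```

All three walk the same tree; none of them needs to know the others
exist, which is why adding a fourth or fifth was never the hard part.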

My suggestions are:

1. Plan ahead.  It's hard to interparse between Smalltalk, Lisp and Java
   (assuming it can even be done) without garbling the comments that
   come with the code.  It is much easier to work with a
   family of pre-designed languages (just as Java was pre-designed to
   resemble C) with identical underlying semantics, and syntax superficially
   resembling Lisp, C, Pascal, and Smalltalk, but with all the hard
   incompatibilities removed.  In general, the hardest incompatibility 
   is comments.  I would not remove them, but I'd think hard about
   integrating them into the parse tree, and having the "parsers" for
   these languages enforce various commenting conventions so that the
   comments could be translated.  In a world of good, graphical tools,
   it should be possible to view code side-by-side with other unparsings
   of that code.

2. Be very wary of "macros", at least of the CPP and M4 variety.

3. Read about other people's work here.  Robert Ballance did some
   work on this when he was at Berkeley, and perhaps afterwards.  I
   know that there were some papers out of Rice after I left (1987),
   probably with authors Don Baker and/or Scott Warren, on this
   problem (just going from Fortran->AST->Fortran).

4. In particular, if you are going to pick alternate syntaxes, you
   might pick some that have been empirically found to be "better".
   I cannot recall the reference (it's within the last five years)
   but people have studied this sort of thing.  An example of a bad
   choice is C's default-fallthrough between arms of a case statement;
   better to have the default be "break", add a keyword for "fallthrough",
   and make it easier to express sets of values for the case labels
   (which would not be too hard if the comma-operator were also banished).
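
To show what I mean in suggestion 4 about case statements, here is a
toy model in Python of the semantics I would prefer: an arm stops by
default, falling through has to be requested, and a label is a set of
values.  The names switch, FALLTHROUGH and classify are invented for
this sketch and do not come from any existing language or library.

```python
FALLTHROUGH = object()   # hypothetical marker: "continue into the next arm"


def switch(value, arms):
    """Run the first arm whose label set contains `value`.

    `arms` is a list of (labels, action) pairs; `labels` is a set of
    values, or None for the default arm.  After an arm runs we leave
    the switch, unless its action returned FALLTHROUGH.
    """
    falling = False
    for labels, action in arms:
        if falling or labels is None or value in labels:
            if action() is not FALLTHROUGH:
                return            # the default behaviour is "break"
            falling = True        # fallthrough was asked for explicitly


def classify(ch):
    """Example use: returns the messages produced for one character."""
    log = []

    def say(msg, then=None):
        def action():
            log.append(msg)
            return then
        return action

    switch(ch, [
        (set("aeiou"), say("vowel")),                    # one arm, five labels
        ({"y"}, say("sometimes a vowel", FALLTHROUGH)),  # explicit fallthrough
        (None, say("consonant")),                        # default arm
    ])
    return log


print(classify("e"))   # ['vowel']
print(classify("y"))   # ['sometimes a vowel', 'consonant']
print(classify("k"))   # ['consonant']
```

The point is only that the common case (stop after one arm) costs
nothing to write, and the unusual case is visible in the source.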

>The main architectural idea I am proposing is that as long as something 
>maps bijectively to a standard parse tree form (or a standard textual 
>form) it will work. To understand this it is important to note that this 
>is about syntax entirely. As soon as semantic issues are introduced it 
>becomes too complicated, at least for the time beeing.
>Note also the posting where i wrote:
>A  B  C
> \ | /
>   S
> / | \
>X  Y  Z

At 03:16 PM 6/6/00 +0000, Mats Nygren wrote:
>Peter Crowther <Peter.Crowther at IT-IQ.com> wrote:
>> [snip]
>> This also opens up an interesting area of development: a 'pretty-printer'
>> that is optimised for seeing a particular kind of error, such as misplaced
>> else-parts in if-then-else.  You could open it on an existing parse tree and
>> view the code in a whole new light.
>
>It opens up for possibilities dont it? Graphical representations, icons
>for classnames, many different things.

David Chase
chase at world.std.com
drchase at alumni.rice.edu
http://www.incompetentsoftwarehucksters.com