Syntaxes & Block Closures

Thu Sep 14 15:44:22 UTC 2000

Paul,

Paul Fernhout <pdfernhout at kurtz-fernhout.com> wrote:
> ..
> I'll be curious about whether you intend to work from a common new
> intermediate language format or if Smalltalk represents that?

The common intermediary I have intended is parse trees. However, there
are many ways of doing that too, so I will make a concrete suggestion
and provide handy ways of making homomorphisms that map to other
parse trees. There are already several different kinds: the compiler's
ParseNode subclasses, the T<X>Node's, TGen's (several) ways of doing
things, and the refactoring browser's. So the goal is to have one standard
kind of parse tree, but that will not be achieved within months, so I
will map them to each other. And within one particular kind of
parse tree, comments can be handled in different ways. That messes
things up, but it has to be dealt with.

And of course the same kind of parse-tree can be interpreted in many
ways, etc. But useful things can be done anyway.
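
A minimal sketch, in Python, of what mapping one kind of parse tree to
another can look like. Both tree kinds here (GenericNode on one side,
Literal and Send on the other) and the functions to_generic and
from_generic are hypothetical illustrations, not the actual Squeak
ParseNode, TGen or refactoring browser classes. The point is only that
a structure-preserving map in each direction lets tools written for one
tree kind be reused on the other.

```python
from dataclasses import dataclass
from typing import Union


# Tree kind A (hypothetical): one generic node class carrying a tag.
@dataclass
class GenericNode:
    tag: str
    children: list


# Tree kind B (hypothetical): one class per syntactic construct.
@dataclass
class Literal:
    value: int


@dataclass
class Send:
    receiver: "Expr"
    selector: str
    argument: "Expr"


Expr = Union[Literal, Send]


def to_generic(node):
    """Map a kind-B tree to a kind-A tree, node by node."""
    if isinstance(node, Literal):
        return GenericNode("literal", [node.value])
    if isinstance(node, Send):
        return GenericNode(
            "send",
            [to_generic(node.receiver), node.selector, to_generic(node.argument)],
        )
    raise TypeError(f"unknown node: {node!r}")


def from_generic(node):
    """Map a kind-A tree back to a kind-B tree."""
    if node.tag == "literal":
        (value,) = node.children
        return Literal(value)
    if node.tag == "send":
        receiver, selector, argument = node.children
        return Send(from_generic(receiver), selector, from_generic(argument))
    raise ValueError(f"unknown tag: {node.tag!r}")


# 3 + (4 * 5), written as nested binary sends.
tree = Send(Literal(3), "+", Send(Literal(4), "*", Literal(5)))
# The two maps are inverses of each other on this tree.
assert from_generic(to_generic(tree)) == tree
```

Comments are deliberately left out of both tree kinds here; deciding
where they attach is exactly the part that differs between real
implementations.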

> I like the idea of reading back in C code (perhaps written out with
> useful comments), however that certainly adds to the challenge. I would
> suggest that part of the project be a second phase -- unless you
> consider writing that functionality in parallel with the generator to be
> essential for designing an integrated system (which might well be the
> case). 

Agreed, I will try to make a new version of C writing before C reading.
And I do consider having the reader essential; I have written about that
earlier. I believe this will give immediate payoffs: people working with
Slang can consider it C if they want to, and people not knowing Smalltalk
can use a C(-like) syntax for all of Smalltalk.

> Also, because I like indentational syntax languages, please keep
> in mind whether your parser for C might be extended to handle a
> whitespace (indentational) delimited language like Python, or parse and
> translate C written in indentational style (without braces). I'd be
> happy to discuss this at length in the future as I made a parser for
> Scheme (in Scheme) which reduces the need for edge parentheses (making
> Scheme look more like Python).

I have made parsers for both C and Python outside of Squeak; they will
be my models, because I know them well. We can get back to the subject
later.

> One suggestion might be that classes or methods could have some sort of
> indicator (in a comment?) about translation rules adhered to or assumed
> when writing the method. That is, one might reference a "dialect" of
> translateable Smalltalk that adheres to certain idioms, in the way that
> the current VM methods are in a subset of Smalltalk that translates
> easily to "C". Since various target languages have their own idioms,
> this might make it easier to accomplish a Smalltalk->otherLanguages
> solution, at a possible loss of generality in some cases. 
> 
> For example:
> 
> MyClass>>myForthishMethod
>   "Restrict: Smalltalk->Forth"
>   VM push: 10.
>   VM push: 20.
>   VM plus.
>   VM print.
> 
> MyClass>>myPythonishMethod
>   "Restrict: Smalltalk->Python (level 2)"
>   |foo|
>   foo := OrderedCollection new.
>   foo add: 1.
>   foo add: 2.
>   foo add: 3.
>   foo do: [:each | Transcript show: each printString; cr]
> 
> Note, like the current approach for translating to C, this requires the
> programmer to be aware of restrictions based on the target language and
> the current limits of the translation process. However, it would be
> useful in a method-by-method case to add immediate feedback on save such
> as viewing the output in the target language (or languages if perhaps
> more than one output is desired) or immediate feedback on
> difficult-to-translate idioms (or idioms beyond the current state of the
> translator).

Agreed. Some mechanism of that kind can be useful. I made a posting
earlier saying that it would be useful to keep the class definition in
source form. At present, try writing a comment in a class definition and
accept it: the comment is thrown away. I looked into the matter then and
saw that it takes some work to fix this, but not that much. Instance
variables are sometimes added automatically, ChangeSets should be informed
appropriately, and of course the text should be stored; fileOuts and
fileIns are affected, perhaps a little more. That would also solve the
problem of instance variable comments. The present way of doing it,
writing the comments in another place, is not going to work. In fact
there are already examples of comments for variables that no longer
exist, and of missing comments for new variables. (And probably comments
that are no longer correct.)

I strongly recommend this (saving source for class definitions). It will
create a space for your suggestions and also for other similar things. Several
of these will be of immediate use for code generation. In fact, in a variation
I'm considering of how types are defined for Slang, it will be essential.

This is related to what is called metaclasses in CLOS. Something similar can
be done in Squeak but will probably meet resistance.
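
As a rough illustration of the "Restrict:" markers suggested in the
quoted examples, here is a small Python sketch that pulls such markers
out of a method's source text. The regular expression, the function
name restrictions and the exact marker grammar (including the optional
"(level N)" part) are assumptions read off the two examples, nothing
more. It also ignores doubled quotes inside comments and does not check
that the match really is a comment rather than part of a string.

```python
import re

# Matches e.g. "Restrict: Smalltalk->Forth"
#          or  "Restrict: Smalltalk->Python (level 2)"
# The grammar is assumed from the two examples in the quoted message.
RESTRICT = re.compile(
    r'"Restrict:\s*(\w+)\s*->\s*(\w+)(?:\s*\(level\s+(\d+)\))?\s*"'
)


def restrictions(method_source):
    """Return a (source, target, level) tuple for each Restrict: marker."""
    found = []
    for match in RESTRICT.finditer(method_source):
        source, target, level = match.groups()
        found.append((source, target, int(level) if level else None))
    return found


python_method = '''myPythonishMethod
  "Restrict: Smalltalk->Python (level 2)"
  |foo|
  foo := OrderedCollection new.
'''

forth_method = '''myForthishMethod
  "Restrict: Smalltalk->Forth"
  VM push: 10.
'''

assert restrictions(python_method) == [("Smalltalk", "Python", 2)]
assert restrictions(forth_method) == [("Smalltalk", "Forth", None)]
```

A translator could consult the result when a method is saved and pick
the rule set, or refuse idioms outside the declared dialect.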

> As some general (perhaps unneeded) open source project management
> advice, I'd suggest if possible structuring your approach based on the
> time you expect to devote to it so that you invest no more than 5-10% of
> your time in each phase before having some sort of deliverable that
> people on the list can try out and give you feedback on.

Good idea. I will try to make a version for "public" testing soon. I hope I've
passed the 10%-level already though.

> Also, I'd suggest at the start deciding on a license for your work
> (Squeak-L-like?, MIT/X?, BSD-Revised?) for additions not covered by the
> Squeak-L if you want maximum participation and interest from others.
> Also, you should be sure that people contributing to your license are
> willing to license their contributions explicitly (in at least an email
> note or readme file) under compatible terms.

I don't know about these things. Perhaps you could make a concrete suggestion.
Copyright issues will get more interesting with automatic translators,
don't you think?

> In any case, great initiative!

Thanks.

Paul Fernhout <pdfernhout at kurtz-fernhout.com> wrote:
> ..
> Another suggestion. For practicality, one might want to include a way
> (in a comment?) to specify language specific stuff in the target
> language or languages, like for example libraries to include in Python
> or header files to include in C. 
> Example: 
>   "Python: include mysupportmodule"

("include" should be "import")

>   "C: #include \"mysupportmodule.h\""
> [Actually those escaped double quotes won't work in a comment, now will
> they?]

This works:
   "C: #include ""mysupportmodule.h"""
just as
   'C: #include ''mysupportmodule.h'''
would for (Squeak) strings.
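
The doubling rule is easy to state as code. The following Python sketch
is only an illustration of the convention, with hypothetical helper
names quote and unquote; it is not how the Squeak scanner is
implemented.

```python
def quote(text, delimiter):
    """Wrap text in delimiter, doubling every delimiter inside it."""
    return delimiter + text.replace(delimiter, delimiter * 2) + delimiter


def unquote(quoted, delimiter):
    """Inverse of quote(); rejects input that is not properly doubled."""
    if len(quoted) < 2 or quoted[0] != delimiter or quoted[-1] != delimiter:
        raise ValueError("missing delimiters")
    body = quoted[1:-1]
    # After removing all doubled pairs, no single delimiter may remain.
    if delimiter in body.replace(delimiter * 2, ""):
        raise ValueError("undoubled delimiter inside body")
    return body.replace(delimiter * 2, delimiter)


directive = 'C: #include "mysupportmodule.h"'

as_comment = quote(directive, '"')   # comment form, double quotes doubled
as_string = quote(directive, "'")    # string form, nothing to double here

assert as_comment == '"C: #include ""mysupportmodule.h"""'
assert unquote(as_comment, '"') == directive
assert unquote(as_string, "'") == directive
```

Because quote and unquote are exact inverses, any text can be carried
this way without loss.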

> Later some of this could be generalized, perhaps as in this case:
>   self includeForTranslation: 'mysupportmodule'.
> Obviously all this stuff should be implemented later in the project, but
> it might be good to bear it in mind as you design.

I consider #include in C to be multiple inheritance for modules (= classes
with only class methods, no instances), and Python's import of modules
likewise. (This is a place where Smalltalk can learn from Python,
IM(H;)O.) Given some variation of how class definitions are written that
allows multiple inheritance to be expressed naturally, I would use that.
The fact that multiple inheritance isn't available in Squeak doesn't mean
to me that it couldn't be naturally expressed syntactically. Widening the
syntax slightly increases reuse of the parser and pretty printer. (To the
point of being able to naturally express any C program, Python program,
etc., given a few tricks.)

It is a good idea not to have pieces of code in strings behind the back
of the syntax machinery. This is already apparent in the present
Slang to C machinery. (Inlining cannot be done if "C-quotes"
are used, the check for local variable use misses them, perhaps more.)

> If you go down this route, you might consider adding triple quote
> comments or a triple quote string to Squeak, such as Python has, which
> would make embedding typical C code in a comment easier. (Obviously it
> would also make the Squeak code very non-standard Smalltalk.) 

One central goal is to make it possible for anyone interested in
syntaxes to incorporate a new syntax and test it, with the
constraint that everything should be bijective as far as possible, to keep
the system as a whole coherent despite that.

Being myself one of the mentioned "anyone"'s, I have a new string syntax,
which when added to C-syntax is as follows:

void f()
{
   if (p)
      print
       (\ everything after a backslash to the end of line including the
        \ end of line-marker
        \ gets collected to a string, note that ", ', other \-es and
        \ any character whatever
        \ is simply written, furthermore the string can be indented so
        \ the code looks good on the page.

        \ And each line is a token in itself, I dislike
        \ multiline tokens.
        // one reason for this is the possibility of making comments in
        // the middle of the constant and empty lines as above are ignored
        \ get it?
       );
   else
      print
        (\// it is designed to make code-writing code look good
         \   print("he's dog said 'this \"works\" as expected'\n");
         \   if (p)
         \      print
         \       (\ everything after a ..
         \        \ bla bla
         \        );
         \   else
         \      print("what's ");
        );
}

This is my favorite; one version can have $-expressions like in Perl.
Excepting that, which is very useful, there are no subtleties at all.

With the $-signs I have tried some home-brew Perl/Python-like scripting
languages. Quite nice, and I am of course completely objective at all
times:-)

As far as I know this could be its first public description.

In Squeak $\ is already taken (it is the character literal for
backslash), but the idea could still be used.
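
The backslash-line string rule described above can be sketched in a few
lines of Python. This is a simplified reading of the description, not
the original parser: the function name collect_line_string is
hypothetical, only lines whose first non-blank character is the
backslash are handled (so the "(\ ..." form on the first line of the
example is not covered), and $-expressions are left out.

```python
def collect_line_string(lines):
    """Join a run of backslash-line tokens into one string.

    Each line starting (after indentation) with a backslash contributes
    everything after the backslash plus an end-of-line marker. Blank
    lines and // comment lines between the pieces are ignored.
    """
    parts = []
    for line in lines:
        stripped = line.lstrip()
        if stripped.startswith("\\"):
            # Quotes, further backslashes etc. are taken literally.
            parts.append(stripped[1:].rstrip("\n") + "\n")
        elif stripped.strip() == "" or stripped.startswith("//"):
            continue
        else:
            raise ValueError(f"not part of a line string: {line!r}")
    return "".join(parts)


sample = r'''
    \if (p)

    // a comment in the middle of the constant
    \   print("it's \"quoted\"\n");
'''

text = collect_line_string(sample.splitlines())
assert text == 'if (p)\n   print("it\'s \\"quoted\\"\\n");\n'
```

Since the indentation before each backslash is dropped and everything
after it is kept, the generated code can be laid out inside the
generating code without any escaping.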

> I think there has got to be a better solution for embedding code than
> comments (given the potential quote problem), but I'm not sure exactly
> what. 

It is important to try to avoid that. This can be done by making
the involved syntaxes a little more flexible, or inventing new ways of
interpreting existing syntactic constructs. I have in this way
eliminated the need to write C-code directly in strings as is done now.
(The quote problem doesn't exist, see above)

> One thing to think about because it is different (not necessarily
> duplicate) is "Guilt". Gordon Matzigkeit made a comment on Guilt at:
>   http://advogato.org/article/146.html
> And:
>   http://fig.org/figure/

from the site:
<quote>
FIG is the name of a simple philosophy that completely explains the
meaning of the universe. It is nothing more, and nothing less. 
</quote>

What can I say, it certainly is nothing more, it *is* something less.
Very much less, in fact.

I've had my share of hubris-fire too. If you know the guy, advise
him to read about Icarus. Or perhaps the "Tao Te Ching", for example
the part "if you are filled with tao, beware of being filled with
yourself". Something like that. And for more logically minded people
there is Goedel's incompleteness theorem. A pearl. But of course
people who really need such insights don't get them that easily.

That's not entirely incompatible with doing work interesting for
computing however.

> [snip]
> I know this isn't exactly the direction you are working toward in
> translation, but I think there is something interesting here to think
> about for a multi-lingual system.

I have played around with similar things. But the main interest right
now is to give (restricted but nontrivial) control over the syntax to the
reader. Things like the above (and having classes in Smalltalk define
their syntax as in Smalltalk-72) make that more difficult but, I agree,
are interesting.

Since XML is gaining ground it is probably a good idea to use that. That
also addresses the problem of collecting program code with other text in
useful ways, hyperlinked comments, many possibilities. It also makes it
natural to mix programs and data. Despite its rigidity I consider
XML/SGML worth using and hope some people will work out the details for
Squeak. It should be possible to connect it to my stuff since both are
tree-oriented. (As are morphs, classes, and many other things)

/Mats