An alternative FFI/Parser proposal

Andrew Tween amtween at
Sun Aug 20 10:11:41 UTC 2006

Hi all,
Having had a chance to ponder this, I think it could be evolved into a good

> Instead of insisting that the FFI syntax needs change, let's assume that
> there may be differences in the pragma formats. That has been true in
> the past and in my experience it is likely that it will be true and
> valuable in the future, too (like for example, if somebody wants to use
> blocks in a pragma).

Given that the FFI syntax isn't going to change, it is a *certainty* that there
are differences in pragma formats. So any solution should allow for that. If FFI
is allowed to have its own format, then my imaginary XYZ extension should also
be allowed to.

> A good way of dealing with these differences would be if a client could
> register specific pragmas which are parsed in a client dependent manner.
> So that, for example, the FFI would register <apicall:> and <cdecl:> for
> FFI specs and Tweak may register <on:> for parsing method triggers[*].
> In this case, Parser could simply invoke the proper client for a
> registered pragma, pass it a stream and let it decide what to do. Given
> a sufficient interface for client and Parser, this would leave the
> entire responsibility with the client instead of Parser, but Parser
> could still provide a default implementation.

The implementation could be simplified by allowing only the tokens after the
first keyword to vary, rather than everything in a <..>. This is enough for FFI,
and retains at least part of the pragma syntax. e.g.
    <a: i * j  b: k > is allowed.
    <a: i > j b: k>  is not allowed (can't have embedded > )
    <a b c d> is not allowed (can only be free form if first token is keyword)

(If an extension wants to allow embedded > then it can specify that they are
doubled i.e. >> )

The parser can now collect the data from the stream for each <...> construct.
i.e. record the start point, skip all tokens until '>' is reached, and then
store the source from start to end. These are then recorded as AngleConstructs
(or whatever).
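
The collection step described above might look roughly like this. It is a
minimal sketch in Python rather than Squeak code. It works on raw characters
instead of parser tokens, and it assumes the text handed to it contains only
the pragma region of a method (so a binary < in ordinary code is not
considered). The function name collect_angle_sources is hypothetical, and the
treatment of a doubled >> as one literal > is just one reading of the escaping
idea mentioned earlier.

```python
def collect_angle_sources(text):
    """Return the raw source found between each '<' and its closing '>'.

    Assumption: a doubled '>>' inside a construct stands for one literal
    '>' and does not close the construct.
    """
    sources = []
    i = 0
    while i < len(text):
        if text[i] != "<":
            i += 1
            continue
        i += 1                      # start point: just after '<'
        chunk = []
        while i < len(text):
            if text.startswith(">>", i):
                chunk.append(">")   # escaped '>' stays in the source
                i += 2
            elif text[i] == ">":
                break               # closing '>' reached
            else:
                chunk.append(text[i])
                i += 1
        else:
            # Inner loop ran off the end without finding '>'.
            raise SyntaxError("unterminated <...> construct")
        sources.append("".join(chunk).strip())
        i += 1                      # step past the closing '>'
    return sources
```

Nothing inside the construct is interpreted at this stage; the stored source
is handed to whichever extension claims it later.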

For example,
    <a: i * j b: k> produces this AngleConstruct
            selector: #a:
            source: 'a: i * j b: k'

(Note that the b: keyword part does NOT form part of the selector)
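
As an illustration of that rule, here is a small Python sketch (not Squeak
code) that builds such a record from the stored source. The names
AngleConstruct and make_angle_construct are placeholders. The sketch rejects
anything that does not start with a keyword, which matches the <a b c d> case
above; ordinary unary pragmas are outside its scope.

```python
import re
from dataclasses import dataclass

# A keyword is an identifier immediately followed by a colon, e.g. "a:".
LEADING_KEYWORD = re.compile(r"\s*([A-Za-z_]\w*:)")


@dataclass(frozen=True)
class AngleConstruct:
    selector: str   # first keyword only, e.g. "a:"
    source: str     # everything between '<' and '>', uninterpreted


def make_angle_construct(source):
    match = LEADING_KEYWORD.match(source)
    if match is None:
        raise SyntaxError("free-form content requires a leading keyword")
    # Later keywords (such as b:) are deliberately left inside the source.
    return AngleConstruct(selector=match.group(1), source=source)
```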

Having parsed all "pragmas" as free form angleConstructs, the parser then
decides what to do with each one.

angleConstructs do: [:angle |
    self compilerExtensions
         detect: [:extension |
            (handled := extension canHandle: angle) ifTrue: [
                    extension compile: angle for: self.
                    (realPragma := extension pragmaFor: angle)
                        ifNotNil: [pragmas addLast: realPragma]].
            handled]
         ifNone: ["error - no handler for this <...> "]].

with some extra error handling etc.
The key point is that there is a sequence of compilerExtensions, and so there is
a precedence. Currently the order will be

Each handler performs its own parse of the AngleConstruct's source.
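
To make the precedence idea concrete, here is a rough Python sketch of the
dispatch loop (again, not Squeak code). The class KeywordExtension, the method
names can_handle and pragma_for, and the tuple used as a stand-in for a Pragma
are all made up for illustration. The compile step is left out. Each construct
is represented as a (selector, source) pair.

```python
class NoHandlerError(Exception):
    """Raised when no registered extension claims a <...> construct."""


class KeywordExtension:
    """Hypothetical extension that claims constructs by first keyword."""

    def __init__(self, name, selectors):
        self.name = name
        self.selectors = set(selectors)

    def can_handle(self, selector):
        return selector in self.selectors

    def pragma_for(self, selector, source):
        # A real handler would parse the source here; this one only tags it.
        return (self.name, selector, source)


def dispatch(constructs, extensions):
    """Give each construct to the first extension that can handle it."""
    pragmas = []
    for selector, source in constructs:
        for extension in extensions:        # list order is the precedence
            if extension.can_handle(selector):
                pragma = extension.pragma_for(selector, source)
                if pragma is not None:
                    pragmas.append(pragma)
                break
        else:
            raise NoHandlerError("no handler for <%s>" % source)
    return pragmas
```

With extensions listed as FFI, Tweak, Other, a construct that both Tweak and
Other could handle goes to Tweak, and a construct nobody claims stops the
compile with an error.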

Each handler can determine whether to record each of the angle constructs it
handles as a real Pragma object (or specialized kind of Pragma etc).
So, for example,  <a: i * j b: k> could be stored as...
    MyPragma(selector: #a:b: , arguments: #( 'i * j' 'k' ))
 or as
    MyPragma(selector: #a: , arguments: #( 'i * j' #b: 'k'))
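
A handler's own parse could be as simple as splitting the source at keywords.
The following Python sketch (not Squeak code) shows both storage forms. The
function names are hypothetical, pragmas are represented as plain
(selector, arguments) tuples, and the keyword regex is naive: it would also
match something like x: inside a string literal.

```python
import re

# Naive keyword matcher: identifier followed by a colon.
KEYWORD = re.compile(r"\b[A-Za-z_]\w*:")


def split_keywords(source):
    """Split 'a: i * j b: k' into [('a:', 'i * j'), ('b:', 'k')]."""
    matches = list(KEYWORD.finditer(source))
    parts = []
    for this, following in zip(matches, matches[1:] + [None]):
        end = following.start() if following else len(source)
        parts.append((this.group(), source[this.end():end].strip()))
    return parts


def as_full_selector(parts):
    """All keywords form the selector; arguments are the text between them."""
    selector = "".join(keyword for keyword, _ in parts)
    return (selector, [argument for _, argument in parts])


def as_first_keyword(parts):
    """Only the first keyword is the selector; later keywords stay as arguments."""
    first_keyword, first_argument = parts[0]
    arguments = [first_argument]
    for keyword, argument in parts[1:]:
        arguments.extend([keyword, argument])
    return (first_keyword, arguments)
```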

If a handler chooses to add a Pragma (or specialized form of Pragma), then all
the free stuff that comes with Pragmas (searching, senders) will be utilised.
This could be extended to allow a handler to add any number of real Pragmas. For
    <a: 1 ; b:2 ; c: 3> might result in Pragmas (#a #(1)) , (#b #(2)) and (#c

FFI compilation (rightly) fails if FFI is not installed, because the call syntax
is such that an FFI call can never be a valid Pragma. The distinct syntax can
therefore be seen as an advantage, rather than a disadvantage.

As Andreas has previously stated (in this, or another, thread) the specialized
Pragmas can also deal with decompilation.

> [*] The main reason for Tweak to parse triggers separately is to provide
> semantic checks. For example, the <on: event in: signaler> annotation
> requires the signaler to be a field of the receiver. Being able to hook
> into the parse in this way can be useful for other kinds of semantic
> assertions.
> Unfortunately, there are also a couple of gotchas with the proposal:
> Most importantly, it requires that any parser can hand off the current
> input stream to a client and continue after it's getting the stream
> back. Not sure if all parsers could easily do that. In addition, the

My suggestion avoids that problem. I am not sure what the cost is - efficiency

> client would need to have sufficient access to the parser to perform
> whatever action it requires, including (potentially) correction or error
> handling. This may be tricky since the existing parsers have no common
> protocols for that. Lastly there is an issue with what exactly should

I've only sketched out a protocol in one direction. I haven't considered how the
extensions talk to the parser. But I don't think it would be too complex. What
should the interface be?

> happen if we're trying to parse a pragma but lack the proper support
> (like parsing an FFI spec without the FFI being present). I'd rather
> have it if that the parser is aware of such problems and can raise an
> error instead of trying to get the user to use something that won't work
> anyway, but this may not be possible.

I think that the 'trick' is to ensure that a non-pragma syntax is used, so that
compilation fails when the extension is missing (assuming, of course, that if
the extension is missing, then so are its parser/compiler extensions).

> In any case, this is a clear alternative that offers the same benefits
> of the original proposal ("clean", "extensible") while avoiding
> fundamentally breaking the FFI for no good reason. If anyone were to
> implement that proposal it would certainly find my support.
> Cheers,
>    - Andreas

More information about the Squeak-dev mailing list