Ship it with Squeak

Marcel Weiher 320089961391-0001 at t-online.de
Sun Jul 2 10:56:44 UTC 2000


> From: "Jay Carlson" <nop at nop.com>

[ skins ->  C vs. Scheme semantics ]

Where did I say anything about bridging incompatible semantics?  How  
would this be useful?  Please, that's just a complete straw man.  I  
mean, I say I have a perfectly fine car and you say it doesn't fly  
you to the moon.  Well, flying to the moon has never been in the  
requirements for a perfectly fine car.

Once again, I was talking about "skins", like the different skins  
for Squeak-Smalltalk now appearing.  An XML-based encoding  
(marshalling) of parse-trees could serve as a skin-neutral storage  
format, and could probably be translated easily into (some) other OO  
languages as well.

> > Anyway, look at SOAP for an example of *easy* interoperability.  You 
> > can write SOAP commands using any old text editor, or even via
> > direct telnet connection.  That's very low overhead for trying
> > something out and doing things in an ad-hoc way.
>
> Doing this has no support for any kind of input-side validity.

Exactly!  The point is that extremely light-weight or ad-hoc access  
is possible without all that overhead, while at the same time fully  
validated access is just as easy as (or easier than) with the  
heavy-weight equivalents.
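
To make the "ad-hoc yet validatable" point concrete, here is a sketch  
in Python: a SOAP-style request typed out as plain text (the kind of  
thing you could paste into a telnet session), then checked with an  
off-the-shelf XML parser.  The envelope and method names are invented  
for illustration, not taken from any real spec.

```python
# A hand-typed, SOAP-style request.  Element names are made up.
import xml.etree.ElementTree as ET

request = """\
<Envelope>
  <Body>
    <getStockQuote>
      <symbol>AAPL</symbol>
    </getStockQuote>
  </Body>
</Envelope>"""

# Ad-hoc: the text alone is enough to try something out.
# Validated: any off-the-shelf XML parser rejects malformed input for free.
tree = ET.fromstring(request)
symbol = tree.find("./Body/getStockQuote/symbol").text
print(symbol)  # AAPL
```

The same string could be sent over a raw socket; nothing about it  
requires an ORB, a stub compiler, or an IDL file.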

[more of same snipped]

> > Try the same with a
> > CORBA ORB!!
>
> Why?  I can do the same kind of thing from bsh or Tcl or Python when I want
> to play around with CORBA interfaces.  The responsibility for validity is
> split between client and server.

Compare the overhead of the two solutions:

On the client side, a simple text editor or even just telnet, both  
of which are readily available, versus a whole programming language  
plus an integrated ORB, which has to implement IDL-parsing, IIOP  
parsing/generation and all the other services, none of which leverage  
common technologies but have to be implemented just for CORBA!

On the server side, you can hook up one of the many, many XML  
parsers available off the shelf and then just read out the DOM (or  
handle the SAX events), again compared to a full CORBA ORB with all  
the trappings and no common components.
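
As a sketch of that server side, assuming invented element names: a  
stock SAX parser from the standard library, with a small handler that  
turns the events into method-call tuples.  No ORB anywhere in sight.

```python
# Dispatch on SAX events instead of running a full ORB.
# The Envelope/Body/method/argument layout is invented for illustration.
import xml.sax

class MethodCallHandler(xml.sax.ContentHandler):
    def __init__(self):
        self.calls = []     # (method, argument-name, argument-text) tuples
        self._path = []
        self._text = ""

    def startElement(self, name, attrs):
        self._path.append(name)
        self._text = ""

    def characters(self, content):
        self._text += content

    def endElement(self, name):
        # Depth 4 means Envelope/Body/method/argument:
        # record the enclosing method name and this argument.
        if len(self._path) == 4:
            self.calls.append((self._path[2], name, self._text.strip()))
        self._path.pop()

handler = MethodCallHandler()
xml.sax.parseString(
    b"<Envelope><Body><add><x>2</x><y>3</y></add></Body></Envelope>",
    handler)
print(handler.calls)  # [('add', 'x', '2'), ('add', 'y', '3')]
```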

The point here is that the barriers to entry are just incredibly  
low.  If you want to do something very simple, you can do so with  
very simple means.  If you want to do something complex or highly  
secure, you need to spend more.  But the people who want to do  
something simple should not be penalized because some people want to  
do something complex, and it would be good if both could use the same  
basic protocol, and they can.

> > As for interoperability, the format is easy to
> > parse/generate with just about any language/OS I can think of.
>
> It's only easy to generate if you don't care about correctness and 
> robustness. String manipulation is a truly awful basis for the core of 
> computation, and I'll even give you an example of how it loses:
> http://www.cert.org/advisories/CA-2000-02.html

XML parsers and generators are commonly available and easy to write.  
 If you think XML-parsing should be done by regex-matching or  
strcmp(), you get what you deserve.  The CERT advisory clearly only  
talks about incorrect servers, and you will note that my scenarios  
have been about easy, ad-hoc *client* access.  Furthermore, the  
problem in the CERT advisory is really the insecurity of having  
script access enabled.  If you have that turned on, you're wide open  
to attacks anyhow; the only difference being that the audience  
you're wide open to has increased.

Still, I'll wager with you that a SOAP server requires a fraction of  
the code of a fully functioning CORBA ORB.  (Not that I am a SOAP  
fan, but the general idea of encoding messages as XML has a lot of  
appeal, and SOAP is one such spec I am aware of.)

> Deeply, the problem is that generating HTML or XML is *not* about
> concatenating strings together.

Of course not; whoever said that it is?  For programmatic access,  
you should build the (simple) libs that do it properly.  However, you  
*can* reasonably build or edit a simple XML file in a text editor,  
which you can't reasonably do with various binary formats, and you  
can also *examine* XML files in a text editor.

>  Generating HTML or XML is about building
> trees of elements and then marshalling them into ASCII.

Yes of course.  Where did I say anything different?  I always groan  
when I see code that concatenates HTML/XML tags to strings in an  
ad-hoc fashion.

My own XML-framework (for Objective-C) has both a proper parser and  
a generator, with the generator being a type of EncodingFilter.
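
Not my Objective-C framework, but the same idea sketched in Python:  
build a tree of elements and let the generator marshal it, escaping  
included.  The naive string-concatenation version silently produces  
ill-formed XML.

```python
# Build the tree, then marshal it: escaping is handled for you.
import xml.etree.ElementTree as ET

order = ET.Element("order")
item = ET.SubElement(order, "item")
item.text = "nuts & bolts"          # '&' must become '&amp;' on the wire

wire = ET.tostring(order, encoding="unicode")
print(wire)  # <order><item>nuts &amp; bolts</item></order>

# The ad-hoc concatenation I always groan about, by contrast,
# produces ill-formed XML that any conforming parser rejects:
broken = "<order><item>" + "nuts & bolts" + "</item></order>"
```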

> OK, so maybe this is a straw man.

Exactly!

> You probably agree with me that real XML
> and SOAP applications need a library or set of classes to
> generate/parse/validate their data, since it's not strings.

My point from the start is that it is *both* trees of semantically  
rich objects *and* strings, though one has to take care, because  
they are not simple strings.

>   But once you
> have that library and code to it, the particular advantages of SOAP over
> (say) CORBA doesn't look so one-sided.

Yes it does.  You get to leverage common components, the overall  
complexity is much lower, you get a human-readable (and debuggable),  
standardized interchange format/wire protocol, you get easy ad-hoc  
usage, etc.

> Some of the magic binary bits are well-known and documented via OLE storage.

AFAIK, the whole Word file format is now documented (somewhere).   
That doesn't change the fact that writing custom binary-format  
parsers is going to be a lot more complex than (a) just opening the  
XML file in a text editor and *looking* at it and/or (b) using one  
of dozens/hundreds of off-the-shelf XML tools with it.

> > XML files can be self-describing:
>
> Yes, I agree, with a qualification: XML files *can* have self-describing
> syntax.  But you still have to understand what that syntax means.

Yes, if the semantics are inherently complex or intentionally  
obscured, XML won't solve that, because, as I've explained before,  
that's outside of the domain of problems it is trying to solve.

However, with XML you at least have a good chance of understanding  
what is there, assuming that the creator of the schema is not an  
adversary trying to confuse you.

<Quantity>1</Quantity> vs  (hex)  fc00000001 or was that fc01000000?
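
That byte-order guessing game can be made concrete with a few lines  
of Python: the same integer packed as raw bytes depends on which end  
the encoder started from, while the textual form never does.  (The  
4-byte packing here is an illustration, not any particular file  
format's encoding.)

```python
# The same value, two plausible binary encodings -- which one is it?
import struct

big    = struct.pack(">i", 1)   # b'\x00\x00\x00\x01'
little = struct.pack("<i", 1)   # b'\x01\x00\x00\x00'
print(big.hex(), little.hex())  # 00000001 01000000

# The XML reader never has to guess:
import xml.etree.ElementTree as ET
qty = int(ET.fromstring("<Quantity>1</Quantity>").text)
print(qty)  # 1
```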

> > > > Furthermore, it combines the worlds of UNIX (everything is a text 
> > > > file)
> > >
> > > This is a strawman (I hope).  If everything is a text file, why aren't
> > > vi+pipelines the only applications I use to edit and view things?
> >
> > I think you misunderstood:  the "everything is a text file" is an 
> > assumption that most of the UNIX tools are built around.  It makes 
> > many powerful things possible, especially for *ad hoc* processing. 
> > However, it loses out on richer structures, for example those typical 
> > in OO systems.  XML is a way of combining the two worlds.
>
> Again, only if you're willing to give up correctness, or do a lot more work
> to try to patch around end cases.  The right way to process objects is as
> objects.

The "right" way depends on your application and on the economics of  
the situation.  For example, say you have two databases and you need  
to dump the contents of one into the other, once.  They have schemas  
that are principally compatible but differ slightly.

Now, you could (and it seems you argue one should) build complete  
class hierarchies modelling the schemas of the two databases,  
routines reading correct object-graphs out of one database  
(potentially running out of memory), transforming the object-graphs  
in memory (running out again) and then writing the resulting object  
graph to the other database.

Or you could dump one DB out to XML-format (with an off-the-shelf  
tool), use an off-the-shelf generic XML-transformation tool that  
remaps the file purely syntactically (with your knowledge of the  
semantics controlling the syntactic transformations) and import the  
resulting XML-file into database two.
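
A sketch of that second approach, with made-up tag names standing in  
for the two slightly different schemas: the remap is purely  
syntactic, but *my* knowledge of the semantics tells me which  
renames are correct.

```python
# Remap a (tiny, invented) database dump from schema one to schema two.
import xml.etree.ElementTree as ET

dump = "<db><customer><name>Carlson</name></customer></db>"
tree = ET.fromstring(dump)

# Schema two differs only slightly: different tag names, same structure.
for cust in tree.iter("customer"):
    cust.tag = "client"
    cust.find("name").tag = "fullName"

print(ET.tostring(tree, encoding="unicode"))
# <db><client><fullName>Carlson</fullName></client></db>
```

Because the transformation streams over syntax, nothing ever needs  
the whole object graph materialized as domain objects in memory.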

Notice that I didn't say "use sed", because that won't work.

> Purely syntactic queries are limited to identity queries, which aren't 
> terribly useful.

Where do you get that idea? With an XML-aware tool, I could  
certainly do more, based on structure, context and values.  The user  
of the tools can have knowledge (or make assumptions) about the  
semantics.

>  You can move up to queries based on string operations (is
> this attribute value in the canonical Unicode sort order between "Carlson"
> and "Weiher"?) but that's not much better.

What would prevent me from doing queries based on structural or  
numerical relationships?  Structural queries are even in the base  
XPointer, and XML query languages such as Quilt support rich queries  
expanding on those available in SQL.  (See  
http://www.almaden.ibm.com/cs/people/chamberlin/quilt_euro.html)
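
A small structural-plus-numeric query, sketched here with the  
limited XPath subset in Python's ElementTree plus an ordinary  
comparison for the numeric part (a real XML query language like  
Quilt would express the whole thing declaratively; the document is  
invented):

```python
# Structural: every order that has a qty child.  Numeric: qty > 10.
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<orders>"
    "<order><item>nuts</item><qty>5</qty></order>"
    "<order><item>bolts</item><qty>50</qty></order>"
    "</orders>")

bulk = [o.find("item").text
        for o in doc.findall("./order[qty]")
        if int(o.find("qty").text) > 10]
print(bulk)  # ['bolts']
```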

> IMO what XML provides now is three things:
>
>   o  A not-so-awful generic syntax
o self describing, human readable, editable, robust
>   o  that has a framework for discussing, documenting, and standardizing the
> semantics, and
o is usable *without* a machine encoded semantics
o thus providing a low barrier to entry, just like HTML/HTTP
o but scales to higher demands just as well
>   o  A political atmosphere where information and service providers are 
> expected to do so.

> XML hype is *good* as long as it's a tool to accomplish that last goal.  But
> we shouldn't forget that it's just one of many possible vehicles to get
> this, and even with full documentation of formats there is a vast amount of
> work in creating interoperation, which often starts with standardization on
> models.

Yes, in many cases that is so, and it is happening.  Just  
watch www.oasis.org to see industries scrambling to get their common  
vocabularies defined.  But once again, it is possible to do partial,  
but correct, processing of files even when you don't know what half  
(or more) of it means.
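
That partial-but-correct processing is trivial to demonstrate: pull  
out the one element you understand and ignore the rest of the  
(invented) vocabulary entirely, without ever mis-parsing it.

```python
# Extract the single known element from a document full of unknown ones.
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<invoice>"
    "<esoteric-tax-block>opaque stuff</esoteric-tax-block>"
    "<total>42.50</total>"
    "<audit-trail><hop/><hop/></audit-trail>"
    "</invoice>")

total = float(doc.find("total").text)
print(total)  # 42.5
```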

Marcel

More information about the Squeak-dev mailing list