[ENH] Display := when pretty printing ( [sm][et][er][cd] [approved] )

ducasse ducasse at iam.unibe.ch
Thu Oct 16 05:59:01 UTC 2003


On Thursday, Oct 16, 2003, at 02:02 Europe/Zurich, Richard A. O'Keefe wrote:

> 	You are right and I'm wrong.  This was easy because I'm completely
> 	naive in this topic.
> 	
> I'm assuming this is sarcasm.
No it was not. ASCII seems to be really bad.

>
> 	My first name is spelled with an accent é so this is just now
> 	that I start to receive international mail, bills or email not
> 	written st&*phane.
>
> ISO 8859-1 has been around for over 15 years, hasn't it?
> MS-DOS codepage 850 had all the ISO 8859-1 characters, just not in
> the same places.  (And Xerox were using the 16-bit XNS character set
> back in 1984, at least.)
>
> So it's taken _this_ long for people to catch on that ASCII's dead?
>
> 	So sure ASCII is bad.  Now Squeak can have a really fancy
> 	wonderful character set.
>
> I don't say "fancy".  In fact, when it comes to Unicode, I say
> "insanely complicated".  I don't say "wonderful", just "better than
> ASCII".
> The important thing about Unicode is that it's *one* character set that
> the computing world is converging on.  Windows, MacOS, and UNIX all
> support it.  Perl, Python, and even Tcl support it, not to mention Java,
> Ada, C++, and C.  There are several editors out there that support
> Unicode; sam and Yudit spring to mind.  XML is based on it.  It's the
> one character set we _have_ to have if we're not going to have the
> Python folk sniggering at us.

Ok I learned something.
Do you mean (I just read some Python doc and saw they have u"blal" for a
unicode string) that their method source code is stored using Unicode?

So what is the path to go there?


>
> 	Three remarks:
> 	- My point that it should be consistent,
> 	
> I don't understand.  What should be consistent with what?

Consistent across font sizes, fonts, and styles (you know that whether you
get _ or <- depends on which font and style you are in...)

>
> 	- Then I really wonder why Squeak that tries to be ANSI (you
> 	were the first one to claim that for initialize) is not ANSI
> 	with assignment.  This is fun to have different compatibility
> 	policy.  But I can live with that because I do not care about a
> 	bad standard.
> 	
> I'm having a bit of trouble parsing that, so I'll respond to what I
> _think_ you meant, fully aware that I could be quite wrong.
>
> The ANSI Smalltalk standard is flawed in a number of ways.  It looks as
> though nobody bothered to do any proof-reading at all.  It is sloppy;
> it looks as though very little thought went into corner cases.  It is
> verbose and occasionally vague, suffering from the ills common to
> informal specifications.  And it leaves a lot out.
>
> BUT it is the only standard we have, and the Smalltalk community is not
> so large that building artificial barriers is a good idea.
>
> The ANSI Smalltalk standard says that "_" is a character that is usable
> in identifiers.  I think this is a Good Thing and I would like Squeak
> to support it.  The idea of putting space between words to make text
> much more readable was discovered about 500 years ago.  It's time
> Squeak caught up.

I wonder which vendors pushed this bad idea. I think that this is really
not a problem: none of the current Smalltalks have methods written that
way, and it would be a bad idea to start. This is being different for the
sake of it. I prefer to see Squeak evolving through real change.

>
> (Oddly enough, the S programming language used both _ and <- for
> assignment.  The 1.8.0 release of R has finally killed off the _
> spelling of <- .)
>
> HOWEVER, while I think it is important that ANSI-compatible identifiers
> with underscores should be supported by Squeak, I *don't* want to lose
> the assignment arrow.  If Unicode didn't exist, I'd be suggesting that
> we add the four arrows at code positions 31 (left arrow) 30 (up arrow)
> 29 (down arrow) and 28 (right arrow) -- it's no coincidence that Ctrl-_
> is 31 and Ctrl-^ is 30 -- to all the Squeak fonts.  Since Unicode does
> exist, U+2190 will do perfectly.  I want ALL THREE things supported:
> 	_	in identifiers (ANSI-compatible)
> 	:=	assignmentOperator (ANSI-compatible)
> 	U+2190	assignmentOperator (historic, readable)
>
Why not. I would like to have the solution that is most compatible with
the rest of the world, plus consistency. After that, whether I can use _
or := is not the problem.
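
The remark about Ctrl-_ and Ctrl-^ can be checked with a little arithmetic.
A minimal Python sketch, assuming the usual ASCII terminal convention that
Ctrl plus a key keeps only the low five bits of the key's code (the function
name control_code is made up for illustration):

```python
def control_code(key: str) -> int:
    """Return the code conventionally sent for Ctrl+key.

    Assumption: the classic ASCII rule of keeping the low five bits.
    """
    return ord(key) & 0x1F


# The four proposed arrow positions and the keys that land on them.
for key in "_^]\\":
    print(f"Ctrl-{key} -> {control_code(key)}")
```

So the left arrow at 31 and the up arrow at 30 sit exactly where Ctrl-_ and
Ctrl-^ already are.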

> To quote the ANSI standard:
>   "An implementation may define characters in addition to those listed
>    below in each character category.  While the meaning of a program
>    that uses any such characters is well defined it may not be
>    portable between conforming implementations."
> and:
>   "Three types of operator tokens are defined in Smalltalk: binary
>    selectors, the return operator, and the assignment operator.  ...
>    An implementation may define additional binaryCharacters but their
>    use may result in a non-portable program."
>
> This really isn't what I wanted to hear.  It appears that while you
> _are_ allowed to define new digits, new letters, and new binary
> characters, you _aren't_ allowed to define new comment delimiters,
> new assignment operators, or new return operators.  However, I don't
> think that needs to stop us.  We can give non-conformant meanings to
> non-ANSI-Smalltalk characters as long as we don't expect code using
> those characters to port to other Smalltalks.  Using left arrow is
> quite harmless, because when you ask for code to be saved out in a
> portable way (which .cs and .st files are not) left arrows not in a
> string or comment can be automatically converted to :=.

Yes, this is what I usually do, but my process requires pretty printing,
which never formats the code the way you would like to have it.

>
> Note, by the way, that ":=" is incompatible with mathematical usage.
> In mathematics "x := y" doesn't mean "change x to have y as value",
> it means "define x to be y all the time".
>
> Similarly, if we are to render (X)HTML and other files correctly,
> we want ^ to display as ^.  And the ANSI Smalltalk standard says that
> ^ is the return operator.  But it doesn't say that the up arrow isn't
> _also_ a return operator, and that's what I want:
>
> 	^	returnOperator (ANSI-compatible) displays as ^
> 	U+2191	returnOperator (historic, readable)
>
> While not strictly kosher according to ANSI, it's less of a syntactic
> extension than curly brace notation for arrays.  When you ask for code
> to be saved out in a portable way, up arrows not in a string or comment
> can be automatically converted to ^ .

Yes, I do note that, and I agree about the {}.
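
The "convert arrows that are not in a string or comment" step can be
sketched as a small scanner. This is only an illustration in Python, not
Squeak's actual fileOut code; the name to_portable is made up, and it
assumes the only places an arrow must be left alone are '...' strings,
"..." comments, and $x character literals.

```python
LEFT_ARROW = "\u2190"  # historic assignment arrow
UP_ARROW = "\u2191"    # historic return arrow


def to_portable(source: str) -> str:
    """Replace arrows outside strings, comments and $x literals by := and ^."""
    out = []
    delimiter = None      # "'" inside a string, '"' inside a comment
    literal_next = False  # True right after $, i.e. inside a character literal
    for ch in source:
        if literal_next:
            # The character after $ is data, whatever it is.
            literal_next = False
            out.append(ch)
        elif delimiter is not None:
            # Copy verbatim until the closing delimiter.  A doubled ''
            # closes and immediately reopens the string, which is harmless.
            if ch == delimiter:
                delimiter = None
            out.append(ch)
        elif ch in ("'", '"'):
            delimiter = ch
            out.append(ch)
        elif ch == "$":
            literal_next = True
            out.append(ch)
        elif ch == LEFT_ARROW:
            out.append(":=")
        elif ch == UP_ARROW:
            out.append("^")
        else:
            out.append(ch)
    return "".join(out)
```

Because the layout is copied character by character, a conversion like this
would not need a pretty printer at all.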

>
> With this approach, we can translate old Squeak sources to Unicode
> (basically use the MacRoman -> Unicode mapping given in
> MAPPINGS/VENDORS/APPLE/ROMAN.TXT, modified to map ^ and _ to the
> appropriate arrows; I have a modified ROMAN.TXT called SQUEAK.TXT that
> I can post if anyone would be interested).
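
A rough idea of what such a translation could look like, in Python. The
contents of SQUEAK.TXT are not shown in this thread, so the two overrides
below are only a reading of the description above; the function name is
hypothetical. Python's standard mac_roman codec stands in for ROMAN.TXT.

```python
# Assumed Squeak-specific overrides on top of the standard MacRoman table.
SQUEAK_OVERRIDES = {
    ord("^"): "\u2191",  # up arrow (return)
    ord("_"): "\u2190",  # left arrow (assignment)
}


def squeak_bytes_to_unicode(raw: bytes) -> str:
    """Decode MacRoman bytes, then remap ^ and _ to the arrows."""
    return raw.decode("mac_roman").translate(SQUEAK_OVERRIDES)
```

Like a modified mapping table, this remaps every ^ and _, including those
inside strings and comments.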
>
> 	- Third my goal is that Smalltalk or any new
> 	Smalltalk-based-better system grows and gets more programmers.
>
> Good.  We share that goal.  This is one reason why ANSI compatibility
> is important.  This is one reason why a book should use the ANSI
> symbols.
>
> 	Now if I want to attract programmers of other languages:  I do
> 	not know any mainstream language (I'm certainly wrong here again
> 	I'm not expert but just an experienced promoter of Smalltalk)
> 	that does not have an ASCII basis
>
> I guess you haven't looked at C, C++, or Java lately.  Java, C++, and
> C99 allow any defined non-formatting Unicode character in (some)
> strings and in comments,

I'm learning.
But I was not talking about strings, I was talking about code. I saw that
you have u"jkjlj" in Python, but I mean things like
the x _ 2. y _ y + 1.


> and while not being exactly compatible with the Unicode
> rules for identifiers they allow *most* of the Unicode identifier
> characters in identifiers.  For transport purposes these languages provide
> 	\uxxxx		16-bit Unicode character U+xxxx
> 	\U00xxxxxx	21-bit Unicode character U+xxxxxx
> for use when a character is not expressible in a particular encoding.
> Quoting a draft of the C++ standard,
>     [Translation phase 1]
> 	Physical source file characters are mapped, in an implementation-
> 	defined manner, to the source character set (introducing new-line
> 	characters for end-of-line indicators) if necessary.  Trigraph
> 	sequences [...] are replaced by corresponding single-character
> 	internal representations.  Any source file character not in the
> 	basic source character set [...] is replaced by the universal-
> 	character-name that designates that character.

This is interesting.
Do I interpret that correctly that they have a kind of standard source 
character set?
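
To make the quoted translation step concrete, here is a small Python sketch
of replacing characters by universal-character-names. It is a
simplification: plain ASCII stands in for the basic source character set
(which is really a subset of ASCII), and the function name is invented for
this example.

```python
def to_universal_character_names(text: str) -> str:
    """Replace each non-ASCII character by a \\uXXXX or \\UXXXXXXXX escape."""
    pieces = []
    for ch in text:
        code = ord(ch)
        if code < 0x80:
            pieces.append(ch)  # assumed to be in the basic character set
        elif code <= 0xFFFF:
            pieces.append(f"\\u{code:04x}")
        else:
            pieces.append(f"\\U{code:08x}")
    return "".join(pieces)


print(to_universal_character_names("égal"))  # prints \u00e9gal
```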
>
> 	<item>
> 	Physical source file characters are mapped, in an implementation-
> 	defined manner, to the source character set (introducing new-line
> 	characters for end-of-line indicators) if necessary.  Trigraph
> 	sequences [ref] are replaced by corresponding single-character
> 	internal representations.  Any source file character not in the
> 	basic source character set [ref] is replaced by the universal-
> 	character-name that designates that character.
> 	<footnote>
> 	The process of handling extended characters is specified in terms
> 	of mapping to an encoding that uses only the basic source character
> 	set, and, in the case of character literals and strings, further
> 	mapping to the execution character set.  In practical terms, however,
> 	any internal encoding may be used, so long as an actual extended
> 	character encountered in the input, and the same extended character
> 	expressed in the input as a universal-character-name (i.e.
> 	using the [\u and \U] notation), are handled equivalently.
> 	</footnote>
> 	</item>
> I suppose you could argue that having \u00e9 as a transport/internal
> representation for &eacute; means that C99, C++, and Java still have
> "an ASCII basis", but the intent is not that people should write
> \u00e9gal as an identifier, but rather that they should write égal.
>
> 	meaning for me:  that I can type with vi (that I hate), emacs or
> 	**any** text editor.
>
> Thanks to Squeak's insistence on using Ctrl-M as line terminator,
> I can't use "any" text editor on Squeak sources right now.  Try using
> vi or ex or edit or ed or even view on a .st or .cs file.  Even using
> Emacs, what you see is a sea of ^Ms.

I know that feature of Squeak.
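
For what it is worth, making a .st or .cs file readable in vi only needs
the line terminators changed. A minimal Python sketch (the function name
is made up, and it works on raw bytes so that it does not have to guess the
file's character encoding):

```python
def squeak_to_unix_newlines(raw: bytes) -> bytes:
    """Turn CR (Ctrl-M) line terminators into LF.

    CRLF pairs are collapsed first so they do not become two newlines.
    """
    return raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```

Going back the other way before filing in would be the same idea with the
replacement reversed.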

> 	
> Limit yourself to what you can use in "**any** text editor", and you
> can forget about using any accented letters anywhere.

Yes. I know

>
> Sam, Yudit, and MULE exist.  And I am NOT saying that nobody should be
> allowed to use ":=" for assignment and "^" for returning, quite the
> contrary.  What I am saying is that nobody should be FORCED to use 
> them.
>
> 	So sure we can be the one that does not have the problems you
> 	mention but right now the only thing I see is inconsistencies
> 	everywhere.  So may be I'm just too pragmatic.
> 	
> No, someone who's too pragmatic would put up with inconsistency.
>
> 	So as I say I'm wrong but I feel like in a museum sometimes
>
> Insisting on being able to use any old text editor certainly seems
> that way.  But then, I write HTML in an ASCII editor, and still get to
> use &larr; and &uarr; ...

Ok, thanks for your time. I appreciate that information a lot.
So now what do we do? What is the way to go to improve the situation?
Would you have the time to propose a step-by-step process or sketch
that people could agree on or modify? You have convinced me that a
solution is possible.

Stef

