[Vm-dev] Help: weird bug in inspecting characters

Ron Teitelbaum ron at usmedrec.com
Mon Jul 2 18:42:50 UTC 2018


Hi Tobias,

Interesting!  I think 128 is the Euro symbol in extended ASCII on MS
Windows.

https://www.petefreitag.com/cheatsheets/ascii-codes/
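
To make the difference concrete, here is a small Python sketch (Python only
because its standard codecs make the tables easy to check; nothing here is
Squeak API). It decodes the single byte 128 under three encodings that come
up in this thread. The codec names are Python's, and reading "extended ASCII
on MS Windows" as code page 1252 is my assumption.

```python
import unicodedata

BYTE_128 = bytes([128])


def decode_128(codec):
    """Decode the single byte 0x80 with the named Python codec."""
    return BYTE_128.decode(codec)


# Assumption: "extended ASCII on MS Windows" means code page 1252.
windows = decode_128("cp1252")
mac_roman = decode_128("mac_roman")
latin1 = decode_128("latin-1")

print(repr(windows))    # '€'
print(repr(mac_roman))  # 'Ä'
print(repr(latin1))     # '\x80'

# In Unicode (and so in Latin-1) code point 128 is a C1 control, not a glyph.
print(unicodedata.category(latin1))  # Cc
```

So the same byte is a Euro sign, an Ä, or a non-printable control code,
depending purely on which table is used to read it.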

Thanks for the explanation!  It makes sense that it would have been a hack
for code printing.
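
As I understand the two StrikeFont methods quoted below, they only change
which glyph is drawn for a character code, never the code itself. A toy
Python model of that idea follows. The helper names are hypothetical, and it
assumes the map starts out as the identity and that slot n (1-based, as in
Smalltalk) belongs to character code n - 1. Neither assumption is confirmed
from the Squeak source.

```python
def identity_glyph_map(size=256):
    """Hypothetical model: entry for character code c holds glyph index c."""
    return list(range(size))


def at_put(glyph_map, index, glyph):
    """Mimic Smalltalk's 1-based at:put: on a 0-based Python list."""
    glyph_map[index - 1] = glyph


def glyph_for(glyph_map, char_code):
    """Glyph drawn for a character code (1-based slot char_code + 1)."""
    return glyph_map[char_code]


def use_left_arrow(glyph_map):
    # Same numbers as StrikeFont>>useLeftArrow in the quoted message.
    at_put(glyph_map, 96, 95)
    at_put(glyph_map, 95, 94)


def use_underscore(glyph_map):
    # Same numbers as StrikeFont>>useUnderscore in the quoted message.
    at_put(glyph_map, 96, 129)
    at_put(glyph_map, 95, 128)


glyph_map = identity_glyph_map()

use_underscore(glyph_map)
print(glyph_for(glyph_map, ord("^")), glyph_for(glyph_map, ord("_")))  # 128 129

use_left_arrow(glyph_map)
print(glyph_for(glyph_map, ord("^")), glyph_for(glyph_map, ord("_")))  # 94 95

# Character code 128 itself is never remapped in this model.
print(glyph_for(glyph_map, 128))  # 128
```

If that model is right, a character with code 128 is drawn with glyph 128,
the relocated caret, which would explain why it looks like ^ on screen while
charCode still answers 128.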

All the best,

Ron Teitelbaum

On Mon, Jul 2, 2018 at 2:26 PM, Tobias Pape <Das.Linux at gmx.de> wrote:

> Hi all
>
> On 02.07.2018, at 19:29, Ron Teitelbaum <ron at usmedrec.com> wrote:
>
> That is strange.
>
> On Squeak 4.1
>
> $^ charCode -> 94
> 94 asCharacter -> $^
> 128 asCharacter -> $€ charCode -> 128  (doesn't show properly in text on
> email but does in squeak, see image).
>
> <Capture.PNG>
>
> In other words, if I use my keyboard to type it in, it seems to be
> represented fine and it evaluates to charCode 94 as expected.
>
> But something created with charCode 128 is also represented with the same
> symbol, and it retains its 128 charCode, as you can see when you send
> charCode to the string representation that was created.
>
> If this was filed out it would seem that either version could have been
> used in the code and you wouldn't notice it.  Manually changing it by
> typing in ^ and filing it out again will probably fix it.  An external
> editor changing 128 to 94 chars will also probably work.
>
> All the best,
>
>
> Maybe I can shed a bit of light on things here.
>
> If you look at the attached image (which is one of the default fonts we
> use), you see that ^ and _ are present after [\] but ALSO after {|}~. This
> seems to be intentional so that you, should you want, can switch between
> caret/underscore and up-arrow/left-arrow printing for return/assignment and
> here's how it's done:
>
> StrikeFont>>useLeftArrow
> self characterToGlyphMap.
> characterToGlyphMap at: 96 put: 95.
> characterToGlyphMap at: 95 put: 94
>
> and
>
> StrikeFont>>useUnderscore
> self characterToGlyphMap.
> characterToGlyphMap at: 96 put: 129.
> characterToGlyphMap at: 95 put: 128
>
>
> There's the 128.
>
> What happens here, too, is that 128 is no proper character to begin with.
>
> Our characters represent Unicode code points, which, for ByteStrings,
> happen to _exactly_ match the ISO 8859-1 (Latin1) encoding. (In fact, that
> was a design decision for Unicode to begin with; it does NOT hold for UTF-8,
> though.)
>
> In both Unicode and ISO 8859-1, certain "character codes" are not
> actually characters. The control characters (<32) are intentionally
> undefined, as are the codes from 128 to 159 (which includes 128). However,
> 8859-1 was often combined with ANSI escape codes (aka ISO 6429), which
> defines the codes from 128 to 159 as control block C1, which Unicode
> subsequently adopted.
>
> Long story short, characters between 128 and 159 are inherently
> non-printable. They either control or format output, but cannot
> themselves be displayed. The StrikeFonts exploit that and use those code
> points in fonts to relocate caret, underscore, left-arrow and up-arrow
> so that they can serve as substitutes when you don't want ^ _ in code but
> rather arrows.
>
> =-=-=-=-=
>
> That being said, I just saw that the fileList forces MacRoman encoding
> (deprecated since MacOS X 10.0 in 2001....) which _has_ a proper meaning
> for 128, namely Ä. However, the respective method probably needs an
> overhaul:
>
> FileList>>defaultEncoderFor: aFileName
>
> "This method just illustrates the stupidest possible implementation of
> encoder selection."
> | l |
> l := aFileName asLowercase.
> " ((l endsWith: FileStream multiCs) or: [
> l endsWith: FileStream multiSt]) ifTrue: [
> ^ UTF8TextConverter new.
> ].
> "
> ((l endsWith: FileStream cs) or: [
> l endsWith: FileStream st]) ifTrue: [
> ^ MacRomanTextConverter new.
> ].
>
> ^ Latin1TextConverter new.
>
> =-=-=-=-=-=
>
> Indeed, the file x.cs contains a 128 at the indicated position, which is
> in the middle of a binary SmartRefStream dump. Maybe we must change the
> fileIn logic to make the stream binary when it encounters a SmartRefStream?
> That would certainly help.
>
> Best regards
> -Tobias
>
>
>
>
>
> Ron Teitelbaum
>
>
> On Mon, Jul 2, 2018 at 12:23 PM, Eliot Miranda <eliot.miranda at gmail.com
> > wrote:
>
> Hi Subbu,
>
> > On Jul 2, 2018, at 7:24 AM, K K Subbu <kksubbu.ml at gmail.com> wrote:
> >
> > Hi,
> >
> > I need help in tracing a bug (see attached picture) which triggered a
> MNU while trying to view a .cs file in FileTool. I traced the problem to
> peek on aStream returning a nil because a wrong character code was being
> returned in generated.
> >
> > In the attached picture, aStream isBinary is false and the basicNext
> returns the correct $^ character which gets stored in character1 local var.
> But an inspector displays it as Character 128. In the same inspector window
> $^ shows the correct character code as 94.
> >
> > This is on Squeak5.2alpha-64b-Linux-18127. What is happening here?
>
> No idea.  Do you have a test case?
>
> > Has anyone seen this type of behavior before?
> >
> >
> > Thanks in advance .. Subbu
> > <strangeCharBug.png>
>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dejavu_new.png
Type: image/png
Size: 10528 bytes
Desc: not available
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20180702/ede1f1a1/attachment-0001.png>