[Vm-dev] Help: weird bug in inspecting characters

Ron Teitelbaum ron at usmedrec.com
Mon Jul 2 19:02:12 UTC 2018


Hi Tobias,

Yeah, I saw that it was an MS Windows only thing.

It makes your head spin!!

:) All the best,

Ron

On Mon, Jul 2, 2018 at 2:57 PM, Tobias Pape <Das.Linux at gmx.de> wrote:

> Hi Ron
>
> > On 02.07.2018, at 20:42, Ron Teitelbaum <Ron at USMedRec.com> wrote:
> >
> > Hi Tobias,
> >
> > Interesting!  I think 128 is the Euro symbol in extended ASCII on MS
> Windows.
>
> The Euro is 128 in Windows Codepage 1252, true. However, there's not one
> "true" extended Ascii, but everything that is 8-bit and includes ascii is
> somehow extended Ascii. To quote Wikipedia
> [https://en.wikipedia.org/wiki/Extended_ASCII]:
>
>         "There are many extended ASCII encodings (more than 220 DOS and
> Windows codepages)."
>
> For example, in Latin-9 (ISO 8859-15), Euro is at 164.
> In IBM CodePage 850 (or, 858, to be precise), Euro replaced dotless i at
> 213.
> Mac Roman replaced the generic currency sign and put Euro at 219,
> (all "extended ascii")
>
> To complete the list, Unicode assigned CodePoint U+20AC for Euro.
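
For anyone who wants to check the positions listed above, here is a minimal Python sketch. Python is used only because its standard codecs cover all of these code pages; it is not part of Squeak. The codec names are Python's own, the helper name is made up for illustration, and the expected values are the ones given in the text.

```python
# Where the Euro sign lands in several 8-bit "extended ASCII" encodings.
EURO = "\u20ac"  # Unicode code point U+20AC

# Python codec name -> byte value the thread says to expect
EXPECTED = {
    "cp1252": 128,      # Windows code page 1252
    "iso8859_15": 164,  # Latin-9
    "cp858": 213,       # IBM code page 850 with the Euro replacing dotless i
    "mac_roman": 219,   # Mac Roman
}


def euro_positions():
    """Return {codec: byte value of the Euro sign} for each codec above."""
    return {name: EURO.encode(name)[0] for name in EXPECTED}


if __name__ == "__main__":
    for name, value in euro_positions().items():
        print(f"{name:12} {value}")
```

The same character ends up at four different byte values, which is why a bare "128" means nothing until the encoding is known.
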
>
> It's hard to get your Euro's worth… :D
>
>
> Best regards
>         -Tobias
>
> >
> > https://www.petefreitag.com/cheatsheets/ascii-codes/
> >
> > Thanks for the explanation!  It makes sense that it would have been a
> hack for code printing.
> >
> > All the best,
> >
> > Ron Teitelbaum
> >
> > On Mon, Jul 2, 2018 at 2:26 PM, Tobias Pape <Das.Linux at gmx.de> wrote:
> > Hi all
> >
> >> On 02.07.2018, at 19:29, Ron Teitelbaum <ron at usmedrec.com> wrote:
> >>
> >> That is strange.
> >>
> >> On Squeak 4.1
> >>
> >> $^ charCode -> 94
> >> 94 asCharacter -> $^
> >> 128 asCharacter -> $€ charCode -> 128  (doesn't show properly in text
> on email but does in squeak, see image).
> >>
> >> <Capture.PNG>
> >>
> >> In other words, if I use my keyboard to type it in, it seems to be
> represented fine and it evaluates to charCode 94 as expected.
> >>
> >> But something created with 128 charCode also is represented with the
> same symbol and it also retains its 128 charCode, as you can see when you
> send charCode to the string representation that was created.
> >>
> >> If this was filed out it would seem that either version could have been
> used in the code and you wouldn't notice it.  Manually changing it by
> typing in ^ and filing it out again will probably fix it.  An external
> editor changing 128 to 94 chars will also probably work.
> >>
> >> All the best,
> >
> > Maybe I can shed a bit of light on things here.
> >
> > If you look at the attached image (which is one of the default fonts we
> use), you see that ^ and _ are present after [\] but ALSO after {|}~. This
> seems to be intentional so that you, should you want, can switch between
> caret/underscore and up-arrow/left-arrow printing for return/assignment and
> here's how it's done:
> >
> > StrikeFont>>useLeftArrow
> >       self characterToGlyphMap.
> >       characterToGlyphMap at: 96 put: 95.
> >       characterToGlyphMap at: 95 put: 94
> >
> > and
> >
> > StrikeFont>>useUnderscore
> >       self characterToGlyphMap.
> >       characterToGlyphMap at: 96 put: 129.
> >       characterToGlyphMap at: 95 put: 128
> >
> >
> > There's the 128.
> >
> > What happens here, too, is that 128 is no proper character to begin with.
> >
> > Our characters represent unicode codepoints, which, for ByteStrings,
> happen to _exactly_ match the ISO 8859-1 (Latin1) encoding. (In fact, that
> was a design decision for Unicode to begin with; it does NOT hold for UTF-8,
> though).
> >
> > In both, Unicode and ISO 8859-1, certain "character codes" are not
> actually characters. The control characters (<32) are intentionally
> undefined, as are codes between 128 and 159 (includes 128). However, 8859-1
> was often combined with Ansi escape codes (aka ISO 6429), which defines the
> codes from 128 as Control Block C1, which Unicode subsequently adopted.
> >
> > Long story short, Characters between 128 and 159 are inherently
> non-printable. Either they control output or format output, but cannot in
> themselves be displayed. The StrikeFonts utilize that and use those code
> points in fonts to relocate caret, underscore, up-arrow and left-arrow
> so that they can serve as substitutes when you don't want ^ _ in code but
> rather arrows.
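
A rough way to see the "non-printable" point outside of Squeak is the Python sketch below. It assumes only the standard unicodedata module; the helper name is made up for illustration and nothing here reflects how StrikeFont itself works.

```python
import unicodedata


def c1_categories():
    """Map each code point 128..159 to its Unicode general category."""
    return {code: unicodedata.category(chr(code)) for code in range(128, 160)}


if __name__ == "__main__":
    cats = c1_categories()
    # Every code point in the C1 block is a control character ("Cc").
    print(set(cats.values()))
    # Latin-1 maps byte 128 straight to U+0080, which is not printable.
    ch = bytes([128]).decode("latin-1")
    print(ord(ch), ch.isprintable())
```

Since none of these code points has a glyph of its own, a font is free to park spare glyphs there, which is what the paragraph above describes.
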
> >
> > =-=-=-=-=
> >
> > That being said, I just saw that the fileList forces MacRoman encoding
> (deprecated since MacOS X 10.0 in 2001....) which _has_ a proper meaning
> for 128, namely Ä. However, the respective method probably needs an
> overhaul:
> >
> > FileList>>defaultEncoderFor: aFileName
> >
> >       "This method just illustrates the stupidest possible
> implementation of encoder selection."
> >       | l |
> >       l := aFileName asLowercase.
> > "     ((l endsWith: FileStream multiCs) or: [
> >               l endsWith: FileStream multiSt]) ifTrue: [
> >               ^ UTF8TextConverter new.
> >       ].
> > "
> >       ((l endsWith: FileStream cs) or: [
> >               l endsWith: FileStream st]) ifTrue: [
> >               ^ MacRomanTextConverter new.
> >       ].
> >
> >       ^ Latin1TextConverter new.
> >
> > =-=-=-=-=-=
> >
> > Indeed, the file x.cs contains a 128 at the indicated position, which
> is in the middle of a binary SmartRefStream dump. Maybe we must change the
> fileIn logic to make the stream binary when it encounters a SmartRefStream?
> That would certainly help.
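
To illustrate why text decoding hurts a binary dump, here is a small Python sketch. It only mimics the effect with Python's mac_roman and latin-1 codecs; it says nothing about how Squeak's converters are implemented, and the sample bytes and helper name are made up.

```python
# Made-up stand-in for a few bytes of a binary dump containing byte 128.
payload = bytes([94, 128, 95])


def decode_as_text(data, codec):
    """Decode raw bytes with the given codec and return the character codes."""
    return [ord(ch) for ch in data.decode(codec)]


if __name__ == "__main__":
    print(list(payload))                         # raw bytes, untouched
    print(decode_as_text(payload, "latin-1"))    # Latin-1 keeps the values
    print(decode_as_text(payload, "mac_roman"))  # Mac Roman turns 128 into 196
```

Reading the same bytes in binary mode skips the decoding step entirely, so the value 128 would survive as is.
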
> >
> > Best regards
> >       -Tobias
> >
> >
> >
> > <dejavu_new.png>
> >>
> >> Ron Teitelbaum
> >>
> >>
> >> On Mon, Jul 2, 2018 at 12:23 PM, Eliot Miranda <eliot.miranda at gmail.com>
> wrote:
> >>
> >> Hi Subbu,
> >>
> >> > On Jul 2, 2018, at 7:24 AM, K K Subbu <kksubbu.ml at gmail.com> wrote:
> >> >
> >> > Hi,
> >> >
> >> > I need help in tracing a bug (see attached picture) which triggered a
> MNU while trying to view a .cs file in FileTool. I traced the problem to
> peek on aStream returning a nil because a wrong character code was being
> returned in generated.
> >> >
> >> > In the attached picture, aStream isBinary is false and the basicNext
> returns the correct $^ character which gets stored in character1 local var.
> But an inspector displays it as Character 128. In the same inspector window
> $^ shows the correct character code as 94.
> >> >
> >> > This is on Squeak5.2alpha-64b-Linux-18127. What is happening here?
> >>
> >> No idea.  Do you have a test case?
> >>
> >> > Has anyone seen this type of behavior before?
> >> >
> >> >
> >> > Thanks in advance .. Subbu
> >> > <strangeCharBug.png>
> >>
> >
> >
>
>
>