3.9 Oddities

Rich Warren rwmlist at gmail.com
Tue Sep 5 09:42:04 UTC 2006


On Sep 4, 2006, at 9:12 PM, stéphane ducasse wrote:

>>>
>>> The "odd boxes" are bad characters that were previously displayed
>>> as blank space.
>>> Showing them helps you notice when you are copying and pasting
>>> code from a web page or other foreign source.
>>> Often I could not find the bugs and had to rewrite all the code
>>> that students had copied from a PDF, for example.
>>
>> They seem so prevalent. I thought they might be a difference in EOL
>> symbols between Win/Mac/Unix. Does Squeak use a standard end-of-line
>> character, or does it vary based on OS? Is there a way you
>> can adjust this setting, or automatically convert documents?
>>
>> More to the point, why do they show up in 3.9 but not in 3.8?
>
> because before there was no glyph for them.

I'm sorry, but this really doesn't feel like a satisfactory answer.

I did some digging. It turns out the problem is line feeds. In the  
good sample I looked at, Squeak was using CR (hex 0D) to represent  
the end of a line. In the annoying box example, it used CR LF (hex 0D  
0A).
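
To make that concrete, here is a rough sketch (Python, purely
illustrative; `count_line_endings` and the sample bytes are made up
for this example, not taken from Squeak) of how the two cases can be
told apart by looking at the raw bytes:

```python
def count_line_endings(data: bytes) -> dict:
    """Count CR LF pairs, bare CRs, and bare LFs in raw file bytes."""
    crlf = data.count(b"\r\n")
    # Subtract the CR LF pairs so each byte is attributed to one style only.
    cr = data.count(b"\r") - crlf
    lf = data.count(b"\n") - crlf
    return {"CRLF": crlf, "CR": cr, "LF": lf}


# Hypothetical samples: one with bare CRs, one with CR LF pairs.
good_sample = b"foo := 1.\rbar := 2.\r"
box_sample = b"foo := 1.\r\nbar := 2.\r\n"

print(count_line_endings(good_sample))
print(count_line_endings(box_sample))
```

A file that reports only bare CRs matches the good sample; one that
reports CR LF pairs matches the box example.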

Now, if I remember correctly, a bare CR is the standard newline for
classic Mac OS, and CR LF is the standard newline for DOS/Windows.
They're both standard EOL markers on their respective platforms (both
platforms Squeak supports).

Here's my point. As a cross-platform editor, Squeak must be able to
handle these transparently. Either it needs to automatically
normalize everything to a single Squeak-standard newline, or it needs
to accept both of these (and others: Unix uses a third variant, a bare
LF, hex 0A, and there may be some I'm not familiar with) in a
reasonable and transparent way.
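
The normalization option could look something like the sketch below.
It is Python rather than Squeak code, `normalize_newlines` is a
hypothetical name, and the default of a bare CR is an assumption based
only on what the good sample above contained:

```python
def normalize_newlines(text: str, newline: str = "\r") -> str:
    """Rewrite CR LF, bare CR, and bare LF line endings as `newline`."""
    # Collapse CR LF first so the pair is not counted as two line breaks.
    text = text.replace("\r\n", "\n")
    # Any CR left at this point is a bare CR.
    text = text.replace("\r", "\n")
    # Everything is now LF; convert to the requested convention.
    return text.replace("\n", newline)


mixed = "line one\r\nline two\rline three\n"
print(repr(normalize_newlines(mixed)))
```

Running this on load (and, if desired, converting back on save) would
let the rest of the editor deal with a single newline character.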

I should be able to open any ASCII text file (regardless of which OS
it was written on) and it should appear as it was intended,
regardless of the particular newline encoding.

Displaying the box glyphs is possibly good in some cases (for
example, non-ASCII codes that may inadvertently get copied from
PDFs). A better solution would be to strip out any invalid characters
automatically. After all, if they're invalid, they can't be doing
anything constructive.
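
As a rough sketch of what stripping might mean (Python again, not
Squeak code; `strip_invalid` is a hypothetical helper, and treating
everything outside printable ASCII plus tab, CR, and LF as "invalid"
is my assumption for the example, not a rule Squeak defines):

```python
# Control characters that are kept because they carry layout meaning.
ALLOWED_CONTROLS = {"\t", "\r", "\n"}


def strip_invalid(text: str) -> str:
    """Drop every character that is not printable ASCII or an allowed control."""
    return "".join(
        ch for ch in text
        if ch in ALLOWED_CONTROLS or " " <= ch <= "~"
    )


# A zero-width space (U+200B) of the kind that can sneak in from a PDF.
pasted = "x := 3\u200b + 4.\r"
print(repr(strip_invalid(pasted)))
```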

But displaying the box glyphs for standard ASCII DOS/WIN newlines  
feels like a big step backwards.

-Rich-


More information about the Squeak-dev mailing list