[squeak-dev] The Trunk: Tools-mt.1118.mcz

Marcel Taeumel marcel.taeumel at hpi.de
Sat Jan 29 12:08:45 UTC 2022


Hmm... I wonder whether those diff issues in the Monticello tools originate from an inconsistency between an .mcz's snapshot.bin and snapshot.st. The .st is explicitly written as UTF-8. The .bin relies on whatever that DataStream-on-RWBinaryOrTextStream does ... If the contents are written in binary mode, there is no chance that the text converter can remove that #leadingChar from the characters in, e.g., class comments or method sources. I think.
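
For anyone not familiar with #leadingChar, here is a minimal workspace sketch of what such a "tagged" character looks like. The code point 16r3042 and the temporaries (plain, tagged) are only picked for illustration; the leadingChar 5 is the one the 'ja' environment reports. I am assuming the usual Character class >> #leadingChar:code: behavior, so please treat the comments as expectations, not as a spec:

```smalltalk
"Workspace sketch: the same code point with and without a leadingChar."
| plain tagged |
plain := Character value: 16r3042.                  "no leadingChar"
tagged := Character leadingChar: 5 code: 16r3042.   "tagged with leadingChar 5"

plain leadingChar.                  "print it: 0"
tagged leadingChar.                 "print it: 5"
tagged charCode = plain charCode.   "print it: true, the code itself is unchanged"
tagged = plain.                     "print it: false, the tag is part of the character's value"
```

So two strings that look identical on screen can still compare as different, character by character, which is the kind of thing that would confuse a diff.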

Jakob? Maybe this one is for you? =D

MCPatchOperation >> #sourceText
MCPatchOperation >> #fromSource
MCPatchOperation >> #toSource
MCMczWriter >> #writeSnapshot:

A simple test is to revert the comment of ThirtyTwoBitRegister to "ul 4/13/2015" and then use the "history" of the "System" package to compare System-mt.1294 against <working copy>. Given that you have your locale set to 'ja', your #languageEnvironment will have #leadingChar 5 and then you see this:

[Screenshot: image.png, see the scrubbed attachment at the end of this message]

Interestingly, adding #utf8ToSqueak to both #fromSource and #toSource solved this issue here. So, I am wondering whether #leadingChar > 0 only reveals the actual issue, which is UTF-8-encoded text that has not yet been decoded to Squeak's internal representation. However, one would assume that the snapshot.bin has actual Squeak instances of MCClassDefinition etc. in it, right? Hmm....
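
To illustrate what I mean by "not yet decoded", here is a small workspace sketch. It only uses String >> #squeakToUtf8 and #utf8ToSqueak; the sample character 16rFC (u-umlaut) and the temporaries (decoded, encoded) are made up for the example, and this is not what the Monticello code actually does:

```smalltalk
"Workspace sketch: a still-UTF-8-encoded string vs. its decoded form."
| decoded encoded |
decoded := String with: (Character value: 16rFC).   "one character, code 16rFC"
encoded := decoded squeakToUtf8.                     "UTF-8 bytes held in a ByteString"

encoded size.                     "print it: 2, the two UTF-8 bytes"
encoded = decoded.                "print it: false, the raw bytes differ from the text"
encoded utf8ToSqueak = decoded.   "print it: true, decoding restores the text"
```

If one side of a comparison is still in the encoded form and the other side is already decoded, any diff will report changes even though the "same" text is meant.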

Best,
Marcel
On 29.01.2022 12:27:05, commits at source.squeak.org <commits at source.squeak.org> wrote:
Marcel Taeumel uploaded a new version of Tools to project The Trunk:
http://source.squeak.org/trunk/Tools-mt.1118.mcz

==================== Summary ====================

Name: Tools-mt.1118
Author: mt
Time: 29 January 2022, 12:26:50.298306 pm
UUID: 60dc6fe1-c374-f941-9ea2-51a514ffe136
Ancestors: Tools-mt.1117

Adds a simple way to browse method sources and class comments that either benefit from UTF8 or suffer from Squeak's #leadingChar encoding, i.e., code-point > 255.

non-ASCII method sources: 62
non-ASCII class comments: 7
leading-char method sources: 6
leading-char class comments: 0

See method commentary for more information.

=============== Diff against Tools-mt.1117 ===============

Item was added:
+ ----- Method: Unicode class>>browseClassCommentsWithLeadingCharEncoding (in category '*Tools-Browsing') -----
+ browseClassCommentsWithLeadingCharEncoding
+ 	"See commentary in browseMethodsWithLeadingCharEncoding."
+
+ 	^ self systemNavigation
+ 		browseMessageList: (Array streamContents: [:s |
+ 			self systemNavigation
+ 				allClassesDo: [:cls |
+ 					(cls comment asString anySatisfy: [:each | each codePoint > 255])
+ 						ifTrue: [s nextPut: (MethodReference class: cls selector: #Comment)] ]])
+ 		name: 'Class comments affected by #leadingChar'!

Item was added:
+ ----- Method: Unicode class>>browseClassCommentsWithNonAsciiEncoding (in category '*Tools-Browsing') -----
+ browseClassCommentsWithNonAsciiEncoding
+ 	"See commentary in browseMethodsWithNonAsciiEncoding."
+
+ 	^ self systemNavigation
+ 		browseMessageList: (Array streamContents: [:s |
+ 			self systemNavigation
+ 				allClassesDo: [:cls | cls comment asString isAsciiString
+ 					ifFalse: [s nextPut: (MethodReference class: cls selector: #Comment)] ]])
+ 		name: 'Class comments with non-ASCII contents'!

Item was added:
+ ----- Method: Unicode class>>browseMethodsWithLeadingCharEncoding (in category '*Tools-Browsing') -----
+ browseMethodsWithLeadingCharEncoding
+ 	"Browse a list of methods whose sources are affected by Squeak's #leadingChar magic, which can confuse tools such as TextDiff depending on your current encoding (i.e., EncodedCharSet or LanguageEnvironment). NOTE THAT if you want to change those methods, ensure that your current encoding uses a leadingChar of 0, e.g., locale en-US and Latin1Environment. See UTF8TextConverter class >> #decodeByteString: and Unicode class >> #value:."
+
+ 	^ self systemNavigation
+ 		browseMessageList: (self systemNavigation
+ 			allMethodsSelect: [:method | method getSource asString
+ 				anySatisfy: [:each | each codePoint > 255]])
+ 		name: 'Methods sources affected by #leadingChar'!

Item was added:
+ ----- Method: Unicode class>>browseMethodsWithNonAsciiEncoding (in category '*Tools-Browsing') -----
+ browseMethodsWithNonAsciiEncoding
+ 	"Browse a list of methods whose sources benefit from UTF8 encoding."
+
+ 	^ self systemNavigation
+ 		browseMessageList: (self systemNavigation
+ 			allMethodsSelect: [:method | method getSource asString isAsciiString not])
+ 		name: 'Methods with non-ASCII sources'!


-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 60566 bytes
Desc: not available
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20220129/5066b220/attachment.png>