[squeak-dev] The Trunk: Multilingual-mt.261.mcz

commits at source.squeak.org
Sat Jan 29 09:59:30 UTC 2022


Marcel Taeumel uploaded a new version of Multilingual to project The Trunk:
http://source.squeak.org/trunk/Multilingual-mt.261.mcz

==================== Summary ====================

Name: Multilingual-mt.261
Author: mt
Time: 29 January 2022, 10:59:27.832024 am
UUID: c87a7c62-86fe-204d-b0cf-12e891df09e2
Ancestors: Multilingual-mt.260

Fixes a file-in/file-out (.cs, .st) regression. Always use UTF-8, but allow falling back to MacRoman on an InvalidUTF8 error. Still write out the UTF-8 BOM so that new file-outs remain compatible with the file-in code of older images.
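To make the policy concrete, here is a minimal Python sketch of the same byte-level behavior. It is not Squeak code. The names `file_out` and `file_in` are hypothetical. The sketch decodes a whole file in memory, whereas the image switches the stream's converter (see #setConverterForOldCode in the diff). Python's `utf-8-sig` codec stands in for "skip an existing BOM", and its `mac_roman` codec stands in for MacRomanTextConverter.

```python
import codecs

UTF8_BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def file_out(source: str) -> bytes:
    """Hypothetical file-out: UTF-8 content, prefixed with a BOM so that
    older images (which sniff for the BOM) still pick UTF-8."""
    return UTF8_BOM + source.encode("utf-8")


def file_in(data: bytes) -> str:
    """Hypothetical file-in: always try UTF-8 first; a leading BOM, if any,
    is skipped. Only on invalid UTF-8 retry the whole input as MacRoman."""
    try:
        return data.decode("utf-8-sig")  # strips a leading BOM if present
    except UnicodeDecodeError:           # plays the role of InvalidUTF8
        return data.decode("mac_roman")  # legacy pre-UTF-8 file-outs
```

For example, `file_in(b"caf\x8e")` is not valid UTF-8, so it falls back to MacRoman, where byte 8E is "é". UTF-8 input, with or without a BOM, decodes directly.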

Since 2018, we have been filing out UTF-8 only, assuming that nobody changed Locale or LanguageEnvironment, because the defaults for MultiByteFileStream are derived from there. So let's hope that "TextConverter defaultSystemConverter" answered a UTF-8 converter for all (or most) file-outs since 2018. The same goes for our .sources and .changes files, which now default to UTF-8 explicitly, not by accident. This holds regardless of any Locale's LanguageEnvironment's #systemConverterClass, which is still useful for non-code text-file access.

Note that if there are any .cs or .st files out there with an encoding other than MacRoman or UTF-8, they can be re-encoded before filing them in. Yet I don't think this is an actual issue at this point. Latin-1 files might exist, but Latin-1 was never used as a file-out encoding.
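Re-encoding such a file only requires knowing its original encoding. The following is a minimal Python sketch, assuming the source encoding is known (Latin-1 in the example). The helpers `reencode_to_utf8` and `reencode_file` are hypothetical and not part of Squeak.

```python
from pathlib import Path


def reencode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a known legacy encoding and re-encode them as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


def reencode_file(path: Path, source_encoding: str) -> None:
    """Rewrite a .cs/.st file in place as UTF-8 (no BOM)."""
    path.write_bytes(reencode_to_utf8(path.read_bytes(), source_encoding))
```

Without re-encoding, a Latin-1 file such as `b"caf\xe9"` is invalid UTF-8. It would therefore silently take the MacRoman fallback and be misread. The sketch writes no BOM, because file-in now assumes UTF-8 anyway. Prepend `codecs.BOM_UTF8` if older images must read the result.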

This commit is a coherent part of a slightly larger Locale cleanup. It's almost ready. Then we can update those translations for the next release. :-)

=============== Diff against Multilingual-mt.260 ===============

Item was changed:
  ----- Method: MultiByteBinaryOrTextStream>>fileOutClass:andObject: (in category 'fileIn/Out') -----
  fileOutClass: extraClass andObject: theObject 
+ 
+ 	self nextPutUTF8BOM.
- 	UTF8TextConverter writeBOMOn: self.
  	^ super fileOutClass: extraClass andObject: theObject!

Item was added:
+ ----- Method: MultiByteBinaryOrTextStream>>nextPutUTF8BOM (in category 'fileIn/Out') -----
+ nextPutUTF8BOM
+ 	"Backwards compatibility. BOM was used in older file-outs to switch from mac-roman to utf-8 on file-in. Nowadays we always expect utf-8 on file-in and fall back only on InvalidUTF8 exception."
+ 	
+ 	self binary.
+ 	UTF8TextConverter writeBOMOn: self.
+ 	self text.!

Item was changed:
  ----- Method: MultiByteBinaryOrTextStream>>setConverterForCode (in category 'fileIn/Out') -----
  setConverterForCode
+ 	"Always use and expect UTF-8 encoding for source code so that we can share those files. Any existing BOM should be skipped automatically when decoding content. See #writeBOMOn: and #decodeByteString:."
  
+ 	self converter: UTF8TextConverter new.!
- 	| current |
- 	current := converter saveStateOf: self.
- 	self position: 0.
- 	self binary.
- 	((self next: 3) = #[ 16rEF 16rBB 16rBF ]) ifTrue: [
- 		self converter: UTF8TextConverter new
- 	] ifFalse: [
- 		self converter: MacRomanTextConverter new.
- 	].
- 	converter restoreStateOf: self with: current.
- 	self text.
- !
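The removed method above chose the converter by sniffing the first three bytes: a BOM meant UTF-8, anything else meant MacRoman. The Python sketch below contrasts that rule with the new one. One plausible way the old rule misbehaves, assumed here for illustration, is UTF-8 content written without a BOM. `old_decode` and `new_decode` are hypothetical names, and stripping the BOM in `old_decode` is a simplification of what the stream did.

```python
import codecs


def old_decode(data: bytes) -> str:
    """Removed behavior: BOM present -> UTF-8, otherwise MacRoman."""
    if data[:3] == codecs.BOM_UTF8:
        return data[3:].decode("utf-8")
    return data.decode("mac_roman")


def new_decode(data: bytes) -> str:
    """New behavior: always UTF-8; an existing BOM is just skipped."""
    return data.decode("utf-8-sig")
```

Both rules agree on BOM-prefixed input. For BOM-less UTF-8 such as `"café".encode("utf-8")`, however, the old rule yields the mojibake `caf√©`, while the new rule yields `café`.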

Item was added:
+ ----- Method: MultiByteBinaryOrTextStream>>setConverterForOldCode (in category 'fileIn/Out') -----
+ setConverterForOldCode
+ 
+ 	self converter: MacRomanTextConverter new.!

Item was changed:
  ----- Method: MultiByteBinaryOrTextStream>>setEncoderForSourceCodeNamed: (in category 'fileIn/Out') -----
  setEncoderForSourceCodeNamed: streamName
  
+ 	self deprecated.
+ 	self setConverterForCode.!
- 	| l |
- 	l := streamName asLowercase.
- "	((l endsWith: FileStream multiCs) or: [
- 		(l endsWith: FileStream multiSt) or: [
- 			(l endsWith: (FileStream multiSt, '.gz')) or: [
- 				(l endsWith: (FileStream multiCs, '.gz'))]]]) ifTrue: [
- 					self converter: UTF8TextConverter new.
- 					^ self.
- 	].
- "
- 	((l endsWith: FileStream cs) or: [
- 		(l endsWith: FileStream st) or: [
- 			(l endsWith: (FileStream st, '.gz')) or: [
- 				(l endsWith: (FileStream cs, '.gz'))]]]) ifTrue: [
- 					self converter: MacRomanTextConverter new.
- 					^ self.
- 	].
- 
- 	self converter: UTF8TextConverter new.
- !

Item was changed:
  ----- Method: MultiByteFileStream>>fileOutClass:andObject: (in category 'fileIn/Out') -----
  fileOutClass: extraClass andObject: theObject 
+ 
+ 	self nextPutUTF8BOM.
- 	self binary.
- 	UTF8TextConverter writeBOMOn: self.
- 	self text.
  	^ super fileOutClass: extraClass andObject: theObject!

Item was added:
+ ----- Method: MultiByteFileStream>>isSourceFile (in category 'private') -----
+ isSourceFile
+ 
+ 	self flag: #InvalidUTF8. "Ignore source files. Those MUST BE valid utf-8. Broken chunks cannot be fixed by changing converters in the middle of file reading."
+ 	(SourceFiles at: 1) ifNotNil: [self fullName = (SourceFiles at: 1) fullName ifTrue: [^ true]].
+ 	(SourceFiles at: 2) ifNotNil: [self fullName = (SourceFiles at: 2) fullName ifTrue: [^ true]].
+ 
+ 	^ false!

Item was added:
+ ----- Method: MultiByteFileStream>>nextPutUTF8BOM (in category 'fileIn/Out') -----
+ nextPutUTF8BOM
+ 	"Backwards compatibility. BOM was used in older file-outs to switch from mac-roman to utf-8 on file-in. Nowadays we always expect utf-8 on file-in and fall back only on InvalidUTF8 exception."
+ 
+ 	self binary.
+ 	UTF8TextConverter writeBOMOn: self.
+ 	self text.!

Item was changed:
+ ----- Method: MultiByteFileStream>>setConverterForCode (in category 'fileIn/Out') -----
- ----- Method: MultiByteFileStream>>setConverterForCode (in category 'private') -----
  setConverterForCode
+ 	"Always use and expect UTF-8 encoding for source code so that we can share those files. Any existing BOM should be skipped automatically when decoding content. See #writeBOMOn: and #decodeByteString:."
+ 	
+ 	self converter: UTF8TextConverter new.!
- 
- 	| currentPosition |
- 	(SourceFiles at: 2)
- 		ifNotNil: [self fullName = (SourceFiles at: 2) fullName ifTrue: [^ self]].
- 	currentPosition := self position.
- 	self position: 0.
- 	self binary.
- 	((self next: 3) = #[ 16rEF 16rBB 16rBF ]) ifTrue: [
- 		self converter: UTF8TextConverter new
- 	] ifFalse: [
- 		self converter: MacRomanTextConverter new.
- 	].
- 	self position: currentPosition.
- 	self text.
- !

Item was added:
+ ----- Method: MultiByteFileStream>>setConverterForOldCode (in category 'fileIn/Out') -----
+ setConverterForOldCode
+ 
+ 	self isSourceFile ifTrue: [^ self].
+ 	self converter: MacRomanTextConverter new.!