[squeak-dev] MultiByteFileStream upToAll: strange bug

David T. Lewis lewis at mail.msen.com
Sun Jan 21 16:05:59 UTC 2018


I added Bob's fix to trunk.

Dave


On Sun, Jan 21, 2018 at 07:01:37AM -0500, Bob Arning wrote:
> The culprit is
> 
> 'From Squeak6.0alpha of 9 September 2017 [latest update: #17382] on 21 January 2018 at 6:58:58 am'!
> 
> !MultiByteFileStream methodsFor: 'accessing' stamp: 'raa 1/21/2018 06:57'!
> upToPosition: anInteger
> 	"Answer a subcollection containing items starting from the current 
> position and ending including the given position. Usefully different to 
> #next: in that positions measure *bytes* from the file, where #next: 
> wants to measure *characters*."
> 	^self collectionSpecies new: 1000 streamContents: [ :stream |
> 		| ch |
> 		[ (ch := self next) == nil or: [ self position > anInteger ] ]
> 			whileFalse: [ stream nextPut: ch ] ]! !
> 
> which was referencing the instVar <position> directly. Changing that to 
> "self position" allows it to stop at the right place.
> 
> 
> On 1/20/18 4:24 PM, Bernhard Pieber wrote:
> >Hi everyone,
> >
> >I think I found a really strange bug in MultiByteFileStream. I am on macOS 
> >Sierra and used the latest VM from bintray and an updated trunk image.
> >
> >I am trying to parse anchors from a UTF-8 encoded HTML file (see attachment). 
> >The code uses a MultiByteFileStream with a UTF8TextConverter.
> >
> >Here is the code that shows the bug:
> >
> >FileStream readOnlyFileNamed: 'test.html' do: [:stream |
> >	| result |
> >	result := OrderedCollection new.
> >	[stream atEnd] whileFalse: [
> >		stream match: '<A HREF="'.
> >		result add: (stream upToAll: '</A>')].
> >	result at: 13
> >].
> >
> >It answers the following string:
> >'https://www.europa.de/produkte/lebensversicherung">Darlehen sichern: 
> >Variable Risiko-Lebensversicherung</A>
> >				<DT><A 
> >				HREF="http://orf.at/stories/2358210/2358209/">Banken im Zinsdilemma</A>
> >			</DL><p>
> >		</DL><p>
> >	</DL><p>
> ></HTML>
> >'
> >
> >You can see that it did not stop at the </A> as it should have but answers 
> >the rest of the file. The strange thing is that the next anchor looks like 
> >this:
> >'http://orf.at/stories/2358210/2358209/">Banken im Zinsdilemma</A>
> >			</DL><p>
> >		</DL><p>
> >	</DL><p>
> ></HTML>
> >'
> >So it read part of the file again.
> >
> >I tried making the file smaller but the bug goes away then.
> >
> >As a cross-check, when I read the whole file at once, it parses correctly.
> >
> >FileStream readOnlyFileNamed: 'test.html' do: [:fileStream |
> >	| stream result |
> >	stream := fileStream contentsOfEntireFile readStream.
> >	result := OrderedCollection new.
> >	[stream atEnd] whileFalse: [
> >		stream match: '<A HREF="'.
> >		result add: (stream upToAll: '</A>')].
> >	result at: 13
> >].
> >
> >Any ideas anyone?
> >
> >Bernhard