[squeak-dev] MultiByteFileStream upToAll: strange bug
Bob Arning
arning315 at comcast.net
Sun Jan 21 12:01:37 UTC 2018
The culprit is this method (shown here with the fix applied)
'From Squeak6.0alpha of 9 September 2017 [latest update: #17382] on 21 January 2018 at 6:58:58 am'!
!MultiByteFileStream methodsFor: 'accessing' stamp: 'raa 1/21/2018 06:57'!
upToPosition: anInteger
    "Answer a subcollection containing items starting from the current
    position and ending including the given position. Usefully different to
    #next: in that positions measure *bytes* from the file, where #next:
    wants to measure *characters*."
    ^self collectionSpecies new: 1000 streamContents: [ :stream |
        | ch |
        [ (ch := self next) == nil or: [ self position > anInteger ] ]
            whileFalse: [ stream nextPut: ch ] ]! !
The previous version of the loop condition referenced the instVar position
directly. Changing that to "self position" allows it to stop at the right place.
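
As a rough illustration of the byte-versus-character distinction that the method comment describes, the workspace sketch below compares the character count of a string with the size of its UTF-8 encoding. It is not part of the fix. The sample text is made up, and the sketch assumes the image provides String>>squeakToUtf8.

```smalltalk
"Workspace sketch (illustrative only): characters versus UTF-8 bytes."
| text utf8 |
"Hypothetical sample text containing one non-ASCII character (u-umlaut, code point 252)."
text := 'Zinsdilemma f', (String with: (Character value: 252)), 'r alle'.
"Assumption: squeakToUtf8 answers the UTF-8 encoded form of the receiver."
utf8 := text squeakToUtf8.
Transcript
    show: 'characters: ', text size printString; cr;
    show: 'UTF-8 bytes: ', utf8 size printString; cr.
```

The second number should come out larger than the first, which is why a byte position taken from the file cannot be treated as a character count when the stream decodes UTF-8.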
On 1/20/18 4:24 PM, Bernhard Pieber wrote:
> Hi everyone,
>
> I think I found a really strange bug in MultiByteFileStream. I am on macOS Sierra, using the latest VM from bintray and an updated trunk image.
>
> I am trying to parse anchors from a UTF-8 encoded HTML file (see attachment). The code uses a MultiByteFileStream with a UTF8TextConverter.
>
> Here is the code that shows the bug:
>
> FileStream readOnlyFileNamed: 'test.html' do: [:stream |
>     | result |
>     result := OrderedCollection new.
>     [stream atEnd] whileFalse: [
>         stream match: '<A HREF="'.
>         result add: (stream upToAll: '</A>')].
>     result at: 13
> ].
>
> It answers the following string:
> 'https://www.europa.de/produkte/lebensversicherung">Darlehen sichern: Variable Risiko-Lebensversicherung</A>
> <DT><A HREF="http://orf.at/stories/2358210/2358209/">Banken im Zinsdilemma</A>
> </DL><p>
> </DL><p>
> </DL><p>
> </HTML>
> '
>
> You can see that it did not stop at the </A> as it should have, but instead answered the rest of the file. The strange thing is that the next anchor looks like this:
> 'http://orf.at/stories/2358210/2358209/">Banken im Zinsdilemma</A>
> </DL><p>
> </DL><p>
> </DL><p>
> </HTML>
> '
> So it read part of the file again.
>
> I tried making the file smaller, but then the bug goes away.
>
> As a cross-check, when I read the whole file at once, it parses correctly.
>
> FileStream readOnlyFileNamed: 'test.html' do: [:fileStream |
>     | stream result |
>     stream := fileStream contentsOfEntireFile readStream.
>     result := OrderedCollection new.
>     [stream atEnd] whileFalse: [
>         stream match: '<A HREF="'.
>         result add: (stream upToAll: '</A>')].
>     result at: 13
> ].
>
> Any ideas anyone?
>
> Bernhard