[squeak-dev] HtmlParser MNU: ByteString>>replaceHtmlCharRefs

Bob Arning arning315 at comcast.net
Sun Oct 22 17:46:36 UTC 2017


If you just want to replace it yourself, try this:


'From Squeak3.4alpha of ''11 November 2002'' [latest update: #5109] on 
16 November 2002 at 8:06:43 pm'
"Change Set:        ISO8859
Date:            15 November 2002
Author:            Boris Gaertner

Jean-Marie Zajac pointed out that accented characters in ISO-8859-1 
encoding are not displayed as expected. Scamper is not encoding-aware, 
but it translates ISO-8859-1 to the encoding that is used in Squeak. 
Unfortunately, due to a subtle bug the translation is done twice: first, 
the entire source is translated, later parsed entities are translated 
again. This change set drops the translation of parsed entities. To make 
it work, it adds the translation of character entity references 
(characters that are written in the form &#<integer>; or in the form 
&<character name>; see sections 5.3.1 and 5.3.2 of the HTML 4.0 
specification.)

Jean-Marie tested a first version and found a new bug, later he tested a 
second version that is seemingly ok. With his test he helped me to 
understand where the real problem was buried. Thanks a lot!

"
HtmlText methodsFor: 'private-initialization' stamp: 'BG 11/15/2002 21:40'
initialize: source0
     super initialize: source0.
     self text: source0 replaceHtmlCharRefs.
String methodsFor: 'internet' stamp: 'BG 11/15/2002 21:18'
replaceHtmlCharRefs

    | pos ampIndex scIndex special specialValue outString outPos newOutPos |

    outString := String new: self size.
    outPos := 0.

    pos := 1.

    [ pos <= self size ] whileTrue: [
        "read up to the next ampersand"
        ampIndex := self indexOf: $& startingAt: pos ifAbsent: [0].

        ampIndex = 0 ifTrue: [
            pos = 1 ifTrue: [ ^self ] ifFalse: [ ampIndex := self size+1 ] ].

        newOutPos := outPos + ampIndex - pos.
        outString
            replaceFrom: outPos + 1
            to: newOutPos
            with: self
            startingAt: pos.
        outPos := newOutPos.
        pos := ampIndex.

        ampIndex <= self size ifTrue: [
            "find the $;"
            scIndex := self indexOf: $; startingAt: ampIndex ifAbsent: [ self size + 1 ].

            special := self copyFrom: ampIndex+1 to: scIndex-1.
            specialValue := HtmlEntity valueOfHtmlEntity: special.

            specialValue
                ifNil: [
                    "not a recognized entity. write it back"
                    scIndex > self size ifTrue: [ scIndex := self size ].

                    newOutPos := outPos + scIndex - ampIndex + 1.
                    outString
                        replaceFrom: outPos+1
                        to: newOutPos
                        with: self
                        startingAt: ampIndex.
                    outPos := newOutPos.]
                ifNotNil: [
                    outPos := outPos + 1.
                    outString at: outPos put: specialValue isoToSqueak.].

            pos := scIndex + 1. ]. ].


    ^outString copyFrom: 1 to: outPos
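
If you want to see what the scanning step in that method hands to HtmlEntity, here is a small workspace sketch. It is only an illustration: the sample string is made up, and it uses nothing beyond the String protocol the method itself relies on, so it should run without the change set filed in.

```smalltalk
"Workspace sketch: locate one character reference the same way
 replaceHtmlCharRefs does. The sample string is hypothetical."
| source ampIndex scIndex special |
source := 'caf&#233;'.

"position of the ampersand that opens the reference"
ampIndex := source indexOf: $& startingAt: 1 ifAbsent: [0].

"position of the closing semicolon, or one past the end if it is missing"
scIndex := source indexOf: $; startingAt: ampIndex ifAbsent: [source size + 1].

"the text between $& and $; is what gets looked up as an entity"
special := source copyFrom: ampIndex + 1 to: scIndex - 1.
```

Here special ends up holding the text between the ampersand and the semicolon. What HtmlEntity valueOfHtmlEntity: returns for it depends on the HtmlEntity class in your image, so I have not shown a result for that part.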


On 10/22/17 1:05 PM, Bernhard Pieber wrote:
> Dear Squeakers,
>
> I tried to parse an HTML file like this in a trunk image and ran into a MNU:
> FileStream fileNamed: 'some.html' do: [:stream | HtmlParser parse: stream]
>
> In HtmlText>>#initialize the message #replaceHtmlCharRefs is sent. I suppose this method was once in the image. Otherwise HtmlParser would never have worked. How can I find out when it got lost? How would you do it?
>
> Cheers,
> Bernhard
>
