<br><font size=2 face="sans-serif">HtmlTokenizer helps here. Here's
a bit of code I added to the String class to give you an idea of how to use
it.</font>
<br>
<br><font size=2 face="sans-serif">tagsOfType: aString</font>
<br><font size=2 face="sans-serif"> "Answer
all tags of type aString found in self, excluding end tags"</font>
<br><font size=2 face="sans-serif"> </font>
<br><font size=2 face="sans-serif"> |
endTag |</font>
<br><font size=2 face="sans-serif"> endTag
:= '&lt;/' , aString , '&gt;'.</font>
<br><font size=2 face="sans-serif"> ^
((HtmlTokenizer on: self) upToEnd </font>
<br><font size=2 face="sans-serif">
select: [ :ea | ea name = aString])</font>
<br><font size=2 face="sans-serif">
reject: [ :ea | ea source = endTag]</font>
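<br>
<br><font size=2 face="sans-serif">A rough usage sketch, assuming tagsOfType: above is installed on String. The sample string and the variable name are made up for illustration, and what gets printed depends on how HtmlTokenizer reports each tag's source.</font>
<br>
<br><font size=2 face="sans-serif">| paragraphs |</font>
<br><font size=2 face="sans-serif">"hypothetical sample input; any HTML string would do"</font>
<br><font size=2 face="sans-serif">paragraphs := '&lt;p&gt;one&lt;/p&gt;&lt;p&gt;two&lt;/p&gt;' tagsOfType: 'p'.</font>
<br><font size=2 face="sans-serif">"show the source of every tag that was kept"</font>
<br><font size=2 face="sans-serif">paragraphs do: [ :ea | Transcript show: ea source; cr]</font>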
<br>
<br>
<br>
<br><font size=2 face="sans-serif">Here's another example that is slightly
richer (and probably could be improved but what the heck).</font>
<br>
<br><font size=2 face="sans-serif">textOfType: aString</font>
<br><font size=2 face="sans-serif"> "Answer
a collection of triples, one for each tag of type aString found in self:
the start tag, the intermediate text if any, and the end tag if any"</font>
<br><font size=2 face="sans-serif"> </font>
<br><font size=2 face="sans-serif"> |
stream element endTag triple answer |</font>
<br><font size=2 face="sans-serif"> endTag
:= '&lt;/' , aString , '&gt;'.</font>
<br><font size=2 face="sans-serif"> answer
:= OrderedCollection new.</font>
<br><font size=2 face="sans-serif"> stream
:= ReadStream on: ((HtmlTokenizer on: self) upToEnd).</font>
<br><font size=2 face="sans-serif"> [stream
atEnd] whileFalse: [</font>
<br><font size=2 face="sans-serif">
(element := stream next) name = aString ifTrue:
[ "start tag found"</font>
<br><font size=2 face="sans-serif">
triple
:= Array new: 3.</font>
<br><font size=2 face="sans-serif">
triple
at: 1 put: element.</font>
<br><font size=2 face="sans-serif">
stream
peek class = HtmlText ifTrue: [</font>
<br><font size=2 face="sans-serif">
triple at: 2 put: stream next.</font>
<br><font size=2 face="sans-serif">
(stream atEnd not and: [stream peek source = endTag]) ifTrue: [</font>
<br><font size=2 face="sans-serif">
triple
at: 3 put: stream next</font>
<br><font size=2 face="sans-serif">
]</font>
<br><font size=2 face="sans-serif">
].</font>
<br><font size=2 face="sans-serif">
answer
add: triple</font>
<br><font size=2 face="sans-serif">
]</font>
<br><font size=2 face="sans-serif">
].</font>
<br><font size=2 face="sans-serif"> ^
answer</font>
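<br>
<br><font size=2 face="sans-serif">One possible way to use it, again only a sketch: assuming textOfType: above is installed on String, this pulls the text element out of each triple for the cells of a small made-up table. The sample string and variable names are hypothetical. An entry is nil when no text directly follows the start tag.</font>
<br>
<br><font size=2 face="sans-serif">| cells texts |</font>
<br><font size=2 face="sans-serif">"hypothetical sample input"</font>
<br><font size=2 face="sans-serif">cells := '&lt;table&gt;&lt;tr&gt;&lt;td&gt;one&lt;/td&gt;&lt;td&gt;two&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;' textOfType: 'td'.</font>
<br><font size=2 face="sans-serif">"the second slot of each triple is the intermediate text, or nil"</font>
<br><font size=2 face="sans-serif">texts := cells collect: [ :triple | triple at: 2].</font>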
<br>
<br>
<br>
<br>
<table width=100%>
<tr valign=top>
<td width=40%><font size=1 face="sans-serif"><b>Louis LaBrunda &lt;Lou@Keystone-Software.com&gt;</b>
</font>
<br><font size=1 face="sans-serif">Sent by: squeak-dev-bounces@lists.squeakfoundation.org</font>
<p><font size=1 face="sans-serif">06/16/08 11:57 AM</font>
<table border>
<tr valign=top>
<td bgcolor=white>
<div align=center><font size=1 face="sans-serif">Please respond to<br>
Lou@Keystone-Software.com; Please respond to<br>
The general-purpose Squeak developers list &lt;squeak-dev@lists.squeakfoundation.org&gt;</font></div></table>
<br>
<td width=59%>
<table width=100%>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">To</font></div>
<td><font size=1 face="sans-serif">squeak-dev@lists.squeakfoundation.org</font>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">cc</font></div>
<td>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">Subject</font></div>
<td><font size=1 face="sans-serif">[squeak-dev] Re: Extracting data from
web pages using Squeak</font></table>
<br>
<br></table>
<br>
<br>
<br><tt><font size=2>Hi Cédrick,<br>
<br>
Thanks for the hint.<br>
<br>
>I would use:<br>
>HTTPClient httpGet: 'http://url.com' to get the html stream.<br>
>Then you can parse it...<br>
<br>
Are there parsers available to get say table data into some kind of collection?<br>
<br>
Lou<br>
-----------------------------------------------------------<br>
Louis LaBrunda<br>
Keystone Software Corp.<br>
SkypeMe callto://PhotonDemon<br>
mailto:Lou@Keystone-Software.com http://www.Keystone-Software.com<br>
<br>
<br>
</font></tt>
<br>