[squeak-dev] Re: Extracting data from web pages using Squeak
Hwee-Boon Yar
hboon at motionobj.com
Tue Jun 17 04:44:45 UTC 2008
Or port BeautifulSoup :)
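
Short of a port, the textOfType: method John posted (quoted below) already gets part of the way to Lou's "table data into a collection". Here is a rough workspace sketch, not tested against any particular image. It assumes John's methods have been added to String, that the tokenizer answers lowercase tag names, and that HtmlText answers its raw text via source; the sample HTML string is made up for illustration.

```smalltalk
"Workspace sketch. 'html' is a made-up sample, not fetched from anywhere."
| html cells |
html := '<table><tr><td>apples</td><td>3</td></tr><tr><td>pears</td><td>5</td></tr></table>'.

"textOfType: answers triples: start tag, text (or nil), end tag (or nil)."
cells := (html textOfType: 'td') collect: [ :triple |
    (triple at: 2) isNil
        ifTrue: [ '' ]
        ifFalse: [ (triple at: 2) source ] ].   "assumes HtmlText understands source"

"cells is a flat collection of cell strings, one per <td>."
cells
```

This only flattens the cells. Grouping them into rows would mean walking the token stream once and starting a new row at each 'tr' tag, which the sketch does not attempt.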
--
Hwee-Boon
On Tue, Jun 17, 2008 at 3:51 AM, John Richards <ajtr at us.ibm.com> wrote:
>
> HtmlTokenizer helps here. Here's a bit of code I added to String class to
> give you an idea of how to use it.
>
> tagsOfType: aString
>     "Return all tags found in self of type aString."
>
>     | endTag |
>     endTag := '</' , aString , '>'.
>     ^ ((HtmlTokenizer on: self) upToEnd
>         select: [ :ea | ea name = aString ])
>         reject: [ :ea | ea source = endTag ]
>
>
>
> Here's another example that is slightly richer (and probably could be
> improved but what the heck).
>
> textOfType: aString
>     "Return a collection of triples for all tags found in self of type
>     aString: start tag, intermediate text if any, and end tag if any."
>
>     | stream element endTag triple answer |
>     endTag := '</' , aString , '>'.
>     answer := OrderedCollection new.
>     stream := ReadStream on: ((HtmlTokenizer on: self) upToEnd).
>     [ stream atEnd ] whileFalse: [
>         (element := stream next) name = aString ifTrue: [ "start tag found"
>             triple := Array new: 3.
>             triple at: 1 put: element.
>             stream peek class = HtmlText ifTrue: [
>                 triple at: 2 put: stream next.
>                 "peek answers nil at the end of the stream, so check atEnd first"
>                 (stream atEnd not and: [ stream peek source = endTag ]) ifTrue: [
>                     triple at: 3 put: stream next ] ].
>             answer add: triple ] ].
>     ^ answer
>
>
>
> On 06/16/08 11:57 AM, Louis LaBrunda <Lou at Keystone-Software.com> wrote:
>
> Hi Cédrick,
>
> Thanks for the hint.
>
>>I would use:
>>HTTPClient httpGet: 'http://url.com' to get the html stream.
>>Then you can parse it...
>
> Are there parsers available to get, say, table data into some kind of
> collection?
>
> Lou
> -----------------------------------------------------------
> Louis LaBrunda
> Keystone Software Corp.
> SkypeMe callto://PhotonDemon
> mailto:Lou at Keystone-Software.com http://www.Keystone-Software.com