[squeak-dev] [Q] Regex seems do not works as should be

Levente Uzonyi leves at elte.hu
Mon Aug 8 15:52:47 UTC 2011


On Mon, 8 Aug 2011, Edgar J. De Cleene wrote:

> I am trying to do some HTML parsing using the examples at
> http://www.regular-expressions.info/examples.html, but they give me errors.
> Also, using the simple <.*> pattern and
> Transcript clear.
> self regex: '<.+>' matchesDo: [:ea | Transcript show: ea; cr. self halt]
> Here self is a string, and ea shows up in the Transcript as long lines, not
> the short ones BBEdit shows with Grep.
>
> Any clues? Which examples should I look at?

It seems that + is greedy in VBRegex. Since . also matches the > character,
it is expected that multiple HTML tags are returned as a single match. You
can use this pattern to get what you want: '<[^>]+>', but be aware that you
won't be able to validate an HTML document or build a tree from it with
regular expressions alone. If that's your goal, you are better off using an
HTML parser instead.
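
A minimal workspace sketch of the difference between the two patterns. It
assumes the Regex package is loaded and provides String>>allRegexMatches:
alongside the regex:matchesDo: used above. The sample string and the variable
names are made up for illustration.

```smalltalk
| html greedy perTag |
html := '<b>bold</b> and <i>italic</i>'.

"Greedy: .+ also consumes > characters, so the match runs from the
 first < to the last > and several tags come back as one match."
greedy := html allRegexMatches: '<.+>'.

"Negated class: [^>]+ cannot cross a >, so each tag is its own match."
perTag := html allRegexMatches: '<[^>]+>'.

Transcript clear.
Transcript show: 'greedy matches: ', greedy size printString; cr.
perTag do: [:each | Transcript show: each; cr].
```

With this input the greedy pattern should produce a single match spanning the
whole string, while the second pattern should list the four tags separately.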


Levente

>
> Edgar
>


