[squeak-dev] [Q] Regex seems do not works as should be
leves at elte.hu
Mon Aug 8 15:52:47 UTC 2011
On Mon, 8 Aug 2011, Edgar J. De Cleene wrote:
> I am trying to do some HTML parsing using the examples at
> http://www.regular-expressions.info/examples.html, but they give me errors.
> Also, using the single <.*> and
> Transcript clear.
> self regex: '<.+>' matchesDo: [:ea | Transcript show: ea; cr. self halt]
> Here self is a string, and ea shows up in the Transcript as long lines, not
> the short ones BBEdit shows with Grep.
> Any clues? Are there better examples I should look at?
It seems + is greedy in VBRegex. Since . also matches the > character, it is
expected that several HTML tags are returned as a single match. You can
use this pattern to get what you want: '<[^>]+>', but be aware that
you won't be able to validate an HTML document or build a tree from it with
regular expressions alone. If that's your goal, you are better off using
an HTML parser instead.
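
A minimal sketch of the difference, meant to be run in a Workspace. The
sample string and the variable name html are made up for illustration;
regex:matchesDo: is the same message used in the question.

```smalltalk
| html |
html := '<p>Hello <b>world</b></p>'.
Transcript clear.

"Greedy: . also matches >, so the match runs on to the last > in the string
 and the whole sample comes back as one match."
html regex: '<.+>' matchesDo: [:each | Transcript show: each; cr].

"Negated character class: [^>] cannot match >, so each match ends at the
 first > and every tag is reported separately."
html regex: '<[^>]+>' matchesDo: [:each | Transcript show: each; cr].
```

With this sample, the first pattern should show the whole string on one
line, while the second should show <p>, <b>, </b> and </p> on separate lines.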