GPL and Squeak and Stuff [CSOTD]

Fri Nov 9 06:03:13 UTC 2001

> <COTD>
> "Really bizarre way of removing tags in a String... Just print it."
> (('a <tagged>String</tagged> you say?' findTokens: '<>' keep: '<')
> 	inject: ''->true into: [:x :s |
> 		(s = '<')
> 			ifTrue:[x key->false]
> 			ifFalse:[x value
> 						ifTrue:[x key,s->true]
> 						ifFalse:[x key->true]]]) key
> </COTD>
> 

By the way, almost any parser based on findTokens: will have bugs.
Please don't do this if it's code that counts!  The problem is that findTokens:
can rarely get a proper grip on the problem: it can take a nice big
whack at it, but it leaves behind a lot of little pieces that are hard to clean up.
In this case, extra $> characters are silently accepted and ignored, so
malformed input like this passes as if it were well formed:

	'a <tagged>>>>>>>>>>String</tagged>>>>> you say?'
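For comparison, here is a minimal workspace sketch of a stricter tag stripper that walks the string with a stream instead of tokenizing it. It is only an illustration, not a real markup parser: the name stripTags is made up for this example, and it assumes a tag is everything from a $< up to the next $>.

```smalltalk
| stripTags |
stripTags := [:source | | in out |
	in := ReadStream on: source.
	out := WriteStream on: String new.
	[in atEnd] whileFalse: [ | c |
		c := in next.
		c = $<
			ifTrue: [
				"Skip the tag body; skipTo: answers false if no $> is found."
				(in skipTo: $>) ifFalse: [self error: 'unterminated tag']]
			ifFalse: [
				"Outside a tag, a $> is malformed input rather than noise."
				c = $>
					ifTrue: [self error: 'stray > outside a tag']
					ifFalse: [out nextPut: c]]].
	out contents].

stripTags value: 'a <tagged>String</tagged> you say?'	"'a String you say?'"
```

Handed the string with the extra $> characters, this version signals an error instead of quietly answering the same result. It still has holes of its own (a $< inside a tag body is skipped without complaint), which is rather the point about hand-rolled string parsers.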

The same goes for regular expressions on Unix.  Tools built on
grep can be very handy, but they often have holes.
And Murphy has assured us that the holes will show up at
the worst possible time, which is some time other than development
time.

Another take on this problem is that strings make extremely poor data
structures anyway.  It's much better to use objects, e.g.:

	ParagraphEntity contents: {
		StringEntity string: 'a '.
		TagEntity named: 'tagged' contents: {
			StringEntity string: 'String' }.
		StringEntity string: ' you say?' }

If there is no parsing, then there are no parse errors.  :)
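The entity classes above are not defined anywhere in this message, so as a rough stand-in here is a workspace sketch that uses only Strings, Arrays and Associations. The representation (a String is text, tagName -> contents is a tagged element) and the name textOf are assumptions made up for this example, and it assumes an image where a block can call itself recursively.

```smalltalk
"Hypothetical stand-in for the entity objects:
 a String is plain text, #tagName -> anArray is a tagged element."
| doc textOf |
doc := { 'a '. #tagged -> { 'String' }. ' you say?' }.
textOf := [:node |
	node isString
		ifTrue: [node]
		ifFalse: [
			(node isKindOf: Association)
				ifTrue: [textOf value: node value]	"drop the tag, keep its contents"
				ifFalse: [
					node
						inject: ''
						into: [:text :each | text , (textOf value: each)]]]].

textOf value: doc	"'a String you say?'"
```

Removing the tags is now just a walk over the structure, and there is no bracket left to miscount.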

-Lex