From lenglish5@cox.net Mon Aug 1 08:03:12 2011
From: Lawson English
To: beginners@lists.squeakfoundation.org
Subject: [Newbies] VB-Regex issue...
Date: Mon, 01 Aug 2011 01:02:58 -0700
Message-ID: <4E365DB2.3000906@cox.net>

Anyone familiar with the VB-Regex package?

This code should be returning an ordered collection of 25 hits (I thought).
Instead it returns one long string of all 25 hits:
http://pastebin.com/eGcX6vg2

source string: http://pastebin.com/AkyQrXGD

I've been playing with this one for several hours. I can't tell if the
strings are too complicated or if I'm just using the wrong syntax, though
the simple example works just fine.

'\w+' asRegex matchesIn: 'Now is the Time' => an OrderedCollection('Now' 'is' 'the' 'Time')

Thanks.

Lawson

From hebbarp@gmail.com Mon Aug 1 10:14:58 2011
From: Prashanth Hebbar
To: beginners@lists.squeakfoundation.org
Subject: Re: [Newbies] VB-Regex issue...
Date: Mon, 01 Aug 2011 15:44:58 +0530
In-Reply-To: <4E365DB2.3000906@cox.net>

On Mon, Aug 1, 2011 at 1:32 PM, Lawson English wrote:

> Anyone familiar with the VB-Regex package?
> This code should be returning an ordered collection of 25 hits (I thought).
> Instead it returns one long string of all 25 hits:
> http://pastebin.com/eGcX6vg2
> source string: http://pastebin.com/AkyQrXGD
> I've been playing with this one for several hours. I can't tell if the
> strings are too complicated or if I'm just using the wrong syntax, though
> the simple example works just fine.
>
> '\w+' asRegex matchesIn: 'Now is the Time' => an OrderedCollection('Now'
> 'is' 'the' 'Time')

Perhaps the problem was the size of the string, which contains multiple
quotes that are not all escaped. Sean DeNigris writes about an interesting
trick to preserve all the quotation marks inside long strings, especially
HTML strings. See this post from Sean for this trick:
http://seandenigris.com/blog/?p=647

This code returns the OrderedCollection you expected.

    source := htmltext678 contents.
    aString := '<a href="billionaires08_(.*)html">(.*).</a></td>'.
    matcher := RxMatcher forString: aString.
    matcher matchesIn: source.
    "Transcript show: (matcher matchesIn: source); cr."

htmltext678 is the TextMorph where I stored your HTML page; I extracted its
contents so that all inline quotes are preserved. I used a shorter match
string (aString).

One thing I noticed in the code you referred to is that an ordered
collection was being created that wasn't doing anything.

Regards,
--
Prashanth Hebbar

From lenglish5@cox.net Mon Aug 1 13:04:54 2011
From: Lawson English
To: beginners@lists.squeakfoundation.org
Subject: Re: [Newbies] VB-Regex issue...
Date: Mon, 01 Aug 2011 06:04:50 -0700
Message-ID: <4E36A472.6060307@cox.net>

On 8/1/11 3:14 AM, Prashanth Hebbar wrote:

> See this post from Sean for this trick
> http://seandenigris.com/blog/?p=647.

Thanks for the tip. I'll use it from now on, just in case.

However, it turns out that my ancient eyes were missing a few extra symbols
in the HTML, and once I set up my regex more carefully, I started getting
hits.

Lawson
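
Note for readers of this thread: the final working regex is not shown above, so the following is only a sketch of one common way a pattern can return a single long hit instead of many. It assumes the symptom came from a greedy `.*`, which the thread does not confirm. The sample string and the variable names `greedy` and `bounded` are made up for illustration; only `asRegex` and `matchesIn:` are taken from the messages above.

```smalltalk
| source greedy bounded |

"Hypothetical sample data, not the page discussed in the thread."
source := '<td>Alpha</td><td>Beta</td><td>Gamma</td>'.

"'.*' is greedy: it takes as much text as it can, so a single match
 can stretch from the first <td> to the last </td>."
greedy := '<td>.*</td>' asRegex matchesIn: source.

"'[^<]*' cannot cross a '<', so each cell has to be matched on its own."
bounded := '<td>[^<]*</td>' asRegex matchesIn: source.

"Print how many hits each pattern produced."
Transcript
    show: greedy size printString; cr;
    show: bounded size printString; cr.
```

Evaluated in a workspace with the Regex package loaded, the first collection should hold one long string and the second should hold one string per cell. Replacing `.*` with a negated character class is one way to keep a match from running past the closing tag.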