<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 18.11.2016 at 14:39, Edgar De Cleene wrote:<br>

    </div>

    <blockquote

      cite="mid:0DEC33D8-FB02-426D-8DD3-5081280341F9@gmail.com"

      type="cite">

      <pre wrap="">Folks:

I wish to remove tags from HTML.

According to <a class="moz-txt-link-freetext" href="https://regex101.com/">https://regex101.com/</a>, <a class="moz-txt-link-freetext" href="http://www.freeformatter.com/regex-tester.html">http://www.freeformatter.com/regex-tester.html</a>, and also my old Nissus Pro,

<.+?>

Should be a valid expression.

But 

| regex |

regex := RxMatcher forString: '<.+?>'.

Gives me an error.

Any help ?

Edgar

@morplenauta

</pre>

    </blockquote>

    <p>I was going to write this:</p>

    <blockquote>

      <p>The "+" already means "match one or more of the previous",

        where "previous" in this case is ".", which means "any

        character".</p>

      <p>The "?" means "match zero or one of the previous", but it

        cannot be combined with "+".</p>

    </blockquote>

    <p>But then I realized that "+?" is defined in regex syntax as

      "lazy" matching, i.e. it matches as few of the previous token as

      needed to make the pattern match (in contrast, standard "+"

      matches greedily, so it consumes as much as possible while still

      matching the pattern).</p>
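
    <p>To make the difference concrete, here is a minimal sketch in Python's
      "re" module, which supports both forms. It is only an illustration
      of greedy versus lazy matching in an engine that has lazy quantifiers,
      not Squeak's Rx; the sample string is made up.</p>

```python
import re

# Made-up sample input with two pairs of tags.
html = "<b>bold</b> and <i>italic</i>"

# Greedy: ".+" consumes as much as possible, so a single match runs
# from the first "<" to the very last ">".
greedy = re.findall(r"<.+>", html)

# Lazy: ".+?" consumes as little as possible, so every tag is matched
# on its own.
lazy = re.findall(r"<.+?>", html)

print(greedy)
print(lazy)
```

    <p>The greedy pattern swallows the text between the tags as well, which
      is why the lazy form is the one usually suggested for this job.</p>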

    <p>However, the Rx framework in Squeak is quite old and does not

      support these lazy quantifiers. A pattern that should work is

      "<[^>]+>", which matches an opening angle bracket, one or more

      characters that are not closing angle brackets, and finally the

      closing bracket.</p>
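
    <p>As a rough sketch of how the negated-class pattern behaves, again in
      Python's "re" module rather than Squeak's Rx (the helper name
      strip_tags and the sample text are hypothetical):</p>

```python
import re

# "<", then one or more characters that are not ">", then ">".
TAG = re.compile(r"<[^>]+>")


def strip_tags(text: str) -> str:
    """Remove everything the simple tag pattern matches."""
    return TAG.sub("", text)


sample = "<p>Hello <b>world</b>!</p>"
stripped = strip_tags(sample)
print(stripped)
```

    <p>Because the character class stops at the first closing bracket, no
      lazy quantifier is needed to keep each match confined to one tag.</p>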

    <p>Be aware though that correctly stripping tags from HTML with regex

      is not possible (or at least not trivial). For example, in

      your pattern, the "." would not match newlines, but tags can

      extend over multiple lines, so you would not be able to strip out

      a multiline tag. My pattern apparently works with newlines, too,

      since "[^>]" excludes only the closing bracket,

      but there are other cases that it does not handle (for example,

      a ">" inside an attribute value; see

<a class="moz-txt-link-freetext" href="http://stackoverflow.com/questions/94528/is-u003e-greater-than-sign-allowed-inside-an-html-element-attribute-value">http://stackoverflow.com/questions/94528/is-u003e-greater-than-sign-allowed-inside-an-html-element-attribute-value</a>).</p>
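
    <p>Both limitations can be reproduced with a small sketch in Python's
      "re" module (an illustration only, not Squeak's Rx; the sample
      strings are invented). By default "." does not match a newline
      there either.</p>

```python
import re

LAZY = re.compile(r"<.+?>")       # the pattern from the question
NEGATED = re.compile(r"<[^>]+>")  # the negated-class pattern

# A tag that is split across two lines.
multiline = 'Hello <a\n  href="x.html">world</a>!'
# A tag with a ">" inside a quoted attribute value.
tricky = '<img alt="a > b" src="x.png">text'

# "." stops at the newline, so the opening <a ...> tag survives.
lazy_multiline = LAZY.sub("", multiline)
# "[^>]" also matches the newline, so both tags are removed.
negated_multiline = NEGATED.sub("", multiline)
# The match ends at the ">" inside the attribute value, leaving debris.
negated_tricky = NEGATED.sub("", tricky)

print(repr(lazy_multiline))
print(repr(negated_multiline))
print(repr(negated_tricky))
```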

    <p>So unless you know that your input is going to be fairly regular,

      don't rely on regex to strip tags. Use a proper HTML/SGML/XML

      parser instead; these are designed to handle such cases correctly.</p>
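
    <p>For comparison, a parser-based approach might look like the following
      sketch, which uses Python's standard "html.parser" module. The
      class name TextExtractor and the function strip_tags are
      hypothetical, and a Squeak solution would use whatever HTML or XML
      parser is available in your image instead.</p>

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text found between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called by the parser for each run of text outside of tags.
        self.chunks.append(data)


def strip_tags(markup: str) -> str:
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


multiline_text = strip_tags('Hello <a\n  href="x.html">world</a>!')
tricky_text = strip_tags('<img alt="a > b" src="x.png">text')
print(multiline_text)
print(tricky_text)
```

    <p>Note that this sketch keeps all text content, including whatever
      sits inside script or style elements, so real input may need further
      filtering.</p>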

    <p>Cheers,</p>

    <p>Hans-Martin<br>

    </p>

  </body>

</html>