<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

<font face="Helvetica, Arial, sans-serif">Thanks a lot.<br>

<br>

/Leandro<br>

</font><br>

Levente Uzonyi wrote:

<blockquote

 cite="mid:Pine.LNX.4.64.1002131706190.7251@login01.caesar.elte.hu"

 type="cite">On Sat, 13 Feb 2010, Leandro wrote:

  <br>

  <br>

  <blockquote type="cite">Levente,

    <br>

    <br>

Thank you for remarking this. Could you please suggest a reference

where I can learn more about the O(n+m) algorithm? (Note however that

our "trick" could also be useful with the fast algorithm because it

avoids comparing individual words of lines that are common).

    <br>

  </blockquote>

  <br>

TextDiffBuilder uses a modified version of the algorithm described

here:

  <br>

<a class="moz-txt-link-freetext" href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.6927">http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.6927</a>

  <br>

You can find our implementation in TextDiffBuilder &gt;&gt;

#lcsFor:and:.

  <br>

O(n+m) is only true for typical source code/text diffs, where the

number of differences between the two sequences is small.

TextDiffBuilder also uses an extra filtering which reduces n, m and the

number of differences by removing elements which occur in only one of

the sequences (this is common if the elements are limes).

  <br>

  <br>

Note that there are plenty of algorithms with various time/space

complexity for the lcs problem, even hierachical ones which may be

better for line+word diffs.

  <br>

  <br>

We chose this one, because it's widely used (current unix diff

implementations use the recursive version of this algorithm which has

guaranteed O(n+m) space complexity), relatively easy to implement and

works well with source code and text.

  <br>

  <br>

  <br>

Levente

  <br>

  <br>

  <blockquote type="cite"><br>

/Leandro

    <br>

    <br>

Levente Uzonyi wrote:

    <br>

    <blockquote type="cite">On Fri, 12 Feb 2010, Leandro wrote:

      <br>

      <br>

      <blockquote type="cite"><br>

Levente Uzonyi wrote:

        <br>

        <br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; You can also get the word based diff using TextDiffBuilder by

subclassing it and overriding #split:. Actually ClassDiffBuilder does

exactly

        <br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; this, so using ClassDiffBuilder instead of TextDiffBuilder will

give you the expected result.

        <br>

        <br>

        <br>

I think that the difference with the implementation Valeria and I did

is that ours switches from lines to words automatically. It also

defines "words"

        <br>

as Smalltalk words, meaning that it understands keyword limits, which

gives better results. The differentiator starts comparing lines, and if

the

        <br>

proportion of line differences isn't too high, then it reutilizes the

work already done (with lines) and refines it to get differences in

between

        <br>

keywords. The refining strategy was crucial for achieving good

performance.

        <br>

      </blockquote>

      <br>

If you subclass ClassDiffBuilder (let's call the subclass

KeywordDiffBuilder)

      <br>

and implement KeywordDiffBuilder &gt;&gt; #split: just like

DifferenceFinder &gt;&gt; #keywordsAndBlanksFrom: you get even better

results.

      <br>

      <br>

DifferenceFinder has to use such tricks/hacks to have good performance,

because it uses the simplest lcs algorithm which requires n*m time and

space in all cases, while for practical cases TextDiffBuilder requires

O(n+m) time and space.

      <br>

      <br>

      <br>

Levente

      <br>

      <br>

      <blockquote type="cite"><br>

Not sure if the full functionality I'm describing here is in the last

Cuis version though.

        <br>

        <br>

/Leandro

        <br>

        <br>

        <br>

      </blockquote>

      <br>

    </blockquote>

    <br>

    <br>

  </blockquote>

  <br>

</blockquote>

</body>

</html>