[Challenge] large files smart compare (was: Re: Squeak for I/O and Memory Intensive tasks )

Yoel Jacobsen yoel at emet.co.il
Wed Jan 30 07:17:35 UTC 2002


That will not work, because I need to compare entry to entry, not pair to
pair. Sorting the lines would only damage the LDIF files, since each entry
spans several lines.

    Yoel
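
To make "entry to entry" concrete, here is a minimal Python sketch of one
possible approach. It is not code from this thread. It assumes that entries
are separated by blank lines, that the first line of each entry is its dn,
that the order of attribute lines inside an entry does not matter, and that
dn strings are spelled identically in both files. The names read_entries and
compare_entries are made up for illustration. Only one digest per entry of
the old file is kept in memory, not the full text.

```python
import hashlib


def read_entries(lines):
    """Yield (dn_line, sorted_attribute_lines) per blank-line-separated entry."""
    entry, in_comment = [], False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("#"):
            in_comment = True          # comment line, skip it
            continue
        if line.startswith(" "):
            # Continuation line: unfold it onto the previous line,
            # unless it continues a comment.
            if not in_comment and entry:
                entry[-1] += line[1:]
            continue
        in_comment = False
        if line:
            entry.append(line)
        elif entry:                    # blank line closes the entry
            yield entry[0], sorted(entry[1:])
            entry = []
    if entry:                          # last entry without trailing blank line
        yield entry[0], sorted(entry[1:])


def entry_digest(attribute_lines):
    """Fixed-size fingerprint of an entry's attribute lines."""
    return hashlib.sha1("\n".join(attribute_lines).encode("utf-8")).digest()


def compare_entries(old_lines, new_lines):
    """Return (added, changed, removed) lists of dn lines."""
    old = {dn: entry_digest(attrs) for dn, attrs in read_entries(old_lines)}
    added, changed = [], []
    for dn, attrs in read_entries(new_lines):
        previous = old.pop(dn, None)
        if previous is None:
            added.append(dn)
        elif previous != entry_digest(attrs):
            changed.append(dn)
    return added, changed, list(old)   # whatever is left was removed
```

Both arguments can be open file objects, so neither file has to be read
into memory as a whole.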

danielv at netvision.net.il wrote:

>>0. Any good ideas about how to make it practical for 450K entries (18M 
>>lines)? What should I use for persistence?
>>
>
>Assuming that entries have to be string-equal to count as equal, and are
>therefore "not differences" and boring:
>1. Use some generic sorting utility like Unix 'sort' to sort both
>inputs. Such utilities are pretty good at handling big files.
>2. Do something akin to one phase of a merge sort: read both files in a
>synchronized manner. Ignore any line that matches in both files. Keep any
>line without a match. If there are many of them, don't keep them in
>memory, but write them to a file.
>
>This should be fast, and more useful than the Python code.
>
>>    Thanks
>>            Yoel
>>
>
>Daniel
>
>
>
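
For reference, the merge-like phase from step 2 of the quoted suggestion
could look roughly like the Python sketch below. It is an illustration, not
code from this thread. It assumes both inputs are already sorted in plain
byte or code-point order, and the name unmatched is made up. It works line
by line, so, as the reply at the top points out, it does not respect LDIF
entry boundaries.

```python
def unmatched(sorted_old, sorted_new):
    """Yield (side, line) for every line found in only one sorted input."""
    old_lines = (line.rstrip("\r\n") for line in sorted_old)
    new_lines = (line.rstrip("\r\n") for line in sorted_new)
    a, b = next(old_lines, None), next(new_lines, None)
    while a is not None or b is not None:
        if b is None or (a is not None and a < b):
            yield "old", a             # a has no partner in the new input
            a = next(old_lines, None)
        elif a is None or b < a:
            yield "new", b             # b has no partner in the old input
            b = next(new_lines, None)
        else:
            # Equal lines match each other: skip both.
            a, b = next(old_lines, None), next(new_lines, None)
```

Since this is a generator, the unmatched lines can be written to a file as
they are produced and never need to be held in memory together.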


