So, I've started dorking with the JWZ threading algorithm (http://www.jwz.org/doc/threading.html), and something somewhat unrelated smacked me on the head: msgIDs suck.
For the uninitiated, Celeste assigns a unique ID to each mail message it stores and uses these IDs for a good chunk of its functionality. For example, Categories are basically just named collections of these IDs. Alas, these IDs are unique only within a particular Celeste. From MailDB>>nextUnusedID:
"Each message needs to have a unique ID. Message ID's are a monotonically increasing integers roughly related to the time that they were requested. The last ID used is kept in lastIssuedMsgID, to guard against reuse (e.g. if the clock changes)."
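A minimal Python sketch of the allocation policy that comment describes. The class and method names are hypothetical, whole-second clock resolution is an assumption, and this is not Celeste's actual Smalltalk code. It also shows why the IDs can't travel: two Celestes running the same logic will happily issue the same numbers for unrelated messages.

```python
import time


class LocalIdAllocator:
    """Hypothetical Python analogue of the behaviour described for
    MailDB>>nextUnusedID; not Celeste's real implementation."""

    def __init__(self, clock=time.time):
        self.clock = clock            # injectable so the sketch is testable
        self.last_issued_msg_id = 0   # plays the role of lastIssuedMsgID

    def next_unused_id(self) -> int:
        # Start from the current time (whole seconds: an assumption) ...
        candidate = int(self.clock())
        # ... but never reuse or go backwards, e.g. if the clock is set back.
        if candidate <= self.last_issued_msg_id:
            candidate = self.last_issued_msg_id + 1
        self.last_issued_msg_id = candidate
        return candidate
```

If the clock reads 1000, then 1000 again, then 999 (set back), the three calls return 1000, 1001 and 1002: unique and increasing locally, but meaningless to any other Celeste.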
Thus, unless I'm confused or mistaken, I cannot share a categorization of a set of messages with someone else without also sharing the particular message file (or a subset of it).
That kinda sucks!
I'm not quite sure what to do about this. Ideally, the Message-ID header would be ubiquitous and correct, but I suspect it gets garbled. I suppose some sort of heuristic that preserved it as much as possible, yadda yadda yadda checksums, yadda yadda mappings, yadda yadda yadda.
Just wondering whether anyone else has run into this and, either way, what folks' thoughts are on solving it.
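One way to read the "mappings" idea, sketched in Python under assumptions: each Celeste can compute some portable key per message (how to compute it is exactly the open question, so here it is just a `key_of` callable), a category is exported as a set of those keys, and the receiver maps them back to its own local msgIDs. All function names are hypothetical.

```python
from typing import Callable, Dict, Iterable, Set


def export_category(local_ids: Iterable[int],
                    key_of: Callable[[int], str]) -> Set[str]:
    """Turn a category (a collection of local msgIDs) into portable keys."""
    return {key_of(i) for i in local_ids}


def import_category(keys: Iterable[str],
                    local_index: Dict[str, int]) -> Set[int]:
    """Map shared keys back to this Celeste's own msgIDs.

    local_index maps portable key -> local msgID for every stored message;
    keys for messages this Celeste doesn't have are silently skipped."""
    return {local_index[k] for k in keys if k in local_index}
```

The local msgIDs never leave the machine; only the keys do, so the whole scheme is only as good as the key.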
Cheers, Bijan Parsia.
Bijan Parsia bparsia@email.unc.edu is widely believed to have written:
So, I've started dorking with the JWZ threading algorithm (http://www.jwz.org/doc/threading.html), and something somewhat unrelated smacked me on the head: msgIDs suck.
I'm pretty sure somebody else mentioned this a few weeks ago; in particular how it got in the way of 'leave on server' capabilities.
I'm particularly interested in seeing that work properly, since I rely on it when I'm at the other house and using Communicator to fetch mail on my PB instead of MessengerPro on my Acorn. I like to use 'leave on server' so that when I return to main base I can get all the messages on my main machine, but I don't want to refetch all the messages every time I connect via the Mac. If Celeste could cope with this, I could avoid having to suffer Communicator.
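The 'leave on server' case boils down to remembering which messages you already have. A Python sketch, assuming each server-side message can be given a stable key (the very thing this thread is hunting for) and that a message is only downloaded when its fetch callable is invoked; `fetch_unseen` and the listing format are hypothetical.

```python
from typing import Callable, Iterable, List, Set, Tuple


def fetch_unseen(listing: Iterable[Tuple[str, Callable[[], bytes]]],
                 seen: Set[str]) -> List[bytes]:
    """Download only messages whose stable key isn't already in `seen`.

    Everything stays on the server; `seen` is updated so the next
    connection skips what this one fetched."""
    fetched = []
    for key, fetch in listing:
        if key in seen:
            continue  # already stored locally: don't refetch it
        fetched.append(fetch())
        seen.add(key)
    return fetched
```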
Wouldn't the message's own ID be usable? For example, Bihan's message has an ID of Pine.A41.4.21L1.0110301238390.15548-100000@login9.isis.unc.edu.
tim
On Tue, 30 Oct 2001, Tim Rowledge wrote:
[snip]
Wouldn't the message's own ID be usable? For example, Bihan's message
^^^^^ hmmm :)
has an ID of Pine.A41.4.21L1.0110301238390.15548-100000@login9.isis.unc.edu.
Yes, but Message-ID is an optional-field, so there's no promise that it'll be there.
I don't know how much of a problem this is these days, but it's worth noting.
And while they're *supposed* to be unique...well, who knows :)
But they're a good start, especially if we can figure something out for ID-less messages and settle on a good policy for resolving misleading cases, such as two different messages carrying the same ID.
Cheers, Bijan Parsia.
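A Python sketch of one possible policy (hypothetical, not anything Celeste does): trust Message-ID when present, fall back to an MD5 of the raw message for ID-less mail, and when two different messages claim the same Message-ID, disambiguate the later one by appending its digest. It uses only the standard `email` and `hashlib` modules; `KeyRegistry` is an invented name.

```python
import hashlib
from email import message_from_bytes


class KeyRegistry:
    """Hypothetical key policy: Message-ID if present and consistent,
    otherwise a content hash."""

    def __init__(self):
        self.by_key = {}  # key -> MD5 hex digest of the message first seen under it

    def key_for(self, raw: bytes) -> str:
        digest = hashlib.md5(raw).hexdigest()
        msg_id = str(message_from_bytes(raw).get("Message-ID", "")).strip()
        if not msg_id:
            key = "md5:" + digest            # ID-less message
        elif self.by_key.get(msg_id, digest) == digest:
            key = msg_id                     # first sighting, or an identical copy
        else:
            key = msg_id + "#" + digest      # same ID, different content
        self.by_key.setdefault(key, digest)
        return key
```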
What's wrong with using an MD5 hash of the entire message? While collisions are possible, they're very unlikely.
-- Duane
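In Python the whole-message hash is essentially a one-liner with `hashlib`. The sketch below adds one assumption that is not part of the proposal: normalising CRLF to LF first, because the same message fetched over POP (CRLF on the wire) and read back from an LF-terminated local file would otherwise hash differently. `message_digest` is a hypothetical name.

```python
import hashlib


def message_digest(raw: bytes) -> str:
    """MD5 over the whole raw message, headers and body.

    Line endings are normalised to LF first (an assumption): MD5 treats
    CRLF and LF as different bytes."""
    normalised = raw.replace(b"\r\n", b"\n")
    return hashlib.md5(normalised).hexdigest()
```

Any other byte-level difference between copies (a header added by a server or client, say) still changes the digest.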
----- Original Message -----
From: "Bijan Parsia" bparsia@email.unc.edu
To: squeak-dev@lists.squeakfoundation.org
Sent: Tuesday, October 30, 2001 11:18 AM
Subject: Re: [Celeste] msgIDs and sharing categories
"Duane Maxwell" dmaxwell@san.rr.com is widely believed to have written:
What's wrong with using an MD5 hash of the entire message? While collisions are possible, they're very unlikely.
Urk, how long would that take? What about one of those _really_ long messages where some dipstick quotes an entire 100kb digest, adds a 250kb code 'snippet', and dear ol' M$ software adds it all again in HTML plus one of those '.vcf' thingies? Would hashing the message's header be plausible? Is there any identifier that the POP/IMAP server one gets the message from can be persuaded to provide?
tim
Tim Rowledge wrote:
Duane Maxwell wrote:
What's wrong with using an MD5 hash of the entire message? While collisions are possible, they're very unlikely.
Urk, how long would that take? What about one of those _really_ long messages where some dipstick quotes an entire 100kb digest, adds a 250kb code 'snippet', and dear ol' M$ software adds it all again in HTML plus one of those '.vcf' thingies?
Well, just in case the Squeak MD5 code were too slow, somebody (who shall remain nameless) wrote a Slang version for an MD5 plugin, so it wouldn't really take all that long. The computation would likely be small compared to the time to download the message in the first place.
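On the cost question: MD5 can be fed incrementally, so the digest can be computed chunk by chunk while the message is still arriving rather than in a separate pass afterwards. A Python sketch using the standard `hashlib` (not the Squeak MD5 plugin mentioned above); the function name is hypothetical.

```python
import hashlib
from typing import Iterable, Tuple


def digest_while_downloading(chunks: Iterable[bytes]) -> Tuple[bytes, str]:
    """Feed each chunk to MD5 as it arrives; return (message, hex digest)."""
    h = hashlib.md5()
    parts = []
    for chunk in chunks:
        h.update(chunk)      # hashing work happens alongside the download
        parts.append(chunk)
    return b"".join(parts), h.hexdigest()
```

The incremental digest is identical to hashing the assembled message in one go, so nothing is lost by doing it on the fly.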
Would hashing the message's header be plausible?
Maybe.
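For the header-only variant, a Python sketch that hashes a fixed set of header fields and ignores the body. Which fields to include is an assumption (the thread doesn't settle it), as is the name `header_digest`. The trade-off: a huge body costs nothing extra, but two different messages with identical chosen headers will collide.

```python
import hashlib
from email.parser import BytesHeaderParser

# Hypothetical choice of fields; nothing in the thread fixes this list.
KEY_HEADERS = ("Message-ID", "Date", "From", "Subject")


def header_digest(raw: bytes) -> str:
    """MD5 over a fixed set of header fields, ignoring the body."""
    headers = BytesHeaderParser().parsebytes(raw)
    canonical = "\n".join(
        f"{name.lower()}:{str(headers.get(name, '')).strip()}"
        for name in KEY_HEADERS
    )
    return hashlib.md5(canonical.encode("utf-8", "surrogateescape")).hexdigest()
```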
Is there any identifier that the POP/IMAP server one gets the message from can be persuaded to provide?
AFAIK, there's nothing one can rely on.
-- Duane
squeak-dev@lists.squeakfoundation.org