Celeste progress is gridlocked right now. In no particular order:
Message-IDs are pretty reliable. The RFC says they "should" be present. Has anyone
MD5 is a risky approach, because inserting a single stray space character changes the digest completely. MD5 digests, by design, have no notion of closeness.
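To make the point concrete, here is a minimal sketch in Python (not Celeste code; the two sample messages are made up for illustration). It hashes two texts that differ by one extra space and counts how many hex digits of the two digests disagree:

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of text as 32 hex digits."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Two made-up messages that differ only by one extra space in the body.
original = "Subject: hello\n\nSee you at noon.\n"
altered = "Subject: hello\n\nSee you at  noon.\n"

digest_a = md5_hex(original)
digest_b = md5_hex(altered)

# Count the hex positions where the two digests disagree.
differing = sum(1 for a, b in zip(digest_a, digest_b) if a != b)

print(digest_a)
print(digest_b)
print(f"{differing} of {len(digest_a)} hex digits differ")
```

The two digests share no useful structure, so there is no way to tell from them that the messages were nearly identical.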
Celeste can certainly access Message-IDs; you just have to read the message in and then look for the header. Something like:
(mailDB getMessage: id) fieldNamed: 'message-id'
The above is currently expensive, but it could be much faster if the index file format were updated so that Message-IDs could be saved in it. With an improved index file format, I don't think you'd care that Celeste is using its own message IDs.
If the index file is updated, then it also becomes much easier to implement a "leave messages on server" option, since Celeste could check the saved Message-IDs and avoid downloading the same message multiple times.
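A rough sketch of that idea, in Python rather than Squeak and not based on Celeste's actual index format: the `Index` class, its fields, and `needs_download` are all hypothetical names. It also assumes the server-side Message-IDs can be obtained cheaply (for instance from a header-only fetch), which is something the real implementation would have to sort out.

```python
from email import message_from_string


def message_id_of(raw_message: str):
    """Slow path: parse the whole message to find its Message-ID header."""
    value = message_from_string(raw_message)["Message-ID"]
    return value.strip() if value is not None else None


class Index:
    """Hypothetical index that saves each Message-ID next to the local id."""

    def __init__(self):
        self.by_local_id = {}  # local (Celeste-style) id -> Message-ID or None
        self.known = set()     # every Message-ID already stored locally

    def add(self, local_id, raw_message):
        # Parse once when the message is filed, then never again.
        mid = message_id_of(raw_message)
        self.by_local_id[local_id] = mid
        if mid is not None:
            self.known.add(mid)

    def needs_download(self, server_message_id):
        # Fast path: a set lookup instead of re-reading stored messages.
        return server_message_id not in self.known


index = Index()
index.add(1, "Message-ID: <a@example.org>\nSubject: one\n\nbody\n")
index.add(2, "Message-ID: <b@example.org>\nSubject: two\n\nbody\n")

# Message-IDs reported for messages still sitting on the server (assumed input).
on_server = ["<a@example.org>", "<c@example.org>"]
to_fetch = [mid for mid in on_server if index.needs_download(mid)]
print(to_fetch)  # ['<c@example.org>']
```

Messages with no Message-ID header at all end up with `None` in the index and would need some other fallback, which this sketch does not attempt.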
There have been proposals for updating the index file format, but they are waiting on Filtering Celeste to settle down.
Filtering Celeste is fairly well settled, but it can't go into the image until LargeLists does. If LargeLists doesn't go in, then Filtering Celeste is impractical and needs some redesign.
There is dead silence regarding LargeLists, so I'm just waiting.
People could probably go ahead and work on the index file format now if they want to, since it's fairly independent of Filtering Celeste. But this trend cannot go on too long -- eventually we need to get all our patches together, or Celeste will disintegrate in a mass of good intentions.
It's not inconceivable that modules will arrive before the next Celeste patch gets into the main image. At any rate, whenever Squeak has a good module system, we'll probably at least *try* to offload Celeste (and LargeLists, if necessary) into a separate system for a while.
Phew.
-Lex
squeak-dev@lists.squeakfoundation.org