I've got a PhD student working on Information Retrieval. He has an IR engine which he's pretty pleased with, and I thought it would be neat to drop the Squeak mailing list archives into it and see what happens.
Question 1: what are the authoritative sites for the Squeak mailing list archives? With a Google search, I've found what look like three.
Question 2: can I just FTP the lot with perhaps a small script rather than ruining my wrists with clicky-clicky?
Question 3: roughly how large _are_ the archives? I've estimated 2-3 kB per message from a sample and 10-40 messages per day, so call it 1/2 to 3 1/2 MB per month. My very rough estimate for the whole archive is therefore 40 to 300 MB, and the student (who has worked in the industry) calls 300 MB "small".
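For anyone who wants to check the arithmetic, here is a quick sketch of the estimate. The per-message size and the messages-per-day range are the figures above. Everything else is my assumption: 30 days per month, 1 MB = 1000 kB, and in particular `MONTHS_OF_ARCHIVE`, which is a made-up placeholder and not a known figure for the list.

```python
# Back-of-envelope size estimate for a mailing list archive.
# From the post: 2-3 kB per message, 10-40 messages per day.
# ASSUMPTIONS (not from the post): 30 days per month, 1 MB = 1000 kB,
# and MONTHS_OF_ARCHIVE, a hypothetical archive length to be replaced
# with the real figure once it is known.

DAYS_PER_MONTH = 30
KB_PER_MB = 1000
MONTHS_OF_ARCHIVE = 84  # hypothetical placeholder


def monthly_mb(kb_per_message, messages_per_day):
    """Estimated archive growth in MB per month."""
    return kb_per_message * messages_per_day * DAYS_PER_MONTH / KB_PER_MB


# Low end: small messages on quiet days. High end: large messages on busy days.
low_month = monthly_mb(2, 10)
high_month = monthly_mb(3, 40)

low_total = low_month * MONTHS_OF_ARCHIVE
high_total = high_month * MONTHS_OF_ARCHIVE

print(f"per month: {low_month:.1f} to {high_month:.1f} MB")
print(f"total over {MONTHS_OF_ARCHIVE} months: {low_total:.0f} to {high_total:.0f} MB")
```

The per-month bounds come out close to the 1/2 to 3 1/2 MB quoted above. The total scales linearly with the archive length, so the placeholder is the number to correct first.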
Question 4: would there be any interest in having the mailing list archives accessible through an Information Retrieval engine?