I've got a PhD student working on Information Retrieval. He has an IR engine which he's pretty pleased with, and I thought it would be neat to drop the Squeak mailing list archives into it and see what happens.
Question 1: what are the authoritative sites for the Squeak mailing list archives? With a Google search, I've found what look like three.
Question 2: can I just FTP the lot with perhaps a small script rather than ruining my wrists with clicky-clicky?
Question 3: roughly how large _are_ the archives? I've estimated 2-3kB per message from a sample, 10-40 messages per day, call it 1/2 to 3 1/2 MB per month. So my very rough estimate is 40 to 300MB, and the student (who has worked in the industry) calls 300 MB "small".
Question 4: would there be any interest in having the mailing list archives accessible through an Information Retrieval engine?
Richard,
Perhaps this link
http://macos.tuwien.ac.at:9009/Server.home
could be of interest to you: joining forces? Follow the link at the bottom.
-- Christian
Christian Eitner Sub-Omnipotent/SoftwareEngineer
I would think the easiest and best resource would be the host of this mailing list. The archives seem to go back to July 2001 (when the hosting changed) and are available either as monthly zipped mailboxes or as the entire 68 MB mailbox. The archives are kept current.
http://lists.squeakfoundation.org/pipermail/squeak-dev/
For any of the older archives, maybe someone at UIUC would have a copy they could make available.
Jimmie Houchin
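On Question 2, a small script along the following lines might save the clicking. This is only a sketch: it assumes the monthly files follow the usual Mailman/pipermail naming scheme (`2001-July.txt.gz` and so on), which is not confirmed anywhere in this thread, and the function names and the `squeak-dev-archive` directory are made up for illustration. It uses HTTP rather than FTP, since the archive is published at an HTTP URL.

```python
import urllib.request
from pathlib import Path

# Archive location given in the message above.
BASE_URL = "http://lists.squeakfoundation.org/pipermail/squeak-dev/"

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def monthly_archive_names(start_year, start_month, end_year, end_month):
    """Yield file names such as '2001-July.txt.gz', one per month, inclusive.

    ASSUMPTION: the '<year>-<Month>.txt.gz' pattern is the common pipermail
    convention; check the archive index page before relying on it.
    """
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        yield f"{year}-{MONTHS[month - 1]}.txt.gz"
        month += 1
        if month > 12:
            year, month = year + 1, 1


def download_archives(names, dest_dir="squeak-dev-archive"):
    """Fetch each named file into dest_dir, skipping files already present."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for name in names:
        target = dest / name
        if target.exists():
            continue  # already downloaded on an earlier run
        urllib.request.urlretrieve(BASE_URL + name, str(target))


# Example (performs network access, so it is left commented out):
# download_archives(monthly_archive_names(2001, 7, 2002, 7))
```

The resulting files are gzipped mbox files, so they could be concatenated or fed one at a time to whatever loader the IR engine expects.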
If you haven't already seen it, the Mailing List Archives FAQ page on the Swiki covers some of these topics:
http://minnow.cc.gatech.edu/squeak/775
It looks like there are five archives of varying completeness... the Yahoo archives appear to go back the furthest, back to November 1998.
- Doug Way Detroit, MI
"Richard A. O'Keefe" wrote:
Question 4: would there be any interest in having the mailing list archives accessible through an Information Retrieval engine?
This would be cool. Lots of interesting bits of information are stored in this mailing list, and a fast way to search them would be valuable.
Karl
At 12:06 +1200 on 10.07.2002, Richard A. O'Keefe wrote:
Question 3: roughly how large _are_ the archives? I've estimated 2-3kB per message from a sample, 10-40 messages per day, call it 1/2 to 3 1/2 MB per month. So my very rough estimate is 40 to 300MB, and the student (who has worked in the industry) calls 300 MB "small".
Currently about 40 MB per year.
Georg
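As a rough cross-check, the per-month estimate quoted above can be recomputed and compared with the figure of about 40 MB per year. The sketch below assumes a 30-day month and 1 MB = 1000 kB, neither of which is stated in the thread; the function name is made up for illustration.

```python
# Figures from the original question: 2-3 kB per message, 10-40 messages per day.
DAYS_PER_MONTH = 30   # ASSUMPTION: not stated in the thread
KB_PER_MB = 1000      # ASSUMPTION: decimal megabytes


def monthly_mb(kb_per_message, messages_per_day):
    """Estimated archive growth in MB per month."""
    return kb_per_message * messages_per_day * DAYS_PER_MONTH / KB_PER_MB


low = monthly_mb(2, 10)    # smallest messages, quietest days
high = monthly_mb(3, 40)   # largest messages, busiest days

# The reply above reports about 40 MB per year.
reported_monthly = 40 / 12

print(f"estimated: {low:.1f} to {high:.1f} MB per month")
print(f"reported:  {reported_monthly:.1f} MB per month")
print("reported figure within estimate:", low <= reported_monthly <= high)
```

Under these assumptions the reported rate sits near the upper end of the estimated range.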
Richard A. O'Keefe wrote:
Question 3: roughly how large _are_ the archives? I've estimated 2-3kB per message from a sample, 10-40 messages per day, call it 1/2 to 3 1/2 MB per month. So my very rough estimate is 40 to 300MB, and the student (who has worked in the industry) calls 300 MB "small".
He's just being polite. He actually suppressed the urge to call it 'tiny' ;-)
Question 4: would there be any interest in having the mailing list archives accessible through an Information Retrieval engine?
Not only the list archives, but all relevant sites as well. In fact, you're beating me to it: I have been toying with the idea of setting up a proof-of-concept 'topic-specific' search engine with a focus on Squeak, where the on-line archives, Squeak-specific websites, etc. would all be collected.
What's the difference between an IR engine and an Internet search engine?
squeak-dev@lists.squeakfoundation.org