Take a look at www.ogg.org for open-source codecs. They already have an audio spec which can be used by most current rippers and players, and they are working on a video spec (codenamed Tarkin).
Russell
-----Original Message-----
From: squeak-dev-admin@lists.squeakfoundation.org [mailto:squeak-dev-admin@lists.squeakfoundation.org] On Behalf Of Jan Bottorff
Sent: Saturday, 1 December 2001 11:23 PM
To: squeak-dev@lists.squeakfoundation.org
Subject: Re: Movie-JPEG and other video info
I saw something on the jpeg.org site that suggested that there was an ongoing attempt to create a standard for M-JPEG but, with several strong competing formats floating around, it might be many years before everyone complies with the new standard...
I view M-JPEG as a format at the end of its life cycle. I doubt there will be any new standards for it. For a while, it was the technology balance point because reasonably priced chips to do compression/decompression were available. Now you can get reasonably priced chips for the MPEG-I/II and DV formats, which are technically superior.
I'm not even sure that the individual frames of M-JPEG are compatible with our still-image JPEG plugin, so I'm satisfied that creating a new file format for Squeak JPEG movies is appropriate.
The M-JPEG compressed frames in an AVI or QuickTime file are NOT directly compatible with JPEG files. Some of the header tags are different, and M-JPEG frames all use a common Huffman table. The color space is a bit different too (range-limited YCrCb instead of YUV).
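To make the incompatibility concrete, here is a small Python sketch (my own illustration, not Squeak code) that scans a JPEG byte stream for marker segments and reports whether it carries its own Huffman tables (the DHT marker, 0xFFC4). M-JPEG frames typically omit DHT and rely on the shared default tables, which is one reason they are not directly usable as standalone JPEG files.

```python
def jpeg_markers(data: bytes):
    """Return the marker bytes found in a JPEG stream (the byte after each 0xFF)."""
    i = 0
    markers = []
    while i < len(data) - 1:
        if data[i] == 0xFF and data[i + 1] not in (0x00, 0xFF):
            marker = data[i + 1]
            markers.append(marker)
            if marker in (0xD8, 0xD9) or 0xD0 <= marker <= 0xD7:
                i += 2  # SOI, EOI, and restart markers have no length field
            else:
                if i + 3 >= len(data):
                    break
                length = (data[i + 2] << 8) | data[i + 3]
                i += 2 + length  # skip over the segment payload
        else:
            i += 1
    return markers

def has_huffman_tables(data: bytes) -> bool:
    # 0xFFC4 = DHT (Define Huffman Table); absent in typical M-JPEG frames
    return 0xC4 in jpeg_markers(data)
```

A standalone JPEG file would report True; a bare M-JPEG frame without embedded tables would report False.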
I'd tend to agree that a unique Squeak movie format is the way to go (see below), with import/export. Some of the theory behind current formats would be very appropriate to reuse.
Jan, can you suggest a simple but widely supported import/export format? It has to be something we can encode from Squeak, of course. Maybe M-JPEG AVI or QT?
Although there *are* open-source C libraries for doing MPEG encoding, it's my understanding that various patents apply to the MPEG encoding process, especially to MP3 sound encoding, so we would need to get permission from the patent holders to distribute such code with Squeak.
I see the QuickTime file format is documented at http://developer.apple.com/techpubs/quicktime/qtdevdocs/QTFF/qtff.html, although I don't know if there are any legal issues. Uncompressed frames would be the simplest codec format for import/export, with PCM sound. The "Photo JPEG" codec might be very close or even identical to the common JPEG file format. Even though there's lots of complexity defined in the QuickTime file format, I believe the required stuff is pretty simple. Much of the info in a QuickTime file can be ignored or not written.
I'd avoid the MPEG patent dangers; they are quite serious about defending their territory.
I haven't looked at recent Squeak code, so ignore all this if it's already done... For a close match to the existing Squeak architecture, you could store movies in a hierarchical binary object stream format: basically, the output from multiple ReferenceStreams, with the chunks listed in a directory of file offsets. Each video frame/sound chunk could be a different object tree. ReferenceStream doesn't seem to have any way to prioritize the ordering of objects written, so you would need to keep each subtree small. Conceptually you might just want to have ReferenceStream write a whole movie, with meta info objects, video frame objects, and sound chunk objects. Realistically, you need to control the ordering of little subtrees, and also allow reading of individual subtrees for playback. See below for more comments on streaming.
To manipulate the movie, you load the objects that represent the movie metadata (which include an OrderedCollection of frame chunk offsets), and then load image/sound/whatever chunks dynamically. ReferenceStream looks like it already knows how to read/write multiple object trees; all that's needed is some way to randomly access chunks in the file for editing, like a chunk directory at the end with the starting offset of each chunk.
The goal would be to allow playing the movie as a stream, from the beginning, without random seeks, yet still access each frame without scanning the whole file when editing, so the ORDERING of object subtrees is really important, as is having appropriate random access chunk directories. Having some playback meta info at the beginning of the file, then frame/sound chunks, with a frame directory object tree at the end, might do the trick. MPEG files (even I-frame only) are not so good for editing because they lack this random access directory. Being able to write an output movie file as a stream would be desirable too, so you could stream a Squeak movie file from a live camera to a remote Squeak for live playback over a network (i.e. videoconferencing).
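The layout described above can be sketched in a few lines of Python (the format and names here are my own invention, not an existing Squeak format): meta info first, then chunks in stream order, then a directory of file offsets, with a fixed-size trailer giving the directory's position. Sequential playback just reads front to back and ignores the directory; editing seeks via the directory.

```python
import io
import struct

def write_movie(stream, meta: bytes, chunks):
    """Write meta info, then length-prefixed chunks, then an offset directory."""
    stream.write(struct.pack(">I", len(meta)))
    stream.write(meta)
    offsets = []
    for chunk in chunks:
        offsets.append(stream.tell())
        stream.write(struct.pack(">I", len(chunk)))
        stream.write(chunk)
    dir_offset = stream.tell()
    stream.write(struct.pack(">I", len(offsets)))
    for off in offsets:
        stream.write(struct.pack(">Q", off))
    stream.write(struct.pack(">Q", dir_offset))  # fixed-size trailer at EOF

def read_chunk(stream, index: int) -> bytes:
    """Random access: find the directory via the trailer, then seek to a chunk."""
    stream.seek(-8, io.SEEK_END)
    (dir_offset,) = struct.unpack(">Q", stream.read(8))
    stream.seek(dir_offset)
    (count,) = struct.unpack(">I", stream.read(4))
    offsets = struct.unpack(f">{count}Q", stream.read(8 * count))
    stream.seek(offsets[index])
    (size,) = struct.unpack(">I", stream.read(4))
    return stream.read(size)
```

Because the directory comes last, a live recorder can emit chunks as they arrive and only append the directory when recording stops, which is what makes streaming writes possible.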
Getting the file format to work nicely for random access editing, streaming playback (no seeks), and streaming writing (no seeks) is the magic of a good video file format. There are also multiple streams (at least sound and video) that you want interleaved into the master stream, so you don't have to buffer huge amounts of one stream to get the next chunk of another. The interleaving of streams is closely related to timing. You can't just say that for every video frame you'll emit 1000 bytes of sound samples, because the sound may be compressed to a variable size, so 1000 bytes can represent variable times. Really, you need to interleave the streams with the timebase for each stream in sync, so the file has the video frame for 0:42:12.10 next to the sound that gets played at 0:42:12.10. Playback software can buffer things a bit to account for different latencies between video playback and sound playback, but if the two streams aren't timebase synced, the buffer requirements get bigger and bigger as the movie plays, and you may have to do a disk seek for each sound chunk and another for each video chunk, which is very bad for performance and smooth playback. Streaming the video file is also impossible if the video and sound interleave isn't timebase synced.
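The timebase-synced interleave above amounts to merging the streams by presentation timestamp. A minimal Python sketch (the chunk representation is my assumption): each chunk is (timestamp_ms, stream_name, payload), and merging two already-sorted streams keeps the sound for time t next to the video for time t, regardless of how many bytes each chunk happens to be.

```python
import heapq

def interleave(video, sound):
    """Merge two timestamp-sorted chunk streams into one timebase-synced stream."""
    # heapq.merge assumes each input is already sorted and keeps the
    # result sorted; ties go to the earlier iterable (video here).
    return list(heapq.merge(video, sound, key=lambda chunk: chunk[0]))

# (timestamp_ms, stream, payload) -- payloads can be any size,
# which is exactly why byte-count interleaving doesn't work
video = [(0, "video", b"f0"), (40, "video", b"f1"), (80, "video", b"f2")]
sound = [(0, "sound", b"s0"), (40, "sound", b"s1")]
```

In a real writer this merge would feed the chunk stream directly to the file, so the on-disk order is the playback order.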
Collecting large amounts of directory data to emit before groups of frames is undesirable too, as you would have to buffer variable-sized frames from future file positions, which introduces latency in live video processing. This suggests directory info should come after the frames (or at the end, to prevent interrupting the smooth flow of video/audio data). If you've been watching the CNN interviews with reporters on satellite videoconference phones, you'll notice a couple of seconds between when the local newscaster asks a question and you start to hear the response; this is because of video compression latency. Having the Squeak video format work well with videoconferencing would be really nice.
For editing, movie frame and sound chunks need to be loaded on demand from any random point, probably cached for a while, and dumped when memory is getting full, maybe by the GC (is there an LRU cache collection object that pays attention to memory consumption?). Generating megabytes of garbage objects per second might be hard on the GC, although the number of objects and references to follow might not be very big (sound samples and video frames could be non-pointer objects).
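As far as I know no such collection exists in Squeak; here is a Python sketch of what the asked-for "LRU cache that pays attention to memory consumption" could look like, evicting the least-recently-used chunks once a byte budget is exceeded (class and method names are hypothetical).

```python
from collections import OrderedDict

class ByteBudgetLRU:
    """LRU cache of byte chunks, bounded by total bytes rather than entry count."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.used = 0
        self._items = OrderedDict()  # key -> bytes, least recently used first

    def get(self, key):
        chunk = self._items.pop(key, None)
        if chunk is not None:
            self._items[key] = chunk  # re-insert: now most recently used
        return chunk

    def put(self, key, chunk: bytes):
        old = self._items.pop(key, None)
        if old is not None:
            self.used -= len(old)
        self._items[key] = chunk
        self.used += len(chunk)
        # evict least-recently-used chunks until back under budget
        while self.used > self.max_bytes and len(self._items) > 1:
            _, evicted = self._items.popitem(last=False)
            self.used -= len(evicted)
```

In Squeak the eviction could instead be driven by a low-memory notification from the VM, but the access-ordering bookkeeping would be the same.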
The issue is that you can't keep the whole movie in memory, so you need to have parts of the object tree loaded and dumped as needed, which seems like a very similar problem to code modules or projects dynamically coming and going. Movies are unique in that streaming playback is desirable, both for use over a network and simply to optimize disk throughput (a random disk seek per frame would be bad).
Offhand, it seems like some Smalltalk wizard could put this all together pretty quickly. I'd work on it myself, but I'm currently furiously working on a paying device driver project, with a deadline real soon now.
- Jan