Movie-JPEG and other video info

Russell Penney rpenney at optushome.com.au
Sun Dec 2 00:17:01 UTC 2001


Take a look at www.ogg.org for open-source codecs. They already have an
audio spec that can be used by most current rippers and players, and they
are working on a video spec (codenamed Tarkin).

Russell

-----Original Message-----
From: squeak-dev-admin at lists.squeakfoundation.org
[mailto:squeak-dev-admin at lists.squeakfoundation.org]On Behalf Of Jan
Bottorff
Sent: Saturday, 1 December 2001 11:23 PM
To: squeak-dev at lists.squeakfoundation.org
Subject: Re: Movie-JPEG and other video info



>I saw something on the
>jpeg.org site that suggested that there was an ongoing attempt to create a
>standard
>for M-JPEG but, with several strong competing formats floating around, it
>might
>be many years before everyone complies with the new standard...

I view M-JPEG as a format at the end of its life cycle. I doubt there will
be any new standards for it. For a while, it was the technology balance
point because reasonably priced chips to do compression/decompression were
available. Now you can get reasonably priced chips to do the MPEG-I/II and DV
formats, which are technically superior.

>I'm not even sure that the individual frames of M-JPEG are compatible
>with our still-image JPEG plugin, I'm satisfied that creating a new file
>format for
>Squeak JPEG movies is appropriate.

The M-JPEG compressed frames in an AVI or QuickTime file are NOT directly
compatible with JPEG files. Some of the header tags are different, and
M-JPEG frames all use a common Huffman table. The color space is a bit
different too (range-limited YCrCb instead of YUV).

I'd tend to agree that a unique Squeak movie format is the way to go (see
below), with import/export. Some of the theory behind current formats would
be very appropriate to reuse.

>Jan, can you suggest a simple but widely supported import/export format?
>It has
>to be something we can encode from Squeak, of course. Maybe M-JPEG AVI or
>QT?
>Although there *are* open-source C libraries for doing MPEG encoding, it's
>my understanding
>that various patents apply to the MPEG encoding process, especially to MP3
>sound
>encoding, so we would need to get permission from the patent holders to
>distribute
>such code with Squeak.

I see the QuickTime file format is documented at
http://developer.apple.com/techpubs/quicktime/qtdevdocs/QTFF/qtff.html
although I don't know if there are any legal issues. Uncompressed frames
would be the simplest codec format for import/export, with PCM sound. The
"Photo JPEG" codec might be very close or even identical to the common JPEG
file format. Even though there's lots of complexity defined in the QuickTime
file format, I believe the required stuff is pretty simple. Much of the
info in a QuickTime file can be ignored or not written.
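The basic container structure really is simple: a QuickTime file is a flat run of "atoms", each a 4-byte big-endian size (counting the 8-byte header) followed by a 4-byte type code, with child atoms nested inside their parent's payload. A minimal Python sketch of walking one level (not Squeak code, and it deliberately skips the extended size==1 and size==0 forms):

```python
import struct

def iter_atoms(data, offset=0):
    # Walk one flat run of QuickTime atoms: 4-byte big-endian size
    # (including the 8-byte header) then a 4-byte type code.
    # Extended (size == 1) and to-end-of-file (size == 0) atoms are
    # not handled in this sketch.
    while offset + 8 <= len(data):
        size, kind = struct.unpack_from('>I4s', data, offset)
        if size < 8:
            break
        yield kind, data[offset + 8:offset + size]
        offset += size
```

Recursing into a container atom like 'moov' is just calling the same walker on that atom's payload.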

I'd avoid the MPEG patent dangers; the patent holders are quite serious
about defending their territory.

I haven't looked at recent Squeak code, so ignore all this if it's already
done... For a close match to the existing Squeak architecture, you could store
movies in a hierarchical binary object stream format: basically, the output
from multiple ReferenceStreams, with the chunks listed in a directory of
file offsets. Each video frame/sound chunk could be a different object
tree. ReferenceStream doesn't seem to have any way to prioritize the
ordering of objects written, so you would need to keep each subtree small.
Conceptually you might just want to have ReferenceStream write a whole
movie, with meta-info objects, video frame objects, and sound chunk
objects. Realistically, you need to control the ordering of the little
subtrees, and also allow reading of individual subtrees for playback. See below for
more comments on streaming.

To manipulate the movie, you load the objects that represent the movie meta
data (which includes an OrderedCollection of frame chunk offsets), and
then load image/sound/whatever chunks dynamically. ReferenceStream looks
like it already knows how to read/write multiple object trees; all that's
needed is some way to randomly access chunks in the file for editing, like
a chunk directory at the end with the starting offset of each chunk.
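The chunk-directory-at-the-end idea can be sketched in a few lines. This is illustrative Python, not Squeak, and the field layout (length-prefixed chunks, an offset table, a two-word trailer locating it) is made up for the example:

```python
import io
import struct

def write_movie(chunks):
    # Write each chunk sequentially (streamable: no seeking backwards),
    # then append a directory of chunk offsets, then a fixed-size trailer
    # saying where the directory starts.
    out = io.BytesIO()
    offsets = []
    for data in chunks:
        offsets.append(out.tell())
        out.write(struct.pack('>I', len(data)))  # length prefix
        out.write(data)
    dir_start = out.tell()
    for off in offsets:
        out.write(struct.pack('>I', off))        # one offset per chunk
    out.write(struct.pack('>II', dir_start, len(offsets)))
    return out.getvalue()

def read_chunk(movie, index):
    # Random access for editing: trailer -> directory -> one seek -> chunk.
    dir_start, count = struct.unpack('>II', movie[-8:])
    off, = struct.unpack_from('>I', movie, dir_start + 4 * index)
    length, = struct.unpack_from('>I', movie, off)
    return movie[off + 4:off + 4 + length]
```

A pure stream player never needs the trailer at all: it just reads length-prefixed chunks from the front.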

The goal would be to allow playing the movie as a stream, from the
beginning, without random seeks, yet allow access to each frame without
scanning the whole file when editing. So the ORDERING of object subtrees is
really important, as is having appropriate random access chunk directories.
Having some playback meta info at the beginning of the file, then the
frame/sound chunks, with a frame directory object tree at the end, might do
the trick. MPEG files (even I-frame only) are not so good for editing
because they lack this random access directory. Being able to write an
output movie file as a stream would be desirable too, so that you can
stream a Squeak movie file from a live camera to a remote Squeak for live
playback over a network (i.e. videoconferencing).

Getting the file format to work nicely for random access editing, streaming
playback (no seeks), and streaming writing (no seeks) is the magic of a
good video file format. There are also multiple streams (at least sound and
video) that you want interleaved into the master stream, so you don't have
to buffer huge amounts of one stream to get the next chunk of another
stream. The interleaving of streams is closely related to timing. You can't
just say that for every video frame you'll emit 1000 bytes of sound samples:
the sound may be compressed to a variable size, so 1000 bytes can represent
variable amounts of time. Really you need to interleave streams with the
timebase for each stream in sync, so the file has the video frame for
0:42:12.10 next to the sound that gets played at 0:42:12.10. Playback
software can buffer things somewhat, to account for different latencies in
video playback versus sound playback, but if the two streams aren't
timebase synced, the buffer requirements grow bigger and bigger as the
movie plays, and you may have to do one disk seek for a sound chunk and
another seek for the video chunk; this is very bad for performance and
smooth playback. Streaming of the video file is also impossible if the
video and sound interleave aren't timebase synced.
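Timebase-synced interleaving is just a streaming merge of the per-stream chunk sequences ordered by timestamp. A minimal Python sketch (not Squeak; timestamps in seconds, each input already sorted within its own stream), using heapq.merge so nothing more than one pending chunk per stream is buffered:

```python
import heapq

def interleave(video, audio):
    # Tag each (timestamp, payload) chunk with a stream priority so ties
    # are broken deterministically (video before audio at the same time),
    # then do a streaming merge by timestamp.
    tagged_v = ((t, 0, 'video', payload) for t, payload in video)
    tagged_a = ((t, 1, 'audio', payload) for t, payload in audio)
    return [(t, kind, payload)
            for t, _, kind, payload in heapq.merge(tagged_v, tagged_a)]
```

The same merge works for any number of streams, which is exactly why a writer that receives timebase-synced chunks never has to seek.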

Collecting large amounts of directory data to emit before groups of frames
is undesirable too, as you would have to buffer variable-sized frames from
future file positions, which would introduce latency in live video
processing. This suggests directory info should come after the frames (or at
the end, to prevent interruptions in the smooth flow of video/audio data). If
you've been watching the CNN interviews with reporters on satellite
videoconference phones, you'll notice a couple of seconds between when the
local newscaster asks a question and you start to hear the response; this
is because of the video compression latency. Having the Squeak video format
work well with videoconferencing would be really nice.

For editing, movie frame and sound chunks need to be loaded on demand from
any point randomly, probably cached for a while, and dumped when memory
is getting full, maybe by the GC (is there an LRU cache collection
object that pays attention to memory consumption?). Generating megabytes of
garbage objects per second might be hard on the GC, although the number of
objects and references to follow might not be very big (sound samples and
video frames could be non-pointer objects).
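The memory-aware LRU cache asked about above is straightforward to sketch; here's an illustrative Python version (no such class ships with Squeak, as far as I know) that evicts by total byte size rather than entry count:

```python
from collections import OrderedDict

class ChunkCache:
    # LRU cache for decoded frame/sound chunks, bounded by total bytes
    # held rather than number of entries.
    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0
        self.entries = OrderedDict()   # chunk id -> bytes, oldest first

    def get(self, chunk_id, load):
        if chunk_id in self.entries:
            self.entries.move_to_end(chunk_id)   # mark as recently used
            return self.entries[chunk_id]
        data = load(chunk_id)                    # e.g. seek + read + decode
        self.entries[chunk_id] = data
        self.used += len(data)
        # Evict least-recently-used chunks until under budget.
        while self.used > self.max_bytes and len(self.entries) > 1:
            _, evicted = self.entries.popitem(last=False)
            self.used -= len(evicted)
        return data
```

Explicit eviction like this also sidesteps the garbage problem above: the same few buffers stay live instead of becoming megabytes per second of dead objects for the GC to sweep.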

The issue is that you can't keep the whole movie in memory, so you need to
have parts of the object tree loaded and dumped as needed, which seems like
a very similar problem to code modules or projects dynamically coming and
going.  Movies are unique in that streaming playback is desirable, for use
over a network and also just to optimize disk throughput (a random disk
seek per frame would be bad).

Offhand, it seems like some Smalltalk wizard could put this all together
pretty quickly. I'd work on it myself, but I'm currently furiously working on
a paying device driver project, with a deadline real soon now.

- Jan






