Movie-JPEG and other video info
Jan Bottorff
janb at pmatrix.com
Thu Nov 29 06:36:06 UTC 2001
>More questions about the Movie-JPEG format, please:
>
>- Why was it decided to invent a format rather than use the existing
>Motion-JPEG standard (which I didn't know about until Bolot sent me these
>URLs):
>
>http://bmrc.berkeley.edu/research/cmt/versions/4.0/doc/cmtmjpeg/MJPEG_chunkfile.html
>http://neptune.netcomp.monash.edu.au/cpe3013/MPEG/Reading/MJPEG/step1.htm
I developed and sold a commercial M-JPEG video codec for Windows platforms
for many years, so I know HEAPS about M-JPEG formats and digital video in
general. Hopefully this message can educate people about video formats, and
point out the potholes.
The first issue is that there are a bunch of M-JPEG formats, not just "the"
standard as with MPEG. The two most widely used M-JPEG formats are probably
Microsoft AVI files using an OpenDML-compatible codec, and QuickTime using
M-JPEG A or B. The above links describe a proprietary format that happens
to use JPEG-like compression. Also note that even when companies claim to
conform to a "standard" they often don't, so lots of compatibility issues
come up with M-JPEG.
Unlike MPEG, the file format (AVI or QuickTime) is quite distinct from the
codec format (the compressed frame format). The thing to do would be to
write Squeak code that understood one or both of these file formats, and
then also had some code that implemented codecs. File formats tend to be
pretty stable; codecs change rapidly. The simplest codec format is
uncompressed RGBA or YCrCb.
M-JPEG as a codec format has some significant limitations compared to newer
formats like DV or i-Frame MPEG, including:
- there is no universal M-JPEG format
- it's quite tricky to get constant data rates using M-JPEG. Easily
available JPEG code sets a "quality" factor before compression, which,
depending on the frame contents, will give a large range of compressed
frame sizes (frames of pure random noise can actually be larger after
compression)
- JPEG compression also has a single "quality" (the quantization factor)
for the WHOLE image, which is one reason it's hard to generate constant
data rates. This also degrades image quality for a given compressed size,
because you can't allocate more bits to picture areas that have more
detail; both MPEG and DV can change the quantization dynamically through
the frame. For M-JPEG, the easy strategy to get a constant data rate (or at
least stay below some data rate) is to compress the frame and then keep
recompressing it (a binary search), adjusting the global quality until the
frame is an OK size. That's not so good for performance, but predicting
quality settings from previous frames helps, except for scene transitions
that suddenly change the amount of picture detail. There are also some
patented algorithms that estimate the correct quality setting based on
samples of the data
- movies for actual video display, as opposed to computer display, are also
generally interlaced. Blindly compressing a "frame" rather than just a
field (every other scan line) will often not work so well; M-JPEG typically
compresses each field separately and concatenates the results as a "frame".
DV and MPEG-2 have algorithmic support to deal with picture areas where the
two fields have significant interframe motion: specifically, they have
alternative DCTs (a normal one and one that understands that alternating
lines may not correlate) that get chosen on an 8x8 cell basis
For high quality video editing, nothing beats uncompressed fields. Most of
the compression formats subsample the color resolution, which makes things
like chroma keying not work so well. If you have to make multiple
compression/decompression passes, most codecs also introduce ugly
artifacts as you build up layers for the final output. The downside to
uncompressed video editing is high data rates. CPU loads are actually less
than with compressed formats, but merging two uncompressed full-quality
data streams and writing an output stream is a total disk data rate of 62
MBytes/sec (three streams of NTSC at 29.97 fps * 720x480 * 2 bytes/pixel,
assuming a YCrCb color space). Seeking is also super simple on uncompressed
data, as all frames are a fixed size.
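The 62 MBytes/sec figure checks out with simple arithmetic (the only assumption is three concurrent streams: two inputs plus one output):

```python
fps = 30000 / 1001                  # NTSC's "29.97" frame rate, exactly
bytes_per_frame = 720 * 480 * 2     # 4:2:2 YCrCb, 2 bytes per pixel
one_stream = fps * bytes_per_frame  # about 20.7 MBytes/sec per stream
total = 3 * one_stream              # two inputs + one output, ~62 MB/s
```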
Also note that NTSC and PAL video do NOT have square pixels. I should also
add that fun things like gamma correction and color gamut mapping should be
done to make high quality output. It's a LOT more complex than just taking
your RGB animation and feeding it to a JPEG algorithm.
A BIG advantage of the M-JPEG format is that it's almost totally free of
patent issues. I've been told the DV format is also mostly not a patent
issue. MPEG, on the other hand, is a patent minefield.
A very viable way to edit video might be to keep a shadow file of metadata
for an i-Frame MPEG file (or other video file format). Frames could be
decompressed using standard MPEG decoder code. Frames could be assembled
by seeking to the correct file offset based on the metadata. Output could
be uncompressed or i-Frame MPEG. Deferring decompression of all the frames
(or even just bypassing decompression by copying the input to the output)
is best, but often not possible. There would be a reasonably fast pre-edit
step to parse the input file into metadata (no decompression, just finding
the frame boundaries). Having pluggable file formats (MPEG flavors,
QuickTime, AVI) and compression formats (MPEG I/II, DV, uncompressed,
M-JPEG) would be best.
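The pre-edit pass is just a linear scan for start codes. A minimal sketch, assuming an MPEG-1/2 video elementary stream (where every picture begins with the picture start code 00 00 01 00); the synthetic byte string below is made up purely to exercise the scanner:

```python
def index_picture_starts(data: bytes):
    """Build the 'shadow metadata': byte offsets of every MPEG picture
    start code (00 00 01 00), found without any decompression.  An
    editor can later seek straight to frame n via offsets[n]."""
    offsets = []
    i = data.find(b"\x00\x00\x01\x00")
    while i != -1:
        offsets.append(i)
        i = data.find(b"\x00\x00\x01\x00", i + 1)
    return offsets

# Synthetic stream: three picture start codes with filler between them.
stream = (b"\xff" * 5 + b"\x00\x00\x01\x00" + b"\xaa" * 7
          + b"\x00\x00\x01\x00" + b"\xbb" * 3 + b"\x00\x00\x01\x00")
offsets = index_picture_starts(stream)
```

A pluggable design would hide this behind a per-file-format interface, since QuickTime and AVI already store frame offsets in their own index structures and need no scan at all.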
Other random thoughts on video:
- alpha is important, and generally ignored by most compressed formats;
uncompressed+alpha is probably the ideal working video format
- gamma-corrected YCrCb with 4:2:2 subsampling is closest to most native
video formats, so doing effects in a YCrCbA color space might be desirable.
I see 160 GByte disks for sale at $299
- sound is a whole can of worms. Consumer DV cameras use what's called
unlocked audio, which means the number of sound samples varies for each
frame; this confuses timebase logic. Generally you have to make video run
at the correct frame rate (about 29.97 fps for NTSC, but not exactly) and
fix up the sound samples. Pro video devices often have a common clock for
video and audio samples, so they can keep the two synchronized.
- all these details can be ignored if you just want to play little
videos on your computer screen; if you want to produce video that shows up
on the Discovery Channel, you have to get it right
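To illustrate the audio point: at 48 kHz there are exactly 48000 * 1001/30000 = 1601.6 samples per NTSC frame, so the per-frame sample count has to vary. A sketch of the usual fix-up, distributing samples by cumulative rounding so no drift accumulates (the 1601/1602 alternation below just falls out of the arithmetic; it is not quoted from any spec):

```python
from fractions import Fraction

def samples_for_frame(n, rate=48000, fps=Fraction(30000, 1001)):
    """Audio samples belonging to frame n when spreading a 48 kHz
    stream over NTSC frames by cumulative rounding: round the running
    sample total at each frame boundary, so rounding error never grows."""
    return round(Fraction(n + 1) * rate / fps) - round(Fraction(n) * rate / fps)

counts = [samples_for_frame(n) for n in range(5)]  # alternates 1602/1601
total = sum(counts)                                # exactly 8008 per 5 frames
```

Devices with a common clock for audio and video effectively guarantee this relationship in hardware; with unlocked consumer gear, editing software has to impose it after the fact.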
- Jan