Re: Movie-JPEG and other video info

29 Nov 2001


      ...
More questions about the Movie-JPEG format, please:

Why was it decided to invent a format rather than use the existing
Motion-JPEG standard (which I didn't know about until Bolot sent me these
URLs):
http://bmrc.berkeley.edu/research/cmt/versions/4.0/doc/cmtmjpeg/MJPEG_chunkfile.html
http://neptune.netcomp.monash.edu.au/cpe3013/MPEG/Reading/MJPEG/step1.htm

I developed and sold a commercial M-JPEG video codec for Windows platforms
for many years, so I know HEAPS about M-JPEG formats and digital video in
general. Hopefully this message can educate people about video formats and
point out the potholes.

The first issue is that there are a bunch of M-JPEG formats, not just "the"
standard as with MPEG. The two most widely used M-JPEG formats are probably
Microsoft AVI files using an OpenDML compatible codec and QuickTime using
M-JPEG A or B. The above links describe proprietary formats that happen to
use JPEG-like compression. Also note that even though companies claim to
conform to a "standard", often they don't, so lots of compatibility issues
come up with M-JPEG.

Unlike MPEG, the file format (AVI or QuickTime) is very distinct from the
codec format (the compressed frame format). The thing to do would be to
write Squeak code that understands one or both of these file formats, plus
some code that implements codecs. File formats tend to be pretty stable;
codecs change rapidly. The simplest codec format is uncompressed RGBA or
YCrCb.

M-JPEG as a codec format has some significant limitations compared to newer
formats like DV or i-Frame MPEG, including:
- there is no universal M-JPEG format
- it's quite tricky to get constant data rates using M-JPEG. Easily
available JPEG code sets a "quality" factor before compression, which,
depending on the frame contents, will give a large range of compressed frame
sizes (pure random noise frames can actually be larger after compression)
- JPEG compression also has a single "quality" (the quantization factor)
for the WHOLE image, which is one reason it's hard to generate constant
data rates. This also degrades image quality for a given compressed size,
because you can't allocate more bits to picture areas that have more
detail. Both MPEG and DV can change the quantization dynamically through
the frame. For M-JPEG the easy strategy to make a constant data rate (or at
least not exceed some data rate) is to compress the frame and then keep
recompressing it (a binary search), adjusting the global quality until the
frame is an ok size. That's not so good for performance, but predicting
quality settings from previous frames helps, except at scene transitions
that suddenly change the amount of picture detail. There are also some
patented algorithms to estimate the correct quality setting to use based on
samples of the data
- movies for actual video display, as opposed to computer display, are also
generally interlaced. Blindly compressing a "frame" vs. just a field (every
other scan line) will often not work so well. M-JPEG typically compresses
each field separately, concatenating the result as a "frame". DV and MPEG-2
have algorithmic support to deal with picture areas where the two fields
differ because of significant motion between them; specifically, they have
alternative DCTs (a normal one and one that understands that alternating
lines may not correlate) that get chosen on an 8x8 cell basis
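
The "keep recompressing" strategy from the list above can be sketched as a
binary search over the global quality setting. This is only an illustration:
`encode` is a hypothetical stand-in for whatever JPEG encoder is actually
used, `fit_quality` and its parameters are made-up names, and the search
assumes the compressed size does not shrink as quality goes up. The optional
`hint` is the quality that worked for the previous frame.

```python
from typing import Callable, Optional, Tuple


def fit_quality(frame: bytes,
                encode: Callable[[bytes, int], bytes],
                max_bytes: int,
                lowest: int = 1,
                highest: int = 100,
                hint: Optional[int] = None) -> Tuple[int, bytes]:
    """Find the highest quality whose compressed frame fits in max_bytes.

    `encode(frame, quality)` is a hypothetical encoder callback.
    Assumes compressed size is non-decreasing in quality.
    If even the lowest quality is too big, that result is returned anyway.
    """
    lo, hi = lowest, highest
    best = None
    # Probe the previous frame's quality first if one was supplied.
    guess = hint if hint is not None and lo <= hint <= hi else (lo + hi) // 2
    while lo <= hi:
        data = encode(frame, guess)
        if len(data) <= max_bytes:
            best = (guess, data)   # fits: remember it, then try higher
            lo = guess + 1
        else:
            hi = guess - 1         # too big: try lower
        guess = (lo + hi) // 2
    if best is None:
        best = (lowest, encode(frame, lowest))
    return best
```

Each probe is a full compression pass, which is why this costs performance;
a good hint only saves work when neighbouring frames have similar detail.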

For high quality video editing, nothing beats uncompressed fields. Most of
the compression formats subsample the color resolution, which makes things
like chroma keying not work so well. If you have to make multiple
compression/decompression passes, most codecs also introduce ugly
artifacts as you build up layers for the final output. The downside to
uncompressed video editing is high data rates. CPU loads are actually lower
than with compressed formats, but merging two uncompressed full quality
data streams and writing an output stream is a total disk data rate of 62
MBytes/sec (for NTSC, 29.97 fps * 720x480 * 2 bytes/pixel, assuming YCrCb
color space, is about 20.7 MBytes/sec per stream, times three streams).
Seeking is also super simple on uncompressed data, as all frames are a
fixed size.
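
The arithmetic behind the data rate and the fixed-size seek can be checked
with a few lines. This is a sketch under the stated assumptions (720x480,
2 bytes/pixel, NTSC taken as 30000/1001 fps); `frame_offset` and
`header_bytes` are hypothetical names, not part of any real file format.

```python
from fractions import Fraction

NTSC_FPS = Fraction(30000, 1001)   # about 29.97 fps
WIDTH, HEIGHT = 720, 480
BYTES_PER_PIXEL = 2                # YCrCb, 2 bytes/pixel as in the text

FRAME_BYTES = WIDTH * HEIGHT * BYTES_PER_PIXEL   # 691200 bytes per frame


def stream_rate(fps: Fraction = NTSC_FPS) -> float:
    """Bytes per second for one uncompressed stream."""
    return float(FRAME_BYTES * fps)


def frame_offset(n: int, header_bytes: int = 0) -> int:
    """Byte offset of frame n when every frame has the same size.

    header_bytes is a hypothetical fixed-size file header.
    """
    return header_bytes + n * FRAME_BYTES


# Two input streams plus one output stream.
total_rate = 3 * stream_rate()     # about 62 million bytes per second
```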

Also note that NTSC or PAL video does NOT use square pixels. I should also
add that fun things like gamma correction and color gamut mapping should be
done to make high quality output. It's a LOT more complex than just taking
your RGB animation and feeding it to a JPEG algorithm.

A BIG advantage of the M-JPEG format is that it's almost totally free of
patent issues. I've been told the DV format is also mostly free of patent
issues. MPEG on the other hand is a patent minefield.

A very viable way to edit video might be to keep a shadow file of metadata
for an i-Frame MPEG file (or other video file format). Frames could be
decompressed using standard MPEG decoder code. Frames could be assembled
by seeking to the correct file offset based on the metadata. Output could
be uncompressed or i-Frame MPEG. Deferring decompression of all the frames
(or even bypassing decompression entirely, by copying the input to the
output) is best, but often not possible. There would be a reasonably fast
pre-edit step to parse the input file into metadata (no decompression, just
finding the frame boundaries). Having pluggable file formats (MPEG flavors,
QuickTime, AVI) and compression formats (MPEG I/II, DV, uncompressed,
M-JPEG) would be best.
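
A minimal sketch of the pre-edit indexing step, assuming the input is a raw
MPEG-1/2 video elementary stream held in memory, where each picture begins
with the start code 00 00 01 00. Real files are usually multiplexed and
would need demultiplexing first; `build_index` and `frame_span` are
hypothetical helper names, and the list of offsets stands in for the shadow
metadata file.

```python
from typing import List, Tuple

# Picture start code in an MPEG-1/2 video elementary stream.
PICTURE_START = b"\x00\x00\x01\x00"


def build_index(data: bytes, marker: bytes = PICTURE_START) -> List[int]:
    """Return the byte offset of every marker, without decoding anything."""
    offsets = []
    pos = data.find(marker)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(marker, pos + len(marker))
    return offsets


def frame_span(offsets: List[int], n: int, total_len: int) -> Tuple[int, int]:
    """Return (start, end) byte offsets of picture n.

    The last picture runs to the end of the data.
    """
    start = offsets[n]
    end = offsets[n + 1] if n + 1 < len(offsets) else total_len
    return start, end
```

With the spans in hand, copying a frame from input to output is just a byte
copy of `data[start:end]`, with no decompression involved.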

Other random thoughts on video:
- alpha is important, and generally ignored by most compressed formats;
uncompressed+alpha is probably the ideal working video format
- gamma corrected YCrCb with 4:2:2 subsampling is closest to most native
video formats, so doing effects in a YCrCbA color space might be desirable.
I see 160 GByte disks were for sale at $299
- sound is a whole can of worms. Consumer DV cameras use what's called
unlocked audio, which means the number of sound samples varies for each
frame. This confuses timebase logic; generally you have to make video run
at the correct frame rate (about 29.97 fps for NTSC, but not exactly) and
fix up the sound samples. Pro video devices often have a common clock for
video and audio samples, so they can keep the two synchronized.
- all these details can be ignored if you just want to play little
videos on your computer screen. If you want to produce video that shows up
on the Discovery Channel, you have to get it right
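
To illustrate the sound fix-up mentioned above: at the exact NTSC rate of
30000/1001 fps, 48 kHz audio works out to 1601.6 samples per frame, so no
fixed integer count per frame stays in sync. The sketch below shows one way
to pick a per-frame count so the running total never drifts. The 48 kHz rate
and the name `samples_for_frame` are assumptions for the example, not
something taken from the DV specification.

```python
import math
from fractions import Fraction

NTSC_FPS = Fraction(30000, 1001)   # exact NTSC frame rate, about 29.97 fps


def samples_for_frame(n: int,
                      sample_rate: int = 48000,
                      fps: Fraction = NTSC_FPS) -> int:
    """Number of audio samples to attach to video frame n (0-based).

    Uses exact rational arithmetic so the cumulative sample count after
    any frame matches the video timebase to within one sample.
    """
    per_frame = Fraction(sample_rate) / fps        # 1601.6 at 48 kHz
    return math.floor((n + 1) * per_frame) - math.floor(n * per_frame)


# The pattern repeats every 5 frames and sums to 8008 samples.
pattern = [samples_for_frame(n) for n in range(5)]
```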
- Jan