#LyX 1.3 created this file. For more info see http://www.lyx.org/
\lyxformat 221
\textclass article
\language english
\inputencoding auto
\fontscheme default
\graphics default
\paperfontsize default
\spacing single
\papersize Default
\paperpackage a4
\use_geometry 0
\use_amsmath 0
\use_natbib 0
\use_numerical_citations 0
\paperorientation portrait
\secnumdepth 3
\tocdepth 3
\paragraph_separation skip
\defskip medskip
\quotes_language english
\quotes_times 2
\papercolumns 1
\papersides 1
\paperpagestyle default
\layout Title
\noindent Description of the 4XM Video Codec
\layout Author
by Michael Niedermayer
\layout Standard
\begin_inset LatexCommand \tableofcontents{}
\end_inset
\layout Section
Introduction
\layout Standard
The 4XM video codec is a mixture of a heavily simplified JPEG scheme and rectangular block-based full-pel motion compensation with DC difference coding. The codec uses the 4:2:0 YCbCr colorspace for the JPEG part but converts the result to RGB16 before using it for motion compensation.
\layout Standard
The latest version of this document is available at http://www.mplayerhq.hu/~michael/4xm.{lyx,txt,html,ps}
\layout Standard
4XM video is normally encapsulated in the proprietary
\begin_inset LatexCommand \url[4XM format]{http://www.pcisys.net/~melanson/codecs/4xm-format.txt}
\end_inset
.
\layout Standard
\noindent This document assumes familiarity with mathematical and coding concepts such as the discrete cosine transform, quantization, YCbCr colorspaces, macroblocks, and variable length codes (VLCs). Familiarity with the standard JPEG coding method is also helpful.
\layout Section
Terms and Definitions
\layout List
\labelwidthstring 00.00.0000
AC Any DCT coefficient for which the frequency in one or both dimensions is non-zero.
\layout List
\labelwidthstring 00.00.0000
DC The DCT coefficient for which the frequency is zero in both dimensions.
\layout List
\labelwidthstring 00.00.0000
(I)DCT (Inverse) Discrete Cosine Transform
\layout List
\labelwidthstring 00.00.0000
VLC Variable Length Code
\layout List
\labelwidthstring 00.00.0000
AAN\SpecialChar ~
IDCT IDCT algorithm by Arai, Agui, and Nakajima
\layout List
\labelwidthstring 00.00.0000
JPEG Joint Photographic Experts Group
\layout Section
High-level Description
\layout Standard
The 4XM video coding method defines 3 types of frames: I-frames, P-frames, and C-frames. I-frames are intraframes and stand on their own. P- and C-frames are interframes.
\layout Subsection
I-Frame
\layout Standard
I-frames are practically the same as JPEG images. The differences are a single Huffman table, different headers, and a bitstream split into 2 partitions, one of which is written in 32-bit byteswapped order. There are also no parameters for rate or quality control.
\layout Standard
The picture is split into macroblocks which are coded left->right, top->bottom.
\layout Subsubsection
Macroblock
\layout Standard
Each macroblock covers 16x16 luma samples and 8x8 samples of each chroma plane, coded as 4 8x8 luma blocks and 2 8x8 chroma blocks:
\layout Standard
Y:
\begin_inset Tabular
\begin_inset Text \layout Standard 0 \end_inset \begin_inset Text \layout Standard 1 \end_inset
\begin_inset Text \layout Standard 2 \end_inset \begin_inset Text \layout Standard 3 \end_inset
\end_inset
Cb:
\begin_inset Tabular
\begin_inset Text \layout Standard 4 \end_inset
\end_inset
Cr:
\begin_inset Tabular
\begin_inset Text \layout Standard 5 \end_inset
\end_inset
\layout Subsubsection
DC Prediction
\layout Standard
DC values are predicted from the last coded block. The initial prediction value used for the first (top-left) luma block is 0.
No special handling is done between luma and chroma blocks or at the right border, so the DC value of the rightmost 8x8 Cr block of the first row will be used as the predictor for the first/top-left 8x8 luma block of the second MB row.
\layout Subsubsection
Dequantization and IDCT
\layout Standard
4XM uses an AAN IDCT with the premultiply table merged with the quantization table. The quantization table is the default luma table used in JPEG.
\layout Paragraph
default luma quantization table used in JPEG
\layout LyX-Code
16, 11, 10, 16, 24, 40, 51, 61,
\layout LyX-Code
12, 12, 14, 19, 26, 58, 60, 55,
\layout LyX-Code
14, 13, 16, 24, 40, 57, 69, 56,
\layout LyX-Code
14, 17, 22, 29, 51, 87, 80, 62,
\layout LyX-Code
18, 22, 37, 56, 68, 109, 103, 77,
\layout LyX-Code
24, 35, 55, 64, 81, 104, 113, 92,
\layout LyX-Code
49, 64, 78, 87, 103, 121, 120, 101,
\layout LyX-Code
72, 92, 95, 98, 112, 100, 103, 99
\layout Paragraph
AAN premultiply table
\layout LyX-Code
16384, 22725, 21407, 19266, 16384, 12873, 8867, 4520,
\layout LyX-Code
22725, 31521, 29692, 26722, 22725, 17855, 12299, 6270,
\layout LyX-Code
21407, 29692, 27969, 25172, 21407, 16819, 11585, 5906,
\layout LyX-Code
19266, 26722, 25172, 22654, 19266, 15137, 10426, 5315,
\layout LyX-Code
16384, 22725, 21407, 19266, 16384, 12873, 8867, 4520,
\layout LyX-Code
12873, 17855, 16819, 15137, 12873, 10114, 6967, 3552,
\layout LyX-Code
8867, 12299, 11585, 10426, 8867, 6967, 4799, 2446,
\layout LyX-Code
4520, 6270, 5906, 5315, 4520, 3552, 2446, 1247
\layout Paragraph
merged table used in 4XM
\layout LyX-Code
16, 15, 13, 19, 24, 31, 28, 17,
\layout LyX-Code
17, 23, 25, 31, 36, 63, 45, 21,
\layout LyX-Code
18, 24, 27, 37, 52, 59, 49, 20,
\layout LyX-Code
16, 28, 34, 40, 60, 80, 51, 20,
\layout LyX-Code
18, 31, 48, 66, 68, 86, 56, 21,
\layout LyX-Code
19, 38, 56, 59, 64, 64, 48, 20,
\layout LyX-Code
27, 48, 55, 55, 56, 51, 35, 15,
\layout LyX-Code
20, 35, 34, 32, 31, 22, 15, 8,
\layout Standard
This is simply the element-wise
product of the quantization table and the AAN table divided by
\begin_inset Formula $2^{14}$
\end_inset
and rounded to the nearest integer (the AAN table represents 1.0 as 16384).
\layout Standard
4XM's AAN IDCT uses (a*const)>>16 to approximate multiplications and simply shifts the transformed result 16 bits to the right. The scaled constants are:
\layout Standard
\begin_inset Tabular
\begin_inset Text \layout Standard exact \end_inset \begin_inset Text \layout Standard scaled constant \end_inset
\begin_inset Text \layout Standard 1.082392200... \end_inset \begin_inset Text \layout Standard 70936 \end_inset
\begin_inset Text \layout Standard 1.414213562... \end_inset \begin_inset Text \layout Standard 92682 \end_inset
\begin_inset Text \layout Standard 1.847759065... \end_inset \begin_inset Text \layout Standard 121095 \end_inset
\begin_inset Text \layout Standard 2.613125930... \end_inset \begin_inset Text \layout Standard 171254 \end_inset
\end_inset
\layout Standard
which are simply the exact constants multiplied by
\begin_inset Formula $2^{16}$
\end_inset
and rounded to the nearest integer.
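\layout Standard
The relation between the tables and the scaled constants above can be checked numerically. The following is a minimal C sketch, not the decoder's actual code: it assumes that the AAN premultiply table is in 14-bit fixed point (1.0 = 16384, as its first entry suggests) and that the merge rounds to the nearest integer. Under these assumptions the merge reproduces the listed 4XM table entry for entry. The function names (merge_entry, count_merge_mismatches, scale_constant, fixed_mul) are hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Default JPEG luma quantization table, copied from the text above. */
static const uint8_t jpeg_luma_quant[64] = {
    16, 11, 10, 16,  24,  40,  51,  61,
    12, 12, 14, 19,  26,  58,  60,  55,
    14, 13, 16, 24,  40,  57,  69,  56,
    14, 17, 22, 29,  51,  87,  80,  62,
    18, 22, 37, 56,  68, 109, 103,  77,
    24, 35, 55, 64,  81, 104, 113,  92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103,  99
};

/* AAN premultiply table; 16384 is assumed to represent 1.0 (2^14). */
static const uint16_t aan_premultiply[64] = {
    16384, 22725, 21407, 19266, 16384, 12873,  8867, 4520,
    22725, 31521, 29692, 26722, 22725, 17855, 12299, 6270,
    21407, 29692, 27969, 25172, 21407, 16819, 11585, 5906,
    19266, 26722, 25172, 22654, 19266, 15137, 10426, 5315,
    16384, 22725, 21407, 19266, 16384, 12873,  8867, 4520,
    12873, 17855, 16819, 15137, 12873, 10114,  6967, 3552,
     8867, 12299, 11585, 10426,  8867,  6967,  4799, 2446,
     4520,  6270,  5906,  5315,  4520,  3552,  2446, 1247
};

/* Merged table as listed in the text above; used only for comparison. */
static const uint8_t merged_4xm[64] = {
    16, 15, 13, 19, 24, 31, 28, 17,
    17, 23, 25, 31, 36, 63, 45, 21,
    18, 24, 27, 37, 52, 59, 49, 20,
    16, 28, 34, 40, 60, 80, 51, 20,
    18, 31, 48, 66, 68, 86, 56, 21,
    19, 38, 56, 59, 64, 64, 48, 20,
    27, 48, 55, 55, 56, 51, 35, 15,
    20, 35, 34, 32, 31, 22, 15,  8
};

/* Assumed merge rule: product scaled down by 2^14, rounded to nearest. */
uint8_t merge_entry(int i)
{
    uint32_t product = (uint32_t)jpeg_luma_quant[i] * aan_premultiply[i];
    return (uint8_t)((product + (1u << 13)) >> 14);
}

/* Count entries where the assumed rule disagrees with the listed table. */
int count_merge_mismatches(void)
{
    int mismatches = 0;
    for (int i = 0; i < 64; i++)
        if (merge_entry(i) != merged_4xm[i])
            mismatches++;
    return mismatches;
}

/* Scale a non-negative constant by 2^16, rounding to nearest. */
int32_t scale_constant(double exact)
{
    assert(exact >= 0.0);
    return (int32_t)(exact * 65536.0 + 0.5);
}

/* (a*const)>>16 as described in the text. A 64-bit intermediate avoids
 * overflow. Right-shifting a negative value is implementation-defined
 * in C; an arithmetic shift is assumed here. */
int32_t fixed_mul(int32_t a, int32_t scaled_const)
{
    return (int32_t)(((int64_t)a * scaled_const) >> 16);
}
```

\layout Standard
For example, scale_constant(1.414213562) gives 92682, and fixed_mul(1000, 92682) gives 1414, the fixed-point approximation of 1000 times the square root of 2.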
\layout Standard
See
\begin_inset LatexCommand \htmlurl[4xm.c]{http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/ffmpeg/ffmpeg/libavcodec/4xm.c?rev=HEAD&content-type=text/vnd.viewcvs-markup}
\end_inset
\layout Subsubsection
YCbCr 4:2:0 -> RGB565 colorspace transform
\layout Standard
Chroma is first upsampled by sample replication (nearest-neighbor scaling), so that the same Cb and Cr samples are used for each 2x2 group of Y samples.
\layout Standard
R= (Y + Cr + 128)>>3
\layout Standard
G= (Y - ((Cb+Cr)>>1) + 128)>>2
\layout Standard
B= (Y + 2Cb + 128)>>3
\layout Standard
There is no check or protection against overflow, so values will wrap around if they are too large or too small.
\layout Subsection
P-Frame
\layout Standard
A P-frame picture is split into blocks which are coded left->right, top->bottom. Each block contains 8x8 samples in RGB565 format (5 bits for red, 6 bits for green, 5 bits for blue). Each block can be recursively split in two, down to 2x1/1x2 sized blocks.
\layout Standard
A P-frame block can be coded using 1 of 7 methods:
\layout Enumerate
motion compensated with 1 vector
\layout Enumerate
horizontally split in the middle
\layout Enumerate
vertically split in the middle
\layout Enumerate
skipped (block data copied from the frame before the last)
\newline
Example: Intraframe, Interframe1, Interframe2, Interframe3
\newline
a skipped block in Interframe3 will use the data from Interframe1; skipped blocks in Interframe1 are disallowed as there is no source frame
\layout Enumerate
motion vector + DC difference (the 16-bit words of the DC and the source block are simply added; there is no special handling of overflows)
\layout Enumerate
DC only, the whole block is filled with the DC color
\layout Enumerate
hardcoded pixel values (left->right, top->bottom)
\layout Standard
Block splitting is only available for blocks larger than 1x2/2x1.
Hardcoded pixel values are only available for 1x2/2x1 sized blocks.
\layout Standard
Motion compensation assumes that the number of words (16-bit RGB565 pixels) per line is equal to the width, so that motion vectors which point right or left outside of the picture use the pixels from the other side. There is no subpel motion compensation or filtering, which means that motion compensation can simply be done by copying the pixels from the referenced block.
\layout Subsubsection
Motion Vector table
\layout Standard
See mv[256][2] at
\begin_inset LatexCommand \htmlurl[4xm.c]{http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/ffmpeg/ffmpeg/libavcodec/4xm.c?rev=HEAD&content-type=text/vnd.viewcvs-markup}
\end_inset
.
\layout Subsection
C-Frame
\layout Standard
A C-frame is essentially a partial P-frame. It has all the same coding options but a different header.
\layout Section
Bitstream
\layout Standard
All 32-bit values are in little endian byte order.
\layout Subsection
I-Frame
\layout List
\labelwidthstring 00.00.0000
32bit 'ifrm'
\layout List
\labelwidthstring 00.00.0000
32bit chunk length
\layout List
\labelwidthstring 00.00.0000
32bit 0 (unknown)
\layout List
\labelwidthstring 00.00.0000
32bit bitstream size
\layout List
\labelwidthstring 00.00.0000
n\SpecialChar ~ byte bitstream
\layout List
\labelwidthstring 00.00.0000
32bit prefixstream size / 4
\layout List
\labelwidthstring 00.00.0000
32bit token_count
\layout List
\labelwidthstring 00.00.0000
n\SpecialChar ~ byte prefixstream
\layout Subsubsection
Prefix stream
\layout LyX-Code
start 8bit
\layout LyX-Code
end 8bit
\layout LyX-Code
do{
\layout LyX-Code
for(i=start; i<=end; i++)
\layout LyX-Code
frequency[i] 8bit
\layout LyX-Code
start 8bit
\layout LyX-Code
if(start==0) break;
\layout LyX-Code
end 8bit
\layout LyX-Code
}
\layout LyX-Code
while(not 32bit aligned)
\layout LyX-Code
0 8bit
\layout LyX-Code
for(i=0; i>4;
\layout LyX-Code
level_prefix= ac_prefix&0xF;
\layout LyX-Code
level_suffix level_prefix bits bitstream
\layout LyX-Code
block[ zigzag[i] ]= level;
\layout LyX-Code
i++;
\layout LyX-Code
}
\layout LyX-Code
}
\layout Subsection
P-Frame
\layout List
\labelwidthstring 00.00.0000
32bit 'pfrm'
\layout List
\labelwidthstring 00.00.0000
32bit chunk size
\layout List
\labelwidthstring 00.00.0000
32bit 0 (unknown)
\layout List
\labelwidthstring 00.00.0000
32bit unknown, perhaps a checksum
\layout List
\labelwidthstring 00.00.0000
32bit unknown
\layout List
\labelwidthstring 00.00.0000
32bit bitstream size
\layout List
\labelwidthstring 00.00.0000
32bit wordstream size
\layout List
\labelwidthstring 00.00.0000
32bit bytestream size
\layout List
\labelwidthstring 00.00.0000
n\SpecialChar ~ bytes bitstream, stored in byteswapped 32-bit words
\layout List
\labelwidthstring 00.00.0000
n\SpecialChar ~ bytes RGB16 wordstream, stored in little endian order
\layout List
\labelwidthstring 00.00.0000
n\SpecialChar ~ bytes bytestream
\layout Subsubsection
Block
\layout LyX-Code
block(){
\layout LyX-Code
mode vlc bitstream
\layout LyX-Code
if(mode==h_split || mode==v_split){
\layout LyX-Code
block()
\layout LyX-Code
block()
\layout LyX-Code
}
\layout LyX-Code
if(mode==mc || mode==mcdc)
\layout LyX-Code
mv 8bit bytestream
\layout LyX-Code
if(mode==dc || mode==mcdc)
\layout LyX-Code
dc 16bit wordstream
\layout LyX-Code
if(mode==esc){
\layout LyX-Code
col1 16bit wordstream
\layout LyX-Code
col2 16bit wordstream
\layout LyX-Code
}
\layout LyX-Code
}
\layout Subsection
C-Frame
\layout List
\labelwidthstring 00.00.0000
32bit 'cfrm'
\layout List
\labelwidthstring 00.00.0000
32bit chunk size
\layout List
\labelwidthstring 00.00.0000
32bit 0 (unknown)
\layout List
\labelwidthstring 00.00.0000
32bit frame number / frame id, this is the frame number where the frame will be shown, it is also the frame number at which the last cframe part of this frame will be; note, all parts of the same cframe contain the same id here
\layout List
\labelwidthstring 00.00.0000
32bit whole frame size
\layout List
\labelwidthstring 00.00.0000
* p frame, this is (unk, unk, bitstream size, wordstream size, ...) for the first c frame chunk of a c frame
\layout Section
VLC Codes
\layout Subsection
prefix_vlc in I frames
\layout Standard
The prefix_vlc table is generated from the frequencies stored in the prefix stream. Additionally, the element 256 is added with an implicit frequency of 1. For the exact algorithm see libavcodec/4xm.c read_huffman_tables().
\layout Subsection
level vlc in I Frames
\layout Standard
Identical to JPEG
\layout Standard
\begin_inset Tabular
\begin_inset Text \layout Standard prefix \end_inset \begin_inset Text \layout Standard vlc \end_inset \begin_inset Text \layout Standard level \end_inset
\begin_inset Text \layout Standard 0 \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard 0 \end_inset
\begin_inset Text \layout Standard 1 \end_inset \begin_inset Text \layout Standard 0/1 \end_inset \begin_inset Text \layout Standard -1/1 \end_inset
\begin_inset Text \layout Standard 2 \end_inset \begin_inset Text \layout Standard 0X/1X \end_inset \begin_inset Text \layout Standard -3..-2/2..3 \end_inset
\begin_inset Text \layout Standard 3 \end_inset \begin_inset Text \layout Standard 0XX/1XX \end_inset \begin_inset Text \layout Standard -7..-4/4..7 \end_inset
\begin_inset Text \layout Standard ... \end_inset \begin_inset Text \layout Standard ... \end_inset \begin_inset Text \layout Standard ...
\end_inset
\end_inset
\layout Standard
One way to decode this is:
\layout LyX-Code
if(prefix){
\layout LyX-Code
v= get_bits(prefix);
\layout LyX-Code
if((v & (1<<(prefix-1))) == 0)
\layout LyX-Code
v= (-1 <
\begin_inset Text \layout Standard 0 \end_inset \begin_inset Text \layout Standard mc \end_inset
\begin_inset Text \layout Standard 10 \end_inset \begin_inset Text \layout Standard h_split \end_inset
\begin_inset Text \layout Standard 110 \end_inset \begin_inset Text \layout Standard v_split \end_inset
\begin_inset Text \layout Standard 1110 \end_inset \begin_inset Text \layout Standard skip \end_inset
\begin_inset Text \layout Standard 11110 \end_inset \begin_inset Text \layout Standard mcdc \end_inset
\begin_inset Text \layout Standard 11111 \end_inset \begin_inset Text \layout Standard dc \end_inset
\end_inset
\layout Paragraph
For blocks 8x1, 4x1
\layout Standard
\begin_inset Tabular
\begin_inset Text \layout Standard 0 \end_inset \begin_inset Text \layout Standard mc \end_inset
\begin_inset Text \layout Standard 10 \end_inset \begin_inset Text \layout Standard h_split \end_inset
\begin_inset Text \layout Standard 110 \end_inset \begin_inset Text \layout Standard skip \end_inset
\begin_inset Text \layout Standard 1110 \end_inset \begin_inset Text \layout Standard mcdc \end_inset
\begin_inset Text \layout Standard 1111 \end_inset \begin_inset Text \layout Standard dc \end_inset
\end_inset
\layout Paragraph
For blocks 1x8, 1x4
\layout Standard
\begin_inset Tabular
\begin_inset Text \layout Standard 0 \end_inset \begin_inset Text \layout Standard mc \end_inset
\begin_inset Text \layout Standard 10 \end_inset \begin_inset Text \layout Standard v_split \end_inset
\begin_inset Text \layout Standard 110 \end_inset \begin_inset Text \layout Standard skip \end_inset
\begin_inset Text \layout Standard 1110 \end_inset \begin_inset Text \layout Standard mcdc \end_inset
\begin_inset Text \layout Standard 1111 \end_inset \begin_inset Text \layout Standard dc \end_inset
\end_inset
\layout Paragraph
For blocks 2x1, 1x2
\layout Standard
\begin_inset Tabular
\begin_inset Text \layout Standard 0 \end_inset \begin_inset Text \layout Standard mc \end_inset
\begin_inset Text \layout Standard 10 \end_inset \begin_inset Text \layout Standard skip \end_inset
\begin_inset Text \layout Standard 110 \end_inset \begin_inset Text \layout Standard mcdc \end_inset
\begin_inset Text \layout Standard 1110 \end_inset \begin_inset Text \layout Standard dc \end_inset
\begin_inset Text \layout Standard 1111 \end_inset \begin_inset Text \layout Standard esc \end_inset
\end_inset
\layout Section
Applications and Platforms
\layout Standard
The 4XM video codec is intended for gaming applications. It is known to operate on these computing platforms:
\layout Itemize
PC/Microsoft Windows
\layout Itemize
Apple Macintosh
\layout Itemize
Sega Dreamcast
\layout Itemize
Nintendo Game Boy Advance
\layout Standard
While the Dreamcast and the targeted PC/Mac platforms have quite a bit of computing power (at least 200 MHz), the GBA has an ARM RISC CPU running at 16-17 MHz.
\layout Standard
The 4XM coding method seems a little odd in its mixture of YCbCr and RGB colorspaces. In the end, all of the output data is RGB565. It is useful to note that many video game consoles can efficiently manipulate this colorspace with video hardware. By contrast, many video game consoles have no, or very limited, facilities for direct YCbCr rendering, particularly planar YCbCr modes.
\layout Standard
One more note about interframe block addition: One possible approach to implementing this part of the method on console hardware, at least the Sega Dreamcast, would be to fill a texture with all zero values, skip all blocks that are not coded, and fill the coded blocks with the coded RGB565 difference. Then, the final texture could be added to the current frame. This also has the implicit side effect of saturating the addition so that the resulting pixels do not wrap around.
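\layout Standard
The difference between the wrapping word addition described for the motion vector + DC difference mode and a saturating per-channel addition can be illustrated with a short C sketch. This is only an illustration under stated assumptions: the function names (add_wrap, add_saturate_rgb565, rgb565_add_demo) are hypothetical, and the saturating variant is one plausible model of a hardware additive blend, not a description of what any particular console does.

```c
#include <assert.h>
#include <stdint.h>

/* Plain 16-bit word addition: overflow wraps, and a carry out of one
 * color channel spills into the next one. */
uint16_t add_wrap(uint16_t src, uint16_t dc)
{
    return (uint16_t)(src + dc);
}

/* Per-channel addition on RGB565 (5 bits red, 6 bits green, 5 bits
 * blue), clamping each channel at its maximum instead of wrapping. */
uint16_t add_saturate_rgb565(uint16_t src, uint16_t diff)
{
    unsigned r = (src >> 11) + (diff >> 11);
    unsigned g = ((src >> 5) & 0x3Fu) + ((diff >> 5) & 0x3Fu);
    unsigned b = (src & 0x1Fu) + (diff & 0x1Fu);

    if (r > 0x1Fu) r = 0x1Fu;
    if (g > 0x3Fu) g = 0x3Fu;
    if (b > 0x1Fu) b = 0x1Fu;

    return (uint16_t)((r << 11) | (g << 5) | b);
}

/* Small demonstration of where the two variants agree and differ. */
void rgb565_add_demo(void)
{
    /* No channel overflows: both variants give the same result. */
    assert(add_wrap(0x0821, 0x0821) == 0x1042);
    assert(add_saturate_rgb565(0x0821, 0x0821) == 0x1042);

    /* Blue is already at its maximum: the wrapping add carries into
     * green, the saturating add leaves the pixel unchanged. */
    assert(add_wrap(0x001F, 0x0001) == 0x0020);
    assert(add_saturate_rgb565(0x001F, 0x0001) == 0x001F);

    /* White plus one wraps to black, but saturates to white. */
    assert(add_wrap(0xFFFF, 0x0001) == 0x0000);
    assert(add_saturate_rgb565(0xFFFF, 0x0001) == 0xFFFF);
}
```

\layout Standard
The two variants produce identical pixels as long as no channel overflows, so a saturating implementation differs from a wrapping one only for streams that rely on wrap-around.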
\layout Section
Changelog
\layout List
\labelwidthstring 00.00.0000
0.01 2003-06-01
\newline
initial version by Michael Niedermayer
\layout List
\labelwidthstring 00.00.0000
0.02 2003-06-07
\newline
minor changes
\layout List
\labelwidthstring 00.00.0000
0.03 2003-06-08
\newline
peer review, grammar/spelling/punctuation fixes and
\begin_inset Quotes eld
\end_inset
Applications and Platforms
\begin_inset Quotes erd
\end_inset
section by Mike Melanson
\newline
minor changes by Michael
\layout List
\labelwidthstring 00.00.0000
0.04 2003-06-08
\newline
minor changes by Mike Melanson and Michael Niedermayer
\layout Section
Copyright
\layout Standard
Copyright 2003 Michael Niedermayer
\newline
This text can be used under the GNU Free Documentation License or GNU General Public License. See
\begin_inset LatexCommand \htmlurl{http://www.gnu.org/licenses/fdl.txt}
\end_inset
.
\the_end