[FFmpeg-devel] [PATCH 2/5] libx264: Update ROI behaviour to match documentation

Tue Mar 5 02:52:19 EET 2019

On 28/02/2019 06:38, Guo, Yejun wrote:
>> -----Original Message-----
>> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On Behalf
>> Of mypopy at gmail.com
>> Sent: Thursday, February 28, 2019 11:26 AM
>> To: FFmpeg development discussions and patches <ffmpeg-
>> devel at ffmpeg.org>
>> Subject: Re: [FFmpeg-devel] [PATCH 2/5] libx264: Update ROI behaviour to
>> match documentation
>>
>> On Thu, Feb 28, 2019 at 10:53 AM Guo, Yejun <yejun.guo at intel.com> wrote:
>>>> -----Original Message-----
>>>> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On
>> Behalf
>>>> Of Mark Thompson
>>>> Sent: Thursday, February 28, 2019 6:00 AM
>>>> To: ffmpeg-devel at ffmpeg.org
>>>> Subject: [FFmpeg-devel] [PATCH 2/5] libx264: Update ROI behaviour to
>>>> match documentation
>>>>
>>>> Fix the quantisation offset - use the whole range, and don't change the
>>>> offset size based on bit depth.
>>>>
>>>> Use correct bottom/right edge locations (they are offsets from
>> bottom/right,
>>>> not from top/left).
>>>>
>>>> Iterate the list in reverse order.  The regions are in order of
>> decreasing
>>>> importance, so the most important must be applied last.
>>>> ---
>>>>  libavcodec/libx264.c | 50 ++++++++++++++++++++++++--------------------
>>>>  1 file changed, 27 insertions(+), 23 deletions(-)
>>>>
>>>> diff --git a/libavcodec/libx264.c b/libavcodec/libx264.c
>>>> index a3493f393d..475719021e 100644
>>>> --- a/libavcodec/libx264.c
>>>> +++ b/libavcodec/libx264.c
>>>> @@ -285,16 +285,18 @@ static int X264_frame(AVCodecContext *ctx,
>>>> AVPacket *pkt, const AVFrame *frame,
>>>>      int nnal, i, ret;
>>>>      x264_picture_t pic_out = {0};
>>>>      int pict_type;
>>>> +    int bit_depth;
>>>>      int64_t *out_opaque;
>>>>      AVFrameSideData *sd;
>>>>
>>>>      x264_picture_init( &x4->pic );
>>>>      x4->pic.img.i_csp   = x4->params.i_csp;
>>>>  #if X264_BUILD >= 153
>>>> -    if (x4->params.i_bitdepth > 8)
>>>> +    bit_depth = x4->params.i_bitdepth;
>>>>  #else
>>>> -    if (x264_bit_depth > 8)
>>>> +    bit_depth = x264_bit_depth;
>>>>  #endif
>>>> +    if (bit_depth > 8)
>>>>          x4->pic.img.i_csp |= X264_CSP_HIGH_DEPTH;
>>>>      x4->pic.img.i_plane = avfmt2_num_planes(ctx->pix_fmt);
>>>>
>>>> @@ -359,45 +361,47 @@ static int X264_frame(AVCodecContext *ctx,
>>>> AVPacket *pkt, const AVFrame *frame,
>>>>                  if (frame->interlaced_frame == 0) {
>>>>                      int mbx = (frame->width + MB_SIZE - 1) / MB_SIZE;
>>>>                      int mby = (frame->height + MB_SIZE - 1) / MB_SIZE;
>>>> +                    int qp_range = 51 + 6 * (bit_depth - 8);
>>>
>>> just found following from "$ ./x264 --fullhelp", not sure what 81 means
>> here. Shall we change our qoffset formula accordingly?
>>>       --qpmin <integer>       Set min QP [0]
>>>       --qpmax <integer>       Set max QP [81]

libx264 applies a fixed offset to the QP range to make it nonnegative (by adding QpBdOffsetY).  The maximum of 81 therefore corresponds to a bit depth of (81-51)/6+8 = 13 bits, the higher values only being valid at the higher depths.

I guess that's set for a higher depth than libx264 actually supports to avoid any future problems with range (e.g. 12-bit support would not be an unreasonable feature).

>>>>                      int nb_rois;
>>>> -                    AVRegionOfInterest* roi;
>>>> -                    float* qoffsets;
>>>> +                    const AVRegionOfInterest *roi;
>>>> +                    float *qoffsets;
>>>>                      qoffsets = av_mallocz_array(mbx * mby,
>> sizeof(*qoffsets));
>>>>                      if (!qoffsets)
>>>>                          return AVERROR(ENOMEM);
>>>>
>>>> -                    nb_rois = sd->size / sizeof(AVRegionOfInterest);
>>>> -                    roi = (AVRegionOfInterest*)sd->data;
>>>> -                    for (int count = 0; count < nb_rois; count++) {
>>>> -                        int starty = FFMIN(mby, roi->top / MB_SIZE);
>>>> -                        int endy   = FFMIN(mby, (roi->bottom + MB_SIZE
>> - 1)/ MB_SIZE);
>>>> -                        int startx = FFMIN(mbx, roi->left / MB_SIZE);
>>>> -                        int endx   = FFMIN(mbx, (roi->right + MB_SIZE
>> - 1)/ MB_SIZE);
>>>> +                    roi = (const AVRegionOfInterest*)sd->data;
>>>> +                    if (!roi->self_size || sd->size % roi->self_size
>> != 0) {
>>>> +                        av_log(ctx, AV_LOG_ERROR, "Invalid
>>>> AVRegionOfInterest.self_size.\n");
>>>> +                        return AVERROR(EINVAL);
>>>> +                    }
>>>> +                    nb_rois = sd->size / roi->self_size;
>>>> +
>>>> +                    // This list must be iterated in reverse because
>> regions are
>>>> +                    // defined in order of decreasing importance.
>>>
>>> Nit:   the reason may be more straight forward.
>>> This list must be iterated in reverse because: when overlapping regions
>> are defined, the first region containing a given area of the frame applies.

Right, yes.  I've updated the comment appropriately.

>>>> +                    for (int i = nb_rois - 1; i >= 0; i--) {
>>>> +                        int startx, endx, starty, endy;
>>>>                          float qoffset;
>>>>
>>>> +                        roi = (const AVRegionOfInterest*)(sd->data +
>> roi->self_size * i);
>>>> +
>>>> +                        starty = av_clip(roi->top / MB_SIZE, 0, mby);
>>>> +                        endy   = av_clip((frame->height - roi->bottom
>> + MB_SIZE - 1) /
>>>> MB_SIZE, 0, mby);
>>>> +                        startx = av_clip(roi->left / MB_SIZE, 0, mbx);
>>>> +                        endx   = av_clip((frame->width - roi->right +
>> MB_SIZE - 1) /
>>>> MB_SIZE, 0, mbx);
>>>
>>> not quite understand why endx/endy is calculated so.
>>> For example, for a 1920x1080 frame, and roi->top is 0, and roi->bottom is
>> 1080, then,
>>> starty is 0, endy is 0, which make the following loop does not work.
>>
>> I think Mark use the (left/top) and (right/bottom) as the offset, like this:
>> in the fig, (left/top) == (s1x, s1y), (right/bottom) ==(s2x,s2y)
>>
>>
>> +-----------------------+------> w (x)
>> |          ^            |
>> |          | s1y        |
>> |          V            |
>> |      +***********+    |
>> | s1x  *           * s2x|
>> |<-->  *  ROI      *<-->|
>> |      *           *    |
>> |      +***********+    |
>> |          ^            |
>> |          | s2y        |
>> |          V            |
>> |-----------------------+
>> |
>> V
>>
>> h (y)
>>
> 
> thanks, I guess so.  But, I'm not quite understand why use this style. 

This definition is what is in the current documentation: <http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavutil/frame.h;h=8aa3e88367ad3605033dbfe92a431d97e0834023;hb=HEAD#l215>.  I followed that when writing the VAAPI support, and only afterwards realised that it didn't match what libx264 was doing.

I wouldn't mind a different way to define the rectangle if everyone agrees.  This method matches cropping in AVFrame and various codecs like H.264.  The current libx26[45] code instead defines the coordinates of the top left and bottom right of the region.  A third alternative would be the coordinates of the top left combined with a width and height, which would match some other filters as suggested by Moritz in the addroi thread (that was primarily talking about the filter option interface, but I don't think it's unreasonable to want the API to match).

Thanks

- Mark