[FFmpeg-devel] GSOC 2018 qualification task.

Michael Niedermayer michael at niedermayer.cc
Fri Apr 13 01:34:55 EEST 2018


On Fri, Apr 13, 2018 at 02:13:53AM +0530, ANURAG SINGH IIT BHU wrote:
> Hello,
> I have implemented the reviews mentioned on previous patch, now there is no
> need to provide any subtitle file to the filter, I am attaching the
> complete patch of the hellosubs filter.
> 
> Command to run the filter
> ffmpeg -i <videoname> -vf hellosubs=<videoname> helloout.mp4
> 
> 
> Thanks and regards,
> Anurag Singh.
> 
> 
>> 
> On Tue, Apr 10, 2018 at 4:55 AM, Rostislav Pehlivanov <atomnuker at gmail.com>
> wrote:
> 
> > On 9 April 2018 at 19:10, Paul B Mahol <onemda at gmail.com> wrote:
> >
> > > On 4/9/18, Rostislav Pehlivanov <atomnuker at gmail.com> wrote:
> > > > On 9 April 2018 at 03:59, ANURAG SINGH IIT BHU <
> > > > anurag.singh.phy15 at iitbhu.ac.in> wrote:
> > > >
> > > >> This mail is regarding the qualification task assigned to me for the
> > > >> GSOC project
> > > >> in FFmpeg for automatic real-time subtitle generation using speech to
> > > text
> > > >> translation ML model.
> > > >>
> > > >
> > > > i really don't think lavfi is the correct place for such code, nor that
> > > the
> > > > project's repo should contain such code at all.
> > > > This would need to be in another repo and a separate library.
> > >
> > > Why? Are you against ocr filter too?
> > >
> >
> > The OCR filter uses libtesseract so I'm fine with it. Like I said, as long
> > as the actual code to do it is in an external library I don't mind.
> > Mozilla recently released Deep Speech
> > (https://github.com/mozilla/DeepSpeech)
> > which does pretty much exactly speech to text and is considered to have the
> > most accurate one out there. Someone just needs to convert the tensorflow
> > code to something more usable.
> >

>  Makefile       |    1 
>  allfilters.c   |    1 
>  vf_hellosubs.c |  513 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 515 insertions(+)
> 2432f100fddb7ec84e771be8282d4b66e3d1f50a  0001-avfilter-add-hellosubs-filter.patch
> From ac0e09d431ea68aebfaef6e2ed0b450e76d473d9 Mon Sep 17 00:00:00 2001
> From: ddosvulnerability <anurag.singh.phy15 at iitbhu.ac.in>
> Date: Thu, 12 Apr 2018 22:06:43 +0530
> Subject: [PATCH] avfilter: add hellosubs filter.
> 
> ---
>  libavfilter/Makefile       |   1 +
>  libavfilter/allfilters.c   |   1 +
>  libavfilter/vf_hellosubs.c | 513 +++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 515 insertions(+)
>  create mode 100644 libavfilter/vf_hellosubs.c
> 
> diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> index a90ca30..770b1b5 100644
> --- a/libavfilter/Makefile
> +++ b/libavfilter/Makefile
> @@ -331,6 +331,7 @@ OBJS-$(CONFIG_SSIM_FILTER)                   += vf_ssim.o framesync.o
>  OBJS-$(CONFIG_STEREO3D_FILTER)               += vf_stereo3d.o
>  OBJS-$(CONFIG_STREAMSELECT_FILTER)           += f_streamselect.o framesync.o
>  OBJS-$(CONFIG_SUBTITLES_FILTER)              += vf_subtitles.o
> +OBJS-$(CONFIG_HELLOSUBS_FILTER)              += vf_hellosubs.o
>  OBJS-$(CONFIG_SUPER2XSAI_FILTER)             += vf_super2xsai.o
>  OBJS-$(CONFIG_SWAPRECT_FILTER)               += vf_swaprect.o
>  OBJS-$(CONFIG_SWAPUV_FILTER)                 += vf_swapuv.o
> diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
> index 6eac828..a008908 100644
> --- a/libavfilter/allfilters.c
> +++ b/libavfilter/allfilters.c
> @@ -322,6 +322,7 @@ extern AVFilter ff_vf_ssim;
>  extern AVFilter ff_vf_stereo3d;
>  extern AVFilter ff_vf_streamselect;
>  extern AVFilter ff_vf_subtitles;
> +extern AVFilter ff_vf_hellosubs;
>  extern AVFilter ff_vf_super2xsai;
>  extern AVFilter ff_vf_swaprect;
>  extern AVFilter ff_vf_swapuv;
> diff --git a/libavfilter/vf_hellosubs.c b/libavfilter/vf_hellosubs.c
> new file mode 100644
> index 0000000..b994050
> --- /dev/null
> +++ b/libavfilter/vf_hellosubs.c
> @@ -0,0 +1,513 @@
> +/*
> + * Copyright (c) 2011 Baptiste Coudurier
> + * Copyright (c) 2011 Stefano Sabatini
> + * Copyright (c) 2012 Clément Bœsch
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +/**
> + * @file
> + * Libass hellosubs burning filter.
> + *
> + 
> + */
> +
> +#include <ass/ass.h>
> +
> +#include "config.h"
> +#if CONFIG_SUBTITLES_FILTER
> +# include "libavcodec/avcodec.h"
> +# include "libavformat/avformat.h"
> +#endif
> +#include "libavutil/avstring.h"
> +#include "libavutil/imgutils.h"
> +#include "libavutil/opt.h"
> +#include "libavutil/parseutils.h"
> +#include "drawutils.h"
> +#include "avfilter.h"
> +#include "internal.h"
> +#include "formats.h"
> +#include "video.h"
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +
> +typedef struct AssContext {
> +    const AVClass *class;
> +    ASS_Library  *library;
> +    ASS_Renderer *renderer;
> +    ASS_Track    *track;
> +    char *filename;
> +    char *fontsdir;
> +    char *charenc;
> +    char *force_style;
> +    int stream_index;
> +    int alpha;
> +    uint8_t rgba_map[4];
> +    int     pix_step[4];       ///< steps per pixel for each plane of the main output
> +    int original_w, original_h;
> +    int shaping;
> +    FFDrawContext draw;
> +} AssContext;
> +
> +#define OFFSET(x) offsetof(AssContext, x)
> +#define FLAGS AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_VIDEO_PARAM
> +
> +#define COMMON_OPTIONS \
> +    {"filename",       "set the filename of file to read",                         OFFSET(filename),   AV_OPT_TYPE_STRING,     {.str = NULL},  CHAR_MIN, CHAR_MAX, FLAGS }, \
> +    {"f",              "set the filename of file to read",                         OFFSET(filename),   AV_OPT_TYPE_STRING,     {.str = NULL},  CHAR_MIN, CHAR_MAX, FLAGS }, \
> +    {"original_size",  "set the size of the original video (used to scale fonts)", OFFSET(original_w), AV_OPT_TYPE_IMAGE_SIZE, {.str = NULL},  CHAR_MIN, CHAR_MAX, FLAGS }, \
> +    {"fontsdir",       "set the directory containing the fonts to read",           OFFSET(fontsdir),   AV_OPT_TYPE_STRING,     {.str = NULL},  CHAR_MIN, CHAR_MAX, FLAGS }, \
> +    {"alpha",          "enable processing of alpha channel",                       OFFSET(alpha),      AV_OPT_TYPE_BOOL,       {.i64 = 0   },         0,        1, FLAGS }, \
> +
> +/* libass supports a log level ranging from 0 to 7 */
> +static const int ass_libavfilter_log_level_map[] = {
> +    [0] = AV_LOG_FATAL,     /* MSGL_FATAL */
> +    [1] = AV_LOG_ERROR,     /* MSGL_ERR */
> +    [2] = AV_LOG_WARNING,   /* MSGL_WARN */
> +    [3] = AV_LOG_WARNING,   /* <undefined> */
> +    [4] = AV_LOG_INFO,      /* MSGL_INFO */
> +    [5] = AV_LOG_INFO,      /* <undefined> */
> +    [6] = AV_LOG_VERBOSE,   /* MSGL_V */
> +    [7] = AV_LOG_DEBUG,     /* MSGL_DBG2 */
> +};
> +
> +static void ass_log(int ass_level, const char *fmt, va_list args, void *ctx)
> +{
> +    const int ass_level_clip = av_clip(ass_level, 0,
> +        FF_ARRAY_ELEMS(ass_libavfilter_log_level_map) - 1);
> +    const int level = ass_libavfilter_log_level_map[ass_level_clip];
> +
> +    av_vlog(ctx, level, fmt, args);
> +    av_log(ctx, level, "\n");
> +}
> +
> +static av_cold int init(AVFilterContext *ctx)
> +{
> +    AssContext *ass = ctx->priv;
> +
> +    if (!ass->filename) {
> +        av_log(ctx, AV_LOG_ERROR, "No filename provided!\n");
> +        return AVERROR(EINVAL);
> +    }
> +
> +    ass->library = ass_library_init();
> +    if (!ass->library) {
> +        av_log(ctx, AV_LOG_ERROR, "Could not initialize libass.\n");
> +        return AVERROR(EINVAL);
> +    }
> +    ass_set_message_cb(ass->library, ass_log, ctx);
> +
> +    ass_set_fonts_dir(ass->library, ass->fontsdir);
> +
> +    ass->renderer = ass_renderer_init(ass->library);
> +    if (!ass->renderer) {
> +        av_log(ctx, AV_LOG_ERROR, "Could not initialize libass renderer.\n");
> +        return AVERROR(EINVAL);
> +    }
> +
> +    return 0;
> +}
> +
> +static av_cold void uninit(AVFilterContext *ctx)
> +{
> +    AssContext *ass = ctx->priv;
> +
> +    if (ass->track)
> +        ass_free_track(ass->track);
> +    if (ass->renderer)
> +        ass_renderer_done(ass->renderer);
> +    if (ass->library)
> +        ass_library_done(ass->library);
> +}
> +
> +static int query_formats(AVFilterContext *ctx)
> +{
> +    return ff_set_common_formats(ctx, ff_draw_supported_pixel_formats(0));
> +}
> +
> +static int config_input(AVFilterLink *inlink)
> +{
> +    AssContext *ass = inlink->dst->priv;
> +
> +    ff_draw_init(&ass->draw, inlink->format, ass->alpha ? FF_DRAW_PROCESS_ALPHA : 0);
> +
> +    ass_set_frame_size  (ass->renderer, inlink->w, inlink->h);
> +    if (ass->original_w && ass->original_h)
> +        ass_set_aspect_ratio(ass->renderer, (double)inlink->w / inlink->h,
> +                             (double)ass->original_w / ass->original_h);
> +    if (ass->shaping != -1)
> +        ass_set_shaper(ass->renderer, ass->shaping);
> +
> +    return 0;
> +}
> +
> +/* libass stores an RGBA color in the format RRGGBBTT, where TT is the transparency level */
> +#define AR(c)  ( (c)>>24)
> +#define AG(c)  (((c)>>16)&0xFF)
> +#define AB(c)  (((c)>>8) &0xFF)
> +#define AA(c)  ((0xFF-(c)) &0xFF)
> +
> +static void overlay_ass_image(AssContext *ass, AVFrame *picref,
> +                              const ASS_Image *image)
> +{
> +    for (; image; image = image->next) {
> +        uint8_t rgba_color[] = {AR(image->color), AG(image->color), AB(image->color), AA(image->color)};
> +        FFDrawColor color;
> +        ff_draw_color(&ass->draw, &color, rgba_color);
> +        ff_blend_mask(&ass->draw, &color,
> +                      picref->data, picref->linesize,
> +                      picref->width, picref->height,
> +                      image->bitmap, image->stride, image->w, image->h,
> +                      3, 0, image->dst_x, image->dst_y);
> +    }
> +}
> +
> +static int filter_frame(AVFilterLink *inlink, AVFrame *picref)
> +{
> +    AVFilterContext *ctx = inlink->dst;
> +    AVFilterLink *outlink = ctx->outputs[0];
> +    AssContext *ass = ctx->priv;
> +    int detect_change = 0;
> +    double time_ms = picref->pts * av_q2d(inlink->time_base) * 1000;
> +    ASS_Image *image = ass_render_frame(ass->renderer, ass->track,
> +                                        time_ms, &detect_change);
> +
> +    if (detect_change)
> +        av_log(ctx, AV_LOG_DEBUG, "Change happened at time ms:%f\n", time_ms);
> +
> +    overlay_ass_image(ass, picref, image);
> +
> +    return ff_filter_frame(outlink, picref);
> +}
> +
> +static const AVFilterPad ass_inputs[] = {
> +    {
> +        .name             = "default",
> +        .type             = AVMEDIA_TYPE_VIDEO,
> +        .filter_frame     = filter_frame,
> +        .config_props     = config_input,
> +        .needs_writable   = 1,
> +    },
> +    { NULL }
> +};
> +
> +static const AVFilterPad ass_outputs[] = {
> +    {
> +        .name = "default",
> +        .type = AVMEDIA_TYPE_VIDEO,
> +    },
> +    { NULL }
> +};
> +
> +
> +
> +
> +
> +static const AVOption hellosubs_options[] = {
> +    COMMON_OPTIONS
> +    {"charenc",      "set input character encoding", OFFSET(charenc),      AV_OPT_TYPE_STRING, {.str = NULL}, CHAR_MIN, CHAR_MAX, FLAGS},
> +    {"stream_index", "set stream index",             OFFSET(stream_index), AV_OPT_TYPE_INT,    { .i64 = -1 }, -1,       INT_MAX,  FLAGS},
> +    {"si",           "set stream index",             OFFSET(stream_index), AV_OPT_TYPE_INT,    { .i64 = -1 }, -1,       INT_MAX,  FLAGS},
> +    {"force_style",  "force subtitle style",         OFFSET(force_style),  AV_OPT_TYPE_STRING, {.str = NULL}, CHAR_MIN, CHAR_MAX, FLAGS},
> +    {NULL},
> +};
> +
> +static const char * const font_mimetypes[] = {
> +    "application/x-truetype-font",
> +    "application/vnd.ms-opentype",
> +    "application/x-font-ttf",
> +    NULL
> +};
> +
> +static int attachment_is_font(AVStream * st)
> +{
> +    const AVDictionaryEntry *tag = NULL;
> +    int n;
> +
> +    tag = av_dict_get(st->metadata, "mimetype", NULL, AV_DICT_MATCH_CASE);
> +
> +    if (tag) {
> +        for (n = 0; font_mimetypes[n]; n++) {
> +            if (av_strcasecmp(font_mimetypes[n], tag->value) == 0)
> +                return 1;
> +        }
> +    }
> +    return 0;
> +}
> +
> +AVFILTER_DEFINE_CLASS(hellosubs);
> +
> +static av_cold int init_hellosubs(AVFilterContext *ctx)
> +{
> +    int j, ret, sid;long int z=0;int t1=0;
> +    int k = 0;
> +    AVDictionary *codec_opts = NULL;
> +    AVFormatContext *fmt = NULL;
> +    AVCodecContext *dec_ctx = NULL;
> +    AVCodec *dec = NULL;
> +    const AVCodecDescriptor *dec_desc;
> +    AVStream *st;
> +    AVPacket pkt;
> +    AssContext *ass = ctx->priv;

> +    FILE *file;
> +    if ((file = fopen("hello.srt", "r")))

There is no need to access an external file for the task of
drawing a line of text.


> +    {
> +        fclose(file);
> +        
> +    }
> +    else
> +   {
> +   FILE * fp;
> +   fp = fopen ("hello.srt","w");

That's even more true for writing such a file.
It also would not work predictably with multiple filters, because every
instance would read and write the same hello.srt.

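A minimal sketch of the alternative, in plain C rather than the real
libavfilter API (HelloCtx and hello_next are hypothetical names used only for
illustration): each filter instance keeps its state in its own context, so
several instances cannot interfere with each other the way a shared file does.

```c
/* Hypothetical per-instance context; each filter instance owns one. */
typedef struct HelloCtx {
    long count;   /* advancing number shown in the drawn text */
} HelloCtx;

/*
 * Advance and return this instance's counter.
 * No shared file or global variable is touched, so two instances
 * running at the same time stay independent.
 */
static long hello_next(HelloCtx *c)
{
    return ++c->count;
}
```

In an actual filter this kind of state would presumably live in the private
context (ctx->priv), next to the libass handles.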

> +   fprintf (fp, "1\n");
> +   fprintf (fp, "00:00:05,615 --> 00:00:08,083\n");
> +   fprintf (fp, "%s",ass->filename);
> +   fclose (fp);
> +
> +   char cmd[300];
> +   strcpy(cmd,"ffmpeg -i ");
> +   strcat(cmd,ass->filename);
> +   char fn[200];
> +   strcpy(fn,ass->filename);
> +   strcat(cmd," -vf hellosubs=hello.srt helloout");
> +   int m=0;
> +   for(int w=(strlen(fn)-1);w>=0;w--)
> +   {if (fn[w]=='.')
> +   {m=w;
> +   break;}}
> +   char join[5];
> +   for(int loc=m;loc<strlen(fn);loc++)
> +   join[loc-m]=fn[loc];
> +   char rem[100];
> +   char join1[100];
> +   strcpy(join1,join);
> +   strcpy(rem,"helloout");
> +   strcat(rem,join1);
> +   remove(rem);
> +  
> +   strcat(cmd,join);
> +   system(cmd);
> +   remove("hello.srt");
> +
> +exit(0);

Also, a filter cannot call exit(); in fact, a library like libavfilter must
never call exit().
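
As a rough sketch of the usual pattern (plain C, not the actual libavfilter
API; HelloState and hello_init are hypothetical names, and the errno-style
return value merely stands in for what AVERROR() does in FFmpeg), an init
function reports failure through its return value and leaves the decision
about what to do next to the caller:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-instance state, for illustration only. */
typedef struct HelloState {
    const char *text;   /* text the instance will draw */
} HelloState;

/*
 * Returns 0 on success or a negative errno-style code on failure.
 * The process is never terminated from inside library code.
 */
static int hello_init(HelloState *s, const char *text)
{
    if (!s || !text || !text[0])
        return -EINVAL;   /* report the error instead of calling exit() */
    s->text = text;
    return 0;
}
```

The application using the library can then log the error, clean up, and
continue or stop as it sees fit.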


> +}
> +
> +    /* Init libass */
> +    ret = init(ctx);
> +    if (ret < 0)
> +        return ret;
> +    ass->track = ass_new_track(ass->library);
> +    if (!ass->track) {
> +        av_log(ctx, AV_LOG_ERROR, "Could not create a libass track\n");
> +        return AVERROR(EINVAL);
> +    }
> +
> +

> +    ret = avformat_open_input(&fmt, ass->filename, NULL, NULL);
> +    if (ret < 0) {
> +        av_log(ctx, AV_LOG_ERROR, "Unable to open %s\n", ass->filename);
> +        
> +    }

Also, no function from libavformat is needed; this filter should draw a line of
text, not demux a file.
Maybe you misinterpreted my previous review. All unneeded code, such as every
bit of libavformat use, must be removed.

You seem to be trying to work around what I suggest rather than actually
solving the issues raised, for example by writing a file to make up for the
impossibility of accessing some input file directly. There really is no file,
and none can be written.

The goal of this filter was to create subtitle packets/frames and pass them on.
As this turned out to be too hard in the time available, the simpler goal now
is to draw that text on a video frame.

The filter gets video frames on its input and passes them on to its output.
In between, it should write that "Hello world" text with the advancing number
onto each frame.
For this there is no need to access any files or use any demuxers.
You can use the libass code from the subtitles filter as you do, but that code
uses an external subtitle file. You have to change this so that it no longer
uses an external file or demuxes it with libavformat. These steps are not
needed and are incorrect for this task.

I suggest you remove the #include "libavformat/*" lines; that way the
resulting compile errors will show you exactly what must be removed.
This should also make the code simpler; there is just no need for this
baggage between avcodec/libass and what you want to draw.

The libavformat code is there to read a subtitle file.
There is no subtitle file. The filter should just draw a line saying
"Hello world" with a number.
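
To illustrate the idea (a sketch only, in plain C with no FFmpeg or libass
calls; format_ass_time and format_hello_event are made-up helper names, and
the "Default" style name is an assumption), the subtitle event for a given
time range can be built entirely in memory, with no file and no demuxer
involved:

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Format a time in milliseconds as an ASS timestamp, H:MM:SS.cc,
 * where cc is centiseconds. Returns 0 on success, -1 if buf is too small.
 */
static int format_ass_time(char *buf, size_t size, long long ms)
{
    long long cs = ms / 10;   /* ASS timestamps use centiseconds */
    int n = snprintf(buf, size, "%lld:%02lld:%02lld.%02lld",
                     cs / 360000, (cs / 6000) % 60, (cs / 100) % 60, cs % 100);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/*
 * Build one ASS "Dialogue:" line saying "Hello world <count>".
 * Everything happens in the caller's buffer; nothing is read from
 * or written to disk. Returns 0 on success, -1 if buf is too small.
 */
static int format_hello_event(char *buf, size_t size,
                              long long start_ms, long long end_ms, long count)
{
    char start[32], end[32];
    int n;

    if (format_ass_time(start, sizeof(start), start_ms) < 0 ||
        format_ass_time(end, sizeof(end), end_ms) < 0)
        return -1;

    /* Fields: Layer,Start,End,Style,Name,MarginL,MarginR,MarginV,Effect,Text */
    n = snprintf(buf, size,
                 "Dialogue: 0,%s,%s,Default,,0,0,0,,Hello world %ld",
                 start, end, count);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

A line like this could presumably be handed to libass with ass_process_data()
once the track has a script header and a style, but that should be checked
against the libass documentation rather than taken from this sketch.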


[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Dictatorship: All citizens are under surveillance, all their steps and
actions recorded, for the politicians to enforce control.
Democracy: All politicians are under surveillance, all their steps and
actions recorded, for the citizens to enforce control.