[FFmpeg-devel] GSOC 2018 qualification task.

Sun Apr 15 17:06:09 EEST 2018

Hello Sir,

I have implemented the advised changes for the hellosubs filter for the
qualification task, which writes "Hello World" with the time on frames. The
filter no longer uses libavformat; it uses libfreetype to draw over video
frames. I have attached the complete patch.

libfreetype must be enabled to run the filter, and libfontconfig as well
if no font file is provided.

Command to run the filter:
ffmpeg -i <videoname> -vf hellosubs <outputfilename>
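
For illustration, here is a minimal sketch of how the per-frame text could be
built from a frame's timestamp. It is not the code from the attached patch:
Rational is a hypothetical stand-in for FFmpeg's AVRational, and
format_hello_label is a made-up helper name, both used only so that the sketch
compiles without the FFmpeg headers.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for FFmpeg's AVRational (assumption, not the real type). */
typedef struct Rational {
    int num;
    int den;
} Rational;

/*
 * Write "Hello World <seconds>" into buf for a frame with timestamp pts,
 * where pts is counted in units of the time base tb.
 * Returns the snprintf() result, or -1 if the time base is invalid.
 */
static int format_hello_label(char *buf, size_t size, int64_t pts, Rational tb)
{
    int64_t seconds;

    if (tb.den == 0)
        return -1;

    /* pts * time_base is the frame time in seconds (truncated here). */
    seconds = pts * tb.num / tb.den;

    return snprintf(buf, size, "Hello World %lld", (long long)seconds);
}
```

With a time base of 1/25, a pts of 130 gives the label "Hello World 5". No
file access or demuxing is involved; the text depends only on the frame's own
timestamp.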

Thanks and regards,
Anurag Singh.

On Fri, Apr 13, 2018 at 9:39 AM, ANURAG SINGH IIT BHU <
anurag.singh.phy15 at iitbhu.ac.in> wrote:

> Thank you sir, I'll implement the suggested changes as soon as possible.
>
> On Fri, Apr 13, 2018 at 4:04 AM, Michael Niedermayer <
> michael at niedermayer.cc> wrote:
>
>> On Fri, Apr 13, 2018 at 02:13:53AM +0530, ANURAG SINGH IIT BHU wrote:
>> > Hello,
>> > I have addressed the review comments on the previous patch; there is now no
>> > need to provide any subtitle file to the filter. I am attaching the
>> > complete patch of the hellosubs filter.
>> >
>> > Command to run the filter
>> > ffmpeg -i <videoname> -vf hellosubs=<videoname> helloout.mp4
>> >
>> >
>> > Thanks and regards,
>> > Anurag Singh.
>> >
>> > On Tue, Apr 10, 2018 at 4:55 AM, Rostislav Pehlivanov <atomnuker at gmail.com>
>> > wrote:
>> >
>> > > On 9 April 2018 at 19:10, Paul B Mahol <onemda at gmail.com> wrote:
>> > >
>> > > > On 4/9/18, Rostislav Pehlivanov <atomnuker at gmail.com> wrote:
>> > > > > On 9 April 2018 at 03:59, ANURAG SINGH IIT BHU <
>> > > > > anurag.singh.phy15 at iitbhu.ac.in> wrote:
>> > > > >
>> > > > >> This mail is regarding the qualification task assigned to me for the
>> > > > >> GSOC project in FFmpeg for automatic real-time subtitle generation
>> > > > >> using a speech-to-text translation ML model.
>> > > > >>
>> > > > >
>> > > > > I really don't think lavfi is the correct place for such code, nor that
>> > > > > the project's repo should contain such code at all.
>> > > > > This would need to be in another repo and a separate library.
>> > > >
>> > > > Why? Are you against ocr filter too?
>> > > >
>> > >
>> > > The OCR filter uses libtesseract so I'm fine with it. Like I said, as long
>> > > as the actual code to do it is in an external library I don't mind.
>> > > Mozilla recently released Deep Speech
>> > > (https://github.com/mozilla/DeepSpeech), which does pretty much exactly
>> > > speech to text and is considered the most accurate one out there.
>> > > Someone just needs to convert the tensorflow code to something more usable.
>> > > _______________________________________________
>> > > ffmpeg-devel mailing list
>> > > ffmpeg-devel at ffmpeg.org
>> > > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>> > >
>>
>> >  Makefile       |    1
>> >  allfilters.c   |    1
>> >  vf_hellosubs.c |  513 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >  3 files changed, 515 insertions(+)
>> > 2432f100fddb7ec84e771be8282d4b66e3d1f50a  0001-avfilter-add-hellosubs-filter.patch
>> > From ac0e09d431ea68aebfaef6e2ed0b450e76d473d9 Mon Sep 17 00:00:00 2001
>> > From: ddosvulnerability <anurag.singh.phy15 at iitbhu.ac.in>
>> > Date: Thu, 12 Apr 2018 22:06:43 +0530
>> > Subject: [PATCH] avfilter: add hellosubs filter.
>> >
>> > ---
>> >  libavfilter/Makefile       |   1 +
>> >  libavfilter/allfilters.c   |   1 +
>> >  libavfilter/vf_hellosubs.c | 513 +++++++++++++++++++++++++++++++++++++++++++++
>> >  3 files changed, 515 insertions(+)
>> >  create mode 100644 libavfilter/vf_hellosubs.c
>> >
>> > diff --git a/libavfilter/Makefile b/libavfilter/Makefile
>> > index a90ca30..770b1b5 100644
>> > --- a/libavfilter/Makefile
>> > +++ b/libavfilter/Makefile
>> > @@ -331,6 +331,7 @@ OBJS-$(CONFIG_SSIM_FILTER)                   += vf_ssim.o framesync.o
>> >  OBJS-$(CONFIG_STEREO3D_FILTER)               += vf_stereo3d.o
>> >  OBJS-$(CONFIG_STREAMSELECT_FILTER)           += f_streamselect.o framesync.o
>> >  OBJS-$(CONFIG_SUBTITLES_FILTER)              += vf_subtitles.o
>> > +OBJS-$(CONFIG_HELLOSUBS_FILTER)              += vf_hellosubs.o
>> >  OBJS-$(CONFIG_SUPER2XSAI_FILTER)             += vf_super2xsai.o
>> >  OBJS-$(CONFIG_SWAPRECT_FILTER)               += vf_swaprect.o
>> >  OBJS-$(CONFIG_SWAPUV_FILTER)                 += vf_swapuv.o
>> > diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
>> > index 6eac828..a008908 100644
>> > --- a/libavfilter/allfilters.c
>> > +++ b/libavfilter/allfilters.c
>> > @@ -322,6 +322,7 @@ extern AVFilter ff_vf_ssim;
>> >  extern AVFilter ff_vf_stereo3d;
>> >  extern AVFilter ff_vf_streamselect;
>> >  extern AVFilter ff_vf_subtitles;
>> > +extern AVFilter ff_vf_hellosubs;
>> >  extern AVFilter ff_vf_super2xsai;
>> >  extern AVFilter ff_vf_swaprect;
>> >  extern AVFilter ff_vf_swapuv;
>> > diff --git a/libavfilter/vf_hellosubs.c b/libavfilter/vf_hellosubs.c
>> > new file mode 100644
>> > index 0000000..b994050
>> > --- /dev/null
>> > +++ b/libavfilter/vf_hellosubs.c
>> > @@ -0,0 +1,513 @@
>> > +/*
>> > + * Copyright (c) 2011 Baptiste Coudurier
>> > + * Copyright (c) 2011 Stefano Sabatini
>> > + * Copyright (c) 2012 Clément Bœsch
>> > + *
>> > + * This file is part of FFmpeg.
>> > + *
>> > + * FFmpeg is free software; you can redistribute it and/or
>> > + * modify it under the terms of the GNU Lesser General Public
>> > + * License as published by the Free Software Foundation; either
>> > + * version 2.1 of the License, or (at your option) any later version.
>> > + *
>> > + * FFmpeg is distributed in the hope that it will be useful,
>> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> > + * Lesser General Public License for more details.
>> > + *
>> > + * You should have received a copy of the GNU Lesser General Public
>> > + * License along with FFmpeg; if not, write to the Free Software
>> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
>> > + */
>> > +
>> > +/**
>> > + * @file
>> > + * Libass hellosubs burning filter.
>> > + *
>> > +
>> > + */
>> > +
>> > +#include <ass/ass.h>
>> > +
>> > +#include "config.h"
>> > +#if CONFIG_SUBTITLES_FILTER
>> > +# include "libavcodec/avcodec.h"
>> > +# include "libavformat/avformat.h"
>> > +#endif
>> > +#include "libavutil/avstring.h"
>> > +#include "libavutil/imgutils.h"
>> > +#include "libavutil/opt.h"
>> > +#include "libavutil/parseutils.h"
>> > +#include "drawutils.h"
>> > +#include "avfilter.h"
>> > +#include "internal.h"
>> > +#include "formats.h"
>> > +#include "video.h"
>> > +#include <stdio.h>
>> > +#include <stdlib.h>
>> > +#include <string.h>
>> > +
>> > +typedef struct AssContext {
>> > +    const AVClass *class;
>> > +    ASS_Library  *library;
>> > +    ASS_Renderer *renderer;
>> > +    ASS_Track    *track;
>> > +    char *filename;
>> > +    char *fontsdir;
>> > +    char *charenc;
>> > +    char *force_style;
>> > +    int stream_index;
>> > +    int alpha;
>> > +    uint8_t rgba_map[4];
>> > +    int     pix_step[4];       ///< steps per pixel for each plane of the main output
>> > +    int original_w, original_h;
>> > +    int shaping;
>> > +    FFDrawContext draw;
>> > +} AssContext;
>> > +
>> > +#define OFFSET(x) offsetof(AssContext, x)
>> > +#define FLAGS AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_VIDEO_PARAM
>> > +
>> > +#define COMMON_OPTIONS \
>> > +    {"filename",       "set the filename of file to read",                         OFFSET(filename),   AV_OPT_TYPE_STRING,     {.str = NULL},  CHAR_MIN, CHAR_MAX, FLAGS }, \
>> > +    {"f",              "set the filename of file to read",                         OFFSET(filename),   AV_OPT_TYPE_STRING,     {.str = NULL},  CHAR_MIN, CHAR_MAX, FLAGS }, \
>> > +    {"original_size",  "set the size of the original video (used to scale fonts)", OFFSET(original_w), AV_OPT_TYPE_IMAGE_SIZE, {.str = NULL},  CHAR_MIN, CHAR_MAX, FLAGS }, \
>> > +    {"fontsdir",       "set the directory containing the fonts to read",           OFFSET(fontsdir),   AV_OPT_TYPE_STRING,     {.str = NULL},  CHAR_MIN, CHAR_MAX, FLAGS }, \
>> > +    {"alpha",          "enable processing of alpha channel",                       OFFSET(alpha),      AV_OPT_TYPE_BOOL,       {.i64 = 0   },         0,        1, FLAGS }, \
>> > +
>> > +/* libass supports a log level ranging from 0 to 7 */
>> > +static const int ass_libavfilter_log_level_map[] = {
>> > +    [0] = AV_LOG_FATAL,     /* MSGL_FATAL */
>> > +    [1] = AV_LOG_ERROR,     /* MSGL_ERR */
>> > +    [2] = AV_LOG_WARNING,   /* MSGL_WARN */
>> > +    [3] = AV_LOG_WARNING,   /* <undefined> */
>> > +    [4] = AV_LOG_INFO,      /* MSGL_INFO */
>> > +    [5] = AV_LOG_INFO,      /* <undefined> */
>> > +    [6] = AV_LOG_VERBOSE,   /* MSGL_V */
>> > +    [7] = AV_LOG_DEBUG,     /* MSGL_DBG2 */
>> > +};
>> > +
>> > +static void ass_log(int ass_level, const char *fmt, va_list args, void *ctx)
>> > +{
>> > +    const int ass_level_clip = av_clip(ass_level, 0,
>> > +        FF_ARRAY_ELEMS(ass_libavfilter_log_level_map) - 1);
>> > +    const int level = ass_libavfilter_log_level_map[ass_level_clip];
>> > +
>> > +    av_vlog(ctx, level, fmt, args);
>> > +    av_log(ctx, level, "\n");
>> > +}
>> > +
>> > +static av_cold int init(AVFilterContext *ctx)
>> > +{
>> > +    AssContext *ass = ctx->priv;
>> > +
>> > +    if (!ass->filename) {
>> > +        av_log(ctx, AV_LOG_ERROR, "No filename provided!\n");
>> > +        return AVERROR(EINVAL);
>> > +    }
>> > +
>> > +    ass->library = ass_library_init();
>> > +    if (!ass->library) {
>> > +        av_log(ctx, AV_LOG_ERROR, "Could not initialize libass.\n");
>> > +        return AVERROR(EINVAL);
>> > +    }
>> > +    ass_set_message_cb(ass->library, ass_log, ctx);
>> > +
>> > +    ass_set_fonts_dir(ass->library, ass->fontsdir);
>> > +
>> > +    ass->renderer = ass_renderer_init(ass->library);
>> > +    if (!ass->renderer) {
>> > +        av_log(ctx, AV_LOG_ERROR, "Could not initialize libass renderer.\n");
>> > +        return AVERROR(EINVAL);
>> > +    }
>> > +
>> > +    return 0;
>> > +}
>> > +
>> > +static av_cold void uninit(AVFilterContext *ctx)
>> > +{
>> > +    AssContext *ass = ctx->priv;
>> > +
>> > +    if (ass->track)
>> > +        ass_free_track(ass->track);
>> > +    if (ass->renderer)
>> > +        ass_renderer_done(ass->renderer);
>> > +    if (ass->library)
>> > +        ass_library_done(ass->library);
>> > +}
>> > +
>> > +static int query_formats(AVFilterContext *ctx)
>> > +{
>> > +    return ff_set_common_formats(ctx, ff_draw_supported_pixel_formats(0));
>> > +}
>> > +
>> > +static int config_input(AVFilterLink *inlink)
>> > +{
>> > +    AssContext *ass = inlink->dst->priv;
>> > +
>> > +    ff_draw_init(&ass->draw, inlink->format, ass->alpha ? FF_DRAW_PROCESS_ALPHA : 0);
>> > +
>> > +    ass_set_frame_size  (ass->renderer, inlink->w, inlink->h);
>> > +    if (ass->original_w && ass->original_h)
>> > +        ass_set_aspect_ratio(ass->renderer, (double)inlink->w / inlink->h,
>> > +                             (double)ass->original_w / ass->original_h);
>> > +    if (ass->shaping != -1)
>> > +        ass_set_shaper(ass->renderer, ass->shaping);
>> > +
>> > +    return 0;
>> > +}
>> > +
>> > +/* libass stores an RGBA color in the format RRGGBBTT, where TT is the transparency level */
>> > +#define AR(c)  ( (c)>>24)
>> > +#define AG(c)  (((c)>>16)&0xFF)
>> > +#define AB(c)  (((c)>>8) &0xFF)
>> > +#define AA(c)  ((0xFF-(c)) &0xFF)
>> > +
>> > +static void overlay_ass_image(AssContext *ass, AVFrame *picref,
>> > +                              const ASS_Image *image)
>> > +{
>> > +    for (; image; image = image->next) {
>> > +        uint8_t rgba_color[] = {AR(image->color), AG(image->color), AB(image->color), AA(image->color)};
>> > +        FFDrawColor color;
>> > +        ff_draw_color(&ass->draw, &color, rgba_color);
>> > +        ff_blend_mask(&ass->draw, &color,
>> > +                      picref->data, picref->linesize,
>> > +                      picref->width, picref->height,
>> > +                      image->bitmap, image->stride, image->w, image->h,
>> > +                      3, 0, image->dst_x, image->dst_y);
>> > +    }
>> > +}
>> > +
>> > +static int filter_frame(AVFilterLink *inlink, AVFrame *picref)
>> > +{
>> > +    AVFilterContext *ctx = inlink->dst;
>> > +    AVFilterLink *outlink = ctx->outputs[0];
>> > +    AssContext *ass = ctx->priv;
>> > +    int detect_change = 0;
>> > +    double time_ms = picref->pts * av_q2d(inlink->time_base) * 1000;
>> > +    ASS_Image *image = ass_render_frame(ass->renderer, ass->track,
>> > +                                        time_ms, &detect_change);
>> > +
>> > +    if (detect_change)
>> > +        av_log(ctx, AV_LOG_DEBUG, "Change happened at time ms:%f\n", time_ms);
>> > +
>> > +    overlay_ass_image(ass, picref, image);
>> > +
>> > +    return ff_filter_frame(outlink, picref);
>> > +}
>> > +
>> > +static const AVFilterPad ass_inputs[] = {
>> > +    {
>> > +        .name             = "default",
>> > +        .type             = AVMEDIA_TYPE_VIDEO,
>> > +        .filter_frame     = filter_frame,
>> > +        .config_props     = config_input,
>> > +        .needs_writable   = 1,
>> > +    },
>> > +    { NULL }
>> > +};
>> > +
>> > +static const AVFilterPad ass_outputs[] = {
>> > +    {
>> > +        .name = "default",
>> > +        .type = AVMEDIA_TYPE_VIDEO,
>> > +    },
>> > +    { NULL }
>> > +};
>> > +
>> > +
>> > +
>> > +
>> > +
>> > +static const AVOption hellosubs_options[] = {
>> > +    COMMON_OPTIONS
>> > +    {"charenc",      "set input character encoding", OFFSET(charenc),      AV_OPT_TYPE_STRING, {.str = NULL}, CHAR_MIN, CHAR_MAX, FLAGS},
>> > +    {"stream_index", "set stream index",             OFFSET(stream_index), AV_OPT_TYPE_INT,    { .i64 = -1 }, -1,       INT_MAX,  FLAGS},
>> > +    {"si",           "set stream index",             OFFSET(stream_index), AV_OPT_TYPE_INT,    { .i64 = -1 }, -1,       INT_MAX,  FLAGS},
>> > +    {"force_style",  "force subtitle style",         OFFSET(force_style),  AV_OPT_TYPE_STRING, {.str = NULL}, CHAR_MIN, CHAR_MAX, FLAGS},
>> > +    {NULL},
>> > +};
>> > +
>> > +static const char * const font_mimetypes[] = {
>> > +    "application/x-truetype-font",
>> > +    "application/vnd.ms-opentype",
>> > +    "application/x-font-ttf",
>> > +    NULL
>> > +};
>> > +
>> > +static int attachment_is_font(AVStream * st)
>> > +{
>> > +    const AVDictionaryEntry *tag = NULL;
>> > +    int n;
>> > +
>> > +    tag = av_dict_get(st->metadata, "mimetype", NULL, AV_DICT_MATCH_CASE);
>> > +
>> > +    if (tag) {
>> > +        for (n = 0; font_mimetypes[n]; n++) {
>> > +            if (av_strcasecmp(font_mimetypes[n], tag->value) == 0)
>> > +                return 1;
>> > +        }
>> > +    }
>> > +    return 0;
>> > +}
>> > +
>> > +AVFILTER_DEFINE_CLASS(hellosubs);
>> > +
>> > +static av_cold int init_hellosubs(AVFilterContext *ctx)
>> > +{
>> > +    int j, ret, sid;long int z=0;int t1=0;
>> > +    int k = 0;
>> > +    AVDictionary *codec_opts = NULL;
>> > +    AVFormatContext *fmt = NULL;
>> > +    AVCodecContext *dec_ctx = NULL;
>> > +    AVCodec *dec = NULL;
>> > +    const AVCodecDescriptor *dec_desc;
>> > +    AVStream *st;
>> > +    AVPacket pkt;
>> > +    AssContext *ass = ctx->priv;
>>
>> > +    FILE *file;
>> > +    if ((file = fopen("hello.srt", "r")))
>>
>> there is no need for accessing an external file for the task of
>> drawing a line of text.
>>
>>
>> > +    {
>> > +        fclose(file);
>> > +
>> > +    }
>> > +    else
>> > +   {
>> > +   FILE * fp;
>> > +   fp = fopen ("hello.srt","w");
>>
>> That's even more true for writing such a file.
>> It also would not work predictably with multiple filters.
>>
>>
>> > +   fprintf (fp, "1\n");
>> > +   fprintf (fp, "00:00:05,615 --> 00:00:08,083\n");
>> > +   fprintf (fp, "%s",ass->filename);
>> > +   fclose (fp);
>> > +
>> > +   char cmd[300];
>> > +   strcpy(cmd,"ffmpeg -i ");
>> > +   strcat(cmd,ass->filename);
>> > +   char fn[200];
>> > +   strcpy(fn,ass->filename);
>> > +   strcat(cmd," -vf hellosubs=hello.srt helloout");
>> > +   int m=0;
>> > +   for(int w=(strlen(fn)-1);w>=0;w--)
>> > +   {if (fn[w]=='.')
>> > +   {m=w;
>> > +   break;}}
>> > +   char join[5];
>> > +   for(int loc=m;loc<strlen(fn);loc++)
>> > +   join[loc-m]=fn[loc];
>> > +   char rem[100];
>> > +   char join1[100];
>> > +   strcpy(join1,join);
>> > +   strcpy(rem,"helloout");
>> > +   strcat(rem,join1);
>> > +   remove(rem);
>> > +
>> > +   strcat(cmd,join);
>> > +   system(cmd);
>> > +   remove("hello.srt");
>> > +
>> > +exit(0);
>>
>> Also, a filter cannot call exit(); in fact, a library like libavfilter must
>> not call exit().
>>
>>
>> > +}
>> > +
>> > +    /* Init libass */
>> > +    ret = init(ctx);
>> > +    if (ret < 0)
>> > +        return ret;
>> > +    ass->track = ass_new_track(ass->library);
>> > +    if (!ass->track) {
>> > +        av_log(ctx, AV_LOG_ERROR, "Could not create a libass track\n");
>> > +        return AVERROR(EINVAL);
>> > +    }
>> > +
>> > +
>>
>> > +    ret = avformat_open_input(&fmt, ass->filename, NULL, NULL);
>> > +    if (ret < 0) {
>> > +        av_log(ctx, AV_LOG_ERROR, "Unable to open %s\n", ass->filename);
>> > +
>> > +    }
>>
>> Also, no function from libavformat is needed; this filter should draw a line
>> of text, not demux a file.
>> You may have misinterpreted my previous review. All unneeded code, such as
>> every bit of libavformat use, must be removed.
>>
>> You seem to be trying to work around what I suggest rather than actually
>> solving the issues raised, like writing a file to get around the
>> impossibility of accessing some input file directly. There really is no
>> file and none can be written.
>>
>> The goal of this filter was to create subtitle packets/frames and pass them
>> on. As this turned out to be too hard in the time available, the simpler goal
>> now is to draw that text on a video frame.
>>
>> The filter gets video frames on its input and passes them on to the output.
>> In between, it should write the Hello world text with the advancing number
>> onto each frame.
>> For this there is no need to access any files or use any demuxers.
>> You can use the libass code from the subtitles filter as you do, but that
>> code uses an external subtitle file. You have to change this so it no longer
>> uses an external file or demuxes it with libavformat. These steps are not
>> needed and are incorrect for this task.
>>
>> I suggest you remove the #include "libavformat/*" lines; that way you will
>> see exactly what must be removed. This should make the code simpler; it just
>> isn't necessary to have this baggage between avcodec/libass and what you
>> want to draw.
>>
>> the libavformat code is there to read a subtitle file.
>> There is no subtitle file. The filter should just draw a line saying
>> hello world with a number.
>>
>>
>> [...]
>>
>> --
>> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>>
>> Dictatorship: All citizens are under surveillance, all their steps and
>> actions recorded, for the politicians to enforce control.
>> Democracy: All politicians are under surveillance, all their steps and
>> actions recorded, for the citizens to enforce control.
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-avfilter-add-hellosubs-filter.patch
Type: text/x-patch
Size: 35289 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20180415/15c84f3e/attachment.bin>