[FFmpeg-devel] swscale/swscale_unscaled : add X86_64 (SSE2, AVX) for uyvyto422

Hendrik Leppkes h.leppkes at gmail.com
Tue Apr 3 12:20:35 EEST 2018


On Tue, Apr 3, 2018 at 2:10 AM, James Almer <jamrial at gmail.com> wrote:
> On 4/2/2018 8:33 PM, Carl Eugen Hoyos wrote:
>> 2018-04-02 23:26 GMT+02:00, Martin Vignali <martin.vignali at gmail.com>:
>>
>>> Around 20% faster (on a "benchmark cmd" which tests pix_fmt conversion)
>>> (4.2s with the patch, 5.2s without)
>>>
>>> Passes the FATE tests for me.
>>>
>>> Checkasm result :
>>> uyvytoyuv422_c: 14146.6
>>> uyvytoyuv422_mmx: 13696.4
>>> uyvytoyuv422_mmxext: 19395.9
>>
>> Something looks wrong here...
>>
>> Carl Eugen
>
> On a Haswell using GCC I get
>
> uyvytoyuv422_c: 44884.2
> uyvytoyuv422_mmx: 15284.5
> uyvytoyuv422_mmxext: 28656.5
> uyvytoyuv422_sse2: 10921.8
> uyvytoyuv422_avx: 10606.5
>
> Martin is using a Clang version that is for some reason ignoring our
> attempts at disabling tree vectorization, so his C function is optimized
> with simd by the compiler, hence the good result.
>
> The mmxext version being slower than the mmx one seems however to be an
> existing issue in the tree, which we should probably deal with. Unless
> of course the test is wrong.

It's MMX, so dealing with it would probably entail just deleting it. We can
leave the ordinary mmx version and remove the ext version, or perhaps just
remove both.
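
For reference, the scalar conversion being benchmarked boils down to
something like the sketch below. This is not the swscale code: the function
name and signature are made up for illustration, and it only handles a
single row with an even width. It just shows the UYVY byte order
(U0 Y0 V0 Y1) being split into Y, U and V planes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not an FFmpeg function): unpack one row of packed
 * UYVY, laid out as U0 Y0 V0 Y1 U1 Y2 V1 Y3 ..., into planar 4:2:2.
 * 'width' is the number of luma samples in the row. */
void uyvy_row_to_planar422(const uint8_t *src, uint8_t *ydst,
                           uint8_t *udst, uint8_t *vdst, int width)
{
    /* Assumption for this sketch: even, non-negative width. */
    assert(width >= 0 && width % 2 == 0);

    /* Each 4-byte group carries two luma samples and one U/V pair. */
    for (int i = 0; i < width / 2; i++) {
        udst[i]         = src[4 * i + 0];
        ydst[2 * i + 0] = src[4 * i + 1];
        vdst[i]         = src[4 * i + 2];
        ydst[2 * i + 1] = src[4 * i + 3];
    }
}
```

A plain byte-shuffling loop like this is an easy target for compiler
auto-vectorization, which would fit the explanation quoted above for the
fast _c number under Clang.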

- Hendrik

