[FFmpeg-devel] [PATCH 4/5] x86/opusdsp: implement FMA3 accelerated postfilter and deemphasis

James Almer jamrial at gmail.com
Tue Apr 2 17:39:50 EEST 2019


On 4/2/2019 9:16 AM, Hendrik Leppkes wrote:
> On Mon, Apr 1, 2019 at 4:59 PM James Almer <jamrial at gmail.com> wrote:
>>
>> On 4/1/2019 9:13 AM, Lynne wrote:
>>> Mar 31, 2019, 11:49 PM by jamrial at gmail.com:
>>>
>>>> A float ret value needs to be in xmm0, and you swapped m0 with m2 on
>>>> Win64. This is the source of the fate failure.
>>>>
>>> Attached a patch to fix this.
>>>
>>>> %if WIN64
>>>> -    SWAP 0, 2
>>>> -%endif
>>>> +    shufps m0, m2, 0
>>>> +%else
>>>>       shufps m0, m0, 0
>>>> +%endif
>>>> %endif
>>>
>>> 0001-x86-opusdsp-fix-WIN64-return-value.patch
>>>
>>> From 98e93b6f322a2a9dee7499fe87b74cf50a33b022 Mon Sep 17 00:00:00 2001
>>> From: Lynne <dev at lynne.ee>
>>> Date: Mon, 1 Apr 2019 13:06:34 +0100
>>> Subject: [PATCH] x86/opusdsp: fix WIN64 return value
>>>
>>> ---
>>>  libavcodec/x86/opusdsp.asm | 5 +++--
>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/libavcodec/x86/opusdsp.asm b/libavcodec/x86/opusdsp.asm
>>> index ed65614e06..a7bff4f0b3 100644
>>> --- a/libavcodec/x86/opusdsp.asm
>>> +++ b/libavcodec/x86/opusdsp.asm
>>> @@ -40,9 +40,10 @@ cglobal opus_deemphasis, 4, 4, 8, out, in, coeff, len
>>>      VBROADCASTSS m0, coeffm
>>>  %else
>>>  %if WIN64
>>> -    SWAP 0, 2
>>> -%endif
>>> +    shufps m0, m2, 0
>>
>> No, like this shufps will interleave the values from m2 and m0. The
>> correct instruction would be
>>
>> shufps m0, m2, m2, 0
>>
>> Where m2 is used for both input arguments, and the VEX encoding lets us
>> use another reg as output.
>> Tested and pushed it with that change. Thanks.
> 
> Opus tests still fail on win32, both mingw and msvc.
> 
> - Hendrik

x86_32 regardless of OS was crashing, because of unaligned access.
Should be fixed.


More information about the ffmpeg-devel mailing list