Bullet Physics quaternion SSE implementation doubts

I was researching quaternion SSE implementations to understand how they worked (since I'm implementing my own) and I came across this Bullet implementation for quaternion multiplication:

VECTORMATH_FORCE_INLINE const Quat Quat::operator *( const Quat &quat ) const
{
    __m128 ldata, rdata, qv, tmp0, tmp1, tmp2, tmp3;
    __m128 product, l_wxyz, r_wxyz, xy, qw;
    ldata = mVec128;
    rdata = quat.mVec128;
    tmp0 = _mm_shuffle_ps( ldata, ldata, _MM_SHUFFLE(3,0,2,1) );
    tmp1 = _mm_shuffle_ps( rdata, rdata, _MM_SHUFFLE(3,1,0,2) );
    tmp2 = _mm_shuffle_ps( ldata, ldata, _MM_SHUFFLE(3,1,0,2) );
    tmp3 = _mm_shuffle_ps( rdata, rdata, _MM_SHUFFLE(3,0,2,1) );
    qv = vec_mul( vec_splat( ldata, 3 ), rdata );
    qv = vec_madd( vec_splat( rdata, 3 ), ldata, qv );
    qv = vec_madd( tmp0, tmp1, qv );
    qv = vec_nmsub( tmp2, tmp3, qv );
    product = vec_mul( ldata, rdata );
    l_wxyz = vec_sld( ldata, ldata, 12 );
    r_wxyz = vec_sld( rdata, rdata, 12 );
    qw = vec_nmsub( l_wxyz, r_wxyz, product );
    xy = vec_madd( l_wxyz, r_wxyz, product );
    qw = vec_sub( qw, vec_sld( xy, xy, 8 ) );
    VM_ATTRIBUTE_ALIGN16 unsigned int sw[4] = {0, 0, 0, 0xffffffff};
    return Quat( vec_sel( qv, qw, sw ) );
}

The bit I am concerned about is these two lines:

l_wxyz = vec_sld( ldata, ldata, 12 );
r_wxyz = vec_sld( rdata, rdata, 12 );

The macro implementations:

#define _mm_ror_ps(vec,i) \
    (((i)%4) ? (_mm_shuffle_ps(vec,vec, _MM_SHUFFLE((unsigned char)(i+3)%4,(unsigned char)(i+2)%4,(unsigned char)(i+1)%4,(unsigned char)(i+0)%4))) : (vec))
#define vec_sld(vec,vec2,x) _mm_ror_ps(vec, ((x)/4))

If I understand it correctly, for a rotate count that is not divisible by 4 (3 isn't, since 12/4 = 3), the vec_sld macro reduces to:

l_wxyz = ldata; // vec_sld( ldata, ldata, 12 );
r_wxyz = rdata; // vec_sld( rdata, rdata, 12 );

Which is effectively doing nothing.

And if the value is divisible by 4:

q = vec_sld( x, x, 16 );

The macro will reduce to:

q = _mm_shuffle_ps( x, x, _MM_SHUFFLE(3,2,1,0) );

Which, again, is like doing nothing, since _MM_SHUFFLE(3,2,1,0) is leaving x, y, z, and w in their current places.

If vec_sld is not doing anything, what is its purpose?

Am I missing anything?

EDIT: Here are the two files the source code comes from

quat_aos.h (operator*())
vectormath_aos.h (definition of vec_sld and _mm_ror_ps)

Best answer

I think where you got confused is that ((i)%4) evaluates to true when i is not a multiple of 4. So you get an _mm_shuffle_ps for non-multiples of 4; otherwise you just get the original vector (since a rotate by a multiple of 4 is a no-op).
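To make the branch selection concrete, here is a small portable sketch that mirrors the macro's logic without using any intrinsics. The helper names ror_source_lanes and sld_source_lanes are made up for this illustration and are not part of Bullet; each returns, for destination lanes 0 to 3, the source lane the macro would read from.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper (not part of Bullet): models _mm_ror_ps(vec, i) by
// returning the source lane read by each destination lane 0..3.
std::array<int, 4> ror_source_lanes(int i) {
    assert(i >= 0);
    if (i % 4) {
        // Any non-zero remainder counts as "true", so this is the shuffle branch.
        // _MM_SHUFFLE(d, c, b, a) makes destination lane 0 read lane a,
        // lane 1 read lane b, lane 2 read lane c and lane 3 read lane d.
        return { (i + 0) % 4, (i + 1) % 4, (i + 2) % 4, (i + 3) % 4 };
    }
    // Remainder 0: the macro returns vec unchanged.
    return { 0, 1, 2, 3 };
}

// Hypothetical helper: vec_sld(vec, vec2, x) forwards x / 4 to _mm_ror_ps.
std::array<int, 4> sld_source_lanes(int byte_shift) {
    return ror_source_lanes(byte_shift / 4);
}
```

Under this model, sld_source_lanes(12) gives {3, 0, 1, 2} (a real rotation), while sld_source_lanes(16) gives {0, 1, 2, 3} through the no-op branch rather than through a shuffle.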

Some background which may be useful:

The vec_XXX macros indicate that this code was originally ported from PowerPC/AltiVec. vec_sld is an AltiVec intrinsic which shifts a pair of vectors by a given number of bytes. In this context it appears that vec_sld is being used to rotate a single vector, since the two input vectors are the same, and it appears that 12 is being passed as a byte shift (i.e. rotate by 3 floats).
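As a rough illustration of that byte shift, the sketch below models vec_sld in portable C++: it concatenates the two inputs, skips the first n bytes, and keeps the next 16. The name sld_model and the Vec4 alias are assumptions made for this example, lanes are taken in array order, and the result is only meaningful for shifts that are a whole number of floats.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

using Vec4 = std::array<float, 4>;
static_assert(sizeof(Vec4) == 16, "this model assumes four packed 4-byte floats");

// Hypothetical portable model of vec_sld(a, b, n), not the real intrinsic:
// treat a followed by b as 32 bytes and return the 16 bytes starting at n.
Vec4 sld_model(const Vec4& a, const Vec4& b, std::size_t n) {
    assert(n < 16);
    unsigned char bytes[32];
    std::memcpy(bytes, a.data(), 16);
    std::memcpy(bytes + 16, b.data(), 16);
    Vec4 out;
    std::memcpy(out.data(), bytes + n, 16);
    return out;
}
```

With v = {1, 2, 3, 4} standing for (x, y, z, w), sld_model(v, v, 12) returns {4, 1, 2, 3}, i.e. (w, x, y, z): passing the same vector twice turns the shift into a rotation.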

So vec_sld(v, v, 12) gets translated to _mm_ror_ps(v, 12/4) = _mm_ror_ps(v, 3) which then gets expanded to:

_mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 1, 0, 3));

so it does look as if the code is doing the right thing: with v = (x, y, z, w), the shuffle yields (w, x, y, z), which matches the variable name l_wxyz.
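If you want to check this on an x86 machine, a minimal sketch along the following lines should do. It copies the two macros as quoted in the question; the functions rotate_by_12_bytes and demo are made up for this test and are not part of Bullet.

```cpp
#include <array>
#include <cassert>
#include <xmmintrin.h>  // SSE: __m128, _mm_shuffle_ps, _MM_SHUFFLE

// The two macros as quoted in the question.
#define _mm_ror_ps(vec,i) \
    (((i)%4) ? (_mm_shuffle_ps(vec,vec, _MM_SHUFFLE((unsigned char)(i+3)%4,(unsigned char)(i+2)%4,(unsigned char)(i+1)%4,(unsigned char)(i+0)%4))) : (vec))
#define vec_sld(vec,vec2,x) _mm_ror_ps(vec, ((x)/4))

// Hypothetical test helper: applies vec_sld(v, v, 12) to (x, y, z, w)
// and returns the four lanes in order.
std::array<float, 4> rotate_by_12_bytes(float x, float y, float z, float w) {
    // _mm_set_ps lists lanes from highest to lowest, so lane 0 holds x.
    __m128 v = _mm_set_ps(w, z, y, x);
    __m128 r = vec_sld(v, v, 12);
    std::array<float, 4> out;
    _mm_storeu_ps(out.data(), r);
    return out;
}

// Usage: the quaternion (x, y, z, w) = (1, 2, 3, 4) comes back as (w, x, y, z).
void demo() {
    const std::array<float, 4> r = rotate_by_12_bytes(1.0f, 2.0f, 3.0f, 4.0f);
    assert((r == std::array<float, 4>{4.0f, 1.0f, 2.0f, 3.0f}));
}
```

The shift count has to be a compile-time constant here, because _mm_shuffle_ps takes its mask as an immediate.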
