I was researching quaternion SSE implementations to understand how they work (since I'm implementing my own), and I came across this Bullet implementation of quaternion multiplication:

    VECTORMATH_FORCE_INLINE const Quat Quat::operator *( const Quat &quat ) const
    {
        __m128 ldata, rdata, qv, tmp0, tmp1, tmp2, tmp3;
        __m128 product, l_wxyz, r_wxyz, xy, qw;
        ldata = mVec128;
        rdata = quat.mVec128;
        tmp0 = _mm_shuffle_ps( ldata, ldata, _MM_SHUFFLE(3,0,2,1) );
        tmp1 = _mm_shuffle_ps( rdata, rdata, _MM_SHUFFLE(3,1,0,2) );
        tmp2 = _mm_shuffle_ps( ldata, ldata, _MM_SHUFFLE(3,1,0,2) );
        tmp3 = _mm_shuffle_ps( rdata, rdata, _MM_SHUFFLE(3,0,2,1) );
        qv = vec_mul( vec_splat( ldata, 3 ), rdata );
        qv = vec_madd( vec_splat( rdata, 3 ), ldata, qv );
        qv = vec_madd( tmp0, tmp1, qv );
        qv = vec_nmsub( tmp2, tmp3, qv );
        product = vec_mul( ldata, rdata );
        l_wxyz = vec_sld( ldata, ldata, 12 );
        r_wxyz = vec_sld( rdata, rdata, 12 );
        qw = vec_nmsub( l_wxyz, r_wxyz, product );
        xy = vec_madd( l_wxyz, r_wxyz, product );
        qw = vec_sub( qw, vec_sld( xy, xy, 8 ) );
        VM_ATTRIBUTE_ALIGN16 unsigned int sw[4] = {0, 0, 0, 0xffffffff};
        return Quat( vec_sel( qv, qw, sw ) );
    }

The bit I am concerned about is these two lines:

    l_wxyz = vec_sld( ldata, ldata, 12 );
    r_wxyz = vec_sld( rdata, rdata, 12 );

Macros implementation:

    #define _mm_ror_ps(vec,i) \
        (((i)%4) ? (_mm_shuffle_ps(vec,vec, _MM_SHUFFLE((unsigned char)(i+3)%4,(unsigned char)(i+2)%4,(unsigned char)(i+1)%4,(unsigned char)(i+0)%4))) : (vec))
    #define vec_sld(vec,vec2,x) _mm_ror_ps(vec, ((x)/4))

If I understand it correctly, for a number that is not divisible by 4 (3 isn't [12/4 = 3]), the vec_sld macro will reduce to:

    l_wxyz = ldata; //vec_sld( ldata, ldata, 12 );
    r_wxyz = rdata; //vec_sld( rdata, rdata, 12 );

Which is effectively doing nothing.

And if the value is divisible by 4:

    q = vec_sld( x, x, 16 );

The macro will reduce to:

    q = _mm_shuffle_ps( x, x, _MM_SHUFFLE(3,2,1,0) );

Which, again, is like doing nothing, since _MM_SHUFFLE(3,2,1,0) is leaving x, y, z, and w in their current places.

If vec_sld is not doing anything, what is its purpose?
Am I missing anything?
EDIT: Here are the two files the source code comes from
quat_aos.h (operator*())
vectormath_aos.h (definition of vec_sld and _mm_ror_ps)

Best answer
I think where you got confused is that ((i)%4) is nonzero, and therefore true, when i is not a multiple of 4. So you get an _mm_shuffle_ps for non-multiples of 4; otherwise you just get the original vector (since a rotate by a multiple of 4 lanes is a no-op).
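To make the branch selection concrete, here is a minimal scalar sketch of what the two macros compute. The names `Vec4`, `ror_model`, `sld_model` and `ror_model_example` are hypothetical stand-ins invented for illustration; they are not part of Bullet, and the model only mirrors the lane indices that the `_MM_SHUFFLE` expression in the question selects.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;  // lanes in memory order: x, y, z, w

// Hypothetical scalar stand-in for the _mm_ror_ps macro (not Bullet code).
// Result lane k takes input lane (k + i) % 4, which is what
// _MM_SHUFFLE((i+3)%4, (i+2)%4, (i+1)%4, (i+0)%4) selects.
inline Vec4 ror_model(const Vec4 &v, int i)
{
    if (i % 4) {  // nonzero remainder is "true": the shuffle branch
        return { v[(i + 0) % 4], v[(i + 1) % 4], v[(i + 2) % 4], v[(i + 3) % 4] };
    }
    return v;     // multiple of 4: a full rotation, input returned unchanged
}

// Hypothetical stand-in for vec_sld(v, v, bytes): a byte count becomes bytes / 4 lanes.
inline Vec4 sld_model(const Vec4 &v, int bytes)
{
    return ror_model(v, bytes / 4);
}

// Usage sketch: 12 bytes takes the shuffle branch, 16 bytes does not.
inline void ror_model_example()
{
    const Vec4 v = {1.0f, 2.0f, 3.0f, 4.0f};         // x, y, z, w
    const Vec4 rotated = {4.0f, 1.0f, 2.0f, 3.0f};   // w, x, y, z
    assert(sld_model(v, 12) == rotated);
    assert(sld_model(v, 16) == v);
}
```

With 12 bytes the lane count is 3, and 3 % 4 is 3, so the shuffle branch runs. With 16 bytes the lane count is 4, and 4 % 4 is 0, so the input comes back untouched.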
Some background which may be useful:
The vec_XXX macros indicate that this code was originally ported from PowerPC/AltiVec. vec_sld is an AltiVec intrinsic which shifts a pair of vectors by a given number of bytes. In this context it appears that vec_sld is being used to rotate a single vector, since the two input vectors are the same, and it appears that 12 is being passed as a byte shift (i.e. rotate by 3 floats).
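As a rough illustration of the AltiVec behaviour described above, the following sketch models `vec_sld(a, b, n)` at byte level: the two vectors are concatenated and the 16 bytes starting at offset `n` are kept. `Vec4` and `vec_sld_model` are hypothetical names, not AltiVec or Bullet APIs, and the model is only meant for shifts that are whole multiples of 4 bytes, where float lanes move intact.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

using Vec4 = std::array<float, 4>;  // lanes in memory order: x, y, z, w

// Hypothetical byte-level model of AltiVec vec_sld(a, b, n), for illustration only.
// Concatenate a and b into 32 bytes and keep the 16 bytes starting at offset n.
// Only multiples of 4 keep whole floats together in this model.
inline Vec4 vec_sld_model(const Vec4 &a, const Vec4 &b, std::size_t n)
{
    assert(n <= 15);  // assumption: the real intrinsic takes a literal in 0..15
    unsigned char bytes[32];
    std::memcpy(bytes, a.data(), 16);
    std::memcpy(bytes + 16, b.data(), 16);
    Vec4 out;
    std::memcpy(out.data(), bytes + n, 16);
    return out;
}

// Usage sketch: passing the same vector twice turns the shift into a rotation.
inline void vec_sld_model_example()
{
    const Vec4 q = {1.0f, 2.0f, 3.0f, 4.0f};         // x, y, z, w
    const Vec4 rotated = {4.0f, 1.0f, 2.0f, 3.0f};   // w, x, y, z
    assert(vec_sld_model(q, q, 12) == rotated);
}
```

When both arguments are the same vector, the bytes shifted out at one end reappear at the other, which is why the SSE port can replace the two-vector shift with a single-vector rotate.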
So vec_sld(v, v, 12) gets translated to _mm_ror_ps(v, 12/4) = _mm_ror_ps(v, 3) which then gets expanded to:

    _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 1, 0, 3));

This moves w into the first lane and gives (w, x, y, z), which matches the variable name l_wxyz, so it does look as if the code is doing the right thing.
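One way to check this on an x86 machine is to compile the two macros from the question and look at the lanes they produce. This is a small sketch rather than Bullet code: the `Lanes` alias and the helpers `to_lanes`, `rotate_12_bytes`, `rotate_16_bytes` and `rotate_example` are hypothetical names added for the test, and it assumes a compiler with SSE enabled and `<xmmintrin.h>` available.

```cpp
#include <array>
#include <cassert>
#include <xmmintrin.h>  // SSE: __m128, _mm_shuffle_ps, _MM_SHUFFLE

// The two macros, copied from the question.
#define _mm_ror_ps(vec,i) \
    (((i)%4) ? (_mm_shuffle_ps(vec,vec, _MM_SHUFFLE((unsigned char)(i+3)%4,(unsigned char)(i+2)%4,(unsigned char)(i+1)%4,(unsigned char)(i+0)%4))) : (vec))
#define vec_sld(vec,vec2,x) _mm_ror_ps(vec, ((x)/4))

using Lanes = std::array<float, 4>;  // hypothetical alias: lanes in memory order x, y, z, w

// Hypothetical helper: copy an SSE register out to a plain array.
inline Lanes to_lanes(__m128 v)
{
    Lanes out;
    _mm_storeu_ps(out.data(), v);
    return out;
}

// vec_sld with 12 bytes: 12/4 = 3 lanes, 3 % 4 != 0, so the shuffle branch is used.
inline Lanes rotate_12_bytes(const Lanes &in)
{
    __m128 v = _mm_loadu_ps(in.data());
    return to_lanes(vec_sld(v, v, 12));
}

// vec_sld with 16 bytes: 16/4 = 4 lanes, 4 % 4 == 0, so the input is returned as is.
inline Lanes rotate_16_bytes(const Lanes &in)
{
    __m128 v = _mm_loadu_ps(in.data());
    return to_lanes(vec_sld(v, v, 16));
}

// Usage sketch with a quaternion stored as x, y, z, w.
inline void rotate_example()
{
    const Lanes q = {1.0f, 2.0f, 3.0f, 4.0f};        // x, y, z, w
    const Lanes wxyz = {4.0f, 1.0f, 2.0f, 3.0f};     // w, x, y, z
    assert(rotate_12_bytes(q) == wxyz);
    assert(rotate_16_bytes(q) == q);
}
```

The shift amount has to be a compile-time constant here because `_mm_shuffle_ps` takes an immediate mask, which is why the sketch uses one helper per shift value instead of a runtime parameter.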