Fröhling

Niels Fröhling about [X]HTML, JS and technical tidbits

15. February 2010

OP-equivalent series (pavgd)

Ethatron in Equivalence

pavgd (32bit average [(a + b + 1) >> 1]):

/* add (a + b), and compare overflow */
movq	mm6, mmc
paddd	mmc, mmd
psubd	mm6, 0x80000000
psubd	mmc, 0x80000000
pcmpgtd	mm6, mmc
paddd	mmc, 0x80000000

/* add ((a + b) + 1), and compare overflow */
pcmpeqd	mm5, mm5
pcmpeqd	mmd, mmd
pcmpeqd	mm5, mmc
psubd	mmc, mmd

/* shift carry in */
por	mm6, mm5
pslld	mm6, 31
psrld	mmc, 1
por	mmc, mm6
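
As a cross-check, here is a minimal scalar sketch in C of what one 32bit lane of the sequence above computes. The function name `pavgd_lane` is made up for this illustration, and the unsigned compare `a > sum` stands in for the biased `pcmpgtd`.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 32-bit lane of the pavgd sequence. */
uint32_t pavgd_lane(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;                     /* paddd, wraps mod 2^32 */

    /* First carry: the wrapped sum is smaller than an addend. */
    uint32_t carry1 = (a > sum) ? 0xFFFFFFFFu : 0u;

    /* Second carry: adding 1 overflows only if the sum is all ones. */
    uint32_t carry2 = (sum == 0xFFFFFFFFu) ? 0xFFFFFFFFu : 0u;

    sum += 1u;                                /* psubd with -1 */

    /* The two carries never occur together, so OR-ing them is enough.
     * Shift the carry into bit 31 of the halved sum. */
    return (sum >> 1) | ((carry1 | carry2) << 31);
}
```

The result should match `((uint64_t)a + b + 1) >> 1` for every input pair.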
No comments
14. January 2010

OP-equivalent series (psubq)

Ethatron in Equivalence

psubq (64bit - 64bit = 64bit):

movq	mm7, mmc
psubd	mmc, mms
movq	mm6, mmc
psubd	mm7, 0x80000000
psubd	mm6, 0x80000000
pcmpgtd	mm6, mm7
psllq	mm6, 32
paddd	mmc, mm6
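
The borrow handling can be checked against a small scalar sketch in C. The name `psubq_model` is hypothetical; the point is that a borrow out of the low lane shows up as a wrapped difference that is larger than the minuend, and that adding the resulting -1 mask to the high lane subtracts that borrow.

```c
#include <stdint.h>

/* Hypothetical scalar model: 64-bit subtract built from two 32-bit lanes. */
uint64_t psubq_model(uint64_t c, uint64_t s)
{
    uint32_t c_lo = (uint32_t)c, c_hi = (uint32_t)(c >> 32);
    uint32_t s_lo = (uint32_t)s, s_hi = (uint32_t)(s >> 32);

    uint32_t d_lo = c_lo - s_lo;              /* psubd, low lane  */
    uint32_t d_hi = c_hi - s_hi;              /* psubd, high lane */

    /* Borrow out of the low lane: wrapped difference > minuend (unsigned). */
    uint32_t borrow = (d_lo > c_lo) ? 0xFFFFFFFFu : 0u;

    d_hi += borrow;                           /* adding -1 subtracts the borrow */
    return ((uint64_t)d_hi << 32) | d_lo;
}
```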
No comments
08. January 2010

OP-equivalent series (paddq)

Ethatron in Equivalence

paddq (64bit + 64bit = 64bit):

movq	mm7, mmc
paddd	mmc, mma
psubd	mm7, 0x80000000
psubd	mmc, 0x80000000
pcmpgtd	mm7, mmc
paddd	mmc, 0x80000000
psllq	mm7, 32
psubd	mmc, mm7
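
Why the 0x80000000 bias works: `pcmpgtd` only compares signed, and subtracting 0x80000000 from both operands maps unsigned order onto signed order. A minimal C sketch of that idea follows; `pcmpgtd_lane` and `paddq_model` are names invented here, and the sketch assumes the bias constant is present in both 32bit lanes.

```c
#include <stdint.h>

/* Stand-in for one lane of pcmpgtd: signed compare of two 32-bit patterns,
 * all-ones mask if x > y. Flipping the sign bit maps signed order onto
 * unsigned order, which avoids implementation-defined casts. */
static uint32_t pcmpgtd_lane(uint32_t x, uint32_t y)
{
    return ((x ^ 0x80000000u) > (y ^ 0x80000000u)) ? 0xFFFFFFFFu : 0u;
}

/* Hypothetical model of the paddq sequence on one 64-bit value. */
uint64_t paddq_model(uint64_t c, uint64_t a)
{
    uint32_t c_lo = (uint32_t)c, c_hi = (uint32_t)(c >> 32);
    uint32_t s_lo = c_lo + (uint32_t)a;           /* paddd, low lane  */
    uint32_t s_hi = c_hi + (uint32_t)(a >> 32);   /* paddd, high lane */

    /* Biased signed compare == unsigned "c_lo > s_lo", i.e. the carry. */
    uint32_t carry = pcmpgtd_lane(c_lo - 0x80000000u, s_lo - 0x80000000u);

    s_hi -= carry;                                /* minus -1 adds the carry */
    return ((uint64_t)s_hi << 32) | s_lo;
}
```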
No comments
08. January 2010

OP-equivalent series (packusqd, packssqud)

Ethatron in Equivalence

packusqd (saturated clamp from unsigned long long to unsigned long),
packssqud (saturated clamp from signed long long to unsigned long):

There is a condition for unsigned long long inputs: the range is “only” [0x0000000000000000, 0x7FFFFFFFFFFFFFFF].
There is no condition for signed long long inputs.

// from-scratch, no helper available

    /* -1 */
    pcmpeqd	mm6, mm6

    /* x >> 32 > -1 */
    movq	mm4, mmc0
    movq	mm5, mm2

    /* no psraq/pshufd/pshufw available,
     * duplicate ((x >> 32) | x) */
    punpckhdq	mm4, mmc0
    punpckhdq	mm5, mm2

    pcmpgtd	mm4, mm6
    pcmpgtd	mm5, mm6

    pand	mmc0, mm4
    pand	mm2, mm5

    /* 0 */
    pxor	mm7, mm7

    /* x >> 32 == 0 */
    movq	mm4, mmc0
    movq	mm5, mm2

    /* no psraq/pshufd/pshufw available,
     * duplicate ((x >> 32) | x) */
    punpckhdq	mm4, mmc0
    punpckhdq	mm5, mm2

    pcmpeqd	mm4, mm7
    pcmpeqd	mm5, mm7

    pand	mmc0, mm4
    pand	mm2, mm5

    /* 0xFFFFFFFF */
    pandn	mm4, mm6
    pandn	mm5, mm6

    por		mmc0, mm4
    por		mm2, mm5

    punpckldq	mmc0, mm2
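
The two masking passes can be summarised per 64bit input with a short C sketch. `pack_q_to_ud` is a hypothetical name; the input is passed as a raw 64bit pattern, so a set top bit is treated as "negative", which is also why unsigned inputs are limited to the range given above.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 64-bit input lane. */
uint32_t pack_q_to_ud(uint64_t x)
{
    uint32_t hi = (uint32_t)(x >> 32);
    uint32_t lo = (uint32_t)x;

    /* Pass 1: (x >> 32) > -1 as a signed compare, i.e. the sign bit is clear.
     * Negative inputs are masked to 0. */
    uint32_t nonneg = (hi & 0x80000000u) ? 0u : 0xFFFFFFFFu;
    hi &= nonneg;
    lo &= nonneg;

    /* Pass 2: (x >> 32) == 0 means the value already fits into 32 bits. */
    uint32_t fits = (hi == 0u) ? 0xFFFFFFFFu : 0u;

    /* pand / pandn / por: keep lo where it fits, else saturate to 0xFFFFFFFF. */
    return (lo & fits) | ~fits;
}
```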
No comments
07. January 2010

OP-equivalent series (packusdw, packssduw)

Ethatron in Equivalence

packusdw (saturated clamp from unsigned long to unsigned short),
packssduw (saturated clamp from signed long to unsigned short):

There is a condition for unsigned long inputs: the range is “only” [0x00000000, 0x7FFFFFFF].
There is a condition for signed long inputs: the range is “only” [0x80008000, 0x7FFFFFFF]. The full signed long range would be possible if a saturating “psubsd” existed, which it does not.

// via packssdw

psubd		xmmx, 0x00008000	// signed short in long
psubd		xmmy, 0x00008000
packssdw	xmmx, xmmy		// cast long to short
paddw		xmmx, 0x8000		// unsigned short

// with punpck and variable

movdqa		xmm?1, 0x00008000
movdqa		xmm?2, xmm?1
punpcklwd	xmm?2, xmm?1		// 0v0000000080008000
punpckldq	xmm?2, xmm?2		// 0v8000800080008000 (low quadword only)
punpcklqdq	xmm?2, xmm?2		// copy to the high quadword

psubd		xmmx, xmm?1
psubd		xmmy, xmm?1
packssdw	xmmx, xmmy
paddw		xmmx, xmm?2

// with pshuflw/pshufhw and variable

movdqa		xmm?1, 0x00008000
pshuflw		xmm?2, xmm?1, 2|2|0|0	// low quadword:  0v8000800080008000
pshufhw		xmm?2, xmm?2, 2|2|0|0	// high quadword: 0v8000800080008000

psubd		xmmx, xmm?1
psubd		xmmy, xmm?1
packssdw	xmmx, xmmy
paddw		xmmx, xmm?2

// with pshuflw/pshufhw and variable and no memory access

pcmpeqd		xmm?1, xmm?1		// 0xFFFFFFFF
pslld		xmm?1, 31		// 0x80000000
psrld		xmm?1, 16		// 0x00008000
pshuflw		xmm?2, xmm?1, 2|2|0|0	// low quadword:  0v8000800080008000
pshufhw		xmm?2, xmm?2, 2|2|0|0	// high quadword: 0v8000800080008000

psubd		xmmx, xmm?1
psubd		xmmy, xmm?1
packssdw	xmmx, xmmy
paddw		xmmx, xmm?2
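
All variants rely on the same bias trick: shift the input down by 0x8000 so that signed saturation lands on the right bounds, then shift the packed word back up. A minimal C sketch of one lane, with hypothetical names `saturate_s16` (standing in for `packssdw`) and `pack_d_to_uw`:

```c
#include <stdint.h>

/* Scalar stand-in for one lane of packssdw: saturate int32 to int16. */
static int16_t saturate_s16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical model of one lane. Only valid for x >= INT32_MIN + 0x8000,
 * because the subtraction must not overflow (there is no saturating psubd). */
uint16_t pack_d_to_uw(int32_t x)
{
    int16_t s = saturate_s16(x - 0x8000);        /* psubd + packssdw */
    return (uint16_t)((uint16_t)s + 0x8000u);    /* paddw, wraps mod 2^16 */
}
```

The lower bound in the comment is the same 0x80008000 limit stated for signed inputs.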
No comments
02. January 2010

OP-equivalent series (padddqd)

Ethatron in Equivalence

padddq (128bit + 64bit = 128bit):

// carry emulation via pcmpgtq equivalent

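movdqa		xmmy, xmmb
movdqa		xmmz, xmmb
paddq		xmma, xmmb		// sum, xmmb holds the 64bit value in its low quadword (high quadword zero)
movdqa		xmmx, xmma
pcmpeqd		xmmz, xmma		// b == sum
psubd		xmmy, 0x80000000	// bias for unsigned compare
psubd		xmmx, 0x80000000
pcmpgtd		xmmy, xmmx		// b > sum (unsigned)
pshufd		xmmx, xmmy, 0|0|0|0	// lo: b > sum
pshufd		xmmy, xmmy, 1|1|1|1	// hi: b > sum
pshufd		xmmz, xmmz, 1|1|1|1	// hi: b == sum
pand		xmmz, xmmx		// (hi ==) && (lo >)
por		xmmz, xmmy		// carry = (hi >) || ((hi ==) && (lo >))
pslldq		xmmz, 8			// move the carry mask to the high quadword
psubq		xmma, xmmz		// hi - (-1) = hi + 1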
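
For reference, the intended arithmetic in scalar form. This is only a sketch: the `u128` struct and `add_128_64` are hypothetical helpers, and it assumes the 64bit addend is zero-extended. The carry out of the low quadword is detected by checking whether the wrapped sum is smaller than the addend.

```c
#include <stdint.h>

/* Hypothetical helper type: a 128-bit value as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128;

/* 128-bit + 64-bit = 128-bit, with explicit carry propagation. */
u128 add_128_64(u128 a, uint64_t b)
{
    u128 r;
    r.lo = a.lo + b;                   /* paddq on the low quadword */

    /* Unsigned 64-bit compare: the sum wrapped iff it is below the addend. */
    uint64_t carry = (b > r.lo) ? 1u : 0u;

    r.hi = a.hi + carry;               /* propagate into the high quadword */
    return r;
}
```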
No comments
02. January 2010

OP-equivalent series (pcmpgtq, pcmpeqq)

Ethatron in Equivalence

When you don’t have SSE4 available you have to compensate for the lack of some essential ops. In this series I’m going to write down equivalents of either missing or (currently) non-AMD ops.

pcmpeqq (64bit integer compare):

// equality == (hi == hi) && (lo == lo)

movdqa		xmm?3, xmma
pcmpeqd		xmm?3, xmmb
pshufd		xmm?1, xmm?3, 3|3|1|1	(b == a)
pshufd		xmm?2, xmm?3, 2|2|0|0	(B == A)
pand		xmm?1, xmm?2		(b == a) && (B == A)

pcmpgtq (64bit integer compare):

// greater == (hi > hi) || ((hi == hi) && (lo > lo))

movdqa		xmm?1, xmma
movdqa		xmm?2, xmma
movdqa		xmm?5, xmmb
psubd		xmm?1, 0x0000000080000000	(bias the low dwords, they compare unsigned)
psubd		xmm?5, 0x0000000080000000
pcmpgtd		xmm?1, xmm?5
pcmpeqd		xmm?2, xmmb
pshufd		xmm?3, xmm?1, 3|3|1|1	(a > b)
pshufd		xmm?4, xmm?2, 3|3|1|1	(a == b)
pshufd		xmm?2, xmm?1, 2|2|0|0	(A > B)
pand		xmm?2, xmm?4		(a == b) && (A > B)
por		xmm?2, xmm?3		(a > b) || ((a == b) && (A > B))
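
Both compares can be modelled on 32bit halves in a few lines of C. The function names are hypothetical. The sketch makes one detail explicit: for a signed 64bit compare the high halves compare signed, but the low halves compare unsigned.

```c
#include <stdint.h>

/* Hypothetical model of pcmpeqq on one 64-bit lane: all ones if a == b. */
uint64_t pcmpeqq_model(uint64_t a, uint64_t b)
{
    uint32_t a_hi = (uint32_t)(a >> 32), a_lo = (uint32_t)a;
    uint32_t b_hi = (uint32_t)(b >> 32), b_lo = (uint32_t)b;
    return (a_hi == b_hi && a_lo == b_lo) ? UINT64_MAX : 0u;
}

/* Hypothetical model of pcmpgtq on one 64-bit lane: all ones if a > b (signed). */
uint64_t pcmpgtq_model(uint64_t a, uint64_t b)
{
    /* Flipping the sign bit lets an unsigned compare give signed order. */
    uint32_t a_hi = (uint32_t)(a >> 32) ^ 0x80000000u;
    uint32_t b_hi = (uint32_t)(b >> 32) ^ 0x80000000u;
    uint32_t a_lo = (uint32_t)a;       /* low halves: plain unsigned compare */
    uint32_t b_lo = (uint32_t)b;

    int gt = (a_hi > b_hi) || (a_hi == b_hi && a_lo > b_lo);
    return gt ? UINT64_MAX : 0u;
}
```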
No comments

© 2010 Wired by Fröhling