Fröhling

Niels Fröhling about [X]HTML, JS and technical tidbits

January 14th, 2010

OP-equivalent series (psubq)

Ethatron in Equivalence

psubq (64bit - 64bit = 64bit):
movq mm7, mmc
psubd mmc, mms
movq mm6, mmc
psubd mm7, 0x80000000
psubd mm6, 0x80000000
pcmpgtd mm6, mm7
psllq mm6, 32
paddd mmc, mm6 ; the mask is -1, so adding it takes the borrow out of the high dword
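
To make the borrow handling easier to follow, here is a minimal scalar sketch in C of what the sequence is meant to compute. It is a model under my own assumptions, not the original routine: the names `dwords`, `cmpgt_biased` and `psubq_model` are hypothetical, and one 64-bit register is simply treated as two 32-bit lanes.

```c
#include <assert.h>
#include <stdint.h>

/* One 64-bit MMX register viewed as two 32-bit lanes (hypothetical helper type). */
typedef struct { uint32_t lo, hi; } dwords;

static dwords split64(uint64_t v)
{
    dwords d = { (uint32_t)v, (uint32_t)(v >> 32) };
    return d;
}

static uint64_t join64(dwords d)
{
    return ((uint64_t)d.hi << 32) | d.lo;
}

/* Models pcmpgtd on operands biased by 0x80000000: the bias turns the
   signed compare into an unsigned one. Returns all ones when a > b.
   Assumes the usual two's complement conversion to int32_t. */
static uint32_t cmpgt_biased(uint32_t a, uint32_t b)
{
    int32_t sa = (int32_t)(a - 0x80000000u);
    int32_t sb = (int32_t)(b - 0x80000000u);
    return sa > sb ? 0xFFFFFFFFu : 0u;
}

/* 64-bit subtraction built from two 32-bit subtractions plus a borrow fix-up. */
uint64_t psubq_model(uint64_t c, uint64_t s)
{
    dwords orig = split64(c);
    dwords src  = split64(s);
    dwords res  = orig;

    res.lo -= src.lo;   /* per-lane subtraction, each lane wraps on its own */
    res.hi -= src.hi;

    /* The low lane wrapped iff the result is (unsigned) above the original. */
    uint32_t borrow = cmpgt_biased(res.lo, orig.lo);

    /* The mask is moved to the high lane; adding all ones (-1) removes the borrow. */
    res.hi += borrow;
    return join64(res);
}

/* Usage: the model should agree with native 64-bit subtraction. */
void psubq_model_example(void)
{
    assert(psubq_model(0x0000000100000000ull, 1) == 0x00000000FFFFFFFFull);
    assert(psubq_model(5, 7) == (uint64_t)5 - 7);
}
```

The mask that the compare produces for the high lane is irrelevant, because the 32-bit left shift pushes it out of the register; only the low-lane borrow survives.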

January 8th, 2010

OP-equivalent series (paddq)

Ethatron in Equivalence

paddq (64bit + 64bit = 64bit):
movq mm7, mmc
paddd mmc, mma
psubd mm7, 0x80000000
psubd mmc, 0x80000000
pcmpgtd mm7, mmc
paddd mmc, 0x80000000
psllq mm7, 32
psubd mmc, mm7
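
The carry logic can be checked with a small scalar sketch in C. This is only a model under my own assumptions (the names `lanes`, `gt_biased` and `paddq_model` are hypothetical); it treats one 64-bit register as two 32-bit lanes and mirrors the steps above.

```c
#include <assert.h>
#include <stdint.h>

/* One 64-bit MMX register viewed as two 32-bit lanes (hypothetical helper type). */
typedef struct { uint32_t lo, hi; } lanes;

static lanes to_lanes(uint64_t v)
{
    lanes l = { (uint32_t)v, (uint32_t)(v >> 32) };
    return l;
}

static uint64_t from_lanes(lanes l)
{
    return ((uint64_t)l.hi << 32) | l.lo;
}

/* Models pcmpgtd on operands biased by 0x80000000, which makes the signed
   compare behave like an unsigned one. Returns all ones when a > b.
   Assumes the usual two's complement conversion to int32_t. */
static uint32_t gt_biased(uint32_t a, uint32_t b)
{
    int32_t sa = (int32_t)(a - 0x80000000u);
    int32_t sb = (int32_t)(b - 0x80000000u);
    return sa > sb ? 0xFFFFFFFFu : 0u;
}

/* 64-bit addition built from two 32-bit additions plus a carry fix-up. */
uint64_t paddq_model(uint64_t c, uint64_t a)
{
    lanes orig = to_lanes(c);
    lanes add  = to_lanes(a);
    lanes sum  = orig;

    sum.lo += add.lo;   /* per-lane addition, each lane wraps on its own */
    sum.hi += add.hi;

    /* The low lane wrapped iff the original is (unsigned) above the sum. */
    uint32_t carry = gt_biased(orig.lo, sum.lo);

    /* The mask is moved to the high lane; subtracting all ones (-1) adds the carry. */
    sum.hi -= carry;
    return from_lanes(sum);
}

/* Usage: the model should agree with native 64-bit addition. */
void paddq_model_example(void)
{
    assert(paddq_model(0x00000000FFFFFFFFull, 1) == 0x0000000100000000ull);
    assert(paddq_model(0xFFFFFFFFFFFFFFFFull, 1) == 0);
}
```

Subtracting the all-ones mask is the same as adding 1, which is why the final step can stay a subtraction even though the operation as a whole is an addition.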

January 8th, 2010

OP-equivalent series (packusqd, packssqud)

Ethatron in Equivalence

packusqd (saturating clamp from unsigned long long to unsigned long),
packssqud (saturating clamp from signed long long to unsigned long):
Unsigned long long inputs are restricted: the supported range is “only” [0x0000000000000000, 0x7FFFFFFFFFFFFFFF].
Signed long long inputs are unrestricted.
// from-scratch, no helper available

/* -1 */
pcmpeqd mm6, [...]
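
Since the sequence is cut off here, a scalar reference in C for the intended results may help when checking an implementation. This is a sketch of the semantics described above, not the instruction sequence itself; the function names `packusqd_ref` and `packssqud_ref` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Saturating clamp from unsigned 64-bit to unsigned 32-bit. */
uint32_t packusqd_ref(uint64_t v)
{
    return v > UINT32_MAX ? UINT32_MAX : (uint32_t)v;
}

/* Saturating clamp from signed 64-bit to unsigned 32-bit:
   negative values clamp to 0, large values clamp to 0xFFFFFFFF. */
uint32_t packssqud_ref(int64_t v)
{
    if (v < 0)
        return 0;
    if (v > (int64_t)UINT32_MAX)
        return UINT32_MAX;
    return (uint32_t)v;
}

/* Usage: a few values around the clamp boundaries. */
void pack_qd_example(void)
{
    assert(packusqd_ref(42) == 42u);
    assert(packusqd_ref(0x1FFFFFFFFull) == 0xFFFFFFFFu);
    assert(packssqud_ref(-1) == 0u);
    assert(packssqud_ref(0x100000000ll) == 0xFFFFFFFFu);
}
```

The restriction on unsigned inputs would be consistent with an implementation that relies on signed compares, where values with the top bit set would look negative, but that is my reading and not something the excerpt states.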

January 7th, 2010

OP-equivalent series (packusdw, packssduw)

Ethatron in Equivalence

packusdw (saturating clamp from unsigned long to unsigned short),
packssduw (saturating clamp from signed long to unsigned short):
Unsigned long inputs are restricted: the supported range is “only” [0x00000000, 0x7FFFFFFF].
Signed long inputs are restricted as well: the supported range is “only” [0x80008000, 0x7FFFFFFF]. You could cover the full signed long range if a “psubsd” existed, which does [...]
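
The excerpt stops before the actual instructions, so the following C sketch is an assumption on my part: one common way to get an unsigned 16-bit clamp out of the signed pack is to bias the input by 0x8000, pack with signed saturation, and remove the bias again. That approach would explain the lower bound 0x80008000, because the biasing subtraction wraps for anything smaller. The names `saturate_s16` and `packssduw_model` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Per-element behaviour of a signed saturating pack from 32 to 16 bits. */
static int16_t saturate_s16(int32_t v)
{
    if (v > INT16_MAX)
        return INT16_MAX;
    if (v < INT16_MIN)
        return INT16_MIN;
    return (int16_t)v;
}

/* Clamp a signed 32-bit value to [0, 65535] using only a signed pack. */
uint16_t packssduw_model(int32_t v)
{
    /* Below this bound the bias subtraction would wrap; this corresponds
       to the lower limit 0x80008000 on the input range. */
    assert(v >= INT32_MIN + 0x8000);

    int32_t biased = v - 0x8000;            /* shift [0, 65535] onto [-32768, 32767] */
    int16_t packed = saturate_s16(biased);  /* signed saturation does the clamping   */
    return (uint16_t)(packed + 0x8000);     /* undo the bias in the 16-bit domain    */
}

/* Usage: values below, inside and above the unsigned 16-bit range. */
void packssduw_model_example(void)
{
    assert(packssduw_model(-5) == 0);
    assert(packssduw_model(12345) == 12345);
    assert(packssduw_model(70000) == 65535);
}
```

A saturating 32-bit subtraction would remove the precondition, since the bias step could then no longer wrap.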

January 2nd, 2010

Cubic root approximation

Ethatron in Approximations

The SSE ops lack a huge number of very useful functions. Practically every mathematically relevant function is absent; worse, re-creating them at the same precision as the FPU routines results in code that is an order of magnitude slower. Sometimes one can find a useful shortcut, but not very often.
Here I tried to make a cbrt(x), the cubic [...]
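
To illustrate the kind of shortcut meant here, the following C sketch shows a generic, well-known approach and not necessarily the one from the full post: divide the float's bit pattern by 3 to get a rough guess, then refine it with a few Newton steps. The function name `cbrt_approx`, the untuned constant and the iteration count are my assumptions; the sketch only handles positive, normal, finite inputs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Approximate cube root for positive, normal, finite floats. */
float cbrt_approx(float x)
{
    uint32_t bits;
    float y;

    /* Read the IEEE-754 bit pattern without violating aliasing rules. */
    memcpy(&bits, &x, sizeof bits);

    /* Dividing the pattern by 3 roughly divides the exponent by 3.
       0x2A555555 is (2/3) * (127 << 23) and restores the exponent bias. */
    bits = bits / 3u + 0x2A555555u;
    memcpy(&y, &bits, sizeof y);

    /* Newton's method on f(y) = y^3 - x: y <- (2y + x / y^2) / 3.
       The initial guess is within roughly ten percent, so three steps
       are enough to reach about single precision. */
    for (int i = 0; i < 3; ++i)
        y = (2.0f * y + x / (y * y)) / 3.0f;

    return y;
}

/* Usage: compare against a known cube root with a loose tolerance. */
void cbrt_approx_example(void)
{
    float d = cbrt_approx(27.0f) - 3.0f;
    assert(d < 1e-4f && d > -1e-4f);
}
```

Each Newton step costs a division, which is exactly the kind of expense that makes matching FPU precision slow; a tuned constant or a polynomial refinement can trade iterations for accuracy.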

January 2nd, 2010

OP-equivalent series (padddqd)

Ethatron in Equivalence

padddq (128bit + 64bit = 128bit):
// carry emulation via pcmpgtq equivalent

movdqa xmmx, xmma
movdqa xmmy, xmmb
movdqa xmmz, xmma
paddq xmma, xmmb
pcmpgtd xmmx, xmmb
pcmpgtd xmmy, xmma
pcmpeqd xmmz, xmmb
pshufd xmmx, xmmx, 1|1|1|1
pshufd xmmz, xmmz, 1|1|1|1
pand xmmz, xmmy
por xmmz, xmmx
punpcklqdq xmmz, xmmz
pslldq xmmz, 8 ; or use pshufd
psubq xmma, xmmz
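
As a reference for the expected result, here is a scalar C sketch of a 128-bit plus 64-bit addition. It is not a transcription of the sequence above: it uses the standard rule that the low half wrapped exactly when the original low half is (unsigned) greater than the sum, and it builds that 64-bit compare from 32-bit pieces in the way a pcmpgtq replacement has to. The names `u128_model`, `gt64_from_dwords` and `add128_64` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit value as two 64-bit halves (hypothetical helper type). */
typedef struct { uint64_t lo, hi; } u128_model;

/* Unsigned 64-bit "greater than" assembled from 32-bit compares:
   the high dwords decide, the low dwords break ties. */
static int gt64_from_dwords(uint64_t a, uint64_t b)
{
    uint32_t a_hi = (uint32_t)(a >> 32), a_lo = (uint32_t)a;
    uint32_t b_hi = (uint32_t)(b >> 32), b_lo = (uint32_t)b;

    return (a_hi > b_hi) || (a_hi == b_hi && a_lo > b_lo);
}

/* Adds the 64-bit value b to the 128-bit value (hi:lo). */
u128_model add128_64(uint64_t lo, uint64_t hi, uint64_t b)
{
    u128_model r;
    uint64_t sum = lo + b;                  /* 64-bit add of the low halves */
    int carry = gt64_from_dwords(lo, sum);  /* wrapped iff original > sum   */

    r.lo = sum;
    r.hi = hi + (uint64_t)carry;            /* propagate the carry upwards  */
    return r;
}

/* Usage: a carry out of the low half must show up in the high half. */
void add128_64_example(void)
{
    u128_model r = add128_64(UINT64_MAX, 0, 1);
    assert(r.lo == 0 && r.hi == 1);
}
```

In the SIMD version the carry is an all-ones mask rather than 0 or 1, so it is subtracted from the high qword instead of added.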

January 2nd, 2010

OP-equivalent series (pcmpgtq, pcmpeqq)

Ethatron in Equivalence

When you don’t have SSE4 available you have to compensate for the lack of some essential ops. In this series I’m going to write down equivalents of ops that are either missing or (currently) not available on AMD.
pcmpeqq (64bit integer compare):
// equality == (hi == hi) && (lo == lo)

movdqa xmm?3, xmma
pcmpeqd xmm?3, xmmb
pshufd xmm?1, xmm?3, 3|3|1|1 (b == a)
pshufd xmm?2, xmm?3, 2|2|0|0 (B == A)
pand xmm?1, [...]
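
The idea in the comment, equality of both halves, can be written down as a scalar C sketch for one 64-bit lane. This models the intended result only; the function name `pcmpeqq_model` is hypothetical and the rest of the original sequence is not shown in the excerpt.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit equality mask built from two 32-bit equality masks:
   all ones when a == b, zero otherwise. */
uint64_t pcmpeqq_model(uint64_t a, uint64_t b)
{
    /* Per-dword compare, as pcmpeqd would produce it. */
    uint32_t eq_lo = (uint32_t)a == (uint32_t)b ? 0xFFFFFFFFu : 0u;
    uint32_t eq_hi = (uint32_t)(a >> 32) == (uint32_t)(b >> 32) ? 0xFFFFFFFFu : 0u;

    /* Each half of the result must see both masks, which is what the two
       shuffles followed by an AND achieve in the SIMD version. */
    uint32_t both = eq_lo & eq_hi;

    return ((uint64_t)both << 32) | both;
}

/* Usage: values that differ in only one half must not compare equal. */
void pcmpeqq_model_example(void)
{
    assert(pcmpeqq_model(0x1122334455667788ull, 0x1122334455667788ull) == UINT64_MAX);
    assert(pcmpeqq_model(0x1122334455667788ull, 0x1122334400000000ull) == 0);
    assert(pcmpeqq_model(0x1122334455667788ull, 0x0000000055667788ull) == 0);
}
```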


© 2010 Wired by Fröhling