<?xml version="1.0" encoding="UTF-8"?><!-- generator="WordPress/2.5.1" -->
<rss version="0.92">
<channel>
	<title>Fröhling</title>
	<link>http://blog.frohling.biz</link>
	<description>Niels Fröhling about [X]HTML, JS and technical tidbits</description>
	<lastBuildDate>Sat, 21 Aug 2010 19:13:15 +0000</lastBuildDate>
	<docs>http://backend.userland.com/rss092</docs>
	<language>en</language>
	
	<item>
		<title>jQuery&#8217;s animate is short-thought</title>
		<description>
Currently I'm experimenting with jQuery (always being told prototype.js is evil). In my project I have to build some quite complex animations, and I tried to find out whether I can map them to the animate() syntax. Uh, and all hell broke loose. Here is my 24h experience:

	You can't animate regular ...</description>
		<link>http://blog.frohling.biz/2010/08/jquerys-animate-is-short-thought/</link>
			</item>
	<item>
		<title>movntq alignment</title>
		<description>
Uh, sometimes you don't know what's on the minds of the Intel guys.

They state in their documentation that the movntq-op (taken over from MMXExt) has to take place on 16-byte aligned memory. What??? Either it's a copy &#38; paste error and it's probably supposed to mean 8-byte aligned memory, or you got ...</description>
		<link>http://blog.frohling.biz/2010/04/movntq-alignment/</link>
			</item>
	<item>
		<title>OP-equivalent series (pavgd)</title>
		<description>
pavgd (32bit average, [(a + b + 1) >> 1]):
/* add (a + b), and compare overflow */
movq	mm6, mmc
paddd	mmc, mmd
psubd	mm6, 0x80000000
psubd	mmc, 0x80000000
pcmpgtd	mm6, mmc
paddd	mmc, 0x80000000

/* add ((a + b) + 1), and compare overflow */
pcmpeqd	mm5, mm5
pcmpeqd	mmd, mmd
pcmpeqd	mm5, mmc
psubd	mmc, mmd

/* shift carry in */
por	mm6, mm5
pslld	mm6, 31
psrld	mmc, 1
por	mmc, mm6
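<![CDATA[
For checking the sequence above, a scalar reference of one 32-bit lane can help. This is only a sketch: the names pavgd_model and gt_unsigned_via_signed are made up for illustration, and the signed casts assume the usual two's complement conversion.

```c
#include <stdint.h>

/* pcmpgtd only compares signed dwords. Subtracting 0x80000000 from both
   operands flips their top bit, which maps unsigned order onto signed order.
   Assumption: the uint32_t -> int32_t conversion wraps (two's complement). */
static int gt_unsigned_via_signed(uint32_t x, uint32_t y)
{
    return (int32_t)(x - 0x80000000u) > (int32_t)(y - 0x80000000u);
}

/* One 32-bit lane of the emulated average, (a + b + 1) >> 1 with 33-bit precision. */
uint32_t pavgd_model(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;                               /* paddd, wraps modulo 2^32 */
    uint32_t carry1 = gt_unsigned_via_signed(a, sum);   /* a > a + b means the add wrapped */
    uint32_t carry2 = (sum == 0xFFFFFFFFu);             /* adding 1 is about to wrap */
    sum += 1u;                                          /* psubd with an all-ones register */
    uint32_t carry = carry1 | carry2;                   /* at most one of the two can be set */
    return (sum >> 1) | (carry << 31);                  /* shift the carry in from the top */
}
```

Comparing pavgd_model(a, b) against (uint32_t)(((uint64_t)a + b + 1) >> 1) for a few edge values is a quick sanity check.
]]>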

 </description>
		<link>http://blog.frohling.biz/2010/02/op-equivalent-series-pavgd/</link>
			</item>
	<item>
		<title>OP-equivalent series (psubq)</title>
		<description>
psubq (64bit - 64bit = 64bit):
movq	mm7, mmc
psubd	mmc, mms
movq	mm6, mmc
psubd	mm7, 0x80000000
psubd	mm6, 0x80000000
pcmpgtd	mm6, mm7
psllq	mm6, 32
/* the borrow mask is -1 in the high dword, so adding it subtracts one */
paddd	mmc, mm6
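<![CDATA[
A scalar model of the intended result, as a sketch only: psubq_model is a hypothetical name, and the signed casts assume two's complement conversion. The point is that a borrow out of the low dword shows up as "difference > minuend" in unsigned terms, and the high dword then has to drop by one.

```c
#include <stdint.h>

/* 64-bit subtract assembled from two independent 32-bit subtracts plus a borrow fix-up. */
uint64_t psubq_model(uint64_t a, uint64_t s)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t s_lo = (uint32_t)s, s_hi = (uint32_t)(s >> 32);

    uint32_t lo = a_lo - s_lo;   /* per-dword subtract, low lane */
    uint32_t hi = a_hi - s_hi;   /* per-dword subtract, high lane */

    /* Bias both values by 0x80000000 so a signed compare acts as an unsigned one.
       lo > a_lo (unsigned) can only happen when the low subtract wrapped. */
    uint32_t borrow_mask =
        ((int32_t)(lo - 0x80000000u) > (int32_t)(a_lo - 0x80000000u)) ? 0xFFFFFFFFu : 0u;

    hi += borrow_mask;           /* adding all-ones (-1) takes the borrow out of the high lane */
    return ((uint64_t)hi << 32) | lo;
}
```
]]>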

 </description>
		<link>http://blog.frohling.biz/2010/01/op-equivalent-series-psubq/</link>
			</item>
	<item>
		<title>OP-equivalent series (paddq)</title>
		<description>
paddq (64bit + 64bit = 64bit):
movq	mm7, mmc
paddd	mmc, mma
psubd	mm7, 0x80000000
psubd	mmc, 0x80000000
pcmpgtd	mm7, mmc
paddd	mmc, 0x80000000
psllq	mm7, 32
psubd	mmc, mm7
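<![CDATA[
The same steps written out for a single 64-bit value, as a sketch: paddq_model is a hypothetical name, and the signed casts assume two's complement conversion. The carry out of the low dword is detected as "old value > sum", and subtracting the resulting all-ones mask from the high dword adds one to it.

```c
#include <stdint.h>

/* 64-bit add assembled from two independent 32-bit adds plus a carry fix-up. */
uint64_t paddq_model(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint32_t lo = a_lo + b_lo;   /* paddd, low lane */
    uint32_t hi = a_hi + b_hi;   /* paddd, high lane */

    /* psubd 0x80000000 on both sides, then pcmpgtd: unsigned a_lo > lo means carry */
    uint32_t carry_mask =
        ((int32_t)(a_lo - 0x80000000u) > (int32_t)(lo - 0x80000000u)) ? 0xFFFFFFFFu : 0u;

    hi -= carry_mask;            /* psllq 32 + psubd: subtracting -1 adds the carry */
    return ((uint64_t)hi << 32) | lo;
}
```
]]>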

 </description>
		<link>http://blog.frohling.biz/2010/01/op-equivalent-series-paddq/</link>
			</item>
	<item>
		<title>OP-equivalent series (packusqd, packssqud)</title>
		<description>
packusqd (saturated clamp from unsigned long long to unsigned long),
packssqud (saturated clamp from signed long long to unsigned long):

There is a condition for unsigned long long inputs, range is "only" [0x0000000000000000, 0x7FFFFFFFFFFFFFFF].
There is no condition for signed long long inputs.

// from-scratch, no helper available

    /* -1 ...</description>
		<link>http://blog.frohling.biz/2010/01/op-equivalent-series-packusqd/</link>
			</item>
	<item>
		<title>OP-equivalent series (packusdw, packssduw)</title>
		<description>
packusdw (saturated clamp from unsigned long to unsigned short),
packssduw (saturated clamp from signed long to unsigned short):

There is a condition for unsigned long inputs, range is "only" [0x00000000, 0x7FFFFFFF].
There is a condition for signed long inputs, range is "only" [0x80008000, 0x7FFFFFFF]. You can go to full signed long if there ...</description>
		<link>http://blog.frohling.biz/2010/01/op-equivalent-series-packusdw/</link>
			</item>
	<item>
		<title>Cubic root approximation</title>
		<description>
The SSE-ops lack a huge number of very useful functions. Basically every single mathematically relevant function is absent; worse, trying to re-create them with the same precision as the FPU-routines results in code that is magnitudes slower. Sometimes one can find a useful short-cut, but not very often.

Here I tried to ...</description>
		<link>http://blog.frohling.biz/2010/01/cubic-root-approximation/</link>
			</item>
	<item>
		<title>OP-equivalent series (padddqd)</title>
		<description>
padddq (128bit + 64bit = 128bit):
// carry emulation via pcmpgtq equivalent

movdqa		xmmx, xmma
movdqa		xmmy, xmmb
movdqa		xmmz, xmma
paddq		xmma, xmmb
pcmpgtd		xmmx, xmmb
pcmpgtd		xmmy, xmma
pcmpeqd		xmmz, xmmb
pshufd		xmmx, xmmx, 1&#124;1&#124;1&#124;1
pshufd		xmmz, xmmz, 1&#124;1&#124;1&#124;1
pand		xmmz, xmmy
por		xmmz, xmmx
punpcklqdq	xmmz, xmmz
pslldq		xmmz, 8		// or pshufd
psubq		xmma, xmmz

 </description>
		<link>http://blog.frohling.biz/2010/01/op-equivalent-series-padddqd/</link>
			</item>
	<item>
		<title>OP-equivalent series (pcmpgtq, pcmpeqq)</title>
		<description>
When you don't have SSE4 available you have to compensate for the lack of some essential ops. In this series I'm going to write down equivalents of ops that are either missing or (currently) not available on AMD.

pcmpeqq (64bit integer compare):
// equality == (hi == hi) &#38;&#38; (lo == lo)

movdqa		xmm?3, xmma
pcmpeqd		xmm?3, xmmb
pshufd		xmm?1, xmm?3, 3&#124;3&#124;1&#124;1	(b == ...</description>
		<link>http://blog.frohling.biz/2010/01/op-equivalent-series-pcmpgtq-pcmpeqq/</link>
			</item>
</channel>
</rss>
