Ei, once in a while you ask yourself about the (in)sanity of the x86 architecture. Well, everybody knows that the ops offered by MMX up to SSE58942 were actually chosen by rolling dice to pick 42 out of 255 possible ones. Well, I just won't go too deep into emotional depression here. Just my list of missing and misguided ops:
- pmulbw
- pmulsdq
- pmulhd/pmulld
- paddsd/psubsd/paddusd/psubusd
- pmaddbw
- pmadddq
- pmaddubw/pmadduwd/pmaddudq
- psadwd (horizontal add of shorts)
- psaddq (horizontal add of longs)
- packusdw (hell this is so obvious)
- packssqd (psubq + pshuflw + pshufhw + pshufd + paddd but with saturation)
- packusqd (pshuflw + pshufhw + pshufd but with saturation)
- pnot (pcmpeq + pxor)
- pneg (pcmpeq + pxor + psub)
- pshufsw/pshufslw/pshufhw (equivalent to shufps for shorts, taking from src and dst)
- pshufsd (equivalent to shufps, taking from src and dst, all the other equivalents are there like pand/andps, but this one not???)
- pshufsq (equivalent to shufpd, taking from src and dst, all the other equivalents are there like pand/andpd, but this one not???)
- pcmpeqq/pcmpgtq (all the other qwords ops are there and this one not???)
- movld/movhd (equivalent to movlps/movhps, all the other equivalents are there like movdqa/movaps, but this one not???)
- movlq/movhq (equivalent to movlpd/movhpd, all the other equivalents are there like movdqa/movapd, but this one not???)
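To make the pnot and pneg entries concrete, here is a minimal sketch of the workarounds the list describes, written with SSE2 intrinsics for 32-bit lanes. The function names `pnot_epi32`, `pneg_epi32` and the `lane` helper are made up for illustration; they are not real instructions or library calls, and the lane width is just an assumption.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical "pnot": pcmpeqd builds an all-ones register, pxor flips every bit. */
static __m128i pnot_epi32(__m128i x) {
    __m128i ones = _mm_cmpeq_epi32(x, x);  /* pcmpeqd: every lane becomes -1 */
    return _mm_xor_si128(x, ones);         /* pxor: ~x */
}

/* Hypothetical "pneg" following the list's recipe: -x == (~x) - (-1). */
static __m128i pneg_epi32(__m128i x) {
    __m128i ones = _mm_cmpeq_epi32(x, x);               /* pcmpeqd */
    return _mm_sub_epi32(_mm_xor_si128(x, ones), ones); /* pxor, then psubd */
}

/* Illustration-only helper: read 32-bit lane i (0 = lowest) of a register. */
static int32_t lane(__m128i v, int i) {
    int32_t out[4];
    assert(i >= 0 && i < 4);
    _mm_storeu_si128((__m128i *)out, v);
    return out[i];
}
```

Two or three instructions for something as basic as a bitwise NOT. A negate can also be done as pxor + psubd (subtract from zero), which is one op shorter than the recipe above but still not a single instruction.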
How strange that this appears to be the result of the work of highly educated people. Ei, …
I feel sort of perverted at the same moment I get excited finding out that I can actually use haddps to horizontally add integers up to a sum of 2^23. Ei, …
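One way to read the haddps trick, sketched under assumptions: convert the four 32-bit integers to floats, run haddps twice, and convert back. The name `hsum_epi32_via_haddps` and the `TARGET_SSE3` macro are invented here, and the conversion step is my guess at the method rather than something the text spells out. The result is only exact while the lanes and their partial sums stay small enough to be represented exactly in a float, which is where the bound on the sum comes from.

```c
#include <assert.h>
#include <pmmintrin.h>  /* SSE3: _mm_hadd_ps (haddps) */
#include <stdint.h>

/* Lets GCC/Clang compile this function for SSE3 without -msse3 on the command line. */
#if defined(__GNUC__)
#define TARGET_SSE3 __attribute__((target("sse3")))
#else
#define TARGET_SSE3
#endif

/* Hypothetical helper: horizontal sum of four int32 lanes by way of haddps.
   The caller must keep the lanes and partial sums small enough for float. */
TARGET_SSE3
static int32_t hsum_epi32_via_haddps(__m128i v) {
    __m128 f = _mm_cvtepi32_ps(v);  /* cvtdq2ps: int32 lanes to float lanes */
    f = _mm_hadd_ps(f, f);          /* haddps: (a+b, c+d, a+b, c+d) */
    f = _mm_hadd_ps(f, f);          /* haddps: a+b+c+d in every lane */
    int32_t sum = _mm_cvtss_si32(f);  /* cvtss2si: lowest lane back to int */
    /* Sanity check against the 2^23 bound from the text. */
    assert(sum > -(1 << 23) && sum < (1 << 23));
    return sum;
}
```

For example, `hsum_epi32_via_haddps(_mm_set_epi32(1, 2, 3, 4))` gives 10. Four instructions and a detour through the floating-point domain, just to add up four integers.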