SIMD, of which SSE is an example, allows you to do the same operation on multiple chunks of data. So you won't get any advantage from using SSE as a straight replacement for individual integer operations; you will only get advantages if you can do the operation on multiple data items at once.

Problems:

1. Branches that depend on the data: If (a1 … help! You can't conditionally perform this on each column; all columns must do the same thing.
2. If the data is not contiguous, loading it into the SIMD registers is cumbersome.
3. The code is processor specific. SSE is only available on x86 (Intel/AMD), and not all 32-bit x86 CPUs support it.

You need to analyse the algorithm and the data to see whether it can be SSE'd, and that requires knowing how SSE works. There's plenty of documentation on Intel's website.

If you use SSE instructions, you're obviously limited to processors that support them. That means x86, dating back to the Pentium III (1999). SSE2, which is the one that offers 128-bit integer operations, is somewhat more recent: it arrived with the Pentium 4, and on the AMD side with the Athlon 64 (the first AMD Athlon processors didn't support SSE at all).

In any case, you have two options for using these instructions. Either write the entire block of code in assembly (probably a bad idea: that makes it virtually impossible for the compiler to optimize your code, and it's very hard for a human to write efficient assembler), or use the intrinsics available with your compiler (the SSE ones are declared in xmmintrin.h, and the SSE2 integer ones in emmintrin.h).

But again, the performance may not improve. SSE code places additional requirements on the data it processes. The main one to keep in mind is that data should be aligned on 128-bit (16-byte) boundaries: the aligned load and store instructions require it, and the unaligned variants were noticeably slower on older processors.
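To illustrate "the same operation on multiple data items at once", here is a minimal sketch that adds two int32_t arrays, first one element per iteration and then four per iteration with the SSE2 intrinsic _mm_add_epi32. It assumes an x86 compiler with SSE2 enabled (the default on x86-64); the function names add_scalar and add_sse2 are hypothetical.

```c
#include <emmintrin.h> /* SSE2 integer intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Scalar version: one addition per loop iteration. */
void add_scalar(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* SSE2 version: one _mm_add_epi32 adds four 32-bit lanes at once. */
void add_sse2(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        /* Unaligned loads/stores, so the arrays need no special alignment. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi32(va, vb));
    }
    /* Scalar tail for the last n % 4 elements. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

The gain only appears when there are many independent elements to process; for a single addition the SIMD version does strictly more work.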
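The usual way around per-column conditionals is not to branch at all: every lane computes both outcomes, and a comparison mask picks one per lane, so all columns still do the same thing. A minimal sketch, assuming SSE2 and a made-up rule out[i] = (a[i] < b[i]) ? a[i] + b[i] : a[i] - b[i] chosen only for illustration; cond_sse2 and select_si128 are hypothetical names.

```c
#include <emmintrin.h> /* SSE2 integer intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Per-lane select: takes x where mask lanes are all-ones, y elsewhere.
   SSE2 has no blend instruction, so it is built from and/andnot/or. */
static __m128i select_si128(__m128i mask, __m128i x, __m128i y)
{
    /* _mm_andnot_si128(m, y) computes (~m) & y. */
    return _mm_or_si128(_mm_and_si128(mask, x), _mm_andnot_si128(mask, y));
}

/* Vectorised form of: out[i] = (a[i] < b[i]) ? a[i] + b[i] : a[i] - b[i];
   Every lane computes BOTH branches; the mask then chooses one per lane. */
void cond_sse2(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i lt = _mm_cmplt_epi32(va, vb); /* all-ones where a < b */
        __m128i sum = _mm_add_epi32(va, vb);
        __m128i diff = _mm_sub_epi32(va, vb);
        _mm_storeu_si128((__m128i *)(out + i), select_si128(lt, sum, diff));
    }
    /* Scalar tail keeps the ordinary branch. */
    for (; i < n; ++i)
        out[i] = (a[i] < b[i]) ? a[i] + b[i] : a[i] - b[i];
}
```

This only pays off when both branches are cheap, since each lane always does the work of both. SSE4.1 adds _mm_blendv_epi8, which can replace the and/andnot/or sequence.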
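A sketch of what the 128-bit alignment requirement means in practice, assuming C11 (for alignas) and SSE2. _mm_load_si128 and _mm_store_si128 expect 16-byte-aligned addresses, so the buffers are declared with alignas(16) and the precondition is checked with assert. sum_aligned is a hypothetical helper.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 integer intrinsics */
#include <stdalign.h>  /* alignas (C11) */
#include <stddef.h>
#include <stdint.h>

/* Sums n int32 values using aligned 128-bit loads.
   Precondition (assumed for this sketch): p is 16-byte aligned. */
int32_t sum_aligned(const int32_t *p, size_t n)
{
    /* _mm_load_si128 requires a 16-byte-aligned address; a misaligned one can fault. */
    assert(((uintptr_t)p % 16) == 0);

    __m128i acc = _mm_setzero_si128();
    size_t i = 0;
    /* i advances by 4 ints (16 bytes), so p + i stays aligned. */
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_epi32(acc, _mm_load_si128((const __m128i *)(p + i)));

    /* Spill the four partial sums to an aligned buffer and add them up. */
    alignas(16) int32_t lanes[4];
    _mm_store_si128((__m128i *)lanes, acc);
    int32_t total = lanes[0] + lanes[1] + lanes[2] + lanes[3];

    /* Scalar tail for the remaining n % 4 elements. */
    for (; i < n; ++i)
        total += p[i];
    return total;
}
```

For heap memory, C11 aligned_alloc or the compiler-provided _mm_malloc/_mm_free serve the same purpose; when alignment can't be guaranteed, _mm_loadu_si128 is the safe choice.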
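Because SSE code is processor specific, a common pattern is to check the CPU at run time and fall back to plain C when the instructions are missing. This sketch assumes GCC or Clang on x86, whose __builtin_cpu_supports builtin performs the check and whose target("sse2") attribute compiles one function for SSE2; the function-pointer type add_fn and the dispatcher pick_add are hypothetical. Other compilers (e.g. MSVC, which would use __cpuid) simply get the scalar path here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature shared by both implementations. */
typedef void (*add_fn)(const int32_t *, const int32_t *, int32_t *, size_t);

/* Portable fallback: runs on any CPU. */
static void add_scalar(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <emmintrin.h>

/* Compiled for SSE2 even if the rest of the program is not. */
__attribute__((target("sse2")))
static void add_sse2(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi32(va, vb));
    }
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
#endif

/* Chooses an implementation based on what the running CPU supports. */
add_fn pick_add(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2"))
        return add_sse2;
#endif
    return add_scalar;
}
```

Callers fetch the function pointer once (e.g. at start-up) and reuse it, so the CPU check is not repeated inside hot loops.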