I’m using the VPADAL.U32 instruction to greatly speed up my addition code. However, I need the equivalent for subtraction: a pairwise subtract with accumulation and carry (exactly what VPADAL gives me for addition).
Wishful thinking or actually possible?
From what I could gather, I’d need to decrement my second operand, invert its bits … do the VPADAL, then bit-test for a 1 and subtract 1 from the resulting carry (to get either 0 or -1, my accumulation).
Am I missing an arcane technique somewhere?
You could use VPADDL to pairwise sum the addends and double the width, then use VQSUB to subtract this term from your total.
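To make the data flow concrete, here is a rough scalar model in plain C of what that two-step sequence does on one 128-bit register. It is only a sketch of the lane semantics, not NEON code: the function names are made up for illustration, and on real hardware I would expect the whole thing to collapse to something like `vqsubq_u64(total, vpaddlq_u32(addends))` with the intrinsics from `arm_neon.h`.

```c
#include <stdint.h>

/* Scalar model of VPADDL.U32 on a Q register: add adjacent pairs of
   32-bit lanes and widen each sum to 64 bits, so the pair sum cannot
   overflow. (Illustrative name, not a real API.) */
static void model_vpaddl_u32(const uint32_t in[4], uint64_t out[2])
{
    out[0] = (uint64_t)in[0] + in[1];
    out[1] = (uint64_t)in[2] + in[3];
}

/* Scalar model of VQSUB.U64: per-lane unsigned subtract that saturates
   at 0 instead of wrapping around. (Illustrative name, not a real API.) */
static void model_vqsub_u64(const uint64_t a[2], const uint64_t b[2],
                            uint64_t out[2])
{
    for (int i = 0; i < 2; i++)
        out[i] = (a[i] >= b[i]) ? a[i] - b[i] : 0;
}

/* Hypothetical helper combining both steps:
   total[i] -= addends[2*i] + addends[2*i + 1], clamped at 0. */
static void pairwise_sub_accumulate(uint64_t total[2],
                                    const uint32_t addends[4])
{
    uint64_t widened[2];
    model_vpaddl_u32(addends, widened);
    model_vqsub_u64(total, widened, total);
}
```

Note that because VQSUB saturates, a lane that would go below zero clamps to 0 rather than wrapping. If you want ordinary modular wraparound instead, the non-saturating VSUB.I64 would be the instruction to use in the second step.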