I’m working on translating the code below into Neon Assembly. Any help would be greatly appreciated.
    void sum(int length, int *a, int *b, int *c, int *d, char *result)
    {
        int i;
        for (i = 0; i < length; i++)
        {
            int sum = (a[i] + b[i] + c[i] + d[i]) / 4;
            if (sum > threshold)
                result[i] = 1;
            else
                result[i] = 0;
        }
    }
The actual code is an image binarization algorithm. The code above just demonstrates the idea without making simple things more complicated.
Here’s a fairly straightforward implementation. Note that we convert the divide and threshold test into just a test against threshold * 4 (in order to eliminate the divide).

Notes:
- result has been changed to int32_t *; it’s not hard to pack down to uint8_t, but it adds a lot of complexity to this initial example, so I thought I’d keep it simple for now
- a, b, c, d, result all need to be 16-byte aligned
- n needs to be a multiple of 4
- a, b, c, d need to fit within a 32-bit signed int
- threshold * 4 needs to fit within a 32-bit signed int
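As a rough illustration of the idea described above, here is a minimal sketch using NEON intrinsics in C rather than hand-written assembly. It is not the original implementation: the function name sum_threshold, passing threshold as a parameter (the question uses it as an undeclared variable), and the scalar fallback for builds without NEON are all assumptions made for this example. It keeps the int32_t result and the multiple-of-4 requirement from the notes, and it assumes the four-way sum and threshold * 4 do not overflow.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: writes 1 where a[i] + b[i] + c[i] + d[i] > threshold * 4,
   otherwise 0. Assumes n is a multiple of 4 and that neither the sum nor
   threshold * 4 overflows int32_t. */
void sum_threshold(int n, const int32_t *a, const int32_t *b,
                   const int32_t *c, const int32_t *d,
                   int32_t threshold, int32_t *result)
{
    assert(n % 4 == 0);
    const int32_t limit = threshold * 4;   /* replaces the divide by 4 */

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    const int32x4_t vlimit = vdupq_n_s32(limit);
    const uint32x4_t vone = vdupq_n_u32(1);

    for (int i = 0; i < n; i += 4) {
        /* Load four elements from each input and add them lane by lane. */
        int32x4_t s = vaddq_s32(
            vaddq_s32(vld1q_s32(a + i), vld1q_s32(b + i)),
            vaddq_s32(vld1q_s32(c + i), vld1q_s32(d + i)));

        /* Each lane becomes all ones where s > limit, else all zeros. */
        uint32x4_t mask = vcgtq_s32(s, vlimit);

        /* Reduce the all-ones mask to 1 and store four results. */
        vst1q_s32(result + i, vreinterpretq_s32_u32(vandq_u32(mask, vone)));
    }
#else
    /* Scalar fallback with the same comparison, for non-NEON targets. */
    for (int i = 0; i < n; i++) {
        result[i] = (a[i] + b[i] + c[i] + d[i]) > limit;
    }
#endif
}
```

Two caveats on this sketch. First, comparing the sum against threshold * 4 is not bit-identical to the integer-division version: with C's truncating division and non-negative inputs, (a + b + c + d) / 4 > threshold is exactly a + b + c + d > threshold * 4 + 3, so the two differ when the sum is 1 to 3 above threshold * 4. If you need the original behaviour, adjusting limit accordingly should be enough. Second, vld1q_s32 and vst1q_s32 should not themselves require 16-byte alignment, although aligned buffers may still be faster on some cores.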