I have a simple (brute-force) recursive solver algorithm that takes lots of time for bigger values of OpxCnt variable. For small values of OpxCnt, no problem, works like a charm. The algorithm gets very slow as the OpxCnt variable gets bigger. This is to be expected but any optimization or a different algorithm ?
My final goal is that :: I want to read all the True values in the map array by
executing some number of read operations that have the minimum operation
cost. This is not the same as minimum number of read operations.
At function completion, There should be no True value unread.
map array is populated by some external function, any member may be 1 or 0.
For example ::
map[4] = 1;
map[8] = 1;
1 read operation having Adr=4,Cnt=5 has the lowest cost (35)
whereas
2 read operations having Adr=4,Cnt=1 & Adr=8,Cnt=1 costs (27+27=54)
#include <string.h>
typedef unsigned int Ui32;
#define cntof(x) (sizeof(x) / sizeof((x)[0]))
#define ZERO(x) do{memset(&(x), 0, sizeof(x));}while(0)
typedef struct _S_MB_oper{
Ui32 Adr;
Ui32 Cnt;
}S_MB_oper;
typedef struct _S_MB_code{
Ui32 OpxCnt;
S_MB_oper OpxLst[20];
Ui32 OpxPay;
}S_MB_code;
char map[65536] = {0};
static int opx_ListOkey(S_MB_code *px_kod, char *pi_map)
{
int cost = 0;
char map[65536];
memcpy(map, pi_map, sizeof(map));
for(Ui32 o = 0; o < px_kod->OpxCnt; o++)
{
for(Ui32 i = 0; i < px_kod->OpxLst[o].Cnt; i++)
{
Ui32 adr = px_kod->OpxLst[o].Adr + i;
// ...
if(adr < cntof(map)){map[adr] = 0x0;}
}
}
for(Ui32 i = 0; i < cntof(map); i++)
{
if(map[i] > 0x0){return -1;}
}
// calculate COST...
for(Ui32 o = 0; o < px_kod->OpxCnt; o++)
{
cost += 12;
cost += 13;
cost += (2 * px_kod->OpxLst[o].Cnt);
}
px_kod->OpxPay = (Ui32)cost; return cost;
}
static int opx_FindNext(char *map, int pi_idx)
{
int i;
if(pi_idx < 0){pi_idx = 0;}
for(i = pi_idx; i < 65536; i++)
{
if(map[i] > 0x0){return i;}
}
return -1;
}
static int opx_FindZero(char *map, int pi_idx)
{
int i;
if(pi_idx < 0){pi_idx = 0;}
for(i = pi_idx; i < 65536; i++)
{
if(map[i] < 0x1){return i;}
}
return -1;
}
static int opx_Resolver(S_MB_code *po_bst, S_MB_code *px_wrk, char *pi_map, Ui32 *px_idx, int _min, int _max)
{
int pay, kmax, kmin = 1;
if(*px_idx >= px_wrk->OpxCnt)
{
return opx_ListOkey(px_wrk, pi_map);
}
_min = opx_FindNext(pi_map, _min);
// ...
if(_min < 0){return -1;}
kmax = (_max - _min) + 1;
// must be less than 127 !
if(kmax > 127){kmax = 127;}
// is this recursion the last one ?
if(*px_idx >= (px_wrk->OpxCnt - 1))
{
kmin = kmax;
}
else
{
int zero = opx_FindZero(pi_map, _min);
// ...
if(zero > 0)
{
kmin = zero - _min;
// enforce kmax limit !?
if(kmin > kmax){kmin = kmax;}
}
}
for(int _cnt = kmin; _cnt <= kmax; _cnt++)
{
px_wrk->OpxLst[*px_idx].Adr = (Ui32)_min;
px_wrk->OpxLst[*px_idx].Cnt = (Ui32)_cnt;
(*px_idx)++;
pay = opx_Resolver(po_bst, px_wrk, pi_map, px_idx, (_min + _cnt), _max);
(*px_idx)--;
if(pay > 0)
{
if((Ui32)pay < po_bst->OpxPay)
{
memcpy(po_bst, px_wrk, sizeof(*po_bst));
}
}
}
return (int)po_bst->OpxPay;
}
int main()
{
int _max = -1, _cnt = 0;
S_MB_code best = {0};
S_MB_code work = {0};
// SOME TEST DATA...
map[ 4] = 1;
map[ 8] = 1;
/*
map[64] = 1;
map[72] = 1;
map[80] = 1;
map[88] = 1;
map[96] = 1;
*/
// SOME TEST DATA...
for(int i = 0; i < cntof(map); i++)
{
if(map[i] > 0)
{
_max = i; _cnt++;
}
}
// num of Opx can be as much as num of individual bit(s).
if(_cnt > cntof(work.OpxLst)){_cnt = cntof(work.OpxLst);}
best.OpxPay = 1000000000L; // invalid great number...
for(int opx_cnt = 1; opx_cnt <= _cnt; opx_cnt++)
{
int rv;
Ui32 x = 0;
ZERO(work); work.OpxCnt = (Ui32)opx_cnt;
rv = opx_Resolver(&best, &work, map, &x, -42, _max);
}
return 0;
}
You can use dynamic programming to calculate the lowest cost that covers the first i true values in
map[]. Call this f(i). As I’ll explain, you can calculate f(i) by looking at all f(j) for j < i, so this will take time quadratic in the number of true values — much better than exponential. The final answer you’re looking for will be f(n), where n is the number of true values inmap[].A first step is to preprocess
map[]into a list of the positions of true values. (It’s possible to do DP on the rawmap[]array, but this will be slower if true values are sparse, and cannot be faster.)When we’re looking at the subproblem on just the first i true values, what we know is that the ith true value must be covered by a read that ends at i. This block could start at any position j <= i; we don’t know, so we have to test all i of them and pick the best. The key property (Optimal Substructure) that enables DP here is that in any optimal solution to the i-sized subproblem, if the read that covers the ith true value starts at the jth true value, then the preceding j-1 true values must be covered by an optimal solution to the (j-1)-sized subproblem.
So: f(i) = min(f(j) + score(pos(j+1), pos(i)), with the minimum taken over all 1 <= j < i. pos(k) refers to the position of the kth true value in
map[], and score(x, y) is the score of a read from position x to position y, inclusive.Extracting a Solution from Scores
Using this scheme, you can calculate the score of an optimal solution: it’s simply f(n), where n is the number of true values in
map[]. In order to actually construct the solution, you need to read back through the table of f() scores to infer which choice was made:There may be several possible choices, but all of them will produce optimal solutions.
Further Speedups
The solution above will work for an arbitrary scoring function. Because your scoring function has a simple structure, it may be that even faster algorithms can be developed.
For example, we can prove that there is a gap width above which it is always beneficial to break a single read into two reads. Suppose we have a read from position x-a to x, and another read from position y to y+b, with y > x. The combined costs of these two separate reads are 25 + 2 * (a + 1) + 25 + 2 * (b + 1) = 54 + 2 * (a + b). A single read stretching from x-a to y+b would cost 25 + 2 * (y + b – x + a + 1) = 27 + 2 * (a + b) + 2 * (y – x). Therefore the single read costs 27 – 2 * (y – x) less. If y – x > 13, this difference goes below zero: in other words, it can never be optimal to include a single read that spans a gap of 12 or more.
To make use of this property, inside
calcF(), final reads could be tried in decreasing order of start-position (i.e. in increasing order of width), and the inner loop stopped as soon as any gap width exceeds 12. Because that read and all subsequent wider reads tried would contain this too-large gap and therefore be suboptimal, they need not be tried.