(This is a generalization of: Finding duplicates in O(n) time and O(1) space)
Problem: Write a C++ or C function with time and space complexities of O(n) and O(1) respectively that finds the repeating integers in a given array without altering it.
Example: Given {1, 0, -2, 4, 4, 1, 3, 1, -2} function must print 1, -2, and 4 once (in any order).
EDIT: The following solution requires a duo-bit (to represent 0, 1, and 2) for each integer in the range of the minimum to the maximum of the array. The number of necessary bytes (regardless of array size) never exceeds (INT_MAX – INT_MIN)/4 + 1.
#include <stdio.h>
void set_min_max(int a[], long long unsigned size,\
int* min_addr, int* max_addr)
{
long long unsigned i;
if(!size) return;
*min_addr = *max_addr = a[0];
for(i = 1; i < size; ++i)
{
if(a[i] < *min_addr) *min_addr = a[i];
if(a[i] > *max_addr) *max_addr = a[i];
}
}
void print_repeats(int a[], long long unsigned size)
{
long long unsigned i;
int min, max = min;
long long diff, q, r;
char* duos;
set_min_max(a, size, &min, &max);
diff = (long long)max - (long long)min;
duos = calloc(diff / 4 + 1, 1);
for(i = 0; i < size; ++i)
{
diff = (long long)a[i] - (long long)min; /* index of duo-bit
corresponding to a[i]
in sequence of duo-bits */
q = diff / 4; /* index of byte containing duo-bit in "duos" */
r = diff % 4; /* offset of duo-bit */
switch( (duos[q] >> (6 - 2*r )) & 3 )
{
case 0: duos[q] += (1 << (6 - 2*r));
break;
case 1: duos[q] += (1 << (6 - 2*r));
printf("%d ", a[i]);
}
}
putchar('\n');
free(duos);
}
void main()
{
int a[] = {1, 0, -2, 4, 4, 1, 3, 1, -2};
print_repeats(a, sizeof(a)/sizeof(int));
}
The definition of big-O notation is that its argument is a function (f(x)) that, as the variable in the function (x) tends to infinity, there exists a constant K such that the objective cost function will be smaller than Kf(x). Typically f is chosen to be the smallest such simple function such that the condition is satisfied. (It’s pretty obvious how to lift the above to multiple variables.)
This matters because that K — which you aren’t required to specify — allows a whole multitude of complex behavior to be hidden out of sight. For example, if the core of the algorithm is O(n2), it allows all sorts of other O(1), O(logn), O(n), O(nlogn), O(n3/2), etc. supporting bits to be hidden, even if for realistic input data those parts are what actually dominate. That’s right, it can be completely misleading! (Some of the fancier bignum algorithms have this property for real. Lying with mathematics is a wonderful thing.)
So where is this going? Well, you can assume that
intis a fixed size easily enough (e.g., 32-bit) and use that information to skip a lot of trouble and allocate fixed size arrays of flag bits to hold all the information that you really need. Indeed, by using two bits per potential value (one bit to say whether you’ve seen the value at all, another to say whether you’ve printed it) then you can handle the code with fixed chunk of memory of 1GB in size. That will then give you enough flag information to cope with as many 32-bit integers as you might ever wish to handle. (Heck that’s even practical on 64-bit machines.) Yes, it’s going to take some time to set that memory block up, but it’s constant so it’s formally O(1) and so drops out of the analysis. Given that, you then have constant (but whopping) memory consumption and linear time (you’ve got to look at each value to see whether it’s new, seen once, etc.) which is exactly what was asked for.It’s a dirty trick though. You could also try scanning the input list to work out the range allowing less memory to be used in the normal case; again, that adds only linear time and you can strictly bound the memory required as above so that’s constant. Yet more trickiness, but formally legal.
[EDIT] Sample C code (this is not C++, but I’m not good at C++; the main difference would be in how the flag arrays are allocated and managed):