Thank you in advance for taking the time to read my post. I will be the first to admit that this piece of code is slightly convoluted.
This weekend my project has been a program that takes a format string as an argument and prints every possible permutation of that string. The format string looks similar to a printf format string. Constants are kept in the string, and the variable parts look like this: %[number-of-chars][type-of-permutation], where [type-of-permutation] can be ‘d’ for integers, ‘u’ for uppercase alpha characters, ‘l’ for lowercase, ‘a’ for all alpha characters, and ‘n’ for alphanumeric.
There is a special case “%w” which reads words from a word list instead of incrementing each individual character.
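To make the format concrete, here is a minimal sketch of how a single specifier could be parsed. It is an illustration only, not the code from my program below: struct spec and parse_spec are hypothetical names, and the 1 to 9 size limit is taken from my usage message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical holder for one parsed specifier; not the struct my program uses. */
struct spec {
    int size;   /* number of characters, 1..9; 0 for %w */
    char type;  /* 'd', 'u', 'l', 'a', 'n' or 'w' */
};

/*
 * Parses the specifier that starts at fmt (which should point at '%').
 * Returns the number of format characters consumed, or 0 if the
 * specifier is malformed.
 */
size_t parse_spec(const char *fmt, struct spec *out) {
    assert(fmt != NULL && out != NULL);
    if (fmt[0] != '%')
        return 0;
    /* Special case: %w takes no size. */
    if (fmt[1] == 'w') {
        out->size = 0;
        out->type = 'w';
        return 2;
    }
    /* General case: one digit 1..9 followed by a type letter. */
    if (fmt[1] < '1' || fmt[1] > '9')
        return 0;
    switch (fmt[2]) {
    case 'd': case 'u': case 'l': case 'a': case 'n':
        out->size = fmt[1] - '0';
        out->type = fmt[2];
        return 3;
    default:
        return 0;
    }
}
```

For example, parse_spec("%5d", &s) returns 3 and leaves s.size == 5 and s.type == 'd', while a malformed specifier such as "%0d" returns 0.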
Things generally work the way I want them to. However, upon testing and debugging with gdb I have discovered two bugs that I cannot explain.
Bug #1: The string partial is initially filled with the minimum characters of that type (e.g. %5d would be filled with “00000”). However, after a seemingly unrelated call (fopen) is executed, the partial contains arbitrary bytes, usually lower in value than 0x09. This occurs at line 143, which I have marked in the source.
Bug #2: When I run the program like so: “./Permute %w%2d -w words”, where “words” is the Ubuntu word list, the program throws a segmentation fault after reaching around the 4,200th line of the word list. This is also perplexing. Because the number 4,200 is so random, I can only assume that the program is running out of stack memory or something like that.
I have this code broken up into several header files. I will post the main code first, and then the included header files underneath.
#include <stdio.h>
#include "../Libraries/permute.h"
#include <malloc.h>
#include <string.h>
void main (int argc, char * argv[]) {
    //Allocate ten base pointers to be inserted into our master string.
    struct base * part[10];
    //Allocate a character string on the heap to be manipulated by our program.
    char * master = malloc(100);
    //Organize arguments
    if(argc < 2) {
        printf("Usage:\n./Permute [Format String] -p -w [word-list] -o [output-file]\n");
        printf("Format string with the general form %%[number-of-chars][type]\n");
        printf("Where [number-of-chars] is between 1 and 9\n");
        printf("And [type] is d for integer, a for ALPHA, n for ALPHA_NUMERIC, u for UPPERCASE, l for LOWERCASE\n");
        printf("-p increments all parts in parallel. \n");
        printf("-w Reads words for %%w from [word-list]\n");
        return;
    }
    int count;
    int parallel = 0;
    char * wordlist = NULL;
    FILE * file = NULL;
    char * output = NULL;
    FILE * out = NULL;
    for(count = 1; count < argc; count++){
        char first = *argv[count];
        int second = *(argv[count] + 1);
        if (first == '-') {
            switch (second) {
                case 'p':
                    parallel = 1;
                    break;
                case 'w':
                    wordlist = argv[count+1];
                    break;
                case 'o':
                    output = argv[count+1];
                    break;
            }
        }
    }
    //Parse format string
    int i = 0;
    //This is the counter for the current base struct we're operating on.
    int x;
    int size;
    int type;
    //Create a pointer that points to a position inside our master string.
    char * current = master;
    //Cycle through each character in the format string.
    for(count = 0; count < strlen(argv[1]); count++) {
        //If the character is a constant, set the current character in the master string to the character in the format string.
        if (*(argv[1] + count) != '%') {
            *current = *(argv[1] + count);
            current++;
        }
        //Otherwise, initialize a base structure using one of the pointers in the array declared above.
        //Fill the structure according to the format string with rules specified above.
        else {
            size = *(argv[1] + count + 1) - '0';
            //If the base part is a wordlist, do special things.
            if (*(argv[1] + count + 1) == WORDLIST) {
                type = WORDLIST;
                part[i] = malloc(sizeof(int) * 3 + sizeof(FILE *) + MAX_WORD_SIZE);
                memset(part[i], 0, sizeof(int) * 3 + sizeof(FILE *) + MAX_WORD_SIZE);
                part[i]->file = fopen(wordlist, "r");
                if (part[i]->file == NULL)
                    perror("Could not open file");
                fgets(part[i]->guts, MAX_WORD_SIZE, part[i]->file);
                part[i]->size = StripChar(part[i]->guts, strlen(part[i]->guts), '\n');
            }
            //Otherwise, fill out the base structure like normal.
            else {
                type = (int)*(argv[1] + count + 2);
                part[i] = malloc(sizeof(int) * 3 + sizeof(FILE *) + size);
                part[i]->size = size;
            }
            part[i]->type = type;
            part[i]->offset = (int)(current - master);
            //If the base part is not a wordlist (its size does not vary), move the position in the master string forward
            //based on the size of the part. Move the position in the format string forward 2 places.
            if (type != WORDLIST) {
                current += size;
                count += 2;
            }
            //If the base part is a wordlist (meaning its size varies), keep the pointer inside the master string where it is.
            //We will move things around later to make everything fit.
            else {
                count += 1;
            }
            //Set all characters inside the base part to their minimum value.
            for (x = 0; x < size; x++) {
                switch (type) {
                    case NUMERIC:
                        part[i]->guts[x] = '0';
                        break;
                    case ALPHA:
                        part[i]->guts[x] = 'A';
                        break;
                    case ALPHA_NUMERIC:
                        part[i]->guts[x] = '0';
                        break;
                    case UPPERCASE:
                        part[i]->guts[x] = 'A';
                        break;
                    case LOWERCASE:
                        part[i]->guts[x] = 'a';
                        break;
                }
            }
            //Move on to the next base structure.
            i++;
        }
    }
    //Terminate the masterstring with a null character.
    *current = '\0';
    //Calculate the size of the master string. (Without wordlist parts)
    size_t mastersize = (size_t)(current - master);
    //Keep track of the number of characters we've inserted into the master string for each wordlist part.
    int wordsize = 0;
    //Record the number of parts to cycle through in the master string
    int max = i;
    if (output != NULL) {
        out = fopen(output, "w+");
    }
    /*BUG #1: THIS IS WHERE THE STRING GETS CHANGED. NO IDEA WHY. part[i]->guts was something like "AAAA", now it's something like 0x0002000200020002*/
    while (1) {
        wordsize = 0;
        //If parallel flag is set, increment all parts at once.
        if (parallel) {
            for (i = 0; i < max; i++) {
                Increment(part[i]->guts, part[i]->size, 0, part[i]->type);
                //Copy the incremented part into the master string at offset
                strncpy(master + part[i]->offset, part[i]->guts, part[i]->size);
            }
            //Then, once all parts are copied into the master, insert the words from the wordlist into the master string,
            //moving the rest of the string around as needed.
            for (i = 0; i < max; i++) {
                if (part[i]->type == WORDLIST) {
                    memmove(master + part[i]->offset + part[i]->size + wordsize, master + part[i]->offset + wordsize, mastersize - part[i]->offset);
                    strncpy(master + part[i]->offset + wordsize, part[i]->guts, part[i]->size);
                    wordsize += part[i]->size;
                }
            }
        }
        //Otherwise, just increment the rightmost part.
        else {
            IncrementC(part, max - 1, wordlist);
            //Copy the incremented part into the master string at offset
            for (i = 0; i < max; i++) {
                if (part[i]->type != WORDLIST)
                    strncpy(master + part[i]->offset, part[i]->guts, part[i]->size);
            }
            //Then, once all parts are copied into the master, insert the words from the wordlist into the master string,
            //moving the rest of the string around as needed.
            for (i = 0; i < max; i++) {
                if (part[i]->type == WORDLIST) {
                    memmove(master + part[i]->offset + part[i]->size + wordsize, master + part[i]->offset + wordsize, mastersize - part[i]->offset);
                    strncpy(master + part[i]->offset + wordsize, part[i]->guts, part[i]->size);
                    wordsize += part[i]->size;
                }
            }
        }
        //Terminate the master string with a zero again (In case we overwrote it)
        *(master + mastersize + wordsize) = '\0';
        //Print the master string
        if (out != NULL) {
            fwrite(master, mastersize + wordsize, 1, out);
            fwrite("\n", 1, 1, out);
        }
        printf("%s\n", master);
        for (i = 0; i < max; i++) {
            if (part[i]->type == WORDLIST) {
                memmove(master + part[i]->offset, master + part[i]->offset + part[i]->size, mastersize - part[i]->offset);
            }
        }
    }
}
//PERMUTE.H
#define NUMERIC 100
#define ALPHA 97
#define ALPHA_NUMERIC 110
#define LOWERCASE 108
#define UPPERCASE 117
#define WORDLIST 119
#define MAX_WORD_SIZE 50
#include <stdio.h>
#include "files.h"
//Used for multipart strings
struct base {
    int type; //Determines which set of values the character should cycle through.
    int offset; //The offset of the part from the beginning of the multipart string.
    int size; //Size of the part.
    FILE * file; //File pointer used for cycling through word lists
    char guts[]; //The actual part to be inserted into the string.
};
//Recursive function Increment takes a string of length len and increments the character found [offset] number of chars
//left of the end of the string. Type determines which values the character should cycle through. (A-Z, a-z, 0-9, etc.)
int Increment(char * start, size_t len, int offset, int type) {
    char * stop = start + len - 1;
    char * place = stop - offset;
    int min, max, count;
    //Setup break points to determine if incremented char is at the end of its cycle for a given type.
    switch (type) {
        case NUMERIC:
            min = (int)'0';
            max = (int)'9';
            break;
        case ALPHA:
            min = (int)'A';
            max = (int)'z';
            break;
        case ALPHA_NUMERIC:
            min = (int)'0';
            max = (int)'z';
            break;
        case UPPERCASE:
            min = (int)'A';
            max = (int)'Z';
            break;
        case LOWERCASE:
            min = (int)'a';
            max = (int)'z';
            break;
        default:
            return -1;
    }
    //If our specified character is not greater than its maximum value, increment the character based on
    //the type of incrementation specified.
    if (*place < max) {
        switch (type) {
            case NUMERIC:
                *place += 1;
                return 1;
                break;
            case ALPHA:
                if (*place == 'Z')
                    *place = 'a';
                else
                    *place += 1;
                return 1;
                break;
            case ALPHA_NUMERIC:
                if (*place == '9')
                    *place = 'A';
                if (*place == 'Z')
                    *place = 'a';
                else
                    *place += 1;
                return 1;
                break;
            case UPPERCASE:
                *place += 1;
                return 1;
                break;
            case LOWERCASE:
                *place += 1;
                return 1;
                break;
            default:
                return -1;
        }
    }
    //If the character has reached its maximum value, set it to its minimum and Increment() the next character
    //to the left. If there is no character to the left, set all characters to their minimum and return 0.
    else {
        if(place == start) {
            for (count = 0; start + count < stop; count++)
                *(start + count) = (char)min;
            *start = (char)min;
            return 0;
        }
        else {
            *place = (char)min;
            return Increment(start, len, offset + 1, type);
        }
    }
}
//IncrementC() stands for Increment Combined. Takes an array of pointers to base structures, an offset, and a string
//with the filename of a wordlist.
int IncrementC (struct base * part[], int offset, char * wordlist) {
    //If the type of permutation is anything but a wordlist.
    if (part[offset]->type != WORDLIST) {
        //Call Increment() on the base structure determined by offset.
        //If Increment() returns 0, the function calls itself on the next base structure to the left.
        if ( Increment(part[offset]->guts, part[offset]->size, 0, part[offset]->type) == 0) {
            if (offset == 0)
                return 0;
            else
                IncrementC(part, offset - 1, wordlist);
        }
    }
    //If the type of permutation is a wordlist
    else {
        //Check to see if the base structure has an open file descriptor associated with it.
        //If not, open the file specified by wordlist.
        if (part[offset]->file == NULL) {
            if (wordlist != NULL)
                part[offset]->file = fopen(wordlist, "r");
            if (part[offset]->file == NULL)
                perror("Could not open word list");
        }
        //Get the next line from the wordlist file. If at EOF, reopen the file and get the first line of the new file descriptor.
        if(fgets(part[offset]->guts, MAX_WORD_SIZE, part[offset]->file) == NULL) {
            fclose(part[offset]->file);
            part[offset]->file = fopen(wordlist, "r");
            if (part[offset]->file == NULL)
                perror("Could not open word list");
            fgets(part[offset]->guts, MAX_WORD_SIZE, part[offset]->file);
            //If the base structure is the farthest to the left in the string, return 0. Otherwise, call this
            //function on the base structure to the left.
            if (offset == 0) {
                part[offset]->size = StripChar(part[offset]->guts, strlen(part[offset]->guts), '\n');
                return 0;
            }
            else {
                IncrementC(part, offset - 1, wordlist);
            }
        }
        //Strip the new line character from the string.
        part[offset]->size = StripChar(part[offset]->guts, strlen(part[offset]->guts), '\n');
    }
}
//FILES.H
#include <stdio.h>
#include <malloc.h>
#include <string.h>
int ReadFile(char * filename, char * output) {
    //Open file and store in packet
    FILE *input;
    input = fopen(filename, "r");
    int count = 0;
    char c = fgetc(input);
    for(count = 0; c != '\0' && c != EOF; c = fgetc(input)) {
        *(char *)(output + count) = c;
        count++;
    }
    fclose(input);
    printf("Read: %s from %s\n", output, filename);
    return (count - 1);
}
int StripChar (char * in, size_t len, char strip) {
    int count;
    int outcount = 0;
    char * out = malloc(len);
    for (count = 0; count < len; count++) {
        if(*(in + count) != strip) {
            *(out + outcount) = *(in + count);
            outcount++;
        }
    }
    *(out + outcount) = '\0';
    strcpy(in, out);
    return outcount;
}
//End
I’ll keep editing this answer and adding more bugs as I find them.
1. To start with, there is a bug in the way you read values out of argv. The elements of argv are strings: argv is a one-dimensional array of strings. You can’t typecast the pointer directly to an int. If you do so, you will only convert the first character of that string to an int. You have to use atoi or similar functionality to convert an integer string argument to an integer.
2. You have little or no error handling in your code, and that results in a lot of issues. Here’s a possible root cause of the stack corruption: in permute.h, in the function Increment, if (stop - offset) is < your start pointer, you have a possible memory corruption.
Remember the rule: you can dodge a million bullets, but it takes just one. All the hard work and effort you put into writing a program is wasted if it crashes. When you move to an industry environment, the quality of the code matters a lot, not just its functionality, and quality improves only with more error checking in the code. I give utmost importance to error checking because, as a developer, the impression you make is based on how often your code crashes.
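As a rough sketch of the kind of checking I mean, here is a simplified, iterative variant of Increment() that handles only digits. The name increment_digits and the -1 error return are my own assumptions, not part of your code; the point is only that the position being modified can never move to the left of start.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified variant of Increment() for '0'..'9' only.
 * Returns 1 after a normal increment, 0 when the whole string wrapped
 * around to all '0', and -1 on invalid arguments or unexpected bytes.
 */
int increment_digits(char *start, size_t len) {
    /* Error checking first: refuse a NULL or empty buffer. */
    if (start == NULL || len == 0)
        return -1;

    /* Walk from the rightmost character towards start, never past it. */
    for (size_t i = len; i > 0; i--) {
        char *place = start + (i - 1);
        if (*place < '0' || *place > '9')
            return -1;      /* unexpected byte: report it instead of guessing */
        if (*place < '9') {
            *place += 1;
            return 1;
        }
        *place = '0';       /* carry into the next character to the left */
    }
    return 0;               /* every position wrapped */
}

/* Usage example: "099" becomes "100", and "99" wraps around to "00". */
void increment_digits_demo(void) {
    char a[] = "099";
    char b[] = "99";
    assert(increment_digits(a, 3) == 1 && strcmp(a, "100") == 0);
    assert(increment_digits(b, 2) == 0 && strcmp(b, "00") == 0);
    assert(increment_digits(NULL, 3) == -1);
}
```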
3. You need to make sure that you allocate memory appropriately. In the malloc(100) call in main, what do you mean by 100? Is it supposed to be 100 bytes, or 100 ints? An appropriate allocation states the element type explicitly, using sizeof.
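A minimal sketch of what I mean, under a few assumptions: alloc_base and alloc_chars are hypothetical helpers, and the extra byte for a terminating '\0' is my own addition. Using sizeof(struct base) rather than adding up the member sizes by hand also accounts for any padding the compiler inserts before guts, which can make the hand-computed total too small (on typical 64-bit systems, for example).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Same members as struct base in permute.h. */
struct base {
    int type;
    int offset;
    int size;
    FILE *file;
    char guts[];    /* flexible array member */
};

/*
 * Hypothetical helper: allocates a zeroed struct base with room for
 * guts_len characters plus a terminating '\0'. Returns NULL on failure.
 */
struct base *alloc_base(size_t guts_len) {
    /* sizeof(struct base) already includes any padding before guts. */
    struct base *b = calloc(1, sizeof(struct base) + guts_len + 1);
    if (b != NULL)
        b->size = (int)guts_len;
    return b;
}

/* Hypothetical helper: a buffer of count chars, with the unit spelled out. */
char *alloc_chars(size_t count) {
    return malloc(count * sizeof(char));
}

/* Usage example: allocate, check the result, and release again. */
void alloc_demo(void) {
    struct base *b = alloc_base(5);
    char *master = alloc_chars(100);
    assert(b != NULL && master != NULL);
    assert(b->size == 5 && b->file == NULL);
    free(master);
    free(b);
}
```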
4. You malloced a bunch of memory in your main function, but where are you freeing it? I don’t see a single call to free. Remember: whatever resource you take from the system (FILE handles, thread handles, memory, etc.), you have to give back. Otherwise the result is a resource leak.
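One possible shape for the cleanup, as a sketch only: release_all is a hypothetical helper, and it assumes the main loop is given an exit condition so that there is a point at which it can be called (the posted while (1) loop never exits).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Same members as struct base in permute.h. */
struct base {
    int type;
    int offset;
    int size;
    FILE *file;
    char guts[];
};

/*
 * Hypothetical cleanup helper: closes each part's word list (if open),
 * frees each of the count parts, then frees the master string.
 * Returns the number of parts that were released.
 */
int release_all(struct base *part[], int count, char *master) {
    int released = 0;
    for (int i = 0; i < count; i++) {
        if (part[i] == NULL)
            continue;
        if (part[i]->file != NULL)
            fclose(part[i]->file);
        free(part[i]);
        part[i] = NULL;     /* guards against a later double free */
        released++;
    }
    free(master);           /* free(NULL) is a no-op */
    return released;
}

/* Usage example: one allocated part, one unused slot. */
void release_demo(void) {
    struct base *part[2];
    part[0] = calloc(1, sizeof(struct base) + 8);
    part[1] = NULL;
    char *master = malloc(100);
    assert(part[0] != NULL && master != NULL);
    assert(release_all(part, 2, master) == 1);
    assert(part[0] == NULL);
}
```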