I am new to C, so I am not exactly sure where my error is. However, I do know that the great portion of the issue lies either in how I am storing the doubles in the d_buffer (double) array or the way I am printing it.
Specifically, my output keeps printing extremely large numbers (with around 10-12 digits before the decimal point and a trail of zeros after it. Additionally, this is an adaptation of an older program to allow for double inputs, so I only really added the two if statements (in the “read” for loop and the “printf” for loop) and the d_buffer declaration.
I would appreciate any input whatsoever as I have spent several hours on this error.
#include <stdio.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>
struct DataDescription
{
char fieldname[30];
char fieldtype;
int fieldsize;
};
/* -----------------------------------------------
eof(fd): returns 1 if file `fd' is out of data
----------------------------------------------- */
int eof(int fd)
{
char c;
if ( read(fd, &c, 1) != 1 )
return(1);
else
{ lseek(fd, -1, SEEK_CUR);
return(0);
}
}
void main()
{
FILE *fp; /* Used to access meta data */
int fd; /* Used to access user data */
/* ----------------------------------------------------------------
Variables to hold the description of the data - max 10 fields
---------------------------------------------------------------- */
struct DataDescription DataDes[10]; /* Holds data descriptions
for upto 10 fields */
int n_fields; /* Actual # fields */
/* ------------------------------------------------------
Variables to hold the data - max 10 fields....
------------------------------------------------------ */
char c_buffer[10][100]; /* For character data */
int i_buffer[10]; /* For integer data */
double d_buffer[10];
int i, j;
int found;
printf("Program for searching a mini database:\n");
/* =============================
Read in meta information
============================= */
fp = fopen("db-description", "r");
n_fields = 0;
while ( fscanf(fp, "%s %c %d", DataDes[n_fields].fieldname,
&DataDes[n_fields].fieldtype,
&DataDes[n_fields].fieldsize) > 0 )
n_fields++;
/* ---
Prints meta information
--- */
printf("\nThe database consists of these fields:\n");
for (i = 0; i < n_fields; i++)
printf("Index %d: Fieldname `%s',\ttype = %c,\tsize = %d\n",
i, DataDes[i].fieldname, DataDes[i].fieldtype,
DataDes[i].fieldsize);
printf("\n\n");
/* ---
Open database file
--- */
fd = open("db-data", O_RDONLY);
/* ---
Print content of the database file
--- */
printf("\nThe database content is:\n");
while ( ! eof(fd) )
{ /* ------------------
Read next record
------------------ */
for (j = 0; j < n_fields; j++)
{
if ( DataDes[j].fieldtype == 'I' )
read(fd, &i_buffer[j], DataDes[j].fieldsize);
if ( DataDes[j].fieldtype == 'F' )
read(fd, &d_buffer[j], DataDes[j].fieldsize);
if ( DataDes[j].fieldtype == 'C' )
read(fd, &c_buffer[j], DataDes[j].fieldsize);
}
double d;
/* ------------------
Print it...
------------------ */
for (j = 0; j < n_fields; j++)
{
if ( DataDes[j].fieldtype == 'I' )
printf("%d ", i_buffer[j]);
if ( DataDes[j].fieldtype == 'F' )
d = d_buffer[j];
printf("%lf ", d);
if ( DataDes[j].fieldtype == 'C' )
printf("%s ", c_buffer[j]);
}
printf("\n");
}
printf("\n");
printf("\n");
}
Expected Output:
3 rows of data ending with the number “e = 2.18281828”
To reproduce the problem, the following two files need to be in the same directory as the lookup-data.c file:
– [db-data][1]
– [db-description][2]
EDIT: My earlier speculations were all wrong. The problem was that the database file had big-endian numbers and the data were being read on a little-endian computer. Skip down to the section marked “START READING HERE” to avoid wasting your time with the early speculations (left here for their very limited historical value).
I suspect that your problem is related to the
printf()format specification%lfused to print values, together with the fact that you declaredd_bufferto be of typeint. Likelysizeof(double) > sizeof(int)is true, so you are interpreting extra bytes of data to be part of the float values.I don’t know this for sure since I can’t see the data your program is using to run, but if your
fieldsizefor float data issizeof(float)rather thansizeof(double)then perhaps you are storing the float values intod_bufferjust fine but messing them up when printing. Alternatively, iffieldsizeis equal tosizeof(double), andsizeof(double)is greater thansizeof(int), then you are writing off the end ofd_bufferand then something is corrupting your data.I suggest you change the declaration to
double d_buffer[10];and see if your program works better. Also, see iffieldsizeis set tosizeof(float)orsizeof(double); if it issizeof(float)then declared_bufferas typefloatand use this code:EDIT: Also, I recommend that you switch to using
fopen()andfread()for all your I/O. The basicopen()andread()can returnEINTRwhich means you need to retry the operation, so these are only properly used inside a loop that will retry the call if it returnsEINTR. Thefopen()andfread()calls insulate you from that sort of detail, and have some buffering that can make your programs run more smoothly. (I’m pretty sure your current project is only reading and writing small amounts of data, so the performance difference likely doesn’t matter much to you at this time.)Also, I suggest you get rid of your
eof()function; it is very unusual and likely a bit slow to read a character and then usefseek()to put it back. The C library has the functionfgetc()which gets one character from the input stream, and you can test that character to see if it is theEOFvalue to detect end-of-file. And the C library also has the functionungetc()that can push back one character; this will not involve actually seeking the disk, but rather will just put back the character into some buffer somewhere. But you don’t even need to usefgetc()orungetc()for your code; just check the return value fromfread()and if you get a zero-length read, you know you hit end-of-file. In production code, you will need to check the return value of each function call anyway; you can’t get away with just hoping that every read succeeds.EDIT: One more thing you could try: change the format code from
"%lf"to just"%f"and see what happens. I’m not exactly sure what thatlwill do and you shouldn’t need it. Plain old"%f"should format adouble. But it may not change anything: according to this web page I found, inprintf()there is no difference between"%lf"and"%f".http://www.dgp.toronto.edu/~ajr/209/notes/printf.html
* START READING HERE *
EDIT: Okay, one thing I have figured out for sure. The database format is an index value (an integer value), then a float value, then a string value. You will need to read each one to advance the current position within the file. So your current code that is checking format codes and deciding which thing to read? Not correct; you need to read an integer, a float, and a string for each record.
EDIT: Okay, here is a correct Python program that can read the database. It doesn’t bother to read the metadata file; I just hard-coded in the constants (it’s okay since it is just a throw-away program).
So the index values are 32-bit integers, big-endian; the floats are 64-bit floats, big-endian; and the text fields are 20 characters fixed (so a string of 0 to 19 characters plus a terminating NUL byte).
Here’s the output of the above program:
Now, when I try to compile your C code, I get garbage values out because my computer is little-endian. Are you trying to run your C code on a little-endian computer?
EDIT: To answer the most recent comment, which is a question: For each input record, you must call
read()three times. The first time you read the index, which is a 4-byte integer (big-endian). The second read is an 8-byte float value, also big-endian. The third read is 20 bytes as a string. Each read moves the current position in the file; the three reads together read a single record from the file. After you have read and printed three records, you are done.Since my computer is little-endian, it’s tricky to get the values out correctly in C, but I did it. I made a union that lets me read out an 8-byte value as an integer or as a float, and I used that as the buffer for the call to
read(); then I called__builtin_bswap64()(a feature in GCC) to swap the big-endian value to little-endian, stored the result as a 64-bit integer, and read it out as a float. I also used__builtin_bswap32()to swap the integer index. My C program now prints:So, read each record, make sure you are correct with respect to endian-ness of the data, and you will have a working program.
EDIT: Here are code fragments that show how I fixed the endianness problem.
I’m sort of surprised the database was in big-endian format. Any computer using an x86 family processor is little-endian.
You should definitely not hard-code those numbers (4, 8, 20) like I did; you should use the metadata you received. I’ll leave that for you.
EDIT: Also you shouldn’t call
__builtin_bswap32()or__builtin_bswap64(). You should callntohl()and… I’m not sure what the 64-bit one is. Butntohl()is portable; it will skip the swap if you are compiling on a big-endian computer, and do the swap if you are compiling on a little-endian computer.EDIT: I found a solution to a 64-bit equivalent to
ntohl(), right here on StackOverflow.https://stackoverflow.com/a/4410728/166949
It’s simple if you only care about Linux: instead of using
#include <arpa/inet.h>you can use#include <endian.h>and then usebe32toh()andbe64toh().Once you have this you can read the database file like so:
If you compile the above code on a big-endian machine, it will compile to a do-nothing and read the big-endian value. If you compile the above code on a little-endian machine, it will compile to the equivalent of
__builtin_bswap64()and swap the bytes so the 64-bit value is read correctly.EDIT: I said in a couple of places that you need to do three separate reads: one to get the index, one to get the float, and one to get the string. Actually, you can declare a
structand issue a single read that would pull in all the data in one read. The tricky part, though, is that the C compiler might insert “padding” bytes inside yourstructthat would cause the byte structure of thestructto not exactly match the byte structure of the record read from the file. The C compiler should provide a way to control the alignment bytes (a#pragmastatement) but I didn’t want to get into the details. For this simple program, simply reading three times solves the problem.