I need a unique ID for a document read into a c++ program that

Question


Asked: June 15, 2026 (2026-06-15T09:59:43+00:00)


I need a unique ID for a document read into a C++ program that will carry into a database. The ID needs to be the same regardless of whether the document it is tied to is run through the program first, by itself, or in the middle of a stack of other documents, so that I can honor overwrites of the document in the database.

I considered using the ASCII value of the document name such as

Employee Spec Page.doc 358

but it has the same value as

Answer Warnings.doc 358

This means that when I run the second document through my program, it overwrites the first document's entry.

The ID must be a number and needs to be unique, but it must be consistently regeneratable without having to cross-reference the database itself (since this program runs separately from the database import program).

Hoping someone has some ideas because I’m stumped.

EDIT: I tried to use MD5 to convert “Employee Spec Page.doc” and “Answer Warnings.doc” and got the following char representations:

Answer Warnings: 2dcb2503c48f5472bfdbafe28d565a9d

Employee Spec Page: a9be4c1428c11b406072c0bd3dab2dee

However, when I then convert the char* into an unsigned int

char* docID = md5.digestString(pDocument->m_csDocumentName.GetBuffer());
pDocument->m_csDocID.Format("%i",(unsigned int)docID);

I get both being:

Answer Warnings: 1634456

Employee Spec Page: 1634456

I got the md5 class from here: http://bobobobo.wordpress.com/2010/10/17/md5-c-implementation/

What am I doing wrong? I need it to be an integer or else I won’t be able to store the ID in the database.


1 Answer

Editorial Team · Answer 1 · 2026-06-15T09:59:45+00:00

what you need is a hash function generating a number big enough to avoid collisions. MD5 (as piokuc mentioned above) should be ok.

you can generate shorter keys by simply truncating the MD5 result, but be aware that this increases the chance of collisions. 128 bits give more than 10^38 different keys; 64 bits give more than 10^19; 32 bits give more than 10^9 (4,294,967,296). so with 32 bits, a collision between two specific documents is about as likely as winning a lottery, but for 10,000 documents you have a chance of about 1% to have at least one collision. which key length is acceptable depends on your requirements. you can of course implement collision detection and collision resolution.
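to check the 1% figure for your own document counts, here is a minimal sketch of the usual birthday approximation, 1 - exp(-n(n-1) / (2 * 2^bits)). the function name collisionProbability is made up for this example, and it assumes the hash values are spread uniformly, which is only an approximation for a real hash.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Approximate probability that at least two of `documents` keys collide
// when keys are drawn uniformly from a space of 2^bits values
// (birthday approximation: 1 - exp(-n(n-1) / (2 * 2^bits))).
// Hypothetical helper, not part of any library mentioned here.
double collisionProbability(std::uint64_t documents, int bits)
{
    assert(bits > 0 && bits <= 128);
    const double n = static_cast<double>(documents);
    const double keySpace = std::ldexp(1.0, bits);  // 2^bits as a double
    // expm1 keeps precision when the probability is tiny (e.g. 128-bit keys)
    return -std::expm1(-n * (n - 1.0) / (2.0 * keySpace));
}
```

calling collisionProbability(10000, 32) gives a bit over 0.011, which matches the roughly 1% quoted above; with 64 bits the same 10,000 documents land far below one in a billion.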

if your ‘database’ allows only a short key you have to implement collision resolution. for an idea of how to do that, see the "Collision resolution" section of the Wikipedia article on hash tables.
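as a rough illustration of what collision resolution could look like, here is a sketch of linear probing over 32-bit IDs. the names IdRegistry and resolveId are hypothetical, and the in-memory map only stands in for whatever store knows which IDs are already taken.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical registry: maps a short numeric ID to the document name that owns it.
using IdRegistry = std::unordered_map<std::uint32_t, std::string>;

// Linear probing: start at the truncated hash and step forward until the slot
// is either free (claim it) or already owned by this document (reuse it).
// Assumes the registry never fills all 2^32 slots, otherwise the loop would not end.
std::uint32_t resolveId(IdRegistry& registry,
                        const std::string& name,
                        std::uint32_t truncatedHash)
{
    std::uint32_t id = truncatedHash;
    for (;;) {
        auto it = registry.find(id);
        if (it == registry.end()) {
            registry.emplace(id, name);  // free slot: this document takes it
            return id;
        }
        if (it->second == name) {
            return id;                   // same document seen before
        }
        ++id;                            // occupied by another document; wraps at 2^32
    }
}
```

note the trade-off: with probing, the ID a colliding document ends up with depends on which document was registered first, and it needs a lookup against the list of used IDs. if the ID has to be regenerated without consulting the database, a key long enough to make collisions negligible is the simpler route.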

from Wikipedia: ‘10^−18 to 10^−15 is the uncorrectable bit error rate of a typical hard disk. In theory, MD5 hashes or UUIDs, being 128 bits, should stay within that range until about 820 billion documents’

to your concrete library:

if you look into the md5 header file, there is

public:
    // an MD5 digest is a 16-byte number (32 hex digits)
    BYTE digestRaw[ 16 ] ;

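so you can retrieve the binary digest any time. casting the char* returned by digestString to unsigned int, as in your code, only converts the address of the string buffer to a number, not the hash itself, which is why both documents show the same value.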

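MD5 md5;
// digestString() computes the hash; the raw bytes end up in md5.digestRaw
char* docID = md5.digestString(pDocument->m_csDocumentName.GetBuffer());
// use the first 4 bytes of the binary digest as the numeric ID
unsigned int hash_ui = *(unsigned int *)md5.digestRaw;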
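if pulling the integer out of the raw bytes with a pointer cast feels fragile (the result depends on the machine's byte order and alignment), the hex string that digestString already returns can be folded into an integer instead. the sketch below is only one way to do it; hexValue and idFromHexDigest are hypothetical helpers, not part of the MD5 class, and the 64-bit width assumes your database column can hold it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Value of one hex digit; the input is assumed to be a valid hex string.
static unsigned hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return c - 'A' + 10;
}

// Hypothetical helper: fold the first 16 hex digits (8 bytes) of an MD5
// string into a 64-bit integer, most significant digit first, so the
// result is the same on every platform.
std::uint64_t idFromHexDigest(const std::string& hexDigest)
{
    assert(hexDigest.size() >= 16);
    std::uint64_t id = 0;
    for (int i = 0; i < 16; ++i) {
        id = (id << 4) | hexValue(hexDigest[i]);
    }
    return id;
}
```

with the two digests from the question, "2dcb2503c48f5472bfdbafe28d565a9d" becomes 0x2dcb2503c48f5472 and "a9be4c1428c11b406072c0bd3dab2dee" becomes 0xa9be4c1428c11b40, so the two documents get different IDs. looping over only the first 8 digits instead would give a 32-bit ID, at the higher collision risk described above.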
