How bad is changing generated GUID manually and using it? Is the probability of collision still insignificant or is manipulation with GUIDs dangerous?
Sometimes we just change some letter of previously generated GUID and use it. Should we stop doing it?
Note: This answer misses a few older formats and in the near future also newer variants. This shouldn’t change the basic idea of the answer: Don’t do it, you risk making it non-unique (or increase the chance of a collision).
This depends on the version of the GUID and where you are making the change. Let’s dissect a little how a GUID actually looks like:
A GUID has a version. The 13th hex digit in a GUID marks its version. Current GUIDs are usually generated with version 4. If you change the version backwards you risk collision with a GUID that already exists. Change it forwards and you risk collision with potential future GUIDs.
A GUID has a variant too. The 17th hex digit in a GUID is the variant field. Some values of it are reserved for backward compatibility, one value is reserved for future expansion. So changing something there means you risk collision with previously-generated GUIDs or maybe GUIDs to be generated in the future.
A GUID is structured differently depending on the version. Version 4 GUIDs use (for the most part – excepting the 17th hex digit) truly random or pseudo-random bits (in most implementation pseuso-random). Change something there and your probability of collision remains about the same.
It should be very similar for version 3 and 5 GUIDs which use hashes, although I don’t recall ever seeing one in the wild. Not so much for versions 1 and 2, though. Those have a structure and depending on where you change something you make things difficult.
Version 1 GUIDs include a timestamp and a counter field which gets incremented if two GUIDs are generated in the same clock interval (and thus would lead to the same timestamp). If you change the timestamp you risk colliding with a GUID generated earlier or later on the same machine. If you change the counter you risk colliding with a GUID that was generated at the same time and thus needed the counter as a “uniquifier”.
Version 2 GUIDs expand on version 1 and include a user ID as well. The timestamp is less accurate and contains a user or group ID while a part of the counter is used to indicate which one is meant (but which only has a meaning to the generating machine). So with a change in those parts you risk collision with GUIDs generated by another user on the same machine.
Version 1 and 2 GUIDs include a MAC address. Specifically, the MAC address of the computer that generated them. This ensures that GUIDs from different machines are different even if generated in the very same instant. There is a fallback if a machine doesn’t have a MAC address but then there is no uniqueness guarantee. A MAC address also has a structure and consists of an “Organisationally Unique Identifier” (OUI; which is either locally-administered or handed out by the IEEE) and an unique identifier for the network card.
If you make a change in the OUI you risk colliding with GUIDs generated in computers with network cards of other manufacturers. Unless you make the change so the second-least significant bit of the first octet is 1, in which case you’re switching to a locally-administered OUI and only risk collision with GUIDs generated on computers that have an overridden MAC address (which might include most VMs with virtual network hardware).
If you chance the card identifier you risk collision with GUIDs generated on computers with other network cards by the same manufacturer or, again, with those where the MAC address was overridden.
No other versions exist so far but the gist is the following: A GUID needs all its parts to ensure uniqueness; if you change something you may end up with a GUID which isn’t necessarily unique anymore. So you’re probably making it more of a GID or something. The safest to change are probably the current version 4 GUIDs (which is what Windows and .NET will generate) as they don’t really guarantee uniqueness but instead make it very, very unlikely.
Generally I’d say you’re much better off generating a new GUID, though. This also helps the person reading them because you can tell two GUIDs apart as different easily if they look totally different. If they only differ in a single digit a person is likely to miss the change and assume the GUIDs to be the same.
Furthermore, another person in another place might be compelled to do the same thing you did, perhaps to the same digit and you end up with two equal GUIDs that way. Considering the likelihood of collisions in the normal case (very low), in a large project where potentially multiple GUIDs would be derived from the same one in this manner, this might actually be the most probable cause of a collision.
Further reading: