I have been experimenting with using UUIDs as database keys. I want them to take up as few bytes as possible while still keeping the UUID representation human-readable.
I think I have gotten it down to 22 bytes by using base64 and removing the trailing "==" padding, which seems unnecessary to store for my purposes. Are there any flaws with this approach?
Basically, my test code does a series of conversions to get the UUID down to a 22-byte String, then converts it back into a UUID.
    import java.io.IOException;
    import java.util.UUID;

    public class UUIDTest {

        public static void main(String[] args) {
            UUID uuid = UUID.randomUUID();
            System.out.println("UUID String: " + uuid.toString());
            System.out.println("Number of Bytes: " + uuid.toString().getBytes().length);
            System.out.println();

            byte[] uuidArr = asByteArray(uuid);
            System.out.print("UUID Byte Array: ");
            for (byte b : uuidArr) {
                System.out.print(b + " ");
            }
            System.out.println();
            System.out.println("Number of Bytes: " + uuidArr.length);
            System.out.println();

            try {
                // Convert a byte array to base64 string
                String s = new sun.misc.BASE64Encoder().encode(uuidArr);
                System.out.println("UUID Base64 String: " + s);
                System.out.println("Number of Bytes: " + s.getBytes().length);
                System.out.println();

                // Drop the trailing "==" padding
                String trimmed = s.split("=")[0];
                System.out.println("UUID Base64 String Trimmed: " + trimmed);
                System.out.println("Number of Bytes: " + trimmed.getBytes().length);
                System.out.println();

                // Convert base64 string to a byte array
                byte[] backArr = new sun.misc.BASE64Decoder().decodeBuffer(trimmed);
                System.out.print("Back to UUID Byte Array: ");
                for (byte b : backArr) {
                    System.out.print(b + " ");
                }
                System.out.println();
                System.out.println("Number of Bytes: " + backArr.length);

                // Keep only the first 16 bytes of the decoded array
                byte[] fixedArr = new byte[16];
                for (int i = 0; i < 16; i++) {
                    fixedArr[i] = backArr[i];
                }
                System.out.println();
                System.out.print("Fixed UUID Byte Array: ");
                for (byte b : fixedArr) {
                    System.out.print(b + " ");
                }
                System.out.println();
                System.out.println("Number of Bytes: " + fixedArr.length);
                System.out.println();

                UUID newUUID = toUUID(fixedArr);
                System.out.println("UUID String: " + newUUID.toString());
                System.out.println("Number of Bytes: " + newUUID.toString().getBytes().length);
                System.out.println();

                System.out.println("Equal to Start UUID? " + newUUID.equals(uuid));
                if (!newUUID.equals(uuid)) {
                    System.exit(0);
                }
            } catch (IOException e) {
            }
        }

        public static byte[] asByteArray(UUID uuid) {
            long msb = uuid.getMostSignificantBits();
            long lsb = uuid.getLeastSignificantBits();
            byte[] buffer = new byte[16];

            for (int i = 0; i < 8; i++) {
                buffer[i] = (byte) (msb >>> 8 * (7 - i));
            }
            // The shift distance is negative here; Java uses only its low 6 bits
            // for a long, so this behaves like 8 * (15 - i).
            for (int i = 8; i < 16; i++) {
                buffer[i] = (byte) (lsb >>> 8 * (7 - i));
            }

            return buffer;
        }

        public static UUID toUUID(byte[] byteArray) {
            long msb = 0;
            long lsb = 0;
            for (int i = 0; i < 8; i++)
                msb = (msb << 8) | (byteArray[i] & 0xff);
            for (int i = 8; i < 16; i++)
                lsb = (lsb << 8) | (byteArray[i] & 0xff);
            UUID result = new UUID(msb, lsb);
            return result;
        }
    }
Output:

    UUID String: cdaed56d-8712-414d-b346-01905d0026fe
    Number of Bytes: 36

    UUID Byte Array: -51 -82 -43 109 -121 18 65 77 -77 70 1 -112 93 0 38 -2
    Number of Bytes: 16

    UUID Base64 String: za7VbYcSQU2zRgGQXQAm/g==
    Number of Bytes: 24

    UUID Base64 String Trimmed: za7VbYcSQU2zRgGQXQAm/g
    Number of Bytes: 22

    Back to UUID Byte Array: -51 -82 -43 109 -121 18 65 77 -77 70 1 -112 93 0 38 -2 0 38
    Number of Bytes: 18

    Fixed UUID Byte Array: -51 -82 -43 109 -121 18 65 77 -77 70 1 -112 93 0 38 -2
    Number of Bytes: 16

    UUID String: cdaed56d-8712-414d-b346-01905d0026fe
    Number of Bytes: 36

    Equal to Start UUID? true
You can safely drop the padding "==" in this application. If you were to decode the base-64 text back to bytes, some libraries would expect it to be there, but since you are just using the resulting string as a key, it’s not a problem.
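For what it's worth, here is a minimal sketch of the same round trip using `java.util.Base64` (Java 8 and later) instead of the `sun.misc` classes. It assumes you want the URL-safe alphabet; the class name `UuidBase64` and its method names are made up for illustration. The JDK decoder does not require the padding, so no trimming or array fixing should be needed.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

// Hypothetical helper, not part of any library.
final class UuidBase64 {

    // Encode the 16 UUID bytes as 22 URL-safe base64 characters, without "==".
    static String encode(UUID uuid) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }

    // Decode a 22-character key back into a UUID; padding is optional for this decoder.
    static UUID decode(String key) {
        ByteBuffer buf = ByteBuffer.wrap(Base64.getUrlDecoder().decode(key));
        return new UUID(buf.getLong(), buf.getLong());
    }
}
```

With the URL-safe alphabet, `/` becomes `_` and `+` becomes `-`, so the key from the question would come out as `za7VbYcSQU2zRgGQXQAm_g`.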
I’d use Base-64 because its encoding characters can be URL-safe, and it looks less like gibberish. But there’s also Base-85. It uses more symbols and encodes every 4 bytes as 5 characters, so a 16-byte UUID would come down to 20 characters.
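To make the Base-85 arithmetic concrete, here is a rough sketch that treats the UUID as four 32-bit groups and writes each as five base-85 digits. It assumes the Ascii85 digit mapping (digit d becomes the character `'!' + d`) and leaves out Ascii85's `z` shortcut and delimiters; the class name `UuidBase85` is hypothetical. Note that this alphabet includes quotes and backslashes, so unlike URL-safe Base-64 the result may need escaping.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper, not part of any library.
final class UuidBase85 {

    // 16 bytes = 4 groups of 4 bytes; each group becomes 5 characters, 20 in total.
    static String encode(UUID uuid) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        buf.flip();
        char[] out = new char[20];
        for (int g = 0; g < 4; g++) {
            long v = buf.getInt() & 0xFFFFFFFFL;   // read the group as an unsigned 32-bit value
            for (int i = 4; i >= 0; i--) {         // fill digits from least to most significant
                out[g * 5 + i] = (char) ('!' + (int) (v % 85));
                v /= 85;
            }
        }
        return new String(out);
    }

    // Reverse of encode; expects exactly 20 characters produced by encode.
    static UUID decode(String s) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        for (int g = 0; g < 4; g++) {
            long v = 0;
            for (int i = 0; i < 5; i++) {
                v = v * 85 + (s.charAt(g * 5 + i) - '!');
            }
            buf.putInt((int) v);
        }
        buf.flip();
        return new UUID(buf.getLong(), buf.getLong());
    }
}
```

Five digits are enough because 85^5 = 4,437,053,125 is just above 2^32 = 4,294,967,296, which is where the 4-bytes-to-5-characters ratio comes from.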