Image Unique Identifier

Tue Aug 22 23:14:47 UTC 2006

> From: Ramon Leon
> Sent: Tuesday, August 22, 2006 6:10 PM
> 
> > Hi Ramon,
> >
> > The thing to keep in mind about a UUID is that it is supposed
> > to be unique.
> > By hashing the UUID you will distribute the number more
> > widely depending on the UUID implementation.  The beginning
> > bytes of the UUID are supposed to be widely spread so I doubt
> > that this makes much difference.
> >
> > For example
> >
> > (1 to: 10) collect: [:i | UUID new]  I get:
> > an UUID('48d374b0-b196-6a40-a15e-79a02a8dde89')
> > an UUID('48eacb8f-4564-d84f-a456-2856c5226b97')
> > an UUID('de67f9a4-4a42-cd44-bf79-8046366e6c9d')
> > an UUID('59edd53d-eb69-164b-b7f6-5e63d6284d12')
> > an UUID('fae32b90-49c8-ca46-a583-129e5a0fdc02')
> > an UUID('cc760f65-1732-3647-b632-9087256a85f1')
> > an UUID('6f23d070-a505-a240-addf-a749b02d6243')
> > an UUID('a620fe16-4701-3d4a-9aee-0b549057d855')
> > an UUID('154d299f-e2fe-3b42-bd3d-0fc4c3362db8')
> > an UUID('0cc33014-4f35-6f4e-b9de-4a005a3ceb00'))
> >
> > Notice the first group of bytes are already well dispersed so
> > there is no benefit to the MD5.
> 
> Ah, yea we were hashing it in hopes that it'd lessen the impact of
> trimming it.  I wasn't aware it was already dispersed.
> 
> >
> > Truncating to size 16 will remove some of the uniqueness of
> > the UUID.  Since some of the goal of UUID was to disperse the
> > values the likelihood of a truncated UUID overlapping goes up
> > as the number of objects increases but 16 bytes of 25 (which
> > is what you would get with base 36) is close to the
> > uniqueness you get with UUID.
> >
> 
> Any idea how close?

From Wikipedia:

The number of theoretically possible UUIDs is therefore 256 raisedTo: 16 or
about 3.4 × 10 raisedTo: 38. This means that during a 10 billion year
lifetime of the Earth, about 1 trillion UUIDs have to be created every
nanosecond to exhaust the number of UUIDs.

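So I would say, depending on the distribution: if it is evenly spread through
all bytes, then the first 16 of 25 characters keep 16/25 (roughly 3/5) of the
bits, not 3/5 of the count above.  That is at most 36 raisedTo: 16, or about
8 × 10 raisedTo: 24 distinct values, which is still a very large number.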
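
To put numbers on that, here is a small sketch in Python (used here only
because it is easy to check; the names uuid_space and prefix_space are mine).
It treats 36 raisedTo: 16 as an upper bound on the number of 16-character
prefixes, assuming every character is equally likely, which is not quite true
for the leading character of a 25-character string.

```python
import math

# All 16-byte values: 256 raisedTo: 16, the same as 2 ** 128.
uuid_space = 256 ** 16

# Upper bound on distinct 16-character base-36 prefixes.
prefix_space = 36 ** 16

# Size of each space expressed in bits.
uuid_bits = math.log2(uuid_space)
prefix_bits = math.log2(prefix_space)

print(f"UUID values:   {uuid_space:.3e}")
print(f"prefix values: {prefix_space:.3e}")
print(f"bits kept:     {prefix_bits:.1f} of {uuid_bits:.0f}")
```

So the truncated string keeps roughly 3/5 of the bits, while the number of
possible values shrinks by many orders of magnitude.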

I understand the original implementation of UUID used information from the
computer (the network card address and a timestamp), but people said this was
a bad thing since data created on a computer could be traced back to that
computer.  I don't see that in these UUIDs, and I'm not sure how our
implementation generates them.  If it is based on SHA, then the distribution
should be very close to random, meaning the first 16 characters keep about
3/5 of the uniqueness.  If it is the old UUID, where the distribution was
wider in the beginning bits, then I'd say you have a larger share of the
uniqueness in the first 16 of 25 characters, although the guaranteed
uniqueness came from the computer id, so who knows.
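
If the values really are close to random, the chance of two truncated ids
overlapping can be estimated with the usual birthday approximation.  This is
only a sketch under that assumption: collision_probability is a name I made
up, and 36 ** 16 is the upper bound on the prefix space, so the real odds for
a non-uniform generator could be worse.

```python
import math

def collision_probability(n_ids: int, space: int) -> float:
    """Birthday approximation: 1 - exp(-n(n-1) / (2 * space))."""
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-n_ids * (n_ids - 1) / (2 * space))

prefix_space = 36 ** 16  # assumed uniform 16-character base-36 prefixes

for n in (10**6, 10**9, 10**12):
    print(n, collision_probability(n, prefix_space))
```

For a million or even a billion objects the estimate stays tiny; it only
becomes noticeable somewhere around a trillion ids.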

> 
> > But notice the following:
> >
> > A UUID new asInteger printStringBase: 36 returns a string
> > that is 25 bytes long for example:
> > 'BGI8YR7NBJJTUOWIBQJU7E5TA' and if you take the first 16 you
> > get only 16 bytes (one for each character).  But if you store
> > the UUID asInteger guess what?  The integer is only 16 bytes!
> >  So the right thing to do if you have a budget of 16 bytes is
> > to store the UUID as an integer instead of as a string!
> 
> It's not about storage, I can store whatever I want, it's about a short
> string representation for human use.
> 

Does your implementation share resources, like a single database?  If so, a
human-readable record number could use a db sequence starting at 1.  Those
are the most human-readable numbers.  Or consider bar codes of the 16-byte
number instead!

Overall I don't see 'BGI8YR7NBJJTUOWI' being much more readable than
'0cc33014-4f35-6f4e-b9de-4a005a3ceb00'.  I like 123632 better!
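
To compare the representations side by side, here is a rough Python
equivalent of UUID new asInteger printStringBase: 36.  The helper to_base36
is hypothetical, and uuid.uuid4() produces random UUIDs, which may not match
how the Squeak UUID class generates its values.

```python
import uuid

DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_base36(value: int) -> str:
    """Render a non-negative integer in base 36, most significant digit first."""
    if value == 0:
        return "0"
    chars = []
    while value > 0:
        value, remainder = divmod(value, 36)
        chars.append(DIGITS[remainder])
    return "".join(reversed(chars))

u = uuid.uuid4()           # random UUID, an assumption for this sketch
full = to_base36(u.int)    # the 128-bit integer needs at most 25 characters

print(str(u))              # hyphenated hex form
print(full)                # full base-36 form
print(full[:16])           # truncated form, which can no longer be decoded
print(len(u.bytes))        # raw size in bytes
```

The full base-36 string converts back to the same UUID, while the
16-character prefix does not, so the short form only works as a label.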

Happy coding!

Ron Teitelbaum