Image Unique Identifier
Ron Teitelbaum
Ron at USMedRec.com
Tue Aug 22 23:14:47 UTC 2006
> From: Ramon Leon
> Sent: Tuesday, August 22, 2006 6:10 PM
>
> > Hi Ramon,
> >
> > The thing to keep in mind about a UUID is that it is supposed
> > to be unique.
> > By hashing the UUID you will distribute the number more
> > widely depending on the UUID implementation. The beginning
> > bytes of the UUID are supposed to be widely spread so I doubt
> > that this makes much difference.
> >
> > For example
> >
> > (1 to: 10) collect: [:i | UUID new] I get:
> > an UUID('48d374b0-b196-6a40-a15e-79a02a8dde89')
> > an UUID('48eacb8f-4564-d84f-a456-2856c5226b97')
> > an UUID('de67f9a4-4a42-cd44-bf79-8046366e6c9d')
> > an UUID('59edd53d-eb69-164b-b7f6-5e63d6284d12')
> > an UUID('fae32b90-49c8-ca46-a583-129e5a0fdc02')
> > an UUID('cc760f65-1732-3647-b632-9087256a85f1')
> > an UUID('6f23d070-a505-a240-addf-a749b02d6243')
> > an UUID('a620fe16-4701-3d4a-9aee-0b549057d855')
> > an UUID('154d299f-e2fe-3b42-bd3d-0fc4c3362db8')
> > an UUID('0cc33014-4f35-6f4e-b9de-4a005a3ceb00'))
> >
> > Notice the first group of bytes are already well dispersed so
> > there is no benefit to the MD5.
>
> Ah, yea we were hashing it in hopes that it'd lessen the impact of
> trimming it. I wasn't aware it was already dispersed.
>
> >
> > Truncating to size 16 will remove some of the uniqueness of
> > the UUID. Since some of the goal of UUID was to disperse the
> > values the likelihood of a truncated UUID overlapping goes up
> > as the number of objects increases but 16 bytes of 25 (which
> > is what you would get with base 36) is close to the
> > uniqueness you get with UUID.
> >
>
> Any idea how close?
From Wikipedia:
The number of theoretically possible UUIDs is therefore 256 raisedTo: 16 or
about 3.4 × 10 raisedTo: 38. This means that during a 10 billion year
lifetime of the Earth, about 1 trillion UUIDs have to be created every
nanosecond to exhaust the number of UUIDs.
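So I would say, depending on the distribution: if it is evenly spread through
all bytes, then you keep 16/25 (roughly 3/5) of the bits, not 3/5 of the count
above. Sixteen base-36 characters allow at most 36 raisedTo: 16, or about
8 × 10 raisedTo: 24, distinct values (roughly 83 of the 128 bits).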
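To put rough numbers on that, here is a small sketch in Python (not Squeak, only because it is easy to run anywhere). It assumes every kept character is uniformly random, which is an upper bound: the first base-36 digit of a 128-bit value cannot take all 36 values. The name `collision_probability` and the figure of one billion ids are just illustrations, not anything from Ramon's system.

```python
import math

UUID_BITS = 128      # a UUID is 16 bytes
ALPHABET = 36        # digits 0-9 plus letters A-Z
KEPT_CHARS = 16      # characters kept after truncation

# Upper bound on the number of distinct 16-character base-36 strings.
truncated_values = ALPHABET ** KEPT_CHARS
truncated_bits = math.log2(truncated_values)


def collision_probability(n_ids: int, space: int) -> float:
    """Birthday-bound approximation: 1 - exp(-n(n-1) / (2 * space))."""
    return -math.expm1(-n_ids * (n_ids - 1) / (2 * space))


print(f"full UUID values:      {2 ** UUID_BITS:.3e}")
print(f"truncated values:      {truncated_values:.3e}")
print(f"truncated bits:        {truncated_bits:.1f} of {UUID_BITS}")
print(f"P(collision, 1e9 ids): {collision_probability(10**9, truncated_values):.2e}")
```

So the truncated string is far less unique than the full UUID in absolute terms, but under the uniform assumption it still leaves a very small chance of a clash for realistic numbers of objects.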
I understand the original implementation of UUID used information from the
computer, but people said this was a bad thing since data created on a
computer could be traced back to that computer. I don't see that in this
UUID, and I'm not sure of our implementation. If it is based on SHA, then the
distribution should be very close to random, meaning about 3/5 of the bits
survive. If it is the old UUID, where the distribution was wider on the
beginning bits, then I'd say you keep a larger share of the uniqueness in the
first 16 of 25 characters, although guaranteed uniqueness came from the
computer id, so who knows.
>
> > But notice the following:
> >
> > A UUID new asInteger printStringBase: 36 returns a string
> > that is 25 bytes long for example:
> > 'BGI8YR7NBJJTUOWIBQJU7E5TA' and if you take the first 16 you
> > get only 16 bytes (one for each character). But if you store
> > the UUID asInteger guess what? The integer is only 16 bytes!
> > So the right thing to do if you have a budget of 16 bytes is
> > to store the UUID as an integer instead of as a string!
>
> It's not about storage, I can store whatever I want, it's about a short
> string representation for human use.
>
Does your implementation share resources, like a single database? If so, a
human-readable record number could use a db sequence starting at 1. Those
are the most human-readable numbers. Or consider bar codes of the 16-byte
number instead!
Overall I don't see 'BGI8YR7NBJJTUOWI' being much more readable than
'0cc33014-4f35-6f4e-b9de-4a005a3ceb00'. I like 123632 better!
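For comparing the representations side by side, here is a rough Python sketch. The helper `to_base36` is hypothetical and only stands in for `printStringBase: 36`; Squeak's `asInteger` may order the bytes differently, so the exact characters will not match what the image prints. It uses the first sample UUID from earlier in the thread.

```python
import uuid

DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base36(n: int) -> str:
    """Render a non-negative integer in base 36 (stand-in for printStringBase: 36)."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, 36)
        out.append(DIGITS[r])
    return "".join(reversed(out))


u = uuid.UUID("48d374b0-b196-6a40-a15e-79a02a8dde89")  # first sample in the thread
as_int = u.int
text = to_base36(as_int)

print("hex string length: ", len(str(u)))
print("base-36 length:    ", len(text))
print("raw byte length:   ", len(u.bytes))
print("truncated to 16:   ", text[:16])
```

One thing to watch: 25 characters is the maximum, not a fixed width. A UUID whose leading byte is small produces a shorter base-36 string, so a fixed-width display form would need left-padding before truncating.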
Happy coding!
Ron Teitelbaum