Well factored Objects

Richard A. O'Keefe ok at cs.otago.ac.nz
Wed Apr 30 00:04:43 UTC 2003


Jimmie Houchin <jhouchin at texoma.net> wrote about converting
a data base to Smalltalk.  I was _just_ able to keep my oar out
until I spotted this:
	>>JHPersonName (first, middle, last, maiden, suffix)
	JHPersonName probably should be
	JHPersonID and have 'first middle last maiden suffix ssn' instance 
	variables. ssn is the US Social Security Number our government ID number 
	for citizens.
	
First off, no, that's NOT what an SSN is.  It is perfectly possible
for a US citizen not to have an SSN.   It is also possible for a US
citizen to have more than one SSN, or so I am credibly informed.  And
of course it is possible for non-US-citizens to have US SSNs.  The
SSN is basically the same as our TFN (Tax File Number), and the only
legitimate reason for holding someone's SSN (or TFN) in a database
is if the person and the database owner are involved in reportable
taxable transactions.  Read the comp.risks archives for more than you
want to know about the problems of SSNs.

The thing I really want to take issue with is this
"first, middle, last, maiden, suffix" stuff.

These are REAL examples:

* What are you going to do about people who only have one name?

* What are you going to do about people who have exactly two names?

* What are you going to do about people who have more than one
  name, but they don't fall into the pattern of "first" and "last"?

* What are you going to do about people who have two names (person,
  family) in their own language but use a third name in an Anglophone
  country (e.g. "Ling (Charlie) Wong")?

* Do "first" and "last" really mean "written first" and "written last"
  or are they code-words for "personal" and "family" name?  Is there
  an expectation that names should be sorted with "last" name as most
  significant?  In that case, what about Chinese and Hungarian names?

* What about people with more than three names?  (Like my father,
  John AEneas Byron O'Keefe, commonly known as Byron.)

* In general, what about people who have three names, but don't go
  by their first name?  (Like my sister-in-law Andrea Fiona Ferris,
  known as Fiona.)

* What about people with compound surnames, like
  (Cornelius) () (van der Vecker)?  (This one has no middle initial.)

* What about people whose names have an honorific in the middle?
  For example, an Irish speaker told me that my name would be written,
  in Irish, as "richard ursul o caiv" where the "ursul" bit means,
  IIRC, "young knight" and functions like "Mr".

* When it comes to suffix, which rules are to be used?
  According to Miss Manners (Judith something-or-other; I have several
  of her books but haven't unpacked them since the last move) the
  correct rule is that when the oldest bearer of the name dies,
  everyone remaining moves up one, so that if for example all your
  family are dead, you have no suffix, no matter how many of your
  ancestors had the same name.  How is the data base to be informed
  of this?  Or will the data base use the rule Miss Manners says is
  incorrect, and everyone will keep their suffix unchanged?
  (I'm assuming that the suffix is "Jn", "Sen", "3rd" and so on.)

* What about honorifics that go at the end of a name?  (Like my uncle,
  J. Edward O'Keeffe, Esq.  Yes, he used two "f"s, my father used one.
  My grandmother could never make up her mind which to use...)  It's
  in the "suffix" position, so is "Esq(uire)" a suffix?  (He's another
  example of someone going by a middle name, not a first name.)

* What about people who have married more than once?  A Californian
  friend of a friend of my dad's had no fewer than three legal names
  at the same time.

* What people want you to call them is not necessarily how they want
  to identify themselves to you.  Mrs John Wilkes, as she wants you
  to call her (see Miss Manners for why it's "John", not "Mary")
  may want to offer her passport, which still reads "Mary Robinson",
  for identification.  (After my marriage, my wife wished to be known
  as Mrs Jeanene O'Keefe, but her passport said Jeanene Ferris; so
  this is a very real example to me.)

* What about people who change their names?  (For example, William
  Thompson gets a bit political about his Polynesian ancestry and
  changes his name to Wiremu Tamehana.  Or, thinking back to
  "Switched-on Bach", Walter Carlos becomes Wendy Carlos.)

* What happened to honorifics, anyway?  If someone tells you he is
  Professor Dr Sir Karl Elias, he probably doesn't want you to ignore
  most of it.

* From a relational point of view, it doesn't make sense to put
  SSN with name.  Someone can change their name without changing
  their SSN.  Someone can change their SSN without changing their
  name.  The number of names someone has and the number of SSNs
  someone has are not related.

I've sweated over this one for years.  I have been extremely annoyed
at data bases that insisted on getting it wrong (like the people who
send me mail addressed to RA Okeefe--I am not now and never have been
a sun god).  There are complex solutions.  But there's also a simple
solution.

(1) A name is a pair of strings.  One of them is a sort key, and
    the other is for display.  It may suffice to use a bracket
    convention:  "[Wong] Sie Lin" vs "J Strother [Moore]" vs
    "{Professor Dr Sir} Karl [Elias]" vs "J. Edward [O'Keeffe] {Esq}".
    For sorting, take the bracketed part as primary key, the name
    with the braced parts left out as secondary key, and the whole
    thing as tertiary key.  For display, delete the bracket and brace
    characters but keep what they enclose.

(2) People don't have names.  Instead there is a *relation* between
    people and names.  (See William Kent's "Data and Reality".)

(3) Any of a person's names should be acceptable in input, but one
    of the names should be designated as the preferred name for output.
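The bracket convention in (1) is mechanical enough to sketch.  What
follows is only an illustration in Python, not anything from the
original design: the function names display_text and sort_key are
made up, and folding case before comparing is my assumption (the
convention itself says nothing about case).

```python
import re

def display_text(marked):
    """Delete the bracket and brace characters, keeping what they enclose."""
    return re.sub(r"[\[\]{}]", "", marked)

def sort_key(marked):
    """Build a (primary, secondary, tertiary) key from a marked-up name."""
    # Primary key: the bracketed part, e.g. "Wong" in "[Wong] Sie Lin".
    primary = " ".join(re.findall(r"\[([^\]]*)\]", marked))
    # Secondary key: the name with the braced parts (honorifics) left out.
    without_braced = re.sub(r"\{[^}]*\}", "", marked)
    secondary = " ".join(re.sub(r"[\[\]]", "", without_braced).split())
    # Tertiary key: the whole thing, as displayed.
    tertiary = display_text(marked)
    # Assumption: comparisons ignore case.
    return (primary.casefold(), secondary.casefold(), tertiary.casefold())

names = [
    "J Strother [Moore]",
    "[Wong] Sie Lin",
    "{Professor Dr Sir} Karl [Elias]",
]
ordered = [display_text(n) for n in sorted(names, key=sort_key)]
```

Here the three names come out in the order Elias, Moore, Wong, each
displayed exactly as its owner wrote it.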

This suggests a design fragment like

    JHPersonName (sortKey, displayText, whenValid)

    JHPerson (preferredName, otherName*, ssn?, ...)
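To make the fragment concrete, here is a rough sketch in Python.
Everything beyond the field names in the fragment is an assumption
made for illustration: whenValid is modelled as an optional pair of
dates, lookup is by exact display text ignoring case, and the helper
names answers_to and find_people are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass(frozen=True)
class PersonName:
    sort_key: str
    display_text: str
    # Assumption: "whenValid" is an optional interval; None means open-ended.
    valid_from: Optional[date] = None
    valid_to: Optional[date] = None

@dataclass
class Person:
    preferred_name: PersonName                       # used for output
    other_names: List[PersonName] = field(default_factory=list)
    ssn: Optional[str] = None                        # optional, not part of any name

    def all_names(self):
        return [self.preferred_name, *self.other_names]

    def answers_to(self, text):
        """True if any of this person's names matches the text (case ignored)."""
        wanted = text.casefold()
        return any(n.display_text.casefold() == wanted for n in self.all_names())

def find_people(people, text):
    """Any of a person's names is acceptable on input."""
    return [p for p in people if p.answers_to(text)]

# The sort keys here are illustrative only.
jeanene = Person(
    preferred_name=PersonName("o'keefe jeanene", "Mrs Jeanene O'Keefe"),
    other_names=[PersonName("ferris jeanene", "Jeanene Ferris")],
)
found = find_people([jeanene], "jeanene ferris")
shown = [p.preferred_name.display_text for p in found]
```

Looking someone up by the passport name finds the person, and output
still uses the preferred name.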

But as someone else has already pointed out, you should approach
a design problem like this NOT by saying "what structure can I
perceive in the small range of name types I am familiar with" but
"what do I actually need to DO with names?"

(Similar observations apply to addresses.  I am fed up with
Web forms that won't let me submit them until I have filled in
a state.  My country only has 4 million people.  We _haven't_
any states.  Don't divide an address into standardised USA
parts unless you really have to.)


