Phonetic (name) data types


IB supports a number of data types especially tuned to searching for people by name.

string      String (full text)
soundex     Modified soundex
soundex2    Modified soundex applied to the whole field
phonetic    Computed phonetic hash (for names)
phone2      Computed phonetic hash (whole field)
metaphone   Computed optimized/modified metaphone hash (for names)
metaphone2  Above metaphone hash applied to the whole field
privhash    Private hash (site/vendor specific)

These types are designed to address the typical need to handle spelling variations of names.

Metaphone, for example, is based on a variation of Lawrence Philips' Double Metaphone phonetic matching algorithm. It groups words not only by spelling but also by their likely pronunciations, assigning the same code to similar-sounding names so that a search for one name also finds the others. It is more accurate than soundex because it applies the basic rules of English pronunciation. The *2 types (soundex2, phone2, metaphone2) are applied to the whole field and are also designed to tolerate swapped first and last names.

Soundex, although for the most part made obsolete by metaphone, still has some uses, and there are cases, albeit rarer, where it is preferred (its occasional incorrect matches can produce interesting serendipitous results).
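To make the idea of a phonetic code concrete, here is a minimal sketch of classic American Soundex in Python. It is an illustration only: the soundex type above is described as a modified soundex, so its codes may differ from these, and the function name soundex() is an assumption of this example, not part of IB.

```python
# Classic American Soundex (illustrative; not IB's modified variant).
_CODES = {}
for _digit, _letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for _ch in _letters:
        _CODES[_ch] = str(_digit)


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code of a name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]                    # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")      # its code still suppresses duplicates
    for c in letters[1:]:
        if c in "HW":
            continue                       # H and W do not separate equal codes
        code = _CODES.get(c, "")           # vowels and Y map to ""
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code                        # a vowel resets the duplicate check
    return result.ljust(4, "0")            # pad short codes with zeros


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("Ashcraft"))                   # A261
```

Robert and Rupert share one code, which is the desired behaviour. The same coarseness also yields the unrelated matches mentioned above.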

dmsoundex (Daitch-Mokotoff Soundex) was developed in 1985 by genealogist Gary Mokotoff and later improved by Randy Daitch. It has a significant advantage over American Soundex for Eastern European names, especially Jewish ones.

BMPM (Beider-Morse Phonetic Matching): an alternative to Soundex with fewer false hits. See http://stevemorse.org/phonetics/bmpm.htm

Through the combined use of the thesauri subsystem and a catalog of names, the system can quite quickly find variations that go beyond orthographic and phonetic differences, including common nicknames.
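The general idea of thesaurus-based name expansion can be sketched as follows. Everything here is hypothetical: the NICKNAMES table, its contents, and the expand_name() helper are invented for illustration and do not reflect the format or API of IB's thesauri subsystem.

```python
# Hypothetical nickname thesaurus: canonical name -> known variants.
NICKNAMES = {
    "william": {"bill", "billy", "will", "liam"},
    "elizabeth": {"liz", "beth", "betty", "eliza"},
}


def expand_name(name: str) -> set:
    """Return the name plus every variant that shares a thesaurus entry."""
    key = name.lower()
    variants = {key}
    for canonical, nicks in NICKNAMES.items():
        if key == canonical or key in nicks:
            variants |= {canonical} | nicks
    return variants


print(sorted(expand_name("Bill")))
# ['bill', 'billy', 'liam', 'will', 'william']
```

A query for any one form can then be run as an OR over the expanded set. That is cheap compared with fuzzy matching at search time.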


Beyond these types there are a number of (computationally more expensive) search-time options for finding variations of names, such as phonetic search, fuzzy search, and search within a maximum Levenshtein distance (the minimum number of single-character insertions, deletions, or substitutions needed to transform one string into another).
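For reference, the Levenshtein distance can be computed with the standard dynamic-programming recurrence. The sketch below is a plain Python illustration, not IB's implementation. The helper within_distance() is a hypothetical name showing how a maximum distance would filter candidate names.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, or substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a                        # keep the DP row as short as possible
    prev = list(range(len(b) + 1))         # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                         # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def within_distance(query, candidates, max_dist):
    """Keep candidates whose case-insensitive distance to query is <= max_dist."""
    q = query.lower()
    return [c for c in candidates if levenshtein(q, c.lower()) <= max_dist]


print(levenshtein("kitten", "sitting"))                              # 3
print(within_distance("Jonson", ["Johnson", "Jonsen", "Jackson"], 1))
# ['Johnson', 'Jonsen']
```

Each comparison costs time proportional to the product of the two string lengths. This is why such matching is more expensive than looking up a precomputed phonetic hash.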