Objects

Objects beyond text


IB is unique among full-text systems in that it also provides numerous object types with their own methods of search.

Native Object Types



string        String (full text)
numerical     Numerical IEEE floating
computed      Computed Numerical
range         Range of Numbers
date          Date/Time in any of a large number of well defined formats
date-range    Range of Date as Start/End but also +N Seconds (to Years)
gpoly         Geospatial n-ary bounding coordinates
box           Geospatial bounding box coordinates (N,W,S,E)
time          Numeric computed value for seconds since 1970, used as date.
ttl           Numeric computed value for time-to-live in seconds.
expires       Numeric computed ttl value as date of expiration.
boolean       Boolean type
currency      Monetary currency
dotnumber     Dot number (Internet Addresses, UIDs etc)
phonetic      Computed phonetic hash (for names)
phone2        Computed phonetic hash (whole field)
metaphone     Computed optimized/modified metaphone hash (for names)
metaphone2    Above metaphone hash applied to the whole field
hash          Computed 64-bit hash of field contents
casehash      Computed case-independent hash of text contents
privhash      Private Hash (site/vendor specific)
isbn          International Standard Book Number
telnumber     ISO/CCITT/UIT E.164 Telephone Number (including a large number of local encoding conventions for national and international number formatting)

The above types have proved quite flexible and solve many IR problems. The type "hash", for example, can be used to locate known images across XML/SGML documents. Metaphone, on the other hand, addresses the typical need to handle spelling variations of names. It is based upon a variation of Lawrence Phillips’ Double Metaphone phonetic matching algorithm and groups words not just by spelling but also by their potential variations of pronunciation. Soundex, although for the most part obsoleted by metaphone, still has some uses, and there are cases, albeit rarer, where it is preferred (sometimes incorrect matches can produce interesting serendipities).
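
The idea of a phonetic hash can be illustrated with classic American Soundex, which is far simpler than metaphone. The Python sketch below is not IB's implementation and does not reproduce its modified metaphone; it only shows how differently spelled names can collapse to a single key. The function name soundex is chosen for illustration.

```python
def soundex(name: str) -> str:
    """Classic American Soundex: first letter plus three digits."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    letters_only = "".join(c for c in name.upper() if c.isalpha())
    if not letters_only:
        return ""

    result = letters_only[0]
    prev = codes.get(letters_only[0], "")
    for ch in letters_only[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = codes.get(ch, "")  # vowels map to "" and reset the previous code
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]  # pad or truncate to four characters


# Spelling variants of the same name share one key.
variants = ["Zimmermann", "Zimmerman", "Zimerman"]
keys = {soundex(v) for v in variants}
```

Indexing the key rather than the literal spelling is what lets a search for one variant retrieve the others, at the cost of occasional false matches.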

"External" Object types


In addition, IB provides extended distributed objects via interface glue into other systems through ODBC, CORBA or object embedding. This allows indexed content, for example from RSS/XML, to be stored in and searched from other systems. This is useful in many dynamic applications in commerce and trading (keeping live counts of goods on hand, selling prices, etc.). Standard interface modules are provided for ODBC and for the platform-native Berkeley DB (or GDB).


Autodetection


Where autodetection is enabled (or has not been disabled), various doctypes can automatically detect a number of field datatypes at index time:
  • Number (numerical)
  • Numerical Range (range)
  • Telephone Number (telnumber)
  • Date (date)
  • Date Range (date-range)
  • Monetary Currency (currency)
  • Dot Number (dotnumber)
  • Geospatial bounding box (box)
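
As a rough illustration of index-time type detection, the Python sketch below tries a few patterns in order and falls back to "string". The actual rules the doctypes apply are not described here; every pattern, the ordering, and the function name detect_type are assumptions made for the example, and only some of the listed types are covered.

```python
import re

NUM = r"-?\d+(?:\.\d+)?"

# Ordered from most specific to most general; the first match wins.
# These patterns are illustrative assumptions, not IB's detection rules.
PATTERNS = [
    ("box", re.compile(rf"^{NUM}(?:\s*,\s*{NUM}){{3}}$")),   # four comma-separated numbers
    ("dotnumber", re.compile(r"^\d+(?:\.\d+){2,}$")),        # e.g. 192.168.0.1
    ("date", re.compile(r"^\d{4}-\d{2}-\d{2}$")),            # ISO-style dates only
    ("currency", re.compile(r"^[$€£]\s?\d+(?:\.\d{2})?$")),  # symbol plus amount
    ("telnumber", re.compile(r"^\+\d[\d\s-]{6,}$")),         # international form only
    ("range", re.compile(rf"^{NUM}\s*-\s*{NUM}$")),          # e.g. 10-20
    ("numerical", re.compile(r"^[+-]?\d+(?:\.\d+)?$")),
]


def detect_type(value: str) -> str:
    """Return the first type whose pattern matches, else 'string'."""
    text = value.strip()
    for type_name, pattern in PATTERNS:
        if pattern.match(text):
            return type_name
    return "string"


samples = ["42", "10-20", "2024-01-15", "$12.50", "192.168.0.1",
           "54.0,5.5,47.0,15.0", "+49 89 1234567", "hello"]
detected = {s: detect_type(s) for s in samples}
```

The ordering matters: a dotted address or a bounding box would otherwise be at risk of being read as a plain number or a range.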

Inline definition


Objects are typically defined and managed via configuration (.ini files), but for an XML-like record format (GILSXML) they may also be defined in place through a Type attribute (a different attribute name can be set via configuration). Example:

<AGE Type="numerical">12</AGE>

defines (or adds) the type of the field AGE as numerical.

<record>
  <person>
     <uid Type="numerical">123457</uid>
     <name>
       <last Type="metaphone">Zimmermann</last>
       <first>Edward</first>
     </name>
     <company>
       <name>NONMONOTONIC Lab</name>
       <address>
         ....
       </address>
     </company>
  </person>
</record>

In the above, the XML path record/person/name/last would be defined as type "metaphone", while record/person/company/name would be of the standard string (textual) type.
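
The mapping from XML paths to types can be sketched in Python as follows. This is not how IB parses records; it is a small stand-alone illustration using the standard library, with field_types as a hypothetical helper and a shortened, well-formed version of the record as input.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<record>
  <person>
    <uid Type="numerical">123457</uid>
    <name>
      <last Type="metaphone">Zimmermann</last>
      <first>Edward</first>
    </name>
    <company>
      <name>NONMONOTONIC Lab</name>
    </company>
  </person>
</record>
"""


def field_types(xml_text: str, type_attr: str = "Type",
                default: str = "string") -> dict:
    """Map each typed or leaf element path to its declared (or default) type."""
    types = {}

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        declared = elem.get(type_attr)
        if declared is not None:
            types[path] = declared            # explicit inline definition
        elif len(elem) == 0:
            types.setdefault(path, default)   # untyped leaf: plain text
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return types


types = field_types(SAMPLE)
```

The type_attr parameter mirrors the point that the attribute name need not be Type; passing a different name would pick up definitions written with that attribute instead.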