Fields

IB supports fielded data with extremely powerful and flexible methods.

Text Search


In its most basic form, a search may specify a field or a field path. If no field is specified, the term is searched for ANYWHERE in the record. The field is separated from the term with a / as

[FieldName/]term
Example. In the following (using the XMLified Shakespeare collection again)
<SPEECH>
  <SPEAKER>LADY MACBETH</SPEAKER>
  <LINE>Yet here's a spot.</LINE>
</SPEECH>
The lines are available via the FIELD "LINE" as well as the path "PLAY\ACT\SCENE\SPEECH\LINE". To search for "spot" in a line, use
LINE/spot

or, to search only in those lines that are elements of SPEECH etc., use

PLAY\ACT\SCENE\SPEECH\LINE/spot

The result of a search is a set, and these sets can be combined with operators. The importance of a set can also be ranked with a weight, specified as :num.

LINE/spot:10
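To make the shape of the generic [FieldName/]term[:num] form concrete, here is a minimal Python sketch that splits such an expression into its parts. The function name parse_query_term is hypothetical and is not part of IB; the sketch only assumes that '/' separates field from term and that a trailing ':num' is a weight, as described above.

```python
def parse_query_term(expr):
    """Split '[FieldName/]term[:num]' into (field, term, weight).

    Hypothetical helper for illustration only; IB's own parser is not shown here.
    """
    field, sep, rest = expr.partition("/")
    if not sep:
        # No '/' present: the term applies anywhere in the record.
        field, rest = None, expr
    term, colon, weight = rest.rpartition(":")
    if colon and weight.isdigit():
        return field, term, int(weight)
    return field, rest, None


print(parse_query_term("LINE/spot:10"))
print(parse_query_term(r"PLAY\ACT\SCENE\SPEECH\LINE/spot"))
print(parse_query_term("spot"))
```

Because field paths use '\' between segments, the first '/' can be taken as the field/term separator without ambiguity in this sketch.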

Field searching may also be expressed via some of the binary and unary operators.

XML example: Document 1

<a> <c>Shorty</c> </a>

Document 2:

<a> <b> <c>Shorty</c> </b> </a>

The query expressions:

 A\C/shorty  (or shorty WITHIN:A\C)

will match `Shorty' only in document 1. The expression

A\B\C/shorty (or shorty WITHIN:A\B\C)

matches only document 2. If you want to match `Shorty' in both, you can do so with the RPN expression

shorty WITHIN:A\*\C

or the generic

A\*\C/shorty

In paths, the '*' character matches zero or more characters and may span multiple sub-tag segments.

The following expressions:

A/shorty
B/shorty
C/shorty

will also find `Shorty': A/shorty and C/shorty in both document 1 and document 2, B/shorty only in document 2 (document 1 has no B element). In the generic format the character '/' separates the field from the term, so wildcards can be used in both. The expression

A*C/sh?r?y

will also find "Shorty".
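The two example documents can be replayed with a small Python sketch. It is only an approximation under stated assumptions: Python's fnmatch stands in for IB's glob matcher (IB's own matcher may treat patterns such as A\*\C differently), field names are upper-cased, term matching is case-insensitive, and the helper names enclosing_fields and matches are hypothetical.

```python
import xml.etree.ElementTree as ET
from fnmatch import fnmatchcase


def enclosing_fields(xml_text):
    """Return a list of (text, names): each text fragment with the tag names
    and root paths of every element that encloses it."""
    found = []

    def walk(elem, parents):
        path = parents + [elem.tag.upper()]
        text = (elem.text or "").strip()
        if text:
            names = set(path)  # bare tag names, e.g. A, B, C
            # Full paths from the root, e.g. A, A\B, A\B\C
            names.update("\\".join(path[: i + 1]) for i in range(len(path)))
            found.append((text, names))
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), [])
    return found


def matches(xml_text, field_glob, term_glob):
    """True if some word matching term_glob lies in a field matching field_glob."""
    for text, names in enclosing_fields(xml_text):
        word_hit = any(fnmatchcase(w, term_glob.lower()) for w in text.lower().split())
        field_hit = any(fnmatchcase(n, field_glob) for n in names)
        if word_hit and field_hit:
            return True
    return False


DOC1 = "<a> <c>Shorty</c> </a>"
DOC2 = "<a> <b> <c>Shorty</c> </b> </a>"

for field in (r"A\C", r"A\B\C", "A*C", "A", "C"):
    print(field, matches(DOC1, field, "sh?r?y"), matches(DOC2, field, "sh?r?y"))
```

The sketch ignores text that follows a child element (the tail text) and is meant only to show how a path, a bare field name and a wildcard select different sets of documents.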

Object Search

[FieldName][Op]term
where Op is one of <, <=, =, !=, >, >=. The semantics of these comparisons depend on the type of the field they are applied to. Example: given a numerical field "pages", the expression
pages>10

would mean those records where "pages" is greater than 10 (for example 11, 12 or 100).

While > means "greater than" for numerical fields and "after" for temporal fields (such as dates), for text fields it means right truncation (author>Ed is the same as author/Ed*) and not lexicographic order (author>Ed does not, for example, match Simon). For geospatial boxes and polygons these comparisons are not defined at all, since every point on the earth can be viewed as before or after every other point.
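The type-dependent meaning of > can be sketched in a few lines of Python. This is an illustration of the semantics described above, not IB's implementation: the function name greater_than is hypothetical, Python value types stand in for IB field types, and the case-insensitive prefix test for text is an assumption.

```python
from datetime import date


def greater_than(value, operand):
    """Sketch of 'field>operand' for one field value.

    Text: right truncation (prefix match), assumed case-insensitive here.
    Numbers and dates: ordinary 'greater than' / 'after'.
    """
    if isinstance(value, str):
        return value.casefold().startswith(operand.casefold())
    return value > operand


print(greater_than(11, 10))                               # numerical: pages>10
print(greater_than(date(2001, 5, 1), date(2000, 1, 1)))   # temporal: after
print(greater_than("Edward", "Ed"))                       # text: same as author/Ed*
print(greater_than("Simon", "Ed"))                        # text: not lexicographic
```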

Intervals (numerical ranges, date ranges etc.) also support "before during" via <[range], "before" via <{range}, "after during" via >[range] and "after" via >{range}.

These relations may be directly set via the API: "Less Than", "Less Than or Equal", "Equal", "Greater than or Equal", "Greater than", "Not Equal", "Overlaps", "Enclosed Within", "Outside", "Near", "Members Equal", "Members Not Equal", "Before", "Before During", "During", "During After", "After", "Before Strict", "Before During Strict", "During Strict", "During After Strict" and "After Strict".


Field Operators


Unary field operators:
WITHIN:field    Records with elements within the specified field. RPN queries "term WITHIN:field" and "field/term" are equivalent (for performance the query "field/term" is preferred to "term WITHIN:field").
INSIDE:field    Hits are elements that are exclusively in the specified field.
XWITHIN:field   Absolutely NOT in the specified field (a set equivalent to "NOT INSIDE:field").

Binary field operators:
PEER                            Elements in the same (unnamed) final tree leaf node. It is like an AND but with the additional (stringent) requirement that some elements intersect in the same (end) node instance. The hits in the set are, of course, only those in the intersection.
AND:field                       Elements in the same node instance of field.
BEFORE[:field], AFTER[:field]   Without the named field it is like an ordered PEER. With a named field it is like an ordered AND:field.

In the above operators, field may contain wildcards according to the "Glob Expression Syntax":
Syntax          Semantics
*               Match zero or more characters. PL*, for example, can match PLAY, PLAY\ACT, PLAY\ACT\SCENE etc.
?               Match one character.
[...]           Match any of the characters (set) enclosed by the brackets. The characters '*' and '?' are interpreted in the set as normal characters and not as wildcards.
[!...]          Any character NOT in the set is matched.
[.-.]           A '-' between two characters denotes a range. The set [A-C], for example, would match any character between A and C: namely 'A', 'B' or 'C'.
{.,.}..{.,.}    Match, for example, {1,2}{a,b} to 1a 1b 2a 2b. The term "S{z,c}ene" would match "Szene" as well as "Scene".
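The glob syntax above can be approximated in Python, which may help when checking a pattern before using it in a query. This is a sketch under assumptions: Python's fnmatch covers *, ?, [...], [!...] and ranges, while the brace alternatives are expanded by a small hypothetical helper, expand_braces, that does not handle nested braces. Neither function is part of IB.

```python
import itertools
import re
from fnmatch import fnmatchcase


def expand_braces(pattern):
    """Expand {x,y} groups: '{1,2}{a,b}' -> ['1a', '1b', '2a', '2b'] (no nesting)."""
    parts = re.split(r"\{([^{}]*)\}", pattern)
    # re.split with a capture group puts brace contents at the odd indices.
    choices = [part.split(",") if i % 2 else [part] for i, part in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*choices)]


def glob_match(name, pattern):
    """True if name matches any brace-expanded alternative of pattern."""
    return any(fnmatchcase(name, alt) for alt in expand_braces(pattern))


print(expand_braces("{1,2}{a,b}"))
print(glob_match("Szene", "S{z,c}ene"), glob_match("Scene", "S{z,c}ene"))
print(glob_match(r"PLAY\ACT\SCENE", "PL*"))
print(glob_match("B", "[A-C]"), glob_match("D", "[!A-C]"))
```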

Example

mantua WITHIN:P*E

matches "mantua" in PLAY\ACT\SCENE\TITLE but also in PLAY\ACT\SCENE\SPEECH\LINE.

"mantua" WITHIN:PLAY\A*\S*E\TITLE

matches only those occurrences of "mantua" in PLAY\ACT\SCENE\TITLE.

"out" PEER "spot"

selects those records where "out" and "spot" occur within the same element. In the works of Shakespeare this is LINE (both are in the same line), but a search like

"lady" PEER "Macbeth"

would probably more often find its matches in SPEAKER fields.

If we wanted to explicitly search for "out" and "spot" in the same LINE we'd do something like

"out" AND:LINE "spot"

We could also have explicitly specified a path as:

 "out" AND:PLAY\ACT\SCENE\SPEECH\LINE "spot"

If we wanted a line with "out"..."spot" (in that order) we'd use the binary operator BEFORE:line (or AFTER:line, since A BEFORE B is the same as B AFTER A).

Limiting BEFORE and AFTER to the same field instance does not yield the same set as the intersection of BEFORE:10000 (or some other large number greater than the size of the record) and AND:field.

  • The set of SetA AND:field SetB is the set of all records where SetA and SetB have elements in the same instance of FIELD.
  • The set of SetA BEFORE:10000 SetB is the set of records common to both sets where a hit in SetA is before, and within 10000 bytes of, a hit in SetB. The distance may cross tag boundaries (and includes them in the count).
  • SetA BEFORE:field SetB is the set of records where there are common hits in the same instance of FIELD and where a hit in SetA is before a hit in SetB.
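The difference between the three sets can be seen in a small Python model of a single record. Everything here is a hypothetical simplification for illustration: a hit is reduced to a byte offset plus the id of the FIELD instance that contains it, and the function names are not IB API.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    offset: int    # byte offset of the hit within the record
    instance: int  # id of the FIELD instance containing the hit (hypothetical)


def and_field(set_a, set_b):
    """SetA AND:field SetB: some pair of hits shares a field instance."""
    return any(a.instance == b.instance for a in set_a for b in set_b)


def before_within(set_a, set_b, n):
    """SetA BEFORE:n SetB: a hit in SetA precedes a hit in SetB by at most n bytes."""
    return any(0 < b.offset - a.offset <= n for a in set_a for b in set_b)


def before_field(set_a, set_b):
    """SetA BEFORE:field SetB: same field instance, and the SetA hit comes first."""
    return any(a.instance == b.instance and a.offset < b.offset
               for a in set_a for b in set_b)


# One record with three LINE instances:
#   LINE 1 holds "spot" (offset 10) and then "out" (offset 20),
#   LINE 2 holds "out" (offset 50), LINE 3 holds "spot" (offset 90).
out_hits = [Hit(20, 1), Hit(50, 2)]
spot_hits = [Hit(10, 1), Hit(90, 3)]

print(and_field(out_hits, spot_hits))             # share LINE 1
print(before_within(out_hits, spot_hits, 10000))  # "out"@20 precedes "spot"@90
print(before_field(out_hits, spot_hits))          # never in order within one LINE
```

In this record both AND:field and BEFORE:10000 are satisfied, so the record is in their intersection, yet BEFORE:field is not satisfied because no single LINE contains "out" before "spot".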