Terms


A term is a word, or a literal phrase enclosed in " marks, optionally followed by a postfix matching operator and a weight:

WordorPhrase[PostFixOp][:IntegerWeight]

PostFixOp   Function
.           Exact match
~           Phonetic match. Note: IB contains several phonetic field types. These have the advantage that they are more powerful and provide significantly better performance. Their disadvantage is that a field needs to be defined as phonetic at index time.
=           Case-dependent match
$           "Freeform". Don't store the hit coordinates. This allows one to include terms in a search that affect the ranking but not structure or proximity, and that are not highlighted in the result. Since it is also very fast (several times faster than a standard term search), it is also useful in applications where one is only interested in knowing which documents contain a term (the typical operation, for example, of Internet search engines).
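
The term syntax above can be sketched as a small parser. This is a minimal illustration in Python, not IB's own parser: the names Term and parse_term are hypothetical, and the handling of ambiguous input (for example a word that itself ends in '.') is an assumption.

```python
import re
from typing import NamedTuple, Optional


class Term(NamedTuple):
    text: str            # the word, or the phrase including its " marks
    op: Optional[str]    # one of . ~ = $ or None when no operator is given
    weight: int          # integer weight, 1 when omitted


# WordorPhrase[PostFixOp][:IntegerWeight]
# Assumption: the shortest possible word is taken, so a trailing
# operator character is always read as the operator.
_TERM_RE = re.compile(
    r'(?P<text>"[^"]*"|.+?)'        # quoted phrase or bare word
    r'(?P<op>[.~=$])?'              # optional postfix operator
    r'(?::(?P<weight>[+-]?\d+))?'   # optional :IntegerWeight
)


def parse_term(s: str) -> Term:
    """Split a single query term into text, operator and weight."""
    m = _TERM_RE.fullmatch(s)
    if m is None:
        raise ValueError(f"not a valid term: {s!r}")
    weight = int(m.group("weight")) if m.group("weight") else 1
    return Term(m.group("text"), m.group("op"), weight)
```

For instance, parse_term("President=:10") yields the text "President", the operator "=" and the weight 10, while parse_term("cheese:-4") has no operator and a weight of -4.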


Example:

President=:10

means: search for terms that match "President" with exactly that case (not, for example, "president") and give them a weight factor of 10. Weights may be positive or negative. If no weight is specified, the default factor is 1 (one). A negative weight says "terms that match are less relevant". For example,

President=:10 or cheese:-4

would rank documents with "cheese" towards the bottom of the list (before, for example, those about Putin, Bush or Mugabe).
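
The effect of positive and negative weights on ordering can be illustrated with a toy score. The sketch below is not IB's ranking algorithm: the function toy_score, the "weight times number of hits" formula and the assumption that terms without "=" match regardless of case are all made up for illustration.

```python
import re


def toy_score(text, weighted_terms):
    """Sum weight * hit count over all terms (illustrative formula only)."""
    score = 0
    for word, case_dependent, weight in weighted_terms:
        # Assumption: without "=" a term matches regardless of case.
        flags = 0 if case_dependent else re.IGNORECASE
        hits = len(re.findall(r"\b" + re.escape(word) + r"\b", text, flags))
        score += weight * hits
    return score


# President=:10 or cheese:-4
query = [("President", True, 10), ("cheese", False, -4)]

docs = {
    "a": "The President spoke.",
    "b": "The President ate cheese.",
    "c": "Cheese and more cheese.",
    "d": "The president of the club resigned.",
}

scores = {name: toy_score(text, query) for name, text in docs.items()}
ranked = sorted(docs, key=scores.get, reverse=True)
```

Under these assumptions the document that mentions only "cheese" ends up last, and the lower-case "president" in document d does not contribute to the score.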


Words may contain wildcards. IB supports the so-called "Glob Expression Syntax":

Syntax         Semantics
*              Match zero or more characters. term*, for example, is equivalent to a right-truncated search.
?              Match one character.
[...]          Match any of the characters (set) enclosed by the brackets. The characters '*' and '?' are interpreted in the set as normal characters and not as wildcards.
[!...]         Match any character NOT in the set.
[.-.]          A '-' between two characters denotes a range. The set [A-C], for example, matches any character between A and C: namely 'A', 'B' or 'C'.
{.,.}..{.,.}   Match alternatives. {1,2}{a,b}, for example, matches 1a, 1b, 2a and 2b. The term "L{e,i}banon" would match "Lebanon" as well as "Libanon".
\              The character '\' is an escape. When used with wildcards or other special characters it means that the character should match itself and not have its special semantics. \*, for instance, matches '*'.
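
The glob rules in the table can be sketched by translating a pattern into a regular expression. This is an illustrative Python translation, not IB's matcher: glob_to_regex and glob_match are hypothetical names, matching here is case-sensitive by assumption, and Python's standard fnmatch module is not used because it does not handle the {.,.} alternatives or the '\' escape.

```python
import re


def glob_to_regex(pattern: str) -> str:
    """Translate the glob syntax from the table into a regex string."""
    out = []
    depth = 0  # how many {...} groups are currently open
    i, n = 0, len(pattern)
    while i < n:
        c = pattern[i]
        if c == "\\" and i + 1 < n:
            # Escape: the next character matches itself.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if c == "*":
            out.append(".*")
        elif c == "?":
            out.append(".")
        elif c == "[":
            j = pattern.find("]", i + 1)
            body = pattern[i + 1:j] if j != -1 else ""
            negate = body.startswith("!")
            if negate:
                body = body[1:]
            if body:
                # Keep '-' as a range; neutralise regex-only specials.
                body = re.sub(r"([\\^\[])", r"\\\1", body)
                out.append("[" + ("^" if negate else "") + body + "]")
                i = j + 1
                continue
            out.append(re.escape(c))  # unterminated or empty set: literal '['
        elif c == "{":
            out.append("(?:")
            depth += 1
        elif c == "," and depth > 0:
            out.append("|")
        elif c == "}" and depth > 0:
            out.append(")")
            depth -= 1
        else:
            out.append(re.escape(c))
        i += 1
    out.append(")" * depth)  # close any group left open
    return "".join(out)


def glob_match(pattern: str, word: str) -> bool:
    """True when the whole word matches the glob pattern."""
    return re.fullmatch(glob_to_regex(pattern), word, re.DOTALL) is not None
```

With this sketch, glob_match("L{e,i}banon", "Libanon") and glob_match("term*", "terminal") are true, while glob_match("[A-C]x", "Dx") is false.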

The "-" character


The character "-" has a special meaning in a term. It is typically used for hyphenation, but in IB it means that there is a character (even white space) between the parts. The query "auto-mobile" then matches "auto-mobile" but also "auto mobile". If the terms "auto" and "mobile" are not too common and "auto" is not found, then (and only then) the system searches for "automobile".
Searching the news, for example, for "Bank-ING" will find Bank ING (Dutch International Netherlands Group Bank) and not "banking", whereas "ba-nking" will probably find "banking"; "flow-er" might find "flower" and "kern-energie" might find "kernenergie".
The "-" feature is controlled by index (search time) configuration and may differ from index to index.
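
The behaviour described above can be approximated in a few lines. The following Python sketch is a simplification and not IB's implementation: hyphen_search is a hypothetical name, it works on a single string instead of an index, it matches case-insensitively by assumption, and it ignores the "not too common" condition, falling back to the joined word whenever the first part is not found on its own.

```python
import re


def hyphen_search(term: str, text: str) -> list[str]:
    """Return the substrings of text matched by a hyphenated term."""
    parts = term.split("-")
    flags = re.IGNORECASE | re.DOTALL

    # Primary form: exactly one character (even white space) between the parts.
    separated = r"\b" + ".".join(re.escape(p) for p in parts) + r"\b"
    hits = re.findall(separated, text, flags)
    if hits:
        return hits

    # Fallback (simplified): only when the first part does not occur
    # as a word of its own, try the parts joined into one word.
    first_found = re.search(r"\b" + re.escape(parts[0]) + r"\b", text, flags)
    if first_found is None:
        joined = r"\b" + re.escape("".join(parts)) + r"\b"
        return re.findall(joined, text, flags)
    return []
```

For the text "Bank ING reported profits while the banking sector slowed." the term "Bank-ING" returns only "Bank ING", whereas "ba-nking" falls back to the joined form and returns "banking".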