Query Languages

The IB engine supports, alongside "smart" mode expressions, multiple query languages, and it includes a well-defined class structure for extending the engine with syntax for other languages.

Out of the box it supports queries expressed in both "Infix" (common to arithmetic and logical formulas) and "RPN" notations.

Smart

Smart is a query interface unique to IB. It tries to be "smart" and guess the intention of a query. If the query looks like an RPN or infix notation query, for example, it is treated as an RPN or infix notation query, respectively. If, however, it contains multiple words, or is an expression (infix or RPN) in which all the words are OR'd, it is treated as follows:
  1. The words are treated as a literal expression ("phrase"). If no results are found...
  2. The words are treated as terms within the same field (path node) instance. If the interface designers specified none, it defaults to the name node instance (as PEER). If no results are found...
  3. It searches for results with any of the terms (like OR).
The result is then "reduced" to the number of terms in the query (records with nnn matching terms). Example:

dog cat fish

If there are records with all 3 words ("dog", "cat", "fish"), the reduced set consists of only those records that contain all three terms. If, however, no record has all three terms but some have 2 terms, it returns those records with 2 terms. If no record has 2 terms, the result is the set of all records containing any of the 3 terms. One can think of reduction as a way to do an intuitive AND, then OR.
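The fallback order and the reduction step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not IB's implementation: the names tokenize, contains_phrase and smart_search are hypothetical, records are plain strings keyed by an id, and step 2 (terms within the same field instance) is skipped because the toy records have no fields.

```python
def tokenize(text):
    """Lowercase and split on whitespace (toy tokenizer, no punctuation handling)."""
    return text.lower().split()


def contains_phrase(words, terms):
    """True if the list `terms` occurs as a contiguous run inside `words`."""
    n = len(terms)
    return any(words[i:i + n] == terms for i in range(len(words) - n + 1))


def smart_search(records, query):
    """Hypothetical sketch: phrase match first, then OR with reduction."""
    terms = tokenize(query)
    if not terms:
        return []

    # Step 1: treat the words as a literal phrase.
    phrase_hits = sorted(
        rec_id for rec_id, text in records.items()
        if contains_phrase(tokenize(text), terms)
    )
    if phrase_hits:
        return phrase_hits

    # Step 2 (same field instance) is omitted in this sketch.

    # Step 3: OR search, counting how many distinct terms each record matches.
    counts = {}
    for rec_id, text in records.items():
        words = set(tokenize(text))
        matched = sum(1 for term in set(terms) if term in words)
        if matched:
            counts[rec_id] = matched
    if not counts:
        return []

    # Reduction: keep only the records that match the most terms.
    best = max(counts.values())
    return sorted(rec_id for rec_id, n in counts.items() if n == best)


records = {
    1: "the dog chased a cat",
    2: "a fish",
    3: "cat and fish and dog",
}
print(smart_search(records, "dog cat fish"))
```

With these three toy records only record 3 holds all three terms, so it alone survives the reduction; if record 3 is removed, record 1 (two terms) is returned instead.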

NOTE: In smart mode, reduction is also applied to "RPN" and "Infix" expressions that contain ONLY the operator OR. For example, the infix query

"dog" or "cat" or "fish"

would produce the same result set (provided that there is no phrase "dog cat fish" in the collection) as

dog cat fish

In our Shakespeare example, let's look at some smart searches:

To be or not to

The match is the line "To be, or not to be: that is the question:" as spoken by Hamlet. The queries "her love" and "love her" return different hits, since each is found as a phrase: "her love" (as in "Her love is not the hare that I do hunt") and "love her" (as in "Mark the encounter: if he love her not"). The query

jew love

finds only one match: "love her, I am a Jew. I will go get her picture." (as spoken by Benedick in `Much Ado about Nothing'). It is the only line that contains both the words "love" and "jew". The query

hate jew

returns multiple lines. No line, nor even any speech, in all of the works contains both the words "hate" and "jew". Only one work, `The Merchant of Venice', contains both words. The relevant lines are then those from `The Merchant of Venice' that contain either the word "jew" or "hate". The lead-up is the line by Shylock "I hate him for he is a Christian,".


Smart not only tries to recognize infix and postfix notation queries but is also polymorphic. In IBU News, for instance, since PUBDATE is a field of date type, one can search

iran and PUBDATE>=20100208

to search for articles on Iran that were published on or since 8 Feb 2010.
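How a date-typed term such as PUBDATE>=20100208 might be interpreted can be sketched as follows. This is a guess at the general idea, not IB's parser: the function names parse_date_term and matches are hypothetical, reading the value as YYYYMMDD is an assumption, and a record is modelled as a plain dictionary.

```python
import operator
import re
from datetime import date, datetime

# Comparison operators this sketch understands (an assumed set).
COMPARATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
}


def parse_date_term(term):
    """Split 'FIELD<op>YYYYMMDD' into (field name, comparison function, date)."""
    m = re.fullmatch(r"(\w+)(>=|<=|>|<|=)(\d{8})", term)
    if m is None:
        raise ValueError(f"not a date comparison: {term!r}")
    field, op, digits = m.groups()
    value = datetime.strptime(digits, "%Y%m%d").date()
    return field, COMPARATORS[op], value


def matches(record, term):
    """Compare the record's date field against the term as dates, not as text."""
    field, compare, value = parse_date_term(term)
    return compare(record[field], value)


article = {"PUBDATE": date(2010, 3, 1)}
print(matches(article, "PUBDATE>=20100208"))
```

Because the field is typed, the comparison is done on date values, so the same operator syntax would mean something different for a numeric or text field.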


Infix Notation

Infix notation is the common arithmetic and logical formula notation, in which operators are written infix-style between the operands they act on (e.g. 2 + 2).
See: IB Infix Notation.


RPN Notation

Reverse Polish notation (RPN), also known as postfix notation, was invented by Australian philosopher and computer scientist Charles Hamblin in the mid-1950s. It is derived from the Polish notation, which was introduced in 1920 by the Polish mathematician Jan Łukasiewicz.

In RPN the operands precede the operator, removing the need for parentheses. For example, the expression 3 * (4 + 7) would be written as 3 4 7 + *.
See: IB RPN Notation.
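To see why RPN needs no parentheses, here is a minimal sketch of a stack-based RPN evaluator together with a small infix-to-RPN converter (the classic shunting-yard approach). It covers only the four arithmetic operators on whitespace-separated tokens; it is not IB's query parser, and the names eval_rpn and infix_to_rpn are hypothetical.

```python
import operator

# operator symbol -> (precedence, function); all four are left-associative
OPS = {
    "+": (1, operator.add),
    "-": (1, operator.sub),
    "*": (2, operator.mul),
    "/": (2, operator.truediv),
}


def eval_rpn(tokens):
    """Evaluate RPN tokens with a stack: operands are pushed, operators pop two."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[tok][1](left, right))
        else:
            stack.append(float(tok))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]


def infix_to_rpn(tokens):
    """Convert infix tokens to RPN (shunting-yard, left-associative operators)."""
    output, pending = [], []
    for tok in tokens:
        if tok in OPS:
            # Emit pending operators that bind at least as tightly as this one.
            while (pending and pending[-1] in OPS
                   and OPS[pending[-1]][0] >= OPS[tok][0]):
                output.append(pending.pop())
            pending.append(tok)
        elif tok == "(":
            pending.append(tok)
        elif tok == ")":
            # Emit everything back to the matching opening parenthesis.
            while pending and pending[-1] != "(":
                output.append(pending.pop())
            if not pending:
                raise ValueError("unbalanced parentheses")
            pending.pop()
        else:
            output.append(tok)
    while pending:
        output.append(pending.pop())
    return output


rpn = infix_to_rpn("3 * ( 4 + 7 )".split())
print(" ".join(rpn))   # 3 4 7 + *
print(eval_rpn(rpn))   # 33.0
```

The parentheses disappear in the converted form because the order of the tokens alone fixes the order of evaluation.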