IB Postfix (RPN) Notation

Introduction


Reverse Polish notation (RPN), also known as postfix notation, was invented by Australian philosopher and computer scientist Charles Hamblin in the mid-1950s. It is derived from the Polish notation, which was introduced in 1920 by the Polish mathematician Jan Łukasiewicz.


In RPN the operands precede the operator, removing the need for parentheses. For example, the expression 3 * ( 4 + 7) would be written as 3 4 7 + *, Because of this there is no need for grouping and, similarly, no requirement for the precedence rules required by infix notation.


Implementations of RPN are stack-based; that is, operands are popped from a stack, and calculation results are pushed back onto it. Although this concept may seem obscure at first, RPN has the advantage of being extremely easy, and therefore fast, for a computer to analyze.


The algorithm for evaluating any postfix expression:

  • While there are input tokens left
    1. Read the next token from input.
    2. If the token is a value ---> Push it onto the stack.
      Else the token is a function. (Operators, like +, are simply functions taking two arguments.)
      1. It is known that the function takes n arguments.
      2. If there are fewer than n values on the stack --> Error: The user has not input sufficient values in the expression.
      3. Else, Pop the top n values from the stack.
      4. Evaluate the function, with the values as arguments.
      5. Push the returned results, if any, back onto the stack.
    3. If there is only one value in the stack -> Done: That value is the result of the calculation.
    4. If there are more values in the stack ---> Error: The user input too many values.

Implementation


A RPN sentence is constructed as
<RPNTerm1> <RPNTerm2> <Op1> <RPNTerm3><Op2>...

RPN Terms are

  • Terms with an optional postfix unary operator and/or an optional field specification prefix: [<Field Name>/]<Term>[<PostfixTermOp>][:<Weight>]
  • Valid RPN sentences.

A Term is

  • A word
  • A literal phrase enclosed in " (quotation) marks, eg. "A literal phrase".

Postfix TermOps
OpFunction
.Exact match.
~Phoenetic match.
=Case Dependant match.


A weight is a non-zero integer value

  • Positive := the larger, the more relevant.
  • Negative := the larger, the less relevant.


Supported Binary Operators (a selection)
Binary Operators
ORUnion, the set of all elements in either of two sets
ANDIntersection, the set of all elements in both sets
ANDNOTElements in the one set but NOT in the other
XORExclusive Union, elements in either but not both
ADJMatching terms are adjacent to one another (as stored on the file system)
NEAR[:num]matching terms in the sets are within X elements as stored on the file system (file offsets). The value num as integer is bytes (octets). As fraction of 1 its % of the length of record.
PEERElements in the same (unnamed) final tree leaf node
AND:fieldElements in the same node instance of field
BEFORE, AFTERIn fielded records ts like an ordered PEER.
BEFORE:field, AFTER:fieldWith a named field its like an ordered AND:field.
BEFORE:num, AFTER:numlike NEAR but ordered
FOLLOWS, PRECEDESWithin some ordered elements of one another
FARElements a "good distance" away from each other
NEARElements "near" one another.
See also: Binary Operators for the complete list.


Supported Unary Operators (a selection)
Set Unary Operators:
NOTSet compliment
WITHIN[:field]Records with elements within the specified field. RPN queries "term WITHIN:field" and "field/term" are equivalent. (for performance the query "field/term" is prefered to "term WITHIN:field")
WITHIN[:daterange]Record dates within the range
INCLUSIVE[:field]Inclusive Within: ALL Hits (and ONLY THOSE) are elements that are in the specified field. Matching records are those that have their hits absolutely in the specified field and nowhere else.
XWITHIN[:field]Absolutely NOT in the specified field
Special Unary Operators
BOOST:fff.ffBoost the score of the set by fff.ff (Score = Score * fff.ff)
REDUCE[:nnn]Reduce set to those records with nnn matching terms.
NOTE: REDUCE:metric is a special kind of unary operator that trims the result
TRIM:fff.ffTrim to the set to contain a max. number of records. If fff.fff is an integer then its the maximum number. If fff.fff is a floating point number between 0 and 1 it is taken as a per-cent of the total number of records in the index. An index with 1 million records and TRIM:0.1 would mean max. 100000. A floating point number > 1 is taken as the integer component + the percentage of the number of records. Example: 1000.01 for above = 1000 + 10000 = 11000
HITCOUNT:nnnTrim the set to contain only those records with, when nnn is positive, at least nnn hits. When nnn is a negative number then the set it to contain those records with no more than -nnn hits.

Example: HITCOUNT:10 would return those with no less than 10 hits. HITCOUNT:-10 would return those records with up to 10 hits but not more. The combination HITCOUNT:-10 HITCOUNT:10 creates the set of records with exactly 10 hits.

One may specify this as HITCOUNT>nnn (HITCOUNT>10 is equivalent to HITCOUNT:11), HITCOUNT>=nnn (same as HITCOUNT:nnn), HITCOUNT
SORTBY:<keyword>Sort the set (reserved names for <keyword>: Key, Hits, Date, Index, Score, AuxCount, Newsrank, Function, Category, ReverseHits, ReverseDate, etc.)

NOTE: The default sort **MUST** be set to unsorted for the query sort to propagate into the final set.
Unary Neo-Operators (Sets)
FILE:<filespec>The set of all records whose input file path match <filespec> (example: FILE:shakesp*.xml).
EXTENSION:<ext>The set of all records whose input file has the extension <ext> (example: EXTENSION:cgm).
KEY:<keyspec>The set of all records whose key match the <keyspec>.
DOCTYPE:<doctype>The set of all records whose doctype (index format) matches <doctype>.
Note: The above specifications fully support wildcards. They may not be used alone but only as part of a query sentence with at least 1 term and a binary operator. Example (Infix):

"hedgehog " and FILE:shakespeare.*

See also: Unary operators for the complete list.


Example RPN Sentences:

  • term1 term2 AND
    return records that include both terms

    term1 and term2.

    As logical sets, term1 && term2.

  • term1 term2 OR
    return records that include either
    term1, term2 or both terms.

    As logical sets, term1 || term2.

  • term1 term2 ANDNOT
    return those records that include the
    term term1 but not the
    term term2

    As logical sets, term1 - (term1 && term2) .
  • term1 term2 NEAR
    return records that include term1

    and term2 within at most a "few words" (distance or for structured data: in the same container) in-between.

  • term1 term2 before
    return records where term1 is
    within a few
    words before term2.
  • term1 term2 AFTER
    return records where term1 is
    within a few
    words after term2.

  • term1 term2 ANDNOT term1 ANDNOT
    This is equivalent to term1 term2 and
  • term1 NOT
    The complement of term1: all without term1.
  • term1 term2 NOT AND
    This is the equivalent to term1 term2 ANDNOT.
  • term1 term2 XOR
    return those records with term1 or term2 but not both

    As logical sets, (term1 || term2) - (term1 && term2).
  • term1 term2 Op1 term3 term4 Op2 Op3

    As logical sets, (term1 Op1 term2) Op3 (term3 Op2 term4)


Complex Attribute Search


In RSS we have some complex attribute fields such as:
      <category domain="http://www.fool.com/cusips">MSFT </category>
where we'd like to search in the domain of the category as well as its content.
The field describing the domain is called category@domain (which contains the fool.com URL) while the content is MSFT and defined in the field category. These are technically two different fields. They are, however, related. In IB we have a special neo-field "." (called "dot"). It refers to a special kinds of relation between the content of complex fields and the context of the attributes. It allows us to express search queries relating the two.


Example: <law domain="Bavaria">Bayerische Gesetz zum Schutz der Gesundheit<law>

To search for laws with domain "bavaria" we'd search law@domain/bavaria.
To search for laws about "Gesundheit" we'd search law/Gesundheit.

Now if we'd like to have all the laws in domain "Bavaria" that are about Gesundheit we'd search:

law@domain/Bavaria law/Gesundheit AND:.

Notice the above and:. to mean in the same place.

The query:

Schutz Gesundheit AND:LAW

means to search for "Schutz" and "Gesundheit" in the same LAW container. To combine with the above

Schutz Gesundheit AND:LAW law@domain/bavaria AND:.

to search for only those "Schutz" and "Gesundheit" terms in LAW containers where the domain is "Bavaria".


Since we can have objects other than text such as numbers in complex attributes allows us to define queries such as

LINE@NUMBER>100 LINE/spot AND:.

to find lines that contain the word "spot" in those LINE elements whose NUMBER complex attribute is greater than 100.

See also: Infix Notation Queries