| Set Unary Operators | |
|---|---|
| NOT | Set complement |
| WITHIN[:field] | Records with elements within the specified field. The RPN queries "term WITHIN:field" and "field/term" are equivalent (for performance, the query "field/term" is preferred to "term WITHIN:field"). |
| WITHIN[:daterange] | Record dates within the range |
| INCLUSIVE[:field] | Inclusive Within: ALL hits (and ONLY THOSE) are elements that are in the specified field. Matching records are those that have their hits absolutely in the specified field and nowhere else. |
| XWITHIN[:field] | Absolutely NOT in the specified field |

| Special Unary Operators | |
|---|---|
| BOOST:fff.ff | Boost the score of the set by fff.ff (Score = Score * fff.ff) |
| REDUCE[:nnn] | Reduce the set to those records with nnn matching terms. NOTE: REDUCE:metric is a special kind of unary operator that trims the result. |
| TRIM:fff.ff | Trim the set to contain a maximum number of records. If fff.ff is an integer, it is the maximum number. If fff.ff is a floating-point number between 0 and 1, it is taken as a fraction of the total number of records in the index: an index with 1 million records and TRIM:0.1 would mean a maximum of 100000. A floating-point number > 1 is taken as the integer component plus that fraction of the number of records. Example: TRIM:1000.01 for the index above gives 1000 + 10000 = 11000. |
| HITCOUNT:nnn | Trim the set to contain only those records with, when nnn is positive, at least nnn hits. When nnn is a negative number, the set is to contain those records with no more than -nnn hits. Example: HITCOUNT:10 would return those records with no fewer than 10 hits. HITCOUNT:-10 would return those records with up to 10 hits but not more. The combination HITCOUNT:-10 HITCOUNT:10 creates the set of records with exactly 10 hits. One may specify this as HITCOUNT>nnn (HITCOUNT>10 is equivalent to HITCOUNT:11), HITCOUNT>=nnn (same as HITCOUNT:nnn), HITCOUNT |
| SORTBY:<keyword> | Sort the set (reserved names for <keyword>: Key, Hits, Date, Index, Score, AuxCount, Newsrank, Function, Category, ReverseHits, ReverseDate, etc.). NOTE: The default sort **MUST** be set to unsorted for the query sort to propagate into the final set. |
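
The TRIM arithmetic and the HITCOUNT sign convention from the table can be sketched in a few lines of Python. This only illustrates the rules as stated above and is not the engine's implementation; the function names `trim_limit` and `hitcount_keeps` are invented for the example.

```python
from decimal import Decimal


def trim_limit(argument: str, total_records: int) -> int:
    """Maximum set size for TRIM:<argument> (hypothetical helper).

    The integer part is an absolute record count; the fractional part
    is a fraction of the total number of records in the index.
    """
    value = Decimal(argument)          # parse as decimal to avoid float rounding
    whole = int(value)                 # integer component
    fraction = value - whole           # component between 0 and 1
    return whole + int(fraction * total_records)


def hitcount_keeps(hits: int, nnn: int) -> bool:
    """True if a record with `hits` hits survives HITCOUNT:<nnn> (hypothetical helper)."""
    if nnn >= 0:
        return hits >= nnn             # positive: at least nnn hits
    return hits <= -nnn                # negative: no more than -nnn hits


total = 1_000_000
print(trim_limit("0.1", total))        # fraction only
print(trim_limit("1000.01", total))    # integer component plus fraction
# HITCOUNT:-10 HITCOUNT:10 keeps only records with exactly 10 hits
print([h for h in (9, 10, 11) if hitcount_keeps(h, -10) and hitcount_keeps(h, 10)])
```

With the 1 million record index used in the table, the first two calls give 100000 and 11000, matching the worked example for TRIM.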

| Unary Neo-Operators (Sets) | |
|---|---|
| FILE:<filespec> | The set of all records whose input file path matches <filespec> (example: FILE:shakesp*.xml). |
| EXTENSION:<ext> | The set of all records whose input file has the extension <ext> (example: EXTENSION:cgm). |
| KEY:<keyspec> | The set of all records whose key matches <keyspec>. |
| DOCTYPE:<doctype> | The set of all records whose doctype (index format) matches <doctype>. |

Note: The above specifications fully support wildcards. They may not be used alone but only as part of a query sentence with at least 1 term and a binary operator. Example (Infix):
"hedgehog" and FILE:shakespeare.*
The WITHIN:field operator can be quite effective in exploring (partially unknown) structure paths in XML data.
In our Shakespeare example (SGML/XML markup of Shakespeare's works by Jon Bosak), the paths to the LINE elements where things are said are:
- PLAY\ACT\EPILOGUE\SPEECH\LINE
- PLAY\ACT\PROLOGUE\SPEECH\LINE
- PLAY\PROLOGUE\SPEECH\LINE
- PLAY\INDUCT\SCENE\SPEECH\LINE
- PLAY\INDUCT\SPEECH\LINE
- PLAY\ACT\SCENE\SPEECH\LINE
And the only path in which LINE has a child is:
- PLAY\ACT\SCENE\SPEECH\LINE\STAGEDIR
To search for a term in the field LINE one would typically not use one of these operators but issue a LINE/term query. The standard field search is faster and more efficient than WITHIN. WITHIN (and its friends) are, however, very powerful and extremely useful.
By specifying in an RPN query: LINE/term WITHIN:PROLOGUE
one can restrict the hits to those terms in a LINE that are within a PROLOGUE.
Multiple unary operators can be applied so one could also say: term WITHIN:LINE WITHIN:PROLOGUE.
To search for the term "love" in LINEs that are in a scene: LINE/love WITHIN:SCENE
Only those scenes that are in an act (and not those in an induct): LINE/love WITHIN:SCENE WITHIN:ACT
If we wanted those explicitly not in INDUCT: LINE/love WITHIN:SCENE XWITHIN:INDUCT
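
The narrowing effect of chaining WITHIN and XWITHIN can be pictured with a small Python sketch over the element paths listed above. It treats each hit as carrying its full ancestor path and tests whether the named field occurs anywhere in that path. This model and the helper names `within` and `xwithin` are assumptions made for illustration; how the engine evaluates these operators internally is not described here.

```python
# Element paths to LINE taken from the Shakespeare example above.
PATHS = [
    r"PLAY\ACT\EPILOGUE\SPEECH\LINE",
    r"PLAY\ACT\PROLOGUE\SPEECH\LINE",
    r"PLAY\PROLOGUE\SPEECH\LINE",
    r"PLAY\INDUCT\SCENE\SPEECH\LINE",
    r"PLAY\INDUCT\SPEECH\LINE",
    r"PLAY\ACT\SCENE\SPEECH\LINE",
]


def within(paths, field):
    """Keep the paths that pass through `field` (models WITHIN:field)."""
    return [p for p in paths if field in p.split("\\")]


def xwithin(paths, field):
    """Keep the paths that never pass through `field` (models XWITHIN:field)."""
    return [p for p in paths if field not in p.split("\\")]


# LINE/love WITHIN:SCENE
in_scene = within(PATHS, "SCENE")
# LINE/love WITHIN:SCENE WITHIN:ACT
in_scene_in_act = within(in_scene, "ACT")
# LINE/love WITHIN:SCENE XWITHIN:INDUCT
in_scene_not_induct = xwithin(in_scene, "INDUCT")

for name, result in [
    ("WITHIN:SCENE", in_scene),
    ("WITHIN:SCENE WITHIN:ACT", in_scene_in_act),
    ("WITHIN:SCENE XWITHIN:INDUCT", in_scene_not_induct),
]:
    print(name, result)
```

For this particular set of paths the last two queries select the same single path, PLAY\ACT\SCENE\SPEECH\LINE, because a SCENE occurs only under ACT or INDUCT.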
The mechanism is very powerful, flexible and more generic than the XPath models, but we are, after all, also more generic and abstract than XML.