IB Thesaurus Format

Personalized thesauri


IB provides a flexible model for supporting personalized thesauri. A thesaurus hooks into the internal query structure (a push stack of terms and their associated attribute structures) and expands matching terms into OR'd alternatives. Because the expansion operates on this internal structure, it is fully independent of the query language originally used to write the expression.

IB 3.x Thesaurus Format:

# Rows of the form:
# parent phrase = child1:weight+child2+multiword child:weight+ ... +childN
#
# White space is ignored at the start and end of child terms
# Comments start with #
spatial=geospatial+geographic+terrestrial # Here are more comments
land use=land cover + land characterization + land surface + ownership property
wetlands=wet land+NWI+hydric soil+inundated
hydrography=stream+river+spring+lake+pond+aqueduct+siphon+well   
Al-Qaida=Al-Qa'ida+Al-Kaida+Al-Qaida+Al-Qaida+al-qaeda
kernkraft=atomkraft
nuclear=atomic
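A minimal Python sketch of a reader for the row format above, assuming only the rules stated in its comments: `#` starts a comment, `=` separates the parent phrase from its children, `+` separates children, and an optional trailing `:weight` sets a child's weight (unweighted children count as 1). The name `parse_thesaurus` is hypothetical and not part of IB. Lowercasing the parent phrase is also an assumption, made so that a query term such as "War" finds the row for "war", and merging rows that repeat a parent phrase is a choice of this sketch, not documented IB behavior.

```python
from typing import Dict, List, Tuple

# Parsed thesaurus: lowercased parent phrase -> [(child term, weight), ...]
Thesaurus = Dict[str, List[Tuple[str, float]]]


def parse_thesaurus(text: str) -> Thesaurus:
    """Parse rows of the form  parent phrase = child1:weight+child2+...+childN."""
    thesaurus: Thesaurus = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and outer blanks
        if "=" not in line:
            continue  # blank or comment-only row
        parent, _, rhs = line.partition("=")
        children: List[Tuple[str, float]] = []
        for child in rhs.split("+"):
            child = child.strip()  # white space around child terms is ignored
            if not child:
                continue
            term, weight = child, 1.0  # unweighted children count as 1
            head, sep, tail = child.rpartition(":")
            if sep:
                try:
                    term, weight = head.strip(), float(tail)
                except ValueError:
                    pass  # ':' is part of the term, not a weight
            children.append((term, weight))
        # Rows that repeat a parent phrase are merged in this sketch.
        thesaurus.setdefault(parent.strip().lower(), []).extend(children)
    return thesaurus


sample = """
# Sample synonyms
war=krieg:2+combat+battle
land use=land cover + land characterization + land surface
"""
print(parse_thesaurus(sample)["war"])
# [('krieg', 2.0), ('combat', 1.0), ('battle', 1.0)]
```

Multiword children such as "land cover" survive intact because only `+` splits children and only a final `:number` is read as a weight.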

Thesauri are associated with query structures, so it's easy to implement an interface where each user manages their own collection of personalized thesauri. Since thesauri affect only the query structure, they have no negative effect on search caching.

Examples

# Sample synonyms
war=krieg:2+combat+battle
peace=pax

The infix query ("War" and "Peace") is expanded as if (("war" or "krieg":2 or "combat" or "battle") and ("peace" or "pax")) had been entered into the system.
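The infix expansion can be sketched in Python as follows. This is an illustration, not IB's implementation: `expand_infix_and` and `fmt` are hypothetical names, the sketch handles only a flat AND of single terms rather than parsing infix syntax, and it assumes terms are matched case-insensitively. The `SAMPLE_SYNONYMS` dictionary mirrors the sample rows above in already-parsed form.

```python
from typing import Dict, List, Tuple

# Parsed thesaurus: lowercased parent phrase -> [(child term, weight), ...]
Thesaurus = Dict[str, List[Tuple[str, float]]]

# The sample synonym rows above, already parsed.
SAMPLE_SYNONYMS: Thesaurus = {
    "war": [("krieg", 2), ("combat", 1), ("battle", 1)],
    "peace": [("pax", 1)],
}


def fmt(term: str, weight: float) -> str:
    """Quote a term, appending ':weight' only when the weight differs from 1."""
    return f'"{term}"' if weight == 1 else f'"{term}":{weight:g}'


def expand_infix_and(terms: List[str], thesaurus: Thesaurus) -> str:
    """OR each term with its thesaurus children, then AND the groups together."""
    groups = []
    for term in terms:
        key = term.lower()  # assumption: case-insensitive lookup
        alternatives = [fmt(key, 1)]
        alternatives += [fmt(child, w) for child, w in thesaurus.get(key, [])]
        groups.append("(" + " or ".join(alternatives) + ")")
    return "(" + " and ".join(groups) + ")"


print(expand_infix_and(["War", "Peace"], SAMPLE_SYNONYMS))
# (("war" or "krieg":2 or "combat" or "battle") and ("peace" or "pax"))
```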

The RPN query "War":2 "Peace" && (the same as the infix query above, except that the term "War" carries a weight of 2) is expanded as if "war" "krieg":4 || "combat" || "battle" || "peace" "pax" || && had been entered into the system.

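Notice that the weights are multiplicative: the query weight of 2 on "War" times the thesaurus weight of 2 on "krieg" gives "krieg" a weight of 4.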
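The RPN case can be sketched as an operation on a token stream, which is closer to the push-stack query structure described at the top of this section. In this Python sketch (hypothetical names `expand_rpn` and `render`, not IB's API), each query term is pushed, then each thesaurus child is pushed followed by an OR operator, so the group reduces to a single OR'd operand before the next operator is applied. It assumes the query weight stays on the original term and multiplies every child's weight, with unweighted children counting as 1; under that assumption "combat" and "battle" also come out with weight 2, whereas the example above writes out only the "krieg" product.

```python
from typing import Dict, List, Tuple, Union

# Parsed thesaurus: lowercased parent phrase -> [(child term, weight), ...]
Thesaurus = Dict[str, List[Tuple[str, float]]]
# A token is an operator ("&&" or "||") or a (term, weight) pair.
Token = Union[str, Tuple[str, float]]

SAMPLE_SYNONYMS: Thesaurus = {
    "war": [("krieg", 2), ("combat", 1), ("battle", 1)],
    "peace": [("pax", 1)],
}


def expand_rpn(query: List[Token], thesaurus: Thesaurus) -> List[Token]:
    """Replace each term with an OR'd group of itself and its children."""
    out: List[Token] = []
    for token in query:
        if isinstance(token, str):  # operators pass through unchanged
            out.append(token)
            continue
        term, weight = token
        key = term.lower()  # assumption: case-insensitive lookup
        out.append((key, weight))
        for child, child_weight in thesaurus.get(key, []):
            out.append((child, weight * child_weight))  # weights multiply
            out.append("||")  # OR the child into the group on the stack
    return out


def render(tokens: List[Token]) -> str:
    """Write tokens back out in the RPN notation used in this document."""
    parts = []
    for token in tokens:
        if isinstance(token, str):
            parts.append(token)
        else:
            term, weight = token
            parts.append(f'"{term}"' if weight == 1 else f'"{term}":{weight:g}')
    return " ".join(parts)


query: List[Token] = [("War", 2), ("Peace", 1), "&&"]
print(render(expand_rpn(query, SAMPLE_SYNONYMS)))
# "war":2 "krieg":4 || "combat":2 || "battle":2 || "peace" "pax" || &&
```

Because the expansion only rewrites the token stream, the same function would serve an infix front end once its parse is converted to RPN, which is one way to read the claim that expansion is independent of the query language.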