IB Search Engine Design // Beyond XML full-text search and beyond native XML databases

Search functionality (including ranking) is handled by the embedded IB sub-system. IB was developed by BSn's NONMONOTONIC Lab in Munich.

Background

Organizations of all sizes and in all industries generally spread their corporate knowledge across a variety of heterogeneous database applications: customer relationship systems, staff directories, content management systems (CMS), electronic document and records management systems (EDRMS), and library catalogues.
The default mode is to index all the words and all the structure of documents. This provides powerful, fast search without prior knowledge of the content, yet allows arbitrarily complex questions across all the content and from different perspectives. Because the engine is not bound to "records" as the unit of information, one can derive value from content immediately and still enhance the content and the application incrementally over time without "breaking anything".

IB was designed from the ground up to address three key goals: universal hierarchical and contextual search over SGML/XML and other document formats; distributed objects (transparent, integrated views onto other sources of information such as relational databases, search services and object brokers); and optimal support for current and future features of the ISO 23950 (ANSI/NISO Z39.50) Information Retrieval Protocol, so that the engine can offer standard, interoperable interfaces.
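The default mode described above, indexing every word together with the structure it occurs in, can be illustrated with a toy example. This is only a sketch of the general idea and not IB's implementation or API: the names `index_document` and `search`, the in-memory dictionary, and the sample documents are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def index_document(doc_id, xml_text, index):
    """Record every word together with the element path it occurs in.

    No record design is needed: the structure is taken from the document.
    (Hypothetical helper, not part of IB.)
    """
    root = ET.fromstring(xml_text)

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        for word in re.findall(r"\w+", (elem.text or "").lower()):
            index[word].add((doc_id, here))
        for child in elem:
            walk(child, here)
            # Text following a child element belongs to the parent element.
            for word in re.findall(r"\w+", (child.tail or "").lower()):
                index[word].add((doc_id, here))

    walk(root, "")


def search(index, word, within=None):
    """Return sorted (doc_id, path) hits, optionally limited to a context.

    `within` names an element that must appear somewhere on the hit's path.
    """
    hits = index.get(word.lower(), set())
    if within is not None:
        hits = {h for h in hits if within in h[1].split("/")}
    return sorted(hits)


index = defaultdict(set)
index_document(
    "d1",
    "<book><title>Search engines</title>"
    "<chapter><para>Ranking in search</para></chapter></book>",
    index,
)
index_document(
    "d2",
    "<memo><subject>Budget</subject><body>Search costs</body></memo>",
    index,
)

print(search(index, "search"))                  # every occurrence, any context
print(search(index, "search", within="title"))  # only occurrences inside a title
```

Two documents with unrelated structures share one index, and the same word can be queried either anywhere or only within a chosen context, without either structure having been declared in advance.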
IB Search Sub-System

The IB engine (written in C++) provides sophisticated contextual content search over heterogeneous (mixed-format) information, including XML (with all the functionality of a native XML database) and SGML. It goes well beyond most XML full-text search solutions by supporting other objects alongside text. Because it is not limited to the XML paradigm and is built on a more abstract model, it can go beyond the commonplace hierarchical text model (volumes, chapters, sections, paragraphs, sentences and even lines) and beyond fields and paths (the XPath and XQuery model) to abstract hierarchical path expressions and overlapping structures.

In the IB engine a single index can hold mixed document content, and mechanisms are provided to unify fields from different formats. Indexes can also be combined on the fly with others to form dynamic collections. Whereas almost all search engines demand that the unit of retrieval be fixed before indexing as part of a "record design", IB can index a diverse collection of heterogeneously formatted and structured information and also lets the unit of retrieval be defined, in effect, per search via abstract paths. Together these features enable state-of-the-art context querying of a sort not found in any of the so-called XML search, full-text or IR engines.
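To make the idea of a per-search unit of retrieval concrete, here is a minimal sketch. It is not IB's query language or API: the function `retrieve`, its parameters, and the sample document are hypothetical, and the code scans the parsed document directly rather than consulting an index. The same content is queried twice, and only the unit named in the query changes what counts as a hit.

```python
import re
import xml.etree.ElementTree as ET


def retrieve(xml_text, terms, unit):
    """Return the text of each `unit` element that contains all the terms.

    The unit of retrieval is an argument of the query, not a property of
    how the document was stored. (Hypothetical helper, not part of IB.)
    """
    root = ET.fromstring(xml_text)
    wanted = {t.lower() for t in terms}
    results = []
    for elem in root.iter(unit):
        # Collect all text below this element and normalize whitespace.
        text = " ".join(" ".join(elem.itertext()).split())
        words = set(re.findall(r"\w+", text.lower()))
        if wanted <= words:
            results.append(text)
    return results


doc = (
    "<book>"
    "<chapter><para>XML search engines</para><para>Ranking results</para></chapter>"
    "<chapter><para>Relational databases</para></chapter>"
    "</book>"
)

# Same terms, different unit of retrieval chosen at query time.
print(retrieve(doc, ["search", "ranking"], "chapter"))  # one chapter matches
print(retrieve(doc, ["search", "ranking"], "para"))     # no single paragraph matches
```

With the chapter as the unit, the two terms co-occur and the first chapter is returned. With the paragraph as the unit, they never co-occur and nothing is returned. A fixed record design would have forced one of these answers at indexing time.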
By Edward C. Zimmermann at 2010-07-08 14:34 | SGML | XML fulltext engine | Z39.50 | full-text | full-text search | native XML database