
Document base classes ("doctypes")

The IB engine is designed for, and supports, heterogeneous data sources. Fields, including many implicit ones, are detected automatically according to the document format (for example lines, sentences, paragraphs and pages in plain text; subject, author, references and email addresses in email) and are treated as if they had been explicitly tagged (implicit auto-tagging). In PDF, for example, we have not only the document properties (the metadata or info section) that PDF defines as fields, including handling of their types such as dates, but also the implicit textual structure of the content in sentences, paragraphs and pages. Several doctypes also automatically detect a number of field datatypes (objects), such as telephone numbers, dates and numbers, at index time (if enabled, or rather not disabled) and set them accordingly.
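To make implicit auto-tagging more concrete, here is a minimal Python sketch of the idea for plain text. It is not IB's implementation: the function names, the regular expressions, and the rule that earlier detectors take precedence over later ones are all assumptions chosen for illustration. The sketch derives line, sentence and paragraph fields from untagged text and detects a few datatypes (ISO `YYYY-MM-DD` dates, international telephone numbers, plain numbers).

```python
import re

# Hypothetical datatype detectors, applied in order; earlier matches win.
DETECTORS = [
    ("date", re.compile(r"\b\d{4}-\d{2}-\d{2}\b")),
    ("telephone", re.compile(r"\+\d[\d ]{6,}\d")),
    ("number", re.compile(r"\b\d+(?:\.\d+)?\b")),
]


def split_paragraphs(text):
    """Paragraphs are runs of text separated by one or more blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    """Naive sentence split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]


def detect_datatypes(text):
    """Return (type, value) pairs. Matched spans are masked with spaces so
    that, e.g., a date is not also reported as a number."""
    found = []
    masked = text
    for name, pattern in DETECTORS:
        for match in pattern.finditer(masked):
            found.append((name, match.group()))
        # Blank out what this detector claimed, keeping string offsets intact.
        masked = pattern.sub(lambda hit: " " * len(hit.group()), masked)
    return found


def auto_tag(text):
    """Build implicit fields for an untagged plain-text document."""
    paragraphs = split_paragraphs(text)
    return {
        "line": [ln for ln in text.splitlines() if ln.strip()],
        "paragraph": paragraphs,
        "sentence": [s for p in paragraphs for s in split_sentences(p)],
        "datatypes": detect_datatypes(text),
    }
```

Masking each match before the next detector runs is one simple way to keep a date such as `2024-03-01` from also being indexed as the number `2024`; a production doctype would need more robust, locale-aware patterns.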

The following document base classes are provided:

Available Built-in Document Base Classes (v28):
        AOLLIST           ATOM     AUTODETECT       BIBCOLON
         BIBTEX         BINARY            CAP       COLONDOC
       COLONGRP       DIALOG-B            DIF        DVBLINE
        ENDNOTE      EUROMEDIA       FILMLINE    FILTER2HTML
    FILTER2MEMO    FILTER2TEXT     FILTER2XML      FIRSTLINE
            FTP           GILS        GILSXML        HARVEST
           HTML         HTML--      HTMLCACHE       HTMLHEAD
       HTMLMETA     HTMLREMOTE       HTMLZERO        IAFADOC
       IKNOWDOC         IRLIST     LISTDIGEST     MAILDIGEST
     MAILFOLDER        MEDLINE           MEMO        METADOC
       MISMEDIA     NEWSFOLDER         NEWSML            OCR
        ONELINE       OZSEARCH        PAPYRUS           PARA
            PDF      PLAINTEXT             PS          PTEXT
            RDF       REFERBIB            RIS        ROADS++
         RSS.9x           RSS1           RSS2     RSSARCHIVE
        RSSCORE           SGML       SGMLNORM        SGMLTAG
         SIMPLE           SOIF         TSLDOC            TSV
        XBINARY        XFILTER            XML        XMLBASE
      YAHOOLIST

Extensibility

Via the various FILTER2 doctypes (such as FILTER2MEMO, FILTER2TEXT and FILTER2XML), other document formats as well as custom data-cleansing and content-enrichment packages can be inserted into the indexing data pipeline, making it easy to add best-of-breed third-party functionality. Via the object type system, access can be provided to and from proprietary information and database systems as required.
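As an illustration of the kind of data-cleansing step that could sit in front of the indexer, the following Python sketch normalizes text before indexing. This section does not describe how IB invokes FILTER2 filters, so the command-line convention assumed here (`INPUT OUTPUT` file paths) and every name in the sketch are hypothetical.

```python
import re
import sys
import unicodedata


def clean_text(raw):
    """Normalize text for indexing: NFC form, no control characters,
    collapsed spaces, at most one blank line between paragraphs."""
    text = unicodedata.normalize("NFC", raw).replace("\r\n", "\n").replace("\r", "\n")
    # Drop control characters (Unicode category Cc) except newline and tab.
    text = "".join(ch for ch in text if ch in "\n\t" or unicodedata.category(ch) != "Cc")
    # Collapse runs of spaces/tabs and trim each line.
    lines = [re.sub(r"[ \t]+", " ", ln).strip() for ln in text.split("\n")]
    # Keep paragraph breaks, but never more than one blank line in a row.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip() + "\n"


def main(argv):
    # Hypothetical calling convention: filter INPUT OUTPUT.
    if len(argv) != 3:
        sys.stderr.write(f"usage: {argv[0]} INPUT OUTPUT\n")
        return 2
    with open(argv[1], "rb") as src:
        raw = src.read().decode("utf-8", errors="replace")
    with open(argv[2], "w", encoding="utf-8") as dst:
        dst.write(clean_text(raw))
    return 0
```

To run it as a standalone command, call `sys.exit(main(sys.argv))` from an `if __name__ == "__main__":` guard. Keeping `clean_text` separate from the file handling makes the cleansing rules easy to test on their own.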

System developers and integrators can also develop their own custom doctype plugins using the Doctype Development Kit. The standard delivery includes the following doctype plugins:

External Base Classes ("Plugin Doctypes"):
  NULL:              // Empty plugin
  MSWORD:            // M$ Word Plugin
  MSRTF:             // M$ RTF (Rich Text Format) Plugin [XML]
  MSOLE:             // M$ OLE type detector Plugin
  MSEXCEL:           // M$ Excel (XLS) Plugin
  RTF:               // "Rich Text Format" (RTF) Plugin
  USPAT:             // US Patents (Green Book)
  ESTAT:             // EUROSTAT CSL Plugin
  ISOTEIA:           // ISOTEIA project (GILS Metadata) XML format locator records
  ADOBE_PDF:         // Adobe PDF Plugin
  PDFDOC:            // OLD Adobe PDF Plugin
  TEXT:              // Plain Text Plugin
