The Domain You Searched For Is Still Available: On Sale Now!
Submitted by Edward C. Zimmermann on Tue, 2009-11-17 19:07
How Google directly charges for inclusion: PPC (Pay Per Click) advertising and ranking.
Submitted by Edward C. Zimmermann on Wed, 2009-06-24 07:36
- Google is crawling JavaScript links on Web sites. These outbound links are, it seems, handled by Google just like any other outbound link. Google's AdWords uses JavaScript for links.
- In the cached pages: The Google robot does store pages with their own PPC ad campaigns on them. The outbound ad links present at the moment of gathering are used by Google in its link analysis. The inbound text in the advertisement that produced the ad with the link on a page will, in turn, affect the ranking of the link. The selection of costly words (and inclusion on highly ranked sites) will, as can be shown, drive up ranking and visibility.
Brief Fast ESP comparison
Submitted by Edward C. Zimmermann on Wed, 2009-06-10 09:43
XML:DB API
Submitted by Edward C. Zimmermann on Thu, 2009-05-14 09:36

| General Requirements | |
|---|---|
| Language Independence - The API MUST NOT preclude the usage with more than one language binding. | IB provides a large number of language bindings, including C++, Tcl, Python, Perl, PHP, Ruby, Java, and C#. |
| Textual Interface - The API MUST provide a textual representation of XML result sets. | Yes. |
| XML-API Interface - The API SHOULD provide a SAX or DOM based representation of XML result sets. | A SAX or DOM based representation of the XML result sets is available by loading the XML representation of the result sets into a DOM. |
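
The last row says a DOM view is obtained by loading the textual XML result set into a DOM. A minimal sketch of that step in Python follows. The result-set layout (`results`, `hit`, `rank`, `title`) and the function name `load_hits` are assumptions made for illustration; they are not the actual IB result format or API.

```python
from xml.dom.minidom import parseString

# Hypothetical result set; element and attribute names are illustrative only,
# not the actual IB result format.
RESULT_XML = """<results total="2">
  <hit rank="1" score="0.92"><title>First document</title></hit>
  <hit rank="2" score="0.57"><title>Second document</title></hit>
</results>"""


def load_hits(xml_text):
    """Parse a textual XML result set into a DOM and return (rank, title) pairs."""
    dom = parseString(xml_text)
    hits = []
    for node in dom.getElementsByTagName("hit"):
        title = node.getElementsByTagName("title")[0]
        # The title text lives in the first child text node of <title>.
        hits.append((int(node.getAttribute("rank")), title.firstChild.data))
    return hits
```

Calling `load_hits(RESULT_XML)` walks the DOM once and yields one pair per `hit` element; any client language with a DOM parser could do the same against the textual interface.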
PHP Fulltext search
Submitted by Edward C. Zimmermann on Wed, 2009-05-13 13:36
- PHP development using the PHP loadable module.
- PHP interface using Rain.
- DB offloaded (client/server).
- Client side (Drupal) which talks to a number of distributed (federated) servers for high availability.
- Uses powerful and highly performant IB search engine.
- Supports AJAX for a host of features including Scan
- Clickless search: selecting some text on a page runs a search in a layer.
IB4J
Submitted by Edward C. Zimmermann on Tue, 2009-03-31 15:52
IB4J: IB client/server solution for Java
User features:
- DB offloaded (client/server).
- Client side (Java) which talks to a number of distributed (federated) servers for high availability.
- Uses powerful and highly performant IB search engine.
Raining content: IB for Drupal
Submitted by Edward C. Zimmermann on Mon, 2009-03-30 12:07
Rain: IB client/server solution for Drupal
Rain is what's behind this site (IBU News). User features:
- DB offloaded (client/server).
- Client side (Drupal) which talks to a number of distributed (federated) servers for high availability.
- Uses powerful and highly performant IB search engine.
- Supports AJAX for a host of features including Scan
- Clickless search: selecting some text on a page runs a search in a layer.
IB4Typo3: IB client/server solution for Typo3
Submitted by Edward C. Zimmermann on Mon, 2009-03-30 07:49
User features:
- DB offloaded (client/server).
- Client side (Typo3 end) as a Typo3 extension which talks to a number of distributed (federated) servers for high availability.
- Uses powerful and highly performant IB search engine.
- Allows for the display in results of breadcrumbs etc.
A comparison of Typo3 search solutions
Submitted by Edward C. Zimmermann on Fri, 2009-03-27 10:06
Typo3 search solutions
The following short article attempts to outline a few of the search solutions for Typo3 that I am currently familiar with. They fit more or less into two groups: RDBMS "full-text" extensions (to MySQL) and outboard extensions using a fulltext library (Lucene, IB). To my knowledge only IB4Typo3 supports breadcrumbs in the results.
"Famous" indexed_search extension
This is the typical standard search installed on most Typo3 installations. It's quite easy to install and there are a handful of extensions that, in turn, build upon it to provide some slight enhancements to usability. It's a mature product and at this point well understood.
It is, however, officially considered suspect: see Known problems:
- Currently the extension is under observation because instances of heavy server load/unstability has been reported. It is not yet clear if THIS extension has anything to do with. So it's only under suspicion at this point until further data has been collected. But for now it is adviced to be careful with the application of the extension for mission critical, high-load environments.
- It's still uncertain how performance is under heavy load conditions and when MANY pages are indexed. Currently benchmarks has been done only up to 2000 pages indexed/approx. 400.000 relation records. It is probably that some parts has to be optimized for such scenarios.
"The Indexed Search Engine (indexed_search) system extension in TYPO3 4.0.0 through 4.0.9, 4.1.0 through 4.1.7, and 4.2.0 through 4.2.3 allows remote attackers to execute arbitrary commands via a crafted filename containing shell metacharacters, which is not properly handled by the command-line indexer." — vulnerability CVE-2009-0258
General characteristics of the extension:
- White space is used to split words.
- Words are limited to a minimum of 2 characters and a maximum of 200 characters in length.
- Boolean expressions are not supported, only the concepts of "all words" (AND), "any words" (OR) or "none of these words" (NOT).
- Only rendered pages are indexed and ONLY those that are cacheable. Pages where the cache is disabled are not indexed.
- Each page is uniquely identified by an ID for that page.
- Pages in more than one language must be indexed as different pages since they are IDed as id/type/language/cHashParams.
- While the same page may have different content based on the user-groups (and so must be indexed once for each) such pages may just as well present the SAME content regardless of usergroups.
- The search itself offers acceptable but non-stellar performance as long as the data and index are in memory.
- Word positions are not stored, only word frequencies.
- Basic ranking and sorting: frequency based ranking.
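
The characteristics above (white-space splitting, word length limits of 2 to 200 characters, all/any/none matching, frequencies without positions, frequency-based ranking) can be illustrated with a small sketch. This is not the indexed_search code; the function names are hypothetical, and lowercasing words is an added assumption.

```python
from collections import Counter

MIN_LEN, MAX_LEN = 2, 200  # word length limits stated in the list above


def tokenize(text):
    """Split on white space and keep words within the length limits.

    Lowercasing is an assumption of this sketch, not a documented behavior.
    """
    return [w.lower() for w in text.split() if MIN_LEN <= len(w) <= MAX_LEN]


def build_index(pages):
    """Map page id -> Counter of word frequencies (no positions are kept)."""
    return {page_id: Counter(tokenize(text)) for page_id, text in pages.items()}


def search(index, words, mode="all", exclude=()):
    """Return page ids ranked by the summed frequency of the query words.

    mode="all" requires every word (AND); mode="any" requires at least one (OR).
    Pages containing any word in `exclude` are dropped (NOT).
    """
    words = [w.lower() for w in words]
    exclude = [w.lower() for w in exclude]
    results = []
    for page_id, freq in index.items():
        if any(freq[w] for w in exclude):
            continue
        present = [freq[w] > 0 for w in words]
        matched = all(present) if mode == "all" else any(present)
        if matched:
            results.append((sum(freq[w] for w in words), page_id))
    # Highest frequency first; ties are broken by page id for a stable order.
    return [page_id for _, page_id in sorted(results, key=lambda r: (-r[0], r[1]))]
```

Because only frequencies are stored, a design like this cannot answer phrase or proximity queries, which is the practical consequence of not storing word positions.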
mnoGoSearch SQL extension
One of the most popular SQL extensions to try to provide some fulltext functionality is mnoGoSearch. It features:
- external indexer (runs as a cron job at night)
- reindexes only modified pages (no need to crawl the whole site, extension tracks modified pages)
- supports word forms (do/does/did/done - all will be found when searching for "do" or "did") using Aspell
- correctly works with "index" flag for content elements (indexed search ignores it)
- search results are internally cached, so the same query returns quickly
There are limitations in the current version of the TYPO3 extension:
- needs database per site or will return results from all sites
- requires one time compiling on the server
- requires PHP extension
Indexing and search performance for small collections is acceptable.
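
The "reindexes only modified pages" feature listed above boils down to comparing each page's modification time with the time it was last indexed. The following is a generic sketch of that selection step, not mnoGoSearch's implementation; the function name and the two timestamp maps are hypothetical.

```python
def pages_to_reindex(modified_at, indexed_at):
    """Return ids of pages that changed since they were last indexed.

    modified_at: page id -> last modification timestamp
    indexed_at:  page id -> timestamp of the last indexing run for that page
    """
    return sorted(
        page_id
        for page_id, mtime in modified_at.items()
        # Pages that were never indexed are treated as modified.
        if page_id not in indexed_at or mtime > indexed_at[page_id]
    )
```

A nightly cron job would then pass only the returned ids to the indexer instead of crawling the whole site.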
Zend (PHP) / PowerSearch
The Zend toolkit extensions use a Java-PHP bridge to try to slap Lucene into the picture but that's just a big drain (aside from the limitations on search).
Around the Zend toolkit the most popular current search extensions for Typo3 are:
- powersearch (Basic Extension)
- powersearchui (Frontend Plugin)
IB4Typo3: IB client/server solution for Typo3
User features:
- DB offloaded (client/server).
- Client side (Typo3 end) as a Typo3 extension which talks to a number of distributed (federated) servers for high availability.
- Uses powerful and highly performant IB search engine.
- Allows for the display in results of breadcrumbs etc.
The IB solution is client/server and offloads fragments of DB content from behind the CMS into the IB engine. Mirroring the searchable DB content (whatever in the DB one wants to search, allowing for selective exclusion of information) into a search engine improves not just query capacity and user experience. It also:
- Avoids expensive high-end server hardware upgrades providing over the course of typical project development significant cost savings.
- Scales very well: Backend search servers can be load distributed over a cluster.
- Helps avoid costs for planning, developing and implementing improved information retrieval performance on current RDBMS architectures.
- Avoids expensive re-indexing and rebuilding of data tables.
- Improves RDBMS performance: no need to have indexed fields (inserts into indexed fields on an RDBMS are generally very expensive since the index needs to be constantly rebuilt).
- Offers more flexible and powerful search.
- Enables the creation of user groups allowing different groups to have different views to both search and information presentation.
- Allows the integration of page attachments (and other external information objects such as PDFs, Word Files, Audio/Video) into search.
- Provides a higher level of information security:
  - Removes the need for SQL queries to directly access data in search.
  - A clear functional and even physical separation between search and storage.
  - Full control over information provided for search: Nothing is there other than what is intended for others to see.
  - The protocol itself is so designed that there is little one can accomplish with a hijacked search server.
  - The identity of the search server is private to the outside world, so only insiders or those who have already broken the integrity of the platform could even know where it is.
  - The search server does not need to be accessible to the outside world (not even through a firewall).
  - The text information in the protocol itself is parsed by the client (Web server) and converted into a form for presentation. There is no means to embed foreign code.
- Other "better" (but more complicated) protocols such as SRW/U, on which we are active developers in the ZIG, are of course available, since they plug right into the design of the IB engine, which was developed for SGML/XML and ISO 23950 interoperability.
- The IB engine also offers a more generic OO design that allows one to connect different services into search to offer a kind of search Swiss Army knife (jack of all trades). Search can even transparently access other DBs and networked distributed objects.
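
The client side is described above as talking to a number of distributed (federated) servers for high availability. One simple way a client could use such a pool is ordered failover, sketched below. This is an assumption about how that might look, not the IB protocol: `federated_search` and the `send` callable are hypothetical names, and the real transport is deliberately left out.

```python
def federated_search(servers, query, send):
    """Try each server in turn and return the first successful result.

    servers: ordered list of server addresses (hypothetical)
    send:    callable (server, query) -> results; raises ConnectionError on failure
    """
    errors = []
    for server in servers:
        try:
            return send(server, query)
        except ConnectionError as exc:
            # Remember the failure and fall through to the next server.
            errors.append((server, exc))
    raise ConnectionError(f"all {len(servers)} servers failed: {errors}")
```

Keeping the transport behind a callable keeps the failover logic independent of the wire protocol, so the same loop would work whether the client speaks a simple private protocol or something like SRW/U.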
Nano HTTP[D] License
Submitted by Edward C. Zimmermann on Tue, 2009-03-17 08:53
Copyright 1994-2009 NONMONOTONIC LAB of Basis Systeme Netzwerk, Muenchen.
Zimmermann und Poellmann GbR. All rights reserved.
http://www.nonmonotonic.net
http://www.bsn.com http://www.bsn.de
The files in this directory (/opt/BSN/nano_http) are covered by an open source license:
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
Enhancements and Modifications shall be defined as follows:
- Changes to the source code, support files or documentation.
- Documentation directly related to Licensee's distribution of the software.
