Frequently Asked Questions (FAQ)

"A newspaper is not only a collective propagandist and a collective agitator, it is also a collective organizer." -- V. I. Lenin (written in Munich and published in "Iskra", No. 4, May 1901)

Q: What is this site?
An aspiring user-supported news portal with publishing, search, aggregation, discovery and commentary services. In a nutshell, it's an advanced "News Search Engine" with blogging and commentary functionality.

News search is "synchro-contextual". News is continuously synchronized (we don't just update and add new stories but also delete the old ones as soon as they are no longer "news") and drawn from a growing list of international sources including newspapers, magazines, weblogs and this site itself (as an independent community "newspaper" we also enable people to publish their stories as they happen). We bring together numerous sources syndicated via various flavours of RSS (including RDF), IETF Atom and CAP (Common Alerting Protocol). We even "scrape" (extract content from Web pages) and "auto-tag" (convert into canonical RSS) a few screens of significant or relevant news sources that don't provide a standard (or even non-standard) syndication feed.

As long as a story is syndicated (currently published on the Internet) we pull it in. As soon as it has been dropped from the source's feed, we drop it too. News is volatile and about breaking developments. Old stories are historical documentation but not news. We track and continuously synchronize our index to reflect the stories as they are being reported.

Stories live on when people comment on them. By being the object of discussion, these stories become persistent objects of content that temporally transcend the original news article. They have their own destiny. The content here is the discussion and not really the original story. This is what goes into our archive.

Q: What's so advanced about the search in this news search engine?
The search engine and its functionality. The engine supports structural search, wildcards, phonetic search (but only when the user wants it), objects (such as dates, numerical values etc. and operations upon them), sophisticated user interfaces (from a smart natural language search to feature-rich boolean query languages) and a range of different information search paradigms, from searching for content with specific words to searching for words ("Live Scan") and relevant feedback snippet search via "Clickless Search" interfaces.

Q: Is the news really synchronised?
More properly, the system is plesiochronous (plesio = "near", chronos = "time"): almost synchronous. Newspapers don't continuously update their pages but publish in discrete actions. The alignment (synchronization) with news feeds is asynchronous and periodic (triggered by a self-optimizing "cron" process). Not only is it impractical to try to catch these updates as they happen, it is also asocial to barrage servers with "If-Modified-Since" (HTTP) requests. We try to find an optimal alignment cycle that minimizes the subjective latency (delay) for stories entering or leaving the main index corpus. The "cost function" of the spider/harvester is adaptive and designed with a "learning" heuristic towards the ideal of synchronized news.

Q: Why are there accounts? Must I register?
Registration is optional. Unregistered (anonymous) users are allowed to read, search and browse. Unregistered users can publish comments and register, but are put to an image recognition test to distinguish them from automated spam bots. Their messages may also go to a moderation queue to be approved before publication. Registered users are allowed to create content, write unmoderated comments and register new feeds. Due to spamming we've unfortunately had to restrict some features and/or add CAPTCHA ("Completely Automated Public Turing test to tell Computers and Humans Apart") and other (annoying) features.
Data privacy is held to the strictest standards, and personal data is only used to control access to this site. See our privacy policy.

Q: What is the current status of IBU News?
It's a construction site that is open for the public to come in and play. Just watch out for the holes in the stairwells, hanging wires and missing fixtures. It's watch as we go. Suggestions, ideas or comments? Please! See our Request for Comments.

Q: How is it best viewed?
Best, I guess, with Firefox(tm), but it should work with most other current browsers such as IE, Opera, Safari, Konqueror and Mozilla. We're Unix (Solaris, BSD, Linux) people and platform independence is high on our development priorities. We also try to support PDAs and smart cell phones. This site offers user-selectable interfaces (themes), and one is even for small-screen mobile phones. We use CSS, AJAX (available in Firefox, IE6, Opera8), JavaScript and cookies, and provide Firefox-style live links. They are not needed but, where available, provide some nice added features such as our novel "Live Scan" and "Clickless Search" interfaces. Access to search facilities is also available via numerous other protocols.

Q: How do I add a new source (feed)?
Under "News Search" is a "Feed Registration" sub-menu item. It's, I hope, a straightforward form. It supports most flavours of RSS/CAP/Atom as created by most News, BLOG and CMS packages.

Q: Where do I send feedback?
HERE! This site is also a Weblog (BLOG). One can add comments to ANYTHING.

Q: What is "Live Scan"?
"Live Scan" is a search interface that provides a live, real-time list of suggestions for word completion. The list is derived from all terms in the index-- and NOT some pre-cooked list or dictionary. It allows one to search without really knowing the precise spelling or even the terms. It even supports wildcards and is not limited to words in the whole index: the list can be restricted to any field or any part of the document tree, and may even be the result of a query.

Q: What is "Clickless Search"?
"Clickless Search" is a real-time search-as-you-type interface. As you type, for instance, "pol", it shows all the hits for terms that start with the letters "pol"; then, as you type "poli", all the hits for terms that start with "poli", etc. As with "Live Scan", it's all live, real-time and NOT based (as some of the other attempts we've since seen) on some pre-cooked dictionary or set of term completion guesses.

Q: Access via other protocols? Not just Web?
We support, among other initiatives (such as OpenSearch/A9), the Search/Retrieve Web Service (SRW) and Search/Retrieve via URL (SRU) protocols. SRW and SRU enable searching of databases using web standard protocols. SRU is an HTTP-based protocol. Queries are in CQL (Common Query Language). SRW is a variation of SRU where messages are conveyed from client to server using XML over HTTP via W3C SOAP instead of by a URL. SRU/W are built upon Z39.50 semantics and so provide a low-overhead standard for integrating with, and providing access to, existing Z39.50 implementations such as those provided by the world's major libraries and industry databases. As with Z39.50 (ISO 23950), we have taken an active role in the initiative and participate in the working groups.

Q: Do you have a release roadmap?
Right now it's (extremely) useful (I hope) and highly functional, but under development and sometimes (temporarily) unstable. We are just a few very bright developers with more ideas than time and money.

Q: What does user sponsored community mean?
We are completely independent and dedicated to the free and uncensored flow of information. Our aim, for now, is to be commercial-free and 100% user sponsored. We believe that the "Internet should be a force for political freedom, not repression. People have the right to seek and receive information and to express their peaceful beliefs online without fear or interference." Companies like Google, Yahoo, News Corp.
(Daily Telegraph, Sun, Fox, MySpace and many, many other media assets) and Microsoft are multi-billion dollar multinational corporations. They have unfortunately openly colluded with repressive measures of totalitarian governments in exchange for twopence, a lentil soup and a dream of "pie in the sky". Their willful complicity has gone beyond censorship to providing intelligence support to repressive "state security" and "secret police" organisations. The "Internet" dream was and is about free information, but de facto these players have become bedfellows of forces that seek to control the flow of information, to squash any potential forms of democratization and to turn the Internet into a modern-day "Volksempfänger".

"Television brought the brutality of war into the comfort of the living room. Vietnam was lost in the living rooms of America, not on the battlefields of Vietnam." -- Marshall McLuhan (1975)

Q: Who chooses the top stories?
Software. It's all automatic!

Q: Who runs this site?
We do: Basis Systeme netzwerk (BSn). It's our idea. We are hosting it. We are maintaining it. We are developing it. It's even built around full-text (proximal node) search and retrieval technology developed in our own research and development lab (NONMONOTONIC Labs) in Munich. Our servers too are currently located in Munich, but we already have a node set up in Amsterdam.

Q: And who is behind BSn?
We are. Two Internet pioneers: Norbert Poellmann and Edward Zimmermann. Ed Zimmermann is a trained mathematician and research economist.
He goes back, in his early teens, to the dawn of the ARPAnet where, as enfant terrible, he was tutored by Leonard Kleinrock (the father of packet switching), explored artificial intelligence and brain research, fiddled with designing a distributed network language (the network as a behemoth virtual computer inspired by the ILLIAC IV) and defined the early semantics for the term "computer hacker" and the historical basis for some contemporary cyber/nethacking folklore. He was there when the first Unix machine was delivered to UCLA NMS. He was also part of a team of other adolescents that developed one of the very first personal micro-computers (1972). Before he toyed with ARPA he was haunting the U.S. telephone networks as a legendary "phone phreak", and before that he was drinking milk from a bottle and learning how to walk...

Norbert Poellmann is a trained and certified psychologist. One man's insanity is another man's vision. Or is it the other way around?

BSn started in 1992 to provide and define technology for a vision of nomadic information appliances. From providing Internet services BEFORE even the U.S. national backbones were privatized from the NSF (U.S. National Science Foundation) and released from the AUPs (Acceptable Use Policies, which prohibited commercial use) to running one of the first dozen web servers in the world, early developer of pen-based systems, author of the first MAIL to HTML program (see the W3C archive), one of the earliest developers of streaming MPEG-3 (when it was still a developing codec at the Fraunhofer Institute) to being one of the first to deploy XML (before it was even called XML)... co-developer of wxPython... and and and... developing Z39.50/ISO 23950... search technology... The team has, to put it mildly, "been there, done that"! This service continues the vision.

Q: What does discovery mean?
How do we find what's there when we don't even know it's there? That's what discovery is about.
Using features such as "Scan" (available via the search options) one can discover new terms being used in the news to, in turn, find new stories. "Hisbollywood" is a recent example that comes to mind. There is a lot more. News and BLOGs are exported in a data exchange format called RSS. It's hierarchical with meta-data (albeit practice seems to differ much from theory), so we can exploit this to provide quite powerful search possibilities.

Q: What is this site based on?
For the user interface and BLOG components we use Drupal with PostgreSQL. As operating system we use FreeBSD running on low-cost hardware based on AMD chips.

Q: Powerful search? What do you use?
We use our IB search engine developed by our own NONMONOTONIC lab (that's me!). It's a fast, powerful XML-savvy full-text search engine we have been developing for many years. Its history goes back to our co-development of the open source search package Isite/Isearch in the 1990s. The design is built around Z39.50 and SGML/XML from the bottom up. IB is not about speed--- although our customers tell us that it's faster than much of the "competition"--- but about search power and flexibility. It is not trying (as the popular Internet search engines do) to find anything about something, but to find specifics. It's the difference between cocktail party chat and running experiments in the laboratory. Each has its place.

We index each and every word. The entire document is vectorized. Every element, including its path in its tree, is stored. This lets us do things like proximity search, search for terms in the same node (even without knowing which field) and real highlighting. Retrieval is also context specific. The traditional unit of information retrieval is the document. Given the structure of the input, however, we can implement a more focused (and flexible) retrieval paradigm that respects that structure with respect to the underlying search query.
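To make the idea of storing every word together with its element path a little more concrete, here is a minimal sketch. It is not how the IB engine is implemented; the feed, the path notation and the function names (`index_feed`, `ancestors`, `same_node`) are all assumptions made up for illustration. It indexes each word under the path of the element that holds it and then looks for the deepest node that contains all query terms, without the caller naming any field.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# Hypothetical two-story feed, used only for illustration.
SAMPLE_FEED = """<rss><channel>
<item><title>Election results</title><description>Votes counted in Munich</description></item>
<item><title>Munich weather</title><description>Rain expected</description></item>
</channel></rss>"""


def index_feed(xml_text):
    """Map each lowercased word to the set of element paths whose text holds it."""
    index = defaultdict(set)

    def walk(elem, path):
        for raw in (elem.text or "").split():
            word = raw.strip(".,;:!?\"'()").lower()
            if word:
                index[word].add(path)
        seen = defaultdict(int)
        for child in elem:
            seen[child.tag] += 1  # number siblings so each <item> gets its own path
            walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    root = ET.fromstring(xml_text)
    walk(root, f"/{root.tag}")
    return index


def ancestors(path):
    """Yield the path itself and then each ancestor path up to the root."""
    parts = path.split("/")[1:]
    for i in range(len(parts), 0, -1):
        yield "/" + "/".join(parts[:i])


def same_node(index, *terms):
    """Return the deepest element paths that contain every term, in any field."""
    covering = None
    for term in terms:
        nodes = {a for p in index.get(term.lower(), ()) for a in ancestors(p)}
        covering = nodes if covering is None else covering & nodes
    covering = covering or set()
    # Drop a node when one of its descendants already contains all the terms.
    return sorted(n for n in covering
                  if not any(m.startswith(n + "/") for m in covering))


index = index_feed(SAMPLE_FEED)
print(same_node(index, "election", "munich"))  # ['/rss/channel[1]/item[1]']
```

The query for "election" and "munich" comes back with the one item that holds both terms, even though they sit in different fields, rather than with the feed as a whole.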
In other words: instead of retrieving whole documents (in this context, feeds) we set out to return "appropriate components" (the unit of retrieval is defined at query time). For more technical detail see: IB Search Engine Design.

Q: Why do you show multiple articles about the same subject from different sources?
That's part of the concept. We are not interested in just reading one view, one approach, one angle on a story but want to see the larger picture.

Q: What is "Newsrank"?
It's a ranking we're developing especially for things like news, blogs and alerts. It's based upon data quality (the news source) as perceived by experts and users (importance, trust and popularity); the metric distance of terms from one another; the vector-space measure (word frequency); the story's position within the feed; and its position within the temporal continuum. Right now (before we reach a critical mass of what we consider significantly diverse users) some of these metrics are not active and are set to defaults. It's a learning heuristic and so should improve with use!

Q: How do you use popularity?
The Newsrank model includes "popularity" as an issue of reception but not of immediate relevance. In the editorial "Personalized news, nothing personal" I wrote: "Popularity is not and should never be a guide to the news. Being popular has no relevance to data quality, trust or even importance. It just says that according to some model it's "popular"." Popularity cannot even be directly taken into consideration as it's easily falsified: witness the significant, near-criminal energy found in the day-to-day manipulation of the front-line "community sites" (such as MySpace, Facebook, Friendster, YouTube, Twitter, etc.). Popularity (the distribution and rate of people clicking on a story) tells us much about the media and less about the subject. Less than 1% of their users seem to control most of the front page news.
Should a tiny minority driven by surplus (leisure) time or even (criminal) energy be allowed to control ranking and define "significance"?

Q: What about link counts?
Citations and links are, I think, not a terribly good measure of data quality, trust or even importance. It's like holding a microphone to its loudspeaker: it squeals. People link to top-ranked sites. This has demonstrably led, in the Web arena, to an over-representation of some marginal positions, as computer programs can't distinguish between positive links ("this site is good") and negative ones ("look at the lies"). Link counts are popular because they work well to help find anything about something and are computationally very cheap. They let one pre-sort results, which saves a lot of effort and processing power. They also let one use simpler search algorithms that are also very fast (b-tree). Their social costs, however, are high. Link counts not only misrepresent relevance but also appear to have a significant impact on content and available information. Since most people are directed to the few highly linked and known sites, the rest remain invisible and unsustainable. The impact of the reliance on links in popular search ranking has been to limit discussion and concentrate media control in the hands of a few. The information is maybe still there, but it's hidden, since those voices are drowned out by the squawk of the popular, highly linked sites. Our ambition, by contrast, is to focus on all the news as an instrument of comparative analysis and discovery. If I'm just interested in the top popular stories I don't need to use the Internet but could limit myself to a single popular boulevard newspaper.

"Computers don't concern themselves with trifles."
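To round off the ranking answers above, here is a rough sketch of how the Newsrank ingredients (source quality, term distance, word frequency, position in the feed, position in time) might be blended into one score, with no link counts or click popularity involved. It is only an illustration under loud assumptions: the weights, the 12-hour half-life, the 0.5 defaults for inactive metrics and the names `StorySignals` and `rank_score` are all invented here and are not the actual Newsrank formula.

```python
from dataclasses import dataclass


@dataclass
class StorySignals:
    # Each quality-style signal is assumed to be normalized to the range 0..1.
    source_quality: float = 0.5  # assumed default while the metric is inactive
    term_proximity: float = 0.5  # 1.0 when the query terms sit next to each other
    term_frequency: float = 0.5  # normalized vector-space (word frequency) weight
    feed_position: int = 0       # 0 = first entry in the source's feed
    age_hours: float = 0.0       # time since the story entered the feed


# Hypothetical weights; a learning heuristic would adjust these over time.
WEIGHTS = {
    "source_quality": 0.3,
    "term_proximity": 0.2,
    "term_frequency": 0.2,
    "feed_position": 0.1,
    "recency": 0.2,
}


def rank_score(s, weights=WEIGHTS, half_life_hours=12.0):
    """Return a weighted average of the signals, between 0 and 1."""
    parts = {
        "source_quality": s.source_quality,
        "term_proximity": s.term_proximity,
        "term_frequency": s.term_frequency,
        # Earlier entries in a feed count for more.
        "feed_position": 1.0 / (1 + s.feed_position),
        # The recency part halves every half_life_hours.
        "recency": 0.5 ** (s.age_hours / half_life_hours),
    }
    total = sum(weights.values())
    return sum(weights[name] * parts[name] for name in weights) / total


stories = {
    "fresh report": StorySignals(source_quality=0.8, age_hours=1),
    "two days old": StorySignals(source_quality=0.8, age_hours=48),
}
ranked = sorted(stories, key=lambda name: rank_score(stories[name]), reverse=True)
print(ranked)  # ['fresh report', 'two days old']
```

With everything else equal, the fresher story ranks first, which fits the idea that old stories are documentation rather than news.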
Q: Do you do semantic extraction or apply Bayesian search theory in search ranking?
These approaches work great in demonstrations and might be quite good in homogeneous intranet applications, but among heterogeneous applications they tend to deliver more noise than content.
We have time and again found that variations on the vector-space data model seem to be the most universal. By adding some novel features such as "realtime scan"--- even on the level of container--- and the controlled use of thesauri we feel we have almost the best of the breed.

Q: So you don't use semantic extraction or apply Bayesian statistical approaches?
Actually we do. Just not in search ranking or query processing. It's applied in categorization and routing. It helps us distinguish between "sports stories" and reports from the battle front or news about a new technical gadget.

Q: Why is such a site needed?
Because reading a "top story" in a popular boulevard newspaper is not enough to be informed.

Q: Are there not other sites offering some of these services?
Compared to the mainline news search services, I think, we:
---
"The [...] channel we mean, my dear ladies and gentlemen, carries filth and sewage." ... "And to it we shall devote ourselves, from today on at this hour, as a sewage treatment plant, so to speak." -- "Der schwarze Kanal", 1960-1989, DDR-Fernsehen, Karl-Eduard von Schnitzler.