Ad verba per numeros


Thursday, September 5, 2013, 02:10 AM
Today, the BBC published a piece titled "Jonathon Fletcher: forgotten father of the search engine". The article is interesting because it vindicates one pioneer of the now-essential field of Web-IR (Web Information Retrieval), but it has a number of "flaws" that I, as an academic, would like to elaborate on.

Certainly, Fletcher's work describing his search engine is very little known (4 citations [1], according to Google Scholar).

Maybe one reason for this is that his original paper vanished with his homepage at the University of Stirling. Fortunately, we have the Internet Archive so you can read it.

Another reason for Fletcher's work to remain mostly unnoticed is that his search engine (Jumpstation) only operated from December 1993 to April 1994. In other words, it was a proof of concept rather than a full-fledged system.

Finally, Fletcher was not the only one building search engines around 1993-1994. As I like to say, "It steam-engines when it's steam-engine-time": at WWW94 alone (aka The First International Conference on the World-Wide-Web) nine papers about Web indexing and Web-IR were presented. Needless to say, they did not call it Web-IR at the time.

Martijn Koster, Oliver A. McBryan, Pinkerton, and Mauldin & Leavitt were some of the researchers working on the same problem as Fletcher and producing virtually the same architecture: a robot crawls the Web to feed a database, which is in turn indexed so that users of the search engine can eventually query it.
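To make that architecture concrete, here is a minimal sketch in Python. It is my own toy illustration, not code from any of those papers: the "Web" is a hypothetical in-memory dictionary (TOY_WEB), and the names crawl, build_index and search are assumptions chosen for this example. A real robot would fetch pages over HTTP and parse their links.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory "Web" for illustration only:
# URL -> (page text, list of outgoing links).
TOY_WEB = {
    "http://a.example/": ("Welcome to the steam engine archive",
                          ["http://b.example/", "http://c.example/"]),
    "http://b.example/": ("Search engine history and pioneers",
                          ["http://a.example/"]),
    "http://c.example/": ("Steam locomotives of the 1950s", []),
    "http://d.example/": ("An unlinked page nobody finds", []),
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def crawl(seed, web):
    """Robot: follow links breadth-first from the seed, storing page text."""
    seen = {seed}
    queue = deque([seed])
    database = {}
    while queue:
        url = queue.popleft()
        if url not in web:  # dead link: nothing to fetch
            continue
        text, links = web[url]
        database[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return database


def build_index(database):
    """Indexer: map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in database.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Query: return the URLs containing every word of the query, sorted."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


database = crawl("http://a.example/", TOY_WEB)
index = build_index(database)
print(search(index, "steam engine"))
```

Starting from http://a.example/, the robot reaches three of the four pages; the page at http://d.example/ is never indexed because nothing links to it, which is exactly the limitation every link-following crawler shares. The query "steam engine" returns only http://a.example/, the one page containing both words.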

So, is Fletcher the parent of the search engine? Not really. Who is then? None of the aforementioned researchers or, better, all of them! That's why I much prefer talking about pioneers in a field than about "parents".

I highly encourage you to read all of their papers, especially if you are pursuing a PhD in CS or if you are still an undergraduate in CS. After reading those papers you'll feel that it's something you could do in a couple of days, maybe a week. The systems they describe are crude, simplistic and primitive. However, the architecture behind those toys has been powering the search engine industry for the last 20 years. That's what I call an influential idea, and that's why those papers are relevant and highly cited.

The BBC piece obviously talks about Google as a counterpoint to Fletcher's story. You know, you have this guy who invented something great but some other people get all the credit (and the money). However, the early search engine pioneers have other stories to tell. Indeed, some of the aforementioned researchers got some credit (and probably some money too). For instance, Pinkerton created WebCrawler, which was sold to America Online and then to Excite. And if you are old enough you must remember Lycos, right? Well, it was the product of Mauldin & Leavitt.

I would also like to talk about the impact of the work by Brin & Page (Google), the way in which they were virtually forced to found a company to monetize their idea, and the way in which the whole search engine industry is indebted to the IBM of the 1950s, but that will be another post.

As usual, you can find me on Twitter: @PFCdgayo.

[1] Although it does not appear among the citations to Fletcher's work, I mentioned him as a search engine pioneer in my PhD dissertation (2005).


