Ever looked back at something you've worked on and thought: "Gee, it's too bad that project didn't get to its ultimate goal, but I've learned a lot from it"? I have one of those projects. Such is the world of technology: your toolset is constantly evolving and shape-shifting, and even "The Next Big Thing" can become obsolete. We move on to the next "Next Big Thing."
One such area in the Web Developer's toolset is "Search." I'm sure we can all relate to the experience of our first textbox with some behind-the-scenes code doing SQL "SELECT ... LIKE" statements. Perhaps at first it was raw DBI calls; maybe moving on to an abstraction layer (ORM and whatnot) shortly thereafter. Here's where things get interesting.
What happens when this is no longer Good Enough(tm)? With Google being essentially ubiquitous, people expect to plunk some words in the box and magically get what they want out the other side. I put in "Cat Hat"; why didn't it give me "Cat in the Hat"? Okay, no problem. We can do some field and query normalization: remove stopwords, add term parsing ... wait, wait, wait. There has to be prior art for this.
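For the curious, here is roughly what that first do-it-yourself normalization pass might look like. This is only a minimal sketch in Python, not the code we actually ran: the stopword list and the `normalize` and `matches` helpers are made up for illustration, and a real search engine does far more (stemming, ranking, phrase handling).

```python
import re

# Hypothetical, tiny stopword list for illustration only;
# real engines ship much longer, language-specific lists.
STOPWORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}


def normalize(text):
    """Lowercase the text, split it into alphanumeric tokens, drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def matches(query, field):
    """Return True when every normalized query term appears in the field."""
    query_terms = normalize(query)
    if not query_terms:
        # A query made only of stopwords should not match everything.
        return False
    field_terms = set(normalize(field))
    return all(term in field_terms for term in query_terms)


titles = ["The Cat in the Hat", "Green Eggs and Ham", "Hop on Pop"]
hits = [title for title in titles if matches("Cat Hat", title)]
print(hits)  # ['The Cat in the Hat']
```

Because the query and the stored field go through the same normalization, "Cat Hat" and "The Cat in the Hat" both reduce to the terms `cat` and `hat`, which is exactly the match that a plain `LIKE '%Cat Hat%'` misses.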
In 2004, the options were somewhat limited as far as Free/Open Source search software went, especially in Perl land. Swish-e looked pretty neat. We actually did some prototyping with it, and it was definitely a step ahead of plain old SQL. Then Plucene came on the scene. Unfortunately, its poor performance was a bit of a non-starter for us. The fact that it was modeled after the Lucene Java library, however, caught my eye.
I wanted to harness the Lucene project and its community, and bring it into our little Perl world. Luckily for me, someone else had already started down that road. The Lucene Web Service was a project by Robert Kaye, sponsored by CD Baby, which allowed users to talk to Lucene via an XML-based web service. After using it for a while, we developed some patches for bug fixes and enhancements. Because of our momentum with the project, we were eventually given total control over its development.
We attempted to strengthen the project by hooking into some existing standards. We leveraged the Atom Publishing Protocol as an analogy for dealing with indexes and documents. Search results were returned as an OpenSearch document. A document's field-value pairs were listed in the XOXO microformat. Creating a client for this setup meant a bunch of glue between the existing components (XML::Atom::Client and WWW::OpenSearch).
Almost in parallel, the Solr project emerged. Similar idea, much more support behind it. In the end, our idea never got very far, and Solr has turned out to be a fabulous product, which we now use.
To this end, the Lucene WebService website will (finally) be shutting down in about a week's time. I've moved the pertinent code and wiki data to GitHub in case anyone wants it. I still think it has some niche applications, but without some serious revamping of the Java code, it will likely just rot.
At least it's a project that has led me to bigger and better things.