Monday, June 27, 2011

The Little (Search) Engine that Could

I'm guilty, and maybe many of us are, of taking search engines for granted. We expect that when we type in a search phrase, we will get back what we want. And we can certainly recognize bad search: the notorious sites that never yield anything close to what we're looking for, or those that vomit back far too much to be useful. To be sure, strong semantic metadata, including a rich taxonomy, helps yield the right results. But when sites work really well, we rarely consider that there is a marvelous little engine pulling all the kibbles and bits together to make our user experience engaging.

When a company like Microsoft buys up a search engine, such as Fast Search and Transfer (FAST ESP), the install base, or at least the guys and gals in charge of it, gulps. They have to wait for the inevitable day when the large company picks a favorite operating system and says: sorry, we aren't going to support the rest, but don't worry, we'll give you a year to sort it out.

Here's the problem: ripping out and replacing an entire search system is not a one-year effort. Between the day a company begins looking for a new search engine and the day contracts are signed, SOWs are signed off on, and the plumbing, permissions, logistics, business rules, and UX are worked out, well over a year, if not two, has passed.

When my boss, Mike Makeley, asked me to do an educational webinar on how MarkLogic Server can easily swap out the FAST ESP search engine while keeping all that plumbing in place, I was only lukewarm on the idea. Our educational webinars are usually geared to executives and offer strategies on "the art of what is possible" when it comes to online interaction. Talking about swapping out some boring old search engine sounded like changing the motor on the sump pump: necessary, but hardly glamorous. I mean, how much more could we say after showing that the MarkLogic Server replacement program leaves all the FAST "plumbing" in place and just replaces the FAST ESP engine? It's a very quick and relatively painless strategy that gives all the lift of MarkLogic and none of the agita of a total replacement. What I underestimated was how big that lift would be.

I learned from my two webcast guests (Seth Shearer, MarkLogic's director of technical development, and Jagannath Saha, lead consultant for Avalon Consulting LLC's search practice) just how much enterprise search affects every audience-engagement initiative a company offers. It makes perfect sense, but I had never really stopped to think about it.

Gone are the days of the "10 blue links" results page, said Saha. Today's search engine needs to be able to conform to any data representation: density maps, bar charts, tag clouds, virtually any way you might want to present data visually. Here are some other things a search engine must do well:

Typeahead
The search box needs to support "typeahead," sometimes called "autocomplete," which presents readers with suggestions in real time as they type and is fairly difficult to execute well. To be useful, explained Saha, these suggestions must appear within milliseconds, no small challenge if the database is large. In a series of benchmarks by Avalon, MarkLogic was the speediest of the leading search engines at scouring vast amounts of data and suggesting typeaheads.

Further, said Shearer, the typeahead must suggest only words the reader is eligible to see. "You don't want people selecting suggestions to documents that they aren't allowed to see," he said.
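Both requirements, speed and permission-aware suggestions, can be sketched in a few lines of Python. This is a minimal illustration assuming a simple in-memory term list; it is not how MarkLogic or the engines in Avalon's benchmarks implement typeahead. A binary search over sorted terms finds prefix matches quickly, and each candidate term is kept only if the reader may see at least one document containing it. The `Typeahead` class, its `add`/`suggest` methods, and the `allowed_docs` parameter are hypothetical names.

```python
import bisect


class Typeahead:
    """Hypothetical in-memory typeahead: prefix lookup over sorted terms,
    filtered by the documents the reader is allowed to see."""

    def __init__(self):
        self.postings = {}      # term -> set of document IDs containing it
        self.sorted_terms = []  # all known terms, kept in sorted order

    def add(self, term, doc_id):
        term = term.lower()
        if term not in self.postings:
            self.postings[term] = set()
            bisect.insort(self.sorted_terms, term)
        self.postings[term].add(doc_id)

    def suggest(self, prefix, allowed_docs, limit=5):
        prefix = prefix.lower()
        # Binary search: index of the first term >= prefix.
        i = bisect.bisect_left(self.sorted_terms, prefix)
        suggestions = []
        while i < len(self.sorted_terms) and len(suggestions) < limit:
            term = self.sorted_terms[i]
            if not term.startswith(prefix):
                break  # terms are sorted, so no later term can match
            # Security filter: suggest a term only if the reader can see
            # at least one document that contains it.
            if self.postings[term] & allowed_docs:
                suggestions.append(term)
            i += 1
        return suggestions


ta = Typeahead()
ta.add("marklogic", "doc1")
ta.add("market", "doc2")
ta.add("marketing", "doc3")
# This reader may see doc1 and doc3, but not doc2.
print(ta.suggest("mark", allowed_docs={"doc1", "doc3"}))  # ['marketing', 'marklogic']
```

Keeping the terms sorted makes each lookup logarithmic in the size of the vocabulary. A production engine would also need to avoid walking past many terms the reader cannot see, for example by indexing permissions alongside the terms themselves.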

Consume all types of data
If your search engine is pickier than a two-year-old, it's time to ditch it. It needs to be able to handle all types of data, from structured to unstructured, text files to binary. No excuses, no temper tantrums.

Not in-a-minute, NOW!
Search engines typically rely on relational databases, which rely on an index to speed up queries. The challenge is that there is often data sitting in a queue waiting to be re-indexed. Depending on how critical the business is, and on the volume of the data, that reindex may be done once a day or even once a week. An XML database indexes in real time, meaning all data is available the second it is ingested by the search engine.
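To make the difference concrete, here is a toy Python sketch contrasting an index that queues documents for a later batch reindex with one that updates its inverted index during ingest. The class and method names are hypothetical, and this is not any vendor's indexing code; it only illustrates why queued documents are invisible to search until the next reindex runs.

```python
from collections import defaultdict


def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()


class BatchIndex:
    """Hypothetical index where new documents wait in a queue until reindex()."""

    def __init__(self):
        self.index = defaultdict(set)  # word -> set of document IDs
        self.queue = []                # ingested but not yet searchable

    def ingest(self, doc_id, text):
        self.queue.append((doc_id, text))

    def reindex(self):
        # In practice a job like this might run nightly or weekly.
        for doc_id, text in self.queue:
            for word in tokenize(text):
                self.index[word].add(doc_id)
        self.queue.clear()

    def search(self, word):
        return set(self.index.get(word.lower(), set()))


class RealtimeIndex(BatchIndex):
    """Hypothetical index that updates its postings during ingest."""

    def ingest(self, doc_id, text):
        for word in tokenize(text):
            self.index[word].add(doc_id)


batch, live = BatchIndex(), RealtimeIndex()
for idx in (batch, live):
    idx.ingest("d1", "Quarterly earnings report")

print(batch.search("earnings"))  # set() until the next reindex()
print(live.search("earnings"))   # {'d1'}
batch.reindex()
print(batch.search("earnings"))  # {'d1'}
```

The trade-off is where the indexing cost is paid: the batch version keeps ingest cheap but leaves a window in which new content cannot be found, while the on-ingest version does the work up front so nothing waits in a queue.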

Bi-directional
It's no longer enough for a search engine just to serve; it must log, learn, and analyze too. To increase engagement, any engine must be able to log user behavior: monitor which content is being consumed and the path a user took to get there, and be ready to deliver trending topics. It also must be able to determine where the user is. With the growing reliance on mobile search, the engine needs to be able to track the user's geospatial coordinates and match them to geographically relevant content. And mighty speedily at that.
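One simple way to match a user's coordinates to nearby content is the haversine formula, which gives the great-circle distance between two latitude/longitude points. The Python sketch below is an assumption about a basic approach, not how any particular engine does it: `nearby` and the `(title, lat, lon)` content tuples are hypothetical, and a real engine would use a geospatial index instead of scanning every item.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # min() guards against tiny floating-point overshoot above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearby(user_lat, user_lon, items, radius_km):
    """Return (distance_km, title) pairs within radius_km, nearest first."""
    hits = []
    for title, lat, lon in items:
        d = haversine_km(user_lat, user_lon, lat, lon)
        if d <= radius_km:
            hits.append((d, title))
    hits.sort()
    return hits


# Hypothetical content items tagged with coordinates.
content = [
    ("Event in Brooklyn", 40.6782, -73.9442),
    ("Event in Boston", 42.3601, -71.0589),
    ("Event near Times Square", 40.7590, -73.9845),
]
# A mobile user standing in Times Square, searching within 25 km.
for dist, title in nearby(40.7580, -73.9855, content, radius_km=25):
    print(f"{title}: {dist:.1f} km")
```

Sorting by distance gives the "nearest first" ordering a mobile results list usually wants; the Boston item is filtered out because it lies far outside the 25 km radius.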

The reality is that the search engine needs to evolve to satisfy the business needs of the entire enterprise. And search-based applications need a powerful entity to drive development. No longer are search engines dumbwaiters meekly serving up content; they are full-blown intelligence systems that need to be able to give as good as they get.
