Monday, August 2, 2010

Mixing, Mashing XML into Content Derivatives

After much deliberation, I have changed jobs, moving from Nstein (now OpenText) to MarkLogic, a provider of “purpose-built databases for unstructured content.” That means we handle all the data that doesn’t fit nicely into rows and columns -- you know, content such as documents, articles, books and graphics -- which is often (and best) represented in XML. Over the years I have written about the importance of semantic metadata, but it is only one of many types of metadata that can be appended to content, as long as the all-important infrastructure is in place. Being absorbed in knowing what content you created was so last year; what matters now is complementing that content with information from other resources.

What does this mean to information providers, publishers and other types of media (and if you read my blogs, you know that I believe all of us are publishers!)? Well, consider the iPad. Selling at a head-shaking rate of one every three seconds (despite the recession), more than 13 million will be in consumers' hands by Christmas. Add to that the hordes of other mobile devices: iPhones, BlackBerrys, Androids, eReaders ... and you have 12 percent of the market looking for content for their gizmos. That in itself can be a challenge, since content doesn't just magically play nicely on every device. Most of that content will need to flow into its own native application to really exploit each device's features, which means the content first needs to be available in a neutral format.

And while you are exploiting the gizmos' features, remember that if gizmos are everywhere their owners are, there will be an increased desire for what the military calls situational awareness: deriving additional, contextual information that relates to the user, usually time (temporal data) and space (geo mapping). Think of a soldier needing to know what threats are in the area where he currently is, or where he is going. Publishers, too, should think in terms of creating new, derivative, situational content. For example, what types of information might a business person want while at Maple and Elm at 8 a.m. in the summer, versus at 8 p.m. in the winter? Or what location-based weather patterns does a commodities trader want to see when looking at crop futures?

This need for situational awareness gives publishers a great opportunity to take their knowledge bases and mix them with external resources, such as public information from NOAA, Google Maps or LinkedIn, or proprietary information from partners. The key to mixing and mashing is having content in a mutable format, and a database that can handle it. Extensible Markup Language, or XML, is a highly flexible text format and a W3C standard, sometimes described as an atomic or neutral format. It is designed to be easily stored and retrieved, free of any display formatting, which means it "pours" nicely into any layout. In my mind, then, MarkLogic's database is akin to a gourmet mixing bowl: it takes in XML and lets it be stored and retrieved by any application.
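To make the "pours into any layout" idea a bit more concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The article markup, its element names (title, byline, para) and the two render functions are hypothetical, made up for illustration; this is not MarkLogic's API. The point is simply that one neutral XML source can feed two very different layouts without touching the content itself.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical article: structure and meaning only, no display formatting.
ARTICLE_XML = """
<article>
  <title>Crop Futures Outlook</title>
  <byline>Staff Writer</byline>
  <para>Corn futures rose on forecasts of a dry August.</para>
  <para>Traders are watching Midwest weather closely.</para>
</article>
"""

def to_html(root):
    """Pour the article into a desktop web layout (all paragraphs, byline)."""
    parts = [f"<h1>{escape(root.findtext('title'))}</h1>",
             f"<p class=\"byline\">{escape(root.findtext('byline'))}</p>"]
    parts += [f"<p>{escape(p.text)}</p>" for p in root.findall("para")]
    return "\n".join(parts)

def to_mobile_text(root, max_paras=1):
    """Pour the same article into a compact plain-text layout for a small screen."""
    lines = [root.findtext("title").upper()]
    lines += [p.text for p in root.findall("para")[:max_paras]]
    return "\n".join(lines)

root = ET.fromstring(ARTICLE_XML.strip())
print(to_html(root))
print(to_mobile_text(root))
```

Neither renderer changes the XML; each one just decides what to show and how, which is what lets the same content reach a browser, a phone or an eReader.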

XML is hardly new; it was designed for large-scale publishing, although awareness of it has grown thanks to the Web and blog feeds. It is a great way to describe unstructured data. Unstructured data can reside in regular relational databases (RDBMS), but those tend to get bogged down. By storing this unstructured data in a database built specifically to handle these data types, you can search and retrieve it much more quickly. Forrester analyst Noel Yuhanna told me that, by his estimate, organizations that unburdened their RDBMS of unstructured data saw a 30% lift in database performance, which is huge.

In any event, the real advantage of having content in XML -- and residing in a database built to handle it -- is that you can easily mix and mash it into new types of content, ready it for new delivery platforms, or ease syndication. And you can do all of this in a matter of weeks, not months, which is terrific since we don't yet know what other new gizmos might be in readers' hands, least of all by Christmas.
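Here is a rough sketch of what "mixing and mashing" might look like, again in Python with xml.etree.ElementTree. Both inputs are hypothetical: the article markup is invented, and the weather feed is an illustrative format, not a real NOAA schema. The sketch joins the two sources on a shared region code and emits a brand-new derivative XML document, the kind of situational briefing a commodities trader might want.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-house content: articles tagged with a region code.
ARTICLES = """<articles>
  <article region="IA"><title>Corn futures climb</title></article>
  <article region="KS"><title>Wheat harvest ahead of schedule</title></article>
</articles>"""

# Hypothetical external feed (illustrative format, not a real NOAA schema).
WEATHER = """<observations>
  <obs region="IA" tempF="91" condition="dry"/>
  <obs region="KS" tempF="84" condition="rain"/>
</observations>"""

def mash(articles_xml, weather_xml):
    """Build a derivative document: each article enriched with local weather."""
    # Index the external observations by region so each lookup is a dict hit.
    weather = {o.get("region"): o for o in ET.fromstring(weather_xml).iter("obs")}
    out = ET.Element("briefing")
    for art in ET.fromstring(articles_xml).iter("article"):
        region = art.get("region")
        item = ET.SubElement(out, "item", region=region)
        ET.SubElement(item, "headline").text = art.findtext("title")
        obs = weather.get(region)
        if obs is not None:  # an article may have no matching observation
            # Copy only the fields this derivative product needs.
            ET.SubElement(item, "weather", tempF=obs.get("tempF"),
                          condition=obs.get("condition"))
    return out

briefing = mash(ARTICLES, WEATHER)
print(ET.tostring(briefing, encoding="unicode"))
```

Because the result is itself XML, the derivative can be stored back, syndicated, or poured into a device-specific layout just like the original content.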
