Tuesday, June 14, 2011

An XML Primer

Wow. I just did a "Bing" on "XML" and found 88,300,000 results. The third facet on the results page (faceted search being the reason I prefer Bing) was "XML Definition." Nineteen million pages fell under that "related searches" facet. I zapped off a few other searches on popular tech terms and three-letter acronyms (RSS, IP address, namespace, API, RDF), and none of them had a facet called "Definition."

So what can be made of this? If you attend any digital media seminar, workshop or webinar, or sit in on any content strategy discussion, XML is de rigueur. But could it be that people are throwing out this TLA without really knowing whereof they speak? The answer is absolutely. And solution providers, technologists and product makers are guilty of not recognizing that the community is struggling to keep up.

Cathy Palmer and I took part in a web series put on by the IDEAlliance this morning on making the case for XML. IDEAlliance is a non-profit that develops standards and best practices surrounding publishing and technology -- it offers events virtually every week of the year, depending upon practice area. Cathy is a trainer from New Horizons, a nationwide IT training company. A couple of hours later I was listening to a webinar by Publishing Executive featuring two book publishing executives. Peppered liberally throughout both webcasts was our little friend XML. And then came the question, asked in a variety of ways: "But what if we don't have XML, what do we do?" Cathy did a super job explaining how you can extract XML from InDesign files, while I offered that another way is to use combinations of machine (semantic analysis engines) and man (offshore) to create XML.

But how do executives create a content strategy -- determining man, machine and markup -- if they don't have a rudimentary understanding of what this eXtensible Markup Language is all about? The definition is easy -- the why is more complex. XML is a method of markup, more than a decade old, that can be used to classify and add meaning to content so that it can be organized, "sliced and diced" and repurposed.

XML tags look similar to HTML (HyperText Markup Language) ones in that both use start and end tags, but that's about where the similarities end. HTML includes a set of predefined tags that control how information is rendered, e.g. the <b> tag (along with its closing tag, </b>) makes a word bold. XML uses the same syntax but has no predefined tags or display commands; instead, XML lets you define a structure of your own so you can effectively find information again.
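To make the contrast concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. The tag and attribute names (recipe, title, ingredient, quantity, unit) are made up for this illustration and do not come from any standard vocabulary; the point is only that the tags say what a piece of content is, not how it should look.

```python
import xml.etree.ElementTree as ET

# Hypothetical recipe markup: these tag names are invented for the example.
# Nothing here says "bold" or "italic"; each tag describes meaning.
RECIPE_XML = """
<recipe>
  <title>Pancakes</title>
  <ingredient quantity="2" unit="cups">flour</ingredient>
  <ingredient quantity="1" unit="cup">milk</ingredient>
</recipe>
"""


def list_ingredients(xml_text):
    """Return (name, quantity, unit) for every <ingredient> element."""
    root = ET.fromstring(xml_text.strip())
    return [
        (item.text, item.get("quantity"), item.get("unit"))
        for item in root.iter("ingredient")
    ]


ingredients = list_ingredients(RECIPE_XML)
for name, quantity, unit in ingredients:
    print(quantity, unit, name)
```

Because the ingredients are labeled as ingredients, a program can pull them out without guessing from the layout, which is not possible when the only markup is a display command such as bold.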

This format-agnostic markup language means you can categorize sections of content, find them again, and then transform them (using style sheets) to be ready for virtually any digital channel. XML allows bodies of content to be broken down into reusable components. For instance, maybe you would like to mark up statistics within a text, particularly if you know that you will be searching for that same type of statistic again. Or maybe you want to mark up quotations by luminaries, charts by researchers, lyrics to songs, ingredients to recipes. Having the ability to search, find and reassemble these components of content is the secret to repurposing.
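As a rough sketch of that find-and-reassemble idea, the Python below pulls every statistic out of a small article and rebuilds the same components for two channels: an HTML list and a plain-text line. The sample article, its tag names (article, para, statistic, quotation), the topic attribute and the speaker are all hypothetical. The two formatting functions are a plain-Python stand-in for what a style sheet would normally do; this is not XSLT.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical article: the tags and the figures are invented for the example.
ARTICLE_XML = """<article>
  <para>Sales rose <statistic topic="sales">12 percent</statistic> last year.</para>
  <para><quotation speaker="Jane Doe">Content is an asset.</quotation></para>
  <para>Readership grew <statistic topic="readership">8 percent</statistic>.</para>
</article>"""


def find_components(xml_text, tag):
    """Find every element with the given tag, wherever it sits in the document."""
    root = ET.fromstring(xml_text)
    return list(root.iter(tag))


def statistics_as_html(xml_text):
    """Reassemble the statistics as an HTML list (one possible web channel)."""
    items = find_components(xml_text, "statistic")
    rows = [
        "<li>{}: {}</li>".format(escape(s.get("topic")), escape(s.text))
        for s in items
    ]
    return "<ul>" + "".join(rows) + "</ul>"


def statistics_as_text(xml_text):
    """Reassemble the same statistics as one plain-text line."""
    items = find_components(xml_text, "statistic")
    return "; ".join("{}: {}".format(s.get("topic"), s.text) for s in items)


html_version = statistics_as_html(ARTICLE_XML)
text_version = statistics_as_text(ARTICLE_XML)
print(html_version)
print(text_version)
```

The source content is marked up once, and each channel gets its own output function. Adding a channel means adding a function, not re-editing the article.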

There's more to XML than that -- but those are the high-level basics. The key to managing is understanding what you know -- and don't know -- and filling in the gaps.
