Wednesday, March 24, 2010

History of Indexing & what exactly is 'semantic metadata'?

The Internet is the biggest library in the world -- with all the books on the floor.
-- unknown
I'm sure most of us have never spent even a minute pondering the classification system that keeps libraries sane. But libraries weren't always pristine. According to editors at TheStraightDope.com, when Melville Dewey spent his junior year (1872-73) working in Amherst College's library, he was frustrated by the disarray of Amherst's collection. A real self-starter (apparently), Dewey examined how other libraries were organizing books. One method, which dated back to the Han Dynasty (206 BC to AD 9), was to assign each book a fixed spot on a shelf and log the location and title in a journal. Another was to alphabetize the entire collection -- which created a fair amount of juggling whenever new books came in! Other libraries (harkening back to the Hellenic era) chose to organize by subject. But what constituted a subject? Sir Francis Bacon, in the 1600s, said there were three branches of knowledge: history (deriving from memory), poetry (from imagination), and philosophy (from reason). The Vatican said there were a pithy two: sacred and profane.

Dewey channeled his frustrations and, within four years, took the best of all these methods and developed a standard of classification and indexing that still exists today. The system married "the analytical simplicity of decimal numbers to an intuitive scheme of knowledge, one that would fluidly accommodate all the books ever written, and all the books that could be written as well," said library historian Matthew Battles. Further, Dewey's system works for any asset -- not just books.

Alas, Web sites have not been indexed the way Dewey's system indexes libraries -- although that is starting to change. Like Dewey, companies are starting to create federated classification systems for all of their content, building taxonomies and authority files for centralized knowledge management. Another term for this type of information is semantic tags -- or, more accurately, semantic metadata. Most efforts to create semantic metadata have been manual, which, as Chris Hill says, "is a losing proposition" given the deluge of content now being acquired and developed. The typical approach of tagging "after the fact" is problematic as well, as it tends to be neither comprehensive nor compliant with the classification standards that have been created.
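To make the idea of an authority file a little more concrete, here is a minimal Python sketch. The terms, the taxonomy paths, and the `tag` function are hypothetical illustrations, not Nstein's product or any standard vocabulary, and real semantic tagging relies on linguistic analysis rather than the plain keyword matching used here.

```python
import re

# Hypothetical authority file: each preferred term lists the variant
# spellings and synonyms that should resolve to it.
AUTHORITY_FILE = {
    "Digital asset management": ["dam", "digital asset management"],
    "Metadata": ["metadata", "meta-data", "meta data"],
}

# Hypothetical taxonomy: preferred term -> path from the broadest category down.
TAXONOMY = {
    "Digital asset management": ("Information technology", "Content management", "Digital asset management"),
    "Metadata": ("Information science", "Metadata"),
}


def tag(text: str) -> list[tuple[str, ...]]:
    """Return the sorted taxonomy paths of every preferred term whose variants appear in text."""
    lowered = text.lower()
    tags = set()
    for preferred, variants in AUTHORITY_FILE.items():
        for variant in variants:
            # Whole-word match so "dam" does not fire inside "damage".
            if re.search(r"\b" + re.escape(variant) + r"\b", lowered):
                tags.add(TAXONOMY[preferred])
                break  # one hit is enough for this preferred term
    return sorted(tags)


print(tag("Two DAM conferences discussed meta-data"))
```

Because every variant resolves to one preferred term, a document that says "DAM" and one that says "digital asset management" end up with the same tag, which is the point of centralizing the vocabulary. The flip side of plain string matching is that "dam" would also match an article about a river dam, exactly the kind of ambiguity semantic analysis is meant to resolve.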

As Nstein Technologies' spokeswoman for the last three years, I've spoken on the subject of semantic metadata quite a bit, usually as the lone voice in the room uttering those words! So I was amazed last month when, at two different DAM conferences, virtually every speaker uttered the words "semantic metadata"! Yet it was apparent that few in the audience understood the scope of what metadata could be.
It seems we are at a point in history when everyone knows they need metadata -- but doesn't really know the "aboutness" of the subject!!


So a trio of us are hosting a webinar next week on "Semantic Metadata 101: Your Assets are Bare Without It." Sheila Woo, our linguistics expert and Director of Product Development, and Nstein's Sales Engineer will be joining me on Wednesday, March 31, at 10am EDT / 3pm BST (2pm GMT).

If you manage large repositories of content, head up knowledge management, or are a CMO looking to leverage content to drive readership, engage readers, and cross-promote products and services, then join us. There'll be plenty of time for questions -- just bring your own cookies and coffee.
