The Owl -- WebWise -- Vol. 12, No. 10 -- December, 1997

A Column by Andy Anderson
Special Collections, Ekstrom Library
Can't find the information you need on the World Wide Web? It's easy to see why. Imagine how chaotic things would be if we removed the spine labels and title pages from all the books in Ekstrom and then just piled them up in the lobby. In addition to burying the coffee bar, we'd seldom be able to find anything needed by a patron. And that big pile bears a striking resemblance to the arrangement of all the documents which make up the Web.
Search engines such as Alta Vista and indexing services such as Yahoo! help, but they can only point us to documents based on what is known about them. These services operate robotic engines called "parsers," "spiders" or "crawlers" which periodically examine every web document they are able to locate. From this examination, the engines attempt to extract keywords which will help direct users to individual documents. But since few Web documents have title pages (author/title information) or spine labels (subject classification), the system is fraught with error. The result is that it's impossible to be confident that you've found the information you need.
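To see why keyword extraction is so error-prone, here is a minimal sketch of the general idea: count the most frequent words on a page and treat them as its keywords. The function name, the tiny stopword list, and the sample page are hypothetical illustrations. This is not how Alta Vista, Yahoo! or any particular engine actually works.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real engines used far larger ones.
STOPWORDS = {"the", "and", "for", "with", "that", "this", "from", "are", "was"}

def extract_keywords(text, n=5):
    """Guess a page's keywords by counting its most frequent non-trivial words."""
    # Lowercase the text and split it into runs of letters (and apostrophes).
    words = re.findall(r"[a-z']+", text.lower())
    # Ignore very short words and common function words.
    counts = Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)
    # Ties keep the order in which the words first appeared.
    return [word for word, _ in counts.most_common(n)]

page = "The owl hoots. The owl hunts at night, and the library stays open at night."
print(extract_keywords(page, 3))  # ['owl', 'night', 'hoots']
```

Nothing on the page tells the program that it is really about the library's hours, so a word like "hoots" can outrank "library". A title page or a subject heading would settle the question at once.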
What's missing, you're probably thinking, is librarians. You'll be happy to know that Web librarians are riding to the rescue. In a series of annual conferences over the last few years, they have evolved a concept called "Metadata." Metadata is data about data. In the case of Web documents, it is data about the creator (author/title), subjects (classification) and other data contained within a particular document or site. The 1995 conference hosted by OCLC in Dublin, Ohio, produced the "Dublin Core," a set of metadata elements (13 at first, since expanded to 15) designed to be attached to web documents as tags.
The elements will comprise what has been described as the equivalent of a catalog card for electronic resources. The tags will identify for search engines (and for the user who bothers to examine them) such elements as author, title, date of creation, subject (structured, controlled-language headings) and keywords. In addition, they will identify the resource type (home page, image, etc.), format (to identify software needed for viewing) and rights ("May be used with permission of author"). The simplicity of the elements and their commonly understood meaning will, it is hoped, make the Dublin Core easily usable by the non-catalogers who produce documents for the web.
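For readers curious what such a "catalog card" might look like inside a web page, here is a minimal sketch that writes a set of Dublin Core elements out as HTML meta tags. It assumes the common "DC." naming convention for embedding Dublin Core in HTML. The function name and the sample record (including its subject heading) are hypothetical, and the element names are the ones the Dublin Core eventually settled on; earlier drafts labeled some of them differently.

```python
from html import escape

def dublin_core_meta(record):
    """Render a dict of Dublin Core element names and values as HTML <meta> tags."""
    tags = []
    for element, value in record.items():
        # The "DC." prefix marks the tag as a Dublin Core element (assumed convention).
        # escape() protects characters such as & and " inside the content attribute.
        tags.append(f'<meta name="DC.{element}" content="{escape(value, quote=True)}">')
    return "\n".join(tags)

# Hypothetical record describing this column.
column = {
    "Title": "WebWise",
    "Creator": "Anderson, Andy",
    "Date": "1997-12",
    "Subject": "Metadata; World Wide Web -- Searching",
    "Type": "Text",
    "Format": "text/html",
    "Rights": "May be used with permission of author",
}
print(dublin_core_meta(column))
```

Tags like these sit in the head of a page where browsers don't display them, but a spider can read them directly instead of guessing from the page's text.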
The result should be a vast improvement in the information search engines find about documents, and that should translate into much more accurate and fruitful searches for the user.
This description of Metadata and the Dublin Core is a greatly simplified one. Additional information can be found at these World Wide Web sites:

Scientific American article on the deficiencies of web searching and the operation of "spiders": http://www.sciam.com/0397issue/0397lynch.html

Metadata for the Masses, a good discussion of the concept of Metadata and its uses: http://www.ariadne.ac.uk/issue5/metadata-masses/

Description of the Dublin Core and its purpose: http://purl.oclc.org/metadata/dublin_core/ While at this site, click on Semantics to see a list of the 15 Core elements with content descriptions and requirements. Click on Projects to see links to worldwide implementation efforts.

Examples of the application of Dublin Core Metadata tags to web pages at the Library of Congress' American Memory site: http://lcweb2.loc.gov/ammem/award/docs/dublin-examples.html