June 23: Introduction to metadata standards
Summary
*Pre-lecture blogtalk* Blogs, forums, and wikis; copyright and copyleft
*Lecture* Data vs. information recap; ceci n'est pas une pipe; syntax and grammar; metadata vs. AI; introducing XML, RDF, and DC
*Exploration* Semantic web

Deep thought of the day: Why do we feel guilty for not posting entries? We're hardwired not to ignore social spaces.

*Pre-lecture blogtalk*

Professional vs. personal blogs and their effect on authority
Blogs, forums, and wikis -- different takes on the same idea
Are librarians threatened by blogs?
Authenticity: how can you prove you are who you write about? Does literary merit trump "truth"?
See Jonathan Delacour; Shelley Powers: Burning Bird

What of Copyright?
Internationally: no protection
But does lack of copyright protection prevent collaborative/personal artmaking online?

Intro to copyleft:
Machine code is in your computer. No human can read it. Get a compiler and ... Now they're at the level of code.
Richard Stallman (kind of a nutbar) developed copyleft in order to escape the tragedy of the commons, which made it difficult to collaborate on projects. Trust is implicit in long-term, group projects.
Copyleft=this material is free to use, modify, etc., but you must distribute any derivative work with the source code in the future.
Think of it as a fence around the commons. You can go in, but you can't take stuff out. Hence, the more code is in the commons, the more valuable the code is. New hackers can stand on the shoulders of giants.
"Everything is free except the ability to destroy the foundation of that freedom."
Linux is a good example -- a UNIX clone, reimplemented rather than copied (clone in techspeak=not a copy). Makes the expensive free.

*Lecture*

Data vs. information recap
Data=no human understanding. Again, the dark world of the CPU.
Datum vs. data sets. A datum is senseless in isolation; data presupposes a grouping intended for a purpose.
Sentence diagramming is just as much metadata as markup.

Ceci n'est pas une pipe.
"heh heh, I'm a famous surrealist painter and you're in a gallery and this will blow your mind." "Bathe in the human before we immerse ourselves in the machine" Nominalist vs. idealist (?), Platonic ideals. Fry a robot brain while you're at it Bridging the Gap Syntax: How we group data and signal its intended use (well-formedness). Has to be orderly or it's impossible to tell what you're saying. Grammar: How we group information and signal its intended meaning or interpretation. Has to make sense to the human. Distinction is kind of medieval, but grammar refers to individual words: Sometimes it makes sense to use them, sometimes not. Metadata: encoding for information metadata vs. AI Automatic classification: current algorithms are limited (mostly statistical, i.e, just math); true AI would indeed obviate tne need for metadata; it would also obviate the need for humans (save us, John O'Connor!) Manual organization: a social problem; difficult to program, difficult to maintain; not very scalable. Race between Semantic Web and Skynet Basic Metadata syntax embedded linked -- a question of addressing in both cases, there is a question of namespaces (the context for metadata) Introducing XML It is cool. Why? The universal syntax for metadata. A single metatdata parser can read anything in XML. XML is data and metadata blended. But what about the grammar? Things are written in XML. Things are NOT XML. I.e., HTML can be written in XML; (can MARC be written in XML?) Three kinds of metadata grammar Structural: Markup. The realtionship between bits of data. Descriptive: keywords and info about the info Administrative: manipulateds the data. e.g. draft and publish status in MT Metadata grammars Why do we have these? Because a community needs them. Specific needs are met by different types of metadata. Community-specific: (EAD, EdNA) General-purpose, or "glue" (DC, RDF) Introducing RDF Expressed as XML Magic triple: subject->predicate->object This is not a pipe! 
Introducing DC
15 fields
extensible
glues different domain-specific vocabularies together
expressible in RDF, HTML, XHTML or XML. Different syntactical ways of expressing the same goal.

The 15 elements of DC
Took two years to develop. Trying to be as universal as MARC, without the knots.
"Ancient concept" of distinct creator and publisher in online world -- will it be weird when these differences are obsolete?
Relation field: to an index, or to nearby pages

*Exploration*

Semantic web intro: see www.amk.ca/talks/semweb-intro/
Some examples of modeling info objects via RDF
Browsers don't do that much from an IT point of view -- they just present info for human consumption.
Screenscraping is one method of cutting to the metadata chase, but it requires a high level of detail in programming.
Now, we want information retrieval to be more automated -- we want AI or metadata (MD) to do it for us.
Resource Description Framework (RDF) -- very low-level description (see s->p->o above)
RDF Schema lets you describe controlled vocabularies and use them to describe things.
Web Ontology Language (OWL) lets you describe relationships between vocabularies.
The higher the level, the more powerful the semantic web, as it allows more communication and interaction between info. Creating new contexts for the info described.

Overview of RDF
RDF="specification that defines a model for representing the world, and a syntax for serializing and exchanging that model."
RDF can be used without the fear that your work will have been in vain.
A more self-describing approach, while more complicated itself, becomes more universal and easier to use to describe the info.
So where is the MD?
In the headers of a document, followed by a blank line.
Syntax can be limited: the pairs of tag and content must all refer to the same item, but this can be worked around by providing multiple definitions (example: a file for reviews and one for authors), like a relational database.
See RDF graph at www.amk.ca/talks/semweb-intro/
Don't get weirded out by RDF: It is the way it is because it describes things in the electronic world -- pointers to electronic data.

URI (identifier) and URL (locator): the difference.
A URL is a locator: the thing can be found using this bit of data.
A URI is an identifier, a label that says this bit of data is unique. It doesn't matter what it means, it just IS.
A URI can be made up. A URL must be resolvable to a place on the web.

The Triplet Shuffle: Circles, lines with labels, and boxes.
Circle=bit of data. Line=relationship. Box=terminal: where you fall off the road. That's all.

Difference between RDF and relational databases
Toby: looks like entity relationship diagrams in databases. But DBs are weak when relationships change because each change=destroying and rebuilding tables.
Steve: Is RDF more robust because each bit of data is unique? I.e., one and only one ISBN, author, etc.
Bag in RDFspeak=unordered list. This creates anonymous resources, where there is no subject/object. There can be one-to-many relationships.
RDF is better than relational DBs because it is more rigorous; relational DBs carry all that cumbersome redundancy.

If you remember nothing else from this lecture, think of metadata and data as a chain of events that need to get grounded in reality for anything interesting to happen.
Resources, properties, and literals -- read the web site in more depth.
Distinction between syntax and model: there are many ways of writing the graph -- XML, etc.

Jumping into another example.
www.ukoln.ac.uk/metadata/resources/rdf/examples/2/
2 versions of a journal: Use the ISSN to ground a metadata framework for one entity with two manifestations: print and electronic.
Nailing down the data. Electronic world brooks no vagueness.
In this example, the print version establishes the basis for (IsBasisFor) the electronic version.
XML allows you to declare namespaces, so that you can understand what the URIs are identifying.

June 23, 2003 in lis 875: where this blog began




