RDF Primer Primer

Editor's Draft, 20 November 2002

This version:
http://notabug.com/2002/rdfprimer/1 (slightly modified)
Latest version:
http://notabug.com/2002/rdfprimer/
Previous version:
None.
Editors:
Aaron Swartz, me@aaronsw.com
Series editor:
Brian McBride, Hewlett-Packard Laboratories, bwm@hplb.hpl.hp.com

Status of this Document

This document was written by Aaron to fill what he saw as a hole in RDF's documentation. It has not been looked at or approved or asked for by the Working Group.

Note: If you do read it, please do let me know how it was!

Introduction

The Resource Description Framework (RDF) is a system for storing and sharing data between computer programs. This document will help you understand the important pieces of RDF quickly and then walk you through writing your own RDF files, and creating your own types of RDF information (often called RDF vocabularies or schemas).

One of the most common uses of RDF is for people who have databases on the Web to provide the information in those databases in a way that computer programs can more easily use. HTML, the standard for Web documents, is great for humans but not for programs. Think of RDF as the analogous language for Web data.

For our examples, you'll be a programmer at Yoyodyne, a maker of widgets, sprockets and frobni. You'll provide your company's catalog in RDF, using the TMRC catalog standard. You'll learn how to understand the RDF encoding of the standard, also called its schema. Finally, you'll learn how to make schemas of your own.

RDF in 5 Minutes

The data model of RDF is extremely simple. There are URIs, which identify things; blank nodes, which identify things which don't have their own URI; and literals, which are used for pieces of text.

In the simplest RDF format, N-Triples, each of these is distinguished by the special way we write it. URIs (often called URLs) are surrounded with angle brackets: <http://example.net/rdf/title>. Blank nodes are identified by giving them a placeholder name: _:myWidget. Literals are identified by surrounding them with quotes: "Mega Widget 2002".

We put these individual pieces together to form RDF statements, which are like English sentences. RDF statements are also pretty simple: they have a subject (the thing you're talking about), a predicate (what you're saying about it), and an object (the thing you're saying). For example, take this English sentence:

My widget has the title "Mega Widget 2002".

"My widget" is the subject, "has the title" is the predicate, and "Mega Widget 2002" is the object. Here's that same RDF statement in N-Triples:

_:myWidget <http://example.net/rdf/title> "Mega Widget 2002" .

(Yes, the period is required.) To create an RDF document, we simply string these statements together, one to a line:

_:myWidget <http://example.net/rdf/title> "Mega Widget 2002" .
_:myWidget <http://example.net/rdf/description> "Dark brown. Goes well with coffee." .
_:myWidget <http://example.net/rdf/price> "$19.95" .
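
As a rough illustration of the writing rules above, here is a minimal Python sketch that builds the same three statements. The helper names (uri, blank, literal, statement) are made up for this example and are not part of any RDF library; the escaping covers only backslashes, quotes and line breaks, which is an assumption about what a simple catalog needs.

```python
def uri(value):
    """Write a URI the N-Triples way: inside angle brackets."""
    return "<" + value + ">"


def blank(name):
    """Write a blank node using its placeholder name."""
    return "_:" + name


def literal(text):
    """Write a piece of text as a quoted literal."""
    # Escape backslashes first so later escapes are not doubled.
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r"))
    return '"' + escaped + '"'


def statement(subject, predicate, obj):
    """Join subject, predicate and object, ending with the required period."""
    return subject + " " + predicate + " " + obj + " ."


EX = "http://example.net/rdf/"
widget = blank("myWidget")

lines = [
    statement(widget, uri(EX + "title"), literal("Mega Widget 2002")),
    statement(widget, uri(EX + "description"),
              literal("Dark brown. Goes well with coffee.")),
    statement(widget, uri(EX + "price"), literal("$19.95")),
]

# One statement per line makes an RDF document.
document = "\n".join(lines) + "\n"
print(document, end="")
```

Keeping each kind of term in its own small function mirrors the three kinds of things in the data model, so a mistake such as quoting a URI is easy to spot.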

Writing out the full URI each time gets a bit cumbersome, so in this document we'll use an abbreviation called a qualified name (qName). ex: will stand for http://example.net/rdf/ so we can write <http://example.net/rdf/title> as ex:title. Here's a full statement using this abbreviation system:

_:myWidget ex:title "Mega Widget 2002" .

(Note that these abbreviations aren't valid N-Triples; we're just using them here for convenience.)
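
Since the abbreviated form is not valid N-Triples, a program has to expand it before writing a real file. The following Python sketch shows one way that could work. The PREFIXES table and the expand and expand_statement functions are hypothetical names chosen for this example, and the sketch assumes terms arrive already split into subject, predicate and object.

```python
# Hypothetical prefix table; ex: is the abbreviation used in this document.
PREFIXES = {"ex": "http://example.net/rdf/"}


def expand(term, prefixes=PREFIXES):
    """Expand a qName such as ex:title; leave every other term untouched."""
    # Full URIs, blank nodes and literals are already in their final form.
    if term.startswith(("<", "_:", '"')):
        return term
    prefix, colon, local = term.partition(":")
    if colon and prefix in prefixes:
        return "<" + prefixes[prefix] + local + ">"
    # Unknown prefix: return the term as-is rather than guess a URI.
    return term


def expand_statement(subject, predicate, obj):
    """Expand each term and write the statement as one N-Triples line."""
    terms = [expand(t) for t in (subject, predicate, obj)]
    return " ".join(terms) + " ."


print(expand_statement("_:myWidget", "ex:title", '"Mega Widget 2002"'))
```

Terms with an unknown prefix are passed through unchanged here; a stricter version might raise an error instead.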

That's it! The RDF model is officially specified in RDF: Concepts and Abstract Data Model ("RDF Concepts" for short).

Making Money with RDF

Now that you understand RDF, it's time to start writing it. Let's pull out the TMRC catalog standard... Ah, here's the summary page:

The URI for the catalog standard is http://tmrc.example.org/catalog/. It's often abbreviated as cat:.

Classes

Properties

From the RDF core vocabulary (abbreviated rdf:) we use:

From the Dublin Core Elements (abbreviated dc:) we use:

And we define:

price
The cost in US dollars, without dollar sign but with optional decimal portion (e.g. "200.47", "20").
color
A structured element with a lowercase dc:title ("red", "green", "fuschia") and a hexColor ("F00", "0F0", "F0F").
hexColor
The hex color code with a red digit, a green digit and a blue digit.
...

Seems reasonable. Let's take a page from our Web catalog:

Titanium Goorplaster 27

[Frobnitz] Very high quality; ideal for industrial use.

Price: $200.47
Color: fuschia
...

Here's how we would write that as RDF:

<http://yd.example.com/catalog/fg27> rdf:type cat:Frobnitz .
<http://yd.example.com/catalog/fg27> dc:title "Titanium Goorplaster 27" .
<http://yd.example.com/catalog/fg27> dc:description "Very high quality; ideal for industrial use." .
<http://yd.example.com/catalog/fg27> cat:price "200.47" .
<http://yd.example.com/catalog/fg27> cat:color _:b1 .
_:b1 dc:title "fuschia" .
_:b1 cat:hexColor "F0F" .
...

Pretty simple, eh? The only complicated thing is the blank node, _:b1. Those names can simply be generated; just be sure not to use the same name for two different things in the same document.
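
Generating blank node names and following the standard's price rule are both easy to automate. The Python sketch below is one possible approach, not a prescribed one: BlankNodes, price_literal and color_triples are invented names, the triples use the abbreviated cat: and dc: forms from this document, and the code assumes the color name and hex code contain no quote characters.

```python
import itertools


class BlankNodes:
    """Hands out blank node names that are unique within one document."""

    def __init__(self):
        self._numbers = itertools.count(1)

    def new(self):
        return "_:b%d" % next(self._numbers)


def price_literal(display_price):
    """Turn a displayed price into the literal the catalog standard expects."""
    # The standard asks for the amount without the dollar sign.
    return '"' + display_price.strip().lstrip("$") + '"'


def color_triples(product, name, hex_code, nodes):
    """Describe a structured color through a fresh blank node."""
    node = nodes.new()
    return [
        (product, "cat:color", node),
        (node, "dc:title", '"' + name + '"'),
        (node, "cat:hexColor", '"' + hex_code + '"'),
    ]


nodes = BlankNodes()
item = "<http://yd.example.com/catalog/fg27>"
triples = [(item, "cat:price", price_literal("$200.47"))]
triples += color_triples(item, "fuschia", "F0F", nodes)

for s, p, o in triples:
    print(s, p, o, ".")
```

Using one BlankNodes object per document is what keeps two different things from ending up with the same name.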

For tips on writing RDF documents, check out Sean B. Palmer's Semantic Web Tips.

N-Triples is defined in RDF Test Cases, N-Triples section.

Specifying the Specification

RDF vocabularies are often defined in RDF, using RDF. They follow the same basic structure as our catalog standard above, except they provide some more data for machines to deal with. Here's how we'd encode the above in RDF Schema (abbreviated as rdfs:):

cat:Widget rdf:type rdfs:Class .
cat:Widget rdfs:label "Widget" .
cat:Sprocket rdf:type rdfs:Class .
cat:Sprocket rdfs:label "Sprocket" .
...

cat:price rdf:type rdf:Property .
cat:price rdfs:label "price" .
cat:price rdfs:comment "The cost in US dollars, without dollar sign but with optional decimal portion (e.g. \"200.47\", \"20\")." .
...

Some things to note:

Sometimes you'll have "subclasses", classes which are a specialized version of another class. For example, a zoo:Chihuahua is a subclass of zoo:Dog and a cat:Wodget is a subclass of cat:Widget which is a subclass of cat:Product. In RDF, we indicate this using the rdfs:subClassOf property:

cat:Wodget rdfs:subClassOf cat:Widget .
cat:Widget rdfs:subClassOf cat:Product .

The same thing can happen with properties: fam:father is a special kind of fam:parent, cat:salePrice is a special kind of cat:price. RDF has rdfs:subPropertyOf:

cat:salePrice rdfs:subPropertyOf cat:price .
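
Because subclass and subproperty links chain together, a program usually wants to follow them all the way up. Here is a small Python sketch of that idea, assuming the schema is held as a list of (subject, predicate, object) tuples written with the abbreviations used in this document. The function name ancestors is made up for the example.

```python
def ancestors(term, relation, triples):
    """Collect everything reachable from term by following relation repeatedly."""
    found = []
    todo = [term]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            # Skip anything already seen so that loops in the data terminate.
            if s == current and p == relation and o not in found:
                found.append(o)
                todo.append(o)
    return found


schema = [
    ("cat:Wodget", "rdfs:subClassOf", "cat:Widget"),
    ("cat:Widget", "rdfs:subClassOf", "cat:Product"),
    ("cat:salePrice", "rdfs:subPropertyOf", "cat:price"),
]

# Every class a cat:Wodget also belongs to.
print(ancestors("cat:Wodget", "rdfs:subClassOf", schema))

# Every more general property implied by cat:salePrice.
print(ancestors("cat:salePrice", "rdfs:subPropertyOf", schema))
```

The same function serves both relations because they have the same shape: a chain from the more specific term to the more general one.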

Another useful thing is to say what rdf:type of subjects and objects a property can have. We specify the subjects with rdfs:domain and the objects with rdfs:range. If we want to say that the range is a literal, we use the special class rdfs:Literal.

cat:Color rdf:type rdfs:Class .

cat:color rdfs:domain cat:Product .
cat:color rdfs:range cat:Color .

cat:hexColor rdfs:domain cat:Color .
cat:hexColor rdfs:range rdfs:Literal .
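
One practical use of domain and range is letting a program work out the rdf:type of things that were never typed explicitly. The Python sketch below illustrates this under some simplifying assumptions: triples are plain tuples using this document's abbreviations, inferred_types is a hypothetical helper, and a range of rdfs:Literal is skipped because a literal cannot be the subject of a statement.

```python
def inferred_types(data, schema):
    """Work out rdf:type statements implied by rdfs:domain and rdfs:range."""
    types = set()
    for s, p, o in data:
        for prop, kind, cls in schema:
            if prop != p:
                continue
            if kind == "rdfs:domain":
                # The subject of the property belongs to the domain class.
                types.add((s, "rdf:type", cls))
            elif kind == "rdfs:range" and cls != "rdfs:Literal":
                # The object of the property belongs to the range class.
                types.add((o, "rdf:type", cls))
    return types


schema = [
    ("cat:color", "rdfs:domain", "cat:Product"),
    ("cat:color", "rdfs:range", "cat:Color"),
    ("cat:hexColor", "rdfs:domain", "cat:Color"),
    ("cat:hexColor", "rdfs:range", "rdfs:Literal"),
]

data = [
    ("<http://yd.example.com/catalog/fg27>", "cat:color", "_:b1"),
    ("_:b1", "cat:hexColor", '"F0F"'),
]

for triple in sorted(inferred_types(data, schema)):
    print(*triple, ".")
```

Note that the blank node is found to be a cat:Color twice, once from each property; collecting the results in a set keeps only one copy.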

These rdfs: terms are defined in RDF Vocabulary Description Language 1.0: RDF Schema ("RDF Schema" for short).

Deeper into the Semantic Swamp

Reification: Statements About Statements

Reification means turning something you use to talk (like an RDF statement or an English sentence) into something you can talk about (like something with a URI). In English, we do reification using quotes. In RDF we do it with some special properties.

Reification uses four special RDF terms: rdf:Statement (the class of reified statements) and rdf:subject, rdf:predicate, and rdf:object (the properties that identify each part of a triple). So an English statement like:

"Widget X is on sale for $19.95," said John.

would be encoded in RDF as:

_:s rdf:type rdf:Statement .
_:s rdf:subject <http://yd.example.com/catalog/widgetX> .
_:s rdf:predicate cat:salePrice .
_:s rdf:object "19.95" .

Since this can get a bit wordy, again we'll use an abbreviation. We'll represent the above four triples by using the following notation:

_:s :- {<http://yd.example.com/catalog/widgetX> cat:salePrice "19.95" .}

Again, this isn't valid N-Triples but it's a useful abbreviation.
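
To show what the abbreviation stands for, here is a minimal Python sketch that expands one reified statement into its four triples. The function name reify is an invention for this example, and the terms are passed in already written in this document's notation.

```python
def reify(node, subject, predicate, obj):
    """Expand a reified statement into the four triples that describe it."""
    return [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", subject),
        (node, "rdf:predicate", predicate),
        (node, "rdf:object", obj),
    ]


triples = reify(
    "_:s",
    "<http://yd.example.com/catalog/widgetX>",
    "cat:salePrice",
    '"19.95"',
)

for s, p, o in triples:
    print(s, p, o, ".")
```

The node is passed in rather than generated inside the function so that the caller can go on to say things about it, such as who made the statement.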

As in English, even when the contents of two reified statements are the same, you can't merge the nodes. If you heard:

"we are engaged in a great civil war" was said on November 19, 1863.
"we are engaged in a great civil war" John said.

You wouldn't conclude that John said it in 1863. Similarly, from:

_:x :- {:US :engagedIn :CivilWar .} .
_:y :- {:US :engagedIn :CivilWar .} .

_:x :saidOn "1863-11-19" .
_:y :saidBy :John .

you can't merge the two reification nodes and conclude:

_:x :saidOn "1863-11-19" .
_:x :saidBy :John .

In this sense, a reification refers to a specific statement — an instance of a triple — not the abstract triple itself.

Closures: What You Can Conclude

To be written.

These issues are discussed in RDF Semantics (formerly "RDF Model Theory" or "MT" for short).

RDF in XML's Clothing

To be written.
