XML Linking: State of the Art
Eve Maler, Senior XML Standards Architect
Sun Microsystems XML Technology Center
Back in the early days -- that is, two years ago -- XML was most often compared to HTML, and in the eyes of many, XML came up short. HTML was a simple language anyone could learn; XML had complexities that could confuse developers. HTML had built-in formatting; XML needed a stylesheet to be displayed as anything other than raw code. HTML had hyperlinking functionality with the <A HREF> tag; XML didn't even give you a linking starter kit for embedding hyperlinks into XML in a standardized way.
Today, we know that XML is scalable and flexible in ways that would stretch HTML to the breaking point, which has allowed XML to become the "universal solvent" for all data, not just the narrative information that HTML was originally designed to hold. However, if XML is to capture one of the most important features of the web, it still needs to offer a standardized way to do linking. The goal of the XML Linking Working Group of the World Wide Web Consortium is to provide exactly this, and we're closing in on our goal.
This paper describes the features, benefits, and basic technical details of XML linking technologies.
The web has an ever-growing set of information resources interconnected by links. Every resource has a URI, or Uniform Resource Identifier, which enables you to find it on the web.
These resources are of many types. For example, in addition to HTML, there are graphic files in various formats and, increasingly, XML. Each type of resource must adhere to the rules of its own data format, and cannot speak the language of any other type. For example, HTML files must contain tags in angle brackets, whereas GIF files are encoded entirely differently.
However, any resource whose type can contain URIs can use those URIs to link to other resources of any type. Thus, an HTML file can link to an image file, allowing a picture to be presented as if it were embedded directly in the HTML content. Similarly, some non-HTML formats (such as mail messages) are allowed to contain linked regions within them so that users can click on a region to go to an HTML document.
One strength of web linking as it exists today is that it can associate resources of arbitrary types. Another strength is that it can associate arbitrary individual resources. If you want to create a link, you just need write access to the starting point (the file in which the link is coded); you don't need someone's permission to make their resource an ending point.
Most pre-web hypermedia systems had a goal of ensuring that links never break (that is, lead to the wrong thing or to nothing). Compared to these systems, the architecture of the web is unusual in that it tolerates broken links; the worldwide system doesn't grind to a halt if some resource isn't where it's supposed to be. Seen this way, the web's tolerance of broken links, while sometimes an annoyance, is actually a strength.
Linking, XML Style
XML was designed to build on the popular features of HTML while offering a more flexible, scalable data format. Likewise, XML linking is designed to take all the good parts from HTML linking while adding much more powerful functionality.
The XML Linking Working Group has created two specifications to solve the two main requirements of XML linking:
XLink is an XML vocabulary that allows XML resources to contain links themselves. XLink can be used to create HTML-style hyperlinks, but also has new features that make it easier to manage links over time and to create new "information products" consisting just of links.
XPointer is an adjunct to URIs that allows XML resources to be addressed into (for example, by links, XML or otherwise). Unlike HTML's addressing mechanism, XPointer can be used to get to any arbitrary region inside an XML file.
This example illustrates many of the features of XLink and XPointer.
Despite the strengths of web linking, HTML links have some limitations.
For example, it can be a challenge to keep all the links pointing to the right places on a large web site with handcrafted content; if the file system is reorganized, someone has to go back and edit all the documents containing links that point to files that have moved.
Also, not all resources can contain links. If you're a professor reviewing a student's paper online and you want to annotate a particular sentence so that the student can read the paper and click on your link to see your comments, you're out of luck because you don't have write access to the paper. If you were a film professor reviewing a student's video, your luck may be doubly bad because the video format may not allow the insertion of links at any point.
XLink helps solve the challenges presented in both of the above scenarios. How? In HTML, you have to surround the starting point with an <A> element and then provide a URI to the ending point, so you need write permission on the resource containing the starting point in order to do any linking at all. In XLink, you simply provide a URI reference for both the starting point and the ending point. No write permission is required on either one, and there's no need to edit the starting points to fix them when the ending points get moved around.
Because XLink allows this, we can expect "content" to be created that consists solely of huge databases of links, with no content actually created by the link authors.
Following are details on how XLink works.
The Model Underlying XLink
The three main types of items in the XLink universe are:
The link itself, which brings the other components together and which may have metadata attached describing the link as a whole.
The simplelink elements in the code example represent XLink links.
Participants in the link. They can be whole XML or non-XML resources, or they can be subresources that consist of portions or fragments of those resources (such as a single XML element identified by means of an XPointer). You can attach metadata describing each participant, and you can have as many participants in a link as you want. You indicate a participant either directly by containing it in your link element (a local resource) or indirectly by means of a URI reference (a locator to a remote resource).
In the code example, the first bib element and the simplelink element represent resources that participate in XLink links. The loc elements and the xlink:href attribute represent locators.
Arcs that pair up participants and indicate which is the starting point and which is the ending point for link traversal. Again, you can attach metadata that describes each arc. One important kind of metadata describes what happens when the link is traversed -- either when someone clicks to activate it or when a browser or other application activates it by other means.
In the code example, the simplelink elements represent specifications of XLink arcs.
The types of metadata you can apply to these components include both machine-processable roles modeled on RDF's (Resource Description Framework) notion of properties and human-readable titles. You might use titles, for example, to provide mouse-over text that helps a user choose the correct link.
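The three kinds of items described above can be pictured with a small sketch that reads one extended link into its participants and arcs. This is only an illustration under stated assumptions: the element names (commentary, target, note, go) and the sample URIs are hypothetical, and the attribute names (including xlink:label) follow the published XLink 1.0 Recommendation, which may differ in detail from the draft this paper describes. It also assumes each label is used by only one participant, which real XLink does not require.

```python
import xml.etree.ElementTree as ET

# Namespace name from the XLink 1.0 Recommendation.
XLINK_NS = "http://www.w3.org/1999/xlink"


def xl(name):
    """Build the ElementTree key for an attribute in the XLink namespace."""
    return f"{{{XLINK_NS}}}{name}"


# Hypothetical markup: one link, two participants, one arc.
SAMPLE = """
<commentary xmlns:xlink="http://www.w3.org/1999/xlink"
            xlink:type="extended" xlink:title="Review comments">
  <target xlink:type="locator" xlink:href="paper.xml#intro"
          xlink:label="passage" xlink:title="Introduction"/>
  <note xlink:type="resource" xlink:label="comment">Needs a citation.</note>
  <go xlink:type="arc" xlink:from="passage" xlink:to="comment"/>
</commentary>
"""


def read_link(link):
    """Split an extended link into its participants and its arcs."""
    participants = {}
    arcs = []
    for child in link:
        kind = child.get(xl("type"))
        if kind == "locator":
            # Remote participant, indicated by a URI reference.
            participants[child.get(xl("label"))] = ("remote", child.get(xl("href")))
        elif kind == "resource":
            # Local participant, contained in the link element itself.
            participants[child.get(xl("label"))] = ("local", (child.text or "").strip())
        elif kind == "arc":
            # An arc pairs a starting participant with an ending participant.
            arcs.append((child.get(xl("from")), child.get(xl("to"))))
    return participants, arcs


link = ET.fromstring(SAMPLE)
participants, arcs = read_link(link)
link_title = link.get(xl("title"))  # metadata describing the link as a whole
```

Here the link element carries the title, the participants are one remote passage and one local comment, and the single arc says that traversal starts at the passage and ends at the comment.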
XLink's Markup Design
In order to include XLink features in an XML document, you need to express them in terms of elements and attributes. XLink uses a somewhat unusual vocabulary design: It consists only of attributes, so that you can designate whatever elements you want to be linking elements. In this way, XLink is an "enabling" vocabulary; you don't use it by itself, but rather incorporate it into your own vocabulary. (For this reason, there is no normative DTD for XLink.) The XLink vocabulary is in a namespace with the name http://www.w3.org/1999/xlink.
As an example, to create an XLink "arc" element, you would use the XLink type attribute with a value of "arc" on the desired element, here called arc.
The possible XLink types are simple, extended, locator, arc, resource, title, and none. For each type, there are rules about what other XLink attributes may appear on that element, and about what values those attributes may have. In the code example, the arc element has xlink:from and xlink:to attributes. These attributes have an effect only if they are on an element with an xlink:type attribute value of arc.
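Because XLink consists only of attributes, a processor recognizes linking elements by their namespaced xlink:type attribute rather than by their element names. The following sketch illustrates that idea; the document and its element names (report, seealso, pointer) are hypothetical, and the namespace name is the one given in the XLink 1.0 Recommendation.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
TYPE_ATTR = f"{{{XLINK_NS}}}type"  # how ElementTree spells xlink:type

# Hypothetical vocabulary that borrows XLink attributes for two of its elements.
DOC = """
<report xmlns:xlink="http://www.w3.org/1999/xlink">
  <seealso xlink:type="simple" xlink:href="other.xml">Related report</seealso>
  <para>Plain content with no linking role.</para>
  <pointer xlink:type="simple" xlink:href="figures.xml#fig1"/>
</report>
"""


def linking_elements(root):
    """Return (element name, XLink type) for every element carrying xlink:type."""
    return [(el.tag, el.get(TYPE_ATTR)) for el in root.iter() if el.get(TYPE_ATTR)]


found = linking_elements(ET.fromstring(DOC))
```

The element names play no part in the test; only the presence and value of the attribute matter, which is what lets XLink be incorporated into any vocabulary.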
Kinds of Links and Arcs
The extended-type element provides a fairly close mapping of the XLink model to actual syntax. Extended links allow you to specify an arbitrary number of participants and arcs, where any one participant, starting or ending, may be local or remote. Groups of extended links stored together are called linkbases (link databases).
The local/remote distinction in any one arc turns out to be important: If its starting point is local (contained in the link), then the link itself contains all the information required for traversing the arc. This is called an outbound arc because it "emanates from" the linking element when traversed. If the arc's starting point is remote but its ending point is local, it is called an inbound arc. If both the starting point and the ending point are remote, the arc is called a third-party arc. In the code example, the two extended links contain only third-party arcs.
Of course, in the cases of inbound and third-party arcs, XLink processors need to be told where they can find the link information, because it doesn't reside in the same place as the starting point. If they can't find the link, a user viewing the resource containing the starting point will never know the link is there -- the starting point just looks like undistinguished content. XLink defines a way for link processors to hunt down linkbases that might be relevant for a document, so that the starting points can be traversed.
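The three kinds of arcs can be summarized as a small decision function. This is a sketch that follows the wording above rather than any processor's actual API; the function name and its boolean parameters are made up for illustration.

```python
def classify_arc(start_is_local, end_is_local):
    """Name an arc by where its starting and ending points live.

    Follows the description above: a local starting point makes the arc
    outbound; a remote start with a local end makes it inbound; two remote
    ends make it third-party.
    """
    if start_is_local:
        return "outbound"
    if end_is_local:
        return "inbound"
    return "third-party"


def needs_linkbase_lookup(start_is_local):
    """A processor must find the link elsewhere when the start is remote."""
    return not start_is_local
```

For inbound and third-party arcs the second function returns True, which corresponds to the case where a processor has to hunt down a linkbase before the starting point can be traversed.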
As you may have noticed, simple-type elements seem to play multiple parts in the model. They provide a more convenient and compact syntax for a very common kind of extended link, one which has exactly two participants and one outbound arc. This is identical to the linking structure of HTML hyperlinks.
Traversal Behavior Options
The example shows two cases of assigning traversal behavior to an arc. The arc created by the arc element, when traversed, is supposed to show the ending point in a new window on a user's (or application's) request. The arc created by the simplelink element is supposed to show the ending point in the same window as the starting point, but again only when requested.
The xlink:show attribute controls the options for presentation of the ending point, and the xlink:actuate attribute controls the options for the type of event that kicks off the traversal. Each has a finite list of allowed values. The attributes have two values in common: none, which specifies that the presentation or actuation is not controlled at all by XLink processors, and other, which specifies that XLink processors should examine other (non-XLink) attributes on the element in order to discover link-processing instructions.
Other values for the xlink:show attribute are new and replace, explained above, and embed, which embeds a presentation of the ending point directly into the presentation context (window, pane, or whatever) of the resource containing the starting point. This value lets you achieve the effect of an image embedded in an HTML page.
Other values for the xlink:actuate attribute are onRequest, explained above, and onLoad. You can use different combinations of attribute values to achieve different effects. For example, with a combination of replace and onLoad, the link produces the effect of an automatic redirect.
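To make the combinations concrete, here is a small sketch that turns a pair of xlink:show and xlink:actuate values into a plain description of the intended behavior. The value lists are those of the XLink 1.0 Recommendation; the function name and the wording of the descriptions are illustrative only, not part of any specification.

```python
# Allowed values, as listed in the XLink 1.0 Recommendation.
SHOW = {
    "new": "show the ending point in a new window",
    "replace": "show the ending point in place of the starting resource",
    "embed": "embed the ending point in the starting resource's presentation",
    "other": "consult non-XLink markup for presentation",
    "none": "presentation is not controlled by XLink",
}
ACTUATE = {
    "onLoad": "when the starting resource is loaded",
    "onRequest": "when the user or application requests it",
    "other": "consult non-XLink markup for timing",
    "none": "timing is not controlled by XLink",
}


def describe_traversal(show, actuate):
    """Return (timing, presentation) for one arc, rejecting unknown values."""
    if show not in SHOW:
        raise ValueError(f"unknown xlink:show value: {show!r}")
    if actuate not in ACTUATE:
        raise ValueError(f"unknown xlink:actuate value: {actuate!r}")
    return ACTUATE[actuate], SHOW[show]


# Replacing the current presentation as soon as it loads behaves like a redirect.
redirect = describe_traversal("replace", "onLoad")
# Opening a new window only on request matches an ordinary "open in new window" link.
popup = describe_traversal("new", "onRequest")
```

Keeping the two attributes independent is what allows the same small set of values to cover redirects, embedded content, and ordinary click-through links.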
When you provide a URI reference that has as its target an HTML document, you have the opportunity to identify a spot within the document to which a browser should scroll. Essentially, you're addressing into the HTML document, not just to it.
However, pointing inside an HTML document only works if the creator of the HTML document was kind enough to provide an <A NAME> or <A ID> for you to use; you can't point to locations that weren't so identified. Also, the best you can do with HTML is point to the spot where such an identifier occurs, not a whole "chunk" such as a paragraph or division. That is because the structure of the average HTML file is not very regular; even though there is a formal DTD specification for HTML, browsers don't require that you comply with it. Broken HTML files make it hard to point to a particular region without ambiguity about which region was actually intended.
XPointer takes advantage of XML's inherent structure to allow addressing into any portion of an XML document. Programmatically, you just use the structure to provide guideposts in your description of the desired content: "Give me the first paragraph in the section of the paper that has ID 'compareAndContrast'," or "Give me the second through the last of the item elements in the purchase order."
XPointer gets most of its power from XPath, which deals with whole nodes such as elements and attributes. What XPointer adds to the mix is the ability to address arbitrary ranges of content, even if they don't form whole nodes. For example, in our college professor scenario, if the professor wants to comment on just the last sentence in a paragraph, XPointer can handle it even if the sentence has no distinguishing markup around it.
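The two "guidepost" requests quoted earlier can be approximated with the limited XPath subset built into Python's standard library. This is a rough sketch, not an XPointer implementation: the sample documents and element names are hypothetical, and an attribute named id stands in for a true ID-typed attribute, which would require a DTD to declare.

```python
import xml.etree.ElementTree as ET

# Hypothetical paper with two sections.
PAPER = """
<paper>
  <section id="intro"><para>Opening.</para></section>
  <section id="compareAndContrast">
    <para>First point.</para>
    <para>Second point.</para>
  </section>
</paper>
"""

# Hypothetical purchase order.
ORDER = "<order><item>bolts</item><item>nuts</item><item>washers</item></order>"

paper = ET.fromstring(PAPER)
# "The first paragraph in the section that has ID 'compareAndContrast'."
first_para = paper.find(".//section[@id='compareAndContrast']/para[1]")

order = ET.fromstring(ORDER)
# "The second through the last of the item elements." ElementTree's XPath
# subset has no position() comparisons, so a list slice does the same job.
later_items = [item.text for item in order.findall("item")[1:]]
```

Both requests return whole elements, which is the part of XPointer's job that XPath already covers; addressing the last sentence of a paragraph would need the range features described above.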
Following are details on how XPointer works.
XPointer and URIs
The combination of a URI and a string beginning with a crosshatch (#) character is called a URI reference, and the string after the crosshatch is called a fragment identifier. The defining standard for each MIME type has the opportunity to define the fragment identifier language to be used in URI references that point into resources of that MIME type. For example, HTML's fragment identifier language consists of a simple string that matches the value of some <A NAME> or <A ID> attribute in the document.
XML's structure allows for a much finer granularity of referencing, and so the fragment identifier language for resources of MIME type application/xml is correspondingly more complex. XML's fragment identifier language is defined by the XPointer specification.
Keep in mind that XPointers are used whenever your URI points to a document of type application/xml, regardless of what kind of document contains the URI reference. Therefore, XPointers can actually appear inside any kind of document that can contain URIs -- not just XML documents.
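Splitting a URI reference into its URI and its fragment identifier is a purely textual step that happens before any XPointer processing. The sketch below uses Python's standard urllib.parse.urldefrag; the example.org address and the ID value are made up.

```python
from urllib.parse import urldefrag

# A URI reference whose fragment identifier is a full XPointer.
ref = "http://example.org/spec.xml#xpointer(id('intro'))"
uri, fragment = urldefrag(ref)

# A reference with an empty URI part points into the current document.
local_uri, local_fragment = urldefrag("#intro")
```

How the fragment is interpreted depends on the MIME type of the resource that the URI part retrieves; only for XML resources is it read as an XPointer.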
The XPointer Language
The following five XPointer fragment identifiers appear in the code example (though not in the order shown here):
These XPointers illustrate much of the variety you can find in the XPointer language. To keep things simple, each is attached to a null URI (that is, the URI field before the crosshatch is empty), which means that each refers to a location in the current document.
The first XPointer is an example of a bare-name XPointer. It points to the bib element. Bare-name XPointers function very much like HTML fragment identifiers, except that they find any element that has an attribute of type ID (as opposed to an HTML attribute called ID). In this way, a bare name is also like a value of XML's attribute type IDREF; if you are pointing into the current document, bare names differ from the syntax of IDREF only in the addition of the crosshatch character.
The second is an example of a child sequence XPointer. It counts element children to locate the desired element. The /1 refers to the doc element, the /2 refers to the body element, which is the second child element inside doc, and the /5 refers to the desired prod element, which is fifth inside body. A child sequence can begin with either /1 or an ID value.
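The counting behind bare names and child sequences is simple enough to sketch in a few lines. The resolver below is a simplified illustration, not a conforming XPointer processor: the function names are invented, the sample document is hypothetical (its doc, body, and prod names merely echo the description above), and because ElementTree does not read DTDs, an attribute named id is assumed to stand in for an ID-typed attribute.

```python
import xml.etree.ElementTree as ET


def find_by_id(root, value, id_attr="id"):
    """Return the first element whose assumed ID attribute equals value."""
    for el in root.iter():
        if el.get(id_attr) == value:
            return el
    return None


def resolve(root, pointer, id_attr="id"):
    """Resolve a bare name ('target') or a child sequence ('/1/2/5', 'target/1')."""
    head, *steps = pointer.split("/")
    if head:
        # Sequence starts from an element found by ID (or is just a bare name).
        current = find_by_id(root, head, id_attr)
    else:
        # Sequence starts with /1, which selects the document element.
        if not steps or steps[0] != "1":
            return None
        current = root
        steps = steps[1:]
    for step in steps:
        if current is None or not step.isdigit():
            return None
        children = list(current)  # element children only, in document order
        index = int(step)
        if not 1 <= index <= len(children):
            return None
        current = children[index - 1]  # child sequences count from 1
    return current


DOC = ET.fromstring(
    "<doc><head/><body><p/><p/><p/><p/>"
    "<prod id='target' num='22'><rhs/></prod></body></doc>"
)
by_sequence = resolve(DOC, "/1/2/5")
by_name = resolve(DOC, "target")
from_id = resolve(DOC, "target/1")
```

Both the child sequence and the bare name arrive at the same prod element here, while the third pointer starts at that element by ID and steps to its first child.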
The last three XPointers show the full syntax, which starts with the keyword xpointer and is followed by a parenthesized expression. The expression is a superset of XPath; in fact, the third and fourth XPointers contain expressions that are directly usable as XPath expressions. The third XPointer locates the prod element whose num attribute value is 22. The fourth XPointer locates the first citetitle element inside the first para element anywhere inside the
The final XPointer demonstrates a feature that is unique to the XPointer language. String ranges allow for the targeting of substrings in XML content that are otherwise not marked up. This XPointer actually targets all the references to the word prolog in the document. The XLink in which the XPointer is used creates arcs from each of them to the prod element in which the prolog construct is defined.
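The idea behind string ranges, finding text regardless of the markup around it, can be imitated by searching the concatenated text of a document. The sketch below is only an analogy: it returns character offsets into that text rather than XPointer locations, and the function name and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def string_range_offsets(root, word):
    """Return (start, end) offsets of each occurrence of word in the
    concatenated text content of root, ignoring element boundaries."""
    text = "".join(root.itertext())
    offsets = []
    start = text.find(word)
    while start != -1:
        offsets.append((start, start + len(word)))
        start = text.find(word, start + 1)
    return text, offsets


# The second occurrence is split across an element boundary.
DOC = ET.fromstring(
    "<doc><para>The prolog comes first.</para>"
    "<para>A <emph>pro</emph>log may be empty.</para></doc>"
)
text, offsets = string_range_offsets(DOC, "prolog")
```

The second match is found even though an emph element interrupts the word, which is the kind of target that cannot be expressed by selecting whole nodes.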
Not shown in the code example is another unique feature, a range. Ranges allow for the targeting of arbitrary ranges of XML content whether or not they contain whole nodes. To locate a range with an XPointer, you supply two inner XPointers as parameters to the range-to() function for the starting and ending points of the range. Following is an example of a range that stretches all the way from the beginning of the prod element with the num value of 1 to the end of the prod element with the num value of 22, including both the paragraph and the production that are in between:
The Status of XML Linking
As of this writing, things are moving quickly in the XML linking world. XLink and XPointer (along with XML Base, another specification owned by the XML Linking WG) are in the W3C Candidate Recommendation phase, during which the W3C actively seeks implementation experience. After this phase, specifications typically move on to a Proposed Recommendation phase and then reach Recommendation status. Also, several W3C Notes have been published on technical matters related to XML linking, and one more is expected. The following sources are already publicly available:
- The public W3C XML Linking page, which lists known implementations: http://www.w3.org/XML/Linking
- The XLink specification: http://www.w3.org/TR/xlink
- The XPointer specification: http://www.w3.org/TR/xptr
- The XPath specification on which XPointer is based: http://www.w3.org/TR/xpath
- A Note on mapping XLink to RDF: http://www.w3.org/TR/xlink2rdf
- A Note exploring XLink's future use with existing XML vocabularies such as XHTML: http://www.w3.org/TR/xlink-naming
As an XML Standards Architect in Sun's XML Technology Center, Eve Maler specializes in the development of XML-related standards and vocabularies.
Eve was a charter member of the World Wide Web Consortium working group that created XML, and currently serves as Sun's voting representative to the W3C. She co-chairs its XML Linking working group and edits the XLink and XPointer specifications. Eve is co-author of Developing SGML DTDs: From Text to Model to Markup, the only book available on a methodology for designing DTDs.