Michael Shapiro <mshapiro@ncsa.uiuc.edu>
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
To learn the current status of any Internet-Draft, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).
This Internet Draft expires 17 Jan 96.
Last modified: Fri Sep 15 13:18:15 CDT 1995
This document is also available in HTML at:
http://www.hypernews.org/~liberte/www/path.html
Modifications of that document relative to the internet draft are shown in italic font (none so far).
The significant features of the path URN scheme include the following:
The resolution process is highly scalable due to several factors. Resolution is distributed as much as the named resources themselves are. (This also permits the resolution of names to be handled by servers that are motivated to maintain the service because they also serve the named resources.) The public hierarchy enables clients to make use of caches of resolver locations.
The resolution process is reconfigurable to support additional scalability and persistence of names in the event of relocations. The responsibility for resolution of a part of a name space may be delegated to another resolver or several parts of the name space may be recombined and resolved by a single server.
The resolution process has a built-in fallback mechanism in case the original resolver is uncooperative in forwarding references to resources that have moved.
The resolution and name assignment mechanisms are easily deployable since they use existing DNS technology and URL resolution schemes such as HTTP and FTP. Only a small amount of path-specific code is added to clients or proxy servers. Existing URLs may be automatically mapped to path URNs.
A path resolves first into a list of sets of equivalent URLs; that list is then resolved into the named resource using one of the URLs. The type of the resource is identified by the protocol of the particular URL that is used; if metadata for the resource is desired instead, the particular URL scheme may provide it. The path URN scheme does not depend on URCs.
Names of resources are assigned by naming authorities that are responsible for a subtree of the name space, and naming authorities may delegate naming responsibility to sub-authorities. The top-most naming authority in the hierarchy is known as the root naming authority. Each naming authority corresponds to a name resolution service; a name resolution service may be shared by several naming authorities.
A naming authority may create any new name for a resource as long as the encoding rules described below are met. Once a name has been assigned, it should never be assigned again for a different resource, as per the URN requirements. Naming authorities are responsible for meeting this uniqueness requirement.
A path name may be declared by the appropriate naming authority as the name of a collection of resources. Such a name must end with a final "/". The resource that a collection name resolves into is undefined by the path scheme protocol. Not all prefixes of path names are guaranteed to be names of collections.
An automatic mapping from most FTP and HTTP URLs to path URNs is feasible and will speed deployment. However, the generated names may not be appropriate for some HTTP URLs due to encoding requirements or misleading semantics, so some manual intervention or customization of the generation process will be required. Since the process is repeatable, the same generator service may be used as a URN lookup service given URLs. The generator service is not described in this document.
The resolution process is described in two steps. The first step resolves the name into an ordered list of URL-sets. The second step attempts to resolve URLs from successive sets in the list until the resolution succeeds or the list is exhausted.
The first step in the resolution process involves traversing the components of the path, left to right. Each component in the path (except the final opaque string) has two functions. One function is to provide a context for resolving the remainder of the path. The context for resolving the first component is the resolver for the root naming authority. The other function is to optionally provide a set of equivalent URLs (called a URL-set) constructed from URL-prefixes and the remainder of the path. All URLs in a set are equivalent in that each should resolve to the "same" resource, if it resolves at all.
The first step ends when no more URL-sets are found. The result is a list of URL-sets ordered from most specific to least specific, that is, in the reverse of the order in which they were discovered during the first step.
The second step is to resolve the list of URL-sets to the named resource. A URL (which may be a URN) is selected from the first set (e.g., randomly) and resolution of the URL is attempted. Any of the URLs may be URNs, even other path URNs. If the resolution fails because the URL service is unavailable (e.g. connection failure), another URL is selected from the set, until none are left; retries with exponential backoff may then follow, or the path resolution process may be declared a failure. Alternatively, if the resolution of a URL fails because the URL is unknown, then the process is repeated with the next set in the list. The process is repeated until the resolution succeeds or the list is exhausted (which implies resolution failure).
If the resolution of a URL results in a redirection to yet another URL, then that redirection should be followed to determine if it succeeds before declaring that the first URL has been resolved. A failure to resolve the redirection should be treated as the same kind of failure as a failure to resolve the first URL.
The resolution process may be dynamically reconfigured in a number of ways to meet the requirements of scalability and persistence.
Administrators of a resolution service may want to delegate resolution to sub-resolvers for one of two reasons: to reduce the load on a resolver, or to allow a sub-resolver to be located elsewhere on the internet.
                    /
                    |
     -------------------------------
     A1                           A2
     |                            |
--------------------------
B1*                    B2*
|                       |
----------              |
C1     C2*              C
                        |
                        D*
This section describes more details of the path scheme resolution process using existing capabilities of the Domain Name System (DNS) [3]. In principle, the path scheme protocol could use any global, hierarchical name system that provides the necessary functionality, but it is necessary to specify one protocol so clients and servers can communicate. The main reason for using DNS is that it is widely deployed and relatively stable.
The path name space may use the existing DNS name space, or a newly created name space within DNS devoted to the path name space, or some combination of both. (This draft does not specify which will be used.)
A small amount of new code is required on the client side to drive the resolution process, but generic proxy mechanisms available in many WWW browsers may be used with a path proxy server to share the process across a number of clients.
The "path-u" TXT record is followed by a single URL-prefix. Note that a URL-prefix is not necessarily a full URL; it specifies a resolution service and it is used to construct a full URL during resolution. There may be multiple "path-u" TXT records for a single DNS name, and each should logically specify equivalent resolution services.
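As a minimal sketch of the record format just described, the following Python function extracts URL-prefixes from a list of TXT record strings. The function name `parse_path_u` and the sample strings are assumptions made for illustration; the draft does not define an API, and a real client would obtain the strings from a DNS TXT query.

```python
def parse_path_u(txt_records):
    """Return the URL-prefixes carried by "path-u" TXT records, in order."""
    prefixes = []
    for record in txt_records:
        fields = record.split()
        # A "path-u" record is the keyword followed by a single URL-prefix.
        if len(fields) == 2 and fields[0] == "path-u":
            prefixes.append(fields[1])
    return prefixes


# Sample TXT strings (illustrative only).
records = [
    "path-u http://ietf.org/path/docs",
    "path-u http://www.org:70/docs",
    "some unrelated text record",  # TXT data without the keyword is ignored
]
print(parse_path_u(records))
```

Records that do not start with the "path-u" keyword are skipped here so that other uses of TXT records on the same DNS name do not interfere; that choice is an assumption, not something the draft states.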
The DNS step of the resolution process proceeds as follows.
If there are any "path-u" TXT records for a particular DNS name, then a URL-set is constructed from the URL-prefixes in the TXT records and the set is added to the head of the list. The URLs in a URL-set are constructed by appending the remaining components of the path and the opaque string to each URL-prefix.
For example, suppose that while resolving path:/A/B2/C1/doc.html, we discover that the TXT record corresponding to the DNS name b2.a. is
To clarify the above algorithm, some examples are presented. The examples use the partial document tree specified previously. The DNS entries for this partial tree are:
                TXT
a.              -none-
b1.a.           "path-u http://ietf.org/path/docs"
c2.b1.a.        "path-u http://www.org:70/docs"
b2.a.           "path-u ftp://ietf.org/path/docs"
d.c.b2.a.       "path-u http://www.org:70/docs"
                "path-u ftp://w3c.org/docs/www"
/A/B1/C1/doc.html
    a.          no "path-u" record
                repeat with b1.a.
    b1.a.       URL http://ietf.org/path/docs/c1/doc.html
                repeat with c1.b1.a.
    c1.b1.a.    unknown DNS name - done
    List of URL-sets is
        {http://ietf.org/path/docs/c1/doc.html}
/A/B2/C/D/doc.html
    a.          no "path-u" record
                repeat with b2.a.
    b2.a.       URL ftp://ietf.org/path/docs/c/d/doc.html
                repeat with c.b2.a.
    c.b2.a.     no "path-u" record
                repeat with d.c.b2.a.
    d.c.b2.a.   URL http://www.org:70/docs/doc.html
                URL ftp://w3c.org/docs/www/doc.html
                done
    List of URL-sets is
        {http://www.org:70/docs/doc.html, ftp://w3c.org/docs/www/doc.html}
        {ftp://ietf.org/path/docs/c/d/doc.html}
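The two traces above can be reproduced with a short Python sketch. It is an illustration under stated assumptions rather than a reference implementation: `MOCK_TXT` is a hypothetical in-memory table standing in for real DNS lookups, with records chosen to match the traces; `build_url_sets` is a hypothetical name; lowercasing of path components is inferred from the traces; and hex-encoded components are not handled.

```python
# Mock TXT data standing in for DNS. An empty list marks a name that exists
# but has no "path-u" record; names absent from the table are unknown.
MOCK_TXT = {
    "a.": [],
    "b1.a.": ["path-u http://ietf.org/path/docs"],
    "b2.a.": ["path-u ftp://ietf.org/path/docs"],
    "c.b2.a.": [],
    "d.c.b2.a.": ["path-u http://www.org:70/docs",
                  "path-u ftp://w3c.org/docs/www"],
}


def build_url_sets(path, txt_table=MOCK_TXT):
    """Step one: map a path such as "/A/B1/C1/doc.html" to a list of URL-sets."""
    *labels, final_part = path.lstrip("/").split("/")
    labels = [label.lower() for label in labels]
    url_sets = []
    dns_name = ""
    for i, label in enumerate(labels):
        dns_name = f"{label}.{dns_name}"      # a. -> b1.a. -> c1.b1.a.
        if dns_name not in txt_table:
            break                             # unknown DNS name: done
        prefixes = [record.split()[1] for record in txt_table[dns_name]
                    if record.startswith("path-u ")]
        if prefixes:
            remainder = "/".join(labels[i + 1:] + [final_part])
            # More specific sets are added to the head of the list.
            url_sets.insert(0, [f"{p}/{remainder}" for p in prefixes])
    return url_sets


print(build_url_sets("/A/B1/C1/doc.html"))
print(build_url_sets("/A/B2/C/D/doc.html"))
```

Inserting each new set at the head of the list yields the most-specific-first ordering directly, so no reversal is needed at the end.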
After constructing a list of URL-sets, it must be resolved into the named resource. The list of URL-sets could itself be an object that may be passed back from proxy servers to clients or cached for later use. But here we describe the resolution of the list of URL-sets into the named resource independent of which agent resolves it or whether it is a first class object.
A URL is selected from the first set (e.g., randomly) and resolution of the URL is attempted. Any of the URLs may be URNs, even other path URNs. The appropriate protocol, as indicated by the scheme of the URL and user preference, is used to resolve it. For example, if the URL were http://ietf.org/path/docs/c1/doc.html, then the HTTP protocol is typically used to resolve that URL using the GET method.
If the resolution fails because the URL service is unavailable, another URL is selected from the set, until none are left; retries with exponential backoff may then follow, or the path resolution process may be declared a failure. Resolution may fail because the server doesn't exist, or the connection times out before or after it is made, or the server returns an error code indicating that the service is unavailable.
Alternatively, if the resolution of a URL fails because the URL is unknown, then the process is repeated with the next set in the list. The process is repeated until the resolution succeeds or the list is exhausted (which implies resolution failure).
If the resolution of a URL results in a redirection to yet another URL (which may be a URN), then that redirection should be followed to determine if it succeeds before declaring that the first URL has been resolved.
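The selection and failure rules of this second step can be sketched in Python as follows. The function `resolve_url_sets`, the three status constants, and the `fetch` callback are assumptions introduced for illustration. `fetch` is assumed to follow any redirections itself and to report the final outcome, and the sketch declares failure when a whole set is unavailable rather than retrying with backoff.

```python
import random

OK, UNAVAILABLE, UNKNOWN = "ok", "unavailable", "unknown"


def resolve_url_sets(url_sets, fetch, rng=random):
    """Step two: try URL-sets in order; return (url, resource) or None."""
    for url_set in url_sets:
        candidates = list(url_set)
        rng.shuffle(candidates)               # e.g., random selection
        for url in candidates:
            status, resource = fetch(url)
            if status == OK:
                return url, resource
            if status == UNKNOWN:
                break                         # move on to the next URL-set
            # UNAVAILABLE: try another URL from the same set
        else:
            # Every URL in the set was unavailable: declare failure
            # (a caller could instead retry with exponential backoff).
            return None
    return None                               # list exhausted


def fake_fetch(url):
    # Hypothetical outcomes for the URLs of the second worked example.
    outcomes = {
        "http://www.org:70/docs/doc.html": (UNKNOWN, None),
        "ftp://w3c.org/docs/www/doc.html": (UNKNOWN, None),
        "ftp://ietf.org/path/docs/c/d/doc.html": (OK, "<document>"),
    }
    return outcomes.get(url, (UNAVAILABLE, None))


url_sets = [
    ["http://www.org:70/docs/doc.html", "ftp://w3c.org/docs/www/doc.html"],
    ["ftp://ietf.org/path/docs/c/d/doc.html"],
]
print(resolve_url_sets(url_sets, fake_fetch))
```

With these made-up outcomes, the first set reports the URL as unknown, so resolution falls through to the less specific set and succeeds there.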
(This section will describe what administrators of naming authorities and resolvers need to do to manage their portion of the path name space.)
The encoding rules may vary depending on the underlying implementation, but, again, we assume DNS is used. Therefore, the components of a path must be compatible with DNS <label> names. Hex encodings must be used for uppercase characters in the name that are to be distinguished from the corresponding lowercase characters. Hex encoding is also required for dot ("."), the DNS component separator, and slash ("/"), if it is used within a component name or the opaque string. Here is a BNF description of the encoding rules.
<path-urn>   ::= "path:" <name>
<name>       ::= <path> "/" [ <final-part> ]
<path>       ::= "" | "/" <label> [ <path> ]
<final-part> ::= any ascii character except "/"
<label>      ::= any ascii character except "/" or "."
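One way to check a name against this grammar is sketched below in Python. It reads <label> and <final-part> as sequences of the permitted characters, which is an interpretation of the BNF rather than something the draft states; the function name `parse_path_urn` is hypothetical, and hex decoding is left out because the exact encoding syntax is not given here.

```python
import re

# A label may not contain "/" or "."; the final part may not contain "/".
_PATH_URN = re.compile(r"path:((?:/[^/.]+)*)/([^/]*)")


def parse_path_urn(urn):
    """Split a path URN into (labels, final_part); raise ValueError if malformed."""
    match = _PATH_URN.fullmatch(urn)
    if match is None or not urn.isascii():
        raise ValueError(f"not a valid path URN: {urn!r}")
    path, final_part = match.groups()
    labels = path.split("/")[1:] if path else []
    return labels, final_part


print(parse_path_urn("path:/A/B1/C1/doc.html"))
print(parse_path_urn("path:/A/B1/"))   # a collection name: empty final part
```

A name ending in "/" parses with an empty final part, which corresponds to the collection names described earlier.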
The decentralized path scheme is arguably less vulnerable to attack than are centralized services.
The path scheme depends on DNS for most of the resolution process, and insofar as DNS is secure or insecure, so is the path scheme. A more complete reference of relevant weaknesses should be included here.
The hierarchical path scheme allows security constraints to be imposed on just the subtree of names that require it. The resolution process hides whether a name actually is resolvable by first requesting authentication.
Daniel LaLiberte
MIT/LCS/W3C
NE143-344
545 Technology Square
Cambridge, MA 02139
liberte@w3.org

Michael Shapiro
National Center for Supercomputing Applications
152 Computing Applications Building
605 East Springfield Avenue
Champaign, IL 61820
Tel: (217) 244-6642
mshapiro@ncsa.uiuc.edu

draft-ietf-uri-urn-path-01.txt