Scalability

Here is my understanding of scalability. As the world grows, and more people get on the internet, more people will be requesting more data and services from remote places on the net. Since the capabilities of the net and servers remain relatively constant (bandwidth and processing power grow, but not as fast as the number of people on the net), it is necessary to move data and services closer to clients. This is why data caching is essential, along with replication (which is essentially preemptive caching). Caching of services is more problematic, but it can also be done. More often, services are merely manually replicated.

However much is cached, some things cannot be cached. So the next step, to remain as scalable as possible, is to distribute the load both for finding out how to access the data or services and for actually accessing them. A hierarchical cache or a web of caches could help here as well. But for scalability, we have to assume the size of each cache remains relatively constant.
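As a rough illustration of a hierarchical cache in which every cache has a fixed size, here is a minimal sketch. The names `CacheNode`, `origin`, `regional`, `local_a`, and `local_b` are hypothetical, and least-recently-used eviction is an assumption made only to keep each cache bounded; nothing above prescribes a particular eviction policy.

```python
from collections import OrderedDict


class CacheNode:
    """A fixed-capacity cache that asks its parent (or the origin) on a miss."""

    def __init__(self, name, capacity, parent=None, origin=None):
        self.name = name
        self.capacity = capacity      # stays constant as the system grows
        self.parent = parent          # next cache up the hierarchy, if any
        self.origin = origin          # callable used only by the top-level cache
        self.entries = OrderedDict()  # insertion order doubles as LRU order

    def get(self, key):
        """Return (value, source), where source names whoever answered."""
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key], self.name
        if self.parent is not None:
            value, source = self.parent.get(key)
        else:
            value, source = self.origin(key), "origin"
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        return value, source


origin_hits = []  # records every request that reaches the remote server


def origin(key):
    """Stand-in for the remote server (hypothetical)."""
    origin_hits.append(key)
    return "content of " + key


regional = CacheNode("regional", capacity=4, origin=origin)
local_a = CacheNode("local-a", capacity=2, parent=regional)
local_b = CacheNode("local-b", capacity=2, parent=regional)
```

In this sketch, calling `local_a.get("doc1")`, then `local_b.get("doc1")`, then `local_a.get("doc1")` again is answered by the origin, the regional cache, and the local cache respectively, so the origin sees only one request.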

URNs

A global name system requires some degree of centralization. If a name to be resolved is not known in a local cache, then we must somehow avoid asking a central authority how to resolve it, to avoid overloading it. A hierarchical name scheme allows us to try to resolve a prefix of the name, where the prefix corresponds to a naming authority, and once that is found, it may be asked to resolve the full name. I don't know that a hierarchical name scheme is required, but I don't know any other scheme that could work.
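The prefix idea can be sketched as follows. This is only an illustration under stated assumptions: names are taken to be colon-separated, and the function `find_authority`, the table `known_authorities`, and the resolver host names are all hypothetical.

```python
def find_authority(name, known_authorities, sep=":"):
    """Return (prefix, resolver) for the longest known prefix of name.

    Returns None when no prefix of the name matches a known authority.
    """
    parts = name.split(sep)
    # Try the whole name first, then progressively shorter prefixes.
    for end in range(len(parts), 0, -1):
        prefix = sep.join(parts[:end])
        if prefix in known_authorities:
            return prefix, known_authorities[prefix]
    return None


# Hypothetical local knowledge: prefix of a naming authority -> its resolver.
known_authorities = {
    "urn:example": "resolver.example",
    "urn:example:library": "resolver.library.example",
}
```

With this table, `find_authority("urn:example:library:book-17", known_authorities)` picks the more specific `urn:example:library` authority, while a name under `urn:example:press` falls back to the `urn:example` authority, which may then be asked to resolve the full name.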

The issue of URN scalability should be divided into two separate issues: the scalability of name assignment and of name resolution. Assignment of trillions of names is not really very difficult, in my opinion, given even a modest subdivision of the name space. Having a large enough name space is also fairly easy, but there should probably be at least some subdivision of the name space if only to enable efficient assignment of names. The implication of having a structured name is that not all possible names will be used, so the name space needs to be larger to accommodate the wasted space.
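To give a feel for how little the wasted space costs, here is a small calculation sketch. The function `bits_needed` is hypothetical, and the 1% utilization figure used in the note after it is an arbitrary assumption chosen for illustration, not a measured value.

```python
from fractions import Fraction
from math import ceil


def bits_needed(names, utilization=Fraction(1)):
    """Smallest number of bits whose name space holds `names` names
    when only the fraction `utilization` of the space is actually used."""
    slots = ceil(Fraction(names) / Fraction(utilization))
    # (slots - 1).bit_length() equals ceil(log2(slots)) for slots >= 1.
    return max(slots - 1, 0).bit_length()
```

For a trillion names, `bits_needed(10**12)` gives 40 bits with no waste, and `bits_needed(10**12, Fraction(1, 100))` gives 47 bits if only 1% of the structured space ends up used, so even heavy waste adds only a few bits to the name.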

The real problem comes in the resolution of URNs since that happens many times for each document that has been created. Therefore, the scalability requirements for URNs must be geared toward solving the name resolution problem, not the name assignment problem.

To be scalable, I believe the name space must be hierarchically structured. Whether the hierarchy is at the level of naming authorities or within each naming authority (as a prefix to the opaque string) makes no difference as long as it is a standard, public hierarchical scheme. That is, it is not sufficient for the hierarchy to be hidden within opaque strings, known only to name resolvers. The reason for this standard, public hierarchical scheme is so that resolution of URNs may be delegated to resolvers that either know how to resolve a particular URN or know where to delegate the resolution, and furthermore so that clients may also know which resolver to go to directly, or which one is close if a direct one is not known.
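Delegation along a public hierarchy might look like the following sketch. The `Resolver` class, the colon-separated name syntax, and the example names and locations are all assumptions made for illustration; this is not a description of any actual URN resolution protocol.

```python
class Resolver:
    """Resolves names it knows; otherwise delegates by public prefix."""

    def __init__(self, prefix):
        self.prefix = prefix   # the public hierarchical prefix served here
        self.locations = {}    # full name -> location, for names known here
        self.delegates = {}    # longer prefix -> Resolver for that subtree

    def delegate(self, child):
        """Register a resolver responsible for a longer prefix."""
        self.delegates[child.prefix] = child

    def resolve(self, name, path=None):
        """Return (location, path), where path lists the resolvers visited."""
        path = (path or []) + [self.prefix]
        if name in self.locations:
            return self.locations[name], path
        # Delegate to the most specific child whose prefix matches the name.
        best = None
        for prefix, child in self.delegates.items():
            if name.startswith(prefix + ":"):
                if best is None or len(prefix) > len(best.prefix):
                    best = child
        if best is None:
            raise KeyError(name)
        return best.resolve(name, path)


# Hypothetical three-level hierarchy.
root = Resolver("urn")
example = Resolver("urn:example")
library = Resolver("urn:example:library")
root.delegate(example)
example.delegate(library)
library.locations["urn:example:library:book-17"] = "http://library.example/book-17"
```

A client that knows nothing can start at `root` and is passed down through three resolvers, while a client that can read the prefix out of the name can call `library.resolve(...)` directly and reach the answer in one step. That shortcut is only available because the hierarchy is visible in the name itself.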

For the same reasons that DNS cannot be a flat name space, URNs cannot be a flat name space. The URN resolution problem is several orders of magnitude more difficult than domain name resolution.

In IETF documents, I've seen references to the *possible* need for a hierarchical name space, but I believe scalability *requires* it, or at least, I haven't seen any argument to the contrary.


Daniel LaLiberte (liberte@ncsa.uiuc.edu) NCSA 152 CAB 605 E Springfield Champaign, IL 61820
Last modified: Fri May 23 17:02:54 CDT 1997