Names and addresses
Another whack at a permathread in web architecture. A new URI scheme is not necessary to, nor does it actually, solve the perceived problem of names and addresses.
What's in a name? That which we call a rose By any other word would smell as sweet
Time and again, we see individuals and organizations inventing new URI schemes in order to tackle the problem of “names” versus “addresses”. That is, they want to provide some sort of a globally unique identifier for “This Thing” independent of where representations of that thing might reside. Almost inevitably, these individuals and organizations fall into the trap of thinking that an “http” URI is somehow an address and not a name and is, therefore, inappropriate for their purpose. They are mistaken. I used to believe this too and I was wrong. A new URI scheme is not necessary, nor does it actually solve the problem.
I fear that most of this essay will recapitulate arguments already presented in URNs, Namespaces and Registries, a TAG Finding under development by Henry Thompson and David Orchard, but this misunderstanding about the nature of URIs is so common, I think it probably bears repeating. (I'm not going to recapitulate all of the arguments, so please don't consider this essay any sort of substitute for the finding.)
By any other name…
If I hold up an object, an object that you have never seen before and which you do not recognize, and I tell you the name of this object is “HB88”, then that is its name. You might invent other names for it, “that weird cube”, “the object Norm held up in the meeting”, “Fred”, but you have no grounds to argue that “HB88” is not its name. One reason names exist is to facilitate the social process of communication. If you walk up to me after the meeting and ask, “Norm, can I see HB88?” and I respond, “Huh? What's that?” then I've violated your social expectation about names. But let's agree that we'll uphold social expectations. You'll ask me for HB88 and I'll hand it to you.
In fact, there is a potential problem with the name “HB88”: it might be ambiguous. There might be a dozen things named HB88 in the world and so I might not know which one you meant. But we're really going to be talking about URIs so we can avoid that problem.
Now, clear your mind of preconceptions. No preconceptions. Are you ready? If I hold up another object and tell you that the name of this object is “http://norman.walsh.name/knows/what#nikon-5700”, then that's its name. Don't try to decode that string for a moment, just accept it as a string. I assert that there's nothing intrinsic about that string that makes it less suitable as a name than “HB88”. Ok, it's sort of long and it's hard to pronounce, but let's agree those aren't important problems. I think we can agree that, in principle, it's just a string and if I use it as a name, it is a name.
Trouble is, the chances are good that you've been using the web for a while and you've become used to typing strings like “http://…” into your web browser. You do have some preconceptions about those names. I'll bet that many of you recognize “norman.walsh.name” as the name of a particular machine somewhere on the network and that some among you recognize that “http” identifies a protocol that can be used to transmit representations of the resource over the network.
Consequently, you might argue that the “http” name is a bad choice. It is tied to a particular machine on the network and requires a particular protocol to access it. You are all too aware that DNS registrations expire and may change hands and that deleting or renaming files on that machine is likely to change the retrieval characteristics of the resource in question.
Instead, you might propose that the name “newscheme:x:y:n5700” is a much better name because it doesn't suffer from any of those problems.
But wait a minute. You've gotten way the heck out in front of me. We were talking about names. We weren't talking about retrievability or persistence or anything like that. None of those arguments has anything to do with names. As names, “HB88”, “http://norman.walsh.name/knows/what#nikon-5700”, and “newscheme:x:y:n5700” are entirely equivalent. They're just strings.
It turns out that what you really want then is a name that can be created in a distributed fashion, unambiguously identifies the resource you intended, is persistent, and can be used to retrieve representations. Ok, on that basis, let's toss “HB88” as inadequate and concentrate on the other two names.
Distributed names
There are lots of ways to achieve distributed naming, but most of them are pretty useless for this task because we don't just want names, we want (reasonably) memorable names. Computers have no trouble dealing with UUIDs, but humans just can't cope.
That makes the distributed naming task a social one. First, we establish some space for names, then we set up some system for dividing up the naming space, and then we set up some system for handing portions of it out to the folks that need to make up names.
In the “http” case, I've leased a hunk of the naming space by registering norman.walsh.name, so I get to make up names that begin “http://norman.walsh.name/” and you don't.
There's no technical solution to the problem of unlicensed use of names (making up names in a part of the space you don't have license to use), so that's an orthogonal issue.
In the newscheme case, we'll have to create some other organization to create, divide, and maintain the naming space. If I had to place a bet, I'd gamble that the organization that maintains DNS names will outlast any new organization created to manage “newscheme” names, but I suppose that's not really an issue for organizations with deep enough pockets.
The extent to which either kind of name satisfies our requirements for distributed names is dependent on the mechanisms that are established to facilitate their creation.
Unambiguous names
Assuming that there are no bugs in our system for assigning distributed names, the extent to which any individual name is ambiguous is an entirely social issue.
There's nothing that can be done technically to prevent me from using the same name for two different things. As long as there's nothing done technically that requires me to use the same name for two different things, it's just a matter of diligence and trust.
On this score, the “http” name and the “newscheme” name are indistinguishable.
Persistent names
The hard part of persistence, like distributed naming and ambiguity, is social. The only thing that makes a name not persistent is if someone uses it ambiguously. We've already agreed not to do that, so there's really no problem.
In the case of the “http” name, the only part of the name that appears beyond my control is the DNS name. If I fail to maintain my registration, it may get reassigned and the new owner may not feel any obligation to maintain the unambiguous nature of the names I assigned. The solution for this problem seems straightforward to me. If you're worried about the persistence of my names, ask me to demonstrate that I've purchased a 10 year lease on the domain name, or a 100 year lease, or a lease in perpetuity if you've got the legal framework to do that. Problem sorted.
In the case of “newscheme” names, users actually have to be persuaded to follow the mandate of using only the portions of the namespace that they've registered (considering the widespread use of unregistered URI schemes and unregistered URN namespaces, there's no reason to be optimistic on this score), the organization created to manage the naming space has to exist indefinitely, and it has to successfully manage the naming space.
Again, on this score, I think the safe money is on the DNS system.
Retrievable names
One common social contract is that a name, once created, will always refer to the same thing or sequence of bits. So, if I ask you for something with a particular name and you return “101010”, I can be certain that you will always return “101010” when I ask for that name. This is a matter of trust between us. Unless the sequence of bits is actually encoded in the name, there is nothing about the name per se that can enforce this constraint.
The same is true for any representation and any negotiated set of acceptable alterations to the sequence of bits (or whatever is returned).
So in terms of fidelity, all names are created equal.
By extension, if we agree to allow copies of the resource to be distributed across the network, the extent to which we can be sure that all the copies have appropriate fidelity is independent of the name of the resource.
So the question becomes simply, is the name retrievable?
On this score, the “http” name has a clear advantage. There already exists a huge, deployed infrastructure for handling efficient, cached distribution of resources over the HTTP protocol and tying the “http” name to that infrastructure is dead obvious.
It's important to note two things here:
- Contrary to what you may believe, there is nothing about the “http” URI scheme that requires use of the “http” protocol. And even where the “http” protocol is used, there's nothing about it that requires access to any particular machine. On your desktop, your web browser may return things from its cache without ever hitting the web. On my desktop, I've got even more infrastructure in place. An attempt to retrieve an http: scheme URI starts by looking up that URI as a name in a table and returning a local copy of the resource if it exists. If it doesn't exist in that table, it goes to a local proxy which may return a cached copy without ever hitting the web. Beyond that, the request goes off into the web where other caching proxies may come into play.
- Even in the case where there are no cached representations and you're using HTTP to connect to a particular host name, there is no reason to believe that a given host name (e.g., “norman.walsh.name”) refers to a single, physical machine. Although my domain is on a single machine, the W3C domain is hosted on at least a half dozen machines around the world. So an attempt to access a resource on www.w3.org doesn't actually imply a network transfer across the globe to some machine in Cambridge, MA. On an even bigger scale, companies like Akamai have built a business around transparent, global distribution of enormous quantities of information.
So there's nothing unsuitable about “http” names from the perspective of retrievability.
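The lookup order described in the first point above (local table, then cache, then the web) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual desktop setup described in the essay: the `resolve` function, the `catalog` and `cache` dictionaries, and the `fetch` callback are hypothetical names, and the network step is stubbed out.

```python
from typing import Callable, Dict, Tuple


def resolve(uri: str,
            catalog: Dict[str, bytes],
            cache: Dict[str, bytes],
            fetch: Callable[[str], bytes]) -> Tuple[bytes, str]:
    """Return (representation, source) for an http: URI used as a name.

    All parameter names are hypothetical; `fetch` stands in for whatever
    actually goes out to the network.
    """
    # 1. Treat the URI as an opaque key in a hand-maintained local table.
    if uri in catalog:
        return catalog[uri], "catalog"
    # 2. Fall back to a cache that remembers earlier retrievals.
    if uri in cache:
        return cache[uri], "cache"
    # 3. Only now go out to the network, and remember the result.
    body = fetch(uri)
    cache[uri] = body
    return body, "network"


if __name__ == "__main__":
    name = "http://norman.walsh.name/knows/what#nikon-5700"
    local_table = {name: b"local copy"}
    # The stub fetcher is never called here: the table answers first.
    print(resolve(name, local_table, {}, lambda u: b"fetched"))
```

Nothing in the first two steps involves the HTTP protocol or any particular machine; the URI is used purely as a name.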
For the “newscheme” names, making them retrievable would require deployment of an entire infrastructure in parallel to the infrastructure that already exists for HTTP.
In practice, this is so expensive, difficult, and impractical, that most systems defer to HTTP for the actual retrieval. So the mechanism for retrieving “newscheme:x:y:n5700” is to retrieve “http://example.org/resolver?newscheme:x:y:n5700” (or some such mechanism) which makes the actual bits returned subject to exactly the same issues as simply using an “http” name in the first place.
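A minimal sketch of that deferral, assuming a resolver like the one in the example above: the `RESOLVER` address is the essay's placeholder and `resolver_url` is a hypothetical helper, not part of any real scheme's tooling.

```python
from urllib.parse import quote

# Placeholder resolver address taken from the example in the text.
RESOLVER = "http://example.org/resolver"


def resolver_url(name: str) -> str:
    """Map a non-http name onto the http URI that would actually be fetched."""
    # Keep ':' readable; percent-encode anything else that is unsafe in a query.
    return f"{RESOLVER}?{quote(name, safe=':')}"


if __name__ == "__main__":
    print(resolver_url("newscheme:x:y:n5700"))
```

Whatever comes back is fetched from an “http” URI, so the new name inherits the DNS and HTTP dependencies it was meant to avoid.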
Pay up!
Beyond the fact that there's no reason to invent a new scheme, one of the things I find personally irksome about most of the proposals for new schemes is that the organizations proposing them want me to pay money for the privilege of using them.
I've already paid to register my domain name. I pay a hosting company to provide a server that responds to requests to access that domain name. I need to pay for a different URI scheme, why exactly?
In fairness, the organizations need to make enough money to stay in business so that they can live up to their obligations of managing the naming space and the other social aspects of names. Problem is, I don't see them offering anything I don't already have.
URIs are names
They're all names. There's no technical reason to invent new URI schemes to address the goal of providing names that can be created in a distributed fashion, that unambiguously identify a resource, are persistent, and can be used to retrieve representations.
I hope this essay helps to clarify that we already have all we need.
Comments
The practical problem I need to deal with is the identification of abstract things like books (identified with ISBNs) and articles (using DOIs). If I create a citation in a document with a URI like "urn:isbn:23429834" I'm far more confident in future interoperability, retrieval, etc. than I would be if I used, say, an Amazon URL for the same book.
How would those fit in your argument? Or is your argument here against something else?
I don't know what guarantees (if any) Amazon makes about the long term stability of their URIs. If they do make any such guarantees, then I think the Amazon URI is probably a better choice. The particular problem with ISBN numbers is that publishers reuse them, so they aren't really very good identifiers.
In this particular case, I wish that the registrar for the URN namespace "isbn" had instead set up http://isbn.org/ so that http://isbn.org/23429834 had the requisite distributed naming, persistence and ambiguity constraints.
As an added benefit, if typed into a browser, it could return bibliographic data about the publication.
This might be a slightly OT question, but how can you decide whether http://kosek.cz is a name for me or a name for my homepage? Until now, no RDF geek has been able to answer my question. Topic Maps can solve this problem by using two different types of "names": subjectLocator and subjectIndicator. But RDF supports only one type of identifier.
OK, so your point is more about the organizations that invent new schemes than about the users who are forced to live with them.
ISBNs may be problematic, but they're certainly much more useful than the old BiBTeX methods: doe99.
The Library of Congress has their catalog numbers, but then they have also signed on to the info URI scheme, so that you can have "info:lccn/334546656" (or whatever it'd be).
DOIs, OTOH, can be represented with http, with their own (unregistered?) "doi" prefix, or with the info scheme.
All of this makes knowing what to do as a user and developer rather tricky, suffice to say!
"so I get to make up names that begin “http://norman.walsh.name/” and you don't"
http://norman.walsh.name/foo-foo-magoo
Ha!
Nice essay otherwise! 8-)
Jirka, I assume you own kosek.cz, so you get to decide.
Yes, Mark. I know. There aren't any technical solutions to that problem.
Very good writing.
About the DNS name: I remember commenting on it when Web Arch was in last call. For me, the DNS is bad in the sense that it's really a part of the infrastructure which relies on "private property" (economy), and not necessarily on social usage.
For things like ISBN - http://worldcatlibraries.org/wcpa/isbn/274275525X
I think everything is right here, except the part about unambiguous names. This is where the business about leasing the domain for a long time (or in perpetuity) belongs, not under persistent names. A name can be persistent even after the namer is gone and forgotten (who remembers who named London, though he probably spoke proto-Welsh?).
But since there are (AFAIK) no actual domain-name leases in perpetuity, there is every chance that the same name will be used legitimately by two different non-simultaneous owners of the same domain, particularly if it's an obvious one like "http://www.example.com/blog/atom.xml". (At least four people have owned hack.com that I know of for sure.) That's an ambiguity problem, not a persistence one.
The cure, of course, is to make sure a date reflecting the ownership of the domain gets into the name. There are a variety of possibilities here, like the W3C using URLs like http://www.w3.org/2001, or tag: URIs. Or there's my favorite, newsml: URNs, which are of the form "urn:newsml:domain:date:serial:version", where "domain" is any domain owned by the namer, "date" means any date at which the namer had rights to the domain, "serial" is any unique string private to the namer, and "version" is irrelevant for this purpose (can always be 1).
Regarding the issue of URI persistence and stability, there are services and tools like purl to ensure that the URIs do not change even in the case of relocating the "resources" they identify in a different machine or domain.
There are in fact a number of solutions. The simplest one, proposed by Tim Berners-Lee, is that you use fragment identifiers to identify things or concepts. Since what is identified by a fragment identifier depends on the MIME type of the returned document, this can be done.
Hence "http://www.w3.org/People/Berners-Lee/card#i" refers to Tim Berners Lee, but "http://www.w3.org/People/Berners-Lee/card" refers to his foaf:PersonalProfileDocument .
There are other ways involving http redirects to help you distinguish the one from the other, in usage.
Of course a name can name anything: documents, things, concepts, ... so it is not by looking at a name itself that you are going to be able to tell.
The very nice thing about using URLs for names is that one can get self describing concepts. See GET my meaning?
"http://norman.walsh.name/foo-foo-magoo Ha!"
Of course you can create names using other people's name spaces.
BUT:
- legally you are not the owner of the domain, so you don't control that name. IE. as mentioned in the article, one would be within one's rights to deduce from your non ownership of the domain that your coining was very unstable. If you made claims to the contrary you would probably be lying.
- technically: you can't put a meaning at the URL location, which means that people won't be able to GET your meaning. That is essential. Because names for which one can just GET the meaning will spread a lot faster, than names for which one can't.
No, the domain names are not persistent, even if your expectations are low. There are no companies which rent eternal leases and, moreover, even with a ten-years lease, your domain name can be hijacked or reverse-cybersquatted through an UDRP. If you trust your registrar to never make any mistake, you are wrong :-)
There are possible solutions but all are at the expense of resolvability, as you point out. My favorite is the "tag" URI (RFC 4151), which includes the date in the "domain name" so such names are really permanent (but are not resolvable until someone builds a DNS time machine, maybe on top of a VCS).
So, for instance, tag:norman.walsh.name,2006-07-25/blog/names-and-addresses is a really permanent name.
When you say "I think the safe money is on the DNS system" and explain that the other registries (such as the very closed IDF for the DOI names) are not safer than the DNS system, I agree.
But there are names that do not require a registry:
* names chosen at random in a large space (OK, they are not memorable but they are free and wild),
* names that depend on a registry for the creation but not for the maintenance (such as tag URIs).
When you say "On your desktop, your web browser may return things from its cache without ever hitting the web", you are playing with words, I believe. The cache is just a local optimization, it changes nothing to the semantics of http URIs.
And when you say "On my desktop, I've got even more infrastructure in place. An attempt to retrieve an http: scheme URI starts by looking up that URI as a name in a table and returning a local copy of the resource if it exists", it is even worse. I get the point: http URIs may be resolved by local means (like it is commonly the case with XML catalogs). But, again, it is a local optimization, a sort of cache managed by hand. And you do not suggest that everyone has the same table as you have: you rely on the Web to carry the authoritative version.
Stéphane, I don't think the issues of domain hijacking are really relevant. Malicious abuse of any system will cause problems. And even though indefinite leases aren't easily available, the overhead for actually achieving one is pretty low. Even a fairly conservative investment of $5,000 would yield enough return over ten years to buy another ten year lease. So that simple investment and some trust arrangement set up with a law firm would suffice.
Given that any organization that really cares about the long-term persistence of its URIs can keep its domain name in perpetuity with relative ease, I don't think the tag: scheme buys you very much. It's only relevant if I lose control of the domain name and if the future owner abides by the rules of RFC 4151.
I didn't mean to be "playing with words", though perhaps my point was not well made.
Proponents of "newscheme:" URIs often point out that one can establish some other sort of distributed resolution system for them. Whatever benefits (if any) that such a system may have, my point is simply that you can use http: URIs in that system. You don't have to invent a new scheme in order to use URIs to address resources stored on a magtape across the country, even if "retrieval" will involve shipping the physical magtape to me by overnight carrier.
I used to believe in the "http URI as names" mantra, and thought that it was OK to use an http address as a namespace name. It had the benefits of using the social contract of not stepping on someone's DNS registration to give globally unique names, and it gave casual readers a hint of who coined the name in the first place.
Unfortunately, XSD schema and some other later technologies and "findings" such as the one quoted in your article changed all that. These lead to an expectation that it is sensible to attempt an http GET on an http URI used as a namespace name, this despite the fact that the Namespace spec goes to such lengths to stress that a) the identity rules for namespace names and URIs are different and b) it isn't a goal of the namespace spec to provide URIs for schema retrieval.
If you publish a specification of an XML language using an http URI on a real server that you control, then you need to be prepared to accept requests on that server forever; even if you just send back a 404 page each time, that is still a cost. It is true that clients _could_ configure their caches not to hit my server every time every instance document is parsed, but there is nothing I as a language specifier can do about that. Well, what I can do is use data: URIs for namespace names, or of course an http URI to a non-existent server.
David
Norman, you write, about the tag URIs, "It's only relevant if I lose control of the domain name and if the future owner abides by the rules of RFC 4151." But it is not true. The future holder has nothing to do. tag URIs will work whatever the future holder will do (and even if there is no future holder) because they identify the domain at a point in time.
"As an added benefit, if typed into a browser, it could return bibliographic data about the publication."
isbn.nu provides exactly this service in exactly this way, and also returns price comparisons.
uri crisis again.
I always say that when this thread comes up to find it back on google later.
Norman, it seems that you are right in most points. A new uri scheme doesn't solve the problem in question. Keep the faith.
some links and a similar impressive load of comments are on my post on the same thing last year: http://leobard.twoday.net/stories/1165470/
Trick question, just for the fun of it: What is the uri of "love"?
"isbn.nu provides exactly this service in exactly this way". Not at all! It is just a personal Web site (see http://isbn.nu/about.html) which is nice and useful but offer no persistence at all.
Also, I tried the first book on my desk "Mondialisations et technologies de la communication en Afrique", ISBN 2-84586-547-3 and I got a "Title not found".
"these lead to an expectation that it is sensible to attempt a http GET on an http URI used as a namespace name.....even if you just send back a 404 page each time that is still a cost."
One way to reduce the cost is to use a non-existent subdomain in your namespace names. For example, if I own example.com, I could begin any namespace names I invent with http://namespaces.example.com/, but not actually set up a DNS record for namespaces.example.com. You'll get DNS queries for the non-existent subdomain, but nothing more. And if you don't host your own DNS server, there's no cost to you.
Even if you lease your domain name for 1000 years, you could still lose it next year in a trademark dispute.
And anyway, why should only companies that can afford to spend $5000 on DNS be able to create globally-unique names? tag: URIs let anyone do that now.
"What's in a name? That which we call a rose By any other word would smell as sweet" —William Shakespeare
Yeah, but I sure had a frustrating time when I called my florist to order a dozen gets.
NWalsh >>> so I get to make up names that begin “http://norman.walsh.name/” and you don't"
MBaker >> http://norman.walsh.name/foo-foo-magoo
NWalsh > Yes, Mark. I know. There aren't any technical solutions to that problem.
That's because no one can give exclusive rights to the creation of particular names. This would be like saying that only certain people can add new words to the English language. Impossible.
The only thing that DNS allows is for someone to have control over what the names represent (when we use DNS and supporting technologies). Of course, it is more or less futile to create names using the HTTP scheme which contains a domain name that you don't control the resources of. And that's why people (wrongly) think they control the names themselves.
This debate has been going around in circles for several years.
I agree URIs are very good names and can have all the useful properties of other schemes if we make it that way.
However, as the examples include the ISBN, it is necessary to point out that a single ISBN identifies many physical objects and usually one logical object. In other words, it is a name that can be re-used in many different contexts and as such probably appears within many URIs: Amazon etc., library OPACs, metadata harvest URIs and so on.
Is the solution to recognise there are two different purposes: a unique name for a singular thing such as your personal object, an address or a location, and a unique name for a conceptual thing that can be used unambiguously in the name for a single thing? Use a URI in the first instance, use some other scheme such as URN in the second and if necessary embed the URN identifier in the URI. This may mean that URIs cease to be totally opaque but it provides redundancy to support persistence.
Bill
Re: the book discussion, the OCLC has just opened up its new web catalog, complete with nice URIs like http://www.worldcat.org/isbn/... and http://www.worldcat.org/oclc/.... Their numbers may well be better IDs than ISBNs, it seems to me.
I may be missing something here, but surely the key point is that any permanent naming standard should use the DNS system for resolution of hosts rather than some other, arbitrary registry?
ie. Is ftp://my.server.com a worse choice of URI than http://my.server.com? What about xmpp:john.smith@my.server.com/laptop?
The first two are widely implemented and the third is an open standard. Other factors, such as a lack of caching may not be a problem or even desirable in certain instances.
In all cases, the applications need to be aware of the meaning of the URI, but as long as the DNS server routes them to an appropriate server to put in a request, does that matter?
I suppose what I am asking is, is "http URI" just shorthand for "any URI that uses DNS for resolution"?
If a user requires that a URI must be persistent and return the same object on retrieval, then that user is quite free to implement their own canonical cache and/or a mechanism like an XML catalogue to access it.
Real objects are not persistent. People die, move home, buy a new car etc. Why should network objects (sorry, names) be any different? Just because a computer is involved many expect to be presented with an idealised reality. Computers and programmers are not that omnipotent.
URNs tried to address this issue but what you end up with, after N-levels of redirection is a name beginning with http://...!
I suspect new namespace proponents are the same crowd lobbying for patents on software, and for the same reason - influence.