Factsheet
DOI® System and Internet Identifier Specifications
Version 1.1
 
 
A standard represents an agreement by a community to do things in a specified way to address a common problem. Whilst the DOI System community has developed the system, it has also ensured conformance with relevant generic external formal standards. This note discusses those relevant to the Internet communities IETF and W3C, where there is currently considerable debate on generic standards for naming objects. The DOI System is capable of being used within any specification that may finally be endorsed. Until the Internet communities reach a clear consensus on which approach is to be preferred, the DOI System remains agnostic as to formal registration as a generic scheme, whilst remaining usable and widely implemented for millions of objects.
Generic identifier standards
Persistent and actionable object names are required for coherence in the digital realm.
  • "Persistent and actionable object names" thus necessarily require mechanisms for persistence (provided by social infrastructure); actionability (resolution from a name to some service); specification of an object (either through simple referencing or more formal description); and naming syntax (prescriptive rules for assigning identifiers in a standard format and ensuring uniqueness).
  • The DOI System uses as its naming syntax the NISO standard Z39.84 "Syntax for the DOI". The DOI System uses for its name resolution the Handle System® (IETF RFCs 3650, 3651, 3652). The DOI System uses for its optional object specification a DOI® data model including the indecs Data Dictionary and its subset the ISO MPEG 21 Rights Data Dictionary, ISO/IEC 21000-6. (The data dictionary component is designed to maximise semantic interoperability with existing metadata element sets; the data model allows descriptions to be grouped in meaningful ways so that certain types of DOI® names all behave the same way in an application). DOI name persistence is guaranteed through the IDF social infrastructure which provides rules for registration, formal resilience procedures in the event of any single agency failing, etc.
  • The DOI System conforms to the functional requirements of the two generic approaches for naming first-class objects on the Internet: the Uniform Resource Name (URN) and the Uniform Resource Identifier (URI). URI and URN specifications deal only with syntax and (in part) associated implementation through resolution, not with description or persistence policy. Broadly, the URN approach is favoured by IETF and the URI approach by W3C, though there is considerable ongoing debate about each; some documentation on these specifications is incomplete. Crucially, widespread practical implementations of these specifications as object naming do not exist: both URI and URN are specifications, not in themselves working implementations. The DOI System is de facto a practical implementation of URI and URN.
  • The DOI System can also be implemented using current URL (http) specifications. The system is also a defined Digital Item Identifier within the ISO MPEG 21 multimedia framework specification.
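The naming-syntax component can be illustrated with a short sketch. It splits a DOI name such as doi:10.1000/1 into its prefix and suffix at the first slash. The function name is hypothetical, and the check that the prefix starts with "10." is an assumption drawn from the example names in this factsheet, not a full statement of the Z39.84 rules.

```python
def split_doi_name(name: str) -> tuple[str, str]:
    """Split a DOI name into (prefix, suffix) at the first slash.

    Accepts the name with or without a leading "doi:" label.
    Assumption: a prefix starts with "10." followed by a registrant code.
    """
    text = name.strip()
    if text.lower().startswith("doi:"):
        text = text[len("doi:"):]

    # Only the first slash separates prefix from suffix; any later
    # slashes are kept as part of the suffix.
    prefix, sep, suffix = text.partition("/")
    if not sep or not suffix:
        raise ValueError(f"missing suffix in DOI name: {name!r}")
    if not prefix.startswith("10.") or len(prefix) <= len("10."):
        raise ValueError(f"unexpected prefix in DOI name: {name!r}")
    return prefix, suffix


if __name__ == "__main__":
    print(split_doi_name("doi:10.1000/1"))
```

The suffix is treated as an opaque string here: nothing in the sketch interprets it, which matches the idea that meaning is carried by resolution and metadata, not by the name itself.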
URI implementation
The Uniform Resource Identifier (URI) specification is IETF RFC 2396, URI Generic Syntax, currently under revision as RFC 2396 bis. URIs formally encompass URNs as a subset. In practice, the URI specification defines (1) an implementation more often called the Uniform Resource Locator, a location on a file server, commonly accessed using the http protocol though other protocols are allowed; (2) a syntax for referencing in XML, through which e.g. ISBNs can be specified as URIs. This provides a single framework which can accommodate any other identifier for referencing, but it is not as such persistent (since persistence is determined not by the specification but by the practical implementation). Conflating these two causes confusion. URLs, as currently understood, are demonstrably not persistent; redefining them as URIs doesn't fix that.
  • URL implementation. Users may resolve DOI names using the URL syntax through the DOI System proxy server (http://dx.doi.org). A DOI name of the form doi:10.123/456 would be resolved from the address "http://dx.doi.org/10.123/456". Any standard browser encountering a DOI name in this form will be able to resolve it. The use of the DOI proxy server does not interfere with any http requirements, so DOI names may be used with other http-based mechanisms such as OpenURL, PURL, parameter passing, etc. The proxy server is maintained by the IDF and the DOI System community for use by all. However, performance and functionality can be improved beyond the level achieved through http by using the optional native resolver protocol (Handle System), which does not require the use of http. DOI® resolution in native resolver form does not require the use of the DNS (Domain Name System), though the DNS is of course used when resolving through the http proxy.
  • URI syntax implementation. In the URI specification, the network path of the URI is implicitly DNS based; there are no real provisions to include systems that are not DNS based. Original URI specifications, and good design practice, assume the URI to be opaque (that is, it is not assumed that software can parse the body of the URI but that it would simply recognize the name of the scheme and hand it off to some other software that understood the scheme). The current URI specification, however, assumes that the initial URI parser will look into every URI, no matter what the scheme, looking for certain meaningful characters such as dot and slash. This version of the URI proposed in RFC 2396 bis is so restrictive that it is difficult to see what system could make use of it.
  • A specification for the DOI System as a URI exists as an Internet Draft: this document defines the 'doi' Uniform Resource Identifier (URI) scheme for DOI names, which allows a DOI name to be referenced by a URI for Internet applications. The current revision of the URI specification, plus ongoing debate within the IETF and W3C communities on several proposed URI specifications, have delayed the processing of this Draft. DOI System implementation does not depend on implementation of this specification.
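The two ideas in the list above, proxy resolution and opaque scheme handling, can be sketched briefly. The first function builds the proxy address for a DOI name as described in the URL implementation item; the second shows opaque handling, where software reads only the scheme name and hands the rest on unparsed. Function names are hypothetical, and percent-encoding characters that are unsafe in a URL path is an assumption of this sketch, not something the factsheet specifies.

```python
from urllib.parse import quote

# Proxy address quoted in the factsheet.
DOI_PROXY = "http://dx.doi.org/"


def doi_to_proxy_url(name: str, proxy: str = DOI_PROXY) -> str:
    """Turn 'doi:10.123/456' (or '10.123/456') into a proxy URL."""
    text = name.strip()
    if text.lower().startswith("doi:"):
        text = text[len("doi:"):]
    # Assumption: keep slashes, percent-encode anything unsafe in a path.
    return proxy + quote(text, safe="/")


def split_scheme(uri: str) -> tuple[str, str]:
    """Opaque handling: read the scheme, leave the body unparsed."""
    scheme, sep, body = uri.partition(":")
    if not sep or not scheme:
        raise ValueError(f"no scheme in {uri!r}")
    return scheme.lower(), body


if __name__ == "__main__":
    print(doi_to_proxy_url("doi:10.123/456"))
    print(split_scheme("doi:10.123/456"))
```

Note that `split_scheme` never looks for dots or slashes inside the body; deciding what those characters mean is left to whatever software understands the scheme.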
URN implementation
The URN (Uniform Resource Name) specification is RFC 2141 URN Syntax. In practice, the URN specification defines (1) a formal registration process as a urn namespace, e.g., urn:doi:10.1000/1 and (2) accompanying specifications to implement a series of functional requirements for such namespaces.
  • Namespace referencing. One may specify any existing identifier as a URN, e.g. urn:isbn:123456789, but this has no advantage over the simpler isbn:123456789. Such identifiers may be implemented using a specially written URN plug-in and resolved to URLs: functionally this gives nothing beyond what is achieved by coherent management of the corresponding URLs.
  • URN implementation. In order to implement the functional requirements, the URN architecture assumes an additional network service: a DNS-based Resolution Discovery Service (RDS) to allow a client to deal with a previously unknown URN type by finding the specific service appropriate to the given URN scheme. URN resolutions are then delegated to that scheme-specific resolution service. However, no such deployed RDS schemes currently exist: browsers cannot action URN strings without some additional programming in the form of a "plug-in". The lack of any widespread infrastructural support requires each URN implementation to develop its own resolution mechanisms, such as plug-ins or proxy servers. Resolution mechanisms which require functionality beyond 1 URN to 1 URL also require the creation of data models. Several such implementations have been developed for specific uses where deployment to a closed group of users may be achieved; these carry no guarantee of ready interoperability with other deployments, which may require a different plug-in for each implementation and may use conflicting data approaches.
  • DOI names do not require a plug-in but offer this as an option. The CNRI Handle System web browser plug-in (native resolver, available in binary) delivers URN functional requirements in Windows-based browsers. URN plug-ins and Handle System plug-ins share the problem common to any new functionality: deploying the software to users. The Handle System has significant advantages: (1) it is a globally supported resolution service; (2) the plug-in is freely available, widely tested and proven across multiple applications; (3) unlike URN plug-ins, it is part of a suite of freely available and managed supporting software configured to provide coherent server-side support, including the Local Handle Service and Handle System Client Libraries. These are available across platforms and with several added security features such as trusted resolution and distributed administration. DOI System functionality can therefore be delivered through http, a browser plug-in, or incorporation of HANDLE.NET® software in a dedicated application.
  • The DOI System is not registered as a formal URN, despite fulfilling all the functional requirements, since URN registration appears to offer no advantage to the DOI System. It requires an additional layer of administration for defining a DOI name as a URN namespace (the string urn:doi:10.1000/1 rather than the simpler doi:10.1000/1) and an additional step of unnecessary redirection to access the resolution service, already achieved through either http proxy or native resolution. If RDS mechanisms supporting URN specifications become widely available, the DOI System will be registered as a URN.
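The extra layer that URN use would involve can be sketched as follows. The code wraps an identifier in the urn: form, splits a URN into its namespace identifier and namespace-specific string, and dispatches on the namespace. The `RESOLVERS` table is a hypothetical stand-in for the Resolution Discovery Service, which, as noted above, is not deployed; urn:doi is likewise not a registered namespace, and all function names are illustrative.

```python
def to_urn(identifier: str) -> str:
    """Wrap 'doi:10.1000/1' as 'urn:doi:10.1000/1'."""
    scheme, sep, body = identifier.partition(":")
    if not sep or not scheme or not body:
        raise ValueError(f"expected 'scheme:body', got {identifier!r}")
    return f"urn:{scheme.lower()}:{body}"


def split_urn(urn: str) -> tuple[str, str]:
    """Split 'urn:<NID>:<NSS>' into (namespace id, namespace-specific string)."""
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn" or not parts[1] or not parts[2]:
        raise ValueError(f"not a URN: {urn!r}")
    return parts[1].lower(), parts[2]


# Hypothetical lookup table standing in for an RDS: it maps a namespace
# to a function that produces an address for the name.
RESOLVERS = {
    "doi": lambda nss: "http://dx.doi.org/" + nss,
}


def resolve_urn(urn: str) -> str:
    """Delegate resolution to the namespace-specific handler, if one is known."""
    nid, nss = split_urn(urn)
    handler = RESOLVERS.get(nid)
    if handler is None:
        raise LookupError(f"no resolution service known for namespace {nid!r}")
    return handler(nss)


if __name__ == "__main__":
    print(resolve_urn(to_urn("doi:10.1000/1")))
```

In this sketch the URN path ends at the same proxy address that the plain DOI name reaches directly, which illustrates the additional redirection step described above.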
DOI System functional requirements
The DOI System is designed to fulfil several additional functional requirements which we believe offer significant advantages in generic naming, notably:
  • Neutral as to implementation. The DOI System allows but does not require http or other protocols. The design principle is that DOI names are not specific to the web or any other implementation (e.g. information may be delivered in non-web platforms such as PDAs). The DOI System is designed to be applicable in any environment on the Internet (the global information system linked by a globally unique address space based on the Internet Protocol (IP) using the Transmission Control Protocol/Internet Protocol (TCP/IP) suite).
  • Granularity of naming and administration at the object level. Allows but does not mandate coarser level granularity tools such as domain names. Specifically, DOI resolution in native resolver form does not require the use of the DNS (Domain Name System): the DNS administrative model argues against using it as a general-purpose name system and has well-recognised problems of security and updating.
  • Neutral as to language/character set. Compatible with, but not restricted to, the ASCII character set. DOI names can use the Unicode capability of the Handle System to develop DOI names in Japanese, Chinese, and other characters. The current DOI name syntax restricts initial implementations to ASCII simply for ease of adoption, but is intended to be widened (backward compatibly) to Unicode at the next revision.
 
 
Updated 21 September 2006

DOI® and DOI.ORG® are registered trademarks and the "doi>" logo is a trademark of the International DOI Foundation.