Internet-Draft | The "doi" URI Scheme | August 2024 |
Lemieux | Expires 7 February 2025 | [Page] |
This document specifies the "doi" URI scheme.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 7 February 2025.¶
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.¶
A DOI name is a globally unique identifier of a referent, which can be any digital, physical or abstract entity, including inventions, literary and artistic works, ideas, symbols, names, images, designs, etc. DOI names are, for example, widely used to identify academic publications. The DOI system is specified in [iso26324] and [doi-handbook], with the former offering regular formal snapshots of the latter.¶
EXAMPLE 1: The DOI name "10.1103/PhysRevLett.59.381" refers to the article Per Bak, Chao Tang, and Kurt Wiesenfeld, "Self-organized criticality: An explanation of the 1/f noise", Phys. Rev. Lett. 59, 381.¶
A DOI name is persistent over time. This persistence is provided by the independence of the DOI name from the referent itself and its descriptive elements. These descriptive elements of a referent, including location and ownership, can change over time, and their current values are retrieved by resolving the DOI name. The set of elements retrieved by resolving a DOI name is called the DOI record. The DOI name resolution process uses the Handle System specified at [RFC3650], [RFC3651] and [RFC3652], as updated by [DOI-RP].¶
This document specifies a URI scheme for DOI names. This scheme conforms to the syntax specified at [RFC3986] and formalizes the notation "doi:<DOI name>", which is in widespread use. When dereferenced as detailed in Section 4, the URI corresponding to a DOI name yields the DOI record associated with the name.¶
EXAMPLE 2: "doi:10.1103/PhysRevLett.59.381" is the URI corresponding to the DOI name above.¶
This document is intended to satisfy the guidelines and registration procedures specified at [RFC7595].¶
As specified at [iso26324], a DOI name consists of an ordered sequence of Unicode code points of the Graphic type.¶
A DOI Name URI is a URI that corresponds to a given DOI name. As defined at [RFC7595], its scheme name SHALL be "doi" and its scheme-specific-part SHALL be equal to the result of the following ordered sequence of steps:¶
unreserved
nor equal to "/"
.¶
A DOI Name URI shall contain neither a query component nor a fragment component.¶
EXAMPLE 1: The DOI name "10.5594/SMPTE.ST2067-21.2020" corresponds to the URI <doi:10.5594/SMPTE.ST2067-21.2020>.¶
EXAMPLE 2: The DOI name "10.26321/Á.GUTIÉRREZ.ZARZA.02.2018.03" with the code point sequence <U+0031, U+0030, U+002E, U+0032, U+0036, U+0033, U+0032, U+0031, U+002F, U+00C1, U+002E, U+0047, U+0055, U+0054, U+0049, U+00C9, U+0052, U+0052, U+0045, U+005A, U+002E, U+005A, U+0041, U+0052, U+005A, U+0041, U+002E, U+0030, U+0032, U+002E, U+0032, U+0030, U+0031, U+0038, U+002E, U+0030, U+0033> corresponds to the URI <doi:10.26321/%C3%81.GUTI%C3%89RREZ.ZARZA.02.2018.03>.¶
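The mapping shown in the two examples above can be sketched in Python. This is a minimal, non-normative sketch that assumes the steps are: encode the DOI name as UTF-8, then percent-encode every byte that is neither an RFC 3986 unreserved character nor "/". The function name doi_name_to_uri is hypothetical and is not defined by this document.

```python
from urllib.parse import quote


def doi_name_to_uri(doi_name: str) -> str:
    """Return a DOI Name URI for a DOI name (illustrative sketch only)."""
    # quote() encodes the string as UTF-8 and percent-encodes every byte
    # except ASCII letters, digits, "-", ".", "_" and "~" (the RFC 3986
    # unreserved set) and the characters listed in `safe`.
    # Assumption: "/" is the only additional character left unencoded.
    return "doi:" + quote(doi_name, safe="/", encoding="utf-8")


if __name__ == "__main__":
    print(doi_name_to_uri("10.5594/SMPTE.ST2067-21.2020"))
    print(doi_name_to_uri("10.26321/Á.GUTIÉRREZ.ZARZA.02.2018.03"))
```

The sketch performs no Unicode normalization before encoding, so the code points of the DOI name are carried into the URI unchanged.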
NOTE 1: The sequence of code points comprising a DOI name is not normalized and equivalence between DOI names is based on code points. For example, two DOI names that differ only in the abstract character "Á" being encoded as <U+00C1> in the first and as <U+0041, U+0301> in the second are not identical.¶
NOTE 2: Presenting a DOI name by rendering its sequence of code points to glyphs can be ambiguous since multiple code points or sequences of code points can result in the same glyphs. For example, U+002D HYPHEN-MINUS, U+2212 MINUS SIGN and U+2013 EN DASH are rendered as similar glyphs. As another example, the abstract character "á" can be represented by either the code point U+00E1 or the sequence of code points <U+0061, U+0301>. Presenting a DOI name in its URI form resolves this ambiguity.¶
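The distinction drawn in NOTE 1 and NOTE 2 can be made concrete with a short, non-normative Python sketch. It compares the precomposed and decomposed encodings of "Á" by code point and shows that their percent-encoded forms differ visibly. The helper name uri_form is hypothetical, and the percent-encoding rule it applies (UTF-8, leaving only unreserved characters and "/" unencoded) is an assumption.

```python
import unicodedata
from urllib.parse import quote

# Two encodings of the abstract character "Á".
PRECOMPOSED = "\u00C1"   # LATIN CAPITAL LETTER A WITH ACUTE
DECOMPOSED = "A\u0301"   # LATIN CAPITAL LETTER A + COMBINING ACUTE ACCENT


def uri_form(text: str) -> str:
    """Percent-encode text as UTF-8 (hypothetical helper)."""
    return quote(text, safe="/", encoding="utf-8")


# Python compares strings code point by code point, so the two encodings
# are different strings even though they usually render identically.
differ_by_code_point = PRECOMPOSED != DECOMPOSED

# They become equal only after normalization, which DOI names do not undergo.
equal_after_nfc = unicodedata.normalize("NFC", DECOMPOSED) == PRECOMPOSED

if __name__ == "__main__":
    print(differ_by_code_point, equal_after_nfc)
    print(uri_form(PRECOMPOSED), uri_form(DECOMPOSED))
```

Because the percent-encoded forms are plain ASCII, the difference that is invisible in rendered glyphs is visible in the URI.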
The following procedure SHALL be performed to determine whether two DOI Name URIs are equivalent:¶
scheme-specific-part of each of the two URIs is percent-decoded into a UTF-8 String;¶
NOTE: When testing for equivalence, DOI names are case-insensitive only with respect to the Basic Latin Unicode block.¶
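An equivalence test along these lines can be sketched as follows. This non-normative sketch assumes that, after percent-decoding, the only remaining steps are folding the case of Basic Latin letters (inferred from the NOTE above) and comparing the results code point by code point. The function names are hypothetical.

```python
import string
from urllib.parse import unquote

# Translation table that lowercases A-Z only; characters outside the
# Basic Latin block (for example "Á") are left untouched.
_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def _comparable_name(doi_uri: str) -> str:
    """Percent-decode the scheme-specific-part and fold ASCII case."""
    scheme, sep, ssp = doi_uri.partition(":")
    if not sep or scheme.lower() != "doi":
        raise ValueError(f"not a DOI Name URI: {doi_uri!r}")
    # errors="strict" rejects byte sequences that are not valid UTF-8.
    name = unquote(ssp, encoding="utf-8", errors="strict")
    return name.translate(_ASCII_FOLD)


def doi_uris_equivalent(first: str, second: str) -> bool:
    """Return True if two DOI Name URIs are equivalent (sketch only)."""
    return _comparable_name(first) == _comparable_name(second)


if __name__ == "__main__":
    print(doi_uris_equivalent("doi:10.1000/ABC", "doi:10.1000/abc"))
    print(doi_uris_equivalent("doi:10.26321/%C3%81", "doi:10.26321/%C3%A1"))
```

Using str.lower() instead of the ASCII-only table would also fold "Á" to "á", which would contradict the NOTE above.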
Resolving a DOI name means retrieving its DOI record, which contains the descriptive elements associated with the referent identified by the DOI name.¶
A DOI name URI can be used to resolve its corresponding DOI name by performing an HTTP GET request at the following URL (expressed using ABNF syntax as defined at [RFC5234]):¶
"https://doi.org/api/handles/" scheme-specific-part¶
where scheme-specific-part is the scheme-specific-part of the DOI name URI, as defined at Section 2, and the "https" scheme is specified at [RFC9110].¶
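The URL construction above can be sketched in Python using only the standard library. This is a non-normative sketch: the function names resolution_url and fetch_doi_record and the timeout value are hypothetical choices, and error handling is kept to a minimum.

```python
import json
from urllib.request import urlopen

RESOLVER_PREFIX = "https://doi.org/api/handles/"


def resolution_url(doi_uri: str) -> str:
    """Build the resolution URL for a DOI name URI."""
    scheme, sep, ssp = doi_uri.partition(":")
    if not sep or scheme.lower() != "doi":
        raise ValueError(f"not a DOI name URI: {doi_uri!r}")
    # The scheme-specific-part is already percent-encoded, so it is
    # appended to the prefix unchanged.
    return RESOLVER_PREFIX + ssp


def fetch_doi_record(doi_uri: str, timeout: float = 10.0) -> dict:
    """Perform the HTTP GET request and parse the JSON response body.

    urlopen() raises urllib.error.HTTPError for non-2xx status codes,
    which callers may want to catch.
    """
    with urlopen(resolution_url(doi_uri), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    print(resolution_url("doi:10.1000/182"))
```

Only resolution_url is exercised when the sketch is run directly; fetch_doi_record requires network access.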
The body of the response is a JSON object, as defined at [RFC8259], that contains the following members:¶
The property is a Number. The following values are defined:¶
200
.¶
500
.¶
404
.¶
200
.¶
Figure 1 illustrates the DOI record, at the time of this writing, for the DOI name corresponding to the URI <doi:10.1000/182>. The DOI record was retrieved by performing an HTTP GET request to <https://doi.org/api/handles/10.1000/182>.¶
While Section 4 specifies the procedure for retrieving the DOI record associated with a DOI name, the steps necessary to retrieve the actual referent described by the record depend on the nature of the referent, e.g., a referent can be a physical object.¶
Some, but not all, referents can be retrieved by dereferencing an HTTP/HTTPS URI found in their respective DOI records, as illustrated in Figure 1 where the referent identified by the DOI name "10.1000/182" can be retrieved at "http://www.doi.org/hb.html".¶
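Extracting candidate URIs from a DOI record can be sketched as follows. This non-normative sketch assumes that the record is a JSON object with a "values" array whose entries carry a "type" member and a "data" object containing a "value" member. These member names are assumptions made for illustration, and the sample record is hand-written for this sketch rather than retrieved from the resolver.

```python
from typing import Any, Dict, List


def url_values(record: Dict[str, Any]) -> List[str]:
    """Return the string values of entries whose type is "URL".

    The member names "values", "type", "data" and "value" are assumed.
    """
    urls = []
    for entry in record.get("values", []):
        if entry.get("type") != "URL":
            continue
        value = entry.get("data", {}).get("value")
        if isinstance(value, str):
            urls.append(value)
    return urls


# Hand-written sample for illustration only; not an actual DOI record.
SAMPLE_RECORD = {
    "values": [
        {"type": "URL", "data": {"value": "http://www.doi.org/hb.html"}},
        {"type": "OTHER", "data": {"value": "not a URL entry"}},
    ]
}

if __name__ == "__main__":
    print(url_values(SAMPLE_RECORD))
```

An empty result means the record exposes no retrievable URI under these assumptions, which is expected for referents such as physical objects.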
The single DOI resolution and multiple DOI resolution functions at [doi-handbook] specify the process of retrieving a referent that is available by dereferencing an HTTP/HTTPS URI.¶
A DOI name is an opaque string, which does not have a discernible meaning on its own and is for use by humans and machines alike. It consists of a sequence of Unicode code points and the security considerations at [UNICODE-TR36] apply. In particular, and as noted at Section 2, presenting a DOI name by rendering its sequence of code points to glyphs can be ambiguous. As a result, two DOI names rendering to the same sequence of glyphs can identify different referents, for example, two software executables with wildly different side effects. Presenting a DOI name in its URI form, which consists of a limited subset of characters, can lessen this risk.¶
The DOI name resolution process is conducted using the Hypertext Transfer Protocol Secure, which ensures confidentiality and integrity of the transaction, and the security considerations at [RFC9110] apply.¶
The result of the DOI name resolution process is a JSON object and the security considerations at [RFC8259] apply.¶
The following is the permanent URI Scheme Registration request, as defined in [RFC7595]:¶