Internet-Draft IDNA: Registry Restrictions October 2024
Klensin & Freytag Expires 7 April 2025
Workgroup: Network Working Group
Internet-Draft: draft-klensin-idna-rfc5891bis-07
Updates: 5890, 5891, 5894 (if approved)
Published: October 2024
Intended Status: Standards Track
Expires: 7 April 2025
Authors: J.C. Klensin
         A. Freytag (ASMUS, Inc.)

Internationalized Domain Names in Applications (IDNA): Registry Restrictions and Recommendations

Abstract

The IDNA specifications for internationalized domain names combine rules that determine the labels that are allowed in the DNS without violating the protocol itself and an assignment of responsibility, consistent with earlier specifications, for determining the labels that are allowed in particular zones. Conformance to IDNA by registries and other implementations requires both parts. Experience strongly suggests that the language describing those responsibilities was insufficiently clear to promote safe and interoperable use of the specifications and that more details and discussion of circumstances would have been helpful. Without making any substantive changes to IDNA, this specification updates two of the core IDNA documents (RFCs 5890 and 5891) and the IDNA explanatory document (RFC 5894) to provide that guidance and to correct some technical errors in the descriptions.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 7 April 2025.

1. Introduction

Parts of the specifications for Internationalized Domain Names in Applications (IDNA) [RFC5890] [RFC5891] [RFC5894] (collectively known, along with RFC 5892 [RFC5892], RFC 5893 [RFC5893] and updates to them, as "IDNA2008" or just "IDNA") impose a requirement that domain name system (DNS) registries restrict the characters they allow in domain name labels (see Section 2 below), and the contents and structure of those labels. That requirement and restriction are consistent with the "duty to serve the community" described in the original specification for DNS naming and authority [RFC1591]. The restrictions are intended to protect against security problems and confusion about and between names by limiting the permitted characters and strings to those for which the registries or their advisers have a thorough understanding and for which they are willing to take responsibility.

That provision is centrally important because it recognizes that historical relationships and variations among scripts and writing systems, the continuing evolution of those systems, differences in the uses of characters among languages (and locations) that use the same script, and so on make it impossible to produce a single list of characters and/or a set of simple rules that would amount to a completely adequate guideline of the form "if we use these rules, we will be safe from confusion and attacks".

The algorithm and rules of RFCs 5891 and 5892 eliminate many of the most dangerous and otherwise problematic cases, but they cannot eliminate the need for registries and registrars to take additional steps to mitigate security risks and confusion by suitably restricting the repertoire and structure of labels they permit. This, in turn, requires that they or their advisers have a thorough understanding of the issues associated with a given set of characters or writing system, that they understand what they are doing, and that they take responsibility for the decisions that are made.

The way in which the IDNA2008 specifications expressed these requirements may have underemphasized the intention that they actually be treated as requirements. Section 2.3.2.3 of the Definitions document [RFC5890] mentions the need for the restrictions, indicates that they are mandatory, and points the reader to Section 4.3 of the Protocol document [RFC5891]. That document, in turn, points to Section 3.2 of the Rationale document [RFC5894], with each document providing further detail, discussion, and clarification.

At the same time, the Internet has evolved significantly since the management assumptions for the DNS were established with RFC 1591 and earlier. In particular, the management and use of domain names have gone through several transformations. Recounting those changes is beyond the scope of this document, but one of them has had significant practical impact on the degree to which the requirement for registry knowledge and responsibility is observed in practice. When RFC 1591 was written, the assumption was that domains at all levels of the DNS would be operated in the best interest of the registrants in the domain and of the Internet as a whole. There were no notions about domains being operated as a profitable service, much less with a business model that made them more profitable the more names that could be registered (or even, under some circumstances, reserved and not registered). At the time RFC 1591 was written, there was also no notion that domains would be considered more successful based on the number of names registered and delegated from them. While rarely reflected in the DNS protocols, the distinction between domains operated primarily as a revenue source for the organizations operating the registry and ones that are operated for, e.g., use within an enterprise or otherwise as a service, has become very important today. See Section 4 for a discussion of how those issues affect this specification.

This specification is intended to unify and clarify these requirements for registry decisions and responsibility and to emphasize the importance of registry restrictions at all levels of the DNS. It also makes a specific recommendation for character repertoire subsetting that is intermediate between the code points allowed by RFCs 5891 and 5892 and those allowed by individual registries. It does not alter the basic IDNA2008 protocols and rules themselves in any way.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119] and RFC 8174 [RFC8174].

2. Registry Restrictions in IDNA2008

As mentioned above, IDNA2008 specifies that the registries for each zone in the DNS that supports IDN labels are required to develop and apply their own rules to restrict the allowable labels, including limiting the characters they allow to be used in labels in that zone. The chosen list MUST be a subset of the collection of code points specified as "PVALID", "CONTEXTJ", and "CONTEXTO" by the rules established by the protocols themselves. Labels containing any characters from the two CONTEXT categories or any characters that are normally part of a script written right to left [RFC5893] require that additional rules, specified in the protocols and known as "contextual rules" and "bidi rules", be applied. The entire collection of rules and restrictions required by the IDNA2008 protocols themselves is known as "protocol restrictions".
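
As a rough illustration of the subset requirement, the following Python sketch checks a registry's chosen list against the protocol-permitted code points. The names PROTOCOL_ALLOWED, outside_protocol, and needs_contextual_rule, and the tiny sample sets, are hypothetical; an actual implementation would derive the PVALID, CONTEXTJ, and CONTEXTO code points from the RFC 5892 rules for the relevant Unicode version.

```python
# Hypothetical, deliberately tiny stand-in for the code points that the
# RFC 5892 rules classify as PVALID, CONTEXTJ, or CONTEXTO. A real table
# would be derived from those rules for a specific Unicode version.
PROTOCOL_ALLOWED = {
    "PVALID": set("abcdefghijklmnopqrstuvwxyz0123456789-")
    | {"\u00df", "\u00e9", "\u00fc"},
    "CONTEXTJ": {"\u200c", "\u200d"},  # ZERO WIDTH NON-JOINER, ZERO WIDTH JOINER
    "CONTEXTO": {"\u00b7"},            # MIDDLE DOT
}


def outside_protocol(registry_repertoire):
    """Return the code points in a registry's list that the protocol does not permit."""
    permitted = set().union(*PROTOCOL_ALLOWED.values())
    return sorted(set(registry_repertoire) - permitted)


def needs_contextual_rule(label):
    """Return True if the label contains a CONTEXTJ or CONTEXTO code point."""
    contextual = PROTOCOL_ALLOWED["CONTEXTJ"] | PROTOCOL_ALLOWED["CONTEXTO"]
    return any(ch in contextual for ch in label)
```

An empty result from outside_protocol means the registry's list satisfies the subset requirement with respect to this sample table; it says nothing about whether the list is a wise choice for the zone.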

Registries may apply (and generally are required to apply) additional rules to further restrict the list of permitted code points, contextual rules (perhaps applied to normally PVALID code points) that apply additional restrictions, and/or restrictions on labels as distinct from code points. In particular, safe and secure use of any of a large number of widely-used scripts or writing systems requires the addition of contextual rules that go beyond the very limited restrictions implemented by "CONTEXTJ" and "CONTEXTO" at the protocol level, but which are otherwise functionally equivalent in that they constitute limitations on where allowable code points may be placed in a label.

In contrast, protocol restrictions are by necessity somewhat generic, having to cater both to the union of the needs for all zones and to the fact that some zones are naturally more permissive than others. That must be done without negative impact on the DNS.

Other restrictions that are necessary, perhaps obviously so, include provisions for restricting suggested new registrations based on conflicts with labels already registered in the zone, so as to avoid homograph attacks [Gabrilovich2002] and other issues. The specification of what constitutes such conflicts, as well as the definition of "conflict" based on the properties of the labels in question, is the responsibility of each registry. These restrictions further include prohibitions on code points and labels that are not consistent with the intended function of the zone, the subtree in which the zone is embedded (see Section 3), and consequent differences in the stringency of security-related measures.
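
One way a registry might implement such a conflict check is sketched below in Python. Everything here is an assumption made for illustration: the LOOKALIKES table, the conflict_key function, and the idea of comparing labels after mapping look-alike code points to a representative are one possible policy, not something IDNA2008 or this document prescribes. Each registry defines "conflict" for itself.

```python
# Hypothetical mapping from look-alike code points to a representative.
# A real registry would build its own table for the scripts it supports.
LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
}


def conflict_key(label):
    """Map each code point to its representative so look-alike labels compare equal."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in label)


def conflicts_with(candidate, registered):
    """Return the already-registered labels that the candidate conflicts with."""
    key = conflict_key(candidate)
    return sorted(existing for existing in registered if conflict_key(existing) == key)
```

With this sample table, a candidate such as "ex\u0430mple" (with a Cyrillic letter in place of the Latin "a") would be reported as conflicting with an existing registration of "example".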

These per-registry (or per-zone) rules are commonly known as "registry restrictions" to distinguish them from the protocol restrictions described above. Such registry restrictions are essential to provide for the necessary security in the face of the tremendous variations and differences in writing systems and their ongoing evolution and development, as well as the human ability to recognize and distinguish characters in different scripts around the world and under different circumstances.

3. Progressive Subsets of Allowed Characters

The algorithm and rules of RFCs 5891 and 5892 determine the set of code points that are possible for inclusion in domain name labels; registries MUST NOT permit code points in labels unless they are part of that set. In addition, labels MUST NOT contain code points in positions where they violate the "CONTEXTJ" or "CONTEXTO" rules or other restrictions defined in the protocol. Labels that contain code points that are normally written from right to left MUST also conform to the requirements of RFC 5893. Each registry that intends to allow IDN registrations MUST then determine the strict subset of that set of code points that will be allowed by that registry. It SHOULD also consider additional rules, including contextual and whole label restrictions that provide further protection for registrants and users. For example, the widely-used principle that bars labels containing characters from more than one script is not an IDNA2008 requirement. It has been adopted by many registries but there may be circumstances in which that limitation is not required or not appropriate.
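
The layering described above can be sketched in Python as follows. REGISTRY_REPERTOIRE, script_hint, and check_label are hypothetical names, the repertoire is an arbitrary example, and the script test is a crude assumption that guesses the script from the first word of the Unicode character name; production code would use the Unicode Script property instead. The single-script rule is shown as an option because, as noted, it is a registry choice rather than an IDNA2008 requirement.

```python
import unicodedata

# Hypothetical registry repertoire: a small subset of what IDNA2008 permits.
REGISTRY_REPERTOIRE = set("abcdefghijklmnopqrstuvwxyz0123456789-") | set("äöüß")


def script_hint(ch):
    """Crude script guess taken from the first word of the character name.

    Illustration only: digits and the hyphen are treated as script-neutral.
    """
    if ch in "0123456789-":
        return None
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def check_label(label, single_script=True):
    """Return the reasons a label is rejected; an empty list means accepted."""
    problems = []
    outside = sorted(set(label) - REGISTRY_REPERTOIRE)
    if outside:
        listed = ", ".join(f"U+{ord(c):04X}" for c in outside)
        problems.append("code points outside registry repertoire: " + listed)
    scripts = {script_hint(c) for c in label} - {None}
    if single_script and len(scripts) > 1:
        problems.append("mixed scripts: " + ", ".join(sorted(scripts)))
    return problems
```

The sketch covers only the registry-level layer; it assumes the label has already passed the protocol-level checks of RFCs 5891, 5892, and 5893.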

In formulating their own rules, registries should normally consult carefully developed consensus recommendations about maximum repertoires to be used for each script. The important example for the root zone is the ICANN Maximal Starting Repertoire 5 (MSR-5) for the Development of Label Generation Rules for the Root Zone [ICANN-MSR5] (or its successor documents). The RFC Series includes specific recommendations about particular scripts or languages, including RFCs for Cyrillic [RFC5992] and the Arabic language [RFC5564]. Additional recommendations for script-based repertoires based on the approved ICANN Root Zone Label Generation Rules (LGR-5) [ICANN-RZLGR-5] (or its successor documents) are discussed in Section 6 below. Many of these recommendations, most of which are focused on a repertoire of characters in actual widespread, common, everyday use, also cover rules about relationships among code points that may be particularly important for complex scripts. They also interact with recommendations about how labels that appear to be the same should be handled.

It is the responsibility of the registry to determine which, if any, of those recommendations are applicable and to further subset or extend them as needed. For example, several of the recommendations are designed for the root zone and therefore exclude digits and U+002D HYPHEN-MINUS; this restriction is not generally appropriate for other zones. On the other hand, some zones may be designed to not cater for all users of a given script, but perhaps only for the needs of selected languages, in which case a more selective repertoire may be appropriate.
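
In set terms, extending and further subsetting a recommendation might look like the following Python sketch. The three repertoires are hypothetical examples chosen only to show the operations; they are not taken from any ICANN or RFC table.

```python
import string

# Hypothetical letters adopted from a root-zone-oriented recommendation.
# Such recommendations exclude digits and U+002D HYPHEN-MINUS.
ROOT_STYLE_LETTERS = set(string.ascii_lowercase) | set("åäöüøæ")

# Extension for a zone below the root: add digits and the hyphen.
ZONE_REPERTOIRE = ROOT_STYLE_LETTERS | set(string.digits) | {"-"}

# Further subsetting for a zone that serves only selected languages
# (an arbitrary choice made for this example).
LANGUAGE_REPERTOIRE = ZONE_REPERTOIRE - set("üøæ")
```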

In making these determinations, a registry SHOULD follow the IAB guidance in RFC 6912 [RFC6912]. Those guidelines include a number of principles for use in making decisions about allowable code points. In addition, that document notes that the closer a particular zone is to the root, the more restrictive the space of permitted labels should be. RFC 5894 provides some suggestions for any registry that may decide to reduce opportunities for confusion or attacks by constructing policies that disallow characters used in historic writing systems (whether these be archaic scripts or extensions of modern scripts for historic or obsolete orthographies) or characters whose use is restricted to specialized or highly technical contexts. These suggestions were among the principles guiding the design of ICANN's Maximal Starting Repertoires (MSR) [LGR-Procedure]. ICANN has continued that work into development of a set of suggested prototype Label Generation Rules (LGRs) for the second level (and, presumably, for consideration for zones at additional levels). That work has not been reviewed by the IETF and is not part of the set of IDNA Standards that this document updates. The ICANN work in this area is ongoing; it, along with the context and methods involved, is described in a separate document [LGR-forward-reference].

A registry decision to allow only those code points in the full repertoire of the MSR (plus digits and hyphen) would already avoid a number of issues inherent in a more permissive policy such as "use anything permitted by IDNA2008", while still supporting the native languages and scripts for the vast majority of users today. However, it is unlikely, by itself, to fully satisfy the mandate set out above for three reasons.

  1. The MSR, like the set of code points permissible under IDNA2008 itself, was conceived merely as a boundary condition on permissible letter code points (it excludes digits and the hyphen). It was always intended to be used as a starting point for setting registry policy for the second level and beyond, with the expectation that some of the code points in the MSR would not be included in final registry policies, whether for lack of actual usage, or for being inherently problematic.
  2. It was recognized that many scripts require contextual rules for many more code points than are covered by CONTEXTO or CONTEXTJ rules defined in IDNA2008. This is particularly true for combining marks that are typically used to encode diacritics, tone marks, vowel signs and the like. While, theoretically, any combining mark may occur in any context in Unicode, in practice rendering and other software that users rely on in viewing or entering labels will not support arbitrary combining sequences, or indeed arbitrary combinations of code points, in the case of complex scripts.

    Contextual rules are needed in order to limit allowable code point sequences to those that can be expected to be rendered reliably. Identifying those requires knowledge about the way code points are used in a script, whence the mandate for registries to only support code points they understand. In this, some of the other recommendations, such as the Informational RFCs for specific scripts (e.g., Cyrillic [RFC5992]) or languages (e.g., Arabic [RFC5564] or Chinese [RFC4713]), or the Root Zone LGRs and other LGRs developed by ICANN, may provide useful guidance.

  3. Because of the widely accepted practice of limiting any given label to a single script, a universal repertoire, such as the MSR, would have to be divided on a per-script basis into subrepertoires to make it useful, with some of those repertoires overlapping, for example, in the case of East Asian shared usage of the Han ideographs.
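
A registry-level contextual rule for combining marks, of the kind described in the second item above, might be sketched in Python as follows. The ALLOWED_BEFORE table and the violates_context function are hypothetical, and the Thai rule shown is deliberately simplified for illustration; it is not a statement about what Thai orthography or any published LGR actually requires. Marks that have no entry in the table are rejected, reflecting the principle that a registry supports only code points it understands.

```python
import unicodedata

# U+0E01..U+0E2E are the Thai consonants.
THAI_CONSONANTS = {chr(cp) for cp in range(0x0E01, 0x0E2F)}

# Hypothetical, simplified table: for each permitted combining mark, the
# code points that may immediately precede it.
ALLOWED_BEFORE = {
    "\u0e34": THAI_CONSONANTS,               # THAI CHARACTER SARA I
    "\u0e48": THAI_CONSONANTS | {"\u0e34"},  # THAI CHARACTER MAI EK
}


def violates_context(label):
    """Return the positions of combining marks that appear in a disallowed context."""
    bad = []
    for i, ch in enumerate(label):
        if unicodedata.category(ch).startswith("M"):
            previous = label[i - 1] if i > 0 else ""
            if previous not in ALLOWED_BEFORE.get(ch, set()):
                bad.append(i)
    return bad
```

Functionally this is the same kind of restriction as a CONTEXTJ or CONTEXTO rule: it limits where an otherwise allowable code point may be placed in a label.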

Registries choosing to make exceptions -- that is, to allow code points beyond those permitted by the recommendations mentioned above and/or discussed in the descriptions of the ICANN efforts [LGR-forward-reference] -- should make such decisions only with great care and only if they have considerable understanding of, and great confidence in, their appropriateness. The obvious exception from the MSR would be to allow digits and the hyphen. Neither was allowed by the MSR, but only because they are not allowed in the Root Zone.

Nothing in this document permits a registry to allow code points or labels that are disallowed or otherwise prohibited by IDNA2008. Conversely, nothing in this document should be construed as changing what is permissible under IDNA2008.

4. Considerations for Domains Operated Primarily for the Benefit of the Registry Owner, Operator, or a Related Organization

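As discussed in the Introduction (Section 1), the distributed administrative structure of the DNS today can be described by dividing zones into two categories depending on how they are administered and for whom. These categories are not precise -- some zones may not fall neatly into one category or the other -- but are useful in understanding the practical applicability of this specification. The first consists of zones operated for, e.g., use within an enterprise or otherwise as a service to a defined community; the second consists of zones operated primarily as a revenue source for the organizations operating the registry.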

Rules requiring strict registry responsibility, including either thorough understanding of scripts and related issues in domain name labels being considered for registration or local naming rules that have the same effect, typically come naturally to registries for zones of the first type. Registration of labels that would prove problematic for any reason hurts the relevant organization or enterprise or its customers or users within the relevant country and more broadly. More generally, there are strong incentives to be extremely conservative about labels that might be registered and few, if any, incentives favoring adventures into labels that might be considered clever, much less ones that are hard to type, render, or, where it is relevant to users, remember correctly.

By contrast, in a zone in which the revenues are derived exclusively, or almost exclusively, from selling or reserving (including "blocking") as many names as possible, there may be perceived incentives to register whatever names would-be registrants "want" or fears that any restrictions will cut into the available namespace. In such situations, restrictions are unlikely to be applied unless they meet at least one of two criteria: (i) they are easy to apply and can be applied algorithmically or otherwise automatically and/or (ii) there is clear evidence that the particular label would cause harm.

As suggested above, the two categories above are not precise. In particular, there may be domains that, despite being set up to operate to produce revenue above actual costs, are sufficiently conservative about their operations to more closely resemble the first group in practice than the second one.

The requirement of IDNA that is discussed at length elsewhere in this specification stands: IDNA (and IDNs generally) would work better and Internet users would be better protected and more secure if registries and registrars (of any type) confined their registrations to scripts and code point sequences that they understood thoroughly. While the IETF rarely gives advice to those who choose to violate IETF Standards, some advice to zones in the second category above may be in order. That advice is that significant conservatism in what is allowed to be registered, even for reservation purposes, and even more conservatism about what labels are actually entered into zones and delegated, is the best option for the Internet and its users. If practical considerations do not allow that much conservatism, then it is desirable to consult and utilize the many lists and tables that have been, and continue to be, developed to advise on what might be sensible for particular scripts and languages. Some of those lists, tables, and recommendations are described in Section 6 below.

5. Other corrections and updates

After the initial IDNA2008 documents were published (and RFC 5892 was updated for Unicode 6.0 by RFC 6452 [RFC6452] and for Unicode 12.0 by RFC 9233 [RFC9233]), several errors or instances of confusing text were noted. For the convenience of the community, the relevant corrections for RFCs 5890 and 5891 are noted below and update the corresponding documents. There are no errata for RFC 5893 or 5894 as of the date this document was published. Because further updates to RFC 5892 would require addressing other pending issues, the outstanding erratum for that document is not considered here. For consistency with the original documents, references to Unicode 5.0 are preserved in this document.

5.1. Updates to RFC 5890

All but one of the outstanding errata against RFC 5890 (Errata IDs 4695, 4696, 4823, 4824, and 5484 [RFC-Editor-5890Errata]) are associated with the same issue: the number of Unicode characters that can be associated with a maximum-length (63 octet) A-label. In retrospect and contrary to some of the suggestions in the errata, that value should not be expressed in octets because RFC 5890 and the other IDNA2008 documents are otherwise careful not to specify Unicode encoding forms but, instead, work exclusively with Unicode code points. Consequently, the relevant material in RFC 5890 should be corrected as follows:

Section 2.3.2.1

  Old:
    expansion of the A-label form to a U-label may produce strings that are much longer than the normal 63 octet DNS limit (potentially up to 252 characters).

  New:
    expansion of the A-label form to a U-label may produce strings that are much longer than the normal 63 octet DNS limit (See Section 4.2).

  Comment:
    If the length limit is going to be a source of confusion or careful calculations, it should appear in only one place.

Section 4.2

  Old:
    Because A-labels (the form actually used in the DNS) are potentially much more compressed than UTF-8 (and UTF-8 is, in general, more compressed that UTF-16 or UTF-32), U-labels that obey all of the relevant symmetry (and other) constraints of these documents may be quite a bit longer, potentially up to 252 characters (Unicode code points).

  New:
    A-labels (the form actually used in the DNS) and the Punycode algorithm used as part of the process to produce them [RFC3492] are strings that are potentially much more compressed than any standard Unicode Encoding Form. A 63 octet A-label cannot represent more than 58 Unicode code points (four octet overhead and the requirement that at least one character lie outside the ASCII range) but implementations allocating buffer space for the conversion should allow significantly more space (i.e., extra octets) depending on the encoding form they are using.
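
To make the distinction between code points and octets concrete, the following Python sketch reports the size of one label in several forms. It relies on Python's built-in "punycode" codec, which implements the RFC 3492 encoding only; the function names to_a_label and sizes are hypothetical, and the sketch assumes its input is already a valid, NFC-normalized U-label, since none of the IDNA2008 validity checks are performed.

```python
ACE_PREFIX = "xn--"      # the four octets of overhead in every A-label
MAX_LABEL_OCTETS = 63    # DNS limit on the length of a single label


def to_a_label(u_label):
    """Punycode-encode a U-label and add the ACE prefix (no IDNA2008 validation)."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def sizes(u_label):
    """Report the length of a label as code points and in several octet encodings."""
    a_label = to_a_label(u_label)
    return {
        "code_points": len(u_label),
        "a_label_octets": len(a_label),
        "utf8_octets": len(u_label.encode("utf-8")),
        "utf16_octets": len(u_label.encode("utf-16-le")),
        "utf32_octets": len(u_label.encode("utf-32-le")),
        "fits_dns_label": len(a_label) <= MAX_LABEL_OCTETS,
    }
```

For the six code points of "bücher", the A-label is "xn--bcher-kva" (13 octets), while the same string occupies 7 octets in UTF-8, 12 in UTF-16, and 24 in UTF-32. By the same arithmetic, a buffer sized for 58 code points needs 232 octets in UTF-32, which is why the buffer guidance is tied to the encoding form in use.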

Errata ID 7291 identifies RFC 5890 as updating RFC 4343. The RFC Editor's metadata has been updated to make that correction. Readers of RFC 5890 should note the correction and any replacement for RFC 5890 should address it as appropriate.

5.2. Updates to RFC 5891

There is only one outstanding erratum for RFC 5891, Errata ID 3969 [RFC5891Erratum] on improving the reference for combining marks. Combining marks are explained in the cited section, but not, as the text indicates, exactly defined.

  Old:
    The Unicode string MUST NOT begin with a combining mark or combining character (see The Unicode Standard, Section 2.11 [Unicode] for an exact definition).

  New:
    The Unicode string MUST NOT begin with a combining mark or combining character (see The Unicode Standard, Section 2.11 [UnicodeA] for an explanation and Section 3.6, definition D52 [UnicodeB] for an exact definition).

  Comment:
    When RFC 5891 is actually updated, the references in the text should be updated to the current version of Unicode and the section numbers checked.
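
The rule in the corrected text can be checked mechanically. Definition D52 identifies a combining character as one whose General_Category is Combining Mark (M), which covers the Mn, Mc, and Me subcategories, so a minimal Python sketch is shown below. The function name is hypothetical, and the result depends on the version of the Unicode Character Database bundled with the Python interpreter in use.

```python
import unicodedata


def begins_with_combining_mark(label):
    """Return True if the first code point has General_Category M (Mn, Mc, or Me)."""
    return bool(label) and unicodedata.category(label[0]).startswith("M")
```

A label such as "\u0301a" (COMBINING ACUTE ACCENT followed by "a") is rejected by this test, while "a\u0301" is not.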

6. Related Discussions

This document is one of a series of measures that have been suggested to address IDNA issues raised in other documents and discussions, but it is the only one that actually updates the IDNA Standard. Those other discussions and associated documents include suggested mechanisms for dealing with combining sequences and single-code point characters with the same appearance, including ones that Unicode normalization neither combines nor decomposes as IDNA2008 assumed. That topic was discussed further in an Internet-Draft that was never completed and published [IDNA-Unicode] and in the IAB response to that issue [IAB-2015].

RFC 8753 [RFC8753] discusses some of these issues and updates RFC 5892 to clarify and improve the review process in ways that should mitigate the issues discussed in Section 3. Even if applied carefully, it will not fundamentally change those issues: it is impossible for those reviews to catch all possible problematic code points. RFC 9233 [RFC9233] reflects a partial implementation of RFC 8753.

Those and other documents also discuss issues with IDNA and character graphemes for which abstractions exist in Unicode in precomposed form but that can be generated from combining sequences. Another approach has been to create a registry of code points known to be problematic [Freytag-troublesome], but that work was never carried forward either. In combination, the various discussions of combining sequences and non-decomposing characters may lay the foundation for an actual update to the IDNA code points document [RFC5892]. Such an update would presumably also address the existing errata against that document discussed at the end of Section 5.1.

Perhaps the most important contemporary efforts are ones coordinated by ICANN to develop rules for specific scripts and writing systems. They include the twin efforts of creating per-script Root Zone Label Generation Rules [ICANN-RZLGR-5] and Second Level Reference Label Generation Rules [SL-REF-LGR] (the latter of which may be per language). They also include other lists of code points or code point relationships that may be particularly problematic and that should be treated with extra caution or prohibited entirely. An overview of that work is being assembled [LGR-forward-reference].

At a much higher level, discussions are ongoing to consider issues, demands, and proposals for new uses of the DNS.

7. Security Considerations

As discussed in IAB recommendations about internationalized domain names [RFC4690], [RFC6912], and elsewhere, poor choices of strings for DNS labels can lead to opportunities for attacks, user confusion, and other issues less directly related to security. This document clarifies the importance of registries carefully establishing design policies for the labels they will allow, and it makes clear that having such policies and taking responsibility for them is a requirement, not an option. If that clarification is useful in practice, the result should be an improvement in security.

8. Acknowledgments

Many thanks to Patrik Faltstrom who provided an important review on the initial version, to Jaap Akkerhuis, Don Eastlake, Barry Leiba, John Levine, and Alessandro Vesely who did reviews that improved the text and to Pete Resnick who acted as document shepherd and did an additional careful review of the 2020 version of this specification.

Thanks also to Murray Kucherawy and Orie Steele who managed to get it moving again in 2024 after an extended delay after the initial IETF Last Call was completed in August 2019 without problems being identified by the community.

9. IANA Considerations

RFC Editor: Please remove this section before publication.

This memo includes no requests to or actions for IANA. In particular, it does not contain any provisions that would alter any IDNA-related registries or tables.

10. References

10.1. Normative References

[RFC1591]
Postel, J., "Domain Name System Structure and Delegation", RFC 1591, DOI 10.17487/RFC1591, <https://www.rfc-editor.org/info/rfc1591>.
[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.
[RFC5890]
Klensin, J., "Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework", RFC 5890, DOI 10.17487/RFC5890, <https://www.rfc-editor.org/info/rfc5890>.
[RFC5891]
Klensin, J., "Internationalized Domain Names in Applications (IDNA): Protocol", RFC 5891, DOI 10.17487/RFC5891, <https://www.rfc-editor.org/info/rfc5891>.
[RFC5891Erratum]
"RFC 5891, "Internationalized Domain Names in Applications (IDNA): Protocol"", Errata ID 3969, <http://www.rfc-editor.org/errata_search.php?rfc=5891>.
[RFC5893]
Alvestrand, H., Ed. and C. Karp, "Right-to-Left Scripts for Internationalized Domain Names for Applications (IDNA)", RFC 5893, DOI 10.17487/RFC5893, <https://www.rfc-editor.org/info/rfc5893>.
[RFC5894]
Klensin, J., "Internationalized Domain Names for Applications (IDNA): Background, Explanation, and Rationale", RFC 5894, DOI 10.17487/RFC5894, <https://www.rfc-editor.org/info/rfc5894>.
[RFC6912]
Sullivan, A., Thaler, D., Klensin, J., and O. Kolkman, "Principles for Unicode Code Point Inclusion in Labels in the DNS", RFC 6912, DOI 10.17487/RFC6912, <https://www.rfc-editor.org/info/rfc6912>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/info/rfc8174>.

10.2. Informative References

[Freytag-troublesome]
Freytag, A., Klensin, J., and A. Sullivan, "Those Troublesome Characters: A Registry of Unicode Code Points Needing Special Consideration When Used in Network Identifiers", , <draft-freytag-troublesome-characters-02>. Reference supplied for historical purposes. This document is no longer under development.
[Gabrilovich2002]
Gabrilovich, E. and A. Gontmakher, "The Homograph Attack", Communications of the ACM 45(2):128, .
[IAB-2015]
Internet Architecture Board (IAB), "IAB Statement on Identifiers and Unicode 7.0.0", , <https://www.iab.org/documents/correspondence-reports-documents/2015-2/iab-statement-on-identifiers-and-unicode-7-0-0/>.
[ICANN-MSR5]
Internet Corporation for Assigned Names and Numbers (ICANN): Integration Panel, "Maximal Starting Repertoire -- MSR-5 Overview and Rationale", , <https://www.icann.org/en/system/files/files/msr-5-overview-24jun21-en.pdf>.
[ICANN-RZLGR-5]
Internet Corporation for Assigned Names and Numbers (ICANN): Integration Panel, "Root Zone Label Generation Rules (RZ LGR-5) Overview and Summary", , <https://www.icann.org/news/announcement-2-2019-04-25-en>.
[IDNA-Unicode]
Klensin, J. and P. Faltstrom, "IDNA Update for Unicode 7.0.0", , <draft-klensin-idna-5892upd-unicode70-05>. Reference supplied for historical purposes. This document is no longer under development.
[LGR-forward-reference]
?? TBD ??, "Overview and Summary of ICANN Label Generation Rule Effort".
[LGR-Procedure]
Internet Corporation for Assigned Names and Numbers (ICANN), "Procedure to Develop and Maintain the Label Generation Rules for the Root Zone in Respect of IDNA Labels", <https://www.icann.org/en/system/files/files/draft-lgr-procedure-20mar13-en.pdf>.
[RFC-Editor-5890Errata]
RFC Editor, "RFC Errata: RFC 5890, "Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework", August 2010", <https://www.rfc-editor.org/errata_search.php?rfc=5890>.
[RFC3492]
Costello, A., "Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)", RFC 3492, DOI 10.17487/RFC3492, <https://www.rfc-editor.org/info/rfc3492>.
[RFC4690]
Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and Recommendations for Internationalized Domain Names (IDNs)", RFC 4690, DOI 10.17487/RFC4690, <https://www.rfc-editor.org/info/rfc4690>.
[RFC4713]
Lee, X., Mao, W., Chen, E., Hsu, N., and J. Klensin, "Registration and Administration Recommendations for Chinese Domain Names", RFC 4713, DOI 10.17487/RFC4713, <https://www.rfc-editor.org/info/rfc4713>.
[RFC5564]
El-Sherbiny, A., Farah, M., Oueichek, I., and A. Al-Zoman, "Linguistic Guidelines for the Use of the Arabic Language in Internet Domains", RFC 5564, DOI 10.17487/RFC5564, <https://www.rfc-editor.org/info/rfc5564>.
[RFC5892]
Faltstrom, P., Ed., "The Unicode Code Points and Internationalized Domain Names for Applications (IDNA)", RFC 5892, DOI 10.17487/RFC5892, <https://www.rfc-editor.org/info/rfc5892>.
[RFC5992]
Sharikov, S., Miloshevic, D., and J. Klensin, "Internationalized Domain Names Registration and Administration Guidelines for European Languages Using Cyrillic", RFC 5992, DOI 10.17487/RFC5992, <https://www.rfc-editor.org/info/rfc5992>.
[RFC6452]
Faltstrom, P., Ed. and P. Hoffman, Ed., "The Unicode Code Points and Internationalized Domain Names for Applications (IDNA) - Unicode 6.0", RFC 6452, DOI 10.17487/RFC6452, <https://www.rfc-editor.org/info/rfc6452>.
[RFC8753]
Klensin, J. and P. Fältström, "Internationalized Domain Names for Applications (IDNA) Review for New Unicode Versions", RFC 8753, DOI 10.17487/RFC8753, <https://www.rfc-editor.org/info/rfc8753>.
[RFC9233]
Fältström, P., "Internationalized Domain Names for Applications 2008 (IDNA2008) and Unicode 12.0.0", RFC 9233, DOI 10.17487/RFC9233, <https://www.rfc-editor.org/info/rfc9233>.
[SL-REF-LGR]
Internet Corporation for Assigned Names and Numbers (ICANN), "Second Level Label Generation Rules", <https://www.icann.org/resources/pages/second-level-lgr-2015-06-21-en>.
[Unicode]
The Unicode Consortium, "The Unicode Standard, Version 5.0", Section 2.11. This printed reference has now been updated online to reflect additional code points. For code points, the reference at the time this document was published is to Unicode 5.2. (Note that this is the reference exactly as it appeared in RFC 5891. The best handling for this reference and the next two has been posed as a question to the RFC Production Center.)
[UnicodeA]
The Unicode Consortium, "The Unicode Standard, Version 16.0", Section 2.11, <https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-2/#G1708>. Note that this can be adjusted for a newer Unicode version by changing the version portion of the URL.
[UnicodeB]
The Unicode Consortium, "The Unicode Standard, Version 16.0.0", Section 3.6, definition D52, <https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-3/#G30602>. Note that this can be adjusted for a newer Unicode version by changing the version portion of the URL.

Appendix A. Change Log

RFC Editor: Please remove this appendix before publication.

A.1. Changes from version -00 (2017-03-11) to -01

  • Added Acknowledgments and adjusted references.
  • Filled in Section 5 with updates to respond to errata.
  • Added Section 6 to discuss relationships to other documents.
  • Modified the Abstract to note specifically updated documents.
  • Several small editorial changes and corrections.

A.2. Changes from version -01 (2017-09-12) to -02

After a pause of nearly 22 months due to inability to get this draft processed, including nearly a year waiting for a new directorate to actually do anything of substance about fundamental IDNA issues, the -02 version was posted in the hope of getting a new start. Specific changes include:

  • Added a new section, Section 4, and some introductory material to address the very practical issue that domains run on a for-profit basis are unlikely to follow the very strict "understand what you are registering" requirement if they support IDNs at all and expect to profit from them.
  • Added a pointer to draft-klensin-idna-unicode-review to the discussion of other work.
  • Editorial corrections and changes.

A.3. Changes from version -02 (2019-07-06) to -03

  • Minor editorial changes in response to shepherd review.
  • Additional references.

A.4. Changes from version -03 (2019-07-22) to -04

  • Editorial changes after AD review and some additional changes to improve clarity.

A.5. Changes from version -04 (2019-08-02) to -05

  • Small editorial corrections, many to correct glitches found during IETF Last Call.
  • Updated acknowledgments, particularly to reflect reviews in Last Call.

A.6. Changes from version -05 (2019-08-29) to -06

Other than some small editorial adjustments, these changes were made after, and reflect, IESG post-last-call review and comments. To the extent it was possible to do so without making this document inconsistent with the other IDNA documents, established IETF, Unicode, and ICANN community i18n terminology, or well-established IDNA or i18n practices, the first author believes that the document responds to all previously-outstanding IESG substantive comments.

  • Fixed a remaining citation issue with a Unicode document. This version has not been updated to reflect Unicode 13, but the document should be adjusted so that all references are contemporary at the time of publication.
  • Added reference to homograph attacks, and slightly adjusted discussion of them, per discussion with IESG post-last-call.
  • Removed pointer to RFC 5890 from discussion of mixed-script labels in Section 3.
  • Rewrote parts of Section 4 to eliminate the term "for-profit" and clarify the issues.
  • Removed pointer to draft-klensin-idna-unicode-review because RFC 8753 has been published and is therefore no longer pending / parallel work.
  • Rewrote Section 6 to make the relationships among various documents and efforts somewhat more clear.
  • References to RFCs 5893 and 6912 moved from Informative to Normative.

A.7. Changes from version -06 (2020-07-13) to -07

  • Significant parts of this draft have been rewritten, and text rearranged, to reflect discussions subsequent to the end of the original IETF Last Call in late August 2019 and the posting of version -06 nearly a year later to resolve IESG comments and objections that appeared to be consistent with the purpose of the document and the Last Call comments. The items below reflect the most significant changes. None of these changes is believed to be substantive; all are intended as improvements in clarity and explanation.
  • Multiple small editorial corrections including one more change from "profits" to "revenues" to make it clear that the motivation problem might exist even for a registry that was loss-making.
  • Extensive changes to clarify the intent of, and need for, the document and to improve the explanation of its context and its relationship to efforts to define additional restrictions for particular scripts or writing systems.
  • Added reference to RFC 8174 to the 2119 boilerplate.
  • In Section 5, updated the errata description for RFC 5890 and verified the absence of errata for RFCs 5893 and 5894 as of 2024-09-08.
  • Updated references including those associated with the errata list in Section 5.1.
  • Clarified the Unicode references (those in RFC 5891 were to Unicode 5.0).
  • Updated source to RFCXMLv3.

Authors' Addresses

John C Klensin
1770 Massachusetts Ave, Ste 322
Cambridge, MA 02140
United States of America

Asmus Freytag
ASMUS, Inc.