+3 votes
248 views
in Domain management by (242k points)
reopened
Punycode

1 Answer

+4 votes
by (1.6m points)
edited
 
Best answer

Punycode

Punycode is a standardized encoding method that represents Unicode characters using a limited subset of ASCII characters, made up of the following elements:

  • Lowercase letters: a to z
  • Digits: 0 to 9
  • Special characters: hyphen (-)

The characters listed are known as the basic characters.

The method is primarily used to process internationalized domain names (IDNs) with non-ASCII special characters.

Index
  1. Coding method development
  2. How does Punycode encoding work?
  3. Free Punycode converters
  4. Punycode in domains with emojis
  5. Is Punycode a security risk?

Coding method development

In 2003, the Internet Engineering Task Force (IETF) standardized Punycode as the encoding syntax for Internationalizing Domain Names in Applications (IDNA).

The IETF defines IDNs as domain names that contain special characters (such as the umlaut, the cedilla or the tilde) or letters outside the basic Latin alphabet (a clear example is the ñ). Basic protocols such as the Domain Name System (DNS) cannot process these non-ASCII characters directly.

Thus, for example, since IDNs were introduced, the domain name azulejos-coruña can be registered under the top-level domain .es. For name resolution, however, it can only be processed once the non-basic characters (in this example, the "ñ") have been encoded. Many protocols were designed around English text and therefore support only the limited set of ASCII characters.

To keep IDNs compatible with older Internet standards, the IETF prescribed that internationalized domain names be encoded using the characters previously allowed, and standardized Punycode as the procedure for doing so.

Note

For email addresses, Punycode is only used for internationalized email domains. Addresses that contain non-ASCII characters in the local part, that is, the characters before the @ sign, are encoded using UTF-8.
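
The split described in this note can be sketched in Python. This is only an illustration: the function name email_with_ascii_domain and the sample addresses are made up, and the built-in idna codec used here implements the older IDNA 2003 rules, so it may differ from what a given mail system does.

```python
def email_with_ascii_domain(address: str) -> str:
    """Encode only the domain part of an email address (hypothetical helper)."""
    local, at, domain = address.rpartition("@")
    if not at:
        raise ValueError("not an email address: missing '@'")
    # The domain is converted label by label; the local part is left untouched.
    ascii_domain = domain.encode("idna").decode("ascii")
    return local + "@" + ascii_domain


print(email_with_ascii_domain("info@azulejos-coruña.es"))
# info@xn--azulejos-corua-2nb.es
```

The local part is returned exactly as it was given, which mirrors the note above: only the domain goes through Punycode.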

How does Punycode encoding work?

The IETF states in the RFC 3492 standard that Punycode is one of the possible applications of a general encoding algorithm known as Bootstring. The Bootstring algorithm allows you to represent strings with a limited selection of elements. The development of the coding procedure is based on six principles:

  • Completeness: With Bootstring, every original string can be represented by a string of basic characters.
  • Uniqueness: The mapping between an original string and its Bootstring encoding is unambiguous. Each original string has exactly one encoded form.
  • Reversibility: Bootstring encoding can be undone without losing information.
  • Efficiency: The encoded string is only slightly longer than the original string, and sometimes not longer at all.
  • Simplicity: Bootstring uses simple encoding and decoding algorithms.
  • Readability: Only those characters that cannot be represented in the target character set are encoded. The rest of the characters remain the same.
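
Two of these principles, reversibility and readability, can be observed with Python's built-in punycode codec. The sketch below uses a hypothetical helper named round_trip, and the sample strings are chosen only for illustration.

```python
def round_trip(text: str) -> tuple[bytes, str]:
    """Encode a string with Punycode and decode it again (illustrative helper)."""
    encoded = text.encode("punycode")
    decoded = encoded.decode("punycode")
    return encoded, decoded


for sample in ["azulejos-coruña", "пример"]:
    encoded, decoded = round_trip(sample)
    # Reversibility: decoding restores the original string.
    # Readability: ASCII characters of the input reappear unchanged in the output.
    print(sample, encoded, decoded == sample)
```

For azulejos-coruña the encoded form begins with azulejos-corua, so the ASCII part of the name stays readable.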

Punycode is an instance of Bootstring whose parameters are chosen to meet the requirements of internationalized domain names. It makes it possible to represent Unicode characters using only the basic characters that were already allowed.

We show this syntax below with the following example:

IDN: azulejos-coruña

The IDN azulejos-coruña contains the letter "ñ", which is not among the characters previously allowed in domain names and must therefore be encoded using Punycode to guarantee compatibility.

In the first step, the input string is normalized (for example, all uppercase letters are replaced by lowercase).

In the second step, the non-ASCII characters are removed from the label. The remaining basic characters are kept in their original order, followed by a hyphen as a delimiter and then by the encoded form of the removed characters.

When Internet addresses are encoded with Punycode, each resulting string is given the ACE prefix (ACE is short for ASCII Compatible Encoding):

ACE prefix: xn--

The ACE prefix ensures that ordinary ASCII domain names that contain hyphens are not mistaken for encoded internationalized domain names.

The encoded result for azulejos-coruña is therefore:

ACE: xn--azulejos-corua-2nb

image
The ACE string consists of the ACE prefix and a Punycode string.
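
The steps above can be reproduced with Python's built-in punycode codec. This is a minimal sketch rather than a full IDNA implementation: to_ace_label is a hypothetical name, and lowercasing stands in for the complete normalization step.

```python
ACE_PREFIX = "xn--"


def to_ace_label(label: str) -> str:
    """Convert one domain label to its ACE form (simplified sketch)."""
    label = label.lower()  # simplified stand-in for normalization
    if label.isascii():
        return label  # nothing to encode, so no prefix is added
    punycode = label.encode("punycode").decode("ascii")
    return ACE_PREFIX + punycode


print(to_ace_label("Azulejos-Coruña"))  # xn--azulejos-corua-2nb
```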

Deviations from this scheme occur when the domain name contains only non-ASCII characters or does not contain any at all: a domain name that only contains non-basic characters will display the ACE prefix together with a fully encoded string after the encoding process.

So, for example, a domain name such as παράδειγμα (Greek for "example") corresponds to the following encoding:

IDN: παράδειγμα

ACE: xn--hxajbheg2az3al

When, on the other hand, a domain name consists only of ASCII characters, the Punycode algorithm would merely append a hyphen to it. In this case, it is not necessary to encode with Punycode, and no ACE prefix is added.
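
The ASCII-only case can be checked with Python's built-in punycode and idna codecs. The variable names are illustrative, and the idna codec follows the older IDNA 2003 rules.

```python
# Raw Punycode of an ASCII-only label: the basic characters plus the delimiter.
raw = "example".encode("punycode")
print(raw)  # b'example-'

# Decoding the raw form gives back the original label.
restored = raw.decode("punycode")
print(restored)  # example

# The IDNA conversion leaves ASCII-only names untouched: no hyphen, no prefix.
unchanged = "example.org".encode("idna")
print(unchanged)  # b'example.org'
```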

If the fully qualified domain name (FQDN) is considered, each of its labels (top-level domain, second-level domain, third-level domain, etc.) is encoded separately.

A domain like пример.бг (Bulgarian for example.bg) is encoded as follows:

IDN: пример.бг

ACE: xn--e1afmkfd.xn--90ae
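
Label-by-label encoding can be sketched as follows, taking the Bulgarian spelling пример.бг as input. The function name to_ace_domain is made up for this example, and normalization is deliberately left out.

```python
def to_ace_domain(domain: str) -> str:
    """Encode each label of a domain name separately (simplified sketch)."""
    encoded_labels = []
    for label in domain.split("."):
        if label.isascii():
            encoded_labels.append(label)  # ASCII labels are kept as they are
        else:
            punycode = label.encode("punycode").decode("ascii")
            encoded_labels.append("xn--" + punycode)
    return ".".join(encoded_labels)


print(to_ace_domain("пример.бг"))           # xn--e1afmkfd.xn--90ae
print(to_ace_domain("azulejos-coruña.es"))  # xn--azulejos-corua-2nb.es
```

Each label gets its own ACE prefix only when it actually contains non-ASCII characters, which is why .es stays as it is while .бг becomes xn--90ae.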

The following table shows an overview of the different variants of the Punycode procedure.

                                IDN                 Punycode               ACE
ASCII and non-ASCII characters  azulejos-coruña.es  azulejos-corua-2nb.es  xn--azulejos-corua-2nb.es
Non-ASCII characters only       παράδειγμα.gr       hxajbheg2az3al.gr      xn--hxajbheg2az3al.gr
ASCII characters only           example.org         example-.org           Does not apply

Keep in mind that the DNS limits each label of a domain name to 63 characters. The limit applies to the encoded ACE form, prefix included, so a long IDN label can exceed it after conversion even if its Unicode form is shorter.
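
A label can be checked against the 63-character limit after conversion. The sketch below assumes the limit is applied to the ACE form including the xn-- prefix; ace_label_fits and MAX_LABEL_LENGTH are names chosen for this example.

```python
MAX_LABEL_LENGTH = 63  # maximum length of a single DNS label


def ace_label_fits(label: str) -> bool:
    """Return True if the ACE form of the label stays within the DNS limit."""
    if label.isascii():
        ace = label
    else:
        ace = "xn--" + label.encode("punycode").decode("ascii")
    return len(ace) <= MAX_LABEL_LENGTH


print(ace_label_fits("azulejos-coruña"))  # True
print(ace_label_fits("ñ" * 100))          # False
```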

When it comes to encoding, keep in mind that Unicode characters are not translated one by one into ASCII characters. Instead, for each character that has been removed, the algorithm encodes a number that captures both its Unicode code point and its position in the original string.

If we go back to our example, the character string 2nb indicates that corua must be completed with the Unicode character "ñ" in fifth position, giving coruña.
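
How 2nb carries this information can be traced with a short sketch of the number decoding described in RFC 3492. It handles only the first encoded character of a label (so the initial bias of 72 applies); the function and variable names are chosen for this example.

```python
BASE, TMIN, TMAX, INITIAL_BIAS, INITIAL_N = 36, 1, 26, 72, 128


def digit_value(ch: str) -> int:
    """Map a basic character to its digit value: a-z are 0-25, 0-9 are 26-35."""
    if "a" <= ch <= "z":
        return ord(ch) - ord("a")
    if "0" <= ch <= "9":
        return ord(ch) - ord("0") + 26
    raise ValueError(f"not a Punycode digit: {ch!r}")


def decode_first_delta(digits: str, bias: int = INITIAL_BIAS) -> int:
    """Decode the first variable-length number of an encoded label."""
    delta, weight, k = 0, 1, BASE
    for ch in digits:
        d = digit_value(ch)
        delta += d * weight
        t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
        if d < t:  # a digit below the threshold ends the number
            return delta
        weight *= BASE - t
        k += BASE
    raise ValueError("truncated number")


basic = "azulejos-corua"
delta = decode_first_delta("2nb")
# The number combines code point and position: divide by (length + 1).
code_point = INITIAL_N + delta // (len(basic) + 1)
position = delta % (len(basic) + 1)
restored = basic[:position] + chr(code_point) + basic[position:]
print(delta, hex(code_point), position, restored)
# 1708 0xf1 13 azulejos-coruña
```

The position is counted from zero over the whole label, so index 13 is the slot after coru, which matches the fifth position within coruña.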

Note

RFC 3492 describes the underlying algorithm of Punycode in detail. The document also provides an implementation of the coding procedure in the C programming language.

For the encoding of internationalized domain names, users often use free Punycode converters.

Free Punycode converters

Several sites offer free Punycode converters for converting IDNs into ASCII-compliant representations.

For names under the top-level domains .es or .mx, one example is the domain converter on the Cyberneticos website. This tool places special emphasis on its ability to encode non-ASCII characters typical of the Spanish language, such as the "ñ", umlauts or accents, as well as less common characters from other languages.

image
Along with some brief instructions, the Cyberneticos website offers a Punycode converter that supports ACE encoding.

Another notable converter is Mathias Bynens' Punycode converter, based on punycode.js. Like the tool described above, it can be used to encode IDNs in Spanish as well as in other languages.

image
With his Punycode domain name converter, Mathias Bynens offers an open source tool for converting internationalized domain names.

Punycode in domains with emojis

Not only internationalized domain names, but also domains with emojis can be encoded with Punycode. The requirements are that the top-level domain allows their use and that the emoji you want to use is part of the Unicode standard.

Tip

As of today, the TLDs listed below allow domain registration with emojis: .ws, .tk, .to, .ml, .ga, .cf, .gq and .fm.

From a technical point of view, emoji domains are stored as Punycode, although in theory they are presented to the user as a combination of text and emojis.

Domain with emoji: i❤.ws/

ACE: xn--i-7iq.ws/
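
The ACE label above can be decoded with Python's built-in punycode codec to see which characters it stands for. The variable names are illustrative.

```python
ace_label = "xn--i-7iq"

# Strip the ACE prefix and decode the remaining Punycode string.
decoded = ace_label[len("xn--"):].encode("ascii").decode("punycode")
code_points = [f"U+{ord(ch):04X}" for ch in decoded]
print(code_points)  # ['U+0069', 'U+2764']

# Encoding the decoded text again gives back the same ACE label.
reencoded = "xn--" + decoded.encode("punycode").decode("ascii")
print(reencoded)  # xn--i-7iq
```

U+0069 is the letter i, and U+2764 is the heart symbol, so the label reads as the letter followed by the emoji.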

However, today virtually no standard browser implements this domain model. If you enter a domain with emoji in Firefox, Chrome, Safari, Edge, or Opera, the address bar only shows the ACE string.

Is Punycode a security risk?

Punycode poses a security risk when it comes to homograph attacks, a type of phishing in which criminals use characters that look like other characters to lure unsuspecting users to fake websites.

In order for users to understand this type of phishing attack, blogger Xudong Zheng shows an example on his page using the following Punycode domain:

www.xn--80ak6aa92e.com

Which directs users to a page with the following IDN:

www.аррӏе.com

However, this URL does not correspond to the official website of the Californian technology company Apple Inc., but to a demonstration page created solely as an example.

Instead of the ASCII character "a" (U+0061), the Cyrillic character "а" (U+0430) is used. Although at first glance it is very difficult to tell these two characters apart, browsers interpret them as different characters.
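
Decoding the label and listing each character's code point and Unicode name makes the substitution visible. This is a minimal sketch using Python's punycode codec and the unicodedata module; describe_ace_label is a hypothetical helper.

```python
import unicodedata


def describe_ace_label(ace_label: str) -> list[tuple[str, str]]:
    """Decode an ACE label and list code point and Unicode name per character."""
    if not ace_label.startswith("xn--"):
        raise ValueError("label has no ACE prefix")
    text = ace_label[len("xn--"):].encode("ascii").decode("punycode")
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


for code_point, name in describe_ace_label("xn--80ak6aa92e"):
    print(code_point, name)
```

Every character of this label is reported as Cyrillic, starting with U+0430, even though the rendered name looks like Latin text.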

Furthermore, even certificates cannot provide security here, which is a major disadvantage for users. For modern phishing campaigns, criminals obtain valid SSL certificates, which give the fake website an appearance of security and professionalism.

To avoid these types of attacks, current versions of Chrome and Opera show the ACE string instead of the internationalized domain. Internet Explorer and Microsoft Edge block access to these types of domains entirely. Firefox is the only browser that does not offer protection against Punycode phishing.

image
Example of a homograph domain: the URL looks identical to that of the official Apple website. However, the Unicode character U+0430 has been used, a Cyrillic letter with a striking resemblance to the ASCII character "a".

If you are a Firefox user, you can reduce the risk of phishing attacks by disabling the translation of Punycode into IDN altogether. This workaround takes only two steps:

  • Open the configuration editor: Type about:config in the browser's address bar to open the Firefox configuration editor.
  • Force Punycode: Find the network.IDN_show_punycode setting and change the value from false to true.

After configuration, Firefox will display the internationalized domains in the address bar as ACE strings.


...