{"id":7147,"date":"2015-02-24T20:26:30","date_gmt":"2015-02-24T19:26:30","guid":{"rendered":"http:\/\/daniel.haxx.se\/blog\/?p=7147"},"modified":"2026-02-24T09:56:35","modified_gmt":"2026-02-24T08:56:35","slug":"curl-smiley-urls-and-libc","status":"publish","type":"post","link":"https:\/\/daniel.haxx.se\/blog\/2015\/02\/24\/curl-smiley-urls-and-libc\/","title":{"rendered":"curl, smiley-URLs and libc"},"content":{"rendered":"\n<p>Some interesting Unicode URLs have recently been seen used in the wild &#8211; like in this billboard <a href=\"http:\/\/mashable.com\/2015\/02\/20\/coke-emoji-web-addresses\/\">ad campaign from Coca Cola<\/a>, and a friend of mine asked me about <a href=\"http:\/\/curl.haxx.se\/\">curl<\/a> in reference to these and how it deals with such URLs.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/twitter.com\/stevencoleuk\/status\/568698681852973056\"><img loading=\"lazy\" decoding=\"async\" width=\"450\" height=\"268\" src=\"http:\/\/daniel.haxx.se\/blog\/wp-content\/uploads\/2015\/02\/emojicoke-by-stevecoleuk-450.jpg\" alt=\"emojicoke-by-stevecoleuk-450\" class=\"wp-image-7150\" title=\"coke ad with emoji URL\" srcset=\"https:\/\/daniel.haxx.se\/blog\/wp-content\/uploads\/2015\/02\/emojicoke-by-stevecoleuk-450.jpg 450w, https:\/\/daniel.haxx.se\/blog\/wp-content\/uploads\/2015\/02\/emojicoke-by-stevecoleuk-450-150x89.jpg 150w, https:\/\/daniel.haxx.se\/blog\/wp-content\/uploads\/2015\/02\/emojicoke-by-stevecoleuk-450-300x178.jpg 300w\" sizes=\"auto, (max-width: 450px) 100vw, 450px\" \/><\/a><\/figure>\n\n\n\n<p class=\"has-text-align-right\">(Picture by <a href=\"https:\/\/twitter.com\/stevencoleuk\/status\/568698681852973056\">stevencoleuk<\/a>)<\/p>\n\n\n\n<p>I ran some tests and decided to blog my observations since they are a bit curious. 
The exact URL I tried was &#8216;www.O.ws&#8217; (<em>not<\/em> the same smiley as shown on this billboard &#8211; note that I&#8217;ve replaced the actual smiley with &#8220;O&#8221; in this entire post since WordPress craps on it) &#8211; it is really hard to enter by hand so now is the time to appreciate your ability to cut and paste! It appears they registered several domains for a set of different smileys.<\/p>\n\n\n\n<p>These smileys are not really allowed IDN (where IDN means Internationalized Domain Names) symbols, which makes these domains a bit different. They should not (see below for details) be converted to <a href=\"http:\/\/en.wikipedia.org\/wiki\/Punycode\">punycode<\/a> before getting resolved but instead I assume that the pure UTF-8 sequence should or at least will be fed into the name resolver function. Either way, either the punycode or the UTF-8 string should get passed in.<\/p>\n\n\n\n<p>If curl was built to use libidn, it still won&#8217;t convert this to punycode and the verbose output says &#8220;<strong>Failed to convert www.O.ws to ACE; String preparation failed<\/strong>&#8221;.<\/p>\n\n\n\n<p>curl (exact version doesn&#8217;t matter) using the stock threaded resolver:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Debian Linux (glibc 2.19) &#8211; <strong>FAIL<\/strong><\/li>\n\n\n\n<li>Windows 7 &#8211; <strong>FAIL<\/strong><\/li>\n\n\n\n<li>Mac OS X 10.9 &#8211; <strong>SUCCESS<\/strong><\/li>\n<\/ul>\n\n\n\n<p>Perhaps to no surprise, I get the exact same results if I try to ping those host names on these systems. It works on the mac, it fails on Linux and Windows. Wget 1.16 also fails on my Debian systems (just as a reference; I didn&#8217;t try it on any of the other platforms).<\/p>\n\n\n\n<p>My curl build on Linux that uses <a href=\"http:\/\/c-ares.haxx.se\/\">c-ares<\/a> for name resolving instead of glibc succeeds perfectly. 
host, nslookup and dig all work fine with it on Linux too (as well as nslookup on Windows):<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<pre class=\"wp-block-preformatted\">$ host www.O.ws\nwww.O.ws has address 64.70.19.202<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">$ ping www.O.ws\nping: unknown host www.O.ws<\/pre>\n<\/blockquote>\n\n\n\n<p>While the same command sequence on the mac shows:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<pre class=\"wp-block-preformatted\">$ host www.O.ws\nwww.O.ws has address 64.70.19.202<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">$ ping www.O.ws\nPING www.O.ws (64.70.19.202): 56 data bytes\n64 bytes from 64.70.19.202: icmp_seq=0 ttl=44 time=191.689 ms\n64 bytes from 64.70.19.202: icmp_seq=1 ttl=44 time=191.124 ms<\/pre>\n<\/blockquote>\n\n\n\n<p>Slightly interesting additional tidbit: if I rebuild curl to use <strong>gethostbyname_r<\/strong>() instead of <strong>getaddrinfo<\/strong>() it works just like on the mac, so clearly this is glibc having an opinion on how this should work when given this UTF-8 hostname.<\/p>\n\n\n\n<p>Pasting the URL into Firefox and Chrome works just fine. They both convert the name to punycode and use &#8220;www.xn--h28h.ws&#8221; which then resolves to the same IPv4 address.<\/p>\n\n\n\n<p><em><strong>Update<\/strong>: as was pointed out in a comment below, the &#8220;64.70.19.202&#8221; IP address is not the correct IP for the site. 
It is just the registrar&#8217;s landing page so it sends back that response to any host or domain name in the .ws domain that doesn&#8217;t exist!<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What do the IDN specs say?<\/h2>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"alignright\"><img loading=\"lazy\" decoding=\"async\" width=\"256\" height=\"256\" src=\"http:\/\/daniel.haxx.se\/blog\/wp-content\/uploads\/2015\/02\/U-263A-smiley.png\" alt=\"The U-263A smiley\" class=\"wp-image-7175\" title=\"The U-263A smiley\" srcset=\"https:\/\/daniel.haxx.se\/blog\/wp-content\/uploads\/2015\/02\/U-263A-smiley.png 256w, https:\/\/daniel.haxx.se\/blog\/wp-content\/uploads\/2015\/02\/U-263A-smiley-150x150.png 150w\" sizes=\"auto, (max-width: 256px) 100vw, 256px\" \/><\/figure>\n<\/div>\n\n\n<p>This is not my area of expertise. I had to consult <a href=\"http:\/\/stupid.domain.name\/\">Patrik F\u00e4ltstr\u00f6m<\/a> here to get this straightened out (but if I got something wrong here, the mistake is still all mine). Apparently this smiley is allowed in <a href=\"https:\/\/tools.ietf.org\/html\/rfc3490\">RFC 3490<\/a> (IDNA2003), but that has been replaced by RFC 5890-5892 (IDNA2008) where this is <a href=\"https:\/\/tools.ietf.org\/html\/rfc5892#appendix-B.1\">DISALLOWED<\/a>. If you read the spec, this is code point 263A.<\/p>\n\n\n\n<p>So, depending on which spec you follow, it is either a valid IDN character or it isn&#8217;t anymore.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What do the libc docs say?<\/h2>\n\n\n\n<p>The <a href=\"http:\/\/pubs.opengroup.org\/onlinepubs\/9699919799\/functions\/freeaddrinfo.html\">POSIX docs for getaddrinfo<\/a> don&#8217;t contain enough info to tell who&#8217;s right, but they don&#8217;t forbid UTF-8 encoded strings. 
The regular <a href=\"http:\/\/man7.org\/linux\/man-pages\/man3\/getaddrinfo.3.html\">glibc docs for getaddrinfo<\/a> also don&#8217;t say anything and interestingly, the <a href=\"https:\/\/developer.apple.com\/library\/mac\/documentation\/Darwin\/Reference\/ManPages\/man3\/getaddrinfo.3.html\">Apple Mac OS X version of the docs<\/a> says just as little.<\/p>\n\n\n\n<p>With this complete lack of guidance, it is hardly any additional surprise that the <a href=\"http:\/\/man7.org\/linux\/man-pages\/man3\/gethostbyname.3.html\">glibc gethostbyname docs<\/a> also don&#8217;t mention what the function does in this case, but clearly it doesn&#8217;t do the same as getaddrinfo, in the glibc case at least.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What&#8217;s on the actual site?<\/h2>\n\n\n\n<p>A redirect to <strong>www.emoticoke.com<\/strong> which shows a rather boring page.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"http:\/\/www.emoticoke.com\/\"><img loading=\"lazy\" decoding=\"async\" width=\"450\" height=\"524\" src=\"http:\/\/daniel.haxx.se\/blog\/wp-content\/uploads\/2015\/02\/emoticoke.png\" alt=\"emoticoke\" class=\"wp-image-7198\" title=\"emoticoke\" srcset=\"https:\/\/daniel.haxx.se\/blog\/wp-content\/uploads\/2015\/02\/emoticoke.png 450w, https:\/\/daniel.haxx.se\/blog\/wp-content\/uploads\/2015\/02\/emoticoke-128x150.png 128w, https:\/\/daniel.haxx.se\/blog\/wp-content\/uploads\/2015\/02\/emoticoke-257x300.png 257w\" sizes=\"auto, (max-width: 450px) 100vw, 450px\" \/><\/a><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Who&#8217;s right?<\/h2>\n\n\n\n<p>I don&#8217;t know. What do you think?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Some interesting Unicode URLs have recently been seen used in the wild &#8211; like in this billboard ad campaign from Coca Cola, and a friend of mine asked me about curl in reference to these and how it deals with such URLs. 
(Picture by stevencoleuk) I ran some tests and decided to blog my observations &hellip; <a href=\"https:\/\/daniel.haxx.se\/blog\/2015\/02\/24\/curl-smiley-urls-and-libc\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">curl, smiley-URLs and libc<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":5,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[13,6,45],"tags":[292,214,33,86,407,408,372],"class_list":["post-7147","post","type-post","status-publish","format-standard","hentry","category-net","category-floss","category-web","tag-chrome","tag-command-line","tag-curl-and-libcurl","tag-firefox","tag-idn","tag-unicode","tag-url"],"_links":{"self":[{"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/posts\/7147","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/comments?post=7147"}],"version-history":[{"count":54,"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/posts\/7147\/revisions"}],"predecessor-version":[{"id":29347,"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/posts\/7147\/revisions\/29347"}],"wp:attachment":[{"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/media?parent=7147"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/categories?post=7147"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/tags?post=7147"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}