Two Lions Holding The Zürich Flag

Country Codes


The initial emoji flags were restricted to a small set of countries, which was pretty discriminatory. The regional indicators were added to Unicode to avoid this issue, but they only allow entities that have an ISO 3166-1 alpha-2 code. This side-steps the issue of quasi-countries like Taiwan or Palestine, which have ISO codes even if they are not widely recognised as countries. Confusingly, some weird colonies are supported, but not the significant divisions of Great Britain (like England).

Change is a problem: characters tend to have a longer life than countries. Many countries have come and gone since my birth; even though the Soviet Union has disappeared and its country code is officially deprecated, old texts referring to it still exist, and so do graphical representations of its flag. What will happen when a country whose flag has been emojified disappears and its country code is unassigned? People have complained that the system does not support regional flags that appear in sporting competitions, like Scotland's. Yet some of these might become country flags soon.

To make things a bit more confusing, many organisations which are not countries are assigned codes, and some of them, like the UN (🇺🇳) or the EU, have flags. You can see what is supported or not on the test page, which I updated. Apple supports the EU flag, but not the UN one. Still, what is the flag of the Eurozone? You could composite a euro sign with a flag, maybe. The funny thing is, there is a character for euro banknotes (💶); similar characters exist for the US dollar, the British pound and the Japanese yen, but for no other currency. It would have been more logical to represent them as composite characters: a banknote character joined to the currency symbol with a zero width joiner.

Things are not going to get simpler: there is a proposal to encode country subdivisions (like England, Wales, the US states, but also things like Swiss cantons). Surprisingly, these flags are not encoded using the regional indicator range, but with the black flag character (🏴) followed by characters in the tag range. So, for instance, the flag of Zürich would be the sequence 🏴󠁣󠁨󠁺󠁨󠁿 (black flag, tag-c, tag-h, tag-z, tag-h, cancel tag). There is a test page using the Babel Font.
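To make the tag mechanism concrete, here is a minimal Python sketch that builds such a sequence from a subdivision code. It assumes the code is given in ISO 3166-2 form (CH-ZH for Zürich) and that, as in the sequence above, the tag letters are the lowercase ones; `subdivision_flag` is a hypothetical helper name, not part of any library.

```python
BLACK_FLAG = "\U0001F3F4"  # WAVING BLACK FLAG, the base of the sequence
CANCEL_TAG = "\U000E007F"  # CANCEL TAG, closes the sequence
TAG_BASE = 0xE0000         # tag characters mirror printable ASCII at U+E0020..U+E007E


def subdivision_flag(code: str) -> str:
    """Build black flag + tag letters + cancel tag for a code like 'CH-ZH'."""
    letters = code.replace("-", "").lower()
    if not (letters.isascii() and letters.isalnum()):
        raise ValueError(f"unexpected subdivision code: {code!r}")
    tags = "".join(chr(TAG_BASE + ord(ch)) for ch in letters)
    return BLACK_FLAG + tags + CANCEL_TAG


zurich = subdivision_flag("CH-ZH")
# Print the code points: black flag, tag-c, tag-h, tag-z, tag-h, cancel tag.
print(" ".join(f"U+{ord(ch):04X}" for ch in zurich))
```

Tag characters are default-ignorable, so a font that does not know the sequence typically shows just the black flag, which is a fairly graceful fallback.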


Plastic Emoji

Emoji – plastic


Six years ago, I wrote a blog entry about a clock implemented using emoji characters; at that time, they were a pretty obscure feature of Unicode, a compatibility element to support the Japanese market. Recently, Coop, one of the biggest retail chains in Switzerland, started giving away plastic emoji with suction cups as customer gifts.

In this picture, you can see a rocket (🚀), an alien face (👽), a smiling face with heart-shaped eyes (😍), a grimacing face (😬) and a thumbs up sign (👍). In six years, these symbols went from being a typographic oddity of the Japanese market to being the main subject of a marketing campaign for a Swiss supermarket.

To quote the Internet: “I’m not even mad, that’s amazing.”


Unicode Gender


A few months ago, I wrote a blog post about Unicode skin colour selectors, which let you change the skin colour of certain characters. Meanwhile, a new version came out, which specifies how to select the gender of characters. Interestingly, the mechanism used this time is different: instead of a gender modifier, this is implemented by joining a character with a gender symbol using the zero width joiner character (U+200D). The gender symbol is either ♀ (female, U+2640) or ♂ (male, U+2642).
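A minimal Python sketch of the joining mechanism, using only the standard `unicodedata` module; the `gendered` helper is hypothetical. Unicode's emoji data lists the recommended sequences with an extra U+FE0F after the gender sign; it is left out here to match the sequences used in this post.

```python
import unicodedata

RUNNER = "\U0001F3C3"
ZWJ = "\u200D"            # ZERO WIDTH JOINER
FEMALE_SIGN = "\u2640"
MALE_SIGN = "\u2642"


def gendered(base: str, sign: str) -> str:
    """Join a base character and a gender sign with a zero width joiner."""
    return base + ZWJ + sign


woman_running = gendered(RUNNER, FEMALE_SIGN)
for ch in woman_running:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# A renderer without a glyph for the whole sequence falls back to the
# components on either side of the joiner.
print(woman_running.split(ZWJ))
```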

Why is there an offset of two between the female and male signs? These are actually astronomical symbols for the planets. The female symbol is also used for Venus (and in alchemy, copper) and the male symbol for Mars (and in alchemy, iron). Between them, there is the symbol for Earth (♁). It also means that there are a few spare planets to encode other genders. There are many more alchemical symbols encoded in the range U+1F700 to U+1F773, so if you need the symbol for antimoniate 🜥, or another symbol for earth 🜨, they are there.
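The offset is easy to check with Python's standard `unicodedata` module, which names the neighbouring characters in the Miscellaneous Symbols block; this is only an illustration.

```python
import unicodedata

# U+263F..U+2647: the planetary symbols. The gender signs double as
# Venus (U+2640) and Mars (U+2642), with EARTH (U+2641) between them.
for cp in range(0x263F, 0x2648):
    ch = chr(cp)
    print(f"U+{cp:04X} {ch} {unicodedata.name(ch)}")

# Every code point in the alchemical range quoted above has a character name.
alchemical = [chr(cp) for cp in range(0x1F700, 0x1F774)]
print(all(unicodedata.name(ch, None) for ch in alchemical))
```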

Using an existing character with a clear semantic meaning gives nicer degradation: the combination 🏃︎ + ♀ is kind of understandable, if you know the gender/planet symbols; I had the impression these were not taught in the USA. What is annoying is that there are now two different mechanisms to affect the appearance of a given character, so for instance the Runner character (U+1F3C3) now has 12 variations: two genders (implemented using a zero width joiner and a gender symbol), times six skin modes (the default plus five skin colour modifier characters). The table below shows all the combinations (which might or might not work in your browser).

Gender Base Type 1-2 Type 3 Type 4 Type 5 Type 6
Female 🏃‍♀ 🏃🏻‍♀ 🏃🏼‍♀ 🏃🏽‍♀ 🏃🏾‍♀ 🏃🏿‍♀
Male 🏃‍♂ 🏃🏻‍♂ 🏃🏼‍♂ 🏃🏽‍♂ 🏃🏾‍♂ 🏃🏿‍♂
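The twelve combinations can be generated mechanically. A minimal Python sketch, assuming (as the table does) that the skin modifier goes directly after the base character and that the joiner plus gender sign come last; `runner_variants` is a hypothetical name.

```python
RUNNER = "\U0001F3C3"
ZWJ = "\u200D"
GENDER_SIGNS = {"Female": "\u2640", "Male": "\u2642"}
# "" is the default (base) skin; U+1F3FB..U+1F3FF are the five modifiers.
SKIN_MODIFIERS = [""] + [chr(cp) for cp in range(0x1F3FB, 0x1F400)]


def runner_variants():
    """Map (gender, skin index) to base + modifier + ZWJ + gender sign."""
    variants = {}
    for gender, sign in GENDER_SIGNS.items():
        for index, modifier in enumerate(SKIN_MODIFIERS):
            variants[(gender, index)] = RUNNER + modifier + ZWJ + sign
    return variants


table = runner_variants()
print(len(table))  # 2 genders x 6 skin modes = 12
for gender in GENDER_SIGNS:
    row = [table[(gender, i)] for i in range(len(SKIN_MODIFIERS))]
    print(gender, " ".join(row))
```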


A pale skinned, dancer with black hair and a red dress

Emoji Skin Color


Skin color is not something you traditionally associate with typography, yet in Unicode, there are control characters for skin color. More precisely, the modifiers (U+1F3FB to U+1F3FF) change the skin colour of the preceding character. With no modifier, the emoji should display people with a Lego-yellow skin. Many emoji characters also support the variation selector control characters, which means that for many characters we now have 7 variants: a text variant, a neutral (yellow) emoji, and then five skin coloured variants. The text and emoji variants are not very consistent: the runner changes direction, and the dancer changes gender. Interestingly, a new version of Unicode will allow the specification of gender in emoji.
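A sketch of how the seven variants can be built, in Python with the code points from the text hard-coded. Nothing here checks whether a given base character actually accepts a modifier (that information lives in Unicode's emoji data files, which the standard library does not expose), so the font has the last word; `seven_variants` is a hypothetical helper.

```python
VS15 = "\uFE0E"  # VARIATION SELECTOR-15: request text presentation
VS16 = "\uFE0F"  # VARIATION SELECTOR-16: request emoji presentation
# EMOJI MODIFIER FITZPATRICK TYPE-1-2 .. TYPE-6 (U+1F3FB..U+1F3FF)
SKIN_MODIFIERS = [chr(cp) for cp in range(0x1F3FB, 0x1F400)]


def seven_variants(base: str) -> list:
    """Text, emoji, then the five skin-coloured variants of `base`."""
    return [base + VS15, base + VS16] + [base + m for m in SKIN_MODIFIERS]


for base in ["\U0001F483", "\U0001F3C3", "\u263A"]:  # dancer, runner, smiley
    print(" ".join(seven_variants(base)))
```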

The table below shows the different variants for some characters. In some cases the skin selection works, in others it does not: you can change the skin color of the princess but not of the Japanese ogre. The DOS-era smiley face has no race. All these features kind of work on OS X, but there are some quirks: the Fitzpatrick modifiers seem to implicitly trigger the emoji variant, even when they cannot apply, as long as there is no line break between the character and the skin selector…

Text Emoji Type 1-2 Type 3 Type 4 Type 5 Type 6
💃︎ 💃️ 💃🏻 💃🏼 💃🏽 💃🏾 💃🏿
🏃︎ 🏃️ 🏃🏻 🏃🏼 🏃🏽 🏃🏾 🏃🏿
👸︎ 👸️ 👸🏻 👸🏼 👸🏽 👸🏾 👸🏿
👹︎ 👹️ 👹🏻 👹🏼 👹🏽 👹🏾 👹🏿
☺︎ ☺️ ☺🏻 ☺🏼 ☺🏽 ☺🏾 ☺🏿


Bolt character with both ANSI color and Unicode variation selector

Double Escape


ANSI escape sequences are a mechanism to control the display of text in computer command line tools. While this mechanism is quite old (it originated in the 80s), it is still used nowadays, mostly to color the text in terminals.

The use of control codes to format text has mostly died out, and so have the ASCII control characters (escape in particular) that these sequences rely on. Nowadays people expect text formatting like color, underlines and such not to be expressed in the text itself, but in a separate markup language like HTML.

It turns out the idea has not died out, but merely come back, as Unicode has the notion of escape sequences to control the appearance of characters. Some characters, like for instance the ⚡ bolt (U+26A1), can be displayed in two modes:

  • ⚡︎ Text Style
  • ⚡️ Emoji Style

If you look at the source code of this page, you will notice that there is no formatting tag around these characters; instead they are followed by a variation selector: U+FE0E selects the text variant, and U+FE0F selects the coloured emoji variant. If you see only one type of bolt, your browser/operating system does not support variation selectors; if you see nothing, your browser/operating system is missing the font for that particular character.
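A small Python sketch of what such page source can contain, assuming the characters are written as HTML numeric character references (a page may just as well contain the raw characters); `to_html_refs` is a hypothetical helper.

```python
BOLT = "\u26A1"                 # HIGH VOLTAGE SIGN, the bolt
TEXT_STYLE = BOLT + "\uFE0E"    # VS15: text presentation
EMOJI_STYLE = BOLT + "\uFE0F"   # VS16: emoji presentation


def to_html_refs(s: str) -> str:
    """Spell every code point of `s` as a hexadecimal character reference."""
    return "".join(f"&#x{ord(ch):X};" for ch in s)


for label, s in [("text", TEXT_STYLE), ("emoji", EMOJI_STYLE)]:
    print(label, s, to_html_refs(s))
```

Either way there is no tag: the selector is just one more code point after the bolt.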

Unicode variation selectors only apply to the single character they follow, whereas ANSI escape sequences mark a range, with a start and an end. Now the question is, how do they interact? To check this, I generated the bolt character in the simplest 7 ANSI colours with both variation selectors. As you can see in the image, ANSI controls the foreground color, which is honoured in the text variant and ignored in the emoji (color) variant. This means that in a modern terminal, for certain characters you can get 257 color variations, 256 from ANSI and one from emoji…
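The terminal experiment can be reproduced with a few lines of Python. This is a sketch, not the script behind the image: it assumes the 7 colours are the standard foreground codes SGR 31 to 37 (red through white, leaving out black), and `colored_bolts` is a hypothetical name. Each colour is opened before the bolt and closed with a reset, so the range is explicit.

```python
BOLT = "\u26A1"
SELECTORS = ["\uFE0E", "\uFE0F"]   # text variant, then emoji variant
RESET = "\x1b[0m"                  # SGR 0: end of the coloured range


def colored_bolts():
    """Yield one line per foreground colour, bolt shown in both variants."""
    for code in range(31, 38):     # 31 red ... 37 white
        start = f"\x1b[{code}m"    # SGR sequence: start of the coloured range
        yield " ".join(start + BOLT + vs + RESET for vs in SELECTORS)


for line in colored_bolts():
    print(line)
```

For the 256-colour palette mentioned above, the start sequence would be `\x1b[38;5;Nm` with N from 0 to 255.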

Of course you can get the same behaviour in a web browser:

⚡︎ ⚡️ ⚡︎ ⚡️
⚡︎ ⚡️ ⚡︎ ⚡️
⚡︎ ⚡️ ⚡︎ ⚡️


Skype call icons – 📹 🎙 + 🕽

Icon Failures


Ever since the idea of the graphical user interface came out of Xerox labs, icons have been a tool of choice for user interface designers. The problem is that introducing a new icon amounts to defining a new graphical word.

Using analogies with the real world supposedly helps understanding, but this only works if the user has seen that object and recognises its stylised representation. The phone icon (☎) is widely recognised, but pretty abstract for generation Y and the following ones, who never saw a phone with that shape outside of a museum.

Many Windows user interfaces used a 3½-inch floppy disk (🖫) as a metaphor for saving, which is pretty bad, because a floppy disk is basically a plastic rectangle with a metallic bit and maybe a label with nothing useful written on it. It has also died out more quickly than the old-style phone receiver, so it is abstract for most people.

Designers always want to simplify these icons: it makes them more elegant and more readable, but it also makes them more abstract. Trying to help my mother remotely with the Skype user interface, I got the following readings for the control icons:

  1. Wrapped sweet
  2. Flower vase
  3. Plus sign
  4. Phone

A recognition rate of 50% is pretty bad. The problem here is that the first two icons are both very abstract, and representations of objects my mother did not interact with a lot: a video camera and a professional microphone.

Interestingly, all these symbols became actual characters with the emergence of emoji. But their representation is much less stylised, and depending on the fonts your operating system uses to render them, they might be in color. There are also typically multiple icons for the same concept.

Camera Microphone Plus Sign Phone Headset


Waning Crescent Moon (🌘)

Cultural Moon

One of the strangest features of Unicode 8.0 is the ability to change the appearance of human faces by selecting a skin color, as some perceived the default faces as having some cultural bias. It is pretty hard to design things without any bias; consider the characters for the moon phases: the shadow moves from side to side, which is what you perceive when you are far from the equator. The closer you get to the equator, the more the illuminated part of the moon is at the bottom, i.e. the more the crescent moon looks like this: 🌙. Latitude has a big influence on what you see in the sky…
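The phase characters are consecutive code points whose names describe the phase, not which side of the disc is lit, so the side-to-side orientation comes from the glyphs a font draws. A quick check with Python's standard `unicodedata`, as an illustration only:

```python
import unicodedata

# U+1F311 NEW MOON SYMBOL .. U+1F318 WANING CRESCENT MOON SYMBOL
phases = [chr(cp) for cp in range(0x1F311, 0x1F319)]
for ch in phases:
    print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}")

# The crescent quoted in the text is a separate character right after them.
print(unicodedata.name("\U0001F319"))  # CRESCENT MOON
```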


Screen capture of the page displaying regional indicator flags

Unicode Flags


One entry of this blog that gets some traffic is about emoji flags (in French). It explains how these flags are encoded in Unicode: instead of a single character per flag, there is a special range of letters (the regional indicator symbols), and when a region’s ISO 3166-1 alpha-2 code is written with these letters, the operating system’s font might replace the two letters of the code with a flag. As these letters have code points above 0xFFFF, each of them is in turn represented by two chars on systems that encode text as UTF-16, like Java, so to older code a single flag looks like four characters.
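The mapping is simple arithmetic: each ASCII letter is shifted into the regional indicator range starting at U+1F1E6 (REGIONAL INDICATOR SYMBOL LETTER A). A minimal Python sketch, where `flag` and `utf16_units` are hypothetical helper names; the second one counts UTF-16 code units, which is what Java's `String.length()` reports.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def flag(code: str) -> str:
    """Map a two-letter ISO 3166-1 alpha-2 code to regional indicators."""
    if len(code) != 2 or not code.isascii() or not code.isalpha():
        raise ValueError(f"expected two ASCII letters, got {code!r}")
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in code.upper())


def utf16_units(s: str) -> int:
    """Number of UTF-16 code units: code points above U+FFFF count twice."""
    return len(s.encode("utf-16-le")) // 2


swiss = flag("CH")
print(len(swiss), utf16_units(swiss))  # 2 code points, 4 UTF-16 units
```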

When I wrote the article, the supported national flags were basically those present in the proprietary encoding of Japanese mobile phone operators, as the whole emoji project was first about providing compatibility with these systems. So the French flag would display, but not the Swiss one.

Meanwhile, things have changed: emoji has taken on a life of its own and is increasingly adopted outside of Japan, and support in operating systems has greatly improved; on Mac OS X, the Swiss flag is also displayed. This led me to wonder to what extent flags in the regional indicator range were supported nowadays.

So I generated a page from the list of ISO 3166-1 alpha-2 country codes, showing for each country its code encoded with regional indicator characters, so you can see which flags a given browser handles. On Mac OS X, Switzerland now has its flag, but the Republic of Chad does not. Many micro-states (Vatican, Monaco) don’t have their flags yet. Taiwan is also missing; given Apple’s focus on mainland China, this is hardly surprising. Android has no such problem…
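A page like that is easy to generate. A minimal Python sketch, with a hypothetical five-entry sample instead of the full ISO 3166-1 list (which would have to come from an external source); `build_flag_page` is a hypothetical name. Whether each row shows a flag or two boxed letters is then up to the browser and its fonts.

```python
import html

# Hypothetical sample; the real page covers every ISO 3166-1 alpha-2 code.
SAMPLE = {"CH": "Switzerland", "TD": "Chad", "VA": "Vatican City",
          "MC": "Monaco", "TW": "Taiwan"}


def flag(code: str) -> str:
    """Two ASCII letters -> two regional indicator symbols."""
    return "".join(chr(0x1F1E6 + ord(c) - ord("A")) for c in code.upper())


def build_flag_page(countries: dict) -> str:
    """One table row per country: code, name, regional indicator pair."""
    rows = [
        f"<tr><td>{html.escape(code)}</td><td>{html.escape(name)}</td>"
        f"<td>{flag(code)}</td></tr>"
        for code, name in sorted(countries.items())
    ]
    return ('<!DOCTYPE html>\n<meta charset="utf-8">\n<table>\n'
            + "\n".join(rows) + "\n</table>\n")


print(build_flag_page(SAMPLE))
```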



Out of band communication


We always learn that languages are for communicating, but people usually forget to specify what is meant to be communicated. A large fraction of what is communicated is not the official message, but some subtext: emotional, social. That subtext is usually not explicit. You learn that there is a correct language, and the interpretation of deviations and style is something you acquire on the side: you learn that this kind of mistake is a sign of this social group, that this style is from that region of the planet. Life would be pretty hard if you could not judge people based on their writing.

The Internet has made it so easy for people to communicate across groups that we are faced with the problem of understanding each other without any context: sentences like “I play football” or “I live on the first floor” cannot be interpreted (which football? which floor counts as the first?). Figures of speech like irony or exaggeration further confuse the conversation, to the point where it became necessary to make implicit communication explicit, for instance by using emoticons. Emoticons are one example of out-of-band communication, a narrative on a different level, but of course you could make it the main channel: Emojli is an app that lets you communicate using only emoji.

While we tend to think of text as a single flow, there is a layering of communication systems. Consider an old-school book: it has the following layers (in theory, each layer except the first could be omitted):

  • The text.
  • The style of the text.
  • First subordinate level: parentheses.
  • Text formatting: italics, bold, fonts.
  • Second subordinate level: footnotes.
  • Page formatting.
  • Insets, figures and sidebars.
  • Third subordinate level: foreword, appendix, etc.

Like most things, these levels are linked to a culture: typography rules change from one country to another, and so do the rules of layout, and even the meanings of typography and font styling. Punctuation, capitalisation, even spaces are in a sense out-of-band information: you do not strictly need them to read the text, and there was a point in time when such artifices were optional. Nowadays they are considered an integral part of the text in Western languages.

One type of out-of-band channel I like is furigana, a type of phonetic annotation used to give reading hints for kanji that are not well known to the public, either because they are obscure, not used in Japanese, or because the word they are used in is not read in a Japanese mode. For instance, 上海 (Shanghai) would be read as Jokai in Japanese, or even Ueumi; furigana tells the reader how the kanji ought to be read. This annotation is not just a help to the reader: Japanese has a phonetic alphabet, so you could just write シャンハイ (Shanghai), but writing it 上海 (シャンハイ) emphasises that this place is in China and its relationship with the Japanese writing system, and carries semantic information about the name of the place: top of the sea…
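On the web, this kind of annotation maps onto the HTML `<ruby>`, `<rt>` and `<rp>` elements. A minimal Python sketch generating that markup (`furigana` is a hypothetical helper); the `<rp>` parentheses are only displayed by browsers without ruby support, so the text degrades to 上海(シャンハイ).

```python
import html


def furigana(base: str, reading: str) -> str:
    """Return HTML ruby markup placing `reading` above `base`.

    <rt> holds the reading; <rp> holds fallback parentheses for
    browsers that do not render ruby.
    """
    b, r = html.escape(base), html.escape(reading)
    return f"<ruby>{b}<rp>(</rp><rt>{r}</rt><rp>)</rp></ruby>"


print(furigana("上海", "シャンハイ"))
```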

I often wish that this system were more widely used, as many words and names, having been taken out of their original context, have a pronunciation that cannot be guessed from their writing. This is particularly bad in English.

The web has brought another level of subtext, as text is now read not only by humans, but also by algorithms. The web browser is the simplest of these algorithms: it re-creates some of the subtext levels for the viewer, adapting the layout to the user’s device, but also translating some of the subtext; if the user is blind, the whole subtext must be transformed, as it cannot be expressed visually. Other algorithms transform, synthesise and aggregate the information: search engines are the most visible example, but the logic that builds the snippet for a page when you share it on a social network is another, and so are systems that extract dates and tracking numbers from confirmation e-mails.

While many focused on the features for building applications inside web pages, HTML5 actually contains many changes aimed at making the semantic information in a web page explicit:

  • Tags like <u> changed meaning: this tag now marks text with a non-textual annotation, for instance a proper name in Chinese text or a misspelt word. Purely presentational underlining should be done using CSS.
  • New tags like <time> indicate elements with a specific semantic meaning, with a datetime attribute giving the value in a machine-readable format.
  • The <input> tag now supports many more semantically defined input types, like phone numbers (type="tel").
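The point of `<time>` is that an algorithm can read the value without understanding the prose around it. A minimal sketch using Python's standard `html.parser`; the page snippet and the `TimeExtractor` class are made up for illustration.

```python
from html.parser import HTMLParser


class TimeExtractor(HTMLParser):
    """Collect the machine-readable datetime attribute of every <time> tag."""

    def __init__(self):
        super().__init__()
        self.datetimes = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "time":
            value = dict(attrs).get("datetime")
            if value is not None:
                self.datetimes.append(value)


page = '<p>Your parcel arrives on <time datetime="2016-03-14">Monday</time>.</p>'
parser = TimeExtractor()
parser.feed(page)
print(parser.datetimes)  # ['2016-03-14']
```

A `<time>` element without a `datetime` attribute is skipped here; the HTML specification then uses the element's text content as the value, which this sketch does not handle.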

This concept has always been present on the web (<address> was there from day one), but such tags were mostly abused by people trying to do page layout in web pages. Two things have happened since: more and more algorithms are parsing web pages, and more and more people on the web cannot handle the content in its original form, either because of a disability, or because they do not understand the language.

What I find interesting is that the same thing is happening in the physical world: more and more things get annotated with barcodes and QR codes so that machines can make sense of them. They are just another form of subtitles, but in turn people are now learning to make them pretty, to stylise them, playing with the subtext once again…


Seamless interfaces and emoji injection


One recent trend in UI design is to build seamless interfaces. The underlying idea is pretty reasonable: get rid of useless clutter to get cleaner, more readable interfaces. This typically means removing a lot of lines, boxes, shadows and other highlights. In short, cut out the structure, leave the content.

Like any good idea taken to extremes, the result is horrible: instead of a UI that is clear, if a bit clunky, you end up with the UI equivalent of modern art, vaguely elegant but very difficult to understand. The other problem is that by removing the boundaries between pieces of information, you remove the context they are valid in, which opens up a lot of ways to confuse users.

Name Secure
Good Guy yes
Bad Guy no

Consider this table displaying the trust status of users in a system. The data is presented in a very structured way, nearly a database dump really, with a little bit of decoration.

Good Guy 🔒
Bad Guy

The same information made seamless would look like this: boxes and column headers removed, the information distilled to a simple icon (here we can even use an emoji character). The interface is much more compact, and easier to localise to boot, better in each and every way.

Good Guy 🔒
Bad Guy 🔒

Problems arise if users can change their names: Bad Guy can now try to put out-of-context information into his name; here, for instance, he can inject an emoji character into it. Bad Guy’s lock icon is part of his name (U+1F512), and because there are no boundaries and no context, it is difficult to figure out which information came from where.
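One possible mitigation, not from the original post: flag display names that contain pictographic symbols before showing them next to status icons. A minimal Python sketch, assuming that the Unicode general category So (Symbol, other) is a good enough proxy for emoji; it is only an approximation (zero width joiners and variation selectors, for instance, are in other categories), and `suspicious_symbols` is a hypothetical name.

```python
import unicodedata


def suspicious_symbols(display_name: str) -> list:
    """Return the characters of category So (Symbol, other), which
    includes most emoji pictographs such as U+1F512 LOCK."""
    return [ch for ch in display_name if unicodedata.category(ch) == "So"]


print(suspicious_symbols("Bad Guy \U0001F512"))  # ['🔒']
print(suspicious_symbols("Good Guy"))            # []
```

What to do with a match (reject the name, strip the symbol, or draw the name inside a visible boundary) is a policy decision; the detection is the easy part.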

Name Secure
Good Guy yes
Bad Guy 🔒 no

Such an attack is less effective in an old-style UI, because the context shows explicitly which information is what, instead of leaving it implicit. We end up with the usual tradeoff: implicit information is more natural and more fluid, but also more error-prone and more difficult to check.
