Thursday, March 02, 2006


This week I had the pleasure of seeing a number of question marks where I should have seen characters: the name of a language, written in its own script. It seems obvious that when you have a computer, you can see all the information that is available. It is, however, much less obvious than it seems.

When you use the English Wikipedia, it is normal for the names of countries and languages to be included in their own script. Many people do not have the fonts needed to display the characters of many languages; like me, they get question marks. For WiktionaryZ we will have a similar situation, but on a completely different scale. We want all words in all languages, and words like water, air, fish or bird can be expected to have a translation in every language. This makes fonts even more relevant for WiktionaryZ.

WiktionaryZ will be in UTF-8, but that only helps in so far as fonts exist in the first place. Also, Unicode is a living standard; version 5.0 beta 2 represents the latest developments. As we want both user interfaces and content in all languages, we are sure to have our share of issues. Version 5.0 will bring better bidi (bidirectional text) support, which will be a real boon.
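To make the point concrete: UTF-8 can encode every character, so the encoding itself is not what produces the question marks. A minimal Python sketch, assuming a small hand-picked sample of translations of "water" (the `WATER` table and helper names are illustrative only), prints each word's Unicode code points and UTF-8 bytes. Whether the words themselves show up as glyphs or as question marks depends entirely on the fonts on the reader's machine.

```python
# A minimal sketch: UTF-8 can encode any Unicode code point, but whether the
# characters render depends on the fonts installed on the reader's machine.

def code_points(text: str) -> list[str]:
    """Return the Unicode code points of text in the usual U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


def utf8_bytes(text: str) -> str:
    """Return the UTF-8 encoding of text as space-separated hex bytes."""
    return " ".join(f"{b:02X}" for b in text.encode("utf-8"))


# "water" in a few languages (hypothetical sample table for this sketch)
WATER = {
    "Chinese": "水",
    "Russian": "вода",
    "Hindi": "पानी",
}

if __name__ == "__main__":
    for language, word in WATER.items():
        # The byte sequence is always well defined; the glyphs may not be.
        print(f"{language:8} {word}  {' '.join(code_points(word))}  [{utf8_bytes(word)}]")
```

For example, 水 is U+6C34 and is stored in UTF-8 as the three bytes E6 B0 B4, whether or not a Chinese font is installed.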

It is easy to see that many people lack the fonts to display all Wikipedia and Wiktionary content. To address this, we could provide the best available information on which fonts are needed for particular languages and scripts. Doing this will help people, although it may also open several cans of worms. Then again, we could go whole hog and offer fonts for download.

PS: This article got me looking into fonts again; a good and interesting read.



Blogger Kipmaster said...

Just a short comment: on Linux, we don't get question marks for missing characters but a square box showing the hexadecimal Unicode code point of the missing character, so we actually know which character is missing.

12:59 pm  
