Tuesday, June 20, 2006

Translation of structural words

WiktionaryZ now has some more features ... it is getting closer and closer to what we want. User Sanna started adding definitions and translations to structural terms like "broader term", and of course so did we ...
When it comes to German, I don't have many problems translating these terms, although I sometimes have doubts because they can be translated in different ways ... which terms do linguists actually use? Italian, which is not my mother tongue, gives me more problems ... I was not sure what to call these terms, so I decided not to add any translation or definition. Well: what we need is a glossary of all these terms :-)
If you know where we can find one, please tell us; and if you are a linguist and can help us in any language, please do.

Saturday, June 10, 2006

Word segmentation ... for Thai

When you work with languages that are unfamiliar to you, you run into all kinds of issues that are completely new to you. Thai is one such language for me: a Thai sentence has no spaces between its words, so it is not clear to me where a line can be broken. This really matters for the localisation of MediaWiki; when a text is translated, the translator does not know how much space it has to fit into. It is therefore important to know where the word boundaries are, so that lines can be broken correctly.

Luckily there is software, even GPL software, that does word segmentation for Thai. The next questions are how we make it available in MediaWiki and how we are going to use the results.
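To make the problem concrete, here is a minimal sketch of one simple technique, dictionary-based greedy longest matching, followed by line wrapping that only breaks between the words it finds. This is not the algorithm of the GPL segmenters mentioned above, which are more sophisticated. The functions `segment` and `wrap` and the three-word dictionary are hypothetical. Widths are counted in Unicode code points, which overstates the display width of Thai text because vowel and tone marks are combining characters.

```python
from __future__ import annotations


def segment(text: str, dictionary: set[str]) -> list[str]:
    """Split unspaced text into words by greedy longest matching.

    At each position the longest dictionary word starting there is taken;
    if none matches, a single character is emitted so the scan always advances.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    words: list[str] = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: one unknown character
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                match = candidate
                break
        words.append(match)
        i += len(match)
    return words


def wrap(words: list[str], width: int) -> list[str]:
    """Join words into lines of at most `width` code points, breaking only between words.

    A single word longer than `width` is kept whole on its own line.
    """
    lines: list[str] = []
    current = ""
    for word in words:
        if current and len(current) + len(word) > width:
            lines.append(current)
            current = word
        else:
            current += word
    if current:
        lines.append(current)
    return lines


if __name__ == "__main__":
    # Hypothetical toy dictionary: "I", "eat", "rice".
    dictionary = {"ฉัน", "กิน", "ข้าว"}
    sentence = "ฉันกินข้าว"  # "I eat rice", written without spaces
    words = segment(sentence, dictionary)
    print(words)
    print(wrap(words, 6))
```

Greedy longest matching is easy to follow but can choose wrong boundaries when a long dictionary word overlaps the start of the next word. Real segmenters deal with such ambiguity, which is one reason to reuse existing software rather than write our own.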


Saturday, June 03, 2006

The case of Mandarin

This is not about the difference between Mandarin and mandarin. Written standardized Mandarin exists in at least two variants: cmn-Hans and cmn-Hant, or in human terms simplified and traditional Mandarin. For WiktionaryZ this distinction is crucial, because an Expression needs to be identified as being one or the other.

It becomes more complicated for me when I find 普通话/國語/华语 given as the name for Mandarin; that would give me three names: 普通话, 國語 and 华语. This article on the English Wikipedia helps a bit: 普通话 is in simplified script, 國語 is in traditional script, and 华语 is what "standard Mandarin" is called in Malaysia and Singapore (in simplified script; 華語 is the traditional variant). This was "easy", right?

According to this article, Mandarin is, linguistically speaking, a group of Northern Chinese dialects, and the cmn code is associated with this group. This leaves me with a headache. I do not want to use zh or zho for Chinese because it is too broad: it includes languages like Wu and Min-Nan. I could use cmn for standard Mandarin and add suffixes to indicate both a specific dialect and its script.

Doing so means giving the official version of a language the official tag. The problem is that I do not understand the implications of making this a WiktionaryZ policy.
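The tagging scheme considered here can be sketched as follows. The sketch assumes the IETF language-tag order language-Script-REGION (as in cmn-Hans). `make_tag` and `MANDARIN_NAMES` are hypothetical names. The helper only checks the shape of each subtag, not whether it is registered, and it only accepts 2- or 3-letter language codes. It leaves out the dialect suffix because we do not assume any registered subtag for Mandarin dialects.

```python
from __future__ import annotations

import re
from typing import Optional


def make_tag(language: str, script: Optional[str] = None, region: Optional[str] = None) -> str:
    """Compose a tag in the order language-Script-REGION (shape checks only).

    Case follows the usual conventions: lowercase language, title-case
    four-letter script, uppercase two-letter (or three-digit) region.
    """
    if not re.fullmatch(r"[A-Za-z]{2,3}", language):
        raise ValueError(f"expected a 2- or 3-letter language code, got {language!r}")
    parts = [language.lower()]
    if script is not None:
        if not re.fullmatch(r"[A-Za-z]{4}", script):
            raise ValueError(f"expected a 4-letter script code, got {script!r}")
        parts.append(script.title())
    if region is not None:
        if not re.fullmatch(r"(?:[A-Za-z]{2}|[0-9]{3})", region):
            raise ValueError(f"expected a 2-letter or 3-digit region code, got {region!r}")
        parts.append(region.upper())
    return "-".join(parts)


# Names for Mandarin from the text above, mapped to tags under this hypothetical scheme.
MANDARIN_NAMES = [
    ("普通话", make_tag("cmn", "Hans")),      # simplified script
    ("國語", make_tag("cmn", "Hant")),        # traditional script
    ("华语", make_tag("cmn", "Hans", "SG")),  # Singapore usage, simplified script
    ("华语", make_tag("cmn", "Hans", "MY")),  # Malaysia usage, simplified script
    ("華語", make_tag("cmn", "Hant")),        # traditional spelling of 华语
]

if __name__ == "__main__":
    for name, tag in MANDARIN_NAMES:
        print(name, tag)
```

Because 华语 is used in both Singapore and Malaysia, it appears twice with different region subtags. The script subtag alone carries the simplified/traditional distinction that matters for an Expression.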


Friday, June 02, 2006

What languages to choose

I was given the impossible task of selecting a minimal group of languages for the first extension of the set of languages in which translations can be added to WiktionaryZ. Impossible, because which languages do you select? What selection criterion would be "neutral and objective" and be accepted as such? What exceptions can we make that can be explained and are acceptable?

What we came up with is to start with the languages that have started localizing their MediaWiki messages. Most of these languages are currently served really well on BetaWiki. Some languages have not been imported there yet; the ones I know of will be included as well (for instance Thai). Others, like Chinese, will not be included; Mandarin will be used instead.

For all the people who want their language included as well, the answer is obvious: we are truly at the pre-alpha stage and cannot do everything now. When you make the effort and start localizing the MediaWiki software, you provide a powerful incentive to add yet another language ... We will be happy to accommodate you.