Monday, September 11, 2006

WiktionaryZ and Machine Translation

Most of you probably know that Jeffrey V. Merkey is using machine translation to translate the complete English Wikipedia into Cherokee, and that the output is quite good: only 5% of it needs real corrections.

He agreed to give us the word list he used under the GFDL, and so we added it to WiktionaryZ.

Yesterday I tried to work with it and immediately found some limits in its contents. This is not because his word list is wrong, but because it is so personalized that other people find it somewhat difficult to understand which meaning has been translated into Cherokee, and which meaning therefore needs to be translated into other languages, in my case Neapolitan.

I estimated how many words I would have to translate each day to get the work done in a month: 233 words a day ... that is really a lot ... and I suppose that doing things correctly - I mean creating the English word, adding the defined meanings and translating them into Neapolitan - will take me quite a long time, and I will probably not be able to do more than an average of 10 words a day.

Now you are asking yourself why I take such a long time ... well: let's say an English word has on average 3 meanings (well, that's a wish ... there will be words with many more meanings). Then: I am not used to translating from English to Neapolitan - even Italian to Neapolitan is quite difficult. The only dictionary I was able to find here is Neapolitan to Italian ... I have not managed to go to Naples yet and therefore could not try to get an Italian to Neapolitan one there. This means that I have to translate at least from English to Italian - well, then it makes sense to add German as well, since I know most translations in German - and then, if I don't know the Neapolitan translation, I have to guess how it is written in Neapolitan and find proof of it in the dictionary :-) funny, right? Now people say: well, you speak Neapolitan for quite a good part of your day, so why do you need to look it up ... speaking is different from translating ... also writing is different from translating.

The next thing is: Wikipedia articles belong to specific domains ... well: we do not differentiate between specific domains yet - but when translating into some languages we will need that, because the same English word will have different translations depending on the domain it is used in.

Considering these points, I suppose that starting from the approx. 7,000 words in English, at about 3 meanings each, we will end up with approx. 21,000 words in the foreign language.
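For those who like to check the numbers, here is a small sketch of the arithmetic behind these estimates. The 7,000 words, the 3 meanings per word and the 10 words a day come from the figures above; the 30-day month is my assumption, and the variable names are only for illustration.

```python
import math

# Figures taken from the post.
ENGLISH_WORDS = 7_000          # approx. size of the English word list
MEANINGS_PER_WORD = 3          # optimistic average number of meanings
REALISTIC_WORDS_PER_DAY = 10   # what one person can probably manage

# Assumption for illustration: a month has 30 days.
DAYS_IN_MONTH = 30

# Words per day needed to finish within one month (rounded down).
words_per_day_for_one_month = ENGLISH_WORDS // DAYS_IN_MONTH

# Days needed at the realistic pace (rounded up to whole days).
days_at_realistic_pace = math.ceil(ENGLISH_WORDS / REALISTIC_WORDS_PER_DAY)

# Entries in the target language if every meaning needs its own translation.
foreign_entries = ENGLISH_WORDS * MEANINGS_PER_WORD

print(words_per_day_for_one_month)  # 233
print(days_at_realistic_pace)       # 700
print(foreign_entries)              # 21000
```

So at 10 words a day, one person would need roughly 700 days rather than a month, and that is before counting words with more than 3 meanings.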

It will be fun :-)


Blogger GerardM said...

It would be great if Jeff had his content, including definitions, at WiktionaryZ ... it would make things easier.

3:54 pm  
