Wednesday, March 08, 2006

About sorting, or is it collating?

When you have a dictionary, you expect words to be sorted in such a way that you can find them. Sorting is obvious, right? You use the alphabet and that is the end of it. Yet according to the article about the character IJ, the character is to be found between the X and the Z, together with the Y. Van Dale, the best Dutch dictionary, sorts the ij in the i range.

The consequence is that how to sort is not obvious. It does not follow from knowing the alphabet of a language. And yes, the Dutch alphabet is different from the French or the English alphabet, in the same way that the Farsi alphabet is different from the Arabic, the ...
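To make the point concrete, here is a minimal Python sketch of the two conventions for the Dutch ij mentioned above. The function names and the word list are made up for illustration, and replacing every "ij" with "y" is a deliberate simplification; a real collation would need to know when "ij" is really the digraph.

```python
def ij_in_i_range_key(word):
    # Convention used by Van Dale: "ij" is simply i followed by j,
    # so plain lowercase comparison puts it in the i range.
    return word.lower()

def ij_with_y_key(word):
    # Alternative convention: "ij" is one letter that sorts together with y.
    # Simplification (assumption): every "ij" is treated as the digraph.
    return word.lower().replace("ij", "y")

words = ["ijs", "zon", "idee", "yoghurt", "xylofoon", "inkt"]

print(sorted(words, key=ij_in_i_range_key))
# ['idee', 'ijs', 'inkt', 'xylofoon', 'yoghurt', 'zon']

print(sorted(words, key=ij_with_y_key))
# ['idee', 'inkt', 'xylofoon', 'yoghurt', 'ijs', 'zon']
```

Same words, same alphabet, two different orders; which one is "right" depends on the convention you pick for the language.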

In my personal blog, I wrote about what I would have a programmer do if I had money to spare. I mentioned this to Gangleri and jested that I would expect his choice to be working on right-to-left issues and sorting issues. I am really happy that he took it as a challenge, and he made me really happy with the template wikivar. I do not understand all the ins and outs of it, but what I do understand is that he is asking people to help define the sorting order that makes sense for that resource, for that language.

One thing is that this sorting order only makes sense when you know the language the articles are in. For WiktionaryZ this will be obvious; WiktionaryZ will be language aware. For the current Wiktionary projects this is not possible; Dutch, German, English and Farsi words all have to be sorted together. The current search routine is not that great either: it does not allow for case-insensitive sorting, a wish that many would really like to see realized.
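As an illustration of what case-insensitive sorting means, here is a small Python sketch. It says nothing about how MediaWiki itself sorts; the word list is invented for the example.

```python
words = ["apple", "Zebra", "Banana", "cherry"]

# Default string comparison goes by code point, so all capitals come first.
case_sensitive = sorted(words)
print(case_sensitive)
# ['Banana', 'Zebra', 'apple', 'cherry']

# str.casefold() normalises case before comparing, so capitals and
# lower-case letters are interleaved the way a reader would expect.
case_insensitive = sorted(words, key=str.casefold)
print(case_insensitive)
# ['apple', 'Banana', 'cherry', 'Zebra']
```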

With [[Multilingual MediaWiki]] it will be possible to make the traditional wiktionaries language aware. This in turn will allow words to be sorted according to how it is done for a language, for a locale. The question I have is: do people have the stomach to go for this? The good news is that most wiktionaries are becoming more and more structured. This makes it feasible for bots to do a good job.
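What "language aware" sorting could look like, as a rough Python sketch: every entry carries a language code, and the sort key is looked up per language. The names (`COLLATION_KEYS`, `sort_by_language`), the entries, and the choice to sort the Dutch ij with the y are all assumptions for the sake of the example, not how Multilingual MediaWiki or WiktionaryZ does it.

```python
from collections import defaultdict

def default_key(word):
    # Fallback: plain case-insensitive order.
    return word.casefold()

def dutch_key(word):
    # Hypothetical choice: sort the digraph "ij" together with y.
    # Simplification: every "ij" is treated as the digraph.
    return word.casefold().replace("ij", "y")

# Hypothetical registry of per-language sort keys.
COLLATION_KEYS = {"nl": dutch_key}

def sort_by_language(entries):
    """Group (language, word) pairs by language and sort each group
    with the key registered for that language."""
    groups = defaultdict(list)
    for language, word in entries:
        groups[language].append(word)
    return {
        language: sorted(group, key=COLLATION_KEYS.get(language, default_key))
        for language, group in groups.items()
    }

entries = [("nl", "ijs"), ("en", "Yard"), ("nl", "zon"),
           ("en", "apple"), ("nl", "yoghurt")]

result = sort_by_language(entries)
print(result["nl"])  # ['yoghurt', 'ijs', 'zon']
print(result["en"])  # ['apple', 'Yard']
```

Without the language code on each entry there is nothing to look the key up with, which is exactly the problem of the current Wiktionary projects.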

The irony is that, when it happens, the technology will be courtesy of the WiktionaryZ project. Then again, given its definition of success, it would be considered a success :)



