Monday, October 23, 2006

Dialects, anyone?

When WiktionaryZ needs a new language, the current procedure is that one of the bureaucrats enables it for editing content. So far we have added languages and given special consideration to scripts and locales. Scripts are "easy": ISO 15924 gives us a list of what are considered scripts, and examples of their use can be seen in how we deal with, for instance, Hausa, Mandarin or Serbian. Locales are more problematic, but so far we have restricted ourselves to using country codes. Country codes are easy too; they can be found in ISO 3166.
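To make the combination of codes concrete, here is a minimal sketch in Python of how a language code, a script code and a country code could be joined into one identifier. The function name `make_identifier` and the hyphenated layout are assumptions made for illustration only; this is not how WiktionaryZ stores its languages. The patterns check the shape of a code, not whether it is actually registered.

```python
import re
from typing import Optional

# Shape checks only: these patterns do not confirm that a code is actually
# registered in ISO 639, ISO 15924 or ISO 3166.
LANGUAGE = re.compile(r"[a-z]{2,3}")   # ISO 639 language code, e.g. "srp"
SCRIPT = re.compile(r"[A-Z][a-z]{3}")  # ISO 15924 script code, e.g. "Cyrl"
COUNTRY = re.compile(r"[A-Z]{2}")      # ISO 3166 alpha-2 code, e.g. "CN"


def make_identifier(language: str,
                    script: Optional[str] = None,
                    country: Optional[str] = None) -> str:
    """Join language, script and country codes into a hyphenated identifier.

    Hypothetical helper: the layout is an assumption for illustration.
    """
    parts = [(language, LANGUAGE, "language")]
    if script is not None:
        parts.append((script, SCRIPT, "script"))
    if country is not None:
        parts.append((country, COUNTRY, "country"))
    for value, pattern, kind in parts:
        if not pattern.fullmatch(value):
            raise ValueError(f"{value!r} does not look like a {kind} code")
    return "-".join(value for value, _, _ in parts)


print(make_identifier("srp", script="Cyrl"))  # Serbian, Cyrillic script
print(make_identifier("srp", script="Latn"))  # Serbian, Latin script
print(make_identifier("hau", script="Arab"))  # Hausa, Arabic script
print(make_identifier("cmn", script="Hans", country="CN"))
```

Each part of the identifier can be checked against a published list. A dialect has no corresponding pattern in this sketch, because there is no comparable list to check it against.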

Now, what to do with dialects? Let me be really practical: I have received a request for several languages from Mark Williamson. His request seems reasonable to me; he indicates clearly what he wants, namely the languages with their dialects and their scripts. When I do some research I find nothing that makes his request unreasonable; it is just that I cannot really judge it.

The biggest stumbling block, however, is how to name these dialects as a code. It is one thing to insist on standards, but another to find that there is no apparent standard for these dialects. I have been looking around and have subscribed to a few linguistics mailing lists, and I hope that this will get me an answer.

With WiktionaryZ still in "pre-alpha" mode, I have an excellent excuse not to rush into the creation of these languages, but I think it is also fair to let it be known what the issue is. It is a practical one; it is not that we do not want dialects to be part of WiktionaryZ.




Anonymous said...

RFC 4646 describes the standard for language variant & dialect tags and the process to register them with IANA.

1:12 pm  
GerardM said...

RFC 4646 is useful up to a point. The point is that it only supports ISO 639-1 and ISO 639-2 at the moment. From a linguistic point of view, ISO 639-3 has some issues, but the older versions are a disaster.

Given that IANA does not support ISO 639-3, it has little relevance for the dialects of ISO 639-3 languages.

2:08 pm  
