Friday, July 28, 2006

Definition in UI language, if present in that language

As of today, WiktionaryZ has another nice feature. When your user interface is set to, for example, Italian and you click on any term (English, German, etc.), the Italian definition is shown if one exists. Only when there is no definition in the interface language you chose is an English definition shown.
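The lookup order described above can be sketched as a simple fallback. This is a minimal illustration, assuming a hypothetical in-memory store keyed by meaning and language code; the names `DEFINITIONS` and `definition_for` are made up for this example and are not WiktionaryZ's actual code.

```python
from typing import Optional

# Hypothetical store: DefinedMeaning id -> {UI language code: definition text}.
DEFINITIONS = {
    "scissors": {"en": "(English definition)", "it": "(Italian definition)"},
    "water": {"en": "(English definition)"},
}


def definition_for(meaning_id: str, ui_language: str, fallback: str = "en") -> Optional[str]:
    """Prefer the UI language; fall back to English; None if neither exists."""
    translations = DEFINITIONS.get(meaning_id, {})
    if ui_language in translations:
        return translations[ui_language]
    return translations.get(fallback)
```

With the interface set to Italian, `definition_for("scissors", "it")` returns the Italian text, while `definition_for("water", "it")` falls back to English.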

Tuesday, July 25, 2006

What has THAT to do with POV?

WiktionaryZ is a wiki project under development. We want all the words of all languages, we want the words of all domains, and we want to collaborate with organizations: companies, universities, NGOs and governmental organizations.

This was easy. WiktionaryZ is very much a Free/Open project. The software is Open Source and the content is available under a Free license; Free is very much how we define ourselves. Now Sabine has found us a resource like this one: an important body of computer terminology that is really rich in its multilingual content. Consider: 14,000 terms in some 45 languages make it really important. Given the information about the license, we could use it.

Typically we would upload it under the name of the organization responsible for the content. I think we should; then again, some might find this problematic.


Monday, July 24, 2006

The Swadesh list

On WiktionaryZ we are now starting to build our content; we can add new Expressions and new DefinedMeanings. This is proving to be an important stage in the development of WiktionaryZ: it is no longer abstract. There are now concrete examples where we need multiple DefinedMeanings, like the word German, which among other things denotes a person, a man or a woman, of German nationality.
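The relationship between an Expression and its DefinedMeanings can be pictured as one spelling pointing to several concepts. The sketch below assumes a simplified model of our own: the field names, numeric ids and English-only definitions are illustrative and do not reflect the actual WiktionaryZ database schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DefinedMeaning:
    id: int          # hypothetical numeric identifier
    definition: str  # English definition, for illustration only


@dataclass
class Expression:
    spelling: str
    language: str  # language code, e.g. "eng"
    meanings: list[DefinedMeaning] = field(default_factory=list)

    def is_ambiguous(self) -> bool:
        """True when the same spelling expresses more than one concept."""
        return len(self.meanings) > 1


# "German" needs more than one DefinedMeaning.
german = Expression(
    "German",
    "eng",
    [
        DefinedMeaning(1, "The German language."),
        DefinedMeaning(2, "A person, a man or a woman, of German nationality."),
    ],
)
```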

With all this functionality, it becomes important to give all this activity some focus. For us, a Swadesh list is an instrument that gives a clue to the differences between related languages and dialects. This is why it was the first resource to complete. The challenge will be to complete this one list for all the languages supported on WiktionaryZ.
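Tracking how far each language has come with the list is a simple set calculation. This is a sketch under assumptions: the four concepts below only stand in for the full 207-item list, and the per-language data and function names are hypothetical.

```python
# Illustrative subset standing in for the full 207-item Swadesh list.
SWADESH_SUBSET = ["water", "fire", "sun", "stone"]


def coverage(swadesh: list[str], covered: dict[str, set[str]]) -> dict[str, float]:
    """Fraction of Swadesh concepts that have at least one Expression, per language."""
    wanted = set(swadesh)
    return {lang: len(done & wanted) / len(wanted) for lang, done in covered.items()}


def missing(swadesh: list[str], done: set[str]) -> list[str]:
    """Swadesh concepts still to be added for one language, in list order."""
    return [concept for concept in swadesh if concept not in done]
```

A language scores 1.0 once every concept on the list has at least one Expression, which is the point at which it can be compared with its related languages and dialects.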

There are several reasons for concentrating on a small list of 207 Expressions; among other things, it can demonstrate that many concepts share the same expression. When all the concepts of all the expressions in all the languages connected to the Swadesh list are defined as part of this project, I would expect at least a few thousand DefinedMeanings as a result.


Sunday, July 23, 2006

Adding relations ... for now impossible ...

Today I corrected a translation into Italian on WiktionaryZ: forbici ... it had to be forbice. Of course, since Italian is not my mother tongue, even when I am sure about a change I check it against other dictionaries. That way I found several related terms, which I then wanted to add as relations. Big surprise: there are only four possible relation types, and all of them come from the GEMET multilingual thesaurus that was the first content of WiktionaryZ. So I could not add these related words, since they simply did not fit this scheme.
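Why a fixed list of relation types blocks new relations can be shown with a closed enumeration. This is a hypothetical sketch: the post only says there are four GEMET-derived types, so the four names below are placeholders of our own, not the actual WiktionaryZ relation types.

```python
from enum import Enum


class RelationType(Enum):
    # Placeholder names: the post only says there are four GEMET-derived types.
    BROADER = "broader term"
    NARROWER = "narrower term"
    RELATED = "related term"
    PART_OF_THEME = "part of theme"


def add_relation(relations: list, source: str, target: str, type_name: str) -> None:
    """Append a relation, rejecting any type outside the fixed set."""
    try:
        rel_type = RelationType(type_name)
    except ValueError:
        raise ValueError(f"unsupported relation type: {type_name!r}") from None
    relations.append((source, rel_type, target))
```

Any relation whose type is not one of the four values, for instance a "derived term", raises `ValueError`, which is exactly the wall hit when trying to add the related Italian words.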

Then I thought about technical terminology and how a term can be used in different domains ... well, I could not even add those right now, since it would mean double and triple work afterwards. Or better: consider what happens when we have some 10,000,000 entries and, for my work, I search for a term to translate in a specific domain, for example the automotive industry. Let's say I get only 50 results for that term, all mixed up, with no possibility to filter them by domain. In the worst case that costs me 50 clicks to find out whether one of the terms matches. Clicking through 50 matches and reading the definitions takes at least 20 minutes (and in my opinion that is fast), so at a rate of only 30 EUR/hour the search for a single term costs a translator 10 EUR ... well, that is simply too much. Without domains, WiktionaryZ would simply be useless for practical work.
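The arithmetic behind the 10 EUR figure, using only the numbers given in the paragraph above (the function name is ours):

```python
def lookup_cost_eur(minutes_spent: float, rate_eur_per_hour: float) -> float:
    """Cost of a translator's time for a single term lookup."""
    return minutes_spent * rate_eur_per_hour / 60


RESULTS = 50           # unfiltered matches, from the post
MINUTES = 20           # time to click through and read them all, from the post
RATE_EUR_PER_HOUR = 30

seconds_per_result = MINUTES * 60 / RESULTS          # 24.0 seconds per match
cost = lookup_cost_eur(MINUTES, RATE_EUR_PER_HOUR)   # 10.0 EUR
```

Even at 24 seconds per match, which is quick, one unfiltered lookup eats a third of an hour.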

Of course there are many other reasons for having domains, and I'd say there is not a single reason for not having them ...

I also considered using the semantic web on my wiki ... if these terms were translated within a wikidata structure without being assigned a domain, the semantic web would simply be useless for a correct search.
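What domain attribution buys at search time can be sketched as a filter over hits. This assumes a hypothetical `SearchHit` record with a set of domain labels; nothing here is an existing WiktionaryZ or semantic-web API. The English word "clutch" is used because it genuinely has an automotive sense and a zoological one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchHit:
    expression: str
    definition: str
    domains: frozenset[str] = frozenset()  # hypothetical domain attribution


def filter_by_domain(hits: list[SearchHit], domain: str) -> list[SearchHit]:
    """Keep only hits tagged with `domain`; untagged hits can never match."""
    return [hit for hit in hits if domain in hit.domains]


hits = [
    SearchHit("clutch", "Device that engages and disengages a drive shaft.", frozenset({"automotive"})),
    SearchHit("clutch", "A set of eggs laid at one time.", frozenset({"zoology"})),
    SearchHit("clutch", "To grasp tightly."),  # no domain assigned
]
```

`filter_by_domain(hits, "automotive")` leaves a single candidate instead of three. The untagged third hit is dropped by every domain filter, which is why terms entered without a domain are invisible to a domain-aware search.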

No ... we need domains ... we definitely need them asap. And yes, you who are reading this: tell us your opinion. How would domains (such as technology, medicine, biology, etc.) be helpful for your use?


Tuesday, July 18, 2006

What language is H2O?

Everybody with some general knowledge knows that H2O is the chemical formula for water. This formula is the same in all the languages that I speak, at least. Now, as there are so many languages that could use the same code, it would be folly to repeat this and all the other chemical formulae for every language. However, it would be equally foolish not to have them at all.

Yesterday I was pleasantly surprised by a WiktionaryZ newbie who told me about three special ISO 639-3 codes: und for undetermined content, mul for multiple languages and zxx for no linguistic content. As is usual for language codes, I created a portal page for them. The question is: which code should we use? I would opt for zxx, because a formula is not really linguistic content and it seems to be a universal standard.
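Storing H2O once under zxx and showing it alongside every language's own words could look like this. The three code meanings are the ISO 639-3 ones; the `EXPRESSIONS` store and the `expressions_for` helper are hypothetical and only illustrate the idea.

```python
# The three special ISO 639-3 codes mentioned in the post.
SPECIAL_CODES = {
    "und": "Undetermined",
    "mul": "Multiple languages",
    "zxx": "No linguistic content",
}

# Hypothetical store: (spelling, language code) -> DefinedMeaning id.
EXPRESSIONS = {
    ("water", "eng"): 1,
    ("acqua", "ita"): 1,
    ("H2O", "zxx"): 1,
}


def expressions_for(meaning_id: int, language: str) -> list[str]:
    """Expressions for a meaning in `language`, plus language-neutral (zxx) ones."""
    return sorted(
        spelling
        for (spelling, lang), meaning in EXPRESSIONS.items()
        if meaning == meaning_id and lang in (language, "zxx")
    )
```

The formula is entered once, yet `expressions_for(1, "ita")` gives `["H2O", "acqua"]` and a language with no entry of its own still gets `["H2O"]`.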

I am eager to learn what you think.

Saturday, July 01, 2006

How to deal with conventions

There are conventions to indicate certain things. One of them is the convention for the Latin name of a species: it is customary to capitalize the genus name and to set the whole name in italic type. This gives Homo sapiens.

What do we do in WiktionaryZ when we get data from another resource where these names are NOT written according to the conventions? To me it is obvious: if a convention is adopted, you want to move the data over in that direction.
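Moving imported species names toward the convention could start with a normalizer like the one below. It is a sketch under our own assumptions: it only handles capitalization (genus capitalized, the rest lowercase), and italics are left to presentation because a plain string cannot carry them.

```python
def normalize_binomial(name: str) -> str:
    """Capitalize the genus and lowercase the remaining parts of a Latin name.

    Italic type is a presentation matter and is not stored in the string.
    """
    parts = name.split()
    if len(parts) < 2:
        raise ValueError(f"not a binomial name: {name!r}")
    genus, *rest = parts
    return " ".join([genus.capitalize()] + [part.lower() for part in rest])
```

Both `"HOMO SAPIENS"` and `"homo Sapiens"` come out as `"Homo sapiens"`.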

Another convention that we are certain to adopt concerns spelling: an expression like MALARIA or Malaria is certainly to be replaced by malaria. In many scientific resources, the first two formats are the convention used for expressions, so WiktionaryZ will want to keep these forms in one way or another. One way to do that is to use the table that is currently called MisSpelling. I would not mind if this table were renamed to something more politically correct.
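Replacing MALARIA and Malaria by malaria while still keeping those forms could be sketched as follows. The `variants` dictionary only stands in for the MisSpelling table, whose real structure the post does not describe, and the lowercasing rule assumes the expression is a common noun.

```python
def normalize_expression(form: str, variants: dict[str, str]) -> str:
    """Lowercase a common-noun expression and record the original form as a variant.

    `variants` is a hypothetical stand-in for the MisSpelling table.
    Proper nouns and Latin binomials would need different rules.
    """
    normalized = form.lower()
    if form != normalized:
        variants[form] = normalized  # keep MALARIA / Malaria retrievable
    return normalized
```

After normalizing "MALARIA" and "Malaria", both map to the single expression malaria, and the variant table still records the forms that scientific resources use.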