Thursday, February 16, 2006

Elements of a Great Dictionary: Words

Now that we have identified a few items that don't belong in a dictionary, let's start with the basics of what does belong: words. A great dictionary, seeking to catalog all words, must begin with words themselves, or more precisely, with lexemes.

Linguists and writers of dictionaries distinguish between words and lexemes because they are not the same. Most words are lexemes, though a few (such as lickety, hightail and handbasket) have little or no existence outside larger phrases. More commonly, a phrase may constitute a lexeme when it has a single, idiomatic meaning. When we say in English that something is "old hat," we mean not that it is headgear of some great age, but that it is familiar and well-practiced. Phrases of this sort also need special attention in translation. By contrast, a phrase like "Greek history" merely describes history of or relating to Greece, so in such a case we need not define more than the component words. Of course, language can be subjective, and the distinction is not always clear.

What shall we do with words hovering on the edge of a language? Most people, I think, would agree that paper, archaic, and horsefeathers are English words that belong in a great dictionary. Some words, though, are not so clear. How widely must the term ginormous, for instance, be used before it is deemed a word? What about l337? In formal writing, the term humongous is rejected as a non-word, yet most native English speakers have heard it and used it colloquially, and would agree that it means "very large". How long must words like metrosexual or astroturfing exist before they merit inclusion?

In some languages, a government-appointed academy decides what words exist, and how to spell them. A great dictionary should have some means of flagging words that are accepted in this manner. In other languages, including English, publishers of dictionaries set standards of word-worthiness by choosing what words to include in the limited space in their dictionaries. Either way, they employ some arbitrary standard, such as how long or how frequently a word has been in use.

Whether academies, publishers, or nobody prescribes standards and proper usage, no language is static. People reuse words for novel meanings, or invent new or combined words, often as slang, at first. New technologies and concepts need names. A great dictionary is inclusive, and it should reflect the shifting language, though it should caution users about questionable words. At the same time, a dictionary should not seek to introduce new words, nor words that are used only by three friends at a certain school.

There must be some basic standard, or at least guideline, by which to judge the existence of a word. Search engines offer an excellent (though by no means certain) tool for determining how popular or widespread a word is, at least in print. The existing Wiktionaries use a combination of guidelines and consensus. In WiktionaryZ, the community surrounding each language should decide the answers for itself.

3 Comments:

Blogger GerardM said...

You suggest that a language community in WiktionaryZ will decide what the content will be for a language. This is not really practical; when particular content exists, it can be translated. This is enough to make content available in a language.

What is to be included and what not is not that easy to decide. Another point to consider is that WiktionaryZ is not only lexicological; it is also terminological and it will also include thesaurus information.

WiktionaryZ will eventually include a lot of information that is of no interest to some but very relevant to others. I expect that we will define filters for information that will be seen.

Thanks,
GerardM

5:17 pm  
Blogger Dvortygirl said...

Thanks, Gerard. I write these pieces not because I think I have the answers but because I wish to raise questions for discussion and gather comments.

What I think you're trying to say here is that you expect to have some very specialized information and that the community may not be qualified or prepared to judge the worthiness of very specific terms, such as medical ones, for instance. Do not underestimate the community! That said, relevance alone is not the question. Rare and specialized words may certainly merit inclusion.

The question I hoped to raise is, by what standard shall we judge what is rubbish? I have definitely seen no shortage of made-up words come through Wiktionary, many with matter right there in the definition stating, "this is a word my friends and I at XYZ high school made up to describe...".

How do you propose we draw the line to eliminate the patent nonsense?

9:10 pm  
Blogger GerardM said...

I know that there is much expertise in our existing communities. The point that I try to make is that no language community can deny certain categories of information.

The patently stupid entries are the easiest. These will be removed, as we have always done.

PS I really enjoy your writing :)
GerardM

10:53 pm  
