Archives: March 2008

Wed Mar 19, 2008

Semantic and Syntactic Endemes for the Most Common Words in English

I figure it's time to put out another copy of my list of 5000 words. I'm building a time travel game. One reason is to showcase and develop the theory of artificial creativity. When players change something in the past, that changes history, and the game will have to create history again from that point. My first attempt is likely to be with an artificially created world so that I get the advantages of working within a complex toy universe. Another reason for creating the game is that I want to play it.

I want the world's characters to be able to talk to the players and understand them (in text only to start with). To do this I have been working on a list of what I think are the 5000 most common words in English up to about 1910 (12M), plus about 70 more game-specific words. I got the list from the Brown Corpus, from my own downloaded article base, and from a few other sources. For each word I am constructing two endemes, one for semantics, the other for syntax.

An endeme is an ordered list of characteristics. Each word has the same list, but in a different order. I have chosen to call this list by an Old English word, 'endemes'. 'Endemes' means 'together' in Old English, implying that the list of characteristics is not just a list but melds together into a new meaning which is defined by the order of the characteristics. Usually each characteristic is represented by a different letter, so an endeme is a list of letters, in a different order for each instance.

The characteristics for the semantic endeme are A. Area, B. Biological, C. Change, D. Directive, E. Express, F. Feeling/Sense, G. Good, H. Human, I. Incident, J. Jump/Movement, K. Kind/Category, L. Love/Like, M. Mental, N. Newness/Create/Future, O. Object, P. Physical, Q. Quantity/Number, R. Resource/Wealth, S. System/Relate, T. Temporal, U. Ugly/Bad, V. Value/Quality/Metaphor. So the endeme for the word 'creativity' is NGECLMTFVSIRKHOBDAPQUJ.
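To make the idea concrete, here is a minimal Python sketch of how an endeme string could be checked and read back as an ordered list of characteristics. The letter table is copied from the list above; the function names `is_valid_endeme` and `decode` are my own placeholders for illustration, not anything from the game's actual code.

```python
# Letter-to-characteristic table, copied from the semantic list above.
SEMANTIC = {
    "A": "Area", "B": "Biological", "C": "Change", "D": "Directive",
    "E": "Express", "F": "Feeling/Sense", "G": "Good", "H": "Human",
    "I": "Incident", "J": "Jump/Movement", "K": "Kind/Category",
    "L": "Love/Like", "M": "Mental", "N": "Newness/Create/Future",
    "O": "Object", "P": "Physical", "Q": "Quantity/Number",
    "R": "Resource/Wealth", "S": "System/Relate", "T": "Temporal",
    "U": "Ugly/Bad", "V": "Value/Quality/Metaphor",
}


def is_valid_endeme(endeme, table=SEMANTIC):
    """Return True if the endeme uses every letter in the table exactly once."""
    return len(endeme) == len(table) and set(endeme) == set(table)


def decode(endeme, table=SEMANTIC):
    """Return the characteristic names in the order the endeme lists them."""
    if not is_valid_endeme(endeme, table):
        raise ValueError("not a permutation of the characteristic letters")
    return [table[letter] for letter in endeme]


creativity = "NGECLMTFVSIRKHOBDAPQUJ"
print(decode(creativity)[:3])
# → ['Newness/Create/Future', 'Good', 'Express']
```

Read this way, the front of the endeme for 'creativity' says the word is mostly about newness, then goodness, then expression, while the letters at the tail (Ugly/Bad, Jump/Movement) matter least.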

I am also in the process of building an endeme set of characteristics for the implied syntax of words. These are at present A. Action, B. Being, C. Clause, D. Distinct, E. Every/plural, F. Future, G. General, H. Higher, I. Indirect, J. Present, K. Symbol, L. Link, M. Modifier, N. Noun, O. Ownership, P. Past, Q. Query, R. Response, S. Self, T. Target, U. User, V. Verse/Phrase. The endemes produced by this are as yet problematic so I don't have a good example.

My first attempt was simply to have one word for each part of speech. This list constructs parts of speech from pairs of characteristics. This has the problem that they can mix badly when a word has more than one part of speech. For example, the word 'clear' presently has the pairs MN and AT (noun modifier and targeted action). This might imply that we could combine the A and the M to have 'clear' be an action modifier (MA), which clearly it is not. The related action modifier is 'clearly', not 'clear'. It's a work in progress.
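The mixing problem can be shown with a small Python sketch. It assumes, purely for illustration, that a word's pairs get pooled into one bag of letters so that the pair boundaries are lost; the helper name `readings_from_pooled_letters` is hypothetical and not part of the word list or the game.

```python
from itertools import permutations

# Pairs recorded for 'clear': MN (noun modifier) and AT (targeted action).
clear_pairs = ["MN", "AT"]


def readings_from_pooled_letters(pairs):
    """Return every ordered two-letter pair that can be formed once the
    letters of all pairs are pooled together (pair boundaries forgotten)."""
    letters = set("".join(pairs))
    return {a + b for a, b in permutations(letters, 2)}


# Readings that appear only because the letters were pooled.
spurious = readings_from_pooled_letters(clear_pairs) - set(clear_pairs)
print("MA" in spurious)
# → True
```

Under that assumption, MA (action modifier) shows up as a reading of 'clear' even though only MN and AT were intended, which is the bad mix described above. Keeping each pair as its own unit, rather than merging letters, would avoid it in this sketch.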

I have also included many other bits and pieces for each word. The codes and columns worksheets describe these to some extent.

When I have the list done, I will have some chance of being able to write code that allows players and artificial characters to understand each other. My intention is not to get it perfect but to get it just good enough for barely minimal understanding to happen. My meme in this part of the development is 'the perfect is the enemy of the good'.

Thank you.

Posted by: Jon Grover on Mar 19, 08 | 7:35 pm
