I figure it's time to put out another copy of my list of 5000 words. I'm building a time travel game. One reason is to showcase and to develop the theory of artificial creativity. When players change something in the past, that changes history, and the game will have to create history again from that point. My first attempt is likely to use an artificially created world, so that I get the advantages of working within a complex toy universe. Another reason for creating the game is that I want to play it.
I want the world's characters to be able to talk to the players and understand them (in text only, to start with). To do this I have been working on a list of what I think are the 5000 most common words in English up to about 1910 (12M), plus about 70 more game-specific words. I compiled the list from the Brown Corpus, my own downloaded article base, and a few other sources. For each word I am constructing two endemes, one for semantics and the other for syntax.
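As a rough illustration of combining several frequency sources into one list, here is a minimal sketch. It assumes each source has already been reduced to a word-to-count mapping; the function name `build_word_list` and the toy counts are hypothetical, not the actual Brown Corpus or article-base figures.

```python
from collections import Counter

def build_word_list(sources, game_words, size=5000):
    """Merge word counts from several sources and keep the most common words.

    `sources` is a list of {word: count} mappings (hypothetical stand-ins for
    corpus counts); `game_words` are always appended if not already present.
    """
    total = Counter()
    for counts in sources:
        total.update(counts)  # adds counts word by word
    common = [word for word, _ in total.most_common(size)]
    already = set(common)
    # Game-specific words are kept even if they are too rare to make the cut.
    extra = [w for w in game_words if w not in already]
    return common + extra

# Made-up toy counts, only to show the mechanics.
brown = {"the": 10, "of": 6, "time": 3}
articles = {"the": 8, "time": 4, "travel": 2}
print(build_word_list([brown, articles], ["timeline", "time"], size=3))
```

With `size=3` this keeps "the", "time" and "of", then adds "timeline"; "time" is not duplicated because it already made the frequency cut.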
An endeme is an ordered list of characteristics. Every word uses the same set of characteristics, but each word puts them in a different order. I have chosen to call this list by the Old English word 'endemes', which means 'together', implying that the characteristics are not just a list but meld together into a new meaning defined by their order. Usually each characteristic is represented by a different letter, so an endeme is a string of letters in a different order for each instance.
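Because the meaning lives in the order, two endemes over the same letters can be compared by how far each characteristic moves between them. The sketch below uses the Spearman footrule (sum of absolute position differences); this comparison is an assumption chosen for illustration, not something the endeme scheme itself specifies.

```python
def rank_distance(a, b):
    """Sum of how far each characteristic moves between two endemes.

    0 means the orderings are identical; larger values mean they disagree
    more. Both endemes must order exactly the same set of letters.
    """
    if sorted(a) != sorted(b) or len(set(a)) != len(a):
        raise ValueError("endemes must be orderings of the same distinct letters")
    pos_b = {ch: i for i, ch in enumerate(b)}
    return sum(abs(i - pos_b[ch]) for i, ch in enumerate(a))

print(rank_distance("ABCD", "ABCD"))  # identical order
print(rank_distance("ABCD", "BACD"))  # first two swapped
print(rank_distance("ABCD", "DCBA"))  # fully reversed
```

A swap near the front and a swap near the back score the same here; a weighted variant that counts early positions more heavily would be another plausible design.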
The characteristics for the semantic endeme are A. Area, B. Biological, C. Change, D. Directive, E. Express, F. Feeling/Sense, G. Good, H. Human, I. Incident, J. Jump/Movement, K. Kind/Category, L. Love/Like, M. Mental, N. Newness/Create/Future, O. Object, P. Physical, Q. Quantity/Number, R. Resource/Wealth, S. System/Relate, T. Temporal, U. Ugly/Bad, V. Value/Quality/Metaphor. So the endeme for the word 'creativity' is NGECLMTFVSIRKHOBDAPQUJ.
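To make the 'creativity' example concrete, the sketch below stores the semantic characteristics as a letter-to-name table, checks that an endeme uses each of the 22 letters exactly once, and reads off its leading characteristics. The helper names `decode` and `rank_of` are hypothetical.

```python
SEMANTIC = {
    "A": "Area", "B": "Biological", "C": "Change", "D": "Directive",
    "E": "Express", "F": "Feeling/Sense", "G": "Good", "H": "Human",
    "I": "Incident", "J": "Jump/Movement", "K": "Kind/Category",
    "L": "Love/Like", "M": "Mental", "N": "Newness/Create/Future",
    "O": "Object", "P": "Physical", "Q": "Quantity/Number",
    "R": "Resource/Wealth", "S": "System/Relate", "T": "Temporal",
    "U": "Ugly/Bad", "V": "Value/Quality/Metaphor",
}

def decode(endeme, top=None):
    """Return characteristic names in endeme order, optionally only the first `top`."""
    if sorted(endeme) != sorted(SEMANTIC):
        raise ValueError("not an ordering of the 22 semantic characteristics")
    letters = endeme if top is None else endeme[:top]
    return [SEMANTIC[ch] for ch in letters]

def rank_of(endeme, letter):
    """1-based position of a characteristic within an endeme."""
    return endeme.index(letter) + 1

creativity = "NGECLMTFVSIRKHOBDAPQUJ"
print(decode(creativity, top=4))
print(rank_of(creativity, "J"))
```

For 'creativity' the first four are Newness/Create/Future, Good, Express and Change, while Jump/Movement comes last at position 22.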
I am also in the process of building an endeme set of characteristics for the implied syntax of words. At present these are A. Action, B. Being, C. Clause, D. Distinct, E. Every/plural, F. Future, G. General, H. Higher, I. Indirect, J. Present, K. Symbol, L. Link, M. Modifier, N. Noun, O. Ownership, P. Past, Q. Query, R. Response, S. Self, T. Target, U. User, V. Verse/Phrase. The endemes produced by this set are as yet problematic, so I don't have a good example.
My first attempt simply had one characteristic for each part of speech. This list instead constructs parts of speech from pairs of characteristics. This has the problem that the pairs can mix badly when a word has more than one part of speech. For example, the word 'clear' presently has the pairs MN and AT (noun modifier and targeted action). This might imply that we could combine the A and the M to make clear an action modifier (MA), which it clearly is not: the related action modifier is 'clearly', not 'clear'. It's a work in progress.
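The mixing problem can be shown directly: if a word's pairs are merged into one pool of letters, the pool supports readings the word never had. This is a minimal sketch using only the four syntax letters needed for 'clear'; the function names are hypothetical.

```python
from itertools import permutations

# Subset of the syntax characteristics, enough for the 'clear' example.
SYNTAX = {"A": "Action", "M": "Modifier", "N": "Noun", "T": "Target"}

def flattened_readings(pairs):
    """Every ordered two-letter pair obtainable once the word's pairs are
    merged into a single pool of letters (the lossy approach)."""
    letters = set("".join(pairs))
    return {x + y for x, y in permutations(letters, 2)}

def spurious_readings(pairs):
    """Readings the merged pool allows that the word was never given."""
    return flattened_readings(pairs) - set(pairs)

def describe(pair):
    return " ".join(SYNTAX[ch] for ch in pair)

clear = ["MN", "AT"]  # noun modifier, targeted action
print("MA" in spurious_readings(clear))  # True: 'action modifier' leaks in
print(describe("MA"))                    # Modifier Action
```

Keeping each word's pairs as intact units (checking membership in `clear` itself) avoids the leak, at the cost of not being able to generalise across parts of speech.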
I have also included many other bits and pieces for each word. The codes and columns worksheets describe these to some extent.
When I have the list done, I will have some chance of being able to write code that allows players and artificial characters to understand each other. My intention is not to get it perfect but to get it just good enough for barely minimal understanding to happen. My meme in this part of the development is 'the perfect is the enemy of the good'.
Thank you.
July 2010