Defining General Concepts
The component library is intended to include knowledge base nodes for
general, domain-independent concepts. The user of the library will
build nodes for more complex, domain-specific concepts by composing
existing components. Fundamental questions, then, in building the
library are:
1. what are the general, domain-independent concepts?
2. how do we make sure that the library has broad coverage?
3. how do we know our primitive concepts are the "right" ones?
Choosing general concepts through introspection is one way to answer
question #1, but perhaps not the best way to ensure satisfactory
answers to #2 and #3. Fortunately, there are existing lexical-semantic
resources that can be used for inspiration. Both dictionaries and
thesauri often group words into general categories. For example,
the Longman Dictionary of Contemporary English (aka LDOCE:
Summers 1987) uses a "defining vocabulary" of about 2,000 words. All
definitions in the dictionary are supposed to ground out eventually to
the defining vocabulary. WordNet (Miller 1990) groups semantically
similar words into "synsets", which are themselves linked
hierarchically. Roget's Thesaurus (see, for example, Lloyd 1982)
divides the universe into six classes. Each class is
subdivided into multiple divisions and sections,
themselves further subdivided. The one thousand leaves in Roget's
tree contain semantically related words (not quite synonyms), one of
which is chosen as the representative for the group: the
headword.
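The Roget-style hierarchy described above can be pictured as a tree whose leaves carry a headword and a group of related words. The following is a minimal sketch only: the `Node` class, the `headwords` function, and the toy entries are illustrative assumptions, not the actual contents or format of any edition of Roget's.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """One node of a Roget-style tree: a class, division, section, or leaf."""
    name: str                                         # for a leaf, the headword
    children: List["Node"] = field(default_factory=list)
    words: List[str] = field(default_factory=list)    # related words (leaves only)


def headwords(node: Node) -> Iterator[str]:
    """Yield the headword of every leaf below `node`, left to right."""
    if not node.children:
        yield node.name
        return
    for child in node.children:
        yield from headwords(child)


# Toy fragment, made up for illustration; not copied from the thesaurus.
toy = Node("ABSTRACT RELATIONS", [
    Node("existence (section)", [
        Node("existence", words=["be", "exist", "occur"]),
        Node("inexistence", words=["vanish", "disappear"]),
    ]),
    Node("time (section)", [
        Node("duration", words=["last", "endure"]),
    ]),
])

print(list(headwords(toy)))
```

Collecting the leaves this way is what turns the tree into a flat list of candidate general concepts.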
The LDOCE defining vocabulary, a horizontal slice of the WordNet
hierarchy, the Roget headwords: each of these could be used as a list
of general concepts, or as inspiration for an original list. (See Rick
Harrison's "Vital English Vocabulary" (Harrison 1997) for a similar
experiment.) None of these sources is perfect. LDOCE is not semantically
motivated and apparently contains some circular definitions. WordNet
has much less coverage than the others, especially outside the nouns.
Roget's classification is somewhat arbitrary and clearly shaped by the
culture of its author.
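The idea that definitions should "ground out" in a defining vocabulary, and the worry about circular references, can be checked mechanically. The sketch below assumes a hypothetical dictionary represented as a mapping from each word to the words used in its definition; the function name and the toy data are made up for illustration and do not reflect LDOCE's actual format or contents.

```python
from typing import Dict, List, Set


def ungrounded(definitions: Dict[str, List[str]],
               defining_vocab: Set[str]) -> Set[str]:
    """Return the defined words that never ground out in `defining_vocab`.

    A word is grounded if it is in the defining vocabulary, or if every
    word in its definition is grounded. Words caught in a definitional
    cycle, or defined with unknown words, are never marked as grounded.
    """
    grounded = set(defining_vocab)
    changed = True
    while changed:                       # iterate to a fixed point
        changed = False
        for word, body in definitions.items():
            if word not in grounded and all(w in grounded for w in body):
                grounded.add(word)
                changed = True
    return set(definitions) - grounded


# Toy data: "hill" and "mountain" are defined in terms of each other.
toy_definitions = {
    "puppy": ["young", "dog"],
    "dog": ["animal"],
    "hill": ["small", "mountain"],
    "mountain": ["big", "hill"],
}
toy_vocab = {"young", "animal", "small", "big"}

print(sorted(ungrounded(toy_definitions, toy_vocab)))
```

In this toy case the two mutually defined words are reported, while "puppy" grounds out through "dog".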
Using an established lexical resource not only identifies general
concepts, but also goes a long way toward ensuring broad coverage
(question #2). One could argue (though I will not do so here) that
coverage of English words implies coverage of general knowledge.
Question #3 is best avoided. Let's argue that there are no "right"
primitive concepts. Our concepts will be validated by the ease with
which the user of the library can compose other, more complex concepts
using them.
I have chosen to experiment with Roget's Thesaurus as a source for a
list of general actions. The actions were chosen by going
through Roget's headword-by-headword, looking specifically at verb
paragraphs (see my description of the format of
Roget's, including the upper level, non-leaf nodes). For each
headword, I usually chose a single representative verb. These verbs are
grouped in categories that correspond roughly to mid-level Roget
categories. The list still needs some cleaning, and its ability to
capture domain-specific actions is still uncertain.
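The selection procedure just described can be sketched in a few lines. This is only an approximation under stated assumptions: the real choice of a representative verb was made by hand, so `choose_verb` below is a placeholder that takes the first verb listed, and the `Headword` record and the toy entries are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Headword(NamedTuple):
    category: str       # mid-level grouping, e.g. "STATE"
    name: str           # the headword itself
    verbs: List[str]    # contents of the verb paragraph; may be empty


def choose_verb(entry: Headword) -> Optional[str]:
    """Placeholder for the manual choice: take the first verb, if any."""
    return entry.verbs[0] if entry.verbs else None


def general_actions(entries: List[Headword]) -> Dict[str, List[str]]:
    """Group one representative verb per headword under its category."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entry in entries:
        verb = choose_verb(entry)
        if verb is not None:            # headwords without verbs are skipped
            grouped[entry.category].append(verb)
    return dict(grouped)


# Toy entries, made up for illustration.
toy_entries = [
    Headword("STATE", "existence", ["be", "exist"]),
    Headword("STATE", "inexistence", ["disappear", "vanish"]),
    Headword("STATE", "substantiality", []),
    Headword("MOTION", "motion", ["move", "go"]),
]
actions = general_actions(toy_entries)
print(actions)
```

A headword with an empty verb paragraph contributes nothing, which matches the "usually one verb per headword" caveat.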
I have also put Longman's defining
vocabulary online.
References
HARRISON, RICK (1997). "Vital English Vocabulary".
http://www.rick.harrison.net/annex/vitaleng.html
LLOYD, SUSAN M. (1982). Roget's Thesaurus. Essex: Longman.
MILLER, GEORGE A., ed. (1990). "WordNet: An On-Line Lexical Database."
International Journal of Lexicography 3(4).
SUMMERS, DELLA, ed. (1987). Longman Dictionary of Contemporary
English: New Edition. Essex: Longman.
The List of General Actions
Here is a first attempt at deriving a list of general actions from
the public-domain Roget's Thesaurus (based on the 1911 Thomas Crowell
edition). The list still needs some cleaning. For example, some verbs
appear more than once, each time in a different sense, and these
duplicates should be renamed. Furthermore, I tried to err on the side
of recall: at this stage it is better to have too many specific verbs
than to miss some important general ones.
- STATE
  - be, become, appear, be in a state of, etc. #1
  - disappear #2
  - die #2
- ORDER
- TIME
- CHANGE
- CAUSE
- SPACE
- DIMENSION
- FORM
- MOTION
- MATTER
  (matter verbs in Roget tend to mean being in a state, changing
  state, or causing something to be in a state)
- SENSE
- INTELLECT
  (for the most part, too complex)
- COMMUNICATION
- VOLITION
- POSSESSION
- AFFECTIONS
  (I'm going to leave out affections... too messy)
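The cleanup of duplicate verbs mentioned in the introduction to this list could be approached along the following lines. This is a hedged sketch: the naming scheme (a numeric suffix per sense, in order of appearance) and the function name are assumptions made for illustration, not a convention the list already follows.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def rename_duplicates(verbs: List[str]) -> List[str]:
    """Give each repeated verb a distinct name such as "pass-1", "pass-2".

    Verbs that occur only once are left unchanged. Suffixes are assigned
    in order of appearance, so each occurrence (sense) gets its own name.
    """
    totals = Counter(verbs)
    seen: Dict[str, int] = defaultdict(int)
    renamed = []
    for verb in verbs:
        if totals[verb] == 1:
            renamed.append(verb)
        else:
            seen[verb] += 1
            renamed.append(f"{verb}-{seen[verb]}")
    return renamed


# Toy input, made up for illustration.
print(rename_duplicates(["be", "pass", "die", "pass"]))
```

A more informative scheme would name each sense after its category or headword, but that requires deciding by hand which sense is which.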