1 Overview of the Project
There is no one-to-one mapping from syntactic relationships to semantic relationships in English sentences. A sentence with a main clause and a subordinate clause does not always express a causal relationship between two events. The subject of a verb does not always represent the instigator of an act. An adjective does not always indicate a physical property of some entity. Yet there is a mapping from syntactic relationships to semantic relationships; some part of the meaning of a sentence is reflected in its surface-syntactic form.
This dissertation is about finding links between syntactic and semantic relationships. In particular it is about defining sets of semantic relationships that can be expressed by syntactically related sentence elements. It is about cataloguing clues that hint at the syntax-to-semantics mapping and using those clues in a program to uncover semantic relationships where they exist in sentences. It is also about recognizing when the clues are insufficient. When this occurs, linguistic facts lurking beneath the surface become necessary; they can be learned from user assistance to make uncovering subsequent semantic relationships easier and more accurate.
1.1 Automated Semantic Analysis
Automated semantic analysis of texts involves such processes as determining word senses and referents, recognizing modification relations and semantic roles, interpreting quantification, and many others. Whatever the processes involved, automated semantic analysis usually attempts to uncover components of the meaning of a fragment of text beyond its surface-syntactic structure. Consider sentences (1) and (2).
(1) Ed broke the thumb on Joe's left hand when he swung the hammer off-target.
(2) Ed swung a heavy metal hammer askew and broke Joe's left hand thumb.
There are many syntactic and lexical differences between (1) and (2). Yet the events and participating entities are similar enough that if both sentences occurred together in a text, one of them might be considered redundant. Semantic analysis should produce structures that reveal some of the underlying similarities of meaning in the sentences.
1.1.1 Oracles
Automated semantic analysis does not necessarily imply full automation or autonomy. Often some outside source of knowledge is required to inform analysis. This oracle may be a large, comprehensive knowledge base, a corpus of analyzed text or a human user--as in this project.
1.1.2 Entities and Events
Since this dissertation deals with semantic relationships marked by syntactic forms, at times I will find it necessary to talk about the semantic things that syntactic elements refer to.
Frawley (1993: xiv) identifies certain "basic objects in a model of the world: things, actions, and the relation of things to actions." Things more formally are entities: "relatively stable and atemporal discourse, ontological, and conceptual phenomena" (p. 68) and they usually surface as nouns. Actions more formally are events: "relatively temporal relation[s] in conceptual space" (p. 144) and they usually surface as verbs.1 Finally, the relation of things to actions is accounted for by thematic roles (cases): "grammatically relevant relations between predicates (often events) and arguments (often entities)" (p. 199).
Borrowing liberally from Frawley, then, I will use the term entity to refer to the semantic thing represented in syntax as a noun. I will also refer to the semantically complex thing represented in syntax by a noun phrase as an entity. Modifiers of a noun in a noun phrase serve to ground it, but the complex noun phrase still refers to a single entity. Using this terminology, both (3) and (4) are entities.
(3) cat
(4) the fat cat on the porch
Similarly, I will use the term event to refer to the semantic thing represented in syntax as a verb. I will also refer to the semantically complex thing represented in syntax by a clause as an event. Both (5) and (6) are events.
(5) snooze
(6) the fat cat on the porch snoozes all day long
1.2 Goals of the Project
The semantic relationships in the sentences of a text form a model of that text. With the present state of the art, fully automatic construction of such a model is not feasible without a knowledge base. A fully manual construction of the model is feasible, but onerous. My thesis is that it is possible to acquire the model interactively without demanding of a user as much as a large knowledge engineering effort. This claim can be tested through the following specific project goals. The extent to which the goals have been met will be investigated in chapter 6.
1.3 Semantic Analysis in
Semantic analysis in
At each level
Clause level relationships (CLRs) are semantic relationships between events represented syntactically by syndetically connected finite clauses. The CLR analyzer recognizes these relationships in sentences by considering the lexical items that mark them (coordinators, correlatives, subordinators) and the syntactic features of the connected clauses.
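As a rough illustration of how a lexical marker can narrow the set of candidate CLRs, the following Python sketch looks up a connective in a small marker dictionary. Everything in it is an assumption made for illustration: the relationship labels, the dictionary entries and the function name are hypothetical, and the sketch ignores the syntactic features of the connected clauses that the CLR analyzer also considers.

```python
# Hypothetical marker dictionary: connective -> candidate relationship labels.
# The labels are placeholders, not the dissertation's CLR inventory.
HYPOTHETICAL_MARKERS = {
    "because": ["Causation"],
    "when": ["Temporal co-occurrence", "Condition"],
    "and": ["Conjunction", "Temporal precedence", "Causation"],
}


def candidate_relationships(marker, dictionary=HYPOTHETICAL_MARKERS):
    """Return the candidate relationships for a clause connective.

    An empty list signals that the dictionary offers no clue, so a
    relationship would have to come from some other source, such as the user.
    """
    return list(dictionary.get(marker.lower(), []))


# Sentence (1) connects its two clauses with "when".
print(candidate_relationships("when"))
# A connective missing from the dictionary yields no candidates.
print(candidate_relationships("whereas"))
```

A marker with a single candidate could simply be confirmed, a marker with several would require a choice, and an unknown marker would require a new entry.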
Case Relationships
Cases are semantic relationships that express the roles of the participants and the circumstances of an event. The case analyzer recognizes cases in finite clauses by considering the syntactic or lexical items that indicate verb arguments (subject, objects, prepositional phrases, adverbials).
Noun Modifier Relationships
Noun modifier relationships (NMRs) are semantic relationships that define entities. They are expressed within noun phrases as modification. The NMR analyzer recognizes relationships expressed by adjectives, premodifying nouns and postmodifying prepositional phrases and appositives.
1.3.3 Background Knowledge
Both
These examples are knowledge of language in general. Such knowledge is closely tied to the surface syntax and does not vary from domain to domain or from text to text. It is independent in the sense that each linguistic fact can be encoded to assist the semantic analysis task without requiring that further knowledge (such as lexical semantics) be added for the fact to be useful.2 Furthermore, the linguistic knowledge in
The sets of semantic relationships recognized by
The semantic knowledge allowed in
The techniques used by
In the absence of other kinds of semantic knowledge,
1.4 Evaluation
A significant part of this dissertation is devoted to
the evaluation of
There has been increased interest in the evaluation of natural language processing (NLP) tools over the last few years. The message understanding conferences (MUCs) are a direct reaction to a lack of standardization in the evaluation of such tools. The MUC competitions compare the performance of different systems on some uniform, predetermined text processing task. The motivation for and background of the MUC competitions are described in Grishman & Sundheim (1996).
Although these competitions have successfully emphasized the importance of evaluation for the NLP community, they have also revealed certain limitations. In particular, researchers have noted that the predefined tasks in evaluation competitions result in applications that are designed to score well, but are not portable (see MUC-6 1996). Furthermore, little attention has been paid to the evaluation of interactive systems and the role of users (Hirschman & Thompson 1996).
To address these issues, Sparck Jones (1994) and Sparck Jones & Galliers (1996) offer more general strategies for evaluating generic NLP systems: "there is far too much variety in the situations and subjects of evaluation to come up with a definite [evaluation] scenario" (Sparck Jones & Galliers 1996: 193).
Hirschman & Thompson (1996) distinguish two kinds of evaluations--diagnostic evaluations and performance evaluations--that make different assumptions about the distribution of test data. A diagnostic evaluation tests a system on all of the linguistic phenomena that it is designed to handle. Lehmann et al. (1996) describe the construction of an exhaustive test suite for diagnostic evaluations of natural language processing systems. The distribution of linguistic phenomena in the test suite of a diagnostic evaluation is almost certainly not the same as the distribution of phenomena in complete texts, which form the test suite for a performance evaluation. For performance evaluations it is important to identify criteria (what is being evaluated), measures (what properties of performance reflect the criteria) and methods (how the measures are analyzed to arrive at an evaluation of the criteria).
The components of
The first two criteria are objective and can be measured directly. Coverage is objective, but cannot always be measured directly, since the number of semantic relationships in an entire text is not always easy to determine. I will use sampling when direct measurement is not feasible (see sections 3.6.1 and 4.7.2). The fourth criterion is mostly subjective, but its evaluation will take into account quantitative measurements of user participation (see sections 1.4.3 and 1.4.4).
1.4.1 Test Texts
The clouds experiment was a performance evaluation
summarized in Barker & Delisle (1996). The text for the
experiment was the Junior Science Book of Rain, Hail,
Sleet & Snow (Larrick 1961). The clouds text has
fairly simple syntax and was chosen to ensure a high number
of correct parses. The belief was that the higher parse
success rate would allow the system to recognize more
semantic relationships, since
Results of the small engines experiment appear in
Barker et al. (1998). The Mechanics of Small
Engines (Atkinson 1990) has more complex syntax,
resulting in fewer correct parse trees available for
The building code experiment used sentences from the Ontario Building Code (Ontario Ministry of Housing 1991) in a diagnostic evaluation of the CLR analyzer. The building code text contains thousands of complex sentences with a variety of conjunctions connecting clauses and a variety of syntactic verb phrase features.
The sparc experiment was a performance evaluation of components of the NMR analyzer on noun phrases from the SPARCstation 1 Installation Guide (Chan 1989).
1.4.2 Parser Evaluation
For the clouds and small engines
experiments, the quality of
The clouds experiment applied
In the small engines experiment, 55% of the 557
sentences received a complete parse. Due to the much more
complex syntax, only 31% of the parses were perfect and
another 9% were good enough not to affect semantic analysis.
That means that for 60% of the sentences in the small
engines text, parse errors were serious enough to
prevent
The individual evaluations of CLR analysis (section 2.6),
case analysis (section 3.6) and NMR analysis (section 4.7)
investigate the extent to which
The evaluation of
The first type of user action is accept:
The second type of action is choose:
The third action is supply: either
In certain instances it will be convenient to compare the number of supply actions to the sum of the other two types. In those comparisons I will use the term system assignments to refer to semantic relationships assigned as a result of a choose or accept action and user assignments to refer to relationships assigned as a result of a supply action.
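The grouping of user actions into system assignments and user assignments can be sketched in a few lines of Python. This is only a minimal illustration of the bookkeeping: the function name, the dictionary keys and the sample log are hypothetical, and the sketch assumes each interaction has been logged as one of the three action names.

```python
from collections import Counter

# Accept and choose count as system assignments; supply counts as a user assignment.
SYSTEM_ACTIONS = {"accept", "choose"}
USER_ACTIONS = {"supply"}


def summarize_actions(actions):
    """Tally logged user actions into system and user assignments."""
    counts = Counter(actions)
    unknown = set(counts) - SYSTEM_ACTIONS - USER_ACTIONS
    if unknown:
        raise ValueError(f"unknown action types: {sorted(unknown)}")
    system = sum(counts[a] for a in SYSTEM_ACTIONS)
    user = sum(counts[a] for a in USER_ACTIONS)
    total = system + user
    return {
        "system_assignments": system,
        "user_assignments": user,
        # Share of relationships that did not have to be supplied by the user.
        "system_share": system / total if total else 0.0,
    }


# Hypothetical log of four interactions.
print(summarize_actions(["accept", "choose", "supply", "accept"]))
```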
1.4.4 User Burden
For each user interaction in the clouds and small engines experiments, the degree of difficulty of interaction (referred to as onus) was recorded as an integer from 0 to 3. 0 means that the interaction is trivial. 1 is assigned to an interaction that requires a few moments of reflection. 2 rates an interaction as requiring serious thought, but eventually a semantic relationship is assigned. 3 means that even after much contemplation no relationship is deemed appropriate for the given input.
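The onus scale lends itself to a simple summary per experiment. The Python sketch below encodes the four levels and computes an average rating together with the distribution of ratings. The member names, the function name and the sample ratings are hypothetical; only the integer values 0 to 3 and their meanings come from the description above.

```python
from collections import Counter
from enum import IntEnum


class Onus(IntEnum):
    """Difficulty of a single user interaction (names are illustrative)."""
    TRIVIAL = 0                 # interaction is trivial
    BRIEF_REFLECTION = 1        # a few moments of reflection
    SERIOUS_THOUGHT = 2         # serious thought, relationship eventually assigned
    NO_RELATIONSHIP_FOUND = 3   # no relationship deemed appropriate


def onus_profile(ratings):
    """Return (average onus, count per level) for a list of integer ratings."""
    levels = [Onus(r) for r in ratings]  # raises ValueError outside 0..3
    if not levels:
        raise ValueError("at least one rating is required")
    counts = Counter(levels)
    average = sum(int(level) for level in levels) / len(levels)
    return average, {level: counts[level] for level in Onus}


# Hypothetical ratings for four interactions.
average, distribution = onus_profile([0, 0, 1, 3])
print(average, distribution)
```

Reporting the distribution alongside the average keeps the rare level 3 interactions visible, since they mark inputs for which no relationship could be assigned.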
User burden is also reflected in the amount of time required to oversee the analysis of a text. The average number of tokens (words or punctuation) in the small engines sentences was 15.4. The average number of CLRs per sentence was 0.04, while the average number of case patterns was 1.05 and the average number of NMRs was 1.59. On average, we spent 1 minute, 49 seconds on each sentence. By coincidence, the average user time for the clouds experiment was also 1 minute, 49 seconds for an average of 0.10 CLRs and 0.86 case patterns per sentence (NMRs were not evaluated in the clouds experiment).
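One rough way to compare the two timings is to normalize the time per sentence by the number of units analyzed per sentence. The Python sketch below does this using only the averages reported above. Treating CLRs, case patterns and NMRs as equally weighted units is an assumption made purely for illustration, and the function name is hypothetical.

```python
SECONDS_PER_SENTENCE = 60 + 49  # 1 minute, 49 seconds in both experiments


def seconds_per_unit(seconds_per_sentence, units_per_sentence):
    """Average user time per analyzed unit (CLR, case pattern or NMR)."""
    return seconds_per_sentence / sum(units_per_sentence)


# Small engines: 0.04 CLRs, 1.05 case patterns and 1.59 NMRs per sentence.
small_engines = seconds_per_unit(SECONDS_PER_SENTENCE, [0.04, 1.05, 1.59])
# Clouds: 0.10 CLRs and 0.86 case patterns per sentence (NMRs not evaluated).
clouds = seconds_per_unit(SECONDS_PER_SENTENCE, [0.10, 0.86])

print(round(small_engines, 1), round(clouds, 1))
```

Under this assumption the identical time per sentence corresponds to roughly 41 seconds per unit for the small engines text and roughly 114 seconds per unit for the clouds text. The comparison is only indicative, because the clouds experiment did not include NMR analysis.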
1.5 Applications
Information extraction from text often involves the identification of essential participants and circumstances in acts. The participants and circumstances can be represented directly as cases. More specific knowledge of the participants involves recovery of the relationships within noun phrases (NMRs).
Template filling is a kind of information extraction. A template has a fixed set of slots to be filled with specific information from the text. For events, these slots are often analogous to cases (with names such as Agent, Instrument, Location, etc.) and sometimes also relate events to other events (comparable to recognizing CLRs). For entities the slots often correspond to hypernyms, subcomponents, purposes and properties--knowledge of the kind that is captured by NMR analysis.
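The correspondence between cases and template slots can be illustrated with a short Python sketch. The template class, the slot mapping and the case assignments are hypothetical and hand-written for the example; they are not output of the case analyzer, and a real template would define its own slots.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventTemplate:
    """A fixed set of slots for one event (slot choice is illustrative)."""
    event: str
    agent: Optional[str] = None
    instrument: Optional[str] = None
    location: Optional[str] = None


# Hypothetical mapping from case names to template slots.
SLOT_FOR_CASE = {"Agent": "agent", "Instrument": "instrument", "Location": "location"}


def fill_template(event, case_assignments):
    """Copy case fillers into the matching slots; other cases are ignored."""
    template = EventTemplate(event=event)
    for case, filler in case_assignments.items():
        slot = SLOT_FOR_CASE.get(case)
        if slot is not None:
            setattr(template, slot, filler)
    return template


# Hand-assigned cases loosely based on sentence (2).
cases = {"Agent": "Ed", "Instrument": "hammer", "Object": "thumb"}
print(fill_template("break", cases))
```

Because the template has no slot for the hypothetical Object case, that filler is dropped, which shows how a fixed template can lose information that case analysis makes available.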
It is often claimed that case is a universal phenomenon across natural languages (Fillmore 1968). Research has identified strikingly similar lists of cases for many human languages (see Campe 1994 for a multilingual bibliography of case). The semantic roles in a source language clause could be used as part of an interlingua in machine translation.
Question answering systems are often faced with questions about properties of entities and the participants and circumstances of events.
The construction of semantic lexicons has become a popular area of research in natural language engineering. Noun entries in these lexicons often identify properties of entities, hypernyms, subcomponents, as well as the events in which the entity expressed by the noun participates.
1.6 Organization of the Dissertation
The five goals of section 1.2 apply to all three levels
of processing in
Clause level relationship analysis, since it deals with the largest (and most intricate) parse tree fragments, has access to more linguistic evidence for recognizing relationships. The connection between syntax and semantics is stressed more in chapter 2 than in the other chapters, simply because there is more syntax at that level than the others.
Case analysis is not part of the contribution of this project, since the techniques have already been described in Delisle (1994) and Delisle et al. (1996). What is part of this project is the process of constructing the set of cases and the evaluation of their coverage.
There is little surface-linguistic evidence to guide the assignment of noun modifier relationships. The NMRs themselves have not been tested for coverage to the extent that the cases have been. What is unique about the NMR analyzer is its use of partial matching on a growing base of previous instances, and the attention paid to elements of the user interface. Chapter 4 has an emphasis on learning and user interaction that does not occupy as prominent a place in the other chapters.
Chapter 5 explores reasonable extensions to the project along with bolder departures. Chapter 6 summarizes the project and the evidence that its goals have been met. Three appendices contain samples of the clause level relationship marker dictionary (see section 2.4), the case marker dictionary (section 3.3.1) and the noun modifier relationship marker dictionary (section 4.5). Literature review has been parcelled out to sections 2.1, 3.1 and 4.1. There is an abundance of interesting work related indirectly to each of the three main areas of the project, but less that is directly applicable.
1.6.1 Paper Map
Parts of the dissertation have already been the subject of technical reports and papers:
| topic | dissertation sections | previously described in |
| CLR Analysis | 2.1-2.4, 2.5.1-2.5.4, 2.6.1, 2.6.2 | Barker (1994), Barker & Delisle (1996), Barker & Szpakowicz (1995), Barker et al. (1998) |
| Cases | 3.1-3.4, 3.6.2 | Barker (1996), Barker & Delisle (1996), Barker et al. (1997) |
| Case Analysis | 3.5, 3.6.1 | Barker & Delisle (1996), Barker et al. (1998), Delisle et al. (1993), Delisle et al. (1996) |
| NMR Analysis | 4.1, 4.2, 4.4-4.6, 4.7.2 | Barker (1997), Barker & Szpakowicz (1998), Barker et al. (1998) |
| Bracketing | 4.3, 4.7.1 | Barker (1997), Barker (1998), Barker et al. (1998) |
| Evaluation | 2.6.1, 2.6.2, 3.6, 4.7 | Barker & Delisle (1996), Barker et al. (1997), Barker et al. (1998) |
2 An example of linguistic/semantic knowledge that does not have this property is the knowledge that nouns denoting animate objects are more likely instigators of acts. That knowledge would require that nouns in a lexicon be tagged as ±animate.